Intel HD Audio: sound stops working in Xen PV dom0 in >=5.17

Marek Marczykowski-Górecki marmarek at invisiblethingslab.com
Fri Jan 20 13:11:34 CET 2023


On Fri, Jan 20, 2023 at 08:26:09AM +0100, Takashi Iwai wrote:
> On Fri, 20 Jan 2023 03:24:30 +0100,
> Marek Marczykowski-Górecki wrote:
> > 
> > On Fri, Jan 20, 2023 at 02:10:37AM +0100, Marek Marczykowski-Górecki wrote:
> > > On Wed, Jan 18, 2023 at 01:39:56PM +0100, Takashi Iwai wrote:
> > > > On Wed, 18 Jan 2023 11:39:18 +0100,
> > > > Marek Marczykowski-Górecki wrote:
> > > > > 
> > > > > On Wed, Jan 18, 2023 at 09:59:26AM +0100, Takashi Iwai wrote:
> > > > > > On Tue, 17 Jan 2023 21:34:11 +0100,
> > > > > > Marek Marczykowski-Górecki wrote:
> > > > > > > 
> > > > > > > On Tue, Jan 17, 2023 at 05:52:25PM +0100, Takashi Iwai wrote:
> > > > > > > > On Tue, 17 Jan 2023 17:49:28 +0100,
> > > > > > > > Marek Marczykowski-Górecki wrote:
> > > > > > > > > 
> > > > > > > > > On Tue, Jan 17, 2023 at 03:33:42PM +0100, Takashi Iwai wrote:
> > > > > > > > > > On Tue, 17 Jan 2023 15:21:23 +0100,
> > > > > > > > > > Marek Marczykowski-Górecki wrote:
> > > > > > > > > > > 
> > > > > > > > > > > On Tue, Jan 17, 2023 at 12:36:28PM +0100, Marek Marczykowski-Górecki wrote:
> > > > > > > > > > > > On Tue, Jan 17, 2023 at 08:58:57AM +0100, Takashi Iwai wrote:
> > > > > > > > > > > > > On Mon, 16 Jan 2023 16:55:11 +0100,
> > > > > > > > > > > > > Takashi Iwai wrote:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > On Tue, 27 Dec 2022 16:26:54 +0100,
> > > > > > > > > > > > > > Marek Marczykowski-Górecki wrote:
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > On Thu, Dec 22, 2022 at 09:09:15AM +0100, Takashi Iwai wrote:
> > > > > > > > > > > > > > > > On Sat, 10 Dec 2022 17:17:42 +0100,
> > > > > > > > > > > > > > > > Marek Marczykowski-Górecki wrote:
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > On Sat, Dec 10, 2022 at 02:00:06AM +0100, Marek Marczykowski-Górecki wrote:
> > > > > > > > > > > > > > > > > > On Fri, Dec 09, 2022 at 01:40:15PM +0100, Marek Marczykowski-Górecki wrote:
> > > > > > > > > > > > > > > > > > > On Fri, Dec 09, 2022 at 09:10:19AM +0100, Takashi Iwai wrote:
> > > > > > > > > > > > > > > > > > > > On Fri, 09 Dec 2022 02:27:30 +0100,
> > > > > > > > > > > > > > > > > > > > Marek Marczykowski-Górecki wrote:
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > Under Xen PV dom0, with Linux >= 5.17, sound stops working after a few
> > > > > > > > > > > > > > > > > > > > > hours. pavucontrol still shows meter bars moving, but the speakers
> > > > > > > > > > > > > > > > > > > > > remain silent. At least on some occasions I see the following message in
> > > > > > > > > > > > > > > > > > > > > dmesg:
> > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > >   [ 2142.484553] snd_hda_intel 0000:00:1f.3: Unstable LPIB (18144 >= 6396); disabling LPIB delay counting
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > Hit the issue again; this message did not appear in the log (or at least
> > > > > > > > > > > > > > > > > > not yet).
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > (...)
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > In any case, please check the behavior with 6.1-rc8 + the commit
> > > > > > > > > > > > > > > > > > > > cc26516374065a34e10c9a8bf3e940e42cd96e2a
> > > > > > > > > > > > > > > > > > > >     ALSA: memalloc: Allocate more contiguous pages for fallback case
> > > > > > > > > > > > > > > > > > > > from for-next of my sound git tree (which will be in 6.2-rc1).
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > This did not help.
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > Looking at the mentioned commits, there is one specific aspect of Xen PV
> > > > > > > > > > > > > > > > > > > that may be relevant. It configures PAT differently than native Linux.
> > > > > > > > > > > > > > > > > > > Theoretically, Linux adapts automatically, and using the proper API (like
> > > > > > > > > > > > > > > > > > > set_memory_wc()) should just work, but at least for the i915 driver it
> > > > > > > > > > > > > > > > > > > causes issues (not fully tracked down yet). Details about that bug
> > > > > > > > > > > > > > > > > > > report include some more background:
> > > > > > > > > > > > > > > > > > > https://lore.kernel.org/intel-gfx/Y5Hst0bCxQDTN7lK@mail-itl/
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > Anyway, I have tested it on a Xen modified to set up PAT the same way as
> > > > > > > > > > > > > > > > > > > native Linux and the audio issue is still there.
> > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > If the problem persists, another thing to check is whether the hack
> > > > > > > > > > > > > > > > > > > > below works.
> > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > Trying this one now.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > And this one didn't either :/
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > (Sorry for the late reply, as I've been off in the last weeks.)
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > I think the hack doesn't affect the PCM buffer pages, only the BDL
> > > > > > > > > > > > > > > > pages.  Could you check the patch below instead?
> > > > > > > > > > > > > > > > It'll disable the SG-buffer handling on x86 completely. 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > This seems to "fix" the issue, thanks!
> > > > > > > > > > > > > > > I guess I'll run it this way for now, but a proper solution would be
> > > > > > > > > > > > > > > nice. Let me know if I can collect any more info that would help with
> > > > > > > > > > > > > > > that.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Then it seems we need to go back to the coherent memory allocation for
> > > > > > > > > > > > > > the fallback SG cases.  It was changed because the use of
> > > > > > > > > > > > > > dma_alloc_coherent() caused a problem in the IOMMU case when retrieving
> > > > > > > > > > > > > > the page addresses, but since commit 9736a325137b, we essentially
> > > > > > > > > > > > > > avoid the fallback when an IOMMU is used, so it should be fine again.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Let me know if the patch like below works for you instead of the
> > > > > > > > > > > > > > previous hack to disable SG-buffer (note: totally untested!)
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Gah, there was an obvious typo, scratch that.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Below is a proper patch.  Please try this one instead.
> > > > > > > > > > > > 
> > > > > > > > > > > > Thanks, I'll give it a try.
> > > > > > > > > > > 
> > > > > > > > > > > Unfortunately, it doesn't help; it stopped working again after about 3h
> > > > > > > > > > > of uptime.
> > > > > > > > > > 
> > > > > > > > > > Aha, then it might be the other way round:
> > > > > > > > > > dma_alloc_noncontiguous() doesn't work properly on Xen.
> > > > > > > > > > 
> > > > > > > > > > Could you try the one below instead of the previous?
> > > > > > > > > 
> > > > > > > > > Unfortunately, this one doesn't fix it either :/
> > > > > > > > 
> > > > > > > > Hmm.  Then how about applying both of the last two patches?  The last
> > > > > > > > one to enforce the fallback allocation and the previous one to use
> > > > > > > > dma_alloc_coherent().  This should essentially revert to the old
> > > > > > > > way.
> > > > > > > 
> > > > > > > Oh, I noticed only now: the last patch made it fail to initialize.
> > > > > > 
> > > > > > The "last patch" means the patch to enforce the fallback allocation?
> > > > > 
> > > > > Yes, the one about dma_alloc_noncontiguous().
> > > > > 
> > > > > > > I
> > > > > > > don't see obvious errors in dmesg, but when trying aplay, I get:
> > > > > > > 
> > > > > > >     ALSA lib pcm_direct.c:1284:(snd1_pcm_direct_initialize_slave) unable to install hw params
> > > > > > >     ALSA lib pcm_dmix.c:1087:(snd_pcm_dmix_open) unable to initialize slave
> > > > > > >     aplay: main:830: audio open error: Cannot allocate memory
> > > > > > 
> > > > > > It's -ENOMEM, so it must come from there.  Does it always appear?  If
> > > > > > yes, your system has an IOMMU, and the patch intentionally made it
> > > > > > always return NULL.
> > > > > 
> > > > > While the system does have an IOMMU, it isn't configured by Linux but
> > > > > by Xen, and it maps all the memory that Linux sees.
> > > > > 
> > > > > > If that's the case, the problem is that the IOMMU doesn't handle
> > > > > > coherent memory on Xen.
> > > > > > 
> > > > > > Please check more explicitly, whether get_dma_ops(dmab->dev.dev) call
> > > > > > in snd_dma_noncontig_alloc() returns NULL or not.
> > > > > 
> > > > > Will do.
> > > > 
> > > > If get_dma_ops() is non-NULL, 
> > > 
> > > Yes, it's non-NULL.
> > > 
> > > > it means we need some Xen-specific
> > > > workaround to avoid using dma_alloc_noncontiguous().
> > > > What's the best way to see whether the driver is running on Xen PV?
> > > 
> > > Usually it's this: cpu_feature_enabled(X86_FEATURE_XENPV)
> > > 
> > > > Meanwhile, it's helpful if you can try the combo of my last two
> > > > patches, too.  It should work, and if it doesn't, it implies that
> > > > we're looking at a wrong place.
> > > 
> > > It doesn't, because the last of them causes "Cannot allocate memory".
> > > I'm trying now with this on top:
> > > 
> > > ---8<---
> > > diff --git a/sound/core/memalloc.c b/sound/core/memalloc.c
> > > index 97d7b8106869..e927d18d1ebb 100644
> > > --- a/sound/core/memalloc.c
> > > +++ b/sound/core/memalloc.c
> > > @@ -545,7 +545,7 @@ static void *snd_dma_noncontig_alloc(struct snd_dma_buffer *dmab, size_t size)
> > >  	// sgt = dma_alloc_noncontiguous(dmab->dev.dev, size, dmab->dev.dir,
> > >  	//	      DEFAULT_GFP, 0);
> > >  #ifdef CONFIG_SND_DMA_SGBUF
> > > -	if (!sgt && !get_dma_ops(dmab->dev.dev)) {
> > > +	if (!sgt) { // && !get_dma_ops(dmab->dev.dev)) {
> > >  		if (dmab->dev.type == SNDRV_DMA_TYPE_DEV_WC_SG)
> > >  			dmab->dev.type = SNDRV_DMA_TYPE_DEV_WC_SG_FALLBACK;
> > >  		else
> > > ---8<---
> > 
> > Unfortunately, the above doesn't help. I mean, I don't get an error
> > anymore, but no sound output either (even though pavucontrol says I
> > should hear it). So, it's like the original issue, but without any
> > delay, just straight from the start.
> 
> Hmm, it's the result with the combination of both patches, right?

Yes.

> What I meant by the combo is something like below.

Something like this, yes.

BTW, xen_domain() will also return true in PVH/HVM domains, which should
not need any of this special treatment. It's PV that is weird.
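For illustration only, here is a minimal userspace sketch of the allocation-path decision discussed in this thread, with the Xen check keyed on PV rather than on xen_domain(). It is not kernel code. domain_kind, model_xen_domain(), model_xen_pv() and choose_path() are hypothetical stand-ins for xen_domain(), cpu_feature_enabled(X86_FEATURE_XENPV) and the logic in snd_dma_noncontig_alloc(). Skipping dma_alloc_noncontiguous() on PV is an assumption about how a workaround might look, not a tested patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, NOT kernel code: every name below is a
 * made-up stand-in for the kernel checks discussed in this thread.
 */
enum domain_kind { DOM_NATIVE, DOM_XEN_PV, DOM_XEN_PVH, DOM_XEN_HVM };

enum alloc_path {
	PATH_NONCONTIG,   /* buffer comes from dma_alloc_noncontiguous() */
	PATH_SG_FALLBACK, /* e.g. SNDRV_DMA_TYPE_DEV_WC_SG_FALLBACK */
	PATH_ENOMEM       /* allocation fails with -ENOMEM */
};

/* Stand-in for xen_domain(): true for PV, PVH and HVM guests alike. */
static inline bool model_xen_domain(enum domain_kind k)
{
	return k != DOM_NATIVE;
}

/* Stand-in for cpu_feature_enabled(X86_FEATURE_XENPV): true for PV only. */
static inline bool model_xen_pv(enum domain_kind k)
{
	return k == DOM_XEN_PV;
}

/*
 * noncontig_ok: whether dma_alloc_noncontiguous() returned an sg_table
 * has_dma_ops:  whether get_dma_ops(dev) is non-NULL
 */
static enum alloc_path choose_path(enum domain_kind k, bool noncontig_ok,
				   bool has_dma_ops)
{
	assert(k <= DOM_XEN_HVM); /* reject out-of-range domain kinds */

	/* Assumed workaround: on Xen PV, never use dma_alloc_noncontiguous(). */
	if (model_xen_pv(k))
		return PATH_SG_FALLBACK;
	if (noncontig_ok)
		return PATH_NONCONTIG;
	/* Unpatched condition from the diff: fall back only without dma_ops. */
	if (!has_dma_ops)
		return PATH_SG_FALLBACK;
	return PATH_ENOMEM;
}
```

With this model, choose_path(DOM_XEN_PVH, true, true) returns PATH_NONCONTIG, the same result as native, while any DOM_XEN_PV call returns PATH_SG_FALLBACK. Keying the check on model_xen_domain() instead would also divert PVH/HVM guests, which should not need it.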

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


More information about the Alsa-devel mailing list