Intel HD Audio: sound stops working in Xen PV dom0 in >=5.17

Marek Marczykowski-Górecki marmarek at invisiblethingslab.com
Fri Dec 9 13:40:15 CET 2022


On Fri, Dec 09, 2022 at 09:10:19AM +0100, Takashi Iwai wrote:
> On Fri, 09 Dec 2022 02:27:30 +0100,
> Marek Marczykowski-Górecki wrote:
> > 
> > Hi,
> > 
> > Under Xen PV dom0, with Linux >= 5.17, sound stops working after a few
> > hours. pavucontrol still shows meter bars moving, but the speakers
> > remain silent. At least on some occasions I see the following message in
> > dmesg:
> > 
> >   [ 2142.484553] snd_hda_intel 0000:00:1f.3: Unstable LPIB (18144 >= 6396); disabling LPIB delay counting
> > 
> > I'm not sure if that happens before sound stops working, after, or if
> > it's related at all, but that's pretty much the only sound-related error
> > I found in logs.
> > When the issue happens, on rare occasions it starts working again later
> > for a short time, but generally the fix is to reboot. Reloading all
> > snd_* modules (surprisingly) does not help. I don't know what exactly
> > triggers the issue; sometimes it happens after a short time, like 15
> > minutes of uptime, but usually after several hours. I guess it depends on usage
> > pattern, but I haven't spotted any specific relation.
> > 
> > I managed to bisect it to this commit:
> > 
> >     2c95b92ecd92e784785b1db8cccc4f0f2bfa850c is the first bad commit
> >     commit 2c95b92ecd92e784785b1db8cccc4f0f2bfa850c
> >     Author: Takashi Iwai <tiwai at suse.de>
> >     Date:   Tue Nov 16 08:33:58 2021 +0100
> > 
> >         ALSA: memalloc: Unify x86 SG-buffer handling (take#3)
> >         
> >         This is a second attempt to unify the x86-specific SG-buffer handling
> >         code with the new standard non-contiguous page handler.
> >         
> >         The first try (in commit 2d9ea39917a4) failed due to the wrong page
> >         and address calculations, hence reverted.  (And the second try failed
> >         due to a copy&paste error.)  Now it's corrected with the previous fix
> >         for noncontig pages, and the proper sg page iteration by this patch.
> >         
> >         After the migration, SNDRV_DMA_TYPE_DMA_SG becomes identical with
> >         SNDRV_DMA_TYPE_NONCONTIG on x86, while others still fall back to
> >         SNDRV_DMA_TYPE_DEV.
> >         
> >         Tested-by: Alex Xu (Hello71) <alex_y_xu at yahoo.ca>
> >         Tested-by: Harald Arnesen <harald at skogtun.org>
> >         Link: https://lore.kernel.org/r/20211017074859.24112-4-tiwai@suse.de
> >         Link: https://lore.kernel.org/r/20211109062235.22310-1-tiwai@suse.de
> >         Link: https://lore.kernel.org/r/20211116073358.19741-1-tiwai@suse.de
> >         Signed-off-by: Takashi Iwai <tiwai at suse.de>
> > 
> >      include/sound/memalloc.h |  14 ++--
> >      sound/core/Makefile      |   1 -
> >      sound/core/memalloc.c    |  53 ++++++++++++-
> >      sound/core/sgbuf.c       | 201 -----------------------------------------------
> >      4 files changed, 56 insertions(+), 213 deletions(-)
> >      delete mode 100644 sound/core/sgbuf.c
> > 
> > I've seen further follow ups to this commit, but I still observe this
> > issue on Linux 6.0.8.
> > 
> > I have observed this issue on a KBL-based system, but I've also got reports
> > from users of other platforms (including ones as old as Sandy Bridge).
> > 
> > I tried to include all relevant information above, but some more details
> > can be found in the original report at
> > https://github.com/QubesOS/qubes-issues/issues/7465
> > 
> > Any ideas?
> 
> Hm, is it specific to Xen, i.e. if you run the normal kernel on the
> same machine, does it still work?

I don't know if that's specific to Xen, but I assume if it weren't,
there would be a lot more bug reports. I can't think of any other
relevant difference. Unfortunately, I can't run Linux without Xen on
this system long enough to confirm.


> In any case, please check the behavior with 6.1-rc8 + the commit
> cc26516374065a34e10c9a8bf3e940e42cd96e2a
>     ALSA: memalloc: Allocate more contiguous pages for fallback case
> from for-next of my sound git tree (which will be in 6.2-rc1).

Looking at the mentioned commits, there is one specific aspect of Xen PV
that may be relevant. It configures PAT differently than native Linux.
In theory Linux adapts automatically, and using the proper API (like
set_memory_wc()) should just work, but at least for the i915 driver it
causes issues (not fully tracked down yet). That bug report includes
some more background:
https://lore.kernel.org/intel-gfx/Y5Hst0bCxQDTN7lK@mail-itl/

Anyway, I have tested it on a Xen modified to set up PAT the same way as
native Linux and the audio issue is still there.
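
To make the PAT difference concrete, here is a minimal user-space sketch
that decodes a PAT MSR value into its eight memory-type entries. The type
encodings follow the Intel SDM. The two example values mentioned after the
code are assumptions about what native Linux and Xen program by default and
should be verified against the kernel and hypervisor sources; the helper
names (pat_entry, pat_find, pat_print) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Memory-type encodings stored in each 8-bit PAT entry (Intel SDM). */
enum pat_type {
	PAT_UC = 0,       /* uncacheable */
	PAT_WC = 1,       /* write-combining */
	PAT_WT = 4,       /* write-through */
	PAT_WP = 5,       /* write-protected */
	PAT_WB = 6,       /* write-back */
	PAT_UC_MINUS = 7, /* uncacheable, overridable by MTRRs */
};

/* Human-readable name for a PAT memory-type encoding. */
const char *pat_type_name(unsigned int type)
{
	switch (type) {
	case PAT_UC:       return "UC";
	case PAT_WC:       return "WC";
	case PAT_WT:       return "WT";
	case PAT_WP:       return "WP";
	case PAT_WB:       return "WB";
	case PAT_UC_MINUS: return "UC-";
	default:           return "reserved";
	}
}

/* Memory type held in PAT entry `index` (0..7) of the MSR value. */
unsigned int pat_entry(uint64_t msr, unsigned int index)
{
	assert(index < 8);
	return (unsigned int)(msr >> (index * 8)) & 0x7;
}

/* First PAT index that selects `type`, or -1 if no entry does. */
int pat_find(uint64_t msr, unsigned int type)
{
	for (unsigned int i = 0; i < 8; i++)
		if (pat_entry(msr, i) == type)
			return (int)i;
	return -1;
}

/* Print all eight entries of one PAT layout. */
void pat_print(const char *label, uint64_t msr)
{
	printf("%s (0x%016llx):\n", label, (unsigned long long)msr);
	for (unsigned int i = 0; i < 8; i++)
		printf("  PAT%u = %s\n", i, pat_type_name(pat_entry(msr, i)));
}
```

Assuming 0x0407050600070106 for native Linux and 0x0000050100070406 for
Xen, pat_find(msr, PAT_WC) returns 1 for the former and 4 for the latter,
so WC is selected by different page-table bits in the two setups. Code that
goes through the kernel API should not care; code that hardcodes the bits
would.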

> If the problem persists, another thing to check is whether the hack below
> works.

Thanks, I'll check both and report back.

> thanks,
> 
> Takashi
> 
> -- 8< --
> --- a/sound/pci/hda/hda_intel.c
> +++ b/sound/pci/hda/hda_intel.c
> @@ -1808,9 +1808,16 @@ static int azx_create(struct snd_card *card, struct pci_dev *pci,
>  	if (err < 0)
>  		return err;
>  
> +#if 0
>  	/* use the non-cached pages in non-snoop mode */
>  	if (!azx_snoop(chip))
>  		azx_bus(chip)->dma_type = SNDRV_DMA_TYPE_DEV_WC_SG;
> +#else
> +	if (!azx_snoop(chip))
> +		azx_bus(chip)->dma_type = SNDRV_DMA_TYPE_DEV_SG;
> +	else
> +		azx_bus(chip)->dma_type = SNDRV_DMA_TYPE_DEV;
> +#endif
>  
>  	if (chip->driver_type == AZX_DRIVER_NVIDIA) {
>  		dev_dbg(chip->card->dev, "Enable delay in RIRB handling\n");
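
As a reading aid, the effect of the hack above can be summarized in a small
user-space model. This is only a sketch: the enum is a hypothetical
stand-in for the SNDRV_DMA_TYPE_* constants (its values do not match the
kernel headers), and the bus default is passed in as a parameter because
the diff does not show what it is.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for SNDRV_DMA_TYPE_*; values are arbitrary. */
enum model_dma_type {
	MODEL_DMA_DEV,       /* SNDRV_DMA_TYPE_DEV */
	MODEL_DMA_DEV_SG,    /* SNDRV_DMA_TYPE_DEV_SG */
	MODEL_DMA_DEV_WC_SG, /* SNDRV_DMA_TYPE_DEV_WC_SG */
};

/*
 * Unpatched selection (the "#if 0" branch): only non-snoop mode
 * overrides whatever the bus was initialized with.
 */
enum model_dma_type dma_type_original(bool snoop,
				      enum model_dma_type bus_default)
{
	return snoop ? bus_default : MODEL_DMA_DEV_WC_SG;
}

/*
 * Selection with the hack (the "#else" branch): the WC variant is never
 * used, and snoop mode is forced to the plain DEV type.
 */
enum model_dma_type dma_type_hack(bool snoop)
{
	return snoop ? MODEL_DMA_DEV : MODEL_DMA_DEV_SG;
}
```

So the hack tests two things at once: whether dropping the WC mapping helps
in non-snoop mode, and whether avoiding the SG buffer type helps in snoop
mode.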

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab