On Fri, 09 Dec 2022 02:27:30 +0100, Marek Marczykowski-Górecki wrote:
Hi,
Under Xen PV dom0, with Linux >= 5.17, sound stops working after a few hours. pavucontrol still shows the meter bars moving, but the speakers remain silent. On at least some occasions I see the following message in dmesg:
[ 2142.484553] snd_hda_intel 0000:00:1f.3: Unstable LPIB (18144 >= 6396); disabling LPIB delay counting
I'm not sure whether that happens before or after sound stops working, or whether it's related at all, but it's pretty much the only sound-related error I found in the logs. On rare occasions, after the issue happens, sound starts working again for a short time, but generally the fix is to reboot. Reloading all snd_* modules (surprisingly) does not help. I don't know what exactly triggers the issue; sometimes it happens after as little as 15 minutes of uptime, but usually after several hours. I guess it depends on the usage pattern, but I haven't spotted any specific relation.
I managed to bisect it to this commit:
2c95b92ecd92e784785b1db8cccc4f0f2bfa850c is the first bad commit
commit 2c95b92ecd92e784785b1db8cccc4f0f2bfa850c
Author: Takashi Iwai <tiwai@suse.de>
Date:   Tue Nov 16 08:33:58 2021 +0100

    ALSA: memalloc: Unify x86 SG-buffer handling (take#3)

    This is a second attempt to unify the x86-specific SG-buffer handling
    code with the new standard non-contiguous page handler.

    The first try (in commit 2d9ea39917a4) failed due to the wrong page
    and address calculations, hence reverted. (And the second try failed
    due to a copy&paste error.) Now it's corrected with the previous fix
    for noncontig pages, and the proper sg page iteration by this patch.

    After the migration, SNDRV_DMA_TYPE_DMA_SG becomes identical with
    SNDRV_DMA_TYPE_NONCONTIG on x86, while others still fall back to
    SNDRV_DMA_TYPE_DEV.

    Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
    Tested-by: Harald Arnesen <harald@skogtun.org>
    Link: https://lore.kernel.org/r/20211017074859.24112-4-tiwai@suse.de
    Link: https://lore.kernel.org/r/20211109062235.22310-1-tiwai@suse.de
    Link: https://lore.kernel.org/r/20211116073358.19741-1-tiwai@suse.de
    Signed-off-by: Takashi Iwai <tiwai@suse.de>

 include/sound/memalloc.h |  14 ++--
 sound/core/Makefile      |   1 -
 sound/core/memalloc.c    |  53 ++++++++++++-
 sound/core/sgbuf.c       | 201 -----------------------------------------------
 4 files changed, 56 insertions(+), 213 deletions(-)
 delete mode 100644 sound/core/sgbuf.c
I've seen further follow-ups to this commit, but I still observe this issue on Linux 6.0.8.
I have observed this issue on a KBL-based system, but I've also got reports from users of other platforms (including ones as old as Sandy Bridge).
I tried to include all relevant information above, but more details can be found in the original report at https://github.com/QubesOS/qubes-issues/issues/7465
Any ideas?
Hm, is it specific to Xen, i.e. if you run the normal kernel on the same machine, does it still work?
In any case, please check the behavior with 6.1-rc8 plus commit cc26516374065a34e10c9a8bf3e940e42cd96e2a ("ALSA: memalloc: Allocate more contiguous pages for fallback case") from the for-next branch of my sound git tree (it will be in 6.2-rc1).
If the problem persists, another thing to check is whether the hack below works.
thanks,
Takashi
-- 8< --
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -1808,9 +1808,16 @@ static int azx_create(struct snd_card *card, struct pci_dev *pci,
 	if (err < 0)
 		return err;
 
+#if 0
 	/* use the non-cached pages in non-snoop mode */
 	if (!azx_snoop(chip))
 		azx_bus(chip)->dma_type = SNDRV_DMA_TYPE_DEV_WC_SG;
+#else
+	if (!azx_snoop(chip))
+		azx_bus(chip)->dma_type = SNDRV_DMA_TYPE_DEV_SG;
+	else
+		azx_bus(chip)->dma_type = SNDRV_DMA_TYPE_DEV;
+#endif
 
 	if (chip->driver_type == AZX_DRIVER_NVIDIA) {
 		dev_dbg(chip->card->dev, "Enable delay in RIRB handling\n");