On Wed, 10 Nov 2021 23:15:40 +0100, Kai Vehmanen wrote:
> Hey,
>
> On Wed, 10 Nov 2021, Takashi Iwai wrote:
>
> > On Wed, 10 Nov 2021 22:03:07 +0100, Kai Vehmanen wrote:
> > > Fix a corner case between PCI device driver remove callback and runtime PM idle callback.
> > > [...]
> > > Some non-persistent direct links showing the bug triggering on different platforms with linux-next 20211109:
> > > - https://intel-gfx-ci.01.org/tree/linux-next/next-20211109/fi-tgl-1115g4/igt@...
> > > - https://intel-gfx-ci.01.org/tree/linux-next/next-20211109/fi-jsl-1/igt@i915_...
> > > Notably with 20211110 linux-next, the bug does not trigger:
> >
> > Is this the case with CONFIG_DEBUG_KOBJECT_RELEASE? This would be the only logical explanation I can think of for now.
>
> Hmm, that doesn't seem to be used. Here's a link to the kconfig used in the failing CI run: https://intel-gfx-ci.01.org/tree/linux-next/next-20211109/kconfig.txt

OK, then it's not due to the delayed release, but the cause should be the same, I suppose.

> It's still a bit odd, especially given Scott just reported the other HDA-related regression in 5.15 today. The two issues don't seem to be related, although both are fixed by clearing drvdata (albeit in different places in hda_intel.c).

I don't think it's the same issue, rather a coincidence of the timing. There have been many changes in 5.15, after all :)

> I'll try to run some more tests tomorrow. The fix should be good in any case, but it would be interesting to understand better what change made this (apparently) more likely to hit than before. This is not a new test and the problem happens on fairly old platforms, so something has changed.

A potential problem with the current code is that it doesn't disable runtime PM in the release procedure. Could you try the patch below? You can also put a WARN_ON(!chip) in azx_runtime_idle() to catch the invalid runtime PM call.
thanks,
Takashi
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -1347,8 +1347,13 @@ static void azx_free(struct azx *chip)
 	if (hda->freed)
 		return;
 
-	if (azx_has_pm_runtime(chip) && chip->running)
+	if (azx_has_pm_runtime(chip) && chip->running) {
 		pm_runtime_get_noresume(&pci->dev);
+		pm_runtime_forbid(&pci->dev);
+		pm_runtime_dont_use_autosuspend(&pci->dev);
+		pm_runtime_disable(&pci->dev);
+	}
+
 	chip->running = 0;
 
 	azx_del_card_list(chip);
@@ -2320,6 +2325,7 @@ static int azx_probe_continue(struct azx *chip)
 	set_default_power_save(chip);
 
 	if (azx_has_pm_runtime(chip)) {
+		pm_runtime_enable(&pci->dev);
 		pm_runtime_use_autosuspend(&pci->dev);
 		pm_runtime_allow(&pci->dev);
 		pm_runtime_put_autosuspend(&pci->dev);
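To illustrate the ordering the patch relies on, here is a minimal userspace model in C. Everything in it (mock_dev, mock_runtime_idle(), mock_pm_request_idle(), mock_remove(), run_scenario()) is hypothetical and heavily simplified. It is not the kernel's runtime PM implementation, which does considerably more. The model only assumes that runtime PM callbacks are not invoked while the device's disable depth is non-zero. It mimics the suggested WARN_ON(!chip) check by counting hits instead of dereferencing a NULL pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a device's runtime PM state.
 * NOT the kernel's struct dev_pm_info; just enough to show the ordering. */
struct mock_dev {
	int disable_depth; /* > 0: runtime PM callbacks are not invoked */
	void *drvdata;     /* what the driver's drvdata lookup would return */
	int warnings;      /* number of WARN_ON(!chip)-style hits */
	int idle_calls;    /* idle callbacks that ran with a valid chip */
};

struct mock_chip {
	int running;
};

/* Analogue of azx_runtime_idle() with the suggested WARN_ON(!chip) guard. */
static int mock_runtime_idle(struct mock_dev *dev)
{
	struct mock_chip *chip = dev->drvdata;

	if (chip == NULL) { /* WARN_ON(!chip) */
		dev->warnings++;
		fprintf(stderr, "WARN: runtime idle called without chip\n");
		return 0;
	}
	dev->idle_calls++;
	return 0;
}

/* Model of the PM core dispatching an idle request: skipped while disabled. */
static void mock_pm_request_idle(struct mock_dev *dev)
{
	if (dev->disable_depth > 0)
		return;
	mock_runtime_idle(dev);
}

/* Model of pm_runtime_disable(): only bumps the disable depth here. */
static void mock_pm_runtime_disable(struct mock_dev *dev)
{
	dev->disable_depth++;
}

/* Release path; with_fix mirrors disabling runtime PM in azx_free()
 * before the driver data goes away. */
static void mock_remove(struct mock_dev *dev, bool with_fix)
{
	if (with_fix)
		mock_pm_runtime_disable(dev);
	dev->drvdata = NULL;
}

/* Returns the number of warnings hit when an idle request arrives after remove. */
static int run_scenario(bool with_fix)
{
	struct mock_chip chip = { .running = 1 };
	struct mock_dev dev = { .disable_depth = 0, .drvdata = &chip };

	mock_pm_request_idle(&dev); /* normal idle while bound: runs */
	mock_remove(&dev, with_fix);
	mock_pm_request_idle(&dev); /* late idle request after remove */
	return dev.warnings;
}
```

In this model, run_scenario(false) hits the warning once, because the late idle request finds the driver data already cleared. run_scenario(true) hits it zero times, because runtime PM was disabled before the driver data went away. The matching pm_runtime_enable() in azx_probe_continue() keeps the enable/disable calls balanced.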