[alsa-devel] [PATCH] ALSA: hda - Disable runtime PM on LynxPoint(-LP) controllers
We got bug reports of stalled HD-audio, typically after S3 or S4, and it turned out that they seem to be triggered by runtime PM on Lynx Point and Lynx Point-LP controllers. As there is no way to recover properly from a stalled controller, it's safer to disable runtime PM support on these chips for now.
Further notes: I could actually reproduce this on a few HP laptops here. Go to S3 after runtime suspend, and the next playback fails, resulting in either a codec stall or repeated sounds.
The problem seems to lie at a deeper level. The complete stall can be avoided by disabling the call to azx_stop_chip() in azx_runtime_suspend(). More specifically, the culprit is the disabling of CORB/RIRB in azx_free_cmd_io(). After removing this call, sound resumes.
However, even with that workaround, the first playback after resume stalls due to missing RIRB interrupts (so you get the "switch to polling mode" kernel warning). Interestingly, the codec communication in the resume procedure does work. The system goes to runtime suspend immediately after resume, and something gets broken at that point.
This missing-interrupt problem happens even with empty runtime suspend/resume callbacks that do nothing. This implies that it's an issue in the underlying layer. So, the only feasible "fix" on the sound driver side so far is to suppress runtime PM.
Cc: stable@vger.kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
Yet another note: the patch is based on v3.12, not on linux-next, so that it can be backported cleanly for 3.12 and earlier kernels.
 sound/pci/hda/hda_intel.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 6e61a019aa5e..27fc33e54a50 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -3973,7 +3973,7 @@ static DEFINE_PCI_DEVICE_TABLE(azx_ids) = {
 	  .driver_data = AZX_DRIVER_PCH | AZX_DCAPS_INTEL_PCH_NOPM },
 	/* Lynx Point */
 	{ PCI_DEVICE(0x8086, 0x8c20),
-	  .driver_data = AZX_DRIVER_PCH | AZX_DCAPS_INTEL_PCH },
+	  .driver_data = AZX_DRIVER_PCH | AZX_DCAPS_INTEL_PCH_NOPM },
 	/* Wellsburg */
 	{ PCI_DEVICE(0x8086, 0x8d20),
 	  .driver_data = AZX_DRIVER_PCH | AZX_DCAPS_INTEL_PCH },
@@ -3981,10 +3981,10 @@ static DEFINE_PCI_DEVICE_TABLE(azx_ids) = {
 	  .driver_data = AZX_DRIVER_PCH | AZX_DCAPS_INTEL_PCH },
 	/* Lynx Point-LP */
 	{ PCI_DEVICE(0x8086, 0x9c20),
-	  .driver_data = AZX_DRIVER_PCH | AZX_DCAPS_INTEL_PCH },
+	  .driver_data = AZX_DRIVER_PCH | AZX_DCAPS_INTEL_PCH_NOPM },
 	/* Lynx Point-LP */
 	{ PCI_DEVICE(0x8086, 0x9c21),
-	  .driver_data = AZX_DRIVER_PCH | AZX_DCAPS_INTEL_PCH },
+	  .driver_data = AZX_DRIVER_PCH | AZX_DCAPS_INTEL_PCH_NOPM },
 	/* Haswell */
 	{ PCI_DEVICE(0x8086, 0x0a0c),
 	  .driver_data = AZX_DRIVER_SCH | AZX_DCAPS_INTEL_PCH |
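
Note: the patch above works purely by swapping the capability mask in the device table. The following is a minimal standalone sketch of that idea, not the driver's code. It assumes, as the names suggest, that the _NOPM mask is the same capability set minus a runtime-PM bit. The bit values, the struct, and the function name runtime_pm_allowed() are hypothetical; only the PCI device IDs come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values; only the relationship between the two
 * masks matters for this sketch. */
#define CAP_OTHER       (1u << 0)  /* stand-in for the non-PM capabilities */
#define CAP_PM_RUNTIME  (1u << 1)  /* stand-in for "runtime PM allowed" */

#define CAPS_PCH_NOPM   (CAP_OTHER)
#define CAPS_PCH        (CAPS_PCH_NOPM | CAP_PM_RUNTIME)

struct id_entry {
	uint16_t vendor;
	uint16_t device;
	uint32_t caps;
};

/* Device IDs taken from the patch; masks as they are after the patch. */
static const struct id_entry id_table[] = {
	{ 0x8086, 0x8c20, CAPS_PCH_NOPM },  /* Lynx Point */
	{ 0x8086, 0x8d20, CAPS_PCH },       /* Wellsburg */
	{ 0x8086, 0x9c20, CAPS_PCH_NOPM },  /* Lynx Point-LP */
	{ 0x8086, 0x9c21, CAPS_PCH_NOPM },  /* Lynx Point-LP */
};

/* Return true if the table allows runtime PM for this device.
 * Devices not in the table are treated as "no runtime PM". */
static bool runtime_pm_allowed(uint16_t vendor, uint16_t device)
{
	for (size_t i = 0; i < sizeof(id_table) / sizeof(id_table[0]); i++) {
		if (id_table[i].vendor == vendor && id_table[i].device == device)
			return (id_table[i].caps & CAP_PM_RUNTIME) != 0;
	}
	return false;
}
```

With this table, a lookup for 0x8086:0x8c20 reports runtime PM as disallowed, while Wellsburg (0x8086:0x8d20) keeps it, which mirrors the intent of the three changed lines.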
(Adding Mengdong to cc)
On 11/19/2013 05:51 PM, Takashi Iwai wrote:
> We got bug reports of the stalled HD-audio, typically after S3 or S4, and it turned out that they seemed triggered by runtime PM on Lynx Point and Lynx Point-LP controllers. As there is no way to recover properly from the stalled controller, it's safer to disable the runtime PM support on these chips for now.
Oh, this is a bit sad news. Have you talked to Intel about it?
Anyway, I saw something similar a while ago, but never with access to the hardware, and it was difficult for the person on the other side to reproduce. Nevertheless, when I read through the PM code I found that the GCTL register was sometimes accessed with readb (although it is a 32-bit register), so I wrote a patch for that, but the testing results of this patch were a bit inconclusive, so I never upstreamed it.
Anyway, I'm attaching the draft patch. Do you think it could be related?
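
Note: the draft patch itself is not reproduced in this thread. As a generic illustration of why a byte-wide read of a 32-bit register can matter, here is a small simulation. Everything in it (struct fake_reg32 and the fake_read*/bit_is_set_* helpers) is hypothetical and stands in for real MMIO accessors; it makes no claim about which GCTL bits the driver actually tests.

```c
#include <assert.h>
#include <stdint.h>

/* A simulated 32-bit device register. A real driver would use
 * MMIO accessors rather than a plain struct field. */
struct fake_reg32 {
	uint32_t value;
};

/* Full-width read: sees all 32 bits. */
static uint32_t fake_read32(const struct fake_reg32 *r)
{
	return r->value;
}

/* Byte-wide read at the register's base offset: modeled here as
 * returning only bits 0..7 of the register. */
static uint8_t fake_read8(const struct fake_reg32 *r)
{
	return (uint8_t)(r->value & 0xffu);
}

/* Test a bit (0..31) through the full-width read. */
static int bit_is_set_32(const struct fake_reg32 *r, unsigned int bit)
{
	return (int)((fake_read32(r) >> bit) & 1u);
}

/* Test a bit through the byte-wide read; bits 8 and above are
 * simply not visible and always read back as 0. */
static int bit_is_set_8(const struct fake_reg32 *r, unsigned int bit)
{
	if (bit >= 8)
		return 0;
	return (fake_read8(r) >> bit) & 1;
}
```

In this model a flag in bit 0 is seen by both helpers, while a flag in bit 8 is seen only through the 32-bit read.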
At Wed, 20 Nov 2013 09:54:42 +0100, David Henningsson wrote:
> Anyway, I'm attaching the draft patch. Do you think it could be related?
It didn't change the behavior, although the change looks good.
After a long debugging session this morning, I finally nailed it down. This was the fault of the sound driver after all, shamefully :)
The fix patch is below.
The code needs a bit of cleanup, and I have a patch for that, but I will apply the cleanup for 3.14.
Takashi
===
From: Takashi Iwai <tiwai@suse.de>
Subject: [PATCH 1/2] ALSA: hda - Fix unbalanced runtime PM notification at resume
When a codec is resumed, it keeps the power on during the resume phase via hda_keep_power_on(), then turns it down via snd_hda_power_down(). At that point, snd_hda_power_down() notifies the controller of the power down, and this may confuse the refcount if the codec was already powered up before the resume.
As a result, the controller goes to runtime suspend even before the codec is kicked off to power save, and communication stalls happen.
The fix is to add the power-up notification together with hda_keep_power_on(), and to clear the flag appropriately.
Cc: stable@vger.kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 sound/pci/hda/hda_codec.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/sound/pci/hda/hda_codec.c b/sound/pci/hda/hda_codec.c
index be60f5227b34..bada677df8a7 100644
--- a/sound/pci/hda/hda_codec.c
+++ b/sound/pci/hda/hda_codec.c
@@ -4058,6 +4058,10 @@ static void hda_call_codec_resume(struct hda_codec *codec)
 	 * in the resume / power-save sequence
 	 */
 	hda_keep_power_on(codec);
+	if (codec->pm_down_notified) {
+		codec->pm_down_notified = 0;
+		hda_call_pm_notify(codec->bus, true);
+	}
 	hda_set_power_state(codec, AC_PWRST_D0);
 	restore_shutup_pins(codec);
 	hda_exec_init_verbs(codec);
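
Note: the bookkeeping described in this patch can be sketched with a toy model. It is only an illustration under stated assumptions, not the kernel's logic: the controller is assumed to be free to runtime-suspend whenever its count of powered-up codecs is zero, and the scenario modeled is a resume that starts with the "down notified" flag already set. All toy_* names are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy controller: counts codecs that have announced "power up". */
struct toy_bus {
	int active_codecs;
};

/* Toy codec: tracks its own power state and whether it has already
 * told the bus that it powered down. */
struct toy_codec {
	struct toy_bus *bus;
	bool powered;
	bool down_notified;
};

/* Stand-in for the up/down notification to the controller. */
static void toy_notify(struct toy_bus *bus, bool power_up)
{
	bus->active_codecs += power_up ? 1 : -1;
}

/* Assumption of this model: the bus may suspend when nobody is active. */
static bool toy_bus_may_suspend(const struct toy_bus *bus)
{
	return bus->active_codecs <= 0;
}

/* Power-save path: power the codec down and notify the bus once. */
static void toy_codec_power_save(struct toy_codec *c)
{
	c->powered = false;
	if (!c->down_notified) {
		c->down_notified = true;
		toy_notify(c->bus, false);
	}
}

/* Resume path. with_fix selects whether the power-up notification
 * is sent (and the flag cleared) when the codec is kept powered. */
static void toy_codec_resume(struct toy_codec *c, bool with_fix)
{
	c->powered = true;  /* codec is kept powered during resume */
	if (with_fix && c->down_notified) {
		c->down_notified = false;
		toy_notify(c->bus, true);
	}
}
```

Without the notification, the model ends up with a powered codec on a bus that still believes it may suspend. With it, every "down" is paired with an "up" and the count returns to zero only after the codec really enters power save.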
At Wed, 20 Nov 2013 13:05:23 +0100, Takashi Iwai wrote:
> From: Takashi Iwai <tiwai@suse.de>
> Subject: [PATCH 1/2] ALSA: hda - Fix unbalanced runtime PM notification at resume
BTW, it's marked as 1/2 just because of the clean up patch I mentioned above. Only this one is needed for now.
Takashi
On 11/20/2013 02:32 PM, Takashi Iwai wrote:
> At Wed, 20 Nov 2013 13:05:23 +0100, Takashi Iwai wrote:
>> It didn't change the behavior although the change looks good.
It looks good indeed, but it's always scary to make subtle changes without testing on all the 50+ controllers we support... Do you think I should submit a proper patch for it?
> BTW, it's marked as 1/2 just because of the clean up patch I mentioned above. Only this one is needed for now.
Ok, thanks for the clarification, and glad you finally found it! And so are you I guess :-)
Participants (2): David Henningsson, Takashi Iwai