[alsa-devel] HDA controller w/o CLKSTOP and EPSS support
Hi Takashi,
Henning Kühn reports that due to my commit 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller"), the discrete GPU on his hybrid graphics laptop no longer runtime suspends.
The root cause is that the single codec of the GPU's HDA controller doesn't support CLKSTOP and EPSS. (The "Supported Power States" are 0x00000009, i.e. CLKSTOP and EPSS bits are not set, cf. page 209 of the HDA spec.)
This means that in hda_codec_runtime_suspend() we do not call snd_hdac_codec_link_down():
	if (codec_has_clkstop(codec) && codec_has_epss(codec) &&
	    (state & AC_PWRST_CLK_STOP_OK))
		snd_hdac_codec_link_down(&codec->core);
If snd_hdac_codec_link_down() isn't called, the bit in the codec_powered bitmask isn't cleared, which in turn prevents the controller from going to PCI_D3hot in azx_runtime_idle():
	if (!power_save_controller || !azx_has_pm_runtime(chip) ||
	    azx_bus(chip)->codec_powered || !chip->running)
		return -EBUSY;
The codec does runtime suspend to D3, but the PS-ClkStopOk bit in the response to "Get Power State" is not set. (Response is 0x00000033, cf. page 151 of the HD Audio spec.) Hence the check above "state & AC_PWRST_CLK_STOP_OK" also results in "false".
I'm not familiar enough with the intricacies of the HD Audio spec to fully comprehend the implications of missing EPSS and CLKSTOP support and to come up with a fix. We could quirk any HDA controller in a vga_switcheroo setup to ignore missing EPSS and CLKSTOP support, but would that be safe? E.g. the spec says that if the bus clock does stop, "a full reset shall be performed" when the clock is reenabled. Are we handling this correctly?
Any help in coming up with a proper fix would be greatly appreciated.
dmesg output is available here: https://bugs.freedesktop.org/show_bug.cgi?id=106957
It's a muxed hybrid graphics machine; the HDA controller has PCI ID 1002:aa60.
Thanks!
Lukas
On Wed, 20 Jun 2018 11:34:52 +0200, Lukas Wunner wrote:
> Hi Takashi,
> Henning Kühn reports that due to my commit 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller"), the discrete GPU on his hybrid graphics laptop no longer runtime suspends.
> The root cause is that the single codec of the GPU's HDA controller doesn't support CLKSTOP and EPSS. (The "Supported Power States" are 0x00000009, i.e. CLKSTOP and EPSS bits are not set, cf. page 209 of the HDA spec.)
> This means that in hda_codec_runtime_suspend() we do not call snd_hdac_codec_link_down():
> 	if (codec_has_clkstop(codec) && codec_has_epss(codec) &&
> 	    (state & AC_PWRST_CLK_STOP_OK))
> 		snd_hdac_codec_link_down(&codec->core);
> If snd_hdac_codec_link_down() isn't called, the bit in the codec_powered bitmask isn't cleared, which in turn prevents the controller from going to PCI_D3hot in azx_runtime_idle():
> 	if (!power_save_controller || !azx_has_pm_runtime(chip) ||
> 	    azx_bus(chip)->codec_powered || !chip->running)
> 		return -EBUSY;
> The codec does runtime suspend to D3, but the PS-ClkStopOk bit in the response to "Get Power State" is not set. (Response is 0x00000033, cf. page 151 of the HD Audio spec.) Hence the check above "state & AC_PWRST_CLK_STOP_OK" also results in "false".
> I'm not familiar enough with the intricacies of the HD Audio spec to fully comprehend the implications of missing EPSS and CLKSTOP support and to come up with a fix. We could quirk any HDA controller in a vga_switcheroo setup to ignore missing EPSS and CLKSTOP support, but would that be safe? E.g. the spec says that if the bus clock does stop, "a full reset shall be performed" when the clock is reenabled. Are we handling this correctly?
I guess it would work with a quirk. The EPSS and CLKSTOP checks are only there to ensure modern codec PM, and GPUs are always an exception.
Supposing that it's an AMD GPU, does a fix like the one below work?
thanks,
Takashi
-- 8< --
--- a/sound/pci/hda/hda_codec.c
+++ b/sound/pci/hda/hda_codec.c
@@ -2899,8 +2899,9 @@ static int hda_codec_runtime_suspend(struct device *dev)
 	list_for_each_entry(pcm, &codec->pcm_list_head, list)
 		snd_pcm_suspend_all(pcm->pcm);
 	state = hda_call_codec_suspend(codec);
-	if (codec_has_clkstop(codec) && codec_has_epss(codec) &&
-	    (state & AC_PWRST_CLK_STOP_OK))
+	if (codec->link_down_at_suspend ||
+	    (codec_has_clkstop(codec) && codec_has_epss(codec) &&
+	     (state & AC_PWRST_CLK_STOP_OK)))
 		snd_hdac_codec_link_down(&codec->core);
 	snd_hdac_link_power(&codec->core, false);
 	return 0;
diff --git a/sound/pci/hda/hda_codec.h b/sound/pci/hda/hda_codec.h
index 681c360f29f9..5b00c1eb857e 100644
--- a/sound/pci/hda/hda_codec.h
+++ b/sound/pci/hda/hda_codec.h
@@ -258,6 +258,7 @@ struct hda_codec {
 	unsigned int power_save_node:1; /* advanced PM for each widget */
 	unsigned int auto_runtime_pm:1; /* enable automatic codec runtime pm */
 	unsigned int force_pin_prefix:1; /* Add location prefix */
+	unsigned int link_down_at_suspend:1; /* force to link down at suspend */
 #ifdef CONFIG_PM
 	unsigned long power_on_acct;
 	unsigned long power_off_acct;
diff --git a/sound/pci/hda/patch_hdmi.c b/sound/pci/hda/patch_hdmi.c
index 8840daf9c6a3..98e1c411c56a 100644
--- a/sound/pci/hda/patch_hdmi.c
+++ b/sound/pci/hda/patch_hdmi.c
@@ -3741,6 +3741,11 @@ static int patch_atihdmi(struct hda_codec *codec)

 	spec->chmap.channels_max = max(spec->chmap.channels_max, 8u);

+	/* AMD GPUs have neither EPSS nor CLKSTOP bits, hence preventing
+	 * the link-down as is.  Tell the core to allow it.
+	 */
+	codec->link_down_at_suspend = 1;
+
 	return 0;
 }
On Wed, 20 Jun 2018 12:14:32 +0200, Takashi Iwai <tiwai@suse.de> wrote:
> I guess it would work with a quirk. The EPSS and CLKSTOP checks are only there to ensure modern codec PM, and GPUs are always an exception.
> Supposing that it's an AMD GPU, does a fix like the one below work?
The suggested fix restores the previous behavior: the dGPU is properly powered down. But this previous behavior is really broken in other ways, so I'm now wondering if it could work any better than that.
On kernels <4.17 and on 4.17 with that patch applied the notebook screen turns off completely when running things with DRI_PRIME=1 and only comes back a few seconds after the process ends. glxinfo is showing radeon instead of intel as expected, but with a blank screen, it's useless.
On kernel 4.17 without the patch, when the dGPU is constantly on, I can have intel render things with DRI_PRIME=0 and radeon with DRI_PRIME=1 without the screen turning off. Also, switches between vt and X are now instant and external displays are working, which wasn't the case before. Why is this now working suddenly? Is the dGPU rendering all of the desktop when it's always on anyway? Or is the iGPU rendering the desktop and the dGPU could potentially be suspended when not in use, if it was just done the right way?
I used to accept this broken behavior and just changed the BIOS setting for the GPU from "switchable" to "discrete" when I wanted to actually use the dGPU. Initially when I reported the bug I just wanted to find a way to suspend the dGPU again to save power. But now that I've seen my notebook working like this I'd like to have both: a powered down dGPU when not in use and properly working DRI_PRIME.
Any ideas what's up with the current situation or should I file a new bug report?
Kind regards,
Henning
On Thu, 21 Jun 2018 00:28:37 +0200, prg@cooco.de wrote:
> On Wed, 20 Jun 2018 12:14:32 +0200, Takashi Iwai <tiwai@suse.de> wrote:
> > I guess it would work with a quirk. The EPSS and CLKSTOP checks are only there to ensure modern codec PM, and GPUs are always an exception.
> > Supposing that it's an AMD GPU, does a fix like the one below work?
> The suggested fix restores the previous behavior: the dGPU is properly powered down. But this previous behavior is really broken in other ways, so I'm now wondering if it could work any better than that.
> On kernels <4.17 and on 4.17 with that patch applied the notebook screen turns off completely when running things with DRI_PRIME=1 and only comes back a few seconds after the process ends. glxinfo is showing radeon instead of intel as expected, but with a blank screen, it's useless.
> On kernel 4.17 without the patch, when the dGPU is constantly on, I can have intel render things with DRI_PRIME=0 and radeon with DRI_PRIME=1 without the screen turning off. Also, switches between vt and X are now instant and external displays are working, which wasn't the case before. Why is this now working suddenly? Is the dGPU rendering all of the desktop when it's always on anyway? Or is the iGPU rendering the desktop and the dGPU could potentially be suspended when not in use, if it was just done the right way?
> I used to accept this broken behavior and just changed the BIOS setting for the GPU from "switchable" to "discrete" when I wanted to actually use the dGPU. Initially when I reported the bug I just wanted to find a way to suspend the dGPU again to save power. But now that I've seen my notebook working like this I'd like to have both: a powered down dGPU when not in use and properly working DRI_PRIME.
> Any ideas what's up with the current situation or should I file a new bug report?
It's hybrid graphics, not a switchable setup, right? If so, the symptom sounds as if the Intel side is mistakenly turned off just because the rendering is done on the dGPU.
In any case, I think we should go forward with this patch to fix runtime PM on the audio side.
thanks,
Takashi
On Wed, Jun 20, 2018 at 12:14:32PM +0200, Takashi Iwai wrote:
> On Wed, 20 Jun 2018 11:34:52 +0200, Lukas Wunner wrote:
> > Henning Kühn reports that due to my commit 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller"), the discrete GPU on his hybrid graphics laptop no longer runtime suspends.
> > The root cause is that the single codec of the GPU's HDA controller doesn't support CLKSTOP and EPSS. (The "Supported Power States" are 0x00000009, i.e. CLKSTOP and EPSS bits are not set, cf. page 209 of the HDA spec.)
> > This means that in hda_codec_runtime_suspend() we do not call snd_hdac_codec_link_down():
> > 	if (codec_has_clkstop(codec) && codec_has_epss(codec) &&
> > 	    (state & AC_PWRST_CLK_STOP_OK))
> > 		snd_hdac_codec_link_down(&codec->core);
> > If snd_hdac_codec_link_down() isn't called, the bit in the codec_powered bitmask isn't cleared, which in turn prevents the controller from going to PCI_D3hot in azx_runtime_idle():
> > 	if (!power_save_controller || !azx_has_pm_runtime(chip) ||
> > 	    azx_bus(chip)->codec_powered || !chip->running)
> > 		return -EBUSY;
> > The codec does runtime suspend to D3, but the PS-ClkStopOk bit in the response to "Get Power State" is not set. (Response is 0x00000033, cf. page 151 of the HD Audio spec.) Hence the check above "state & AC_PWRST_CLK_STOP_OK" also results in "false".
> Supposing that it's an AMD GPU, does a fix like the one below work?
[snip]
> --- a/sound/pci/hda/patch_hdmi.c
> +++ b/sound/pci/hda/patch_hdmi.c
> @@ -3741,6 +3741,11 @@ static int patch_atihdmi(struct hda_codec *codec)
>
>  	spec->chmap.channels_max = max(spec->chmap.channels_max, 8u);
>
> +	/* AMD GPUs have neither EPSS nor CLKSTOP bits, hence preventing
> +	 * the link-down as is.  Tell the core to allow it.
> +	 */
> +	codec->link_down_at_suspend = 1;
> +
>  	return 0;
>  }
In hda_intel.c:azx_probe_continue(), we currently do this:
	if (use_vga_switcheroo(hda))
		list_for_each_codec(codec, &chip->bus)
			codec->auto_runtime_pm = 1;
An alternative to setting the flag in patch_atihdmi() would be to set it here.
One small nit, the code comment you're adding above is in network subsystem style ("/*" isn't on a line by itself).
Otherwise this is
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reported-and-tested-by: Henning Kühn <prg@cooco.de>
Fixes: 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106957
Feel free to just copy/paste the problem description from my original e-mail to the commit message so that you don't have additional work there.
Thanks so much for the fast response with a working patch!
Lukas
On Thu, 21 Jun 2018 11:18:59 +0200, Lukas Wunner wrote:
> On Wed, Jun 20, 2018 at 12:14:32PM +0200, Takashi Iwai wrote:
> > On Wed, 20 Jun 2018 11:34:52 +0200, Lukas Wunner wrote:
> > > Henning Kühn reports that due to my commit 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller"), the discrete GPU on his hybrid graphics laptop no longer runtime suspends.
> > > The root cause is that the single codec of the GPU's HDA controller doesn't support CLKSTOP and EPSS. (The "Supported Power States" are 0x00000009, i.e. CLKSTOP and EPSS bits are not set, cf. page 209 of the HDA spec.)
> > > This means that in hda_codec_runtime_suspend() we do not call snd_hdac_codec_link_down():
> > > 	if (codec_has_clkstop(codec) && codec_has_epss(codec) &&
> > > 	    (state & AC_PWRST_CLK_STOP_OK))
> > > 		snd_hdac_codec_link_down(&codec->core);
> > > If snd_hdac_codec_link_down() isn't called, the bit in the codec_powered bitmask isn't cleared, which in turn prevents the controller from going to PCI_D3hot in azx_runtime_idle():
> > > 	if (!power_save_controller || !azx_has_pm_runtime(chip) ||
> > > 	    azx_bus(chip)->codec_powered || !chip->running)
> > > 		return -EBUSY;
> > > The codec does runtime suspend to D3, but the PS-ClkStopOk bit in the response to "Get Power State" is not set. (Response is 0x00000033, cf. page 151 of the HD Audio spec.) Hence the check above "state & AC_PWRST_CLK_STOP_OK" also results in "false".
> > Supposing that it's an AMD GPU, does a fix like the one below work?
> [snip]
> > --- a/sound/pci/hda/patch_hdmi.c
> > +++ b/sound/pci/hda/patch_hdmi.c
> > @@ -3741,6 +3741,11 @@ static int patch_atihdmi(struct hda_codec *codec)
> >
> >  	spec->chmap.channels_max = max(spec->chmap.channels_max, 8u);
> >
> > +	/* AMD GPUs have neither EPSS nor CLKSTOP bits, hence preventing
> > +	 * the link-down as is.  Tell the core to allow it.
> > +	 */
> > +	codec->link_down_at_suspend = 1;
> > +
> >  	return 0;
> >  }
> In hda_intel.c:azx_probe_continue(), we currently do this:
> 	if (use_vga_switcheroo(hda))
> 		list_for_each_codec(codec, &chip->bus)
> 			codec->auto_runtime_pm = 1;
> An alternative to setting the flag in patch_atihdmi() would be to set it here.
I think it's safer to put it in patch_atihdmi(). The issue is specific to AMD GPUs, whereas the code in azx_probe_continue() may touch all codecs.
> One small nit, the code comment you're adding above is in network subsystem style ("/*" isn't on a line by itself).
It's OK in the sound tree, too. I like that style :)
> Otherwise this is
> Reviewed-by: Lukas Wunner <lukas@wunner.de>
> Reported-and-tested-by: Henning Kühn <prg@cooco.de>
> Fixes: 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller")
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106957
> Feel free to just copy/paste the problem description from my original e-mail to the commit message so that you don't have additional work there.
OK, I'll cook up the proper patch and submit/merge later.
Thanks!
Takashi