[RFC PATCH] ALSA: hda/hdmi: fix race in handling acomp ELD notification at resume
When snd-hda-codec-hdmi is used with an ASoC HDA controller like SOF (acomp used for ELD notifications), a display connection change made during suspend can be lost due to the following sequence of events:
 1. system in S3 suspend
 2. DP/HDMI receiver connected
 3. system resumed
 4. HDA controller resumed, but card->deferred_resume_work not complete
 5. acomp eld_notify callback
 6. eld_notify ignored as power state is not CTL_POWER_D0
 7. HDA resume deferred work completed, power state set to CTL_POWER_D0
This results in losing the notification, and the jack state reported to user-space is not correct.
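The sequence above can be replayed in a small user-space model. This is only a sketch: `toy_card`, `toy_eld_notify()` and `toy_replay_resume()` are hypothetical names, and the `TOY_D*` values are stand-ins rather than the kernel's constants. It assumes the card is no longer in its suspended state but also not yet in D0 when the notification arrives.

```c
#include <stdbool.h>

/* Toy stand-ins for the card power states (not the kernel's constants). */
enum toy_power { TOY_D0, TOY_D2, TOY_D3 };

struct toy_card {
	enum toy_power power_state;
	bool jack_connected;	/* state that would be reported to user-space */
};

/* Models the pre-patch check: ignore the notification unless the card is in D0. */
static bool toy_eld_notify(struct toy_card *card, bool monitor_present)
{
	if (card->power_state != TOY_D0)
		return false;		/* step 6: notification dropped */
	card->jack_connected = monitor_present;
	return true;
}

/* Replays steps 1-7 and returns the jack state seen once resume has finished. */
static bool toy_replay_resume(void)
{
	struct toy_card card = { .power_state = TOY_D3, .jack_connected = false };

	/* steps 1-3: receiver plugged in while suspended, then system resumes */
	/* step 4: controller resumed, deferred resume work still in progress */
	card.power_state = TOY_D2;
	/* steps 5-6: the display driver reports the receiver */
	(void)toy_eld_notify(&card, true);
	/* step 7: deferred resume work completes */
	card.power_state = TOY_D0;

	return card.jack_connected;
}
```

In this model `toy_replay_resume()` returns false even though a receiver is attached, because nothing re-delivers the dropped notification after step 7.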
The check on step 6 was added in commit 8ae743e82f0b ("ALSA: hda - Skip ELD notification during system suspend"). It would seem with the deferred resume logic in ASoC core, this check is not safe.
Fix the issue by modifying the check to only skip ELD notification processing if power state is D3 or deeper. This helps in the ASoC controller case as card power state is set to D2 at start of soc_resume_deferred().
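The effect of the changed comparison can be checked in isolation. The sketch below copies the `SNDRV_CTL_POWER_*` values as I believe they are defined in include/uapi/sound/asound.h so that it builds in user space; the helper names `skip_notify_old()` and `skip_notify_new()` are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>

/* Assumed to match include/uapi/sound/asound.h; copied for a user-space build. */
#define SNDRV_CTL_POWER_D0	0x0000	/* full on */
#define SNDRV_CTL_POWER_D1	0x0100	/* partial on */
#define SNDRV_CTL_POWER_D2	0x0200	/* partial on */
#define SNDRV_CTL_POWER_D3	0x0300	/* off */
#define SNDRV_CTL_POWER_D3hot	(SNDRV_CTL_POWER_D3 | 0x0000)
#define SNDRV_CTL_POWER_D3cold	(SNDRV_CTL_POWER_D3 | 0x0001)

/* Pre-patch condition: skip the notification in any state other than D0. */
static bool skip_notify_old(unsigned int state)
{
	return state != SNDRV_CTL_POWER_D0;
}

/* Patched condition: skip the notification only in D3 or deeper. */
static bool skip_notify_new(unsigned int state)
{
	return state >= SNDRV_CTL_POWER_D3;
}
```

The two conditions differ only for D1 and D2: the old one skips the notification there, the new one processes it. Both still skip in D3hot and D3cold, and both process in D0.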
BugLink: https://github.com/thesofproject/linux/issues/2825
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
---
 sound/pci/hda/patch_hdmi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
NOTES:
 - I wonder if there is a better way to check for system suspend case than looking at snd_power_get_state()
 - 'chip->pm_prepared' is one option, but this is not directly available to codec drivers
 - storing PM target in hda_codec_pm_prepare() is perhaps one option
diff --git a/sound/pci/hda/patch_hdmi.c b/sound/pci/hda/patch_hdmi.c
index 5de3666a7101..a43df036db1d 100644
--- a/sound/pci/hda/patch_hdmi.c
+++ b/sound/pci/hda/patch_hdmi.c
@@ -2654,7 +2654,7 @@ static void generic_acomp_pin_eld_notify(void *audio_ptr, int port, int dev_id)
 	/* skip notification during system suspend (but not in runtime PM);
 	 * the state will be updated at resume
 	 */
-	if (snd_power_get_state(codec->card) != SNDRV_CTL_POWER_D0)
+	if (snd_power_get_state(codec->card) >= SNDRV_CTL_POWER_D3)
 		return;
 	/* ditto during suspend/resume process itself */
 	if (snd_hdac_is_in_pm(&codec->core))
@@ -2840,7 +2840,7 @@ static void intel_pin_eld_notify(void *audio_ptr, int port, int pipe)
 	/* skip notification during system suspend (but not in runtime PM);
 	 * the state will be updated at resume
 	 */
-	if (snd_power_get_state(codec->card) != SNDRV_CTL_POWER_D0)
+	if (snd_power_get_state(codec->card) >= SNDRV_CTL_POWER_D3)
 		return;
 	/* ditto during suspend/resume process itself */
 	if (snd_hdac_is_in_pm(&codec->core))
base-commit: 7dc53a38e4ac00d68943bab91deadc67f07d4a0b
On Wed, 07 Apr 2021 17:47:27 +0200, Kai Vehmanen wrote:
> When snd-hda-codec-hdmi is used with an ASoC HDA controller like SOF (acomp used for ELD notifications), a display connection change made during suspend can be lost due to the following sequence of events:
>
>  1. system in S3 suspend
>  2. DP/HDMI receiver connected
>  3. system resumed
>  4. HDA controller resumed, but card->deferred_resume_work not complete
>  5. acomp eld_notify callback
>  6. eld_notify ignored as power state is not CTL_POWER_D0
>  7. HDA resume deferred work completed, power state set to CTL_POWER_D0
>
> This results in losing the notification, and the jack state reported to user-space is not correct.
Hrm, that's odd. The logic there is: there is a manual call of hdmi_present_sense() for each pin in the resume call back of HDMI codec driver, so at the point 7, update_eld() is invoked from hdmi_present_sense(), which notifies the state to user-space.
So I don't see what's missing there. Could you check whether the scenario above is correct? The state is updated in snd_hdac_acomp_get_eld() call in sync_eld_via_acomp(). We can see what state is returned there at which timing.
The only possible case I can think of now is that the graphics driver isn't ready for returning the right value at the HDMI codec resume. But this should have been covered by the device link...
thanks,
Takashi
Hey,
On Wed, 7 Apr 2021, Takashi Iwai wrote:
> Hrm, that's odd. The logic there is: there is a manual call of hdmi_present_sense() for each pin in the resume call back of HDMI codec driver, so at the point 7, update_eld() is invoked from hdmi_present_sense(), which notifies the state to user-space.
In the bug case, the codec resume is completed in step (4). i915 is up and running but no HDMI/DP receiver is yet found/setup at this point. So HDA codec driver resumes and concludes no HDMI/DP receivers are available.
A bit later, the HDMI/DP receiver is found and i915 calls eld_notify. But as HDA controller's soc_resume_deferred() is still running, card->power_state==D2 still at this point. patch_hdmi.c:*pin_eld_notify() checks power_state, figures card is not in D0 and ignores the notification.
Then another moment later, HDA controller's deferred resume work completes and card power state is set to D0, but at this point there are no actions left that would trigger reprocessing the ELD notification.
I now changed this so that if card is in D2, that's good enough and we process the notification in patch_hdmi.c:*pin_eld_notify().
> So I don't see what's missing there. Could you check whether the scenario above is correct? The state is updated in snd_hdac_acomp_get_eld() call in sync_eld_via_acomp(). We can see what state is returned there at which timing.
At this point, state for the ports is still disconnected (monitor was connected while system was in suspend).
> The only possible case I can think of now is that the graphics driver isn't ready for returning the right value at the HDMI codec resume. But this should have been covered by the device link...
Yes, this seems to be the case. The device link seems to be honoured, but the fact that 1) monitor/receiver is not immediately found, and 2) ASoC core does some of the resume work in a work-queue, opens this race still.
Seems quite odd indeed, but I've now got reports of systems where this is hit, and unfortunately it's very systematic on these systems. By adding some arbitrary delay to soc_resume_deferred(), I could easily hit this myself as well on the systems I have at hand.
Br, Kai
On Wed, 07 Apr 2021 18:40:29 +0200, Kai Vehmanen wrote:
> In the bug case, the codec resume is completed in step (4). i915 is up and running but no HDMI/DP receiver is yet found/setup at this point. So HDA codec driver resumes and concludes no HDMI/DP receivers are available.
>
> A bit later, the HDMI/DP receiver is found and i915 calls eld_notify. But as HDA controller's soc_resume_deferred() is still running, card->power_state==D2 still at this point. patch_hdmi.c:*pin_eld_notify() checks power_state, figures card is not in D0 and ignores the notification.
>
> Then another moment later, HDA controller's deferred resume work completes and card power state is set to D0, but at this point there are no actions left that would trigger reprocessing the ELD notification.
>
> I now changed this so that if card is in D2, that's good enough and we process the notification in patch_hdmi.c:*pin_eld_notify().
>
>> So I don't see what's missing there. Could you check whether the scenario above is correct? The state is updated in snd_hdac_acomp_get_eld() call in sync_eld_via_acomp(). We can see what state is returned there at which timing.
>
> At this point, state for the ports is still disconnected (monitor was connected while system was in suspend).
OK, that's a messy problem, indeed. It's partly because of ASoC deferred resume that is completely independent from the rest of the resume via HD-audio bus. Worse, this can't be managed via the device link because the resume callback itself has been processed.
And, IIUC, another part of the problem is that i915 notifies the HPD *after* the resume completion, right? Then indeed it can be racy.
>> The only possible case I can think of now is that the graphics driver isn't ready for returning the right value at the HDMI codec resume. But this should have been covered by the device link...
>
> Yes, this seems to be the case. The device link seems to be honoured, but the fact that 1) monitor/receiver is not immediately found, and 2) ASoC core does some of the resume work in a work-queue, opens this race still.
>
> Seems quite odd indeed, but I've now got reports of systems where this is hit, and unfortunately it's very systematic on these systems. By adding some arbitrary delay to soc_resume_deferred(), I could easily hit this myself as well on the systems I have at hand.
Judging from the above, I see no problem with merging the patch as is. It's not an intrusive change and practically covers the ASoC cases (mostly).
Another possible fix would be to check dev->power.power_state instead of the global card state. This is set in each PM callback in hda_codec.c to indicate the current PM state of the codec. Something like below. Let me know if this works, too.
thanks,
Takashi
---
--- a/sound/pci/hda/patch_hdmi.c
+++ b/sound/pci/hda/patch_hdmi.c
@@ -2658,7 +2658,7 @@ static void generic_acomp_pin_eld_notify(void *audio_ptr, int port, int dev_id)
 	/* skip notification during system suspend (but not in runtime PM);
 	 * the state will be updated at resume
 	 */
-	if (snd_power_get_state(codec->card) != SNDRV_CTL_POWER_D0)
+	if (codec->core.dev.power.power_state.event != PM_EVENT_ON)
 		return;
 	/* ditto during suspend/resume process itself */
 	if (snd_hdac_is_in_pm(&codec->core))
Hi,
On Thu, 8 Apr 2021, Takashi Iwai wrote:
> OK, that's a messy problem, indeed. It's partly because of ASoC deferred resume that is completely independent from the rest of the resume via HD-audio bus. Worse, this can't be managed via the device link because the resume callback itself has been processed.
>
> And, IIUC, another part of the problem is that i915 notifies the HPD *after* the resume completion, right? Then indeed it can be racy.
yes, exactly.
>> Seems quite odd indeed, but I've now got reports of systems where this is hit, and unfortunately it's very systematic on these systems. By adding some arbitrary delay to soc_resume_deferred(), I could easily hit this myself as well on the systems I have at hand.
>
> Another possible fix would be to check dev->power.power_state instead of the global card state. This is set in each PM callback in hda_codec.c to indicate the current PM state of the codec. Something like below. Let me know if this works, too.
Thanks, this works in my setup and is much cleaner. I think this is also more robust. I realized that with snd_power_get_state() check, there is a theoretical race still possible if notify comes before soc_resume_deferred() gets scheduled (i.e. delay is not within soc_resume_deferred() but in getting it scheduled to begin with). This would seem really unlikely, but it's a possible race nevertheless.
I'll update the patch to use dev->power.power_state, ask people with affected systems to double check, and I'll send a V2.
Br, Kai
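The three checks discussed in this thread can be compared side by side in a toy model. This is a sketch under stated assumptions, not kernel code: all `toy_*` and `skip_*` names are hypothetical, the card is assumed to still be in a D3 state until soc_resume_deferred() starts running, and the per-codec state is assumed to read as "on" once the codec's own resume has finished. The model covers three moments after the codec has resumed: deferred work not yet scheduled, deferred work running, and deferred work done.

```c
#include <stdbool.h>

/* Toy stand-ins, ordered so that a deeper card state compares greater. */
enum toy_card_power { TOY_D0, TOY_D2, TOY_D3 };
enum toy_codec_event { TOY_EVENT_ON, TOY_EVENT_SUSPEND };

struct toy_state {
	enum toy_card_power card;	/* global card state, updated by the deferred work */
	enum toy_codec_event codec;	/* per-codec state, updated by the codec PM callbacks */
};

/* Three moments after the codec's resume has completed (assumed values). */
static const struct toy_state toy_work_not_started = { TOY_D3, TOY_EVENT_ON };
static const struct toy_state toy_work_running     = { TOY_D2, TOY_EVENT_ON };
static const struct toy_state toy_work_done        = { TOY_D0, TOY_EVENT_ON };

/* Original check: skip unless the card is in D0. */
static bool skip_original(struct toy_state s)
{
	return s.card != TOY_D0;
}

/* First version of the patch: skip only if the card is in D3 or deeper. */
static bool skip_v1(struct toy_state s)
{
	return s.card >= TOY_D3;
}

/* Per-codec check: ignore the card state and look at the codec's own state. */
static bool skip_per_codec(struct toy_state s)
{
	return s.codec != TOY_EVENT_ON;
}
```

Under these assumptions the original check drops the notification in the first two moments, the D3-based check still drops it in the first one (the theoretical race mentioned above), and the per-codec check processes it in all three.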