hda codec unbind refcount hang
Hi Takashi,
Commit 7206998f578d ("ALSA: hda: Fix potential deadlock at codec unbinding") introduced a problem on at least one of my older machines.
The problem happens when hda_codec_driver_remove() encounters a codec without any pcms (and thus the refcount is 1) and tries to call refcount_dec(). Turns out refcount_dec() doesn't like to be used for dropping the refcount to 0, and instead it spews a warning and does its saturate thing. The subsequent wait_event() is then permanently stuck waiting on the saturated refcount.
I've definitely seen the same kind of pattern used elsewhere in the kernel as well, so the fact that refcount_t can't be used to implement it is a bit of a surprise to me. I guess most other places still use atomic_t instead.
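For illustration only, the saturation behaviour described above can be modelled in userspace. This is a minimal single-threaded sketch, not the kernel implementation: struct model_ref, the model_ref_*() helpers and MODEL_SATURATED are hypothetical names, and the sentinel value is assumed to be a large negative number that never reads as 0.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed sentinel: any value that can never be read back as 0. */
#define MODEL_SATURATED (INT_MIN / 2)

/* Toy, non-atomic stand-in for refcount_t (hypothetical type). */
struct model_ref {
	int refs;
};

static void model_ref_set(struct model_ref *r, int n)
{
	r->refs = n;
}

static int model_ref_read(const struct model_ref *r)
{
	return r->refs;
}

/* Models refcount_dec(): a decrement that would reach 0 warns and saturates. */
static void model_ref_dec(struct model_ref *r)
{
	if (r->refs <= 1) {
		fprintf(stderr, "model_ref: decrement hit 0, saturating\n");
		r->refs = MODEL_SATURATED;
		return;
	}
	r->refs--;
}

/* Models refcount_dec_and_test(): returns true when the count reaches 0. */
static bool model_ref_dec_and_test(struct model_ref *r)
{
	if (r->refs <= 0) {
		/* Underflow or already saturated: stay saturated. */
		r->refs = MODEL_SATURATED;
		return false;
	}
	return --r->refs == 0;
}

/* Walks through the no-PCM case: the count starts at 1. */
static void model_ref_selfcheck(void)
{
	struct model_ref r;

	model_ref_set(&r, 1);
	model_ref_dec(&r);
	/* The count never reads 0, so a waiter on "== 0" is never satisfied. */
	assert(model_ref_read(&r) != 0);

	model_ref_set(&r, 1);
	/* The final put is reported to the caller and the count really is 0. */
	assert(model_ref_dec_and_test(&r));
	assert(model_ref_read(&r) == 0);
}
```

Calling model_ref_selfcheck() from a test program exercises both paths; the first one also prints the model's warning to stderr.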
On Fri, 09 Sep 2022 17:45:25 +0200, Ville Syrjälä wrote:
> Turns out refcount_dec() doesn't like to be used for dropping the refcount to 0, and instead it spews a warning and does its saturate thing. The subsequent wait_event() is then permanently stuck waiting on the saturated refcount.
Does the patch below work around it? It seems to be a subtle difference between refcount_dec() and refcount_dec_and_test().
thanks,
Takashi
-- 8< --
--- a/sound/pci/hda/hda_bind.c
+++ b/sound/pci/hda/hda_bind.c
@@ -157,10 +157,11 @@ static int hda_codec_driver_remove(struct device *dev)
 		return codec->bus->core.ext_ops->hdev_detach(&codec->core);
 	}

-	refcount_dec(&codec->pcm_ref);
-	snd_hda_codec_disconnect_pcms(codec);
-	snd_hda_jack_tbl_disconnect(codec);
-	wait_event(codec->remove_sleep, !refcount_read(&codec->pcm_ref));
+	if (!refcount_dec_and_test(&codec->pcm_ref)) {
+		snd_hda_codec_disconnect_pcms(codec);
+		snd_hda_jack_tbl_disconnect(codec);
+		wait_event(codec->remove_sleep, !refcount_read(&codec->pcm_ref));
+	}
 	snd_power_sync_ref(codec->bus->card);

 	if (codec->patch_ops.free)
On Fri, Sep 09, 2022 at 05:59:47PM +0200, Takashi Iwai wrote:
> Does the patch below work around it? It seems to be a subtle difference between refcount_dec() and refcount_dec_and_test().
Aye, this works.
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
On Fri, 09 Sep 2022 21:39:19 +0200, Ville Syrjälä wrote:
> Aye, this works.
>
> Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Good to hear.
I think the below is slightly safer, ensuring that the other *_disconnect() calls are still made.
Could you give it a try again? Once you confirm it works, I'll re-submit and merge to my tree.
thanks,
Takashi
-- 8< --
From: Takashi Iwai <tiwai@suse.de>
Subject: [PATCH] ALSA: hda: Fix hang at HD-audio codec unbinding due to refcount saturation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We fixed the potential deadlock at dynamic unbinding of the HD-audio codec in commit 7206998f578d ("ALSA: hda: Fix potential deadlock at codec unbinding"), but ironically, this caused another potential deadlock. The current code uses refcount_dec() and waits with wait_event() for the pending tasks to drop the refcount to 0. This works fine when PCMs are assigned and the code is actually waiting for the refcount drop.

Meanwhile, when there was no PCM assigned, the refcount_dec() call itself was supposed to drop to zero -- alas, it doesn't in reality; refcount_dec() complains, spews a kernel warning and saturates instead of dropping to 0, due to the nature of the refcount_dec() implementation. This eventually blocks the wait_event() wakeup and the code gets stuck there.

To avoid the problem, we call refcount_dec_and_test() and skip the sync-wait if it already reaches zero.

The patch does a slight code reshuffling to make sure to invoke other disconnect calls before the sync-wait, too.

Fixes: 7206998f578d ("ALSA: hda: Fix potential deadlock at codec unbinding")
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/YxtflWQnslMHVlU7@intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 sound/pci/hda/hda_bind.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sound/pci/hda/hda_bind.c b/sound/pci/hda/hda_bind.c
index cae9a975cbcc..1a868dd9dc4b 100644
--- a/sound/pci/hda/hda_bind.c
+++ b/sound/pci/hda/hda_bind.c
@@ -157,10 +157,10 @@ static int hda_codec_driver_remove(struct device *dev)
 		return codec->bus->core.ext_ops->hdev_detach(&codec->core);
 	}

-	refcount_dec(&codec->pcm_ref);
 	snd_hda_codec_disconnect_pcms(codec);
 	snd_hda_jack_tbl_disconnect(codec);
-	wait_event(codec->remove_sleep, !refcount_read(&codec->pcm_ref));
+	if (!refcount_dec_and_test(&codec->pcm_ref))
+		wait_event(codec->remove_sleep, !refcount_read(&codec->pcm_ref));
 	snd_power_sync_ref(codec->bus->card);

 	if (codec->patch_ops.free)
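As a rough illustration of why the patch above skips the wait, the remove path can be modelled in userspace with plain integers. This is a sketch under stated assumptions, not driver code: the sketch_*() helpers and SKETCH_SATURATED are hypothetical, the model is single-threaded, and it assumes every remaining PCM user drops its reference with a dec-and-test style put that would wake the waiter once the count reads 0.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed sentinel: a value that never reads as 0. */
#define SKETCH_SATURATED (INT_MIN / 2)

/* refcount_dec()-like: a decrement that would reach 0 saturates instead. */
static int sketch_dec(int refs)
{
	return refs <= 1 ? SKETCH_SATURATED : refs - 1;
}

/* refcount_dec_and_test()-like: stores the new count, returns true at 0. */
static bool sketch_dec_and_test(int *refs)
{
	if (*refs <= 0) {
		*refs = SKETCH_SATURATED;
		return false;
	}
	return --*refs == 0;
}

/*
 * Returns true if the modelled remove path would block forever.
 * open_pcms is the number of users holding a reference in addition
 * to the initial one.
 */
static bool sketch_remove_hangs(int open_pcms, bool use_dec_and_test)
{
	int refs = 1 + open_pcms;	/* initial reference + one per user */
	int i;

	if (use_dec_and_test) {
		if (sketch_dec_and_test(&refs))
			return false;	/* already 0: the wait is skipped */
	} else {
		refs = sketch_dec(refs);
	}

	/* Every remaining user eventually drops its reference. */
	for (i = 0; i < open_pcms; i++)
		sketch_dec_and_test(&refs);

	/* The waiter can only proceed once the count reads 0. */
	return refs != 0;
}
```

In this model, sketch_remove_hangs(0, false) reports a hang because the count saturates, while the same call with use_dec_and_test set to true does not; with one or more users, both variants finish.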
On Sat, Sep 10, 2022 at 12:22:04PM +0200, Takashi Iwai wrote:
> I think the below is slightly safer, ensuring that the other *_disconnect() calls are still made.
>
> Could you give it a try again? Once you confirm it works, I'll re-submit and merge to my tree.
This works too. Thanks
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>