[PATCH v3 0/2] ALSA: core: Fix deadlock when shutdown a frozen userspace

Ricardo Ribalda ribalda at chromium.org
Mon Nov 28 14:34:21 CET 2022


Hi Takashi

Thanks  for your response

On Mon, 28 Nov 2022 at 10:53, Takashi Iwai <tiwai at suse.de> wrote:
>
> On Mon, 28 Nov 2022 10:26:36 +0100,
> Ricardo Ribalda wrote:
> >
> > Hi Takashi
> >
> > Thanks for your prompt reply
> >
> > On Mon, 28 Nov 2022 at 10:24, Takashi Iwai <tiwai at suse.de> wrote:
> > >
> > > On Mon, 28 Nov 2022 10:10:12 +0100,
> > > Ricardo Ribalda wrote:
> > > >
> > > > Since 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers in .shutdown")
> > > > we wait for userspace to close its fds.
> > >
> > > IMO, the fix above brought more problem.  If you'd need to want to
> > > avoid later accesses during shutdown, the driver should rather just
> > > disconnect devices without waiting for the user-space completion.
> > > And, for that, a simple call of snd_card_disconnect() should suffice.
> > >
> > > > But that will never occur with a frozen userspace (like during kexec()).
> > > >
> > > > Lets detect the frozen userpace and act accordingly.
> > >
> > > ... and skipping the user-space sync at snd_card_disconnect_sync() as
> > > of this patch set is a dangerous move, I'm afraid.  The user-space
> > > gets frozen also at the normal suspend/resume, and it implies that the
> > > sync will be lost even for the normal PM, too (although it must be a
> > > very corner case).
> > >
> >
> > And what about checking kexec_in_progress instead?
>
> I still think that the call of snd_card_disconnect_sync() itself at
> shutdown is somehow wrong.  If this only comes from the SOF code path
> above, we should address in that code path instead.
>
> OTOH, you showed two code paths: one is
>
> [   84.943749] Freezing user space processes ... (elapsed 0.111 seconds) done.
> [  246.784446] INFO: task kexec-lite:5123 blocked for more than 122 seconds.
> [  246.819035] Call Trace:
> [  246.821782]  <TASK>
> [  246.824186]  __schedule+0x5f9/0x1263
> [  246.828231]  schedule+0x87/0xc5
> [  246.831779]  snd_card_disconnect_sync+0xb5/0x127
> ...
> [  246.889249]  snd_sof_device_shutdown+0xb4/0x150
> [  246.899317]  pci_device_shutdown+0x37/0x61
> [  246.903990]  device_shutdown+0x14c/0x1d6
> [  246.908391]  kernel_kexec+0x45/0xb9
>
> and another is
>
> [  246.893222] INFO: task kexec-lite:4891 blocked for more than 122 seconds.
> [  246.927709] Call Trace:
> [  246.930461]  <TASK>
> [  246.932819]  __schedule+0x5f9/0x1263
> [  246.936855]  ? fsnotify_grab_connector+0x5c/0x70
> [  246.942045]  schedule+0x87/0xc5
> [  246.945567]  schedule_timeout+0x49/0xf3
> [  246.949877]  wait_for_completion+0x86/0xe8
> [  246.954463]  snd_card_free+0x68/0x89
> ...
> [  247.001080]  platform_device_unregister+0x12/0x35
>
> The former is likely the SOF code path by the commit you mentioned
> (but it's not 100% clear because you trimmed the stack trace), and
> this should be reconsidered.

Let me add both traces completely:


[  247.080494] INFO: task kexec-lite:5441 blocked for more than 122 seconds.
[  247.088100]       Tainted: G     U            5.15.79-14932-g6eecac25030c #5
[  247.095984] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  247.104745] task:kexec-lite      state:D stack:    0 pid: 5441
ppid:  5390 flags:0x00004002
[  247.114083] Call Trace:
[  247.116833]  <TASK>
[  247.119185]  __schedule+0x5f9/0x1263
[  247.123193]  schedule+0x8e/0xdb
[  247.126698]  snd_card_disconnect_sync+0xb5/0x127
[  247.131877]  ? init_wait_entry+0x31/0x31
[  247.136270]  soc_cleanup_card_resources+0x27/0x1d3
[  247.141636]  snd_soc_unbind_card+0xa6/0xfc
[  247.146218]  snd_soc_unregister_card+0x26/0x34
[  247.151192]  release_nodes+0x43/0x62
[  247.155200]  devres_release_all+0x8b/0xc4
[  247.159695]  device_release_driver_internal+0x11a/0x1be
[  247.165546]  bus_remove_device+0xd9/0x103
[  247.170038]  device_del+0x1f7/0x355
[  247.173946]  platform_device_del+0x28/0x8e
[  247.178539]  platform_device_unregister+0x12/0x35
[  247.183807]  snd_sof_device_shutdown+0x57/0x94 [snd_sof
7d15170beb0e48032711b3230201744d5e4e590d]
[  247.193742]  pci_device_shutdown+0x37/0x61
[  247.198335]  device_shutdown+0x14c/0x1d6
[  247.202735]  kernel_kexec+0x45/0xb9
[  247.206646]  __se_sys_reboot+0x173/0x1f6
[  247.211041]  ? syscall_enter_from_user_mode+0x1a6/0x1ab
[  247.216885]  do_syscall_64+0x55/0x9d
[  247.220879]  ? exit_to_user_mode_prepare+0x3c/0x8b
[  247.226245]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[  247.231889] RIP: 0033:0x79b6a8563693
[  247.235897] RSP: 002b:00007fff9ab771b8 EFLAGS: 00000202 ORIG_RAX:
00000000000000a9
[  247.244375] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000079b6a8563693
[  247.252363] RDX: 0000000045584543 RSI: 0000000028121969 RDI: 00000000fee1dead
[  247.260337] RBP: 00007fff9ab77200 R08: 0000000000000004 R09: 00007fff9ab7930b
[  247.268323] R10: 00007fff9ab7936a R11: 0000000000000202 R12: 00000000ffffffff
[  247.276297] R13: 0000000000000004 R14: 00007fff9ab7936a R15: 0000000000000003
[  247.284272]  </TASK>
[  247.286729] task:init            state:D stack:    0 pid:    1
ppid:     0 flags:0x00004006
[  247.296079] Call Trace:
[  247.298810]  <TASK>
[  247.301155]  __schedule+0x5f9/0x1263
[  247.305163]  ? core_sys_select+0x278/0x343
[  247.309754]  schedule+0x8e/0xdb
[  247.313261]  __refrigerator+0x5e/0x97
[  247.317364]  get_signal+0x5e4/0x5e9
[  247.321280]  arch_do_signal_or_restart+0x4a/0x27a
[  247.326550]  exit_to_user_mode_loop+0x76/0xde
[  247.331433]  exit_to_user_mode_prepare+0x61/0x8b
[  247.336606]  syscall_exit_to_user_mode+0x26/0x168
[  247.341877]  do_syscall_64+0x63/0x9d
[  247.345871]  ? exit_to_user_mode_prepare+0x3c/0x8b
[  247.351227]  entry_SYSCALL_64_after_hwframe+0x61/0xcb



Then, after adding the (pm_freezing hack) in card_disconnect_sync


[  247.081334] INFO: task kexec-lite:5568 blocked for more than 122 seconds.
[  247.088940]       Tainted: G     U
5.15.79-14932-g6eecac25030c-dirty #6
[  247.097428] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  247.106198] task:kexec-lite      state:D stack:    0 pid: 5568
ppid:  5389 flags:0x00004002
[  247.115552] Call Trace:
[  247.118297]  <TASK>
[  247.120651]  __schedule+0x5f9/0x1263
[  247.124655]  ? fsnotify_grab_connector+0x5c/0x70
[  247.129829]  schedule+0x8e/0xdb
[  247.133354]  schedule_timeout+0x49/0xf3
[  247.137655]  wait_for_completion+0x86/0xe8
[  247.142249]  snd_card_free+0x68/0x89
[  247.146258]  soc_cleanup_card_resources+0x1b5/0x1d3
[  247.151725]  snd_soc_unbind_card+0xa6/0xfc
[  247.156314]  snd_soc_unregister_card+0x26/0x34
[  247.161293]  release_nodes+0x43/0x62
[  247.165303]  devres_release_all+0x8b/0xc4
[  247.169797]  device_release_driver_internal+0x11a/0x1be
[  247.175654]  bus_remove_device+0xd9/0x103
[  247.180147]  device_del+0x1f7/0x355
[  247.184058]  platform_device_del+0x28/0x8e
[  247.188649]  platform_device_unregister+0x12/0x35
[  247.193923]  snd_sof_device_shutdown+0x57/0x94 [snd_sof
e88f872f6695f09a149f1a182cec1fe8e07d5eea]
[  247.203858]  pci_device_shutdown+0x37/0x61
[  247.208455]  device_shutdown+0x14c/0x1d6
[  247.212858]  kernel_kexec+0x45/0xb9
[  247.216759]  __se_sys_reboot+0x173/0x1f6
[  247.221157]  ? syscall_enter_from_user_mode+0x1a6/0x1ab
[  247.227016]  do_syscall_64+0x55/0x9d
[  247.231020]  ? exit_to_user_mode_prepare+0x3c/0x8b
[  247.236386]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[  247.242048] RIP: 0033:0x7d2d2634a693
[  247.246056] RSP: 002b:00007ffdceeb6338 EFLAGS: 00000202 ORIG_RAX:
00000000000000a9
[  247.254532] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007d2d2634a693
[  247.262520] RDX: 0000000045584543 RSI: 0000000028121969 RDI: 00000000fee1dead
[  247.270500] RBP: 00007ffdceeb6380 R08: 0000000000000004 R09: 00007ffdceeb8305
[  247.278489] R10: 00007ffdceeb836a R11: 0000000000000202 R12: 00000000ffffffff
[  247.286468] R13: 0000000000000004 R14: 00007ffdceeb836a R15: 0000000000000003
[  247.294458]  </TASK>
[  247.296909] task:init            state:D stack:    0 pid:    1
ppid:     0 flags:0x00004006
[  247.306253] Call Trace:
[  247.308994]  <TASK>
[  247.311337]  __schedule+0x5f9/0x1263
[  247.315330]  ? core_sys_select+0x278/0x343
[  247.319924]  schedule+0x8e/0xdb
[  247.323445]  __refrigerator+0x5e/0x97
[  247.327537]  get_signal+0x5e4/0x5e9
[  247.331449]  arch_do_signal_or_restart+0x4a/0x27a
[  247.336721]  exit_to_user_mode_loop+0x76/0xde
[  247.341602]  exit_to_user_mode_prepare+0x61/0x8b
[  247.346763]  syscall_exit_to_user_mode+0x26/0x168
[  247.352035]  do_syscall_64+0x63/0x9d
[  247.356043]  ? exit_to_user_mode_prepare+0x3c/0x8b
[  247.361410]  entry_SYSCALL_64_after_hwframe+0x61/0xcb


Simply commenting out snd_sof_machine_unregister(sdev, pdata); does
the trick though... will send a new version with just that.



>
> But, the latter seems to be independent from that.  If that's the code
> path where the unbind is triggered before kexec, your fix might not
> work, either; it could be already at the wait_event*() when kexec
> starts.
>
> Maybe a simpler workaround would be to replace it with
> wait_event_killable*() stuff.  But whether we can discontinue the sync
> there is still another thing to consider...
>
>
> Takashi
>
> >
> > Thanks!
> >
> > >
> > > thanks,
> > >
> > > Takashi
> > >
> > > >
> > > > To: Jaroslav Kysela <perex at perex.cz>
> > > > To: Takashi Iwai <tiwai at suse.com>
> > > > To: "Rafael J. Wysocki" <rafael at kernel.org>
> > > > To: Pavel Machek <pavel at ucw.cz>
> > > > To: Len Brown <len.brown at intel.com>
> > > > To: Kai Vehmanen <kai.vehmanen at linux.intel.com>
> > > > To: Ranjani Sridharan <ranjani.sridharan at linux.intel.com>
> > > > To: Pierre-Louis Bossart <pierre-louis.bossart at linux.intel.com>
> > > > To: Mark Brown <broonie at kernel.org>
> > > > Cc: alsa-devel at alsa-project.org
> > > > Cc: linux-kernel at vger.kernel.org
> > > > Cc: "Joel Fernandes (Google)" <joel at joelfernandes.org>
> > > > Cc: linux-pm at vger.kernel.org
> > > > Signed-off-by: Ricardo Ribalda <ribalda at chromium.org>
> > > > ---
> > > > Changes in v3:
> > > > - Wrap pm_freezing in a function
> > > > - Link to v2: https://lore.kernel.org/r/20221127-snd-freeze-v2-0-d8a425ea9663@chromium.org
> > > >
> > > > Changes in v2:
> > > > - Only use pm_freezing if CONFIG_FREEZER
> > > > - Link to v1: https://lore.kernel.org/r/20221127-snd-freeze-v1-0-57461a366ec2@chromium.org
> > > >
> > > > ---
> > > > Ricardo Ribalda (2):
> > > >       freezer: Add processes_frozen()
> > > >       ALSA: core: Fix deadlock when shutdown a frozen userspace
> > > >
> > > >  include/linux/freezer.h |  2 ++
> > > >  kernel/freezer.c        | 11 +++++++++++
> > > >  sound/core/init.c       | 13 +++++++++++++
> > > >  3 files changed, 26 insertions(+)
> > > > ---
> > > > base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
> > > > change-id: 20221127-snd-freeze-1ee143228326
> > > >
> > > > Best regards,
> > > > --
> > > > Ricardo Ribalda <ribalda at chromium.org>
> > > >
> >
> >
> >
> > --
> > Ricardo Ribalda
> >



-- 
Ricardo Ribalda


More information about the Alsa-devel mailing list