[PATCH v8 0/3] ASoC: SOF: Fix deadlock when shutdown a frozen userspace
Since: 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers in .shutdown") we wait for all the workloads to be completed during shutdown. This was done to avoid a stall once the device is started again.
Unfortunately this has the side effect of stalling kexec(), if the userspace is frozen. Let's handle that case.
To: Joel Fernandes joel@joelfernandes.org To: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com To: Liam Girdwood lgirdwood@gmail.com To: Peter Ujfalusi peter.ujfalusi@linux.intel.com To: Bard Liao yung-chuan.liao@linux.intel.com To: Ranjani Sridharan ranjani.sridharan@linux.intel.com To: Kai Vehmanen kai.vehmanen@linux.intel.com To: Daniel Baluta daniel.baluta@nxp.com To: Mark Brown broonie@kernel.org To: Jaroslav Kysela perex@perex.cz To: Takashi Iwai tiwai@suse.com To: Eric Biederman ebiederm@xmission.com To: Chromeos Kdump chromeos-kdump@google.com To: Steven Rostedt rostedt@goodmis.org To: Michael Ellerman mpe@ellerman.id.au To: Nicholas Piggin npiggin@gmail.com To: Christophe Leroy christophe.leroy@csgroup.eu To: "K. Y. Srinivasan" kys@microsoft.com To: Haiyang Zhang haiyangz@microsoft.com To: Wei Liu wei.liu@kernel.org To: Dexuan Cui decui@microsoft.com To: Thomas Gleixner tglx@linutronix.de To: Ingo Molnar mingo@redhat.com To: Borislav Petkov bp@alien8.de To: Dave Hansen dave.hansen@linux.intel.com To: x86@kernel.org To: "H. Peter Anvin" hpa@zytor.com To: Juergen Gross jgross@suse.com To: Boris Ostrovsky boris.ostrovsky@oracle.com To: Ard Biesheuvel ardb@kernel.org To: Bjorn Helgaas bhelgaas@google.com To: "Rafael J. Wysocki" rafael@kernel.org To: Pavel Machek pavel@ucw.cz To: Len Brown len.brown@intel.com Cc: stable@vger.kernel.org Cc: sound-open-firmware@alsa-project.org Cc: alsa-devel@alsa-project.org Cc: linux-kernel@vger.kernel.org Cc: kexec@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-hyperv@vger.kernel.org Cc: xen-devel@lists.xenproject.org Cc: linux-efi@vger.kernel.org Cc: linux-pci@vger.kernel.org Cc: linux-pm@vger.kernel.org Signed-off-by: Ricardo Ribalda ribalda@chromium.org --- Changes in v8: - Wrap pm_freezing and kexec_inprogress in functions. - Do not run snd_sof_machine_unregister(sdev, pdata) during kexec (Thanks Kai). - Link to v7: https://lore.kernel.org/r/20221127-snd-freeze-v7-0-127c582f1ca4@chromium.org
Changes in v7: - Fix commit message (Thanks Pierre-Louis). - Link to v6: https://lore.kernel.org/r/20221127-snd-freeze-v6-0-3e90553f64a5@chromium.org
Changes in v6: - Check if we are in kexec with the userspace frozen. - Link to v5: https://lore.kernel.org/r/20221127-snd-freeze-v5-0-4ededeb08ba0@chromium.org
Changes in v5: - Edit subject prefix. - Link to v4: https://lore.kernel.org/r/20221127-snd-freeze-v4-0-51ca64b7f2ab@chromium.org
Changes in v4: - Do not call snd_sof_machine_unregister from shutdown. - Link to v3: https://lore.kernel.org/r/20221127-snd-freeze-v3-0-a2eda731ca14@chromium.org
Changes in v3: - Wrap pm_freezing in a function. - Link to v2: https://lore.kernel.org/r/20221127-snd-freeze-v2-0-d8a425ea9663@chromium.org
Changes in v2: - Only use pm_freezing if CONFIG_FREEZER . - Link to v1: https://lore.kernel.org/r/20221127-snd-freeze-v1-0-57461a366ec2@chromium.org
--- Ricardo Ribalda (3): kexec: Refactor kexec_in_progress into a function freezer: refactor pm_freezing into a function. ASoC: SOF: Fix deadlock when shutdown a frozen userspace
arch/powerpc/platforms/pseries/vio.c | 2 +- arch/x86/kernel/cpu/mshyperv.c | 6 +++--- arch/x86/xen/enlighten_hvm.c | 2 +- drivers/firmware/efi/efi.c | 2 +- drivers/pci/pci-driver.c | 2 +- include/linux/freezer.h | 3 ++- include/linux/kexec.h | 5 ++--- kernel/freezer.c | 3 +-- kernel/kexec_core.c | 12 ++++++++++-- kernel/power/process.c | 24 ++++++++++++++++++++---- sound/soc/sof/core.c | 9 ++++++--- 11 files changed, 48 insertions(+), 22 deletions(-) --- base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4 change-id: 20221127-snd-freeze-1ee143228326
Best regards,
Drivers running .shutdown() might want to behave differently during kexec.
Convert kexec_in_progress into a function and export it, so it can be used by drivers that are either built-in or modules.
Cc: stable@vger.kernel.org Fixes: 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers in .shutdown") Signed-off-by: Ricardo Ribalda ribalda@chromium.org --- arch/powerpc/platforms/pseries/vio.c | 2 +- arch/x86/kernel/cpu/mshyperv.c | 6 +++--- arch/x86/xen/enlighten_hvm.c | 2 +- drivers/firmware/efi/efi.c | 2 +- drivers/pci/pci-driver.c | 2 +- include/linux/kexec.h | 5 ++--- kernel/kexec_core.c | 12 ++++++++++-- 7 files changed, 19 insertions(+), 12 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/vio.c b/arch/powerpc/platforms/pseries/vio.c index 00ecac2c205b..923f9a36b992 100644 --- a/arch/powerpc/platforms/pseries/vio.c +++ b/arch/powerpc/platforms/pseries/vio.c @@ -1289,7 +1289,7 @@ static void vio_bus_shutdown(struct device *dev) viodrv = to_vio_driver(dev->driver); if (viodrv->shutdown) viodrv->shutdown(viodev); - else if (kexec_in_progress) + else if (kexec_in_progress()) vio_bus_remove(dev); } } diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c index 831613959a92..f91f35206489 100644 --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -122,21 +122,21 @@ void hv_remove_crash_handler(void) #ifdef CONFIG_KEXEC_CORE static void hv_machine_shutdown(void) { - if (kexec_in_progress && hv_kexec_handler) + if (kexec_in_progress() && hv_kexec_handler) hv_kexec_handler();
/* * Call hv_cpu_die() on all the CPUs, otherwise later the hypervisor * corrupts the old VP Assist Pages and can crash the kexec kernel. */ - if (kexec_in_progress && hyperv_init_cpuhp > 0) + if (kexec_in_progress() && hyperv_init_cpuhp > 0) cpuhp_remove_state(hyperv_init_cpuhp);
/* The function calls stop_other_cpus(). */ native_machine_shutdown();
/* Disable the hypercall page when there is only 1 active CPU. */ - if (kexec_in_progress) + if (kexec_in_progress()) hyperv_cleanup(); }
diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c index c1cd28e915a3..769163833ffc 100644 --- a/arch/x86/xen/enlighten_hvm.c +++ b/arch/x86/xen/enlighten_hvm.c @@ -145,7 +145,7 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_xen_hvm_callback) static void xen_hvm_shutdown(void) { native_machine_shutdown(); - if (kexec_in_progress) + if (kexec_in_progress()) xen_reboot(SHUTDOWN_soft_reset); }
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index a46df5d1d094..608bc2146802 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -1040,7 +1040,7 @@ static int update_efi_random_seed(struct notifier_block *nb, struct linux_efi_random_seed *seed; u32 size = 0;
- if (!kexec_in_progress) + if (!kexec_in_progress()) return NOTIFY_DONE;
seed = memremap(efi_rng_seed, sizeof(*seed), MEMREMAP_WB); diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 107d77f3c846..23eeb7538b03 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -519,7 +519,7 @@ static void pci_device_shutdown(struct device *dev) * If it is not a kexec reboot, firmware will hit the PCI * devices with big hammer and stop their DMA any way. */ - if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot)) + if (kexec_in_progress() && pci_dev->current_state <= PCI_D3hot) pci_clear_master(pci_dev); }
diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 41a686996aaa..2ec0aec1a0de 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -423,8 +423,7 @@ extern int kexec_load_disabled; #define KEXEC_FILE_FLAGS (KEXEC_FILE_UNLOAD | KEXEC_FILE_ON_CRASH | \ KEXEC_FILE_NO_INITRAMFS)
-/* flag to track if kexec reboot is in progress */ -extern bool kexec_in_progress; +bool kexec_in_progress(void);
int crash_shrink_memory(unsigned long new_size); ssize_t crash_get_memory_size(void); @@ -507,7 +506,7 @@ static inline void __crash_kexec(struct pt_regs *regs) { } static inline void crash_kexec(struct pt_regs *regs) { } static inline int kexec_should_crash(struct task_struct *p) { return 0; } static inline int kexec_crash_loaded(void) { return 0; } -#define kexec_in_progress false +static inline bool kexec_in_progress(void) { return false; } #endif /* CONFIG_KEXEC_CORE */
#ifdef CONFIG_KEXEC_SIG diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index ca2743f9c634..4495d0fc28ae 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -52,8 +52,16 @@ atomic_t __kexec_lock = ATOMIC_INIT(0); note_buf_t __percpu *crash_notes;
/* Flag to indicate we are going to kexec a new kernel */ -bool kexec_in_progress = false; +static bool kexec_in_progress_internal;
+/** + * kexec_in_progress - Check if the system is going to kexec + */ +bool kexec_in_progress(void) +{ + return kexec_in_progress_internal; +} +EXPORT_SYMBOL(kexec_in_progress);
/* Location of the reserved area for the crash kernel */ struct resource crashk_res = { @@ -1175,7 +1183,7 @@ int kernel_kexec(void) } else #endif { - kexec_in_progress = true; + kexec_in_progress_internal = true; kernel_restart_prepare("kexec reboot"); migrate_to_reboot_cpu();
Add a way to let the drivers know if the processes are frozen.
This is needed by drivers that are waiting for processes to end on their shutdown path.
Convert pm_freezing into a function and export it, so it can be used by drivers that are either built-in or modules.
Cc: stable@vger.kernel.org Fixes: 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers in .shutdown") Signed-off-by: Ricardo Ribalda ribalda@chromium.org
sdad --- include/linux/freezer.h | 3 ++- kernel/freezer.c | 3 +-- kernel/power/process.c | 24 ++++++++++++++++++++---- 3 files changed, 23 insertions(+), 7 deletions(-)
diff --git a/include/linux/freezer.h b/include/linux/freezer.h index b303472255be..3413c869d68b 100644 --- a/include/linux/freezer.h +++ b/include/linux/freezer.h @@ -13,7 +13,7 @@ #ifdef CONFIG_FREEZER DECLARE_STATIC_KEY_FALSE(freezer_active);
-extern bool pm_freezing; /* PM freezing in effect */ +bool pm_freezing(void); extern bool pm_nosig_freezing; /* PM nosig freezing in effect */
/* @@ -80,6 +80,7 @@ static inline int freeze_processes(void) { return -ENOSYS; } static inline int freeze_kernel_threads(void) { return -ENOSYS; } static inline void thaw_processes(void) {} static inline void thaw_kernel_threads(void) {} +static inline bool pm_freezing(void) { return false; }
static inline bool try_to_freeze(void) { return false; }
diff --git a/kernel/freezer.c b/kernel/freezer.c index 4fad0e6fca64..2d3530ebdb7e 100644 --- a/kernel/freezer.c +++ b/kernel/freezer.c @@ -20,7 +20,6 @@ EXPORT_SYMBOL(freezer_active); * indicate whether PM freezing is in effect, protected by * system_transition_mutex */ -bool pm_freezing; bool pm_nosig_freezing;
/* protects freezing and frozen transitions */ @@ -46,7 +45,7 @@ bool freezing_slow_path(struct task_struct *p) if (pm_nosig_freezing || cgroup_freezing(p)) return true;
- if (pm_freezing && !(p->flags & PF_KTHREAD)) + if (pm_freezing() && !(p->flags & PF_KTHREAD)) return true;
return false; diff --git a/kernel/power/process.c b/kernel/power/process.c index ddd9988327fe..8a4d0e2c8c20 100644 --- a/kernel/power/process.c +++ b/kernel/power/process.c @@ -108,6 +108,22 @@ static int try_to_freeze_tasks(bool user_only) return todo ? -EBUSY : 0; }
+/* + * Indicate whether PM freezing is in effect, protected by + * system_transition_mutex. + */ +static bool pm_freezing_internal; + +/** + * pm_freezing - indicate whether PM freezing is in effect. + * + */ +bool pm_freezing(void) +{ + return pm_freezing_internal; +} +EXPORT_SYMBOL(pm_freezing); + /** * freeze_processes - Signal user space processes to enter the refrigerator. * The current thread will not be frozen. The same process that calls @@ -126,12 +142,12 @@ int freeze_processes(void) /* Make sure this task doesn't get frozen */ current->flags |= PF_SUSPEND_TASK;
- if (!pm_freezing) + if (!pm_freezing()) static_branch_inc(&freezer_active);
pm_wakeup_clear(0); pr_info("Freezing user space processes ... "); - pm_freezing = true; + pm_freezing_internal = true; error = try_to_freeze_tasks(true); if (!error) { __usermodehelper_set_disable_depth(UMH_DISABLED); @@ -187,9 +203,9 @@ void thaw_processes(void) struct task_struct *curr = current;
trace_suspend_resume(TPS("thaw_processes"), 0, true); - if (pm_freezing) + if (pm_freezing()) static_branch_dec(&freezer_active); - pm_freezing = false; + pm_freezing_internal = false; pm_nosig_freezing = false;
oom_killer_enable();
On Thu, Dec 1, 2022 at 12:08 PM Ricardo Ribalda ribalda@chromium.org wrote:
Add a way to let the drivers know if the processes are frozen.
This is needed by drivers that are waiting for processes to end on their shutdown path.
Convert pm_freezing into a function and export it, so it can be used by drivers that are either built-in or modules.
Cc: stable@vger.kernel.org Fixes: 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers in .shutdown") Signed-off-by: Ricardo Ribalda ribalda@chromium.org
Why can't you export the original pm_freezing variable and why is this fixing anything?
include/linux/freezer.h | 3 ++- kernel/freezer.c | 3 +-- kernel/power/process.c | 24 ++++++++++++++++++++---- 3 files changed, 23 insertions(+), 7 deletions(-)
diff --git a/include/linux/freezer.h b/include/linux/freezer.h index b303472255be..3413c869d68b 100644 --- a/include/linux/freezer.h +++ b/include/linux/freezer.h @@ -13,7 +13,7 @@ #ifdef CONFIG_FREEZER DECLARE_STATIC_KEY_FALSE(freezer_active);
-extern bool pm_freezing; /* PM freezing in effect */ +bool pm_freezing(void); extern bool pm_nosig_freezing; /* PM nosig freezing in effect */
/* @@ -80,6 +80,7 @@ static inline int freeze_processes(void) { return -ENOSYS; } static inline int freeze_kernel_threads(void) { return -ENOSYS; } static inline void thaw_processes(void) {} static inline void thaw_kernel_threads(void) {} +static inline bool pm_freezing(void) { return false; }
static inline bool try_to_freeze(void) { return false; }
diff --git a/kernel/freezer.c b/kernel/freezer.c index 4fad0e6fca64..2d3530ebdb7e 100644 --- a/kernel/freezer.c +++ b/kernel/freezer.c @@ -20,7 +20,6 @@ EXPORT_SYMBOL(freezer_active);
- indicate whether PM freezing is in effect, protected by
- system_transition_mutex
*/ -bool pm_freezing; bool pm_nosig_freezing;
/* protects freezing and frozen transitions */ @@ -46,7 +45,7 @@ bool freezing_slow_path(struct task_struct *p) if (pm_nosig_freezing || cgroup_freezing(p)) return true;
if (pm_freezing && !(p->flags & PF_KTHREAD))
if (pm_freezing() && !(p->flags & PF_KTHREAD)) return true; return false;
diff --git a/kernel/power/process.c b/kernel/power/process.c index ddd9988327fe..8a4d0e2c8c20 100644 --- a/kernel/power/process.c +++ b/kernel/power/process.c @@ -108,6 +108,22 @@ static int try_to_freeze_tasks(bool user_only) return todo ? -EBUSY : 0; }
+/*
- Indicate whether PM freezing is in effect, protected by
- system_transition_mutex.
- */
+static bool pm_freezing_internal;
+/**
- pm_freezing - indicate whether PM freezing is in effect.
- */
+bool pm_freezing(void) +{
return pm_freezing_internal;
+} +EXPORT_SYMBOL(pm_freezing);
Use EXPORT_SYMBOL_GPL() instead, please.
/**
- freeze_processes - Signal user space processes to enter the refrigerator.
- The current thread will not be frozen. The same process that calls
@@ -126,12 +142,12 @@ int freeze_processes(void) /* Make sure this task doesn't get frozen */ current->flags |= PF_SUSPEND_TASK;
if (!pm_freezing)
if (!pm_freezing()) static_branch_inc(&freezer_active); pm_wakeup_clear(0); pr_info("Freezing user space processes ... ");
pm_freezing = true;
pm_freezing_internal = true; error = try_to_freeze_tasks(true); if (!error) { __usermodehelper_set_disable_depth(UMH_DISABLED);
@@ -187,9 +203,9 @@ void thaw_processes(void) struct task_struct *curr = current;
trace_suspend_resume(TPS("thaw_processes"), 0, true);
if (pm_freezing)
if (pm_freezing()) static_branch_dec(&freezer_active);
pm_freezing = false;
pm_freezing_internal = false; pm_nosig_freezing = false; oom_killer_enable();
--
Hi Rafael
On Fri, 2 Dec 2022 at 18:48, Rafael J. Wysocki rafael@kernel.org wrote:
On Thu, Dec 1, 2022 at 12:08 PM Ricardo Ribalda ribalda@chromium.org wrote:
Add a way to let the drivers know if the processes are frozen.
This is needed by drivers that are waiting for processes to end on their shutdown path.
Convert pm_freezing into a function and export it, so it can be used by drivers that are either built-in or modules.
Cc: stable@vger.kernel.org Fixes: 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers in .shutdown") Signed-off-by: Ricardo Ribalda ribalda@chromium.org
Why can't you export the original pm_freezing variable and why is this fixing anything?
Because then any module will be able to modify the content of the variable.
The Fixes: is because the last patch on the set is doing a real fix. If you only cherry-pick the last patch on a stable branch, the build will fail. (Also, the zero-day builder complains)
Anyway, I think we can hold this patch for a bit. The snd people are discussing if this the way to handle it, or if we should handle .shutdown in a different way.
Thanks!
include/linux/freezer.h | 3 ++- kernel/freezer.c | 3 +-- kernel/power/process.c | 24 ++++++++++++++++++++---- 3 files changed, 23 insertions(+), 7 deletions(-)
diff --git a/include/linux/freezer.h b/include/linux/freezer.h index b303472255be..3413c869d68b 100644 --- a/include/linux/freezer.h +++ b/include/linux/freezer.h @@ -13,7 +13,7 @@ #ifdef CONFIG_FREEZER DECLARE_STATIC_KEY_FALSE(freezer_active);
-extern bool pm_freezing; /* PM freezing in effect */ +bool pm_freezing(void); extern bool pm_nosig_freezing; /* PM nosig freezing in effect */
/* @@ -80,6 +80,7 @@ static inline int freeze_processes(void) { return -ENOSYS; } static inline int freeze_kernel_threads(void) { return -ENOSYS; } static inline void thaw_processes(void) {} static inline void thaw_kernel_threads(void) {} +static inline bool pm_freezing(void) { return false; }
static inline bool try_to_freeze(void) { return false; }
diff --git a/kernel/freezer.c b/kernel/freezer.c index 4fad0e6fca64..2d3530ebdb7e 100644 --- a/kernel/freezer.c +++ b/kernel/freezer.c @@ -20,7 +20,6 @@ EXPORT_SYMBOL(freezer_active);
- indicate whether PM freezing is in effect, protected by
- system_transition_mutex
*/ -bool pm_freezing; bool pm_nosig_freezing;
/* protects freezing and frozen transitions */ @@ -46,7 +45,7 @@ bool freezing_slow_path(struct task_struct *p) if (pm_nosig_freezing || cgroup_freezing(p)) return true;
if (pm_freezing && !(p->flags & PF_KTHREAD))
if (pm_freezing() && !(p->flags & PF_KTHREAD)) return true; return false;
diff --git a/kernel/power/process.c b/kernel/power/process.c index ddd9988327fe..8a4d0e2c8c20 100644 --- a/kernel/power/process.c +++ b/kernel/power/process.c @@ -108,6 +108,22 @@ static int try_to_freeze_tasks(bool user_only) return todo ? -EBUSY : 0; }
+/*
- Indicate whether PM freezing is in effect, protected by
- system_transition_mutex.
- */
+static bool pm_freezing_internal;
+/**
- pm_freezing - indicate whether PM freezing is in effect.
- */
+bool pm_freezing(void) +{
return pm_freezing_internal;
+} +EXPORT_SYMBOL(pm_freezing);
Use EXPORT_SYMBOL_GPL() instead, please.
/**
- freeze_processes - Signal user space processes to enter the refrigerator.
- The current thread will not be frozen. The same process that calls
@@ -126,12 +142,12 @@ int freeze_processes(void) /* Make sure this task doesn't get frozen */ current->flags |= PF_SUSPEND_TASK;
if (!pm_freezing)
if (!pm_freezing()) static_branch_inc(&freezer_active); pm_wakeup_clear(0); pr_info("Freezing user space processes ... ");
pm_freezing = true;
pm_freezing_internal = true; error = try_to_freeze_tasks(true); if (!error) { __usermodehelper_set_disable_depth(UMH_DISABLED);
@@ -187,9 +203,9 @@ void thaw_processes(void) struct task_struct *curr = current;
trace_suspend_resume(TPS("thaw_processes"), 0, true);
if (pm_freezing)
if (pm_freezing()) static_branch_dec(&freezer_active);
pm_freezing = false;
pm_freezing_internal = false; pm_nosig_freezing = false; oom_killer_enable();
--
-- You received this message because you are subscribed to the Google Groups "Chromeos Kdump" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromeos-kdump+unsubscribe@google.com. To view this discussion on the web, visit https://groups.google.com/a/google.com/d/msgid/chromeos-kdump/CAJZ5v0jbKSTQo....
-- Ricardo Ribalda
If we are shutting down due to kexec and the userspace is frozen, the system will stall forever waiting for userspace to complete.
Do not wait for the clients to complete in that case.
This fixes:
[ 84.943749] Freezing user space processes ... (elapsed 0.111 seconds) done. [ 246.784446] INFO: task kexec-lite:5123 blocked for more than 122 seconds. [ 246.819035] Call Trace: [ 246.821782] <TASK> [ 246.824186] __schedule+0x5f9/0x1263 [ 246.828231] schedule+0x87/0xc5 [ 246.831779] snd_card_disconnect_sync+0xb5/0x127 ... [ 246.889249] snd_sof_device_shutdown+0xb4/0x150 [ 246.899317] pci_device_shutdown+0x37/0x61 [ 246.903990] device_shutdown+0x14c/0x1d6 [ 246.908391] kernel_kexec+0x45/0xb9
And:
[ 246.893222] INFO: task kexec-lite:4891 blocked for more than 122 seconds. [ 246.927709] Call Trace: [ 246.930461] <TASK> [ 246.932819] __schedule+0x5f9/0x1263 [ 246.936855] ? fsnotify_grab_connector+0x5c/0x70 [ 246.942045] schedule+0x87/0xc5 [ 246.945567] schedule_timeout+0x49/0xf3 [ 246.949877] wait_for_completion+0x86/0xe8 [ 246.954463] snd_card_free+0x68/0x89 ... [ 247.001080] platform_device_unregister+0x12/0x35
Cc: stable@vger.kernel.org Fixes: 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers in .shutdown") Signed-off-by: Ricardo Ribalda ribalda@chromium.org --- sound/soc/sof/core.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/sound/soc/sof/core.c b/sound/soc/sof/core.c index 3e6141d03770..9587b6a85103 100644 --- a/sound/soc/sof/core.c +++ b/sound/soc/sof/core.c @@ -9,6 +9,8 @@ //
#include <linux/firmware.h> +#include <linux/kexec.h> +#include <linux/freezer.h> #include <linux/module.h> #include <sound/soc.h> #include <sound/sof.h> @@ -484,9 +486,10 @@ int snd_sof_device_shutdown(struct device *dev) * make sure clients and machine driver(s) are unregistered to force * all userspace devices to be closed prior to the DSP shutdown sequence */ - sof_unregister_clients(sdev); - - snd_sof_machine_unregister(sdev, pdata); + if (!(kexec_in_progress() && pm_freezing())) { + sof_unregister_clients(sdev); + snd_sof_machine_unregister(sdev, pdata); + }
if (sdev->fw_state == SOF_FW_BOOT_COMPLETE) return snd_sof_shutdown(sdev);
On 01.12.22 12:08, Ricardo Ribalda wrote:
If we are shutting down due to kexec and the userspace is frozen, the system will stall forever waiting for userspace to complete.
Do not wait for the clients to complete in that case.
Hi,
I am afraid I have to state that this approach is bad in every case, not just this corner case. It basically means that user space can stall the kernel for an arbitrary amount of time. And we cannot have that.
Regards Oliver
Hi Oliver
Thanks for your review
On Thu, 1 Dec 2022 at 13:29, Oliver Neukum oneukum@suse.com wrote:
On 01.12.22 12:08, Ricardo Ribalda wrote:
If we are shutting down due to kexec and the userspace is frozen, the system will stall forever waiting for userspace to complete.
Do not wait for the clients to complete in that case.
Hi,
I am afraid I have to state that this approach is bad in every case, not just this corner case. It basically means that user space can stall the kernel for an arbitrary amount of time. And we cannot have that.
Regards Oliver
This patchset does not modify this behaviour. It simply fixes the stall for kexec().
The patch that introduced the stall: 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers in .shutdown")
was sent as a generalised version of: https://github.com/thesofproject/linux/pull/3388
AFAIK, we would need a similar patch for every single board.... which I am not sure it is doable in a reasonable timeframe.
On the meantime this seems like a decent compromises. Yes, a miss-behaving userspace can still stall during suspend, but that was not introduced in this patch.
Regards!
On 01.12.22 14:03, Ricardo Ribalda wrote:
Hi,
This patchset does not modify this behaviour. It simply fixes the stall for kexec().
The patch that introduced the stall: 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers in .shutdown")
That patch is problematic. I would go as far as saying that it needs to be reverted.
was sent as a generalised version of: https://github.com/thesofproject/linux/pull/3388
AFAIK, we would need a similar patch for every single board.... which I am not sure it is doable in a reasonable timeframe.
On the meantime this seems like a decent compromises. Yes, a miss-behaving userspace can still stall during suspend, but that was not introduced in this patch.
Well, I mean if you know what wrong then I'd say at least return to a sanely broken state.
The whole approach is wrong. You need to be able to deal with user space talking to removed devices by returning an error and keeping the resources association with the open file allocated until user space calls close()
Regards Oliver
Hi Oliver
On Thu, 1 Dec 2022 at 14:22, 'Oliver Neukum' via Chromeos Kdump chromeos-kdump@google.com wrote:
On 01.12.22 14:03, Ricardo Ribalda wrote:
Hi,
This patchset does not modify this behaviour. It simply fixes the stall for kexec().
The patch that introduced the stall: 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers in .shutdown")
That patch is problematic. I would go as far as saying that it needs to be reverted.
It fixes a real issue. We have not had any complaints until we tried to kexec in the platform. I wont recommend reverting it until we have an alternative implementation.
kexec is far less common than suspend/reboot.
was sent as a generalised version of: https://github.com/thesofproject/linux/pull/3388
AFAIK, we would need a similar patch for every single board.... which I am not sure it is doable in a reasonable timeframe.
On the meantime this seems like a decent compromises. Yes, a miss-behaving userspace can still stall during suspend, but that was not introduced in this patch.
Well, I mean if you know what wrong then I'd say at least return to a sanely broken state.
The whole approach is wrong. You need to be able to deal with user space talking to removed devices by returning an error and keeping the resources association with the open file allocated until user space calls close()
In general, the whole shutdown is broken for all the subsystems ;). It is a complicated issue. Users handling fds, devices with DMAs in the middle of an operation, dma fences....
Unfortunately I am not that familiar with the sound subsystem to make a proper patch for this.
Regards Oliver
-- You received this message because you are subscribed to the Google Groups "Chromeos Kdump" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromeos-kdump+unsubscribe@google.com. To view this discussion on the web, visit https://groups.google.com/a/google.com/d/msgid/chromeos-kdump/d3730d1d-6f92-....
On Thu, 01 Dec 2022 14:22:12 +0100, Oliver Neukum wrote:
On 01.12.22 14:03, Ricardo Ribalda wrote:
Hi,
This patchset does not modify this behaviour. It simply fixes the stall for kexec().
The patch that introduced the stall: 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers in .shutdown")
That patch is problematic. I would go as far as saying that it needs to be reverted.
... or fixed.
was sent as a generalised version of: https://github.com/thesofproject/linux/pull/3388
AFAIK, we would need a similar patch for every single board.... which I am not sure it is doable in a reasonable timeframe.
On the meantime this seems like a decent compromises. Yes, a miss-behaving userspace can still stall during suspend, but that was not introduced in this patch.
Well, I mean if you know what wrong then I'd say at least return to a sanely broken state.
The whole approach is wrong. You need to be able to deal with user space talking to removed devices by returning an error and keeping the resources association with the open file allocated until user space calls close()
As I already mentioned in another thread, if the user-space action has to be cut off, we just need to call snd_card_disconnect() instead without sync. A quick hack would be like below (totally untested and might be wrong, though).
In anyway, Ricardo, please stop spinning too frequently; v8 in a few days is way too much, and now the recipient list became unmanageable. Let's give people some time to review and consider a better solution at first.
thanks,
Takashi
-- 8< -- --- a/sound/soc/sof/core.c +++ b/sound/soc/sof/core.c @@ -475,7 +475,7 @@ EXPORT_SYMBOL(snd_sof_device_remove); int snd_sof_device_shutdown(struct device *dev) { struct snd_sof_dev *sdev = dev_get_drvdata(dev); - struct snd_sof_pdata *pdata = sdev->pdata; + struct snd_soc_component *component;
if (IS_ENABLED(CONFIG_SND_SOC_SOF_PROBE_WORK_QUEUE)) cancel_work_sync(&sdev->probe_work); @@ -484,9 +484,9 @@ int snd_sof_device_shutdown(struct device *dev) * make sure clients and machine driver(s) are unregistered to force * all userspace devices to be closed prior to the DSP shutdown sequence */ - sof_unregister_clients(sdev); - - snd_sof_machine_unregister(sdev, pdata); + component = snd_soc_lookup_component(sdev->dev, NULL); + if (component && component->card && component->card->snd_card) + snd_card_disconnect(component->card->snd_card);
if (sdev->fw_state == SOF_FW_BOOT_COMPLETE) return snd_sof_shutdown(sdev);
participants (4)
-
Oliver Neukum
-
Rafael J. Wysocki
-
Ricardo Ribalda
-
Takashi Iwai