[PATCH v2] ALSA: hda/i915 - avoid hung task timeout in i915 wait
If kernel is built with hung task detection enabled and CONFIG_DEFAULT_HUNG_TASK_TIMEOUT set to less than 60 seconds, snd_hdac_i915_init() will trigger the hung task timeout in case i915 is not available and taint the kernel.
Split the 60sec wait into a loop of smaller waits to avoid this.
Reviewed-by: Lucas De Marchi lucas.demarchi@intel.com Co-developed-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Kai Vehmanen kai.vehmanen@linux.intel.com --- sound/hda/hdac_i915.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
Changes V1->V2: - address local variable naming issue raised by Amadeusz and use Takashi's proposal
diff --git a/sound/hda/hdac_i915.c b/sound/hda/hdac_i915.c index 454474ac5716..aa48bed7baf7 100644 --- a/sound/hda/hdac_i915.c +++ b/sound/hda/hdac_i915.c @@ -143,7 +143,7 @@ static bool i915_gfx_present(void) int snd_hdac_i915_init(struct hdac_bus *bus) { struct drm_audio_component *acomp; - int err; + int err, i;
if (!i915_gfx_present()) return -ENODEV; @@ -159,9 +159,11 @@ int snd_hdac_i915_init(struct hdac_bus *bus) if (!acomp->ops) { if (!IS_ENABLED(CONFIG_MODULES) || !request_module("i915")) { - /* 60s timeout */ - wait_for_completion_timeout(&acomp->master_bind_complete, - msecs_to_jiffies(60 * 1000)); + /* max 60s timeout */ + for (i = 0; i < 60; i++) + if (wait_for_completion_timeout(&acomp->master_bind_complete, + msecs_to_jiffies(1000))) + break; } } if (!acomp->ops) {
base-commit: fd7698cf0858f8c5e659b655109cd93c2f15cdd3
On 08/03/2022 17:27, Kai Vehmanen wrote:
If kernel is built with hung task detection enabled and CONFIG_DEFAULT_HUNG_TASK_TIMEOUT set to less than 60 seconds, snd_hdac_i915_init() will trigger the hung task timeout in case i915 is not available and taint the kernel.
Split the 60sec wait into a loop of smaller waits to avoid this.
Reviewed-by: Lucas De Marchi lucas.demarchi@intel.com Co-developed-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Kai Vehmanen kai.vehmanen@linux.intel.com
sound/hda/hdac_i915.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
Changes V1->V2:
- address local variable naming issue raised by Amadeusz and use Takashi's proposal
diff --git a/sound/hda/hdac_i915.c b/sound/hda/hdac_i915.c index 454474ac5716..aa48bed7baf7 100644 --- a/sound/hda/hdac_i915.c +++ b/sound/hda/hdac_i915.c @@ -143,7 +143,7 @@ static bool i915_gfx_present(void) int snd_hdac_i915_init(struct hdac_bus *bus) { struct drm_audio_component *acomp;
- int err;
int err, i;
if (!i915_gfx_present()) return -ENODEV;
@@ -159,9 +159,11 @@ int snd_hdac_i915_init(struct hdac_bus *bus) if (!acomp->ops) { if (!IS_ENABLED(CONFIG_MODULES) || !request_module("i915")) {
/* 60s timeout */
Where does this 60s come from and why is the fix to work around DEFAULT_HUNG_TASK_TIMEOUT in a hacky way deemed okay? For instance would limiting the wait here to whatever the kconfig is set to be an option?
Regards,
Tvrtko
wait_for_completion_timeout(&acomp->master_bind_complete,
msecs_to_jiffies(60 * 1000));
/* max 60s timeout */
for (i = 0; i < 60; i++)
if (wait_for_completion_timeout(&acomp->master_bind_complete,
msecs_to_jiffies(1000)))
} } if (!acomp->ops) {break;
base-commit: fd7698cf0858f8c5e659b655109cd93c2f15cdd3
Hi,
On Wed, 9 Mar 2022, Tvrtko Ursulin wrote:
/* 60s timeout */
Where does this 60s come from and why is the fix to work around DEFAULT_HUNG_TASK_TIMEOUT in a hacky way deemed okay? For instance would limiting the wait here to whatever the kconfig is set to be an option?
this was discussed in https://lists.freedesktop.org/archives/intel-gfx/2022-February/290821.html ... and that thread concluded it's cleaner to split the wait than try to figure out hung-task configuration from middle of audio driver.
The 60sec timeout comes from 2019 patch "ALSA: hda: Extend i915 component bind timeout" to fix an issue reported by Paul Menzel (cc'ed).
This patch keeps the timeout intact. Br, Kai
On 09/03/2022 08:39, Kai Vehmanen wrote:
Hi,
On Wed, 9 Mar 2022, Tvrtko Ursulin wrote:
/* 60s timeout */
Where does this 60s come from and why is the fix to work around DEFAULT_HUNG_TASK_TIMEOUT in a hacky way deemed okay? For instance would limiting the wait here to whatever the kconfig is set to be an option?
this was discussed in https://lists.freedesktop.org/archives/intel-gfx/2022-February/290821.html ... and that thread concluded it's cleaner to split the wait than try to figure out hung-task configuration from middle of audio driver.
The 60sec timeout comes from 2019 patch "ALSA: hda: Extend i915 component bind timeout" to fix an issue reported by Paul Menzel (cc'ed).
This patch keeps the timeout intact.
I did not spot discussion touching on the point I raised.
How about not fight the hung task detector but mark your wait context as "I really know what I'm doing - not stuck trust me". Maybe using wait_for_completion_killable_timeout would do it since snd_hdac_i915_init is allowed to fail with an error already?
Regards,
Tvrtko
On Wed, 09 Mar 2022 10:02:13 +0100, Tvrtko Ursulin wrote:
On 09/03/2022 08:39, Kai Vehmanen wrote:
Hi,
On Wed, 9 Mar 2022, Tvrtko Ursulin wrote:
/* 60s timeout */
Where does this 60s come from and why is the fix to work around DEFAULT_HUNG_TASK_TIMEOUT in a hacky way deemed okay? For instance would limiting the wait here to whatever the kconfig is set to be an option?
this was discussed in https://lists.freedesktop.org/archives/intel-gfx/2022-February/290821.html ... and that thread concluded it's cleaner to split the wait than try to figure out hung-task configuration from middle of audio driver.
The 60sec timeout comes from 2019 patch "ALSA: hda: Extend i915 component bind timeout" to fix an issue reported by Paul Menzel (cc'ed).
This patch keeps the timeout intact.
I did not spot discussion touching on the point I raised.
How about not fight the hung task detector but mark your wait context as "I really know what I'm doing - not stuck trust me".
The question is how often this problem hits. Basically it's a very corner case, and I even think we may leave as is; that's a matter of configuration, and lowering such a bar should expect some side-effect. OTOH, if the problem happens in many cases, it's beneficial to fix in the core part, indeed.
Maybe using wait_for_completion_killable_timeout would do it since snd_hdac_i915_init is allowed to fail with an error already?
It makes it killable -- which is a complete behavior change.
Takashi
On 09/03/2022 09:23, Takashi Iwai wrote:
On Wed, 09 Mar 2022 10:02:13 +0100, Tvrtko Ursulin wrote:
On 09/03/2022 08:39, Kai Vehmanen wrote:
Hi,
On Wed, 9 Mar 2022, Tvrtko Ursulin wrote:
/* 60s timeout */
Where does this 60s come from and why is the fix to work around DEFAULT_HUNG_TASK_TIMEOUT in a hacky way deemed okay? For instance would limiting the wait here to whatever the kconfig is set to be an option?
this was discussed in https://lists.freedesktop.org/archives/intel-gfx/2022-February/290821.html ... and that thread concluded it's cleaner to split the wait than try to figure out hung-task configuration from middle of audio driver.
The 60sec timeout comes from 2019 patch "ALSA: hda: Extend i915 component bind timeout" to fix an issue reported by Paul Menzel (cc'ed).
This patch keeps the timeout intact.
I did not spot discussion touching on the point I raised.
How about not fight the hung task detector but mark your wait context as "I really know what I'm doing - not stuck trust me".
The question is how often this problem hits. Basically it's a very corner case, and I even think we may leave as is; that's a matter of configuration, and lowering such a bar should expect some side-effect. OTOH, if the problem happens in many cases, it's beneficial to fix in the core part, indeed.
Yes argument you raise can be made I agree.
Maybe using wait_for_completion_killable_timeout would do it since snd_hdac_i915_init is allowed to fail with an error already?
It makes it killable -- which is a complete behavior change.
Complete behaviour change how? Isn't this something ran on probe so likelihood of anyone sending SIGKILL to the modprobe process is only the init process? And in that case what is the fundamental difference in init giving up before the internal 60s in HDA driver does? I don't see a difference. Either party decided to abort the wait and code can just unwind and propagate the different error codes.
Regards,
Tvrtko
On Wed, 09 Mar 2022 10:48:49 +0100, Tvrtko Ursulin wrote:
On 09/03/2022 09:23, Takashi Iwai wrote:
On Wed, 09 Mar 2022 10:02:13 +0100, Tvrtko Ursulin wrote:
On 09/03/2022 08:39, Kai Vehmanen wrote:
Hi,
On Wed, 9 Mar 2022, Tvrtko Ursulin wrote:
/* 60s timeout */
Where does this 60s come from and why is the fix to work around DEFAULT_HUNG_TASK_TIMEOUT in a hacky way deemed okay? For instance would limiting the wait here to whatever the kconfig is set to be an option?
this was discussed in https://lists.freedesktop.org/archives/intel-gfx/2022-February/290821.html ... and that thread concluded it's cleaner to split the wait than try to figure out hung-task configuration from middle of audio driver.
The 60sec timeout comes from 2019 patch "ALSA: hda: Extend i915 component bind timeout" to fix an issue reported by Paul Menzel (cc'ed).
This patch keeps the timeout intact.
I did not spot discussion touching on the point I raised.
How about not fight the hung task detector but mark your wait context as "I really know what I'm doing - not stuck trust me".
The question is how often this problem hits. Basically it's a very corner case, and I even think we may leave as is; that's a matter of configuration, and lowering such a bar should expect some side-effect. OTOH, if the problem happens in many cases, it's beneficial to fix in the core part, indeed.
Yes argument you raise can be made I agree.
Maybe using wait_for_completion_killable_timeout would do it since snd_hdac_i915_init is allowed to fail with an error already?
It makes it killable -- which is a complete behavior change.
Complete behaviour change how? Isn't this something ran on probe so likelihood of anyone sending SIGKILL to the modprobe process is only the init process? And in that case what is the fundamental difference in init giving up before the internal 60s in HDA driver does? I don't see a difference. Either party decided to abort the wait and code can just unwind and propagate the different error codes.
The point is that it does change the actual behavior, and changing the actual behavior just for working around the corner case like the above wouldn't be justified without the proper evaluation.
That said, if this behavior change is intentional and even desired, that's a way to go.
Takashi
Hi,
On Wed, 9 Mar 2022, Takashi Iwai wrote:
Takashi Iwai wrote:
The question is how often this problem hits. Basically it's a very corner case, and I even think we may leave as is; that's a matter of configuration, and lowering such a bar should expect some side-effect. OTOH, if the problem happens in many cases, it's beneficial to fix in the core part, indeed.
I'm basicly helping out the intel-gfx folks here. This is now happening systematically in the intel-gfx CI. The hung-task timeout is configured to 30sec (in intel-gfx CI), and there's some new hw configs where this happens every time (I have a separate patch in progress [1] that tries to detect this case and skip the init, but this will require more time as there is risk of breaking existing configurations).
[1] https://lists.freedesktop.org/archives/intel-gfx-trybot/2022-February/128278...
Tvrtko Ursulin wrote:
Takashi Iwai wrote:
Complete behaviour change how? Isn't this something ran on probe so likelihood of anyone sending SIGKILL to the modprobe process is only the init process? And in that case what is the fundamental difference
[...]
The point is that it does change the actual behavior, and changing the actual behavior just for working around the corner case like the above wouldn't be justified without the proper evaluation.
That said, if this behavior change is intentional and even desired, that's a way to go.
Let me try this out and test on a few configs (with and without the timeout occurring) and send a V3 if this seems ok.
Br, Kai
On Wed, 09 Mar 2022 09:36:54 +0100, Tvrtko Ursulin wrote:
On 08/03/2022 17:27, Kai Vehmanen wrote:
If kernel is built with hung task detection enabled and CONFIG_DEFAULT_HUNG_TASK_TIMEOUT set to less than 60 seconds, snd_hdac_i915_init() will trigger the hung task timeout in case i915 is not available and taint the kernel.
Split the 60sec wait into a loop of smaller waits to avoid this.
Reviewed-by: Lucas De Marchi lucas.demarchi@intel.com Co-developed-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Ramalingam C ramalingam.c@intel.com Signed-off-by: Kai Vehmanen kai.vehmanen@linux.intel.com
sound/hda/hdac_i915.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
Changes V1->V2:
- address local variable naming issue raised by Amadeusz and use Takashi's proposal
diff --git a/sound/hda/hdac_i915.c b/sound/hda/hdac_i915.c index 454474ac5716..aa48bed7baf7 100644 --- a/sound/hda/hdac_i915.c +++ b/sound/hda/hdac_i915.c @@ -143,7 +143,7 @@ static bool i915_gfx_present(void) int snd_hdac_i915_init(struct hdac_bus *bus) { struct drm_audio_component *acomp;
- int err;
- int err, i; if (!i915_gfx_present()) return -ENODEV;
@@ -159,9 +159,11 @@ int snd_hdac_i915_init(struct hdac_bus *bus) if (!acomp->ops) { if (!IS_ENABLED(CONFIG_MODULES) || !request_module("i915")) {
/* 60s timeout */
Where does this 60s come from and why is the fix to work around DEFAULT_HUNG_TASK_TIMEOUT in a hacky way deemed okay?
The 60s timeout comes from the fact that the binding with i915 *might* be mandatory for HD-audio driver on some platforms, while the binding couldn't be achieved depending on the dynamic configuration change or any other reasons, so we don't want to block forver. And, basically the hung check is false-positive, and if there is a better way to mark to skip the hung check, we'd take it. But currently this seems to be the easiest workaround for avoiding the false-positive checks.
For instance would limiting the wait here to whatever the kconfig is set to be an option?
No, the hunk task timeout can be changed dynamically via procfs, hence the fixed Kconfig won't help at all.
Takashi
participants (3)
-
Kai Vehmanen
-
Takashi Iwai
-
Tvrtko Ursulin