[Intel-gfx] [PATCH v2] ALSA: hda/i915 - avoid hung task timeout in i915 wait

Takashi Iwai tiwai at suse.de
Wed Mar 9 10:55:13 CET 2022


On Wed, 09 Mar 2022 10:48:49 +0100,
Tvrtko Ursulin wrote:
> 
> 
> On 09/03/2022 09:23, Takashi Iwai wrote:
> > On Wed, 09 Mar 2022 10:02:13 +0100,
> > Tvrtko Ursulin wrote:
> >>
> >>
> >> On 09/03/2022 08:39, Kai Vehmanen wrote:
> >>> Hi,
> >>>
> >>> On Wed, 9 Mar 2022, Tvrtko Ursulin wrote:
> >>>
> >>>>> -			/* 60s timeout */
> >>>>
> >>>> Where does this 60s come from and why is the fix to work around
> >>>> DEFAULT_HUNG_TASK_TIMEOUT in a hacky way deemed okay? For instance would
> >>>> limiting the wait here to whatever the kconfig is set to be an option?
> >>>
> >>> this was discussed in
> >>> https://lists.freedesktop.org/archives/intel-gfx/2022-February/290821.html
> >>> ... and that thread concluded it's cleaner to split the wait than try
> >>> to figure out hung-task configuration from middle of audio driver.
> >>>
> >>> The 60sec timeout comes from 2019 patch "ALSA: hda: Extend i915 component
> >>> bind timeout" to fix an issue reported by Paul Menzel (cc'ed).
> >>>
> >>> This patch keeps the timeout intact.
> >>
> >> I did not spot discussion touching on the point I raised.
> >>
> >> How about not fight the hung task detector but mark your wait context
> >> as "I really know what I'm doing - not stuck trust me".
> >
> > The question is how often this problem hits.  Basically it's a very
> > corner case, and I even think we may leave as is; that's a matter of
> > configuration, and lowering such a bar should expect some
> > side-effect. OTOH, if the problem happens in many cases, it's
> > beneficial to fix in the core part, indeed.
> 
> Yes argument you raise can be made I agree.
> 
> >> Maybe using
> >> wait_for_completion_killable_timeout would do it since
> >> snd_hdac_i915_init is allowed to fail with an error already?
> >
> > It makes it killable -- which is a complete behavior change.
> 
> Complete behaviour change how? Isn't this something ran on probe so
> likelihood of anyone sending SIGKILL to the modprobe process is only
> the init process? And in that case what is the fundamental difference
> in init giving up before the internal 60s in HDA driver does? I don't
> see a difference. Either party decided to abort the wait and code can
> just unwind and propagate the different error codes.

The point is that it does change the actual behavior, and changing the
actual behavior just for working around the corner case like the above
wouldn't be justified without the proper evaluation.

That said, if this behavior change is intentional and even desired,
that's a way to go.


Takashi


More information about the Alsa-devel mailing list