[alsa-devel] pcm_post_start race condition with hardware error interrupts
Takashi Iwai
tiwai at suse.de
Wed Dec 20 20:30:25 CET 2017
On Wed, 20 Dec 2017 20:13:22 +0100,
srainey at delta-info.com wrote:
>
>
> >-----Original Message-----
> >From: "Takashi Iwai" <tiwai at suse.de>
> >Sent: Wednesday, December 20, 2017 1:02pm
> >On Wed, 20 Dec 2017 18:07:50 +0100,
> >srainey at delta-info.com wrote:
> >>
> >>
> >> This is more of a general question/issue, but a bit of background: I encountered this in trying to resolve an issue on a fairly mature embedded ARM platform wherein captured audio was left/right swapped some small percent (~4%) of the time on startup. I should mention that I'm locked into using a fairly old kernel on this platform, but having a quick look at the latest at kernel.org reveals it to be not much changed.
> >>
> >> In tracing events, I found that during the stream startup, the following sequence is possible (and indeed does happen some small percent of time):
> >>
> >> -Audio Pipeline startups routines are kicked off through snd_pcm_action_start, eventually calling the driver's trigger op.
> >>
> >> -The driver's trigger function is called with SNDRV_PCM_TRIGGER_START command, which configures the necessary hardware and enables interrupts.
> >>
> >> -While the startup sequence is still running, an interrupt is generated due to some real error condition in the hardware, and the ISR sets the stream's state to SNDRV_PCM_STATE_XRUN.
> >
> >At setting XRUN, did you take the substream lock? Without that, it's
> >racy, of course.
> >
> >> -The startup sequence reaches snd_pcm_post_start, which unconditionally sets the stream's state to RUNNING, clobbering the error state if it was set. See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/sound/core/pcm_native.c?h=v4.15-rc4#n1192
> >>
> >> If the hardware reaches an error state, the user must be informed in order to take action, but the unconditional assignment here does not allow that to happen if there is a hardware error in this small startup time. I don't have a patch because I am working on a very old vendor provided kernel (2.6.37), but by only setting the state if the current state is PREPARED, along with some other necessary changes in the driver, I was able to mitigate the l/r swap issue and get correct behavior.
> >>
> >> This is likely a very rare issue, but also it was somewhat difficult to debug. I'm not familiar enough with kernel dev processes or the audio framework to say something should be done or if there is perhaps some other solution here. It seems like something *should* be done anyway.
> >
> >By taking the substream lock (e.g. via snd_pcm_stop_xrun()), it
> >assures that the whole start sequence has been done. It's the way to
> >avoid such an inconsistent state.
> >
> >
> >Takashi
>
> The driver/IRQ handler in my case is the davinci mcasp driver https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/sound/soc/davinci/davinci-mcasp.c?h=v4.15-rc4#n309
>
> It does manually grab the lock. I also currently have it modified to ALWAYS set the state to xrun while the lock is held - when the IRQ handler is triggered before we reach running state, the user should be notified as things may need to be reset (in this case I think we want to go from prepared->xrun instead of the usual running->xrun). However this does not happen in the current unmodified kernel - snd_pcm_post_start does not check the state for error before setting it, and no errors are reported to the user.
The whole snd_pcm_start() is basically protected inside the substream
lock, i.e. pre_start(), do_start() and post_start() are performed
sequentially while a lock is kept. When xrun happens before this
sequence, pre_start() checks already the state and return -EBADFD.
If xrun happens during the sequence, snd_pcm_stop() is blocked due to
substream lock, thus the pcm state change happens after post_start().
When a stream is in sleep, it's woken up at this point.
Takashi
More information about the Alsa-devel
mailing list