Re: [alsa-devel] race condition between dmaengine_pcm_dma_complete and snd_pcm_release

16 Oct 2015

      On 10/12/2015 02:01 PM, Lars-Peter Clausen wrote:
...
On 10/08/2015 03:46 PM, Matt Campbell wrote:
...
Hi Lars,
Note: re-sending this message because I accidentally wasn't plain text
email so it got bounced from the email list (thanks GMail). Sorry for
the double mail to those that are CC'd.
Thanks for the info, this felt like a limitation to the DMA engine
before I found this older thread so it's good to have that feeling
validated.
Just for posterity I'll describe the situation I'm in with the RT
patch. I'm on an i.MX6 solo processor which uses the imx-sdma driver.
As far as I can tell, terminate_all properly shuts off the DMA (simply
disabling the bit in the controller). The issue I'm having is the
described in the following steps:

An SDMA interrupt comes in, is handled by the top half and the

bottom half tasklet is scheduled.

Control returns to my RT process which is higher priority than the

tasklet thread. My process decides it is time to shutdown and calls
snd_pcm_close which will terminate the DMA and deallocates everything
(including the substream which is the argument data to the tasklet)

With my process out of the way the dma tasklet can run. It

eventually calls dmaengine_pcm_dma_complete (or in my case
imx_pcm_dma_complete). You'll run into problems when dereferencing the
now invalid substream pointer.
I have one other note about your proposed fix. In
dmaengine_pcm_dma_complete you can end up with xrun handling which
will result in a reset of the stream including calling terminate_all
on the DMA. I'm worried that even with an API call to the DMA system
to synchronize completion of DMA tasklets, you could end up with a
deadlock when this is called from the tasklet itself when resetting
the stream. I think we need to put the tasklet synchronization just
before snd_pcm_release is called (as you suggested in the older
thread). This still makes me uneasy because deadlock is still possible
if the tasklet takes locks that are held when snd_pcm_release is
called. Thanks for the input!
Yes, very good analysis. This is the reason why we can't synchronize in
terminate_all() itself and have to introduce a separate synchronization
primitive to the API. Most users will be fine with terminating and syncing
at the same time, hence it makes sense to introduce a new API
terminate_all_sync() to cover those common cases. For cases where it is not
possible, e.g. because they either terminate the DMA from atomic context or
they can end up calling terminate_all() from the completion callback, a new
separate API function (e.g. dmaengine_synchronize()) will be introduced.
Users which use the asynchronous terminate API need to call
dmaengine_synchronize() before they free any resources that are accessed in
either the completion callback or the memory that is accessed by the DMA itself.
...
This still makes me uneasy because deadlock is still possible
if the tasklet takes locks that are held when snd_pcm_release is
called. Thanks for the input
Good point. If that can happen we properly need to reference count the
struct that is passed to the complete callback.
Thinking a bit more about this, this shouldn't be possible to happen. The
completion callback runs in atomic context, which means in can only use
spinlocks etc. On the other hand the dmaengine_synchronize() function must
not be called from atomic context, this means it can't be called while
holding a lock that can be taken from within the complete callback.