Re: [alsa-devel] async between dmaengine_pcm_dma_complete and snd_pcm_release

10 Oct 2013

On 10/10/2013 10:54 AM, Vinod Koul wrote:
...
On Wed, Oct 09, 2013 at 01:00:08PM +0200, Lars-Peter Clausen wrote:
...
Added Vinod to Cc.
On 10/09/2013 12:23 PM, Qiao Zhou wrote:
...
On 10/09/2013 04:30 PM, Lars-Peter Clausen wrote:
...
On 10/09/2013 10:19 AM, Lars-Peter Clausen wrote:
...
On 10/09/2013 09:29 AM, Qiao Zhou wrote:
...
Hi Mark, Liam, Jaroslav, Takashi
I met an issue in which a kernel panic occurs in the dmaengine_pcm_dma_complete
function on a quad-core system. dmaengine_pcm_dma_complete is running on
core0, while snd_pcm_release has already been executed on core1, because under
low-memory stress the OOM killer kills the audio thread to free some memory.
snd_pcm_release frees the runtime parameters, and runtime is used in
dmaengine_pcm_dma_complete, which is a callback from a tasklet in dmaengine.
In the current audio driver, we can't guarantee that dmaengine_pcm_dma_complete
is not executed after snd_pcm_release on multiple cores. Maybe we should add
some protection. Do you have any suggestions?
I have tried to apply the workaround below, which fixes the panic, but I'm not
confident it's proper. I need your comments and better suggestions.
I think this is a general problem with your dmaengine driver, nothing
audio-specific. If the callback is able to run after dmaengine_terminate_all()
has returned successfully, there is a bug in the dmaengine driver. You need to
terminate_all runs after the callback, and they run very close together on
different cores. Should soc-dmaengine add such protection anyway?
The problem is that if there is a race, i.e. the callback races against the
freeing of the prtd, then there is also the chance that the callback races
against the freeing of the substream. So in that case, e.g. with your patch,
you'd try to lock a mutex whose memory has already been freed. So we
need a way to synchronize against the callbacks, i.e. make sure that none of
the callbacks are running anymore at a given point. Only after that
point are we allowed to free the memory that is referenced in the callback.
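A minimal userspace sketch of that ordering, assuming pthreads in place of kernel primitives; `struct chan_model`, `callback_enter()`, `callback_exit()` and `chan_quiesce()` are hypothetical names, not dmaengine API. The state used for synchronization lives in the channel, which outlives the PCM runtime, rather than in the prtd that is about to be freed:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of a DMA channel, not kernel code. */
struct chan_model {
	pthread_mutex_t lock;  /* lives in the channel, not in the prtd */
	pthread_cond_t idle;   /* broadcast when in_flight drops to 0 */
	int in_flight;         /* client callbacks currently executing */
	bool stopped;          /* once set, no new callback may start */
};

static void chan_model_init(struct chan_model *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->idle, NULL);
	c->in_flight = 0;
	c->stopped = false;
}

/* Completion path: returns false if the client must not be called. */
static bool callback_enter(struct chan_model *c)
{
	bool ok;

	pthread_mutex_lock(&c->lock);
	ok = !c->stopped;
	if (ok)
		c->in_flight++;
	pthread_mutex_unlock(&c->lock);
	return ok;
}

/* Completion path: called after the client callback has returned. */
static void callback_exit(struct chan_model *c)
{
	pthread_mutex_lock(&c->lock);
	assert(c->in_flight > 0);
	if (--c->in_flight == 0)
		pthread_cond_broadcast(&c->idle);
	pthread_mutex_unlock(&c->lock);
}

/*
 * Close path: forbid new callbacks, then block until the running ones
 * have returned. Only after this is it safe (in this model) to free
 * anything the callback dereferences, such as the prtd or runtime.
 */
static void chan_quiesce(struct chan_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->stopped = true;
	while (c->in_flight > 0)
		pthread_cond_wait(&c->idle, &c->lock);
	pthread_mutex_unlock(&c->lock);
}
```

In this model the close path would call `chan_quiesce()` and only then free the prtd; a callback that arrives later is turned away by `callback_enter()` instead of touching freed memory.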
Okay, reading through the mail series and code:
Since we are using cyclic DMA here, we get a callback for every period. So
it is a very common case that you terminate while the callback is invoked.
Now terminate_all can be invoked from:
- the thread terminating audio, in TRIGGER_STOP
- the callback context: the callback goes on to call period_elapsed,
  ultimately leading to TRIGGER_STOP (xrun)
We need to take care of these conditions:
- In the DMA driver, once terminate_all is invoked, grab the lock, disable the
  tasklet, pause/stop the DMA engine and remove all the descriptors from the
  lists. This ensures that the dmaengine doesn't trigger anything new, and if
  it does, we don't call into the client.
What lock do you refer to? Is it "snd_pcm_stream_lock" or a new one in the
DMA driver?
...

- If we get an interrupt or the tasklet is invoked after this, then it is the
  responsibility of the DMA driver to clear the interrupt and return.

- While you are inside terminate_all you might still get a callback; in that
  case the substream is still valid (you are still in TRIGGER_STOP). There
  should be no harm in calling period_elapsed, but it would be good to detect
  that case and return early.

- My only worry is that we drop the locks held while invoking the callback, so
  the callback can be running on a different CPU while you process
  terminate_all. This is very racy and possibly the issue seen in this thread.
  It is further complicated by the fact that an xrun would invoke the stop and
  thus terminate_all.
The timing is very racy. We have two platforms whose only difference is that
one has 2 x A9 CPUs and the other 4 x A7 CPUs; all other components and
peripherals are the same. We can't reproduce the panic after more than 4 days
of stress testing on the 2-CPU platform, but we can reproduce it within about
10 hours on the 4-CPU platform.
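The conditions listed above can be modelled in userspace C roughly as follows. This is a sketch under stated assumptions: a pthread mutex stands in for a per-channel lock inside the DMA driver (one possible answer to the lock question, since the reply to it is elided), a linked list stands in for the descriptor lists, and `struct chan`, `chan_terminate_all()`, `chan_complete()` and `pcm_period_cb()` are hypothetical names:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*dma_cb_t)(void *param);

/* Hypothetical descriptor and channel; names are illustrative only. */
struct desc {
	dma_cb_t callback;
	void *param;
	struct desc *next;
};

struct chan {
	pthread_mutex_t lock;  /* assumed per-channel lock in the DMA driver */
	struct desc *active;   /* queued descriptors (cyclic: they stay queued) */
	bool running;          /* models the hardware being enabled */
};

static void chan_init(struct chan *c)
{
	pthread_mutex_init(&c->lock, NULL);
	c->active = NULL;
	c->running = false;
}

static void chan_submit(struct chan *c, struct desc *d)
{
	assert(d != NULL && d->callback != NULL);
	pthread_mutex_lock(&c->lock);
	d->next = c->active;
	c->active = d;
	c->running = true;
	pthread_mutex_unlock(&c->lock);
}

/* Condition 1: under the lock, stop the "hardware" and drop descriptors. */
static void chan_terminate_all(struct chan *c)
{
	pthread_mutex_lock(&c->lock);
	c->running = false;
	c->active = NULL;      /* descriptors are owned by the caller here */
	pthread_mutex_unlock(&c->lock);
}

/*
 * Condition 2: the completion tasklet. A late run after terminate_all
 * finds nothing queued and returns without calling the client. The
 * callback is copied out under the lock and invoked with it dropped.
 */
static void chan_complete(struct chan *c)
{
	dma_cb_t cb = NULL;
	void *param = NULL;

	pthread_mutex_lock(&c->lock);
	if (c->running && c->active) {
		cb = c->active->callback;
		param = c->active->param;
	}
	pthread_mutex_unlock(&c->lock);

	if (cb)
		cb(param);
}

/* Condition 3: client side, return early if the stream was stopped. */
struct stream {
	bool running;  /* real code would check this under the stream lock */
	int periods;
};

static void pcm_period_cb(void *param)
{
	struct stream *s = param;

	if (!s->running)
		return;
	s->periods++;  /* stands in for calling snd_pcm_period_elapsed() */
}
```

Because `chan_complete()` calls the client with the lock dropped, `pcm_period_cb()` can still be running on another CPU when `chan_terminate_all()` returns. That is the window described in the last point, and this model alone does not close it.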
...
...
...
...
On the other hand, that last part could get tricky, as
dmaengine_terminate_all() might be called from within the callback.
It's tricky indeed in case an xrun happens. We should avoid a possible deadlock.
I think we'll eventually need two versions of dmaengine_terminate_all(): a
sync version which makes sure that the tasklet has finished, and a non-sync
version that only makes sure that no new callbacks are started. I think the
sync version should be the default, with an optional async version which must
be used if it can run from within the callback. So we'd call the async
version in the pcm_trigger callback and the sync version in the pcm_close
callback.
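A userspace sketch of the proposed split, assuming pthreads and a thread-local flag in place of kernel context tracking; `chan_terminate_async()`, `chan_terminate_sync()`, `pcm_trigger_stop()`, `pcm_close()` and the `-EDEADLK` guard are illustrative names, not an existing dmaengine API. The async variant only stops new callbacks, so it is safe from the trigger path even when an xrun stops the stream from inside the callback; the sync variant additionally waits for running callbacks, so it belongs in close:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical channel model; these names are not an existing dmaengine API. */
struct chan {
	pthread_mutex_t lock;
	pthread_cond_t idle;   /* broadcast when in_flight drops to 0 */
	int in_flight;         /* client callbacks currently executing */
	bool stopped;          /* no new callback may start once set */
};

/* Sketch-only stand-in for "this thread is running a client callback". */
static _Thread_local bool in_callback;

static void chan_init(struct chan *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->idle, NULL);
	c->in_flight = 0;
	c->stopped = false;
}

/* Async: stop new callbacks from starting, but do not wait. */
static void chan_terminate_async(struct chan *c)
{
	pthread_mutex_lock(&c->lock);
	c->stopped = true;
	pthread_mutex_unlock(&c->lock);
}

/*
 * Sync: async terminate, then wait for running callbacks to return.
 * Called from inside a callback it would wait for itself, so refuse.
 */
static int chan_terminate_sync(struct chan *c)
{
	if (in_callback)
		return -EDEADLK;
	chan_terminate_async(c);
	pthread_mutex_lock(&c->lock);
	while (c->in_flight > 0)
		pthread_cond_wait(&c->idle, &c->lock);
	pthread_mutex_unlock(&c->lock);
	return 0;
}

/* Completion path: invoke cb unless the channel has been terminated. */
static void chan_run_callback(struct chan *c, void (*cb)(struct chan *))
{
	pthread_mutex_lock(&c->lock);
	if (c->stopped) {
		pthread_mutex_unlock(&c->lock);
		return;
	}
	c->in_flight++;
	pthread_mutex_unlock(&c->lock);

	in_callback = true;
	cb(c);                 /* runs with the lock dropped */
	in_callback = false;

	pthread_mutex_lock(&c->lock);
	assert(c->in_flight > 0);
	if (--c->in_flight == 0)
		pthread_cond_broadcast(&c->idle);
	pthread_mutex_unlock(&c->lock);
}

/* PCM side, as proposed: async from trigger, sync from close. */
static void pcm_trigger_stop(struct chan *c) { chan_terminate_async(c); }
/* After this returns 0, freeing the prtd is safe in this model. */
static int pcm_close(struct chan *c) { return chan_terminate_sync(c); }

/* Example callback: an xrun stops the stream from callback context. */
static int sync_in_cb_result;
static void xrun_callback(struct chan *c)
{
	pcm_trigger_stop(c);                        /* fine: does not wait */
	sync_in_cb_result = chan_terminate_sync(c); /* refused: would self-wait */
}
```

The `-EDEADLK` return is only a safety net for the sketch; in the proposal, the caller is simply expected to pick the async variant whenever it may run inside the callback.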
Yes, this can be done. We can name this the disable_callback cmd. The cmd will
tell the DMA driver to disable all callbacks on the channel. This can be
invoked from TRIGGER_STOP, and then terminate_all in the free path.
Which DMA driver are you using for this? I will send a patch for the core
and the PCM layer. Someone needs to test it on actual hardware with a driver fix :)
I'm using the mmp_tdma driver under /drivers/dma/, and I can test the
patch on our 4-CPU platform. Thanks.
-- 

Best Regards
Qiao