[alsa-devel] async between dmaengine_pcm_dma_complete and snd_pcm_release

Qiao Zhou zhouqiao at marvell.com
Thu Oct 10 07:50:54 CEST 2013


On 10/10/2013 10:54 AM, Vinod Koul wrote:
> On Wed, Oct 09, 2013 at 01:00:08PM +0200, Lars-Peter Clausen wrote:
>> Added Vinod to Cc.
>>
>> On 10/09/2013 12:23 PM, Qiao Zhou wrote:
>>> On 10/09/2013 04:30 PM, Lars-Peter Clausen wrote:
>>>> On 10/09/2013 10:19 AM, Lars-Peter Clausen wrote:
>>>>> On 10/09/2013 09:29 AM, Qiao Zhou wrote:
>>>>>> Hi Mark, Liam, Jaroslav, Takashi
>>>>>>
>>>>>> I met an issue in which a kernel panic appears in the
>>>>>> dmaengine_pcm_dma_complete function on a quad-core system.
>>>>>> dmaengine_pcm_dma_complete is running on core0 while snd_pcm_release has
>>>>>> already been executed on core1, because under low-memory stress the OOM
>>>>>> killer kills the audio thread to release some memory.
>>>>>>
>>>>>> snd_pcm_release frees the runtime parameters, and runtime is used in
>>>>>> dmaengine_pcm_dma_complete, which is a callback from tasklet in dmaengine.
>>>>>> In the current audio driver, we can't guarantee that
>>>>>> dmaengine_pcm_dma_complete is not executed after snd_pcm_release on
>>>>>> multiple cores. Maybe we should add some protection. Do you have any
>>>>>> suggestions?
>>>>>>
>>>>>> I have tried to apply the workaround below, which fixes the panic, but I'm
>>>>>> not confident it's proper. I need your comments and better suggestions.
>>>>>
>>>>> I think this is a general problem with your dmaengine driver, nothing audio
>>>>> specific. If the callback is able to run after dmaengine_terminate_all() has
>>>>> returned successfully there is a bug in the dmaengine driver. You need to
>>> The terminate_all runs after the callback, and they run very close together
>>> on different cores. Should soc-dmaengine add such protection anyway?
>>
>> The problem is that if there is a race, that the callback races against the
>> freeing of the prtd, then there is also the chance that the callback races
>> against the freeing of the substream. So in that case, e.g. with your patch,
>> you'd try to lock a mutex for which the memory already has been freed. So we
>> need a way to synchronize against the callbacks, i.e. make sure that none of
>> the callbacks are running anymore at a given point. And only after that
>> point we are allowed to free the memory that is referenced in the callback.
> Okay reading thru the mail series and code:
>
> Since we are using cyclic dma here, we will get callback based on periods. So
> it is a very common case that you terminate and the callback is invoked.
>
> Now callback can be invoked by
> 1) the thread terminating audio, in TRIGGER_STOP
> 2) in the callback context, you invoked callback which would then go and call
> the period_elapsed ultimately leading to TRIGGER_STOP (xrun)
>
> We need to take care of these conditions:
>
> 1. In dma driver, once terminate_all is invoked, grab the lock, disable the
> tasklet, pause/stop the dmaengine, and remove all the descriptors from the
> lists. This ensures that dmaengine doesn't trigger anything new. And if it
> does, we don't call into the client.
Which lock do you refer to? Is it "snd_pcm_stream_lock" or a new one in the
dma driver?
>
> 2. If we get an interrupt or tasklet invoked after this, then it is the
> responsibility of dma driver to clear interrupt and return
>
> 3. While you have invoked the terminate_all you might get a callback, in that
> case the substream is still valid (you are still in TRIGGER_STOP). There should
> be no harm in calling period_elapsed, but it would be good if we detect that and
> return from here.
>
> 4. My only worry is that during callback we drop the locks held, so callback can
> be running on different CPU while you process the terminate all. This is very
> racy and possibly the issue being seen in this thread. This gets complicated by
> the fact that xrun would invoke the stop, thus terminate_all.
The timing is indeed very racy. We have two platforms whose only difference
is the CPU: one has 2 * A9 cores and the other has 4 * A7 cores. All other
components and peripherals are the same. We can't reproduce the panic after
more than 4 days of stress testing on the 2-cpu platform, but we can
reproduce it within roughly 10 hours on the 4-cpu platform.
>
>>>> On the other hand that last part could get tricky as the
>>>> dmaengine_terminate_all() might be called from within the callback.
>>> It's tricky indeed in case xrun happens. We should avoid possible deadlock.
>>
>> I think we'll eventually need two versions of dmaengine_terminate_all(). A
>> sync version which makes sure that the tasklet has finished and a non-sync
>> version that only makes sure that no new callbacks are started. I think the
>> sync version should be the default with an optional async version which must
>> be used, if it can run from within the callback. So we'd call the async
>> version in the pcm_trigger callback and the sync version in the pcm_close
>> callback.
> Yes this can be done. We can name this disable_callback cmd. The cmd will tell
> dma driver to disable all callback on the channel. This can be invoked from the
> TRIGGER_STOP and then terminate_all in the free
>
> Which dma driver are you guys using in this? I will send a patch for the core
> and pcm layer. Someone needs to test on actual hardware with driver fix :)
>
I'm using the mmp_tdma driver under /drivers/dma/, and I can test the
patch on our 4-cpu platform. Thanks.
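
For illustration only, the sync/async split discussed above could be modelled
in userspace roughly as follows. This is a minimal sketch with pthreads, not
kernel code and not the proposed patch: every name here (model_chan,
model_tasklet, model_terminate_async, model_terminate_sync, and the two sample
callbacks) is hypothetical, and the mutex stands in for a lock private to the
dma driver, which is an assumption. The async variant only stops new callbacks
from starting, so it can be called from inside a callback (the xrun case); the
sync variant also waits until no callback is running, so memory used by the
callback may be freed afterwards.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a DMA channel; not a kernel structure. */
struct model_chan {
    pthread_mutex_t lock;     /* assumed driver-private lock */
    pthread_cond_t idle;      /* signalled whenever a callback finishes */
    bool callbacks_enabled;   /* cleared by both terminate variants */
    int running;              /* number of callbacks currently executing */
    void (*callback)(void *); /* set once at init, never changed */
    void *callback_param;
};

void model_chan_init(struct model_chan *c, void (*cb)(void *), void *param)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->idle, NULL);
    c->callbacks_enabled = true;
    c->running = 0;
    c->callback = cb;
    c->callback_param = param;
}

/* Models the tasklet. Returns true if the client callback was invoked. */
bool model_tasklet(struct model_chan *c)
{
    pthread_mutex_lock(&c->lock);
    if (!c->callbacks_enabled) {
        /* Terminated: do not call into the client any more. */
        pthread_mutex_unlock(&c->lock);
        return false;
    }
    c->running++;
    /* The lock is dropped while the callback runs. */
    pthread_mutex_unlock(&c->lock);

    c->callback(c->callback_param);

    pthread_mutex_lock(&c->lock);
    c->running--;
    pthread_cond_broadcast(&c->idle);
    pthread_mutex_unlock(&c->lock);
    return true;
}

/* Async variant: no new callbacks start. Safe from within a callback. */
void model_terminate_async(struct model_chan *c)
{
    pthread_mutex_lock(&c->lock);
    c->callbacks_enabled = false;
    pthread_mutex_unlock(&c->lock);
}

/*
 * Sync variant: additionally waits for running callbacks to finish.
 * Must NOT be called from within a callback, or it would wait on itself.
 */
void model_terminate_sync(struct model_chan *c)
{
    pthread_mutex_lock(&c->lock);
    c->callbacks_enabled = false;
    while (c->running > 0)
        pthread_cond_wait(&c->idle, &c->lock);
    assert(c->running == 0);
    pthread_mutex_unlock(&c->lock);
}

/* Sample callback: counts elapsed periods; param points to an int. */
void model_count_cb(void *param)
{
    (*(int *)param)++;
}

/* Sample callback for the xrun case: stops the channel from inside. */
void model_xrun_cb(void *param)
{
    model_terminate_async((struct model_chan *)param);
}
```

In this model the trigger-stop path would use model_terminate_async() and the
close path would use model_terminate_sync() before freeing anything that the
callback touches.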

-- 

Best Regards
Qiao

