[RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
Pierre-Louis Bossart
pierre-louis.bossart at linux.intel.com
Tue Oct 12 15:45:41 CEST 2021
>>> And indeed there's a deadlock!
>>>
>>> snd_pcm_period_elapsed() takes the FE pcm stream lock, and will call
>>> snd_pcm_trigger.
>>
>> Indeed, this would deadlock.
>>
>>> So if we also take the pcm stream lock in the BE
>>> trigger, there's a conceptual deadlock: we painted ourselves in a corner
>>> by using the same lock twice.
>>>
>>> Takashi, are you really sure we should protect these for_each_dpcm_be()
>>> loops with the same pcm lock?
>>
>> The call within the FE lock is done only in dpcm_dai_trigger_fe_be(),
>> and this should call dpcm_be_dai_trigger() as is. In other places,
>> the calls are without FE lock, hence they can take the lock,
>> e.g. create a variant dpcm_dai_trigger_fe_be_lock().
>
> Or rather it was the code path of snd_soc_dpcm_check_state()?
> The function could be called from dpcm_be_dai_trigger().
> Currently dpcm_lock seems to be added at places casually covering some
> of the for_each_dpcm_be() or whatever... The whole lock scheme has to
> be revisited.
>
>>> it seems like asking for trouble to
>>> revisit the ALSA core just to walking through a list of BEs? Would you
>>> object to changing dpcm_lock as I suggested, but not interfering with
>>> stream handling?
>>
>> That would work, too, it's just a pity to degrade the fine-grained
>> locks that have been already taken into global locks...
>
> For the performance problem, at least, you can make it rwlock and
> rwsem types (depending on nonatomic) so that the concurrent accesses
> would work. The only exclusive lock is the case for adding and
> removing the entries, which should be done with write lock / sem down,
> while other link traversals can be executed in the read lock / sem.
Thanks Takashi for your feedback.
Let's first get the locking right. We can optimize performance later.
I will continue with the idea of using existing fine-grained locks a bit
more, I am with you that this dpcm_lock was not added in a consistent
way and reusing the concept is really building on sand.
We can remove the lock in snd_soc_dpcm_check_state, I already did the
change in other versions. But what I'll need to check further is if
indeed dpcm_be_dai_trigger() is called with the FE lock taken already.
Adding a lockdep_assert_help() or something would help I guess.
The part where I am still not clear is that snd_pcm_period_elapsed uses
the irqsave/irqrestore version, but in the initial code you suggested
the vanilla irq version is fine. I am not sure I get the nuance here,
and btw in the case of SOF the snd_pcm_period_elapsed is called from a
workqueue, not an interrupt handler, as a work-around to avoid an IPC
deadlock, so we probably don't need the irqsave/irqrestore version anyways.
More information about the Alsa-devel
mailing list