[RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE

Tue Oct 12 12:42:10 CEST 2021

On Tue, 12 Oct 2021 08:34:07 +0200,
Takashi Iwai wrote:
> 
> On Mon, 11 Oct 2021 22:06:51 +0200,
> Pierre-Louis Bossart wrote:
> > 
> > 
> > >>> Basically we need to protect two things:
> > >>> - The BE links
> > >>> - The concurrent accesses to BEs
> > >>> The former belongs to each FE that holds the links, and the FE stream
> > >>> lock would cover.  The latter is rather a per-BE business.
> > >>>
> > >>> An oft-seen risk of multiple locks is deadlocking, but in this case,
> > >>> as long as we keep the lock order FE->BE, nothing wrong can happen.
> > >>
> > >> famous last words  "nothing wrong can happen." :-)
> > >>
> > >> I already added a helper to do this FE lock, I can easily replace the
> > >> implementation to remove the spin_lock and use the FE PCM lock.
> > >> we might even add the lock in the definition of for_each_dpcm_be() to
> > >> avoid misses.
> > >>
> > >> Let me try this out today, thanks for the suggestions.
> > > 
> > > well, it's not successful at all...
> > > 
> > > When I replace the existing dpcm_lock with the FE PCM lock as you
> > > suggested, without any additional changes, speaker-test produces valid
> > > audio on the endpoints, but if I try a Ctrl-C or limit the number of
> > > loops with the '-l' option, I hear an endless loop on the same buffer
> > > and I have to power cycle my test device to stop the sound.
> > > 
> > > See 2 patches attached, the first patch with the introduction of the
> > > helper works fine, the second with the use of the FE PCM lock doesn't.
> > > In hindsight I am glad I worked on minimal patches, one after the other,
> > > to identify problems.
> > > 
> > > And when I add the BE lock, then nothing happens. Device stuck and no
> > > audio...
> > > 
> > > There must be something we're missing related to the locking...
> > 
> > And indeed there's a deadlock!
> > 
> > snd_pcm_period_elapsed() takes the FE pcm stream lock, and will call
> > snd_pcm_trigger.
> 
> Indeed, this would deadlock.
> 
> > So if we also take the pcm stream lock in the BE
> > trigger, there's a conceptual deadlock: we painted ourselves in a corner
> > by using the same lock twice.
> > 
> > Takashi, are you really sure we should protect these for_each_dpcm_be()
> > loops with the same pcm lock?
> 
> The call within the FE lock is done only in dpcm_dai_trigger_fe_be(),
> and this should call dpcm_be_dai_trigger() as is.  In other places,
> the calls are without FE lock, hence they can take the lock,
> e.g. create a variant dpcm_dai_trigger_fe_be_lock().

Or rather it was the code path of snd_soc_dpcm_check_state()?
The function could be called from dpcm_be_dai_trigger().
Currently dpcm_lock seems to be added at places casually covering some
of the for_each_dpcm_be() or whatever...  The whole lock scheme has to
be revisited.

> > it seems like asking for trouble to
> > revisit the ALSA core just to walking through a list of BEs? Would you
> > object to changing dpcm_lock as I suggested, but not interfering with
> > stream handling?
> 
> That would work, too, it's just a pity to degrade the fine-grained
> locks that have been already taken into global locks...

For the performance problem, at least, you can make it rwlock and
rwsem types (depending on nonatomic) so that the concurrent accesses
would work.  The only exclusive lock is the case for adding and
removing the entries, which should be done with write lock / sem down,
while other link traversals can be executed in the read lock / sem.

Takashi