[RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE

Takashi Iwai tiwai at suse.de
Thu Oct 7 13:06:00 CEST 2021


On Wed, 06 Oct 2021 21:47:33 +0200,
Pierre-Louis Bossart wrote:
> 
> 
> > I tested this further. It appears that things work fine till 'patch 3/5'
> > of yours. After I take 'patch 4/5', the print "BUG: scheduling while
> > atomic: aplay" started showing up and I see a hang. This failure was
> > seen for 2x1 mixer test itself.
> > 
> > The 'patch 4/5' introduces mutex_lock/unlock() in dpcm_be_dai_trigger().
> > This seems to be the problem, since trigger() runs in atomic context
> > depending on the PCM 'nonatomic' flag. I am not sure if your design sets
> > 'nonatomic' flag by default and that is why the issue is not seen at
> > your end?
> > 
> > With below (just for testing purpose), tests ran well. I was able to run
> > 2x1, 3x1 ... 10x1 mixer tests.
> 
> This is useful information, thanks a lot Sameer for your time.
> 
> Unfortunately removing the lines below will not work, that's
> precisely what's needed to prevent multiple triggers from being sent to
> the same shared BE.
> 
> I don't have a clear idea why we see different results, and now I have
> even less understanding of the ALSA/ASoC/DPCM locking model. We use
> card->mutex, card->pcm_mutex, card->dpcm_lock,
> substream->self_group.mutex or lock, client_mutex, dapm_mutex....
> 
> I don't think any of the DPCM code is ever called from an interrupt
> context, so the errors you reported seem more like a card configuration,
> specifically the interaction with 'aplay'/userspace will happen for FEs.
> BEs are mostly hidden to userspace.
> 
> One possible difference is that all our DPCM solutions are based on
> non-atomic FEs (since they all involve an IPC with a DSP), so we would
> always use the top banch of these tests:
> 
> 	if (nonatomic) \
> 		mutex_ ## mutex_action(&group->mutex); \
> 	else \
> 		spin_ ## action(&group->lock);
> 
> I don't see this nonatomic flag set anywhere in the sound/soc/tegra
> code, so it's possible that in your case the spin_lock/spin_lock_irq is
> used before triggering the FE, and using a mutex before the BE trigger
> throws errors? I think Takashi mentioned that the BEs inherit the
> properties of the FE to some extent.
> 
> Looking at the code, I see that only Intel legacy, SOF and Qualcomm
> drivers set nonatomic=1, so maybe we can assume that that FEs for a card
> are either all atomic or all non-atomic, maybe we could then use a
> spinlock or a mutex. But if the FEs can be different then I am not sure
> what locking model might work, we can't have a BE protected by a
> spinlock or a mutex depending on the property of the FE. We need one
> method only to guarantee a mutual exclusion.
> 
> Takashi, Mark, do you think that an all/none assumption on FE nonatomic
> properties would make sense?

As long as BE's are updated from FE's PCM callback, BE has to follow
the atomicity of its FE, so we can't assume all or none globally.

How is the expected lock granularity and the protection context?  Do
we need a card global lock/mutex, or is it specific to each BE
substream?


thanks,

Takashi


More information about the Alsa-devel mailing list