[RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
Pierre-Louis Bossart
pierre-louis.bossart at linux.intel.com
Mon Oct 11 22:06:51 CEST 2021
>>> Basically we need to protect two things:
>>> - The BE links
>>> - The concurrent accesses to BEs
>>> The former belongs to each FE that holds the links, and the FE stream
>>> lock would cover. The latter is rather a per-BE business.
>>>
>>> An oft-seen risk of multiple locks is deadlocking, but in this case,
>>> as long as we keep the lock order FE->BE, nothing wrong can happen.
>>
>> famous last words "nothing wrong can happen." :-)
>>
>> I already added a helper to do this FE lock, I can easily replace the
>> implementation to remove the spin_lock and use the FE PCM lock.
>> we might even add the lock in the definition of for_each_dpcm_be() to
>> avoid misses.
>>
>> Let me try this out today, thanks for the suggestions.
>
> well, it's not successful at all...
>
> When I replace the existing dpcm_lock with the FE PCM lock as you
> suggested, without any additional changes, speaker-test produces valid
> audio on the endpoints, but if I try a Ctrl-C or limit the number of
> loops with the '-l' option, I hear an endless loop on the same buffer
> and I have to power cycle my test device to stop the sound.
>
> See 2 patches attached, the first patch with the introduction of the
> helper works fine, the second with the use of the FE PCM lock doesn't.
> In hindsight I am glad I worked on minimal patches, one after the other,
> to identify problems.
>
> And when I add the BE lock, then nothing happens. Device stuck and no
> audio...
>
> There must be something we're missing related to the locking...
And indeed there's a deadlock!
snd_pcm_period_elapsed() takes the FE pcm stream lock, and will call
snd_pcm_trigger. So if we also take the pcm stream lock in the BE
trigger, there's a conceptual deadlock: we painted ourselves in a corner
by using the same lock twice.
Takashi, are you really sure we should protect these for_each_dpcm_be()
loops with the same pcm lock? it seems like asking for trouble to
revisit the ALSA core just to walking through a list of BEs? Would you
object to changing dpcm_lock as I suggested, but not interfering with
stream handling?
See the traces below based on the hack in
https://github.com/plbossart/sound/tree/test/dpcm-lock-loop-on-stop
[ 67.892082] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_sof_pcm_period_elapsed_work: plb taking lock
[ 67.892088] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_sof_pcm_period_elapsed_work: plb taking lock - done
[ 67.892092] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_period_elapsed_under_stream_lock: start
[ 67.892096] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_period_elapsed_under_stream_lock: check running
[ 67.892099] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_period_elapsed_under_stream_lock: check update_hw_ptr0
[ 67.892103] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_hw_ptr0: start
[ 67.892110] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_hw_ptr0: delta
[ 67.892113] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_hw_ptr0: check1
[ 67.892116] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_hw_ptr0: no_delta_check
[ 67.892119] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_hw_ptr0: playback silence
[ 67.892123] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_hw_ptr0: checks 3
[ 67.892126] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_hw_ptr0: done
[ 67.892129] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_state: start
[ 67.892132] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_state: before draining
[ 67.892136] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_state: before draining2
[ 67.892139] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_state: before snd_pcm_drain_done
[ 67.892143] skl_hda_dsp_generic skl_hda_dsp_generic: snd_pcm_do_stop:
start
[ 67.892147] skl_hda_dsp_generic skl_hda_dsp_generic: snd_pcm_do_stop:
before TRIGGER_STOP
<<< we never reach the end of this TRIGGER_STOP
[ 67.892153] sof-audio-pci-intel-cnl 0000:00:1f.3: pcm: trigger stream
0 dir 0 cmd 0
[ 67.892166] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60050000:
GLB_STREAM_MSG: TRIG_STOP
[ 67.892396] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded:
0x60050000: GLB_STREAM_MSG: TRIG_STOP
[ 67.892408] sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status:
reg[0x160]=0x140000 successful
[ 67.892418] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60030000:
GLB_STREAM_MSG: PCM_FREE
[ 67.892564] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded:
0x60030000: GLB_STREAM_MSG: PCM_FREE
[ 67.892571] HDA Analog: dpcm_be_dai_trigger: BE TRIGGER_STOP start
[ 67.892575] HDA Analog: dpcm_be_dai_trigger: BE TRIGGER_STOP check
can_be_free_stop
[ 67.892579] HDA Analog: snd_soc_dpcm_check_state: plb: taking fe
lock_irqsave from snd_soc_dpcm_check_state
<<< checkmate at this point, we're trying to take the same lock twice.
More information about the Alsa-devel
mailing list