Basically we need to protect two things:
- The BE links
- The concurrent accesses to BEs
The former belongs to each FE that holds the links, and the FE stream lock would cover. The latter is rather a per-BE business.
An oft-seen risk of multiple locks is deadlocking, but in this case, as long as we keep the lock order FE->BE, nothing wrong can happen.
famous last words "nothing wrong can happen." :-)
I already added a helper to do this FE lock, I can easily replace the implementation to remove the spin_lock and use the FE PCM lock. we might even add the lock in the definition of for_each_dpcm_be() to avoid misses.
Let me try this out today, thanks for the suggestions.
well, it's not successful at all...
When I replace the existing dpcm_lock with the FE PCM lock as you suggested, without any additional changes, speaker-test produces valid audio on the endpoints, but if I try a Ctrl-C or limit the number of loops with the '-l' option, I hear an endless loop on the same buffer and I have to power cycle my test device to stop the sound.
See 2 patches attached, the first patch with the introduction of the helper works fine, the second with the use of the FE PCM lock doesn't. In hindsight I am glad I worked on minimal patches, one after the other, to identify problems.
And when I add the BE lock, then nothing happens. Device stuck and no audio...
There must be something we're missing related to the locking...
And indeed there's a deadlock!
snd_pcm_period_elapsed() takes the FE pcm stream lock, and will call snd_pcm_trigger. So if we also take the pcm stream lock in the BE trigger, there's a conceptual deadlock: we painted ourselves in a corner by using the same lock twice.
Takashi, are you really sure we should protect these for_each_dpcm_be() loops with the same pcm lock? it seems like asking for trouble to revisit the ALSA core just to walking through a list of BEs? Would you object to changing dpcm_lock as I suggested, but not interfering with stream handling?
See the traces below based on the hack in https://github.com/plbossart/sound/tree/test/dpcm-lock-loop-on-stop
[ 67.892082] skl_hda_dsp_generic skl_hda_dsp_generic: snd_sof_pcm_period_elapsed_work: plb taking lock [ 67.892088] skl_hda_dsp_generic skl_hda_dsp_generic: snd_sof_pcm_period_elapsed_work: plb taking lock - done [ 67.892092] skl_hda_dsp_generic skl_hda_dsp_generic: snd_pcm_period_elapsed_under_stream_lock: start [ 67.892096] skl_hda_dsp_generic skl_hda_dsp_generic: snd_pcm_period_elapsed_under_stream_lock: check running [ 67.892099] skl_hda_dsp_generic skl_hda_dsp_generic: snd_pcm_period_elapsed_under_stream_lock: check update_hw_ptr0 [ 67.892103] skl_hda_dsp_generic skl_hda_dsp_generic: snd_pcm_update_hw_ptr0: start [ 67.892110] skl_hda_dsp_generic skl_hda_dsp_generic: snd_pcm_update_hw_ptr0: delta [ 67.892113] skl_hda_dsp_generic skl_hda_dsp_generic: snd_pcm_update_hw_ptr0: check1 [ 67.892116] skl_hda_dsp_generic skl_hda_dsp_generic: snd_pcm_update_hw_ptr0: no_delta_check [ 67.892119] skl_hda_dsp_generic skl_hda_dsp_generic: snd_pcm_update_hw_ptr0: playback silence [ 67.892123] skl_hda_dsp_generic skl_hda_dsp_generic: snd_pcm_update_hw_ptr0: checks 3 [ 67.892126] skl_hda_dsp_generic skl_hda_dsp_generic: snd_pcm_update_hw_ptr0: done [ 67.892129] skl_hda_dsp_generic skl_hda_dsp_generic: snd_pcm_update_state: start [ 67.892132] skl_hda_dsp_generic skl_hda_dsp_generic: snd_pcm_update_state: before draining [ 67.892136] skl_hda_dsp_generic skl_hda_dsp_generic: snd_pcm_update_state: before draining2 [ 67.892139] skl_hda_dsp_generic skl_hda_dsp_generic: snd_pcm_update_state: before snd_pcm_drain_done [ 67.892143] skl_hda_dsp_generic skl_hda_dsp_generic: snd_pcm_do_stop: start [ 67.892147] skl_hda_dsp_generic skl_hda_dsp_generic: snd_pcm_do_stop: before TRIGGER_STOP
<<< we never reach the end of this TRIGGER_STOP
[ 67.892153] sof-audio-pci-intel-cnl 0000:00:1f.3: pcm: trigger stream 0 dir 0 cmd 0 [ 67.892166] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60050000: GLB_STREAM_MSG: TRIG_STOP [ 67.892396] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x60050000: GLB_STREAM_MSG: TRIG_STOP [ 67.892408] sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status: reg[0x160]=0x140000 successful [ 67.892418] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60030000: GLB_STREAM_MSG: PCM_FREE [ 67.892564] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded: 0x60030000: GLB_STREAM_MSG: PCM_FREE [ 67.892571] HDA Analog: dpcm_be_dai_trigger: BE TRIGGER_STOP start [ 67.892575] HDA Analog: dpcm_be_dai_trigger: BE TRIGGER_STOP check can_be_free_stop [ 67.892579] HDA Analog: snd_soc_dpcm_check_state: plb: taking fe lock_irqsave from snd_soc_dpcm_check_state
<<< checkmate at this point, we're trying to take the same lock twice.