[RFC PATCH 0/2] ASoC: soc-pcm: fix trigger race conditions with shared BE
Pierre-Louis Bossart
pierre-louis.bossart at linux.intel.com
Tue Aug 17 18:40:52 CEST 2021
We've been adding a 'deep buffer' PCM device to several SOF topologies
in order to reduce power consumption. The typical use-case would be
music playback over a headset: this additional PCM device provides
more buffering and longer latencies, letting the rest of the system
sleep for longer periods. Notifications and 'regular' low-latency
audio playback would still use the 'normal' PCM device and be mixed
with the 'deep buffer' before rendering on the headphone endpoint. The
tentative direction would be to expose this alternate device to
PulseAudio/PipeWire/CRAS via the UCM SectionModifier definitions.
That seemed a straightforward topology change until our automated
validation stress tests started reporting issues on SoundWire
platforms, e.g. when two START triggers might be sent and, conversely,
when the STOP trigger is never sent. The SoundWire stream state management
flagged inconsistent states when the two 'normal' and 'deep buffer'
devices are used concurrently with rapid play/stop/pause monkey
testing.
Looking at the soc-pcm.c code, it seems that the BE state
management needs a lot of love.
a) there is no consistent protection for the BE state. In some parts
of the code, the state updates are protected by a spinlock but in the
trigger they are not. When we open/play/close the two PCM devices in
stress tests, we end up testing a state that is being modified. That
can't be good.
b) there is a conceptual deadlock: on stop we check the FE states to
see if a shared BE can be stopped, but since we trigger the BE first,
the FE states have not been modified yet, so the TRIGGER_STOP is never
sent.
This patchset suggests using the same spinlock already used in other
parts of soc-pcm.c, and a refcount to decide when to trigger the
BE. With the two patches applied, I am able to run our entire validation
suite without any issues with this new 'deep buffer' topology.
One might ask: 'how come we didn't see this earlier?' The answer is
probably that the .trigger callbacks in most implementations seem to
perform DAPM operations, and sending the triggers multiple times is
not an issue. In the case of SoundWire, we do use the .trigger
callback to reconfigure the bus using the 'bank switch' mechanism. It
could be acceptable to tolerate receiving the same trigger multiple
times, but the deadlock on stop cannot be fixed at the SoundWire layer alone.
Opens:
1) Is this the right solution? The DPCM code is far from simple and has
notions such as SND_SOC_DPCM_UPDATE_NO and 'trigger_pending' that I
have no background on.
2) It's not clear whether we need the full-blown spin_lock_irqsave()
version or just the regular spin_lock().
3) Is this universal? This might break platforms with other types of
trigger, such as 'POST' or 'BESPOKE'.
4) Is it OK to proceed with such an incremental fix, or do we have
to revisit (possibly rewrite) the entire FE-BE interaction?
I chose to send this patchset as an RFC to gather feedback and make
sure others know about these DPCM issues. We're going to spend more time on
this, but if others can provide feedback/test results it would be
greatly appreciated.
Note that these DPCM issues will add more complexity to our SOF
distribution. We've never hit a case where even recent kernels do not
support a minor topology change that isn't related to an ABI
change. We might have to introduce the deep buffer PCM device in a new
topology file, add a kernel patch to make use of this new file and
keep the old (w/o deep buffer) and new file (w/ deep buffer) for a
very long time (not sure when we can assume that all users of SOF
will have transitioned to 5.15+...)
Pierre-Louis Bossart (2):
ASoC: soc-pcm: protect BE dailink state changes in trigger
ASoC: soc-pcm: test refcount before triggering
include/sound/soc-dpcm.h | 2 +
sound/soc/soc-pcm.c | 151 +++++++++++++++++++++++++++++++++------
2 files changed, 130 insertions(+), 23 deletions(-)
--
2.25.1
More information about the Alsa-devel
mailing list