[RFC PATCH 0/2] ASoC: soc-pcm: fix trigger race conditions with shared BE
We've been adding a 'deep buffer' PCM device to several SOF topologies in order to reduce power consumption. The typical use-case would be music playback over a headset: this additional PCM device provides more buffering and longer latencies, letting the rest of the system sleep for longer periods. Notifications and 'regular' low-latency audio playback would still use the 'normal' PCM device and be mixed with the 'deep buffer' before rendering on the headphone endpoint. The tentative direction would be to expose this alternate device to PulseAudio/PipeWire/CRAS via the UCM SectionModifier definitions.
That seemed a straightforward topology change until our automated validation stress tests started reporting issues on SoundWire platforms: for example, two START triggers might be sent and, conversely, the STOP trigger might never be sent. The SoundWire stream state management flagged inconsistent states when the 'normal' and 'deep buffer' devices are used concurrently with rapid play/stop/pause monkey testing.
Looking at the soc-pcm.c code, it seems that the BE state management needs a lot of love.
a) there is no consistent protection for the BE state. In some parts of the code, state updates are protected by a spinlock, but in the trigger code they are not. When we open/play/close the two PCM devices in stress tests, we end up testing a state that is being modified concurrently. That can't be good.
b) there is a conceptual deadlock: on stop, we check the FE states to see whether a shared BE can be stopped, but since we trigger the BE first, the FE states have not been modified yet, so the TRIGGER_STOP may never be sent.
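To make the stop deadlock in b) concrete, here is a hypothetical userspace model. It is not the ASoC code. Two FEs share one BE, and an FE forwards STOP to the BE only if every other FE has already stopped. This is a simplified stand-in for the kind of FE-state check done by snd_soc_dpcm_can_be_free_stop(). All names (other_fes_stopped(), be_stops_interleaved(), be_stops_serialized()) are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not the ASoC code: two FEs share one BE, and an FE
 * forwards STOP to the BE only if every *other* FE has already stopped.
 */
enum fe_state { FE_RUNNING, FE_STOPPED };

#define NUM_FE 2

/* True if every FE other than 'self' is already stopped. */
static bool other_fes_stopped(const enum fe_state fe[NUM_FE], int self)
{
	assert(self >= 0 && self < NUM_FE);
	for (int i = 0; i < NUM_FE; i++)
		if (i != self && fe[i] != FE_STOPPED)
			return false;
	return true;
}

/*
 * Both FEs handle STOP concurrently. Each one checks before either FE
 * state is updated (the BE is triggered first), so each sees the other
 * FE as still running. Returns the number of BE STOP triggers sent.
 */
static int be_stops_interleaved(void)
{
	enum fe_state fe[NUM_FE] = { FE_RUNNING, FE_RUNNING };
	bool stop0 = other_fes_stopped(fe, 0);	/* FE1 still running */
	bool stop1 = other_fes_stopped(fe, 1);	/* FE0 still running */

	fe[0] = FE_STOPPED;	/* FE states only change afterwards */
	fe[1] = FE_STOPPED;
	return stop0 + stop1;
}

/* Same check with the two STOPs fully serialized, for comparison. */
static int be_stops_serialized(void)
{
	enum fe_state fe[NUM_FE] = { FE_RUNNING, FE_RUNNING };
	int sent = 0;

	sent += other_fes_stopped(fe, 0);	/* FE1 running: skipped */
	fe[0] = FE_STOPPED;
	sent += other_fes_stopped(fe, 1);	/* FE0 stopped: STOP sent */
	fe[1] = FE_STOPPED;
	return sent;
}
```

In this model the serialized sequence sends exactly one BE STOP. The interleaved one sends none, which is the "STOP never sent" symptom.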
This patchset suggests using the same spinlock already used in other parts of soc-pcm.c, and using a refcount to decide when to trigger the BE. With these two patches I am able to run our entire validation suite without any issues with this new 'deep buffer' topology.
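A minimal userspace sketch of the refcount idea follows. It assumes a pthread mutex as a stand-in for the dpcm_lock spinlock, since kernel spinlocks are not available in userspace. struct be_model, be_get(), be_put(), fe_start() and fe_stop() are hypothetical names, not ASoC APIs. The "am I the first/last user?" decision is taken under the lock, and only that caller triggers the BE.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a BE shared by several FEs. A pthread mutex
 * stands in for the kernel's dpcm_lock spinlock. */
struct be_model {
	pthread_mutex_t lock;
	int users;	/* how many FEs currently have the BE started */
	int hw_starts;	/* START triggers actually sent (demo counter) */
	int hw_stops;	/* STOP triggers actually sent (demo counter) */
};

/* Take a reference; returns true only for the first user. */
static bool be_get(struct be_model *be)
{
	bool first;

	pthread_mutex_lock(&be->lock);
	first = (be->users++ == 0);
	pthread_mutex_unlock(&be->lock);
	return first;
}

/* Drop a reference; returns true only for the last user. */
static bool be_put(struct be_model *be)
{
	bool last;

	pthread_mutex_lock(&be->lock);
	assert(be->users > 0);	/* catch unbalanced stops */
	last = (--be->users == 0);
	pthread_mutex_unlock(&be->lock);
	return last;
}

/* FE-side START: only the first user triggers the BE. */
static void fe_start(struct be_model *be)
{
	if (be_get(be))
		be->hw_starts++;	/* stand-in for the real BE trigger */
}

/* FE-side STOP: only the last user triggers the BE. */
static void fe_stop(struct be_model *be)
{
	if (be_put(be))
		be->hw_stops++;
}
```

With two FEs starting and then stopping in sequence, the BE is started once and stopped once, regardless of which FE stops first. The hw_* counters are only meant for single-threaded demonstration.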
One might ask 'how come we didn't see this earlier?' The answer is probably that the .trigger callbacks in most implementations seem to perform DAPM operations, and sending the triggers multiple times is not an issue. In the case of SoundWire, we do use the .trigger callback to reconfigure the bus using the 'bank switch' mechanism. It could be acceptable to tolerate multiple triggers, but the deadlock on stop cannot be fixed at the SoundWire layer alone.
Opens:
1) is this the right solution? The DPCM code is far from simple and has notions such as SND_SOC_DPCM_UPDATE_NO and 'trigger_pending' that I have no background on.
2) it's not clear whether we need the full-blown spin_lock_irqsave() version or just the regular spin_lock().
3) is this universal? This might break platforms with other types of 'POST' or 'BESPOKE' triggers.
4) is it ok to proceed with such an incremental fix, or do we have to revisit (possibly rewrite) the entire FE-BE interaction?
I chose to send this patchset as an RFC to gather feedback and make sure others know about these DPCM issues. We're going to spend more time on this, but if others can provide feedback/test results, it would be greatly appreciated.
Note that these DPCM issues will add more complexity to our SOF distribution. Until now, we had never hit a case where a minor topology change, unrelated to any ABI change, was not supported even by recent kernels. We might have to introduce the deep buffer PCM device in a new topology file, add a kernel patch to make use of this new file, and keep both the old (w/o deep buffer) and new (w/ deep buffer) files for a very long time (not sure when we can assume that all users of SOF have transitioned to 5.15+...).
Pierre-Louis Bossart (2):
  ASoC: soc-pcm: protect BE dailink state changes in trigger
  ASoC: soc-pcm: test refcount before triggering
 include/sound/soc-dpcm.h |   2 +
 sound/soc/soc-pcm.c      | 151 +++++++++++++++++++++++++++++++++------
 2 files changed, 130 insertions(+), 23 deletions(-)
When more than one FE is connected to a BE, e.g. in a mixing use case, the BE can be triggered multiple times when the FEs are opened/started concurrently. This race condition is problematic in the case of SoundWire BE dailinks and is not desirable in general. The code carefully checks when the BE can be stopped or hw_free'ed, but the trigger code does not use any mutual exclusion.
Fix by using the same spinlock already used to check FE states, and set the state before the trigger. In case of errors, the initial state will be restored.
This patch does not change how the triggers are handled, it only makes sure the states are handled in critical sections.
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
---
 sound/soc/soc-pcm.c | 103 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 85 insertions(+), 18 deletions(-)
diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c
index 48f71bb81a2f..0717f39d2eec 100644
--- a/sound/soc/soc-pcm.c
+++ b/sound/soc/soc-pcm.c
@@ -1999,6 +1999,8 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 	struct snd_soc_pcm_runtime *be;
 	struct snd_soc_dpcm *dpcm;
 	int ret = 0;
+	unsigned long flags;
+	enum snd_soc_dpcm_state state;

 	for_each_dpcm_be(fe, stream, dpcm) {
 		struct snd_pcm_substream *be_substream;
@@ -2015,76 +2017,141 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,

 		switch (cmd) {
 		case SNDRV_PCM_TRIGGER_START:
+			spin_lock_irqsave(&fe->card->dpcm_lock, flags);
 			if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_PREPARE) &&
 			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_STOP) &&
-			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
+			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED)) {
+				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 				continue;
+			}
+			state = be->dpcm[stream].state;
+			be->dpcm[stream].state = SND_SOC_DPCM_STATE_START;
+			spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);

 			ret = soc_pcm_trigger(be_substream, cmd);
-			if (ret)
+			if (ret) {
+				spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+				be->dpcm[stream].state = state;
+				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 				goto end;
+			}

-			be->dpcm[stream].state = SND_SOC_DPCM_STATE_START;
 			break;
 		case SNDRV_PCM_TRIGGER_RESUME:
-			if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_SUSPEND))
+			spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+			if (be->dpcm[stream].state != SND_SOC_DPCM_STATE_SUSPEND) {
+				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 				continue;
+			}
+
+			state = be->dpcm[stream].state;
+			be->dpcm[stream].state = SND_SOC_DPCM_STATE_START;
+			spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);

 			ret = soc_pcm_trigger(be_substream, cmd);
-			if (ret)
+			if (ret) {
+				spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+				be->dpcm[stream].state = state;
+				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 				goto end;
+			}

-			be->dpcm[stream].state = SND_SOC_DPCM_STATE_START;
 			break;
 		case SNDRV_PCM_TRIGGER_PAUSE_RELEASE:
-			if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
+			spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+			if (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED) {
+				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 				continue;
+			}
+
+			state = be->dpcm[stream].state;
+			be->dpcm[stream].state = SND_SOC_DPCM_STATE_START;
+			spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);

 			ret = soc_pcm_trigger(be_substream, cmd);
-			if (ret)
+			if (ret) {
+				spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+				be->dpcm[stream].state = state;
+				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 				goto end;
+			}

-			be->dpcm[stream].state = SND_SOC_DPCM_STATE_START;
 			break;
 		case SNDRV_PCM_TRIGGER_STOP:
+			spin_lock_irqsave(&fe->card->dpcm_lock, flags);
 			if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_START) &&
-			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
+			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED)) {
+				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 				continue;
+			}
+			spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);

 			if (!snd_soc_dpcm_can_be_free_stop(fe, be, stream))
 				continue;

+			spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+			state = be->dpcm[stream].state;
+			be->dpcm[stream].state = SND_SOC_DPCM_STATE_STOP;
+			spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
+
 			ret = soc_pcm_trigger(be_substream, cmd);
-			if (ret)
+			if (ret) {
+				spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+				be->dpcm[stream].state = state;
+				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 				goto end;
+			}

-			be->dpcm[stream].state = SND_SOC_DPCM_STATE_STOP;
 			break;
 		case SNDRV_PCM_TRIGGER_SUSPEND:
-			if (be->dpcm[stream].state != SND_SOC_DPCM_STATE_START)
+			spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+			if (be->dpcm[stream].state != SND_SOC_DPCM_STATE_START) {
+				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 				continue;
+			}
+			spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);

 			if (!snd_soc_dpcm_can_be_free_stop(fe, be, stream))
 				continue;

+			spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+			state = be->dpcm[stream].state;
+			be->dpcm[stream].state = SND_SOC_DPCM_STATE_STOP;
+			spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
+
 			ret = soc_pcm_trigger(be_substream, cmd);
-			if (ret)
+			if (ret) {
+				spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+				be->dpcm[stream].state = state;
+				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 				goto end;
+			}

-			be->dpcm[stream].state = SND_SOC_DPCM_STATE_SUSPEND;
 			break;
 		case SNDRV_PCM_TRIGGER_PAUSE_PUSH:
-			if (be->dpcm[stream].state != SND_SOC_DPCM_STATE_START)
+			spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+			if (be->dpcm[stream].state != SND_SOC_DPCM_STATE_START) {
+				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 				continue;
+			}
+			spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);

 			if (!snd_soc_dpcm_can_be_free_stop(fe, be, stream))
 				continue;

+			spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+			state = be->dpcm[stream].state;
+			be->dpcm[stream].state = SND_SOC_DPCM_STATE_PAUSED;
+			spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
+
 			ret = soc_pcm_trigger(be_substream, cmd);
-			if (ret)
+			if (ret) {
+				spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+				be->dpcm[stream].state = state;
+				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 				goto end;
+			}

-			be->dpcm[stream].state = SND_SOC_DPCM_STATE_PAUSED;
 			break;
 		}
 	}
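The pattern this first patch applies to each trigger command can be modeled in userspace. This is a sketch under stated assumptions, not the kernel code:

- a pthread mutex stands in for card->dpcm_lock;
- only the START case is modeled;
- struct be_link, be_trigger_start(), trigger_ok() and trigger_fail() are hypothetical names.

The state is checked and claimed under the lock, the trigger runs with the lock released, and the initial state is restored if the trigger fails.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical subset of BE states, loosely named after
 * enum snd_soc_dpcm_state. */
enum be_state { BE_PREPARE, BE_START, BE_STOP, BE_PAUSED };

struct be_link {
	pthread_mutex_t lock;	/* stand-in for card->dpcm_lock */
	enum be_state state;
};

/* Hypothetical hardware trigger callbacks; 0 means success. */
static int hw_calls;
static int trigger_ok(void) { hw_calls++; return 0; }
static int trigger_fail(void) { hw_calls++; return -1; }

/*
 * START handling: returns 0 if the BE was started or was skipped,
 * otherwise the error returned by 'trigger'.
 */
static int be_trigger_start(struct be_link *be, int (*trigger)(void))
{
	enum be_state old;
	int ret;

	assert(trigger != NULL);
	pthread_mutex_lock(&be->lock);
	if (be->state != BE_PREPARE && be->state != BE_STOP &&
	    be->state != BE_PAUSED) {
		pthread_mutex_unlock(&be->lock);
		return 0;	/* not startable (e.g. already started): skip */
	}
	old = be->state;
	be->state = BE_START;	/* claim the transition before triggering */
	pthread_mutex_unlock(&be->lock);

	ret = trigger();	/* called without the lock held */
	if (ret) {
		pthread_mutex_lock(&be->lock);
		be->state = old;	/* restore the initial state on error */
		pthread_mutex_unlock(&be->lock);
	}
	return ret;
}
```

In this model, a second START on an already-started BE is skipped without calling the trigger. That is because the state was claimed under the lock before the first trigger ran. A failing trigger leaves the BE in the state it had before.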
From: Mark Brown <broonie@kernel.org>
On Tue, 17 Aug 2021 11:40:53 -0500, Pierre-Louis Bossart wrote:
> When more than one FE is connected to a BE, e.g. in a mixing use case, the BE can be triggered multiple times when the FEs are opened/started concurrently. This race condition is problematic in the case of SoundWire BE dailinks and is not desirable in general. The code carefully checks when the BE can be stopped or hw_free'ed, but the trigger code does not use any mutual exclusion.
> [...]
Applied, thanks!
[1/2] ASoC: soc-pcm: protect BE dailink state changes in trigger
      commit: 0c75fc7193387776c10f7c7b440d93496e3d5e21
[2/2] ASoC: soc-pcm: test refcount before triggering
      commit: 6479f7588651cbc9c91e61c20ff39119cbc8feba
Best regards,
On 8/26/21 1:30 PM, Mark Brown wrote:
> Applied, thanks!
>
> [1/2] ASoC: soc-pcm: protect BE dailink state changes in trigger
>       commit: 0c75fc7193387776c10f7c7b440d93496e3d5e21
> [2/2] ASoC: soc-pcm: test refcount before triggering
>       commit: 6479f7588651cbc9c91e61c20ff39119cbc8feba
Ah, sorry: there were still some issues in this RFC; we did more testing and came up with a lot of improvements. The intent of the RFC status was also to make sure it wasn't applied before the merge window.
Can this be reverted in your branch, Mark?
On Thu, Aug 26, 2021 at 02:24:19PM -0500, Pierre-Louis Bossart wrote:
> Can this be reverted in your branch, Mark?
Ugh, right.
On start/pause_release/resume, when more than one FE is connected to the same BE, it's possible that the trigger is sent more than once. This is not desirable: we only want to trigger a BE once, which is straightforward to implement with a refcount.
For stop/pause/suspend, the problem is more complicated: the check implemented in snd_soc_dpcm_can_be_free_stop() may fail due to a conceptual deadlock when we trigger the BE before the FE. In this case, the FE states have not yet changed, so there are corner cases where the TRIGGER_STOP is never sent, which is the dual of the start case where multiple triggers might be sent.
This patch suggests no longer checking the FE states in any of these cases, and instead deciding when to trigger the BE with a refcount protected by a spinlock.
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
---
 include/sound/soc-dpcm.h |  2 ++
 sound/soc/soc-pcm.c      | 46 ++++++++++++++++++++++++++++++++++++----
 2 files changed, 44 insertions(+), 4 deletions(-)
diff --git a/include/sound/soc-dpcm.h b/include/sound/soc-dpcm.h
index e296a3949b18..6cc751002da7 100644
--- a/include/sound/soc-dpcm.h
+++ b/include/sound/soc-dpcm.h
@@ -101,6 +101,8 @@ struct snd_soc_dpcm_runtime {
 	enum snd_soc_dpcm_state state;

 	int trigger_pending; /* trigger cmd + 1 if pending, 0 if not */
+
+	int be_start; /* refcount protected by dpcm_lock */
 };

 #define for_each_dpcm_fe(be, stream, _dpcm)				\
diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c
index 0717f39d2eec..b2440f2f9bf5 100644
--- a/sound/soc/soc-pcm.c
+++ b/sound/soc/soc-pcm.c
@@ -1534,7 +1534,7 @@ int dpcm_be_dai_startup(struct snd_soc_pcm_runtime *fe, int stream)
 			be->dpcm[stream].state = SND_SOC_DPCM_STATE_CLOSE;
 			goto unwind;
 		}
-
+		be->dpcm[stream].be_start = 0;
 		be->dpcm[stream].state = SND_SOC_DPCM_STATE_OPEN;
 		count++;
 	}
@@ -2001,6 +2001,7 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 	int ret = 0;
 	unsigned long flags;
 	enum snd_soc_dpcm_state state;
+	bool do_trigger;

 	for_each_dpcm_be(fe, stream, dpcm) {
 		struct snd_pcm_substream *be_substream;
@@ -2015,6 +2016,7 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 		dev_dbg(be->dev, "ASoC: trigger BE %s cmd %d\n",
 			be->dai_link->name, cmd);

+		do_trigger = false;
 		switch (cmd) {
 		case SNDRV_PCM_TRIGGER_START:
 			spin_lock_irqsave(&fe->card->dpcm_lock, flags);
@@ -2025,13 +2027,20 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 				continue;
 			}
 			state = be->dpcm[stream].state;
+			if (be->dpcm[stream].be_start == 0)
+				do_trigger = true;
+			be->dpcm[stream].be_start++;
 			be->dpcm[stream].state = SND_SOC_DPCM_STATE_START;
 			spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);

+			if (!do_trigger)
+				continue;
+
 			ret = soc_pcm_trigger(be_substream, cmd);
 			if (ret) {
 				spin_lock_irqsave(&fe->card->dpcm_lock, flags);
 				be->dpcm[stream].state = state;
+				be->dpcm[stream].be_start--;
 				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 				goto end;
 			}
@@ -2045,13 +2054,20 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 			}

 			state = be->dpcm[stream].state;
+			if (be->dpcm[stream].be_start == 0)
+				do_trigger = true;
+			be->dpcm[stream].be_start++;
 			be->dpcm[stream].state = SND_SOC_DPCM_STATE_START;
 			spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);

+			if (!do_trigger)
+				continue;
+
 			ret = soc_pcm_trigger(be_substream, cmd);
 			if (ret) {
 				spin_lock_irqsave(&fe->card->dpcm_lock, flags);
 				be->dpcm[stream].state = state;
+				be->dpcm[stream].be_start--;
 				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 				goto end;
 			}
@@ -2065,13 +2081,20 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 			}

 			state = be->dpcm[stream].state;
+			if (be->dpcm[stream].be_start == 0)
+				do_trigger = true;
+			be->dpcm[stream].be_start++;
 			be->dpcm[stream].state = SND_SOC_DPCM_STATE_START;
 			spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);

+			if (!do_trigger)
+				continue;
+
 			ret = soc_pcm_trigger(be_substream, cmd);
 			if (ret) {
 				spin_lock_irqsave(&fe->card->dpcm_lock, flags);
 				be->dpcm[stream].state = state;
+				be->dpcm[stream].be_start--;
 				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 				goto end;
 			}
@@ -2084,9 +2107,15 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 				continue;
 			}
+			if ((be->dpcm[stream].state == SND_SOC_DPCM_STATE_START &&
+			     be->dpcm[stream].be_start == 1) ||
+			    (be->dpcm[stream].state == SND_SOC_DPCM_STATE_PAUSED &&
+			     be->dpcm[stream].be_start == 0))
+				do_trigger = true;
+			be->dpcm[stream].be_start--;
 			spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);

-			if (!snd_soc_dpcm_can_be_free_stop(fe, be, stream))
+			if (!do_trigger)
 				continue;

 			spin_lock_irqsave(&fe->card->dpcm_lock, flags);
@@ -2098,6 +2127,7 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 			if (ret) {
 				spin_lock_irqsave(&fe->card->dpcm_lock, flags);
 				be->dpcm[stream].state = state;
+				be->dpcm[stream].be_start++;
 				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 				goto end;
 			}
@@ -2109,9 +2139,12 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 				continue;
 			}
+			if (be->dpcm[stream].be_start == 1)
+				do_trigger = true;
+			be->dpcm[stream].be_start--;
 			spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);

-			if (!snd_soc_dpcm_can_be_free_stop(fe, be, stream))
+			if (!do_trigger)
 				continue;

 			spin_lock_irqsave(&fe->card->dpcm_lock, flags);
@@ -2123,6 +2156,7 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 			if (ret) {
 				spin_lock_irqsave(&fe->card->dpcm_lock, flags);
 				be->dpcm[stream].state = state;
+				be->dpcm[stream].be_start++;
 				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 				goto end;
 			}
@@ -2134,9 +2168,12 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 				continue;
 			}
+			if (be->dpcm[stream].be_start == 1)
+				do_trigger = true;
+			be->dpcm[stream].be_start--;
 			spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);

-			if (!snd_soc_dpcm_can_be_free_stop(fe, be, stream))
+			if (!do_trigger)
 				continue;

 			spin_lock_irqsave(&fe->card->dpcm_lock, flags);
@@ -2148,6 +2185,7 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 			if (ret) {
 				spin_lock_irqsave(&fe->card->dpcm_lock, flags);
 				be->dpcm[stream].state = state;
+				be->dpcm[stream].be_start++;
 				spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 				goto end;
 			}
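The refcount approach in this second patch can be exercised under concurrency, in the spirit of the play/stop monkey tests. Here is a hypothetical userspace stress sketch:

- several threads play the role of FEs sharing one BE;
- the first/last decision is taken under a pthread mutex standing in for dpcm_lock;
- the "hardware" trigger is only counted, outside the lock.

All names are made up for illustration. The sketch only checks that START and STOP triggers balance. It does not model the ordering of triggers issued after the lock is dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared BE; not an ASoC structure. */
struct shared_be {
	pthread_mutex_t lock;	/* stand-in for card->dpcm_lock */
	int users;		/* refcount, only touched under 'lock' */
	atomic_int hw_starts;	/* START triggers "sent" */
	atomic_int hw_stops;	/* STOP triggers "sent" */
};

static struct shared_be g_be = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* One hypothetical FE: repeatedly start and stop the shared BE. */
static void *fe_thread(void *arg)
{
	int iterations = *(const int *)arg;

	for (int i = 0; i < iterations; i++) {
		bool first, last;

		pthread_mutex_lock(&g_be.lock);
		first = (g_be.users++ == 0);
		pthread_mutex_unlock(&g_be.lock);
		if (first)	/* "trigger" outside the lock */
			atomic_fetch_add(&g_be.hw_starts, 1);

		pthread_mutex_lock(&g_be.lock);
		last = (--g_be.users == 0);
		assert(g_be.users >= 0);
		pthread_mutex_unlock(&g_be.lock);
		if (last)
			atomic_fetch_add(&g_be.hw_stops, 1);
	}
	return NULL;
}

/* Run 'nthreads' (1 to 8) FE threads; returns 0 on success. */
static int run_stress(int nthreads, int iterations)
{
	pthread_t tid[8];
	int created, ret = 0;

	if (nthreads < 1 || nthreads > 8)
		return -1;
	for (created = 0; created < nthreads; created++) {
		if (pthread_create(&tid[created], NULL, fe_thread,
				   &iterations) != 0) {
			ret = -1;
			break;
		}
	}
	for (int i = 0; i < created; i++)
		pthread_join(tid[i], NULL);
	return ret;
}
```

After all threads are joined, every 0-to-1 transition of the refcount has a matching 1-to-0 transition. So the number of START and STOP triggers is equal whatever the interleaving, and at least one of each is sent.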
participants (2)
- Mark Brown
- Pierre-Louis Bossart