On Mon, 28 Nov 2022 23:51:55 +0100, Carl Hetherington wrote:
Hi Takashi,
Thank you for your continued attention with this!
[snip]
I'll see if can apply a similar fix to this case, though to my naive eyes it looks a little trickier as the xrun is found in the snd_pcm code rather than the USB code. Any suggestions most welcome!
OK, then it's a bit different problem, and not so trivial to fix in the kernel side alone, I'm afraid. Basically it's a race between start and stop of two streams. The key point is that, for stopping a (USB) stream, a sync-stop operation is needed, and this can't be performed at the PCM trigger itself (which is an atomic operation). So, the kernel trigger may at most return an error there.
I assume that it's from snd_usb_endpoint_start() and it returning -EPIPE error. If so, we may change the PCM core code to set the PCM state again XRUN in such an error case, so that application may repeat the standard recovery process. Something like below.
Thanks for the suggestion. I experimented a little with this, but I think the problem I'm seeing is that (even if the application knows it should retry the snd_pcm_prepare() step) we still end up with an endpoint in EP_STATE_STOPPING while the corresponding stop_operating flag is 0.
Ah, I guess that's a fallout in the logic. When XRUN happens at start -- receiving an -EPIPE error at snd_pcm_do_start() -- then the patch sets the XRUN state. This assumed that the stream gets stopped the following snd_pcm_undo_start() call. Indeed it does stop but there we forgot setting stop_operating flag unlike what snd_pcm_stop() does.
This means that snd_pcm_sync_stop will never call the USB sync_stop handler, which AFAICS is the only way (?) the endpoint can get back to EP_STATE_STOPPED.
In my error case, the code in snd_pcm_sync_stop sets stop_operating to false (perhaps assuming that substream->ops->sync_stop will "succeed" in setting any STOPPING endpoints to STOPPED) but then this doesn't happen because of this xrun that arrives halfway through the sync_stop operation.
I experimented with removing the check at the top of snd_pcm_sync_stop, so that we enter the if body regardless of substream->runtime->stop_operating, and making my application retry snd_pcm_prepare() if it fails with -EPIPE, and this seems to "fix" my problem. Obviously this causes more (unnecessary) calls to the sync_stop() entry point...
I'd be grateful of any thoughts you have about that.
How about the revised patch below?
Takashi
-- 8< -- --- a/sound/core/pcm_native.c +++ b/sound/core/pcm_native.c @@ -1424,16 +1424,28 @@ static int snd_pcm_pre_start(struct snd_pcm_substream *substream, static int snd_pcm_do_start(struct snd_pcm_substream *substream, snd_pcm_state_t state) { + int err; + if (substream->runtime->trigger_master != substream) return 0; - return substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_START); + err = substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_START); + /* XRUN happened during the start; usually we do trigger(STOP) + * then set the PCM state to XRUN, but in this case, the stream + * is stopped in snd_pcm_undo_start() right after this point without + * knowing the reason -- so set the PCM state beforehand as exception. + */ + if (err == -EPIPE) + __snd_pcm_set_state(substream->runtime, SNDRV_PCM_STATE_XRUN); + return err; }
static void snd_pcm_undo_start(struct snd_pcm_substream *substream, snd_pcm_state_t state) { - if (substream->runtime->trigger_master == substream) + if (substream->runtime->trigger_master == substream) { substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_STOP); + substream->runtime->stop_operating = true; + } }
static void snd_pcm_post_start(struct snd_pcm_substream *substream,