On Thu, 21 Nov 2019 20:22:03 +0100, Sridharan, Ranjani wrote:
> > I couldn't find anything obvious.  Could you try without changing
> > snd_sof_pcm_period_elapsed(), i.e. only adding the stuff and calling
> > sync_stop, in order to see whether the additional stuff broke
> > anything?
> It is indeed the removal of snd_sof_pcm_period_elapsed() that makes the
> device hang when the stream is stopped. But that's a bit surprising
> given that all I tried was using the snd_pcm_period_elapsed() directly
> instead of scheduling the delayed work to call it.

If I read the code correctly, this can't work irrelevantly from the
sync_stop stuff.  The call of period_elapsed is from
hda_dsp_stream_check() which is performed in bus->reg_lock spinlock in
hda_dsp_stream_threaded_handler().  Meanwhile, the XRUN trigger goes to
hda_dsp_pcm_trigger() that follows hda_dsp_stream_trigger(), and this
function expects the sleepable context due to
snd_sof_dsp_read_poll_timeout() call.

So something like below works?

Takashi

--- a/sound/soc/sof/intel/hda-stream.c
+++ b/sound/soc/sof/intel/hda-stream.c
@@ -592,8 +592,11 @@ static bool hda_dsp_stream_check(struct hdac_bus *bus, u32 status)
 			continue;
 		/* Inform ALSA only in case not do that with IPC */
-		if (sof_hda->no_ipc_position)
-			snd_sof_pcm_period_elapsed(s->substream);
+		if (sof_hda->no_ipc_position) {
+			spin_unlock_irq(&bus->reg_lock);
+			snd_pcm_period_elapsed(s->substream);
+			spin_lock_irq(&bus->reg_lock);

Thanks, Takashi. Yes, I realized it this morning as well that it is due
to the reg_lock. It does work with this change now. I will run some
stress tests with this change and get back with the results.
Hi Takashi,
Sorry the stress tests took a while. As we discussed earlier, adding the sync_stop() op didn't quite help the SOF driver in removing the delayed work for snd_pcm_period_elapsed().
Yeah, that's understandable. If the stop operation itself needs some serialization, sync_stop() won't have any influence on it.
However, after these discussions, I have some concerns about the current code:
- The async work started by schedule_work() may be executed (literally) immediately. So if the timing or the serialization matters, the work gives no guarantee at all. The same level of concurrency can happen at any time.
- The period_elapsed work might still be pending at prepare or another operation. Being async also means that its execution time isn't guaranteed: it might be delayed a lot, and the PCM core might move on to prepare or another state before the work is executed.
The second point can be fixed easily now with sync_stop. You can just put flush_work() in sync_stop in addition to synchronize_irq().
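
To make the second point concrete, here is a rough userspace model, not driver code. All model_* names are made up for illustration, and it assumes the PCM core invokes sync_stop before prepare. The deferred work is reduced to a pending flag that is "run" by hand, so the late execution can be forced deterministically:

```c
#include <assert.h>
#include <stdbool.h>

enum model_state { MODEL_RUNNING, MODEL_STOPPED, MODEL_PREPARED };

/* Hypothetical stand-in for a stream with one deferred period_elapsed work. */
struct model_stream {
	enum model_state state;
	bool work_pending;	/* work queued but not yet executed */
	int reports;		/* period_elapsed reports delivered in total */
	int stale_reports;	/* reports delivered after re-prepare */
};

/* IRQ side: queue the work. As with schedule_work(), nothing says when it runs. */
static void model_irq(struct model_stream *s)
{
	s->work_pending = true;
}

/* Execute the queued work, if any. */
static void model_run_pending(struct model_stream *s)
{
	if (!s->work_pending)
		return;
	s->work_pending = false;
	s->reports++;
	if (s->state == MODEL_PREPARED)
		s->stale_reports++;	/* report arrived for an already reset stream */
}

static void model_stop(struct model_stream *s)
{
	s->state = MODEL_STOPPED;
}

/*
 * Models sync_stop: drain whatever is still queued. In a real driver
 * this is where synchronize_irq() and flush_work() would go.
 */
static void model_sync_stop(struct model_stream *s)
{
	model_run_pending(s);
}

static void model_prepare(struct model_stream *s, bool use_sync_stop)
{
	if (use_sync_stop)
		model_sync_stop(s);
	s->state = MODEL_PREPARED;
}

/* One stop/prepare cycle in which the work is executed late. */
static int model_cycle(bool use_sync_stop)
{
	struct model_stream s = { .state = MODEL_RUNNING };

	model_irq(&s);			/* period interrupt queues the work */
	model_stop(&s);
	model_prepare(&s, use_sync_stop);
	model_run_pending(&s);		/* the work finally gets executed */
	return s.stale_reports;
}
```

In this model, model_cycle(false) lets the report land after prepare, while model_cycle(true) drains it while the stream is still stopped. It says nothing about the first point, since the work may just as well run right after being queued.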
But the first point is still unclear. More exactly, which operation does it conflict with? Is it the playback drain? If so, could it block the next operation for a very long time (up to seconds)?
thanks,
Takashi