[PATCH 3/3] echoaudio: Address bugs in the interrupt handling

Thu Jun 18 13:07:33 CEST 2020

On Thu, 18 Jun 2020, Takashi Iwai wrote:

> On Wed, 17 Jun 2020 12:51:05 +0200,
> Mark Hills wrote:
> > 
> > On Tue, 16 Jun 2020, Takashi Iwai wrote:
> > 
> > > On Tue, 16 Jun 2020 16:01:11 +0200,
> > > Mark Hills wrote:
[...]
> > 
> > /* Update software pointer to match the hardware
> >  *
> >  * \pre chip->lock is held
> >  */
> > static void snd_echo_update_substream_position(struct echoaudio *chip,
> > 					struct snd_pcm_substream *substream)
> > {
> > 	struct snd_pcm_runtime *runtime = substream->runtime;
> > 	struct audiopipe *pipe = runtime->private_data;
> > 	u32 counter, step;
> > 	size_t period_bytes;
> > 
> > 	if (pipe->state != PIPE_STATE_STARTED)
> > 		return;
> > 
> > 	period_bytes = frames_to_bytes(runtime, runtime->period_size);
> > 
> > 	counter = le32_to_cpu(*pipe->dma_counter);
> > 
> > 	step = counter - pipe->last_counter;  /* handles wrapping of counter */
> > 	step -= step % period_bytes;  /* acknowledge whole periods only */
> > 
> > 	if (step == 0)
> > 		return;  /* haven't advanced a whole period yet */
> > 	pipe->last_counter += step;  /* does not always wrap on a period */
> > 	pipe->position += step;
> > 
> > 	/* wraparound the buffer */
> > 	pipe->position %= frames_to_bytes(runtime, runtime->buffer_size);
> > 
> > 	/* notify only once, even if multiple periods elapsed */
> > 	spin_unlock(&chip->lock);
> > 	snd_pcm_period_elapsed(substream);
> > 	spin_lock(&chip->lock);
> > }
> > 
> > static irqreturn_t snd_echo_interrupt(int irq, void *dev_id)
> > {
> > 	struct echoaudio *chip = dev_id;
> > 	int ss, st;
> > 
> > 	spin_lock(&chip->lock);
> > 	st = service_irq(chip);
> > 	if (st < 0) {
> > 		spin_unlock(&chip->lock);
> > 		return IRQ_NONE;
> > 	}
> > 	/* The hardware doesn't tell us which substream caused the irq,
> > 	thus we have to check all running substreams. */
> > 	for (ss = 0; ss < DSP_MAXPIPES; ss++) {
> > 		struct snd_pcm_substream *substream;
> > 
> > 		substream = chip->substream[ss];
> > 		if (substream)
> > 			snd_echo_update_substream_position(chip, substream);
> > 	}
> > 	spin_unlock(&chip->lock);
> > 
> > #ifdef ECHOCARD_HAS_MIDI
> > 	if (st > 0 && chip->midi_in) {
> > 		snd_rawmidi_receive(chip->midi_in, chip->midi_buffer, st);
> > 		dev_dbg(chip->card->dev, "rawmidi_iread=%d\n", st);
> > 	}
> > #endif
> > 	return IRQ_HANDLED;
> > }
> > 
> > static snd_pcm_uframes_t pcm_pointer(struct snd_pcm_substream *substream)
> > {
> > 	struct snd_pcm_runtime *runtime = substream->runtime;
> > 	struct audiopipe *pipe = runtime->private_data;
> > 
> > 	return bytes_to_frames(runtime, pipe->position);
> 
> I guess this misses the update of the precise position; in your code,
> pipe->position gets updated only with the period size at irq handler.
> 
> 
> IMO, we should have the code like:
> 
> static bool update_stream_position(struct snd_pcm_substream *substream)
> {
> 	// update pipe->position and others, returns true if period elapsed
> }
> 
> static irqreturn_t snd_echo_interrupt()
> {
> 	spin_lock(&chip->lock);
> 	....
> 	if (update_stream_position(substream)) {
> 		spin_unlock(&chip->lock);
> 		snd_pcm_period_elapsed(substream);
> 		spin_lock(&chip->lock);
> 	}
> 	....
> 	spin_unlock(&chip->lock);
> 	return IRQ_HANDLED;
> }
> 
> static snd_pcm_uframes_t pcm_pointer(struct snd_pcm_substream *substream)
> {
> 	....
> 	update_stream_position(substream);
> 	return bytes_to_frames(runtime, pipe->position);
> }

Thanks.

I certainly understand this in isolation. 

But could I please ask for help with the bigger picture? As it feels 
mismatched.

* Code should take every possible opportunity to update the stream 
  position; interrupts, or explicit pcm_pointer calls (whereas the docs 
  guide towards doing it in the interrupt handler)

* I critiqued (elsewhere in thread) the older interrupt handler for 
  checking the counter, unlocking, calling back into PCM core and checking 
  again a moment later. Whereas this is considered good behaviour.

* Seems like the overall aim is for userland to be able (if it wants to)  
  to poll the soundcard, even outside of periods.

If all the above is true, I would expect interrupt handling to be very 
simple -- it would straight away call into PCM core, join existing if the 
codepaths (to take locks) and do a position update. PCM core would decide 
if a period really elapsed, not the driver. But this is not how it works.

This now relates strongly to a question of locking:

I ran the code (top of this message) all day, with a few instances in 
dmesg (at irregular intervals, not wrapping):

  [161644.076666] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
  [163232.392501] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
  [164976.098069] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
  [165054.946225] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64
  [165312.141545] snd_echo3g 0000:09:02.0: invalid position: , pos = 4096, buffer size = 4096, period size = 64

A definite bug, of course.

However (and I am happy to be corrected) the function never finishes with 
position == buffer size. Only way is a race between interrupt handler and 
pcm_pointer.

Therefore one of these is needed:

* pcm_pointer locks chip->lock

  Even though the docs emphasise PCM core has exclusivity, it it not worth 
  much as it does not protect against the interrupt handler.

  But now interrupt handler becomes ugly in total: take chip->lock, check 
  the counter, release chip->lock, go to PCM core (which takes middle 
  layer locks), take chip->lock again, check counter again, release 
  chip->lock again.

* interrupt handler must make atomic update of pipe->position

  This could have been a solution, but not if we expect pcm_pointer to 
  also invoke the position update. Now we have a race: both the interrupt 
  handler and pcm_position want to read dma_counter and write 
  pipe->position after.

So either do everthing in interrupt, everything in the pointer callback 
(though there isn't the API for this), but doing both does not seem to 
work well (though probably can be made to work if necessary)

If we can clarify the requirements then I do not think it would be hard 
for me to write the code.

[...]
> Takashi

Thanks again,

-- 
Mark