[alsa-devel] underruns and strange code in pcm_rate.c (and patch)

Wed Nov 7 11:54:56 CET 2007

At Tue, 06 Nov 2007 19:10:28 +0300,
Stas Sergeev wrote:
> 
> Hi.
> 
> Takashi Iwai wrote:
> > In principle, using rate plugin with two periods doesn't work well in
> > case that the sample rates aren't aligned.  It's a design issue.  You
> > shouldn't use two periods except for hw.  Period.
> I understand that and I even tried
> a hack to increase the buffer_size
> and stop_threshold by the period_size
> in a rate plugin, although without
> a success.
> However, there is one exception: it
> will work provided the app is allowed
> to fill both fragments entirely.
> Without that patch, it simply can't.
> With the patch - the write returns
> earlier, the app writes again, the
> rate plugin gets the period filled,
> and commits it, before the first one
> expired.
> Isn't it exactly the right and
> reliable fix then?

I don't understand your description well.  Could you give a simple
test code to prove the bug?

> > Here, note that 940 != 1024 * 44.1 / 48.0 exactly.  This rounding
> > causes the drift of wake-up time at each period and the delay is
> > accumulated.
> I understand, but the app can start writing
> _third_ period if write returns earlier.
> And that write will fill up the reminder
> of the second one, no matter how it accumulated.

In the case above, it's woken up *LATER* than expected.  write/poll
doesn't return earlier.  Wel, write would return ealier but it doesn't
return the size of the period size but must be less than that.

> > So, even applying your patch, the XRUN problem may occur at some time
> > as long as you use two periods.  It can't be fixed without the
> > fundamental change of the irq / poll handling routines in the ALSA
> > driver.
> I am more inclinced to think there is
> a race - except for the underruns, I've
> also seen a lockups in poll().
> Do you think the race I described in the
> previous posting doesn't exist?

I don't think it's a race.  It appears to be just the matter of
configuration mismatch (or misconception) to me.

> > Whether that hack really does any good thing is questionable, indeed.
> Actually, I've found out that it plaques
> many other bugs. For example, mpg123 sets
> avail_min=1, which is silly but should
> work. With the hack, the avail_min just
> gets increased. Without the hack -
> snd_pcm_write_areas() spins in a loop
> trying to commit the samples one-by-one
> (because the poll returns immediately),
> eating 100% of CPU. But that's an unrelated
> bug, which was just hidden, and I think
> there are more...

Possibly.  Changing avail_min isn't a good idea, I believe, too.

> > First, it skips the avail_min adjustment if the app fills the period
> > size.  Thus only for apps that fills arbitrary amount of data via
> > snd_pcm_writei() triggers this hack.
> I don't think so.

It is.  It has the condition "avail_min > 0", where avail_min is
computed from appl_ptr % period_size.  Thus, if you fill always the
period_size and calls poll, this hack is always skipped.

> Because of the rounding
> errors in a rate plugin, when the app
> thinks it writes the entire fragment,
> it actually does not. Any app is affected!

The app must not think that the whole fragment can be written at a
single call.  That's the bug of the app!

> > Second, avail_min is checked usually in irq handler and thus its
> > resolution is also in period size.  It means avail_min + 1 is
> > equivalent with avail_min + period_size.
> Yes, so don't increase it then. :)
> 
> > So what we can do better?  As a temporary solution, we can get rid of
> > the problematic part, or, at least, add the check whether avail_min
> > comes over stop_threshold.  I'm not sure whether any big impact by
> > removing the hack there.  Maybe not.  But, I feel it's a barren
> > discussion.  It's really a design problem.  Sigh.
> I think allowing an app to start writing
> the third fragment, filling actually the
> second one, is a very good fix.
> What problems can you see with it?

The problem is that two periods with the current rate plugin cannot
work.  Suppose the case of client period size 940, slave period size
1024.  The wake up happens at each slave period size, i.e. slave
pointer at 1024, 2048, 3072, ...  It corresponds to (almost) client
pointer at 940, 1881, 2822.  At each wakeup point, app is notified
that one period gets empty (and one another is being played).  Now,
you see the wake up is delayed more and more from the expected point,
940, 1880, 2820, ...  Hence, at some time, the delay will reach to the
period size.  The app is notiried that *one* period is empty but at
this point *both* periods are empty, that is, buffer underrun.

So, without another wake-up source, the problem cannot be solved.

Takashi