[alsa-devel] underruns and strange code in pcm_rate.c (and patch)

Takashi Iwai tiwai at suse.de
Thu Nov 8 05:42:50 CET 2007


At Wed, 07 Nov 2007 21:40:39 +0300,
Stas Sergeev wrote:
> 
> Hi.
> 
> Takashi Iwai wrote:
> > I don't think it's a race.  It appears to be just the matter of
> > configuration mismatch (or misconception) to me.
> OK then. The race will have to wait
> till I get to it hard and (in case
> there really is) produce a patch.

Yeah, a testcase is required to reproduce the bug anyway.

> > Possibly.  Changing avail_min isn't a good idea, I believe, too.
> Well, what I was trying to say is
> that there are a few serious bugs
> that were hidden by the hack...
> Never mind, I'll post the patches
> when the time will come. One step
> at a time.

OK, thanks.

> >>> size.  Thus only for apps that fills arbitrary amount of data via
> >>> snd_pcm_writei() triggers this hack.
> >> I don't think so.
> > It is.
> Firstly, it seems all or most apps
> write the arbitrary amounts.

Well, I object it - as far as I know (through hundreds of SUSE
package I've been maintaining), most of them write the period_size
data at once.  Writing arbitrary size is rather minor.


> mpg123:
> ---
> #7  0x00002aaaaad34afa in snd_pcm_mmap_writei (pcm=0x565dd0, buffer=0x5660b0, 
>     size=4096) at pcm_mmap.c:186
> 186             return snd_pcm_write_areas(pcm, areas, 0, size,
> (gdb) p pcm->period_size
> $1 = 5512
> ---
> writes 4096, while period_size is 5512.
> 
> ogg123:
> ---
> #9  0x00002aaab2c48a69 in snd_pcm_writei (pcm=0x673430, buffer=0x2aaab4215108, 
>     size=11340) at pcm.c:1186
> 1186            return _snd_pcm_writei(pcm, buffer, size);
> (gdb) p pcm->period_size
> $2 = 2756
> ---
> writes 11340 with period_size 2756.

Both are for libao.
The normal apps are more careful about the write timing and sync with
GUI.  So they tend to write the data at the time poll permits, instead
of dumb write sequence relying on blocking mode.  Maybe this is the
key to get or avoid the bug.

> But I don't agree with you even if
> some app does not. It will still be
> affected. Let me demonstrate. I added
> the following:
> ---
> +if (size > pcm->period_size && size % pcm->period_size)
> +size -= size % pcm->period_size;
> ---
> to snd_pcm_writei() and snd_pcm_mmap_writei()
> to simulate the condition when an app
> transfers by periods. Here we go:
> ---
> #5  0x00002aaab2c5dbc1 in snd_pcm_mmap_writei (pcm=0x672e30, 
>     buffer=0x2aaab4215108, size=11024) at pcm_mmap.c:188
> 188             return snd_pcm_write_areas(pcm, areas, 0, size,
> (gdb) p pcm->period_size
> $1 = 2756
> (gdb) p pcm->period_size*4
> $2 = 11024
> (gdb) p pcm->period_size*4==size
> $3 = 1
> ---
> OK, it is trying to write 4 periods.
> 
> ---
> Breakpoint 2, snd_pcm_rate_poll_descriptors (pcm=0x672e30, pfds=0x409ffdd0, 
>     space=1) at pcm_rate.c:720
> 720             snd_pcm_rate_t *rate = pcm->private_data;
> (gdb) n
> 724             ret = snd_pcm_generic_poll_descriptors(pcm, pfds, space);
> (gdb) n
> 725             if (ret < 0)
> (gdb) n
> 728             avail_min = rate->appl_ptr % pcm->period_size;
> (gdb) n
> 729             if (avail_min > 0) {
> (gdb) n
> 730                     recalc(pcm, &avail_min);
> (gdb) p avail_min
> $4 = 4
> ---
> See a small reminder?
> Why you ask? Here:
> (the ogg has the rate 22050, the
> driver has 48000)
> ---
> (gdb) p rate->gen.slave->period_size
> $10 = 6000
> (gdb) p 6000/2756.0
> $11 = 2.1770682148040637
> (gdb) p 48000/22050.0
> $12 = 2.1768707482993195
> ---
> See a slight difference? I think this
> is exactly why _any_ app is affected,
> not only those that write arbitrary
> amounts (even though they are a majority,
> it seems to me).
> So let me challenge you on that. :)

Then try aplay to play 22.5kHz samples on 48k.  You'll see it doesn't
happen.

The difference seems to be whether you do simply writei() sequences
without checks which causes partial writes and eventually triggers the
hack.  As mentioned, many apps do check the available spaces before
write, so the partial write doesn't happen practically.


> > The app must not think that the whole fragment can be written at a
> > single call.  That's the bug of the app!
> Hmm, I don't think it does, but I don't
> understand what you mean here. So you
> say the app should write an arbitrary
> amounts? Well, then the above examples
> of mine makes no sense, but I don't see
> how it helps.

No, what I meant is that the app should check always the return value
of the write properly.  Even in the blocking mode, write *may* return
without writing the whole data.  Well, it's off-topic right now.

> > The problem is that two periods with the current rate plugin cannot
> > work.  Suppose the case of client period size 940, slave period size
> Thanks for the example, I now understand
> what fundamental problem you have in mind.
> 
> > So, without another wake-up source, the problem cannot be solved.
> OK, it appears we are talking about two
> completely different problems.
> Yes, I now see the fundamental one and
> I admit my patch doesn't solve it. Neither
> it was intended to.
> 
> So what we have:
> 1. A fundamental problem with the different
> fragment sizes. This may (or may not, if the
> HW period is adjustable) give you underruns.
> 2. A bug in snd_pcm_rate_poll_descriptors(),
> which prevents an app from filling the second
> fragment before the first one is played.
> This _does_ give an underruns, constantly,
> unavoidably and badly. I have a patch, it needs
> to be discussed.
> 3. A bug in snd_pcm_write_areas() I mentioned
> earlier. I have a patch, will post it when the
> time will come.
> 4. A (possible) race in snd_pcm_playback_poll()
> which is harmless most of the times, but I can
> get underruns out of it by rapidly switching
> consoles.
> 5. Some other bug related to that problem,
> which makes me a headache, but is still to be
> located.
> 
> And you basically say: "if we have 1, then we
> are not interested in fixing 2,3 and investigating 4,5".
> That point of view may exist, but given that
> 2 gives some 90% of underruns with 1 being
> only theoretically harmfull, I am not satisfied. :)

I'm not satisfied because I have no testcase to reproduce the
practical bug.  The patch looks fine, but I cannot test it because the
serious bug doesn't occur to me.  OK?  That's why I claim a testcase.


thanks,

Takashi


More information about the Alsa-devel mailing list