Re: [alsa-devel] underruns and strange code in pcm_rate.c (and patch)

8 Nov 2007

      At Wed, 07 Nov 2007 21:40:39 +0300,
Stas Sergeev wrote:
...
Hi.
Takashi Iwai wrote:
...
I don't think it's a race.  It appears to be just the matter of
configuration mismatch (or misconception) to me.
OK then. The race will have to wait
till I get to it hard and (in case
there really is) produce a patch.
Yeah, a testcase is required to reproduce the bug anyway.
...
...
Possibly.  Changing avail_min isn't a good idea, I believe, too.
Well, what I was trying to say is
that there are a few serious bugs
that were hidden by the hack...
Never mind, I'll post the patches
when the time will come. One step
at a time.
OK, thanks.
...
...
...
...
size.  Thus only for apps that fills arbitrary amount of data via
snd_pcm_writei() triggers this hack.
I don't think so.
It is.
Firstly, it seems all or most apps
write the arbitrary amounts.
Well, I object it - as far as I know (through hundreds of SUSE
package I've been maintaining), most of them write the period_size
data at once.  Writing arbitrary size is rather minor.
...
mpg123:
#7  0x00002aaaaad34afa in snd_pcm_mmap_writei (pcm=0x565dd0, buffer=0x5660b0, 
    size=4096) at pcm_mmap.c:186
186             return snd_pcm_write_areas(pcm, areas, 0, size,
(gdb) p pcm->period_size
$1 = 5512

writes 4096, while period_size is 5512.
ogg123:
#9  0x00002aaab2c48a69 in snd_pcm_writei (pcm=0x673430, buffer=0x2aaab4215108, 
    size=11340) at pcm.c:1186
1186            return _snd_pcm_writei(pcm, buffer, size);
(gdb) p pcm->period_size
$2 = 2756

writes 11340 with period_size 2756.
Both are for libao.
The normal apps are more careful about the write timing and sync with
GUI.  So they tend to write the data at the time poll permits, instead
of dumb write sequence relying on blocking mode.  Maybe this is the
key to get or avoid the bug.
...
But I don't agree with you even if
some app does not. It will still be
affected. Let me demonstrate. I added
the following:

+if (size > pcm->period_size && size % pcm->period_size)
+size -= size % pcm->period_size;
to snd_pcm_writei() and snd_pcm_mmap_writei()
to simulate the condition when an app
transfers by periods. Here we go:

#5  0x00002aaab2c5dbc1 in snd_pcm_mmap_writei (pcm=0x672e30, 
    buffer=0x2aaab4215108, size=11024) at pcm_mmap.c:188
188             return snd_pcm_write_areas(pcm, areas, 0, size,
(gdb) p pcm->period_size
$1 = 2756
(gdb) p pcm->period_size*4
$2 = 11024
(gdb) p pcm->period_size*4==size
$3 = 1

OK, it is trying to write 4 periods.

Breakpoint 2, snd_pcm_rate_poll_descriptors (pcm=0x672e30, pfds=0x409ffdd0, 
    space=1) at pcm_rate.c:720
720             snd_pcm_rate_t *rate = pcm->private_data;
(gdb) n
724             ret = snd_pcm_generic_poll_descriptors(pcm, pfds, space);
(gdb) n
725             if (ret < 0)
(gdb) n
728             avail_min = rate->appl_ptr % pcm->period_size;
(gdb) n
729             if (avail_min > 0) {
(gdb) n
730                     recalc(pcm, &avail_min);
(gdb) p avail_min
$4 = 4

See a small reminder?
Why you ask? Here:
(the ogg has the rate 22050, the
driver has 48000)

(gdb) p rate->gen.slave->period_size
$10 = 6000
(gdb) p 6000/2756.0
$11 = 2.1770682148040637
(gdb) p 48000/22050.0
$12 = 2.1768707482993195

See a slight difference? I think this
is exactly why _any_ app is affected,
not only those that write arbitrary
amounts (even though they are a majority,
it seems to me).
So let me challenge you on that. :)
Then try aplay to play 22.5kHz samples on 48k.  You'll see it doesn't
happen.
The difference seems to be whether you do simply writei() sequences
without checks which causes partial writes and eventually triggers the
hack.  As mentioned, many apps do check the available spaces before
write, so the partial write doesn't happen practically.
...
...
The app must not think that the whole fragment can be written at a
single call.  That's the bug of the app!
Hmm, I don't think it does, but I don't
understand what you mean here. So you
say the app should write an arbitrary
amounts? Well, then the above examples
of mine makes no sense, but I don't see
how it helps.
No, what I meant is that the app should check always the return value
of the write properly.  Even in the blocking mode, write *may* return
without writing the whole data.  Well, it's off-topic right now.
...
...
The problem is that two periods with the current rate plugin cannot
work.  Suppose the case of client period size 940, slave period size
Thanks for the example, I now understand
what fundamental problem you have in mind.
...
So, without another wake-up source, the problem cannot be solved.
OK, it appears we are talking about two
completely different problems.
Yes, I now see the fundamental one and
I admit my patch doesn't solve it. Neither
it was intended to.
So what we have:

A fundamental problem with the different

fragment sizes. This may (or may not, if the
HW period is adjustable) give you underruns.
2. A bug in snd_pcm_rate_poll_descriptors(),
which prevents an app from filling the second
fragment before the first one is played.
This _does_ give an underruns, constantly,
unavoidably and badly. I have a patch, it needs
to be discussed.
3. A bug in snd_pcm_write_areas() I mentioned
earlier. I have a patch, will post it when the
time will come.
4. A (possible) race in snd_pcm_playback_poll()
which is harmless most of the times, but I can
get underruns out of it by rapidly switching
consoles.
5. Some other bug related to that problem,
which makes me a headache, but is still to be
located.
And you basically say: "if we have 1, then we
are not interested in fixing 2,3 and investigating 4,5".
That point of view may exist, but given that
2 gives some 90% of underruns with 1 being
only theoretically harmfull, I am not satisfied. :)
I'm not satisfied because I have no testcase to reproduce the
practical bug.  The patch looks fine, but I cannot test it because the
serious bug doesn't occur to me.  OK?  That's why I claim a testcase.
thanks,
Takashi