At Mon, 05 Nov 2007 02:48:52 +0300, Stas Sergeev wrote:
Hi.
I spent four weekends debugging the constant underruns and lock-ups with libao-based programs. I've finally tracked the problem down to some strange code in pcm_rate.c, namely snd_pcm_rate_poll_descriptors(). I am not sure what the reason was for altering "avail_min" every time. It looks like some heuristic was intended, say: "see how much of a fragment was written now, and make sure that amount of free space will be available next time, after poll". Whatever the intent, this code causes underruns. It can increase avail_min by almost period_size. libao sets avail_min == period_size, so we can end up with avail_min == period_size*2. libao sets buffer_size == stop_threshold == period_size*2.
That depends on the configuration. I don't think that is the default.
So we end up with avail_min ~= stop_threshold, which gives underruns. Since I don't know how this code was intended to work, I just removed it, and everything works perfectly now (I also had to patch libao to check the returned rate).
The patch is attached, any comments?
In principle, using the rate plugin with two periods doesn't work well when the sample rates aren't aligned. It's a design issue. You shouldn't use two periods except for hw. Period.
... but with this attitude, there is no improvement. Let's look at the details.
The problem is that the hardware is woken up only at the period size of the slave side. Assume the slave (hardware playback) is running at 48kHz and the client (input) at 44.1kHz. When dmix is used, the period size on the h/w side is usually 1024. Then the period size on the client side is supposed to be 940. Note that 940 != 1024 * 44.1 / 48.0 exactly. This rounding causes the wake-up time to drift at each period, and the delay accumulates.
So, even with your patch applied, the XRUN problem may still occur at some point as long as you use two periods. It can't be fixed without a fundamental change to the irq / poll handling routines in the ALSA driver.
Now, back to the problematic code in the rate plugin. Whether that hack really does any good is indeed questionable.
First, it skips the avail_min adjustment if the app fills a whole period. Thus the hack is triggered only by apps that write arbitrary amounts of data via snd_pcm_writei(). Second, avail_min is usually checked in the irq handler, so its resolution is also the period size. That means avail_min + 1 is equivalent to avail_min + period_size.
So, what can we do better? As a temporary solution, we can remove the problematic part, or at least add a check that avail_min does not go over stop_threshold. I'm not sure whether removing the hack there would have any big impact. Maybe not. But I feel this is a barren discussion. It's really a design problem. Sigh.
Takashi