There is a bug in the way alsa-lib sleeps when using a plugin that can cause very high CPU usage for PCM playback with using rate conversion.
I've attached a simple test program. It simply uses blocking mode and calls snd_pcm_writei() with a buffer of zeros one thousand times and then exits. One would think the CPU usage would be very low. And indeed when the write sized passed to snd_pcm_writei is equal to the period size it is quite low (i5 with HDA):
lucid@Dashboard:~$ /usr/bin/time ./a.out > /dev/null period size 352, write size 352 done, wrote 1000 buffers of 352 frames 0.10user 0.15system 0:09.01elapsed 2%CPU (0avgtext+0avgdata 9360maxresident)k
But, with one small change to have snd_pcm_writei() write buffers one frame larger than the period size, one gets this:
period size 352, write size 353 done, wrote 1000 buffers of 353 frames 1.92user 2.00system 0:09.04elapsed 43%CPU (0avgtext+0avgdata 9360maxresident)k
Simply increasing the write size by one caused the CPU usage to jump to 43% from 2%! Because more software writes in chunks equal to the period size it doesn't see this problem. That's why it wasn't found and fixed long ago.
I've figured out why this is happening. The problem is caused by how alsa-lib decides to sleep when waiting for available buffer space and the way the rate plugin can only process complete periods at a time.
Suppose we have the rate plugin with hw as its slave. Both have a ring buffer that is 3+ periods in size. Allowing for the rate conversion, the buffers and periods are the same size for both. avail_min is set to the normal value of one period for both.
Suppose there are two empty periods in the buffer and we call snd_pcm_writei() to write 1.5 periods. That fits in the empty space in the rate plugin's buffer and so gets written immediately. The rate plugin's buffer now has 0.5 empty periods of space. The rate plugin will resample just one period of the newly written data, because it can only process complete periods at a time, and write that to the hw buffer. So the hw buffer now has one free period. This means the rate plugin has less free space in its buffer than the hw has, because the rate plugin has half a period of data it can't yet resample and pass to the slave.
Now we write another one period of data with another snd_pcm_writei() call. This gets to snd_pcm_write_areas() in pcm.c and triggers this code (edited for brevity):
if ((state == SND_PCM_STATE_RUNNING && (snd_pcm_uframes_t)avail < pcm->avail_min && size > (snd_pcm_uframes_t)avail)) { err = snd_pcm_wait(pcm, -1); goto _again; }
avail is the available space in the rate plugin's buffer, which is half a period. pcm->avail_min is also for the rate plugin. size is the amount remaining in the writei() call, which is one period in this example.
As we can see, the conditions of the if() statement are true. avail of 0.5 periods is less than avail_min of 1 period and the size of the write (1 period) is greater than avail. So we call snd_pcm_wait() and then go back to the top when it returns. avail will get updated to the current value and this code runs again.
Eventually snd_pcm_wait() will call poll() and try to sleep in the kernel while waiting for more room to be available. But remember the hw buffer had one full period free! That's more than tghe half period that's free in the rate buffer and, and this is the key here, it is *more than avail_min*. So poll() will not sleep at all! That's where the 43% CPU comes from. alsa-lib is in a tight loop inside snd_pcm_write_areas() calling snd_pcm_wait() over and over again.
snd_pcm_wait() doesn't sleep, because it's based on the status of the hw buffer and that buffer does have free space. snd_pcm_write_areas() would write to the hw buffer and not try to sleep. But it's not looking at the hw buffer when it decides to sleep or write, it's looking at the rate buffer. And the rate buffer's fill level says to sleep, not write. And if snd_pcm_wait() was looking at the rate buffer, it would sleep.
We have the rate buffer that's current fill level says, "sleep first, then write" and the HW buffer that says the opposite, "write first, then sleep". The code that writes looks at the former and decides to sleep but the code that sleeps looks at the latter and decides it's not time sleep.
A trivial patch that stops this behavior is below, which simply forces the rate plugin's avail_min to 1.
diff --git a/src/pcm/pcm_rate.c b/src/pcm/pcm_rate.c index 41089d7..86f57c4 100644 --- a/src/pcm/pcm_rate.c +++ b/src/pcm/pcm_rate.c @@ -1142,6 +1142,7 @@ static int snd_pcm_rate_start(snd_pcm_t *pcm) return 0; } rate->start_pending = 0; + pcm->avail_min = 1; return snd_pcm_start(rate->gen.slave); }
I think it would be better to have the rate plugin decrease avail_min by how much unresampled data it has yet to pass to the slave. This preserves the idea of avail_min if it were set to more than one period. This also fixes the problem. I've attached a real patch to do that.
For the test program to show the problem YOU MUST USE THE RATE CONVERT PLUGIN! It seems obvious but people keep missing this. I found the easiest way to do that was to create a custom device in ~/.asoundrc that forced rate converting to 48kHz. Like this: pcm.foo { type plug; slave { pcm "hw:0,0"; rate 48000; } }