At Thu, 8 Nov 2012 13:47:52 -0800, Trent Piepho wrote:
There is a bug in the way alsa-lib sleeps when using a plugin that can cause very high CPU usage for PCM playback with using rate conversion.
I've attached a simple test program. It simply uses blocking mode and calls snd_pcm_writei() with a buffer of zeros one thousand times and then exits. One would think the CPU usage would be very low. And indeed when the write sized passed to snd_pcm_writei is equal to the period size it is quite low (i5 with HDA):
lucid@Dashboard:~$ /usr/bin/time ./a.out > /dev/null period size 352, write size 352 done, wrote 1000 buffers of 352 frames 0.10user 0.15system 0:09.01elapsed 2%CPU (0avgtext+0avgdata 9360maxresident)k
But, with one small change to have snd_pcm_writei() write buffers one frame larger than the period size, one gets this:
period size 352, write size 353 done, wrote 1000 buffers of 353 frames 1.92user 2.00system 0:09.04elapsed 43%CPU (0avgtext+0avgdata 9360maxresident)k
Simply increasing the write size by one caused the CPU usage to jump to 43% from 2%! Because more software writes in chunks equal to the period size it doesn't see this problem. That's why it wasn't found and fixed long ago.
I've figured out why this is happening. The problem is caused by how alsa-lib decides to sleep when waiting for available buffer space and the way the rate plugin can only process complete periods at a time.
Suppose we have the rate plugin with hw as its slave. Both have a ring buffer that is 3+ periods in size. Allowing for the rate conversion, the buffers and periods are the same size for both. avail_min is set to the normal value of one period for both.
Suppose there are two empty periods in the buffer and we call snd_pcm_writei() to write 1.5 periods. That fits in the empty space in the rate plugin's buffer and so gets written immediately. The rate plugin's buffer now has 0.5 empty periods of space. The rate plugin will resample just one period of the newly written data, because it can only process complete periods at a time, and write that to the hw buffer. So the hw buffer now has one free period. This means the rate plugin has less free space in its buffer than the hw has, because the rate plugin has half a period of data it can't yet resample and pass to the slave.
Now we write another one period of data with another snd_pcm_writei() call. This gets to snd_pcm_write_areas() in pcm.c and triggers this code (edited for brevity):
if ((state == SND_PCM_STATE_RUNNING && (snd_pcm_uframes_t)avail < pcm->avail_min && size > (snd_pcm_uframes_t)avail)) { err = snd_pcm_wait(pcm, -1); goto _again; }
avail is the available space in the rate plugin's buffer, which is half a period. pcm->avail_min is also for the rate plugin. size is the amount remaining in the writei() call, which is one period in this example.
As we can see, the conditions of the if() statement are true. avail of 0.5 periods is less than avail_min of 1 period and the size of the write (1 period) is greater than avail. So we call snd_pcm_wait() and then go back to the top when it returns. avail will get updated to the current value and this code runs again.
Eventually snd_pcm_wait() will call poll() and try to sleep in the kernel while waiting for more room to be available. But remember the hw buffer had one full period free! That's more than tghe half period that's free in the rate buffer and, and this is the key here, it is *more than avail_min*. So poll() will not sleep at all! That's where the 43% CPU comes from. alsa-lib is in a tight loop inside snd_pcm_write_areas() calling snd_pcm_wait() over and over again.
snd_pcm_wait() doesn't sleep, because it's based on the status of the hw buffer and that buffer does have free space. snd_pcm_write_areas() would write to the hw buffer and not try to sleep. But it's not looking at the hw buffer when it decides to sleep or write, it's looking at the rate buffer. And the rate buffer's fill level says to sleep, not write. And if snd_pcm_wait() was looking at the rate buffer, it would sleep.
We have the rate buffer that's current fill level says, "sleep first, then write" and the HW buffer that says the opposite, "write first, then sleep". The code that writes looks at the former and decides to sleep but the code that sleeps looks at the latter and decides it's not time sleep.
A trivial patch that stops this behavior is below, which simply forces the rate plugin's avail_min to 1.
diff --git a/src/pcm/pcm_rate.c b/src/pcm/pcm_rate.c index 41089d7..86f57c4 100644 --- a/src/pcm/pcm_rate.c +++ b/src/pcm/pcm_rate.c @@ -1142,6 +1142,7 @@ static int snd_pcm_rate_start(snd_pcm_t *pcm) return 0; } rate->start_pending = 0;
pcm->avail_min = 1; return snd_pcm_start(rate->gen.slave);
}
I think it would be better to have the rate plugin decrease avail_min by how much unresampled data it has yet to pass to the slave. This preserves the idea of avail_min if it were set to more than one period. This also fixes the problem. I've attached a real patch to do that.
Yeah, this is a known longtime PITA in the rate plugin.
Tweaking avail_min is also an option, but I'm afraid it's too hackish.
The primary problem is the discrepancy of the state "I have to wait until period size becomes available" between the rate plugin and the underlying PCM. Since the poll is performed only on the slave PCM, checking the rate plugin's avail is bogus in such a state. In other words, if the rate plugin puts the decision for the sleep simply depending on the slave state, there will be no inconsistency between them.
Below is a quick fix. Does it work for you?
thanks,
Takashi
--- diff --git a/src/pcm/pcm.c b/src/pcm/pcm.c index 5880057..33e39c6 100644 --- a/src/pcm/pcm.c +++ b/src/pcm/pcm.c @@ -2339,6 +2339,15 @@ int snd_pcm_open_named_slave(snd_pcm_t **pcmp, const char *name, } #endif
+/* return true if the PCM stream needs for waiting */ +static int check_for_wait(snd_pcm_t *pcm, snd_pcm_uframes_t avail) +{ + if (pcm->fast_ops->check_for_wait) + return pcm->fast_ops->check_for_wait(pcm, avail); + else + return avail < pcm->avail_min; +} + /** * \brief Wait for a PCM to become ready * \param pcm PCM handle @@ -2352,7 +2361,7 @@ int snd_pcm_open_named_slave(snd_pcm_t **pcmp, const char *name, */ int snd_pcm_wait(snd_pcm_t *pcm, int timeout) { - if (snd_pcm_mmap_avail(pcm) >= pcm->avail_min) { + if (check_for_wait(pcm, snd_pcm_mmap_avail(pcm))) { /* check more precisely */ switch (snd_pcm_state(pcm)) { case SND_PCM_STATE_XRUN: @@ -6771,14 +6780,14 @@ snd_pcm_sframes_t snd_pcm_write_areas(snd_pcm_t *pcm, const snd_pcm_channel_area goto _end; } if ((state == SND_PCM_STATE_RUNNING && - (snd_pcm_uframes_t)avail < pcm->avail_min && - size > (snd_pcm_uframes_t)avail)) { + size > (snd_pcm_uframes_t)avail && + check_for_wait(pcm, (snd_pcm_uframes_t)avail))) { if (pcm->mode & SND_PCM_NONBLOCK) { err = -EAGAIN; goto _end; }
- err = snd_pcm_wait(pcm, -1); + err = snd_pcm_wait_nocheck(pcm, -1); if (err < 0) break; goto _again; diff --git a/src/pcm/pcm_local.h b/src/pcm/pcm_local.h index e7798fd..1ce6a64 100644 --- a/src/pcm/pcm_local.h +++ b/src/pcm/pcm_local.h @@ -177,6 +177,7 @@ typedef struct { int (*poll_descriptors_count)(snd_pcm_t *pcm); int (*poll_descriptors)(snd_pcm_t *pcm, struct pollfd *pfds, unsigned int space); int (*poll_revents)(snd_pcm_t *pcm, struct pollfd *pfds, unsigned int nfds, unsigned short *revents); + int (*check_for_wait)(snd_pcm_t *pcm, snd_pcm_uframes_t avail); } snd_pcm_fast_ops_t;
struct _snd_pcm { diff --git a/src/pcm/pcm_rate.c b/src/pcm/pcm_rate.c index 4ba8521..c0a28aa 100644 --- a/src/pcm/pcm_rate.c +++ b/src/pcm/pcm_rate.c @@ -1067,6 +1067,17 @@ static int snd_pcm_rate_poll_revents(snd_pcm_t *pcm, struct pollfd *pfds, unsign return snd_pcm_poll_descriptors_revents(rate->gen.slave, pfds, nfds, revents); }
+/* we rely on the available space in the slave instead of the rate PCM itself, + * since there might be pending samples in the rate plugin buffer although + * the slave has an empty period. + */ +static int snd_pcm_rate_check_for_wait(snd_pcm_t *pcm, snd_pcm_uframes_t avail) +{ + snd_pcm_rate_t *rate = pcm->private_data; + snd_pcm_t *slave = rate->gen.slave; + return snd_pcm_avail_update(slave) < slave->avail_min; +} + static int snd_pcm_rate_drain(snd_pcm_t *pcm) { snd_pcm_rate_t *rate = pcm->private_data; @@ -1234,6 +1245,7 @@ static const snd_pcm_fast_ops_t snd_pcm_rate_fast_ops = { .poll_descriptors_count = snd_pcm_generic_poll_descriptors_count, .poll_descriptors = snd_pcm_generic_poll_descriptors, .poll_revents = snd_pcm_rate_poll_revents, + .check_for_wait = snd_pcm_rate_check_for_wait, };
static const snd_pcm_ops_t snd_pcm_rate_ops = {