[alsa-devel] [PATCH] Wrong latency in pulseaudio plugin breaks Adobe Flash Player
When Adobe Flash player is playing a live stream in "live" (no internal latency) mode[1], through alsa's pulseaudio plugin, the audio is a half-second out of sync with the video. This is with alsa-plugins version 1.0.25.[2]
I see the problem is one already reported on the bug tracker several years ago (issue #3944). The plugin sets pulseaudio's target buffer length to match the IO buffer length, and sets the prebuffering attribute to match the IO buffer length less one period.
In our case, the IO buffer is 500ms long, the IO period length is 20ms (for 16 kHz Speex sound packets), and the application requests that playback start immediately (by setting the start_threshold software parameter very low).
However, the plugin ignores start_threshold and sets the pulseaudio target buffer length to 500ms, meaning pulseaudio plays the sound with a half-second delay.
The fix initially suggested in the bug report was to implement the start_threshold parameter and use it to lower the prebuffering attribute but this does not work: playback STARTS sooner, but when dropouts and underruns occur pulseaudio increases the latency to match the target buffer length. It is necessary to lower the target buffer length too.
Below is a proposed patch (my apologies if I've missed some subtleties as I've never looked at the alsa source code before). The patch changes the plugin to do the following:
-- decouple the length of the fake ring buffer seen by ALSA (which needs to match the IO buffer length) from the "target length" parameter sent to pulseaudio (which needs to reflect the desired latency), adding a new variable to the structure to store the former value. Currently they are forced to be the same value.
-- when the start_threshold software parameter is seen, lower both the target length and prebuffering attributes to match it, except don't set them less than one IO period or longer than the IO buffer length
-- when hardware parameters are seen, don't mess with the target length or prebuffering attributes if already set by software.
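As a rough illustration of the second point, the clamping can be written as a standalone C function. This is only a sketch under stated assumptions: `latency_request` and `latency_from_start_threshold` are hypothetical names that do not exist in alsa-plugins, and the sizes are passed in as plain arguments rather than read from the plugin structures. The actual proposed change is the patch further down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two PulseAudio attributes the patch sets. */
struct latency_request {
	uint32_t tlength;	/* target buffer length, in bytes */
	uint32_t prebuf;	/* prebuffering amount, in bytes */
};

/*
 * Clamp the start threshold (in frames) to the range
 * [period_size, buffer_size], then convert it to bytes and use the
 * same value for both tlength and prebuf.
 */
struct latency_request
latency_from_start_threshold(unsigned long start_threshold,
			     unsigned long period_size,
			     unsigned long buffer_size,
			     size_t frame_size)
{
	struct latency_request req;

	assert(period_size <= buffer_size);
	if (start_threshold < period_size)
		start_threshold = period_size;
	if (start_threshold > buffer_size)
		start_threshold = buffer_size;

	req.tlength = (uint32_t)(start_threshold * frame_size);
	req.prebuf = req.tlength;
	return req;
}
```

With the figures from this report (16 kHz mono, 2-byte samples, a 320-frame period and an 8000-frame buffer), a start threshold of 1 frame gives 640 bytes (20ms) for both attributes, and any threshold above 8000 frames is capped at 16000 bytes (500ms).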
It doesn't currently tell pulseaudio to change its parameters if the start_threshold is encountered after the stream is already started, but that could be added too if necessary with if (pcm->stream) pa_stream_set_buffer_attr( ... )
I hope this patch (or something like it) can get put into an alsa release since otherwise the Flash Player 2-way interaction app we're developing won't be usable by Linux users unless it runs in about 750ms time-delayed mode!
Thank you,
Philip
[1] Normally Flash Player plays prerecorded streams with a slight internal buffering delay, and in this mode it requests the audio be buffered for 500ms in addition to its own internal delay, and it automatically delays video frames for 500ms beyond its own internal delay to compensate, so there are no out-of-sync problems with prerecorded streams or live streams played in regular buffered mode.
[2] In 1.0.23 with low-sample-rate streams there was no sync problem but each sound sample was truncated, resulting in distorted and crackly sound; this problem has been fixed in 1.0.25 -- thanks!
--------------------------------------------+-------------------------------
Philip Spencer  pspencer@fields.utoronto.ca | Director of Computing Services
Room 336        (416)-348-9710 ext3036      | The Fields Institute for
222 College St, Toronto ON M5T 3J1 Canada   | Research in Mathematical Sciences
Signed-off-by: Philip Spencer <pspencer@fields.utoronto.ca>
--- alsa-plugins-1.0.25.orig/pulse/pcm_pulse.c	2012-01-25 02:57:07.000000000 -0500
+++ alsa-plugins-1.0.25/pulse/pcm_pulse.c	2012-02-15 16:23:08.000000000 -0500
@@ -49,6 +49,9 @@
 	pa_sample_spec ss;
 	size_t frame_size;
 	pa_buffer_attr buffer_attr;
+	uint32_t ring_buffer_length;
+	int tlength_set_by_sw;
+
 } snd_pcm_pulse_t;
 
 static int check_stream(snd_pcm_pulse_t *pcm)
@@ -93,12 +96,12 @@
 	size -= pcm->offset;
 
 	/* Prevent accidental overrun of the fake ringbuffer */
-	if (size > pcm->buffer_attr.tlength - pcm->frame_size)
-		size = pcm->buffer_attr.tlength - pcm->frame_size;
+	if (size > pcm->ring_buffer_length - pcm->frame_size)
+		size = pcm->ring_buffer_length - pcm->frame_size;
 
 	if (size > pcm->last_size) {
 		pcm->ptr += size - pcm->last_size;
-		pcm->ptr %= pcm->buffer_attr.tlength;
+		pcm->ptr %= pcm->ring_buffer_length;
 	}
 
 	pcm->last_size = size;
@@ -830,12 +833,16 @@
 	pcm->ss.rate = io->rate;
 	pcm->ss.channels = io->channels;
 
+	pcm->ring_buffer_length =
+		io->buffer_size * pcm->frame_size;
 	pcm->buffer_attr.maxlength = 4 * 1024 * 1024;
-	pcm->buffer_attr.tlength =
-		io->buffer_size * pcm->frame_size;
-	pcm->buffer_attr.prebuf =
-		(io->buffer_size - io->period_size) * pcm->frame_size;
+	if (! pcm->tlength_set_by_sw) {
+		pcm->buffer_attr.tlength =
+			io->buffer_size * pcm->frame_size;
+		pcm->buffer_attr.prebuf =
+			(io->buffer_size - io->period_size) * pcm->frame_size;
+	}
 	pcm->buffer_attr.minreq = io->period_size * pcm->frame_size;
 	pcm->buffer_attr.fragsize = io->period_size * pcm->frame_size;
 
@@ -845,6 +852,36 @@
 	return err;
 }
 
+static int pulse_sw_params(snd_pcm_ioplug_t * io,
+			   snd_pcm_sw_params_t * params)
+{
+	snd_pcm_pulse_t *pcm = io->private_data;
+	int err = 0;
+
+	assert(pcm);
+
+	if (!pcm->p || !pcm->p->mainloop)
+		return -EBADFD;
+
+	pa_threaded_mainloop_lock(pcm->p->mainloop);
+	snd_pcm_uframes_t start_thresh;
+	err = snd_pcm_sw_params_get_start_threshold(params, &start_thresh);
+	if (! err) {
+		if (start_thresh < io->period_size)
+			start_thresh = io->period_size;
+		if (start_thresh > io->buffer_size)
+			start_thresh = io->buffer_size;
+
+		pcm->buffer_attr.tlength = start_thresh * pcm->frame_size;
+		pcm->buffer_attr.prebuf = start_thresh * pcm->frame_size;
+		pcm->tlength_set_by_sw = 1;
+	}
+
+	pa_threaded_mainloop_unlock(pcm->p->mainloop);
+	return err;
+
+}
+
 static int pulse_close(snd_pcm_ioplug_t * io)
 {
 	snd_pcm_pulse_t *pcm = io->private_data;
@@ -911,6 +948,7 @@
 	.poll_revents = pulse_pcm_poll_revents,
 	.prepare = pulse_prepare,
 	.hw_params = pulse_hw_params,
+	.sw_params = pulse_sw_params,
 	.close = pulse_close,
 	.pause = pulse_pause
 };
@@ -1080,6 +1118,7 @@
 	pcm->io.mmap_rw = 0;
 	pcm->io.callback = stream == SND_PCM_STREAM_PLAYBACK ?
 		&pulse_playback_callback : &pulse_capture_callback;
+	pcm->tlength_set_by_sw = 0;
 	pcm->io.private_data = pcm;
 
 	err = snd_pcm_ioplug_create(&pcm->io, name, stream, mode);
On 02/16/2012 04:48 PM, Philip Spencer wrote:
When Adobe Flash player is playing a live stream in "live" (no internal latency) mode[1], through alsa's pulseaudio plugin, the audio is a half-second out of sync with the video. This is with alsa-plugins version 1.0.25.[2]
Hi and thanks for investigating!
I see the problem is one already reported on the bug tracker several years ago (issue #3944). The plugin sets pulseaudio's target buffer length to match the IO buffer length, and sets the prebuffering attribute to match the IO buffer length less one period.
In our case, the IO buffer is 500ms long, the IO period length is 20ms (for 16 kHz Speex sound packets), and the application requests that playback start immediately (by setting the start_threshold software parameter very low).
Hmm, if you want ~20 ms of latency, why have a 500 ms long buffer in the first place?
However, the plugin ignores start_threshold and sets the pulseaudio target buffer length to 500ms, meaning pulseaudio plays the sound with a half-second delay.
The fix initially suggested in the bug report was to implement the start_threshold parameter and use it to lower the prebuffering attribute
This sounds like a reasonable thing to do.
but this does not work: playback STARTS sooner, but when dropouts and underruns occur pulseaudio increases the latency to match the target buffer length.
Just a quick question: Are you saying there is an immediate change from 20 to 500 ms at the first underrun, or that things gradually adapt up to a stable level?
It is necessary to lower the target buffer length too.
I think this part requires more investigation though. First, I'm assuming you are talking about underruns between the client and PulseAudio, rather than underruns between PulseAudio and the hardware.
Setting a tlength of 500 ms will have PulseAudio believe you intend to have latency in the range of 250 - 500 ms, so you're starting with the buffer almost empty, and if I understand it correctly, things continue that way? This is usually bad and a recipe for underruns. Anyway, this is probably why lowering the tlength works for your particular use case.
Note though that I'm sceptical of the tlength part of the patch, not so much for philosophical reasons (hey, I want things to just work too!) but because I'm afraid it will fix some applications and break others. Emulating ALSA over PulseAudio is not easy due to the asynchronous nature of PulseAudio (among other things), and applications use and expect ALSA to do different things.
[2] In 1.0.23 with low-sample-rate streams there was no sync problem but each sound sample was truncated, resulting in distorted and crackly sound; this problem has been fixed in 1.0.25 -- thanks!
Hmm, I wonder why this could be. Probably because we have changed the underrun handling a few times, but it always seems to improve one application and at the same time break another :-(
Thanks for the reply!
In our case, the IO buffer is 500ms long, the IO period length is 20ms (for 16 kHz Speex sound packets), and the application requests that playback start immediately (by setting the start_threshold software parameter very low).
Hmm, if you want ~20 ms of latency, why have a 500 ms long buffer in the first place?
It seems to be hardcoded into Flash Player, not under control of the Flash Player app, and Flash Player is closed source so I can't say why for sure (I'm merely observing what it does).
However, I imagine the intent is to be resilient against a wide variety of network conditions.
If audio arrives in uniformly spaced 20ms packets, then it should be rendered with only 20ms latency. But if there's a lot of network congestion and high jitter, several hundred ms could arrive at once. If Flash Player used a short buffer, these audio samples would get lost. By using a 500ms buffer, playback is smooth, with only as much latency as is necessitated by the network jitter.
So: short buffer, short latency request --> samples get lost if they arrive with high jitter
long buffer, long latency request --> long undesired latency
long buffer, short latency request --> smooth playback always, with short latency when possible, longer latency only when necessitated by high jitter
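The first and third cases can be illustrated with a toy model. This is a sketch under simplifying assumptions that are not taken from Flash Player or PulseAudio: time advances in 20ms ticks, each arriving packet holds 20ms of audio, anything that does not fit in the buffer is simply dropped, and the sink plays at most 20ms per tick. `dropped_ms` is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy jitter model. packets_per_tick[i] is the number of 20ms packets
 * that arrive during tick i. Audio that does not fit into a buffer of
 * capacity_ms is counted as lost. Returns the total lost audio in ms.
 */
unsigned dropped_ms(const unsigned *packets_per_tick, size_t ticks,
		    unsigned capacity_ms)
{
	const unsigned period_ms = 20;
	unsigned fill_ms = 0, lost_ms = 0;
	size_t i;

	assert(packets_per_tick != NULL || ticks == 0);
	for (i = 0; i < ticks; i++) {
		/* Packets arriving during this tick are written at once. */
		fill_ms += packets_per_tick[i] * period_ms;
		if (fill_ms > capacity_ms) {
			lost_ms += fill_ms - capacity_ms;
			fill_ms = capacity_ms;
		}
		/* The sink then plays up to one period of audio. */
		fill_ms -= (fill_ms < period_ms) ? fill_ms : period_ms;
	}
	return lost_ms;
}
```

For an arrival pattern of 1, 1, 0, 0, 0, 5, 1 packets per tick (a stall followed by a burst), a 40ms buffer loses 60ms of audio in this model, while a 500ms buffer loses none.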
but this does not work: playback STARTS sooner, but when dropouts and underruns occur pulseaudio increases the latency to match the target buffer length.
Just a quick question: Are you saying there is an immediate change from 20 to 500 ms at the first underrun, or that things gradually adapt up to a stable level?
It's fairly immediate (within the first half-second, anyway).
It is necessary to lower the target buffer length too.
I think this part requires more investigation though. First, I'm assuming you are talking about underruns between the client and PulseAudio, rather than underruns between PulseAudio and the hardware.
I'm actually not entirely sure which is happening. But whichever one it is, pulseaudio very quickly adjusts things and pauses playback until the buffer fills up to its target length. Perhaps a pulseaudio developer could comment on whether that is indeed pulseaudio's intended behaviour.
Setting a tlength of 500 ms will have PulseAudio believe you intend to have latency in the range of 250 - 500 ms, so you're starting with the buffer almost empty, and if I understand it correctly, things continue that way? This is usually bad and a recipe for underruns. Anyway, this is probably why lowering the tlength works for your particular use case.
Yes, it does not seem to be right to set PulseAudio's tlength to 500ms just because ALSA's IO buffer is 500ms -- I think ALSA's IO buffer length corresponds better to PulseAudio's *maxlength* than to tlength. And the PulseAudio docs, though being a little ambiguous on the exact working of tlength and prebuf, do recommend setting prebuf to be the same as tlength.
Note though that I'm sceptical of the tlength part of the patch, not so much for philosophical reasons (hey, I want things to just work too!) but because I'm afraid it will fix some applications and break others. Emulating ALSA over PulseAudio is not easy due to the asynchronous nature of PulseAudio (among other things), and applications use and expect ALSA to do different things.
The patch will only affect applications which (a) Set an explicit start threshold requesting playback to begin before the buffer is filled, AND (b) Specify a low value for IO period
If an app specifically requests both an early playback start and a small IO period, then I think it's reasonable to honour that request and ask pulseaudio to maintain low latency.
Your comment, though, about "500 ms tlength means pulseaudio thinks you want a latency between 250ms and 500ms" -- if that really is the case, and tlength=n means pulseaudio thinks latency should be between n/2 and n, then perhaps it would be safer to only lower tlength to
2 * (larger of start threshold, IO period)
instead of 1 * (larger of start threshold, IO period) as in my patch.
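A minimal sketch of that alternative, in frames. `proposed_tlength_frames` is a hypothetical name, and capping the result at the IO buffer size is an assumption carried over from the clamp in the original patch rather than something stated above.

```c
#include <assert.h>

/*
 * tlength = 2 * max(start_threshold, period_size), in frames.
 * Assumption: the result is never allowed to exceed buffer_size.
 */
unsigned long proposed_tlength_frames(unsigned long start_threshold,
				      unsigned long period_size,
				      unsigned long buffer_size)
{
	unsigned long base;

	assert(period_size <= buffer_size);
	base = start_threshold > period_size ? start_threshold : period_size;

	/* Comparing against buffer_size / 2 avoids overflow in 2 * base. */
	if (base > buffer_size / 2)
		return buffer_size;
	return 2 * base;
}
```

With a 320-frame period and an 8000-frame buffer, a start threshold of 1 frame yields 640 frames (40ms at 16 kHz), so that if tlength=n really implies a latency between n/2 and n, the expected range would be 20ms to 40ms.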
- Philip
On 02/17/2012 05:25 AM, Philip Spencer wrote:
Thanks for the reply!
In our case, the IO buffer is 500ms long, the IO period length is 20ms (for 16 kHz Speex sound packets), and the application requests that playback start immediately (by setting the start_threshold software parameter very low).
Hmm, if you want ~20 ms of latency, why have a 500 ms long buffer in the first place?
It seems to be hardcoded into Flash Player, not under control of the Flash Player app, and Flash Player is closed source so I can't say why for sure (I'm merely observing what it does).
So this is the usual problem with closed source applications: no application is bug free, but when the closed source application is broken, it's much harder to fix it. The sad reality is that by depending on a closed source application, you're putting yourself in this situation.
However, I imagine the intent is to be resilient against a wide variety of network conditions.
If audio arrives in uniformly spaced 20ms packets, then it should be rendered with only 20ms latency. But if there's a lot of network congestion and high jitter, several hundred ms could arrive at once. If Flash Player used a short buffer, these audio samples would get lost. By using a 500ms buffer, playback is smooth, with only as much latency as is necessitated by the network jitter.
So: short buffer, short latency request --> samples get lost if they arrive with high jitter
long buffer, long latency request --> long undesired latency
long buffer, short latency request --> smooth playback always, with short latency when possible, longer latency only when necessitated by high jitter
The way most applications (should) work with ALSA is that they fill the buffer up completely, when it's full they start playback, and at every period interrupt they fill up yet another period. So when you have a large number of periods (like in your case, 500/20 = 25 periods), the buffer is almost full all the time.
Now; if an application starts playback with an empty buffer (either by calling "trigger" or by setting a low start threshold), should we assume that 1) the buffer is going to stay almost empty all the time, like it seems Flash does in this scenario, or 2) the application will fill it up completely and keep it there (IIRC, mpg123 did this)?
Lowering tlength might be reasonable in scenario 1, but in scenario 2 it will cause the client to block (trying to fill an already filled buffer) in ways that differ from the expected behaviour.
but this does not work: playback STARTS sooner, but when dropouts and underruns occur pulseaudio increases the latency to match the target buffer length.
Just a quick question: Are you saying there is an immediate change from 20 to 500 ms at the first underrun, or that things gradually adapt up to a stable level?
It's fairly immediate (within the first half-second, anyway).
Ok, thanks.
It is necessary to lower the target buffer length too.
I think this part requires more investigation though. First, I'm assuming you are talking about underruns between the client and PulseAudio, rather than underruns between PulseAudio and the hardware.
I'm actually not entirely sure which is happening. But whichever one it is, pulseaudio very quickly adjusts things and pauses playback until the buffer fills up to its target length. Perhaps a pulseaudio developer could comment on whether that is indeed pulseaudio's intended behaviour.
That's a good question. For investigation, maybe you could compile a version without the tlength adjustment (only the prebuf adjustment), and then make a PulseAudio log [1] and pastebin it (it's probably too large to make it as an attachment to this list) where we can see how PulseAudio changes the buffers when underruns happen?
Setting a tlength of 500 ms will have PulseAudio believe you intend to have latency in the range of 250 - 500 ms, so you're starting with the buffer almost empty, and if I understand it correctly, things continue that way? This is usually bad and a recipe for underruns. Anyway, this is probably why lowering the tlength works for your particular use case.
Yes, it does not seem to be right to set PulseAudio's tlength to 500ms just because ALSA's IO buffer is 500ms -- I think ALSA's IO buffer length corresponds better to PulseAudio's *maxlength* than to tlength. And the PulseAudio docs, though being a little ambiguous on the exact working of tlength and prebuf, do recommend setting prebuf to be the same as tlength.
As you probably already have discovered, PulseAudio's buffering does not work exactly the way ALSA's buffering does. Most applications try to keep their buffers as full as they can, so that's the case we're optimising for. In that case, setting tlength to the IO buffer length is the correct way to do it.
As for maxlength, that's an interesting question. But I don't think it matters for your particular case whether maxlength is set to -1 or to 500 ms.
Note though that I'm sceptical of the tlength part of the patch, not so much for philosophical reasons (hey, I want things to just work too!) but because I'm afraid it will fix some applications and break others. Emulating ALSA over PulseAudio is not easy due to the asynchronous nature of PulseAudio (among other things), and applications use and expect ALSA to do different things.
The patch will only affect applications which (a) Set an explicit start threshold requesting playback to begin before the buffer is filled, AND (b) Specify a low value for IO period
If an app specifically requests both an early playback start and a small IO period, then I think it's reasonable to honour that request and ask pulseaudio to maintain low latency.
I think the root cause of this problem is that PulseAudio does not handle almost-empty buffers very well, and this is the problem we need to fix, rather than to work around it in the ALSA plugin layer.
Your comment, though, about "500 ms tlength means pulseaudio thinks you want a latency between 250ms and 500ms" --
E.g., it tends to send out underrun reports although there might still be audio left to play back (it's just in another buffer); with tlength=500 ms this might happen when there is still up to 250 ms left. (One difference, though: ALSA plugins do not seem to run in adjust-latency mode, and maybe they should, so the figures might not apply directly.) I think this is bad, but not everyone agrees. If you're interested, you can read a similar thread about VLC which starts here [2]; in particular, read [3] for why some think this is the correct thing to do...
if that really is the case, and tlength=n means pulseaudio thinks latency should be between n/2 and n, then perhaps it would be safer to only lower tlength to
2 * (larger of start threshold, IO period)
instead of 1 * (larger of start threshold, IO period) as in my patch.
Yes, also because minreq should be a lot lower than tlength.
It seems you are right ... things are much more complicated than I realized.
Now; if an application starts playback with an empty buffer (either by calling "trigger" or by setting a low start threshold), should we assume that 1) the buffer is going to stay almost empty all the time, like it seems Flash does in this scenario, or 2) the application will fill it up completely and keep it there (IIRC, mpg123 did this)?
Lowering tlength might be reasonable in scenario 1, but in scenario 2 it will cause the client to block (trying to fill an already filled buffer) in ways that differ from the expected behaviour.
Obviously the plugin needs to work properly for both scenarios -- and indeed Flash itself could be in either scenario, depending on whether it's receiving packets from the network smoothly or in bunches.
I agree that it is unacceptable for the client to block if it's writing fewer bytes than the buffer length it asked for. I did not realize pulseaudio would do that. I thought that, if there were more than tlength bytes in the buffer, pulseaudio simply would not issue the "I want more data!" callback to the plugin until the length dropped below tlength/2, but that the client was free to write, without blocking, as many bytes as it wished up to maxlength. Apparently I was wrong.
Next complication -- when I went to get a pulseaudio log of how the latency rose when only prebuf was lowered, I found I had accidentally been compiling the plugin when configure had been run against an older pulseaudio version. Compiling it against pulseaudio 1.1 caused everything to break again -- samples coming with no latency (good) but with part of each sample cut off (very bad). I realized this is because the plugin is now setting handle_underrun to 1 by default. With handle_underrun turned off, I get the behaviour I saw before: works perfectly with tlength lowered, but latency jumps up to 500ms when tlength=500ms, even with prebuf lowered.
Putting some trace statements into the plugin code, I see the following sequence of events is happening:
1. Alsa initialized with buffer = 500ms, io period = 20ms, start threshold = 20ms, 16 kHz audio, 2byte samples, 1 channel.
2. Flash writes 640 bytes (320 samples, 20ms) and these are sent by the plugin to pulseaudio.
3. An *explicit* stream start is then called (so in this case the prebuf setting is actually totally irrelevant!)
4. 11 milliseconds later, the plugin gets an underrun callback from PulseAudio. (Note that at this point only half the sample will actually have been played, so this is definitely connected to the "false alarm" underrun discussions you pointed me to).
5. If the plugin's underrun handling is enabled, this causes the whole stream to be closed and re-opened. Unplayed samples are lost, but the next packet plays at exactly the right time. Result: in-sync, but unacceptably distorted, audio.
6. If the plugin's underrun handling is disabled, nothing further happens. Flash writes the next 640 bytes about 16ms after its first packet was written (so well before the first 20ms have finished actually playing).
7. From then on in, there are *NO* underruns reported back to the plugin from pulseaudio.
8. However, I hear several brief hiccups in playback, and the pulseaudio latency very quickly increases to 500ms.
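The byte and time figures in the steps above follow from the stream format. As a small worked check, here is a sketch of the conversion; `pcm_bytes_to_usec` is a hypothetical helper for this illustration, not a function from alsa-lib or PulseAudio.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Duration, in microseconds, of `bytes` of interleaved PCM audio.
 * Partial frames are ignored.
 */
uint64_t pcm_bytes_to_usec(uint64_t bytes, unsigned rate_hz,
			   unsigned channels, unsigned bytes_per_sample)
{
	uint64_t frame_size = (uint64_t)channels * bytes_per_sample;

	assert(rate_hz > 0 && frame_size > 0);
	return (bytes / frame_size) * 1000000u / rate_hz;
}
```

For 16 kHz, 1 channel and 2-byte samples, 640 bytes is 320 frames, or 20000 microseconds (20ms), and the 500ms IO buffer corresponds to 16000 bytes. A second write arriving 16ms after the first therefore leaves roughly 4ms of the first packet still unplayed.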
I assume, then, that the tlength value also influences the buffering between pulseaudio and the hardware sink, and it is this that is underrunning and causing pulseaudio to raise the latency when it is much larger than the fill level at which the client keeps the buffer.
This means that something probably needs to be fixed in pulseaudio. For the plugin, there are probably some things that could be changed but they won't help me directly:
- Definitely prebuf should be lowered if the client requests an explicit start threshold, but this fix won't have any impact on my situation since the stream is already being started explicitly.
- Something should be done about that early false-alarm underrun. Maybe the pulse_start routine should not do anything if fewer than n IO periods' worth of data has been written yet (where n is to be determined), just set a flag and have the start retried after enough data is written? Or maybe an underrun detected when fewer than n IO periods' worth of data have been sent should be silently ignored?
(If no such fix is made, though, I can easily work around things by arranging my data stream so that Flash player always gets sound packets in pairs and that will avoid this underrun. Indeed, I note that the ffmpeg encoder by default always stuffs two Speex packets together into a single packet and notes that Flash Player doesn't play it properly otherwise; I wonder if this is why?)
Now I guess it's time to dig in to pulseaudio, and see whether (a) it can be changed so that the client is free to write more than tlength bytes without blocking, or (b) the buffering between pulseaudio and its hardware sink can be adjusted to use prebuf or minreq or something when the stream is started early.
So this is the usual problem with closed source applications: no application is bug free, but when the closed source application is broken, it's much harder to fix it. The sad reality is that by depending on a closed source application, you're putting yourself in this situation.
I agree wholeheartedly about the drawback with closed source applications. Unfortunately we don't really have any choice: we need something that can work for anybody who has a browser and webcam, without the need for them to install any special software from us, and until HTML5's device access (camera/microphone) and WebSocket APIs have matured and become available on a majority of browsers, Flash Player is pretty much the only choice. I do look forward, though, to being able to ditch it in favour of an all-in-browser, HTML5 solution in a few years once that technology is ready.
Besides, I really don't think there are any bugs in Flash here. It has to try to deliver audio packets to the sound system as soon as they are received and decoded from the network. It can't set a small IO buffer in case it gets a large bunch of packets at once. It has to set an early playback start threshold to minimize latency. So I really don't see what else it can do (except maybe waiting until it sent the second packet to start the stream, instead of starting it after only the first) -- do you have any thoughts about what it could/should have been doing differently?
Anyway, I will do some more digging into PulseAudio (and try to get a detailed log of what's happening).
Thanks,
Philip
I have pulseaudio logs from two runs of FlashPlayer. In each one I commented out the plugin's call to pa_stream_trigger just to avoid the irrelevant problem of FlashPlayer starting the stream a little too early; I merely patched it to set prebuf low.
In the first run, logged at www.fields.utoronto.ca/~pspencer/pulsebad.log, I made no changes to tlength so it was left at 500ms. This resulted in audio with several hiccups during the first two seconds and latency rising to 500ms after two seconds.
In the second run I altered the patch to lower tlength. This resulted in smooth audio with no discernible latency. The log is at
www.fields.utoronto.ca/~pspencer/pulsegood.log
Unfortunately, there is no discernible difference between the two logs. A diff of the log files (with the initial timestamp removed before diffing) is at www.fields.utoronto.ca/~pspencer/pulse.diff. The only changes are
(a) different addresses for things like bus ids
(b) the different tlength and latency values being reported
(c) minor differences (on the order of a few bytes / a few microseconds) in some of the rewinds and volume adjustments.
So I am still at a loss as to why lowering tlength produces smooth audio and raising it causes hiccups and -- presumably -- hardware underruns that aren't getting logged.
- Philip
On 02/17/2012 08:36 PM, Philip Spencer wrote:
It seems you are right ... things are much more complicated than I realized.
Indeed. Welcome :-)
Now; if an application starts playback with an empty buffer (either by calling "trigger" or by setting a low start threshold), should we assume that 1) the buffer is going to stay almost empty all the time, like it seems Flash does in this scenario, or 2) the application will fill it up completely and keep it there (IIRC, mpg123 did this)?
Lowering tlength might be reasonable in scenario 1, but in scenario 2 it will cause the client to block (trying to fill an already filled buffer) in ways that differ from the expected behaviour.
Obviously the plugin needs to work properly for both scenarios -- and indeed Flash itself could be in either scenario, depending on whether it's receiving packets from the network smoothly or in bunches.
This smooth vs bunches thing is something we haven't talked much about, but I guess Flash would have some kind of latency adaptation that tries to determine whether packets are delayed, dropped, or on time, and adjusts latency according to what percentage of packets fall into each category?
I agree that it is unacceptable for the client to block if it's writing fewer bytes than the buffer length it asked for. I did not realize pulseaudio would do that.
At least it does if you use the simple API to access it. The asynchronous API gives you a few options.
I thought that, if there were more than tlength bytes in the buffer, pulseaudio simply would not issue the "I want more data!" callback to the plugin until the length dropped below tlength/2, but that the client was free to write, without blocking, as many bytes as it wished up to maxlength. Apparently I was wrong.
PulseAudio adjusts tlength upwards as it sees it needs to, to avoid underruns. Maxlength only tells PA not to adjust tlength any further than up to maxlength.
Next complication -- when I went to get a pulseaudio log of how the latency rose when only prebuf was lowered, I found I had accidentally been compiling the plugin when configure had been run against an older pulseaudio version. Compiling it against pulseaudio 1.1 caused everything to break again -- samples coming with no latency (good) but with part of each sample cut off (very bad). I realized this is because the plugin is now setting handle_underrun to 1 by default. With handle_underrun turned off, I get the behaviour I saw before: works perfectly with tlength lowered, but latency jumps up to 500ms when tlength=500ms, even with prebuf lowered.
Putting some trace statements into the plugin code, I see the following sequence of events is happening:
1. Alsa initialized with buffer = 500ms, io period = 20ms, start threshold = 20ms, 16 kHz audio, 2byte samples, 1 channel.
2. Flash writes 640 bytes (320 samples, 20ms) and these are sent by the plugin to pulseaudio.
3. An *explicit* stream start is then called (so in this case the prebuf setting is actually totally irrelevant!)
4. 11 milliseconds later, the plugin gets an underrun callback from PulseAudio. (Note that at this point only half the sample will actually have been played, so this is definitely connected to the "false alarm" underrun discussions you pointed me to).
5. If the plugin's underrun handling is enabled, this causes the whole stream to be closed and re-opened. Unplayed samples are lost, but the next packet plays at exactly the right time. Result: in-sync, but unacceptably distorted, audio.
6. If the plugin's underrun handling is disabled, nothing further happens. Flash writes the next 640 bytes about 16ms after its first packet was written (so well before the first 20ms have finished actually playing).
I think the 4 ms margin here is quite tight; given that the threads we're working with are normal priority. A small thing in the background, like fetching new email or checking distribution updates or maybe even flushing cache to disk can cause an underrun; regardless of whether you're running PulseAudio or not.
- From then on in, there are *NO* underruns reported back to the plugin
from pulseaudio.
- However, I hear several brief hiccups in playback, and the pulseaudio
latency very quickly increases to 500ms.
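The byte counts in the trace follow directly from the stream parameters. A minimal sketch of the arithmetic, where the helper names `ms_to_frames` and `ms_to_bytes` are illustrative only and are not ALSA or PulseAudio API:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helpers; these names are NOT part of ALSA or PulseAudio. */

/* Number of frames played in `ms` milliseconds at `rate_hz`. */
static size_t ms_to_frames(unsigned ms, unsigned rate_hz)
{
    return (size_t)ms * rate_hz / 1000;
}

/* Number of bytes needed for `ms` milliseconds of interleaved PCM. */
static size_t ms_to_bytes(unsigned ms, unsigned rate_hz,
                          unsigned bytes_per_sample, unsigned channels)
{
    assert(bytes_per_sample > 0 && channels > 0);
    return ms_to_frames(ms, rate_hz) * bytes_per_sample * channels;
}
```

For 16 kHz, 2-byte, mono audio, a 20ms period is 320 frames or 640 bytes, and the 500ms buffer is 16000 bytes, i.e. 25 periods.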
Do 7 and 8 apply both with underrun handling enabled and with it disabled? And if there are no underruns reported, how come every sample is cut off...?
I assume, then, that the tlength value also influences the buffering between pulseaudio and the hardware sink, and it is this that is underrunning and causing pulseaudio to raise the latency when it is much larger than the fill level at which the client keeps the buffer.
The buffer parameters affect the buffering between PulseAudio and the hardware sink, yes.
What bothers me is that I don't see either type of underrun in the logs you provided. Underrun on PA - hardware looks something like:
"alsa-sink.c: Underrun!"
Underrun on client - PA looks something like:
"protocol-native.c: Underrun on '<name>', 0 bytes in queue"
There's something fishy here...
This means that something probably needs to be fixed in pulseaudio. For the plugin, there are probably some things that could be changed but they won't help me directly:
- Definitely prebuf should be lowered if the client requests an explicit start threshold, but this fix won't have any impact on my situation since the stream is already being started explicitly.
- Something should be done about that early false-alarm underrun. Maybe the pulse_start routine should not do anything if fewer than n IO periods' worth of data has been written yet (where n is to be determined), just set a flag and have the start retried after enough data is written? Or maybe an underrun detected when fewer than n IO periods' worth of data have been sent should be silently ignored?
(If no such fix is made, though, I can easily work around things by arranging my data stream so that Flash player always gets sound packets in pairs and that will avoid this underrun. Indeed, I note that the ffmpeg encoder by default always stuffs two Speex packets together into a single packet and notes that Flash Player doesn't play it properly otherwise; I wonder if this is why?)
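The first idea above (postponing the start until n IO periods have been written) could look roughly like the following. This is only a sketch of the logic: the struct and function names are hypothetical and do not come from the plugin source, and the value of n is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state; none of these names exist in the real plugin. */
struct deferred_start {
    size_t period_bytes;   /* size of one IO period in bytes */
    unsigned min_periods;  /* "n": periods required before starting */
    size_t written_bytes;  /* bytes written since the stream was prepared */
    bool start_pending;    /* a start was requested but postponed */
    bool running;          /* the stream has actually been started */
};

static bool enough_data(const struct deferred_start *s)
{
    return s->written_bytes >= (size_t)s->min_periods * s->period_bytes;
}

/* Called when the application asks for an explicit start. */
static void request_start(struct deferred_start *s)
{
    assert(s->period_bytes > 0);
    if (s->running)
        return;
    if (enough_data(s))
        s->running = true;
    else
        s->start_pending = true;  /* retried after the next write */
}

/* Called after each successful write of `bytes` bytes. */
static void note_write(struct deferred_start *s, size_t bytes)
{
    s->written_bytes += bytes;
    if (s->start_pending && enough_data(s)) {
        s->start_pending = false;
        s->running = true;
    }
}
```

With n = 2 and 640-byte periods, a start requested after the first packet is held back and takes effect as soon as the second packet has been written.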
Now I guess it's time to dig in to pulseaudio, and see whether (a) it can be changed so that the client is free to write more than tlength bytes without blocking, or (b) the buffering between pulseaudio and its hardware sink can be adjusted to use prebuf or minreq or something when the stream is started early.
I don't recommend trying (a). I think a lot of applications, including those using alsa-plugins, depend on blocking to work correctly. As for (b): the person who wrote all this, Lennart Poettering, stopped working on PulseAudio two years ago to work on systemd instead. Those of us who are working on it now are still trying to understand all the details and consequences of the buffering. You're welcome to try, but don't expect it to be easy :-)
So this is the usual problem with closed source applications: no application is bug free, but when the closed source application is broken, it's much harder to fix it. The sad reality is that by depending on a closed source application, you're putting yourself in this situation.
I agree wholeheartedly about the drawback with closed source applications. Unfortunately we don't really have any choice: we need something that can work for anybody who has a browser and webcam, without the need for them to install any special software from us, and until HTML5's device access (camera/microphone) and WebSocket APIs have matured and become available on a majority of browsers, Flash Player is pretty much the only choice. I do look forward, though, to being able to ditch it in favour of an all-in-browser, HTML5 solution in a few years once that technology is ready.
Besides, I really don't think there are any bugs in Flash here. It has to try to deliver audio packets to the sound system as soon as they are received and decoded from the network. It can't set a small IO buffer in case it gets a large bunch of packets at once.
It could set a small IO buffer, buffer the packets internally in Flash, and pump them out every 20 ms when PA asks for more data.
It has to set an early playback start threshold to minimize latency. So I really don't see what else it can do (except maybe waiting until it sent the second packet to start the stream, instead of starting it after only the first) -- do you have any thoughts about what it could/should have been doing differently?
I would go for the small buffer *and* have an almost filled buffer when I start playback, with at least two periods filled. You could also consider changing buffer sizes when circumstances change.
But this is IMO a grey zone of what is wrong and right. I believe we could and should do things in PulseAudio as well to make it handle almost-empty client buffers better, and to not send underrun reports until we're sure all samples have been played back.
Anyway, I will do some more digging into PulseAudio (and try to get a detailed log of what's happening).
Somewhat simplified: protocol-native.c would be the most interesting file for the client-PA connection, and alsa-sink.c would be the most interesting file for the PA-hardware connection. IIRC, there is a DEBUG_TIMING conditional that might be interesting to turn on, and several commented out logs in protocol-native.c as well.
Thank you for the reply! (I've not quoted below because it was getting too cumbersome and the contents of this message are somewhat different from the previous discussions).
I have now dug into exactly what's happening by putting trace statements in all the alsa calls, and it turns out to be something very different from what I thought. It isn't pulseaudio at all that's introducing the extra latency, nor are underruns happening there.
What's happening is this:
- Flash Player's understanding of ALSA polling and snd_pcm_wait is different from how ALSA actually behaves; it expects to be able to use these to wake up at the time of the next IO period.
- After putting the first few packets into the buffer (each packet has 20ms of samples, and 20ms is the IO period length), it waits about 16.4 ms then calls snd_pcm_wait.
- It expects the call to return at the start of the next I/O period (when the first 20ms worth of data have been played) and then writes the next packet, which by then it will have received from the network.
- However, that's not how ALSA works. If the buffer isn't full, polling returns immediately, instead of waiting for the next I/O period.
- Flash player then thinks playback is happening at an accelerated rate. The first time it probably has a packet or two buffered up internally so it sends those packets, but when these run out it inserts extra packets of silence and then starts trying to resample the incoming data to match what it thinks is a faster playback rate, resulting in extra packets being passed to ALSA: a 20ms packet is written every 16.4ms.
- Eventually (after about 2.5 seconds) these extra packets have filled up the buffer and then Flash Player's snd_pcm_wait calls actually block until the next I/O period. At this point playback stabilizes but with a 500ms latency due to the now-full buffer.
This presumably happens with other ALSA backends too, not just the pulseaudio plugin.
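The time it takes for the buffer to fill can be estimated with an idealised model: the device drains audio in real time while the application writes one 20ms packet every 16.4ms, so each write cycle leaves 3.6ms of extra audio queued. The function below is only a sketch under those assumptions (its name is illustrative, and it ignores the start-up phase and any resampling):

```c
#include <assert.h>

/* Idealised model; all quantities are microseconds of audio or time.
 * Returns the time at which the next packet no longer fits in the buffer,
 * i.e. the point where a write (or wait) would start to block. */
static long time_until_full_us(long buffer_us, long packet_us,
                               long write_interval_us)
{
    assert(packet_us > 0 && write_interval_us > 0);
    assert(buffer_us >= packet_us);
    assert(packet_us > write_interval_us);  /* otherwise it never fills */

    long fill_us = 0;  /* audio currently queued in the buffer */
    long now_us = 0;   /* elapsed wall-clock time */

    for (;;) {
        if (fill_us + packet_us > buffer_us)
            return now_us;             /* no room for another packet */
        fill_us += packet_us;          /* application writes one packet */
        now_us += write_interval_us;   /* time passes until the next write */
        fill_us -= write_interval_us;  /* device plays in real time */
    }
}
```

For a 500ms buffer, 20ms packets and a 16.4ms write interval this model gives roughly 2.2 seconds, which is in the same range as the "about 2.5 seconds" seen in the trace.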
If I change the pulseaudio plugin so that:
-- Whenever a write request callback is received from pulseaudio, it records the writable size
-- Whenever writes occur that reduce the writable size by at least an IO period's worth from the value that was recorded, polling is deactivated, and the "avail_min" parameter is adjusted so that the pcm will only be considered "ready" when the available bytes rise by an IO period's amount.
-- When the next write request callback is received (pulseaudio seems to send them whenever an IO period's worth of samples are transferred to the sink), avail_min is reset to its normal value and polling is reactivated
thereby establishing the polling behaviour flash player seems to expect, then it works flawlessly.
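The three rules above amount to a small state machine. The following is a simplified model of that logic, not the actual modification: all names are hypothetical, sizes are in bytes, and the real plugin would have to update its poll descriptor rather than a boolean flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the polling gate; not taken from the plugin. */
struct poll_gate {
    size_t period_bytes;       /* one IO period in bytes */
    size_t normal_avail_min;   /* avail_min requested by the application */
    size_t avail_min;          /* avail_min currently in effect */
    size_t recorded_writable;  /* writable size at the last write request */
    size_t writable;           /* current writable size */
    bool poll_active;          /* whether polling reports "ready" */
};

static void gate_init(struct poll_gate *g, size_t period_bytes,
                      size_t normal_avail_min)
{
    g->period_bytes = period_bytes;
    g->normal_avail_min = normal_avail_min;
    g->avail_min = normal_avail_min;
    g->recorded_writable = 0;
    g->writable = 0;
    g->poll_active = true;
}

/* Write-request callback from the server: record the writable size,
 * restore the normal avail_min and reactivate polling. */
static void on_write_request(struct poll_gate *g, size_t writable_now)
{
    g->recorded_writable = writable_now;
    g->writable = writable_now;
    g->avail_min = g->normal_avail_min;
    g->poll_active = true;
}

/* Application write: once a full period has been consumed since the last
 * write request, stop reporting "ready" until a period has drained. */
static void on_app_write(struct poll_gate *g, size_t bytes)
{
    assert(bytes <= g->writable);
    g->writable -= bytes;
    if (g->recorded_writable - g->writable >= g->period_bytes) {
        g->poll_active = false;
        g->avail_min = g->writable + g->period_bytes;
    }
}
```

In this model an application that writes one period and then polls is held until the next write request arrives, which is the once-per-period wakeup Flash Player appears to expect.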
However, that's probably not how ALSA was intended to behave -- or is it? The documentation is somewhat ambiguous, and it does seem to imply in places that polling can be used to "wake up once every IO period".
What is the "right" behaviour in the following scenario?
1. An app writes samples, at least one I/O period's worth, but not a full buffer's worth.
2. The app then calls snd_pcm_wait, or calls poll or select on the polling file descriptors.
3. What should happen?
(a) The call returns immediately, since there's lots of room in the buffer.
(b) The call returns after one I/O period's worth of data has been drained from the buffer.
If the correct answer is (a) (ALSA's current behaviour), then is there any mechanism for an app to achieve (b) -- get woken up after one I/O period's worth of data have been drained (other than just doing some infinite loop of sleeping then checking snd_pcm_avail periodically)?
If the correct answer is (b), then things would need to be changed in ALSA, but would this break any existing apps that wait or poll before filling up the buffer?
If ALSA were to be changed to behave as (b), then an app can easily achieve the old behaviour (a) by simply doing
if (snd_pcm_avail(...) < ...) snd_pcm_wait(...)
instead of a plain snd_pcm_wait, but I don't see how to easily achieve (b) if ALSA behaves as (a).
Do you, or anyone, have a definitive answer on what the "right" polling behaviour should be, and is ALSA currently doing the right thing or not?
Thanks,
Philip
--------------------------------------------+-------------------------------
Philip Spencer  pspencer@fields.utoronto.ca | Director of Computing Services
Room 336        (416)-348-9710 ext3036      | The Fields Institute for
222 College St, Toronto ON M5T 3J1 Canada   | Research in Mathematical Sciences
On 02/23/2012 03:50 AM, Philip Spencer wrote:
What is the "right" behaviour in the following scenario?
1. An app writes samples, at least one I/O period's worth, but not a full buffer's worth.
2. The app then calls snd_pcm_wait, or calls poll or select on the polling file descriptors.
3. What should happen?
(a) The call returns immediately, since there's lots of room in the buffer.
(b) The call returns after one I/O period's worth of data has been drained from the buffer.
I would say that (a) is the correct one. According to the documentation, snd_pcm_wait says:
/**
 * \brief Wait for a PCM to become ready
 * \param pcm PCM handle
 * \param timeout maximum time in milliseconds to wait,
 *        a negative value means infinity
 * \return a positive value on success otherwise a negative error code
 *         (-EPIPE for the xrun and -ESTRPIPE for the suspended status,
 *         others for general errors)
 * \retval 0 timeout occurred
 * \retval 1 PCM stream is ready for I/O
 */
Since the PCM stream is ready for I/O (the buffer is not full, you can write more data to it), it is correct for it to return.
If the correct answer is (a) (ALSA's current behaviour), then is there any mechanism for an app to achieve (b) -- get woken up after one I/O period's worth of data have been drained (other than just doing some infinite loop of sleeping then checking snd_pcm_avail periodically)?
Good question, I'm not really sure. Well, setting a system timer is obviously more power efficient than checking snd_pcm_avail every ms or so.
If the correct answer is (b), then things would need to be changed in ALSA, but would this break any existing apps that wait or poll before filling up the buffer?
If ALSA were to be changed to behave as (b), then an app can easily achieve the old behaviour (a) by simply doing
if (snd_pcm_avail(...) < ...) snd_pcm_wait(...)
instead of a plain snd_pcm_wait, but I don't see how to easily achieve (b) if ALSA behaves as (a).
Well, changing the behaviour in all applications is not trivial considering the number of applications out there, including closed source ones ;-)
As for Flash, it looks to me that they should reduce the total length of the buffer to the latency they want to achieve. If they need to change the latency while playing back, I don't think there is a way to do that in ALSA without closing and re-opening the stream. If you use the PulseAudio client API directly, you can use pa_stream_set_buffer_attr.
That is, try working with filled buffers and change the buffer length if needed, rather than working with almost empty buffers.
Philip Spencer wrote:
3. What should happen?
(a) The call returns immediately, since there's lots of room in the buffer.
(b) The call returns after one I/O period's worth of data has been drained from the buffer.
(a)
If the correct answer is (a) (ALSA's current behaviour), then is there any mechanism for an app to achieve (b) -- get woken up after one I/O period's worth of data have been drained (other than just doing some infinite loop of sleeping then checking snd_pcm_avail periodically)?
Adjust avail_min (but note that wakeups happen only at period boundaries).
Regards, Clemens
On Thu, 23 Feb 2012, Clemens Ladisch wrote:
Philip Spencer wrote:
3. What should happen?
(a) The call returns immediately, since there's lots of room in the buffer.
(b) The call returns after one I/O period's worth of data has been drained from the buffer.
(a)
If the correct answer is (a) (ALSA's current behaviour), then is there any mechanism for an app to achieve (b) -- get woken up after one I/O period's worth of data have been drained (other than just doing some infinite loop of sleeping then checking snd_pcm_avail periodically)?
Adjust avail_min (but note that wakeups happen only at period boundaries).
That won't work if using the pulseaudio plugin as the backend -- it'll stop the first part of snd_pcm_wait from succeeding right away, but when it moves on to snd_pcm_wait_nocheck the poll call will return right away, because the pulseaudio plugin marks the file descriptor as ready whenever there are more than buffer_attr.minreq bytes available in the buffer.
Perhaps that should be fixed in the pulseaudio plugin. (It won't help me, as I don't imagine I'll have any success getting the Flash Player developers to change anything, but it may help other similar apps.) The changes would be:
- Implement sw_params
- When sw_params is called, check the avail_min value and record it as a new field avail_min (and if it's different from the previous recorded value and the stream is running, call update_active)
- In check_active, instead of "return wsize >= pcm->buffer_attr.minreq", "return wsize >= pcm->avail_min". (Question: should buffer_attr.minreq be changed too, or left alone?)
- At the same time the start_threshold parameter could be implemented too.
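A rough model of the readiness check described in the list above, for comparison with the current one. This is a sketch, not the patch: the struct is a stand-in for the plugin's private state, the field and function names are assumptions, and all sizes are taken to be in bytes (the real sw_params value is in frames and would need converting).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the plugin's private PCM state; names are assumptions. */
struct pulse_pcm_model {
    size_t minreq;     /* stands in for buffer_attr.minreq */
    size_t avail_min;  /* proposed new field, in bytes */
    bool running;      /* whether the stream is currently running */
};

/* Current behaviour: ready as soon as minreq bytes are writable. */
static bool current_check_active(const struct pulse_pcm_model *pcm,
                                 size_t wsize)
{
    assert(pcm != NULL);
    return wsize >= pcm->minreq;
}

/* Proposed behaviour: ready only once avail_min bytes are writable. */
static bool proposed_check_active(const struct pulse_pcm_model *pcm,
                                  size_t wsize)
{
    assert(pcm != NULL);
    return wsize >= pcm->avail_min;
}

/* What a sw_params hook might do: record avail_min and report whether
 * the caller should re-evaluate readiness (i.e. call update_active). */
static bool set_avail_min(struct pulse_pcm_model *pcm, size_t avail_min)
{
    assert(pcm != NULL);
    bool changed = pcm->avail_min != avail_min;
    pcm->avail_min = avail_min;
    return changed && pcm->running;
}
```

With minreq at one period (640 bytes) and avail_min raised to 8000 bytes, 1000 writable bytes count as ready under the current check but not under the proposed one.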
I could whip up a patch for that if needed, although of course it won't actually help my situation -- I may just have to be resigned to using non-live mode with about a 3/4 second latency for Linux users.
Regards,
Philip
participants (3)
- Clemens Ladisch
- David Henningsson
- Philip Spencer