[alsa-devel] underruns and strange code in pcm_rate.c (and patch)
Hi.
I spent 4 week-ends debugging the constant underruns and lock-ups with the libao-based programs. Now I've finally tracked the problem to some strange code in pcm_rate.c, namely snd_pcm_rate_poll_descriptors(). I am not sure what was the reason behind altering the "avail_min" every time. Looks like some heuristic was intended, say "see what amount of fragment part was written now, and make sure that amount of free space will be available next time, after poll". Whatever, this code gives underruns. It can increase avail_min by almost period_size. libao sets avail_min==period_size, so we can end up with avail_min of almost period_size*2. libao sets buffer_size==stop_threshold==period_size*2. So we end up with avail_min~=stop_threshold, which gives underruns. Since I don't know how this code was intended to work, I just removed it, and everything works perfectly now (also had to patch libao to check the returned rate).
The patch is attached, any comments?
Signed-off-by: Stas Sergeev stsp@aknet.ru
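The arithmetic behind the report can be written out as a small sketch. It is not the pcm_rate.c code; it only assumes, as the message describes, that the adjustment can add anything from 0 up to period_size - 1 frames to the avail_min requested by the application. The period size of 1024 frames is a hypothetical value chosen for illustration.

```python
def avail_min_range(period_size, orig_avail_min):
    """Smallest and largest avail_min under the assumed model:
    the adjustment adds between 0 and period_size - 1 frames."""
    return orig_avail_min, orig_avail_min + period_size - 1


period_size = 1024                # hypothetical period size, in frames
buffer_size = 2 * period_size     # libao: two periods per buffer
stop_threshold = buffer_size      # libao: stop_threshold == buffer_size

lo, hi = avail_min_range(period_size, orig_avail_min=period_size)
margin = stop_threshold - hi      # frames between wake-up level and underrun level

print("avail_min range:", lo, "to", hi)
print("stop_threshold:", stop_threshold, "worst-case margin:", margin)
```

In the worst case of this model the application is only woken up when the buffer is within one frame of being completely empty, which is the avail_min~=stop_threshold situation described above.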
At Mon, 05 Nov 2007 02:48:52 +0300, Stas Sergeev wrote:
Hi.
I spent 4 week-ends debugging the constant underruns and lock-ups with the libao-based programs. Now I've finally tracked the problem to some strange code in pcm_rate.c, namely snd_pcm_rate_poll_descriptors(). I am not sure what was the reason behind altering the "avail_min" every time. Looks like some heuristic was intended, say "see what amount of fragment part was written now, and make sure that amount of free space will be available next time, after poll". Whatever, this code gives underruns. It can increase avail_min by almost period_size. libao sets avail_min==period_size, so we can end up with avail_min==period_size*2. libao sets buffer_size==stop_threshold==period_size*2.
Depends on the configuration. I thought it's not so as default.
So we end up with avail_min~=stop_threshold, which gives underruns. Since I don't know how this code was intended to work, I just removed it, and everything works perfectly now (also had to patch libao to check the returned rate).
The patch is attached, any comments?
In principle, using rate plugin with two periods doesn't work well in case that the sample rates aren't aligned. It's a design issue. You shouldn't use two periods except for hw. Period.
... but with this attitude, there is no improvement. Let's see more details.
The problem is that the hardware is woken up only at the period size of the slave side. Assume the slave (hardware playback) is running 48kHz and the client (input) is 44.1kHz. When dmix is used, usually the period size = 1024 in the h/w side. Then the period size of the client side is supposed to be 940. Here, note that 940 != 1024 * 44.1 / 48.0 exactly. This rounding causes the drift of wake-up time at each period and the delay is accumulated.
So, even applying your patch, the XRUN problem may occur at some time as long as you use two periods. It can't be fixed without the fundamental change of the irq / poll handling routines in the ALSA driver.
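To make "aligned" versus "unaligned" rates concrete, here is a minimal sketch that converts the slave period of 1024 frames at 48 kHz into the exact client period for a few client rates. The function name is made up for this illustration; only the 1024 / 48000 / 44100 figures come from the explanation above (22050 Hz is the test file rate mentioned later in the thread).

```python
from fractions import Fraction


def client_period(slave_period, client_rate, slave_rate):
    """Exact (possibly fractional) client-side period in frames."""
    return Fraction(slave_period * client_rate, slave_rate)


for rate in (48000, 24000, 44100, 22050):
    exact = client_period(1024, rate, 48000)
    kind = "aligned" if exact.denominator == 1 else "unaligned"
    print(rate, "Hz ->", float(exact), "frames,", kind)
```

For 44100 Hz the exact value is 940.8 frames, so a client period of 940 is off by 0.8 frames at every wake-up; that is the rounding the paragraph above refers to.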
Now, back to the problematic code part in rate plugin. Whether that hack really does any good thing is questionable, indeed.
First, it skips the avail_min adjustment if the app fills the period size. Thus only apps that write an arbitrary amount of data via snd_pcm_writei() trigger this hack. Second, avail_min is usually checked in the irq handler, and thus its resolution is also the period size. It means avail_min + 1 is equivalent to avail_min + period_size.
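The second point can be illustrated with a simplified model. It assumes that the available space is only evaluated at period interrupts and that, at those moments, it is a whole number of periods; real drivers and partially filled buffers are more complicated. The function name is hypothetical.

```python
def effective_wakeup_avail(avail_min, period_size):
    """Free space (in frames) at which a wake-up can first happen,
    assuming avail is only checked in whole-period steps."""
    periods = -(-avail_min // period_size)   # ceiling division
    return periods * period_size


period_size = 1024
for avail_min in (1, 1024, 1025, 2048):
    print(avail_min, "->", effective_wakeup_avail(avail_min, period_size))
```

Under this model avail_min = 1025 and avail_min = 2048 lead to the same wake-up point, which is why a small increase of avail_min can cost a whole period.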
So what we can do better? As a temporary solution, we can get rid of the problematic part, or, at least, add the check whether avail_min comes over stop_threshold. I'm not sure whether any big impact by removing the hack there. Maybe not. But, I feel it's a barren discussion. It's really a design problem. Sigh.
Takashi
Takashi Iwai wrote:
removing the hack there. Maybe not. But, I feel it's a barren discussion. It's really a design problem. Sigh.
Are any steps being taken to change the design so that sample rate conversion works better? The options are:
1) Get the sound card hardware to produce time based interrupts for the rate plugin to use as its period trigger. This is how the old OSS drivers did it.
2) Allow the user application to use different buffer/period sizes than the hardware itself.
3) Try to encourage applications not to use the pcm_rate plugin at all!!! Instead force each application to do its own sample rate conversion to match what the hardware can do.
I favor option (3) myself.
James
Hi.
James Courtier-Dutton wrote:
Are any steps being taken to change the design so that sample rate conversion works better?
Yes, you've seen it, it's in my patch. :) Seriously though, I think that attitude is unfair. I dare to claim that it was not some random hack that just happened to work. But rather a correct and reliable fix.
- allow the user application to use different buffer/period sizes than
the hardware itself.
That's what the rate plugin already allows, it seems.
- Try to encourage applications not to use the pcm_rate plugin at
all!!! Instead force each application to do its own sample rate conversion to match what the hardware can do.
The applications have nothing to do with that plugin I think - it's all in the ALSA config.
At Tue, 06 Nov 2007 20:14:26 +0300, Stas Sergeev wrote:
Hi.
James Courtier-Dutton wrote:
Are any steps being taken to change the design so that sample rate conversion works better?
Yes, you've seen it, it's in my patch. :) Seriously though, I think that attitude is unfair. I dare to claim that it was not some random hack that just happened to work. But rather a correct and reliable fix.
Sorry but the deep problem of the rate plugin cannot be fixed in that way...
- allow the user application to use different buffer/period sizes than
the hardware itself.
That's what the rate plugin already allows, it seems.
Not exactly. The buffer/period sizes of the rate plugin are bound to those of its slave. They cannot be arbitrary values but must be the converted sizes from the sizes of the slave. Allowing arbitrary sizes means the possibility of wake-up at arbitrary timing, which isn't implemented right now.
- Try to encourage applications not to use the pcm_rate plugin at
all!!! Instead force each application to do its own sample rate conversion to match what the hardware can do.
The applications have nothing to do with that plugin I think - it's all in the ALSA config.
ALSA config can disallow it, too, of course. But, I won't call it a solution :)
Takashi
At Tue, 06 Nov 2007 14:47:38 +0000, James Courtier-Dutton wrote:
Takashi Iwai wrote:
removing the hack there. Maybe not. But, I feel it's a barren discussion. It's really a design problem. Sigh.
Are any steps being taken to change the design so that sample rate conversion works better? The options are:
- get the sound card hardware to produce time based interrupts for the
rate plugin to use as its period trigger. This is how the old OSS drivers did it.
It doesn't necessarily have to be a sound card that generates interrupts. It can simply be any timer. We have hrtimer now, so the material to cook is there. We need an exact synchronization, though, for example with the timing correction a la PLL.
- allow the user application to use different buffer/period sizes than
the hardware itself.
This is the best option, together with an arbitrary interrupt source.
- Try to encourage applications not to use the pcm_rate plugin at
all!!! Instead force each application to do its own sample rate conversion to match what the hardware can do.
No, this cannot be accepted in the current situation. It's 100 steps backward. If we really do this, we should get rid of the whole plugins and provide only very bare stuff. That is, it'd be better to re-design the whole ALSA API from scratch.
Takashi
Hi.
Takashi Iwai wrote:
In principle, using rate plugin with two periods doesn't work well in case that the sample rates aren't aligned. It's a design issue. You shouldn't use two periods except for hw. Period.
I understand that and I even tried a hack to increase the buffer_size and stop_threshold by the period_size in a rate plugin, although without a success. However, there is one exception: it will work provided the app is allowed to fill both fragments entirely. Without that patch, it simply can't. With the patch - the write returns earlier, the app writes again, the rate plugin gets the period filled, and commits it, before the first one expired. Isn't it exactly the right and reliable fix then?
Here, note that 940 != 1024 * 44.1 / 48.0 exactly. This rounding causes the drift of wake-up time at each period and the delay is accumulated.
I understand, but the app can start writing _third_ period if write returns earlier. And that write will fill up the remainder of the second one, no matter how it accumulated.
So, even applying your patch, the XRUN problem may occur at some time as long as you use two periods. It can't be fixed without the fundamental change of the irq / poll handling routines in the ALSA driver.
I am more inclined to think there is a race - except for the underruns, I've also seen lockups in poll(). Do you think the race I described in the previous posting doesn't exist?
Whether that hack really does any good thing is questionable, indeed.
Actually, I've found out that it masks many other bugs. For example, mpg123 sets avail_min=1, which is silly but should work. With the hack, the avail_min just gets increased. Without the hack - snd_pcm_write_areas() spins in a loop trying to commit the samples one-by-one (because the poll returns immediately), eating 100% of CPU. But that's an unrelated bug, which was just hidden, and I think there are more...
First, it skips the avail_min adjustment if the app fills the period size. Thus only for apps that fills arbitrary amount of data via snd_pcm_writei() triggers this hack.
I don't think so. Because of the rounding errors in a rate plugin, when the app thinks it writes the entire fragment, it actually does not. Any app is affected!
Second, avail_min is checked usually in irq handler and thus its resolution is also in period size. It means avail_min + 1 is equivalent with avail_min + period_size.
Yes, so don't increase it then. :)
So what we can do better? As a temporary solution, we can get rid of the problematic part, or, at least, add the check whether avail_min comes over stop_threshold. I'm not sure whether any big impact by removing the hack there. Maybe not. But, I feel it's a barren discussion. It's really a design problem. Sigh.
I think allowing an app to start writing the third fragment, filling actually the second one, is a very good fix. What problems can you see with it?
At Tue, 06 Nov 2007 19:10:28 +0300, Stas Sergeev wrote:
Hi.
Takashi Iwai wrote:
In principle, using rate plugin with two periods doesn't work well in case that the sample rates aren't aligned. It's a design issue. You shouldn't use two periods except for hw. Period.
I understand that and I even tried a hack to increase the buffer_size and stop_threshold by the period_size in a rate plugin, although without a success. However, there is one exception: it will work provided the app is allowed to fill both fragments entirely. Without that patch, it simply can't. With the patch - the write returns earlier, the app writes again, the rate plugin gets the period filled, and commits it, before the first one expired. Isn't it exactly the right and reliable fix then?
I don't understand your description well. Could you give a simple test code to prove the bug?
Here, note that 940 != 1024 * 44.1 / 48.0 exactly. This rounding causes the drift of wake-up time at each period and the delay is accumulated.
I understand, but the app can start writing _third_ period if write returns earlier. And that write will fill up the remainder of the second one, no matter how it accumulated.
In the case above, it's woken up *LATER* than expected. write/poll doesn't return earlier. Well, write would return earlier, but then it doesn't return the period size; it must return less than that.
So, even applying your patch, the XRUN problem may occur at some time as long as you use two periods. It can't be fixed without the fundamental change of the irq / poll handling routines in the ALSA driver.
I am more inclined to think there is a race - except for the underruns, I've also seen lockups in poll(). Do you think the race I described in the previous posting doesn't exist?
I don't think it's a race. It appears to be just the matter of configuration mismatch (or misconception) to me.
Whether that hack really does any good thing is questionable, indeed.
Actually, I've found out that it masks many other bugs. For example, mpg123 sets avail_min=1, which is silly but should work. With the hack, the avail_min just gets increased. Without the hack - snd_pcm_write_areas() spins in a loop trying to commit the samples one-by-one (because the poll returns immediately), eating 100% of CPU. But that's an unrelated bug, which was just hidden, and I think there are more...
Possibly. Changing avail_min isn't a good idea, I believe, too.
First, it skips the avail_min adjustment if the app fills the period size. Thus only for apps that fills arbitrary amount of data via snd_pcm_writei() triggers this hack.
I don't think so.
It is. It has the condition "avail_min > 0", where avail_min is computed from appl_ptr % period_size. Thus, if you always fill the period_size and call poll, this hack is always skipped.
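The condition can be sketched as follows. This only models the "appl_ptr % period_size > 0" test quoted above, not the rest of the plugin; the function name and the write sizes are hypothetical.

```python
def hack_triggers(write_sizes, period_size):
    """For each write, report whether the remainder test
    (appl_ptr % period_size > 0) would be true afterwards."""
    appl_ptr = 0
    hits = []
    for frames in write_sizes:
        appl_ptr += frames
        hits.append(appl_ptr % period_size > 0)
    return hits


# An app that always writes whole periods never triggers the adjustment.
print(hack_triggers([940, 940, 940, 940], period_size=940))
# An app that writes arbitrary chunks triggers it whenever it stops mid-period.
print(hack_triggers([500, 440, 500], period_size=940))
```

Whether an app really ends up on a period boundary also depends on how many frames each write actually accepts, which is the point disputed in the following replies.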
Because of the rounding errors in a rate plugin, when the app thinks it writes the entire fragment, it actually does not. Any app is affected!
The app must not think that the whole fragment can be written at a single call. That's the bug of the app!
Second, avail_min is checked usually in irq handler and thus its resolution is also in period size. It means avail_min + 1 is equivalent with avail_min + period_size.
Yes, so don't increase it then. :)
So what we can do better? As a temporary solution, we can get rid of the problematic part, or, at least, add the check whether avail_min comes over stop_threshold. I'm not sure whether any big impact by removing the hack there. Maybe not. But, I feel it's a barren discussion. It's really a design problem. Sigh.
I think allowing an app to start writing the third fragment, filling actually the second one, is a very good fix. What problems can you see with it?
The problem is that two periods with the current rate plugin cannot work. Suppose the case of client period size 940, slave period size 1024. The wake up happens at each slave period size, i.e. slave pointer at 1024, 2048, 3072, ... It corresponds to (almost) client pointer at 940, 1881, 2822. At each wakeup point, app is notified that one period gets empty (and one another is being played). Now, you see the wake up is delayed more and more from the expected point, 940, 1880, 2820, ... Hence, at some time, the delay will reach the period size. The app is notified that *one* period is empty but at this point *both* periods are empty, that is, buffer underrun.
So, without another wake-up source, the problem cannot be solved.
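As a rough sketch of how long this takes with the numbers above: the wake-up falls 0.8 client frames further behind each period, so the delay reaches one client period (940 frames) after a fixed number of wake-ups. This is a simplified model of the explanation, ignoring scheduling latency and anything the app might do to catch up; the function name is invented for illustration.

```python
import math
from fractions import Fraction


def wakeups_until_underrun(slave_period, client_period, client_rate, slave_rate):
    """Number of slave-period wake-ups until the accumulated delay
    reaches one client period; None if the rates are aligned."""
    exact = Fraction(slave_period * client_rate, slave_rate)
    drift = exact - client_period      # extra client frames of delay per wake-up
    if drift <= 0:
        return None
    return math.ceil(Fraction(client_period) / drift)


n = wakeups_until_underrun(1024, 940, 44100, 48000)
seconds = n * 1024 / 48000             # each wake-up is one slave period apart
print(n, "wake-ups, roughly", round(seconds, 1), "seconds of playback")
```

With these figures the model gives 1175 wake-ups, i.e. about 25 seconds of playback before the delay equals a full period.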
Takashi
Hello.
Takashi Iwai wrote:
I don't understand your description well. Could you give a simple test code to prove the bug?
No, it's not like that. I am not writing any prog. What I am trying to point out, is that _any_ program is affected. My whole desktop is pretty much "speechless" these days. Well, a few progs, like mplayer, are unaffected (I guess they do not trust "default" and use "hw"), but most of anything else is broken. (and as a most frequent test-case I use ogg123 while debugging) Well, maybe it's an asound.conf issue - I'll attach my asound.conf... attached. Maybe it's a driver issue - I was using snd-pcsp before and I don't remember any problems. But now I do use snd-intel8x0, and the results are much worse. And this will remain so until I get an x86_64 port of snd-pcsp, and in the meantime I thought it would be a good idea to get working also that. :) Jokes aside, the difference is that snd-intel8x0 accepts only 48000, while snd-pcsp used to accept some other rate.
I'll answer the technical questions in a separate e-mail. I'll try to collect more precise info and examples.
# Generated by system-config-soundcard, do not edit by hand
defaults.pcm.card 0
pcm.!default {
    type plug
    slave.pcm "svol"
}

pcm.svol {
    type softvol
    slave.pcm "swmix"
    control.name "SoftVol Playback Volume"
    control.card 0
    min_dB -0.1
    max_dB 10.0
}

pcm.swmix {
    type dmix
    ipc_key 1111
    slave.pcm "hw:0,0"
    slave.rate 48000
}
At Wed, 07 Nov 2007 20:16:36 +0300, Stas Sergeev wrote:
Hello.
Takashi Iwai wrote:
I don't understand your description well. Could you give a simple test code to prove the bug?
No, it's not like that. I am not writing any prog. What I am trying to point out, is that _any_ program is affected. My whole desktop is pretty much "speechless" these days. Well, a few progs, like mplayer, are unaffected (I guess they do not trust "default" and use "hw"), but most of anything else is broken. (and as a most frequent test-case I use ogg123 while debugging)
Well, I'm using ogg123 with dmix but it works fine. That's why I'm wondering why you got so *many* problems.
And, to fix the problem, a test case is always helpful.
Well, maybe it's an asound.conf issue - I'll attach my asound.conf... attached. Maybe it's a driver issue - I was using snd-pcsp before and I don't remember any problems. But now I do use snd-intel8x0, and the results are much worse.
Interesting. I don't get such a problem with intel8x0 with the standard alsa config with dmix (the default "default")...
Or, possibly wrapping with softvol is the problem? In the default configuration, softvol is used only when "PCM Playback Volume" doesn't exist.
And this will remain so until I get an x86_64 port of snd-pcsp, and in a mean time I thought it would be a good idea to get working also that. :) Jokes aside, the difference is that snd-intel8x0 accepts only 48000, while snd-pcsp used to accept some other rate. I'll answer the technical questions in a separate e-mail. I'll try to collect more precise info and examples.
Thanks.
Takashi
Hi.
Takashi Iwai wrote:
Well, I'm using ogg123 with dmix but it works fine. That's why I'm wondering why you got so *many* problems.
Honestly, I don't want to search an answer to this. :) Now, when I have located all, or most of, the bugs I needed to locate, this question concerns me not too much. :)
And, to fix the problem, a test case is always helpful.
I don't think the (code) test-case will do the trick. I have "anything" as a code test-case, but for you that test-case doesn't work. The only remaining thing I can think of, is asound.conf, which you now have of mine and can try.
Or, possibly wrapping with softvol is the problem?
No, it is not (tried before, tried now once again). I am quite sure in that I have located the problem(s) correctly. So you may see some reluctance, but frankly, I don't know what kind of a test-case I can provide, and why. Write some prog with libao? But it won't be any better than ogg123...
At Wed, 07 Nov 2007 21:52:46 +0300, Stas Sergeev wrote:
Hi.
Takashi Iwai wrote:
Well, I'm using ogg123 with dmix but it works fine. That's why I'm wondering why you got so *many* problems.
Honestly, I don't want to search an answer to this. :) Now, when I have located all, or most of, the bugs I needed to locate, this question concerns me not too much. :)
I'm concerned much because I cannot reproduce the bug here (underrun) in the practical use even with the same application and driver.
And, to fix the problem, a test case is always helpful.
I don't think the (code) test-case will do the trick. I have "anything" as a code test-case, but for you that test-case doesn't work. The only remaining thing I can think of, is asound.conf, which you now have of mine and can try.
Oh a testcase is damn useful! It's far more useful than 100 words of human texts. It's especially because we have the situation that I cannot reproduce the bug practically but only you do. With a test case, we both will be able to check in detail. OK?
Or, possibly wrapping with softvol is the problem?
No, it is not (tried before, tried now once again). I am quite sure in that I have located the problem(s) correctly. So you may see some reluctance, but frankly, I don't know what kind of a test-case I can provide, and why. Write some prog with libao? But it won't be any better than ogg123...
What we need is: a simulation of libao, stripped down to the same call sequence and configuration, that reliably triggers XRUN.
Note that the code in libao has been changed much from version to version. Your system is not my system, your libao might be different from mine, your ogg123 might be different from mine, and so on.
Takashi
Hello.
Takashi Iwai wrote:
I'm concerned much because I cannot reproduce the bug here (underrun) in the practical use even with the same application and driver.
Have you already tried my asound.conf? Also, I use the 22050Hz ogg file for testing - maybe that makes some difference, so please try the 22050Hz ogg file. And I explicitly tell "ogg123 -d alsa09", or it will apply to some sound server.
cannot reproduce the bug practically but only you do.
No, it's not only me. As I already said, I've got a positive reply from the guy saying that it fixes portaudio+espeak. I can also try mailing the guy that raised that problem here before. But unfortunately people prefer to mail privately, so you'll have to believe me on the others' feedback.
With a test case, we both will be able to check in detail. OK?
OK, but first please try my asound.conf with 22050Hz ogg, and maybe that will rescue me from writing a test-case which won't work on your system most probably anyway...
At Thu, 08 Nov 2007 11:09:57 +0300, Stas Sergeev wrote:
Hello.
Takashi Iwai wrote:
I'm concerned much because I cannot reproduce the bug here (underrun) in the practical use even with the same application and driver.
Have you already tried my asound.conf? Also, I use the 22050Hz ogg file for testing - maybe that makes some difference, so please try the 22050Hz ogg file. And I explicitly tell "ogg123 -d alsa09", or it will apply to some sound server.
Yeah I've seen the problem, but as I wrote, it's basically because you use periods=2 and unaligned sample rates. So, of course, it doesn't work.
For example, set
slave.periods 3
slave.period_size 4096
then you'll hear the improvement, I guess.
cannot reproduce the bug practically but only you do.
No, it's not only me. As I already said, I've got a positive reply from the guy saying that it fixes portaudio+espeak. I can also try mailing the guy that raised that problem here before. But unfortunately people prefer to mail privately, so you'll have to believe me on the others' feedback.
"You" can be plural :)
With a test case, we both will be able to check in detail. OK?
OK, but first please try my asound.conf with 22050Hz ogg, and maybe that will rescue me from writing a test-case which won't work on your system most probably anyway...
A testcase is always a good thing. At least, it helps to understand and spot the problem very much. This is awfully important. Sometimes (like this case), more useful than a patch.
If it can be easily reproducible via only apps included in alsa-utils, then it's fine. We have the known code logic and can trace the flow. But, when the thing goes through a different layer and a different app, it's pretty hard to follow.
thanks,
Takashi
Hello.
Takashi Iwai wrote:
Yeah I've seen the problem,
Grrr... so you reproduced the problem already, right?
but as I wrote, it's basically because you use periods=2 and unaligned sample rates.
I do not - maybe libao does, but in fact I am sure also the sound servers have the same problem.
So, of course, it doesn't work.
It does - have you tried my patch?
slave.periods 3 slave.period_size 4096 then you'll hear the improvement, I guess.
So you already tried everything - then why asking me for the test-case? Anyway. There is a bug. And not the only one. You knew the workaround while I was searching for the fix. Now you decide what is better - fix it or apply to workarounds. Btw, I can still reproduce the underrun by rapidly switching the consoles, even though now it is harder to reproduce.
At Thu, 08 Nov 2007 12:05:21 +0300, Stas Sergeev wrote:
Hello.
Takashi Iwai wrote:
Yeah I've seen the problem,
Grrr... so you reproduced the problem already, right?
But still not exactly sure whether it's the same as you've got. It's no good testcase after all, you see?
but as I wrote, it's basically because you use periods=2 and unaligned sample rates.
I do not - maybe libao does, but in fact I am sure also the sound servers have the same problem.
You didn't set the slave period_size and periods properly in your configuration. This defines periods=2 eventually.
So, of course, it doesn't work.
It does - have you tried my patch?
It works casually until a certain point. But XRUN shall happen, as I explained. So, the patch helps well but not cures completely as long as you use that configuration.
slave.periods 3 slave.period_size 4096 then you'll hear the improvement, I guess.
So you already tried everything - then why asking me for the test-case?
Because ogg123 is not the good way to understand the bug. It has a middle layer, threading, and so on.
A testcase is the standard way to debug. I don't like XP particularly but some of its concepts are beneficial for normal programming, too...
Anyway. There is a bug. And not the only one. You knew the workaround while I was searching for the fix. Now you decide what is better - fix it or apply to workarounds.
Sigh. A bug is a bug. I know. But, the problem is that the configuration still doesn't work. I'll likely apply your patch soon later, but I need a better way to check.
Btw, I can still reproduce the underrun by rapidly switching the consoles, even though now it is harder to reproduce.
This is utterly another problem, rather than the real-time response issue. Because the realtime responsiveness is important for two period case, slight difference of period/buffer size or its wakeup condition influences greatly on the behavior.
Takashi
Hello.
Takashi Iwai wrote:
But still not exactly sure whether it's the same as you've got. It's no good testcase after all, you see?
OK, I'll see what can be done about a test-case. But it can't be made immediately. Maybe this week-end.
You didn't set the slave period_size and periods properly in your configuration. This defines periods=2 eventually.
That's what the program did, I simply have not made an override.
It works casually until a certain point. But XRUN shall happen, as I explained. So, the patch helps well but not cures completely as long as you use that configuration.
Well, do we agree that at least on the hardware that allows an arbitrary fragment sizes, it fixes the problem completely? If so, then IMHO this is already very good.
Sigh. A bug is a bug. I know. But, the problem is that the configuration still doesn't work.
It will likely work with most cards, right? But I don't have Ye Olde SB16 to test...
This is utterly another problem, rather than the real-time response issue. Because the realtime responsiveness is important for two period case, slight difference of period/buffer size or its wakeup condition influences greatly on the behavior.
As I said, I can reproduce the underrun even after setting 3 periods. This is very suspicious. I need to check for a race.
Yeah, but it's the case of partial writes again, i.e. when apps cannot write a full period size. When apps check its availability via poll, the hack isn't triggered.
But how then it happens that it fixes the portaudio problems? By the way, I still don't see what's the difference. When the poll returns, there is usually a period_size free on slave. But does it mean that the app can write the full period? What if the rate plugin converts the full period of app to the full period of HW + 4 extra bytes? In this case, even after poll, the write may block because of these 4 extra bytes, and then the partial write will happen. Or what am I missing?
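The "extra bytes" question can be checked with a little arithmetic. The sketch below converts a client period to slave frames and shows both rounding directions; which direction the rate plugin actually uses is not stated in this thread, so both are printed. The 4 bytes per frame figure is an assumption (16-bit stereo), and the function name is hypothetical.

```python
import math
from fractions import Fraction

BYTES_PER_FRAME = 4   # assumption: 16-bit samples, 2 channels


def slave_frames(client_frames, client_rate, slave_rate):
    """Slave-side frame count for a client write, rounded down and up."""
    exact = Fraction(client_frames * slave_rate, client_rate)
    return math.floor(exact), math.ceil(exact)


for client_frames in (940, 941):
    down, up = slave_frames(client_frames, 44100, 48000)
    over = max(0, up - 1024) * BYTES_PER_FRAME
    print(client_frames, "client frames ->", down, "or", up,
          "slave frames; up to", over, "bytes over a 1024-frame period")
```

With a client period of 941 frames and rounding up, the converted data is one frame (4 bytes under the assumption above) larger than the slave period, which matches the scenario asked about.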
Takashi Iwai wrote:
For example, set
slave.periods 3
slave.period_size 4096
then you'll hear the improvement, I guess.
Can someone tell me exactly where I'm supposed to put those two lines? Every time I try to understand ALSA configuration files, I just get more confused.
Hello.
Timur Tabi wrote:
For example, set slave.periods 3 slave.period_size 4096 then you'll hear the improvement, I guess.
Can someone tell me exactly where I'm supposed to put those two lines?
/etc/asound.conf for instance, or ~/.asoundrc
Like this:
---
pcm.swmix {
    type dmix
    ipc_key 1111
    slave.pcm "hw:0,0"
    slave.periods 3
    slave.period_size 4096
}
---
Also change the "default" to use it, like this:
---
pcm.!default {
    type plug
    slave.pcm "swmix"
}
---
(that's also where "rate" gets into play - "plug" may use it)
Stas Sergeev wrote:
pcm.swmix {
    type dmix
    ipc_key 1111
    slave.pcm "hw:0,0"
    slave.periods 3
    slave.period_size 4096
}
pcm.!default {
    type plug
    slave.pcm "swmix"
}
Thanks, but this didn't make any difference. How can I tell that my /etc/asound.conf is actually loaded and used?
Stas Sergeev wrote:
Timur Tabi wrote:
How can I tell that my /etc/asound.conf is actually loaded and used?
Write some total crap into it, and if the sound stops working, then it is definitely used. :)
It looks like it's not being used. I also tried ~/.asoundrc and that didn't make a difference either. When is /etc/asound.conf supposed to be loaded? I must be missing something fundamental here. I'm running ALSA 1.0.14 and my libasound is 2.0.0.
On Nov 9, 2007 3:06 PM, Timur Tabi timur@freescale.com wrote:
Stas Sergeev wrote:
Timur Tabi wrote:
How can I tell that my /etc/asound.conf is actually loaded and used?
Write some total crap into it, and if the sound stops working, then it is definitely used. :)
It looks like it's not being used. I also tried ~/.asoundrc and that didn't make a difference either. When is /etc/asound.conf supposed to be loaded? I must be missing something fundamental here. I'm running ALSA 1.0.14 and my libasound is 2.0.0.
Did you install all of libasound properly including the config files in /usr/share/alsa?
How are you testing?
Lee
Lee Revell wrote:
Did you install all of libasound properly including the config files in /usr/share/alsa?
Beats me. I have a /usr/share/alsa that has this:
-rw-r--r-- 1 root root 8611 Sep 28 2007 alsa.conf
drwxr-xr-x 3 root root 4096 Sep 28 2007 cards
drwxr-xr-x 2 root root 4096 Sep 28 2007 pcm
-rw-r--r-- 1 root root  132 Sep 28 2007 smixer.conf
-rw-r--r-- 1 root root 3205 Sep 28 2007 sndo-mixer.alisp
But I've never mucked with anything in here. I didn't create the file system I'm using, so I don't know how it was made. How would I know if I'm missing something?
How are you testing?
mplayer -v -ao oss Cars480_1[1].5M.divx
This movie has audio encoded at 44100 Hz, and that's what causes the underrun problem. I've been following this thread, but I'm still not sure what the solution is, if any.
On Nov 9, 2007 3:16 PM, Timur Tabi timur@freescale.com wrote:
mplayer -v -ao oss Cars480_1[1].5M.divx
This movie has audio encoded at 44100 Hz, and that's what causes the underrun problem. I've been following this thread, but I'm still not sure what the solution is, if any.
There's your problem. The in-kernel OSS emulation bypasses ALL of alsa-lib including plugins and config files.
Lee
Lee Revell wrote:
There's your problem. The in-kernel OSS emulation bypasses ALL of alsa-lib including plugins and config files.
Ok, now I'm confused. Isn't this whole thread about underrun problems with OSS?
On Nov 9, 2007 3:33 PM, Timur Tabi timur@freescale.com wrote:
Lee Revell wrote:
There's your problem. The in-kernel OSS emulation bypasses ALL of alsa-lib including plugins and config files.
Ok, now I'm confused. Isn't this whole thread about underrun problems with OSS?
Yes. I don't know why anyone brought up ALSA config files.
Lee
Stas Sergeev wrote:
Hello.
Lee Revell wrote:
Ok, now I'm confused. Isn't this whole thread about underrun problems with OSS?
Yes.
Any evidence of this?
I can only get an underrun if using OSS.
Timur Tabi wrote:
[...]
It is likely that your problem is related to the sample rate converter in the OSS emulation code.
Try disabling CONFIG_SND_PCM_OSS_PLUGINS in the kernel configuration, then mplayer will use its own sample rate converter.
Regards, Clemens
Clemens Ladisch wrote:
Timur Tabi wrote:
[...]
It is likely that your problem is related to the sample rate converter in the OSS emulation code.
Try disabling CONFIG_SND_PCM_OSS_PLUGINS in the kernel configuration, then mplayer will use its own sample rate converter.
Dude, you rock! That fixed it. Looks like the sample rate converter in the OSS code is busted.
Clemens Ladisch wrote:
Timur Tabi wrote:
[...]
It is likely that your problem is related to the sample rate converter in the OSS emulation code.
Try disabling CONFIG_SND_PCM_OSS_PLUGINS in the kernel configuration, then mplayer will use its own sample rate converter.
Has there been any progress made on this issue since November? I have a customer that is affected by it, and I was wondering if the problem is fixed in the latest ALSA.
If not, is there a bug entry in https://bugtrack.alsa-project.org for it? I need to make a formal write-up of this bug, but I don't really understand it that well.
At Thu, 31 Jan 2008 14:49:45 -0600, Timur Tabi wrote:
Clemens Ladisch wrote:
Timur Tabi wrote:
[...]
It is likely that your problem is related to the sample rate converter in the OSS emulation code.
Try disabling CONFIG_SND_PCM_OSS_PLUGINS in the kernel configuration, then mplayer will use its own sample rate converter.
Has there been any progress made on this issue since November? I have a customer that is affected by it, and I was wondering if the problem is fixed in the latest ALSA.
If not, is there a bug entry in https://bugtrack.alsa-project.org for it? I need to make a formal write-up of this bug, but I don't really understand it that well.
The bug report on BTS isn't more effectively handled than on ML...
I don't know whether you tested what Clemens suggested. If it's really a rate converter issue, then it should be possible to reproduce with other systems. First, we need to find out how to reproduce the bug with other drivers such as snd-dummy.
Takashi
Takashi Iwai wrote:
The bug report on BTS isn't more effectively handled than on ML...
True, but at least I have a URL I can point to. That's handy.
I don't know whether you tested what Clemens suggested.
If you mean disabling CONFIG_SND_PCM_OSS_PLUGINS, then yes, doing so does "fix" the problem. That is, I don't get any more underruns, but then the application is forced to do sample rate conversion.
If it's really a rate converter issue, then it should be possible to reproduce with other systems. First, we need to find out how to reproduce the bug with other drivers such as snd-dummy.
It happens on my 8610 driver.
At Fri, 01 Feb 2008 08:46:25 -0600, Timur Tabi wrote:
Takashi Iwai wrote:
The bug report on BTS isn't more effectively handled than on ML...
True, but at least I have a URL I can point to. That's handy.
An ML archive has also a URL :) An advantage of ML is that it can involve developers more easily.
I don't know whether you tested what Clemens suggested.
If you mean disabling CONFIG_SND_PCM_OSS_PLUGINS, then yes, doing so does "fix" the problem. That is, I don't get any more underruns, but then the application is forced to do sample rate conversion.
OK, now the question is in which condition this happens...
If it's really a rate converter issue, then it should be possible to reproduce with other systems. First, we need to find out how to reproduce the bug with other drivers such as snd-dummy.
It happens on my 8610 driver.
Could you rewrite the snd-dummy driver to behave like your driver? As mentioned, it's important to get an environment that reproduces the problem reliably, independent of the hardware.
thanks,
Takashi
Takashi Iwai wrote:
An ML archive has also a URL :) An advantage of ML is that it can involve developers more easily.
I'm not saying that the developers should use BTS instead of ML. I'm saying that if it really is a bug in ALSA, I'd like to have formal recognition of that. An entry in BTS would qualify. A strewn-out discussion on a mailing list has too low a signal-to-noise ratio to be useful to the PHBs.
OK, now the question is in which condition this happens...
I'll try to get an exact testcase, but I think it's pretty much always with "mplayer -ao oss".
Could you rewrite snd-dummy driver to behave like your driver?
Uh, are you kidding? My driver is an ASoC driver. How am I supposed to make a dummy driver "behave" like my driver? My driver isn't doing anything unusual.
As mentioned, it's important to get an environment that reproduces the problem reliably, independent of the hardware.
I understand that, but I don't see how I can do that.
At Fri, 01 Feb 2008 09:18:46 -0600, Timur Tabi wrote:
Takashi Iwai wrote:
An ML archive has also a URL :) An advantage of ML is that it can involve developers more easily.
I'm not saying that the developers should use BTS instead of ML. I'm saying that if it really is a bug in ALSA, I'd like to have formal recognition of that. An entry in BTS would qualify. A strewn-out discussion on a mailing list has too low a signal-to-noise ratio to be useful to the PHBs.
Do as you like. I'm just advising you that you have a better chance of debugging such a problem here than on BTS. BTS is full of **** bug reports that one can't handle sanely.
OK, now the question is in which condition this happens...
I'll try to get an exact testcase, but I think it's pretty much always with "mplayer -ao oss".
Could you rewrite snd-dummy driver to behave like your driver?
Uh, are you kidding?
I'm serious.
My driver is an ASoC driver. How am I supposed to make a dummy driver "behave" like my driver? My driver isn't doing anything unusual.
It's unusual that this problem happens only on your system. And your driver isn't portable to other systems, so we have to find a way to reproduce the bug.
As mentioned, it's important to get an environment that reproduces the problem reliably, independent of the hardware.
I understand that, but I don't see how I can do that.
Simply make the snd_pcm_hardware fields match yours so that we get identical hardware constraints. You'll see many examples in the code.
Takashi
Takashi Iwai wrote:
Simply make the snd_pcm_hardware fields match yours so that we get identical hardware constraints. You'll see many examples in the code.
Alright, I'll add it to my to-do list.
Hi.
Takashi Iwai wrote:
I don't think it's a race. It appears to be just the matter of configuration mismatch (or misconception) to me.
OK then. The race will have to wait till I get to it hard and (in case there really is) produce a patch.
Possibly. Changing avail_min isn't a good idea, I believe, too.
Well, what I was trying to say is that there are a few serious bugs that were hidden by the hack... Never mind, I'll post the patches when the time comes. One step at a time.
size. Thus only for apps that fills arbitrary amount of data via snd_pcm_writei() triggers this hack.
I don't think so.
It is.
Firstly, it seems all or most apps write the arbitrary amounts.
mpg123:
---
#7 0x00002aaaaad34afa in snd_pcm_mmap_writei (pcm=0x565dd0, buffer=0x5660b0, size=4096) at pcm_mmap.c:186
186 return snd_pcm_write_areas(pcm, areas, 0, size,
(gdb) p pcm->period_size
$1 = 5512
---
writes 4096, while period_size is 5512.
ogg123:
---
#9 0x00002aaab2c48a69 in snd_pcm_writei (pcm=0x673430, buffer=0x2aaab4215108, size=11340) at pcm.c:1186
1186 return _snd_pcm_writei(pcm, buffer, size);
(gdb) p pcm->period_size
$2 = 2756
---
writes 11340 with period_size 2756.
But I don't agree with you even if some app does not. It will still be affected. Let me demonstrate. I added the following:
---
+if (size > pcm->period_size && size % pcm->period_size)
+size -= size % pcm->period_size;
---
to snd_pcm_writei() and snd_pcm_mmap_writei() to simulate the condition when an app transfers by periods. Here we go:
---
#5 0x00002aaab2c5dbc1 in snd_pcm_mmap_writei (pcm=0x672e30, buffer=0x2aaab4215108, size=11024) at pcm_mmap.c:188
188 return snd_pcm_write_areas(pcm, areas, 0, size,
(gdb) p pcm->period_size
$1 = 2756
(gdb) p pcm->period_size*4
$2 = 11024
(gdb) p pcm->period_size*4==size
$3 = 1
---
OK, it is trying to write 4 periods.
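The two-line test change quoted above can be tried in isolation. The following is only a sketch: the function name `trim_to_whole_periods` is hypothetical and does not exist in alsa-lib; it simply reproduces the arithmetic of the quoted lines so the numbers in the trace can be checked.

```c
#include <assert.h>

/* Hypothetical stand-alone version of the two-line test hack quoted above.
 * A request larger than one period that is not a whole number of periods
 * is trimmed to the nearest lower multiple of period_size; anything else
 * is passed through unchanged. */
unsigned long trim_to_whole_periods(unsigned long size, unsigned long period_size)
{
    assert(period_size > 0);
    if (size > period_size && size % period_size)
        size -= size % period_size;
    return size;
}
```

With the ogg123 numbers from the traces, trim_to_whole_periods(11340, 2756) gives 11024, i.e. four periods, which matches the size seen in frame #5. A request smaller than one period (4096 against 5512, as in the mpg123 trace) is left untouched.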
---
Breakpoint 2, snd_pcm_rate_poll_descriptors (pcm=0x672e30, pfds=0x409ffdd0, space=1) at pcm_rate.c:720
720 snd_pcm_rate_t *rate = pcm->private_data;
(gdb) n
724 ret = snd_pcm_generic_poll_descriptors(pcm, pfds, space);
(gdb) n
725 if (ret < 0)
(gdb) n
728 avail_min = rate->appl_ptr % pcm->period_size;
(gdb) n
729 if (avail_min > 0) {
(gdb) n
730 recalc(pcm, &avail_min);
(gdb) p avail_min
$4 = 4
---
See the small remainder? Why, you ask? Here (the ogg has the rate 22050, the driver has 48000):
---
(gdb) p rate->gen.slave->period_size
$10 = 6000
(gdb) p 6000/2756.0
$11 = 2.1770682148040637
(gdb) p 48000/22050.0
$12 = 2.1768707482993195
---
See the slight difference? I think this is exactly why _any_ app is affected, not only those that write arbitrary amounts (even though they are the majority, it seems to me). So let me challenge you on that. :)
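The mismatch between the two ratios printed by gdb comes from truncating the client period to a whole number of frames. A minimal sketch of that arithmetic follows; the function names are illustrative assumptions and are not taken from pcm_rate.c, which may round differently.

```c
#include <assert.h>

/* Client-side period (in frames) that corresponds to one slave period,
 * truncated the way integer division truncates. Illustrative only. */
unsigned long client_period_frames(unsigned long slave_period,
                                   unsigned long client_rate,
                                   unsigned long slave_rate)
{
    assert(slave_rate > 0);
    return (unsigned long)((unsigned long long)slave_period * client_rate / slave_rate);
}

/* What the truncation drops, as a numerator over slave_rate.
 * Zero means the client and slave periods line up exactly. */
unsigned long client_period_leftover(unsigned long slave_period,
                                     unsigned long client_rate,
                                     unsigned long slave_rate)
{
    assert(slave_rate > 0);
    return (unsigned long)((unsigned long long)slave_period * client_rate % slave_rate);
}
```

For a 6000-frame slave period at 48000 Hz and a 22050 Hz client, this gives 2756 frames with a leftover of 12000/48000, so the exact client period would be 2756.25 frames. That quarter frame is why 6000/2756 is not equal to 48000/22050.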
The app must not think that the whole fragment can be written at a single call. That's the bug of the app!
Hmm, I don't think it does, but I don't understand what you mean here. So you say the app should write arbitrary amounts? Well, then my examples above make no sense, but I don't see how it helps.
The problem is that two periods with the current rate plugin cannot work. Suppose the case of client period size 940, slave period size
Thanks for the example, I now understand what fundamental problem you have in mind.
So, without another wake-up source, the problem cannot be solved.
OK, it appears we are talking about two completely different problems. Yes, I now see the fundamental one and I admit my patch doesn't solve it. Nor was it intended to.
So what we have:
1. A fundamental problem with the different fragment sizes. This may (or may not, if the HW period is adjustable) give you underruns.
2. A bug in snd_pcm_rate_poll_descriptors(), which prevents an app from filling the second fragment before the first one is played. This _does_ give underruns, constantly, unavoidably and badly. I have a patch, it needs to be discussed.
3. A bug in snd_pcm_write_areas() I mentioned earlier. I have a patch, will post it when the time comes.
4. A (possible) race in snd_pcm_playback_poll() which is harmless most of the time, but I can get underruns out of it by rapidly switching consoles.
5. Some other bug related to that problem, which gives me a headache, but is still to be located.
And you basically say: "if we have 1, then we are not interested in fixing 2,3 and investigating 4,5". That point of view may exist, but given that 2 gives some 90% of underruns with 1 being only theoretically harmful, I am not satisfied. :)
Sorry for the lengthy posting, I tried to strip it as much as I could.
At Wed, 07 Nov 2007 21:40:39 +0300, Stas Sergeev wrote:
Hi.
Takashi Iwai wrote:
I don't think it's a race. It appears to be just the matter of configuration mismatch (or misconception) to me.
OK then. The race will have to wait till I get to it hard and (in case there really is) produce a patch.
Yeah, a testcase is required to reproduce the bug anyway.
Possibly. Changing avail_min isn't a good idea, I believe, too.
Well, what I was trying to say is that there are a few serious bugs that were hidden by the hack... Never mind, I'll post the patches when the time comes. One step at a time.
OK, thanks.
size. Thus only for apps that fills arbitrary amount of data via snd_pcm_writei() triggers this hack.
I don't think so.
It is.
Firstly, it seems all or most apps write the arbitrary amounts.
Well, I object to that - as far as I know (through the hundreds of SUSE packages I've been maintaining), most of them write the period_size data at once. Writing an arbitrary size is rather uncommon.
mpg123:
#7 0x00002aaaaad34afa in snd_pcm_mmap_writei (pcm=0x565dd0, buffer=0x5660b0, size=4096) at pcm_mmap.c:186
186 return snd_pcm_write_areas(pcm, areas, 0, size,
(gdb) p pcm->period_size
$1 = 5512
writes 4096, while period_size is 5512.
ogg123:
#9 0x00002aaab2c48a69 in snd_pcm_writei (pcm=0x673430, buffer=0x2aaab4215108, size=11340) at pcm.c:1186
1186 return _snd_pcm_writei(pcm, buffer, size);
(gdb) p pcm->period_size
$2 = 2756
writes 11340 with period_size 2756.
Both are for libao. The normal apps are more careful about the write timing and sync with GUI. So they tend to write the data at the time poll permits, instead of dumb write sequence relying on blocking mode. Maybe this is the key to get or avoid the bug.
But I don't agree with you even if some app does not. It will still be affected. Let me demonstrate. I added the following:
+if (size > pcm->period_size && size % pcm->period_size)
+size -= size % pcm->period_size;
to snd_pcm_writei() and snd_pcm_mmap_writei() to simulate the condition when an app transfers by periods. Here we go:
#5 0x00002aaab2c5dbc1 in snd_pcm_mmap_writei (pcm=0x672e30, buffer=0x2aaab4215108, size=11024) at pcm_mmap.c:188
188 return snd_pcm_write_areas(pcm, areas, 0, size,
(gdb) p pcm->period_size
$1 = 2756
(gdb) p pcm->period_size*4
$2 = 11024
(gdb) p pcm->period_size*4==size
$3 = 1
OK, it is trying to write 4 periods.
Breakpoint 2, snd_pcm_rate_poll_descriptors (pcm=0x672e30, pfds=0x409ffdd0, space=1) at pcm_rate.c:720
720 snd_pcm_rate_t *rate = pcm->private_data;
(gdb) n
724 ret = snd_pcm_generic_poll_descriptors(pcm, pfds, space);
(gdb) n
725 if (ret < 0)
(gdb) n
728 avail_min = rate->appl_ptr % pcm->period_size;
(gdb) n
729 if (avail_min > 0) {
(gdb) n
730 recalc(pcm, &avail_min);
(gdb) p avail_min
$4 = 4
See the small remainder? Why, you ask? Here (the ogg has the rate 22050, the driver has 48000):
(gdb) p rate->gen.slave->period_size
$10 = 6000
(gdb) p 6000/2756.0
$11 = 2.1770682148040637
(gdb) p 48000/22050.0
$12 = 2.1768707482993195
See a slight difference? I think this is exactly why _any_ app is affected, not only those that write arbitrary amounts (even though they are a majority, it seems to me). So let me challenge you on that. :)
Then try aplay to play 22.05kHz samples on 48k. You'll see it doesn't happen.
The difference seems to be whether you do simply writei() sequences without checks which causes partial writes and eventually triggers the hack. As mentioned, many apps do check the available spaces before write, so the partial write doesn't happen practically.
The app must not think that the whole fragment can be written at a single call. That's the bug of the app!
Hmm, I don't think it does, but I don't understand what you mean here. So you say the app should write arbitrary amounts? Well, then my examples above make no sense, but I don't see how it helps.
No, what I meant is that the app should always check the return value of the write properly. Even in blocking mode, write *may* return without writing the whole data. Well, it's off-topic right now.
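The advice about checking the return value can be sketched as a retry loop. To keep the sketch self-contained it does not call alsa-lib: `write_fn` is a hypothetical stand-in with the same return convention as snd_pcm_writei() (frames written, or a negative error code), and `capped_writer` is a made-up demo writer that accepts only a limited number of frames per call.

```c
#include <assert.h>

/* Stand-in for snd_pcm_writei(): returns the number of frames accepted
 * (possibly fewer than requested) or a negative error code. */
typedef long (*write_fn)(void *ctx, const short *buf, unsigned long frames);

/* Keep writing until all frames are accepted. Errors are handed back to
 * the caller, who decides how to recover (for example after an underrun). */
long write_all(write_fn wr, void *ctx, const short *buf,
               unsigned long frames, unsigned channels)
{
    unsigned long done = 0;

    assert(channels > 0);
    while (done < frames) {
        long n = wr(ctx, buf + done * channels, frames - done);
        if (n < 0)
            return n;          /* propagate the error code unchanged */
        if (n == 0)
            break;             /* no progress: stop instead of spinning */
        done += (unsigned long)n;
    }
    return (long)done;
}

/* Demo writer: accepts at most *(unsigned long *)ctx frames per call. */
long capped_writer(void *ctx, const short *buf, unsigned long frames)
{
    unsigned long cap = *(unsigned long *)ctx;

    (void)buf;
    return (long)(frames < cap ? frames : cap);
}
```

With a cap of 1000 frames, a 2756-frame request is completed in three calls. Note that the second and third calls start in the middle of a period, which is the loss of alignment raised later in the thread.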
The problem is that two periods with the current rate plugin cannot work. Suppose the case of client period size 940, slave period size
Thanks for the example, I now understand what fundamental problem you have in mind.
So, without another wake-up source, the problem cannot be solved.
OK, it appears we are talking about two completely different problems. Yes, I now see the fundamental one and I admit my patch doesn't solve it. Nor was it intended to.
So what we have:
1. A fundamental problem with the different fragment sizes. This may (or may not, if the HW period is adjustable) give you underruns.
2. A bug in snd_pcm_rate_poll_descriptors(), which prevents an app from filling the second fragment before the first one is played. This _does_ give underruns, constantly, unavoidably and badly. I have a patch, it needs to be discussed.
3. A bug in snd_pcm_write_areas() I mentioned earlier. I have a patch, will post it when the time comes.
4. A (possible) race in snd_pcm_playback_poll() which is harmless most of the time, but I can get underruns out of it by rapidly switching consoles.
5. Some other bug related to that problem, which gives me a headache, but is still to be located.
And you basically say: "if we have 1, then we are not interested in fixing 2,3 and investigating 4,5". That point of view may exist, but given that 2 gives some 90% of underruns with 1 being only theoretically harmful, I am not satisfied. :)
I'm not satisfied because I have no testcase to reproduce the practical bug. The patch looks fine, but I cannot test it because the serious bug doesn't occur to me. OK? That's why I claim a testcase.
thanks,
Takashi
Hello.
Takashi Iwai wrote:
OK then. The race will have to wait till I get to it hard and (in case there really is) produce a patch.
Yeah, a testcase is required to reproduce the bug anyway.
No, the race doesn't have a test-case - it gives underruns only when I rapidly switch the consoles... Anyway, let's wait till I get to it.
Well, I object to that - as far as I know (through the hundreds of SUSE packages I've been maintaining), most of them write the period_size data at once. Writing an arbitrary size is rather uncommon.
OK, thanks for the info - I'll see about enlarging my test suite.
mpg123: ogg123:
Both are for libao.
Hmm, my mpg123 is not from libao, only ogg123 is. And mpg123 therefore doesn't suffer underruns, but it is affected by a few other problems I mentioned.
The normal apps are more careful about the write timing and sync with GUI. So they tend to write the data at the time poll permits, instead of dumb write sequence relying on blocking mode. Maybe this is the key to get or avoid the bug.
OK, I'll try aplay later and post back.
The difference seems to be whether you do simply writei() sequences without checks which causes partial writes and eventually triggers the hack.
No, it's not that simple. Or at least in my theory. :) The rounding error I was pointing at causes the following:
- the app writes a full period.
- the rate plugin converts it to the slave period, which has a different size.
- because of the rounding errors, that slave period is not filled properly - it is either slightly underfilled, or overfilled (and the subsequent period is started).
That triggers the hack. Does this look realistic or not?
As mentioned, many apps do check the available spaces before write, so the partial write doesn't happen practically.
I don't think this will help, but I'll check with aplay later.
No, what I meant is that the app should always check the return value of the write properly. Even in blocking mode, write *may* return without writing the whole data. Well, it's off-topic right now.
I know that, but I can't follow the logic. If write returns earlier, then the app will have to do a partial write next time, so you lose the alignment. So by saying that, you basically say that arbitrary writes are unavoidable, and so I can't follow the logic.
I'm not satisfied because I have no testcase to reproduce the practical bug. The patch looks fine, but I cannot test it because the serious bug doesn't occur to me. OK? That's why I claim a testcase.
I'll try to help with that, but I don't think this will be successful (and you should try my asound.conf first). But we can always ask someone else with the problem to test the patch, right? I don't think there is a shortage of such people - Google makes me believe so.
At Thu, 08 Nov 2007 11:27:42 +0300, Stas Sergeev wrote:
Hello.
Takashi Iwai wrote:
OK then. The race will have to wait till I get to it hard and (in case there really is) produce a patch.
Yeah, a testcase is required to reproduce the bug anyway.
No, the race doesn't have a test-case - it gives underruns only when I rapidly switch the consoles... Anyway, let's wait till I get to it.
The latency caused by console switching is a long-known problem in the kernel. It's likely unrelated to the sound system itself.
Well, I object to that - as far as I know (through the hundreds of SUSE packages I've been maintaining), most of them write the period_size data at once. Writing an arbitrary size is rather uncommon.
OK, thanks for the info - I'll see about enlarging my test suite.
mpg123: ogg123:
Both are for libao.
Hmm, my mpg123 is not from libao, only ogg123 is. And mpg123 therefore doesn't suffer underruns, but it is affected by a few other problems I mentioned.
Ah, OK, I thought it was mpg321. OK, I forgot about mpg123 supporting ALSA.
The normal apps are more careful about the write timing and sync with GUI. So they tend to write the data at the time poll permits, instead of dumb write sequence relying on blocking mode. Maybe this is the key to get or avoid the bug.
OK, I'll try aplay later and post back.
The difference seems to be whether you do simply writei() sequences without checks which causes partial writes and eventually triggers the hack.
No, it's not that simple. Or at least in my theory. :) The rounding error I was pointing at causes the following:
- the app writes a full period.
- the rate plugin converts it to the slave period, which has a different size.
- because of the rounding errors, that slave period is not filled properly - it is either slightly underfilled, or overfilled (and the subsequent period is started).
That triggers the hack. Does this look realistic or not?
Not really. It checks appl_ptr of the client, not hw_ptr. appl_ptr is what app writes. So, as long as app writes only the period size, the hack isn't triggered.
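The condition described here is the modulo test visible at pcm_rate.c:728 in the trace (avail_min = rate->appl_ptr % pcm->period_size). A small simulation, under the simplifying assumption that appl_ptr just advances by the amount written and never wraps at the buffer boundary, shows when that remainder is non-zero. The function names are hypothetical.

```c
#include <assert.h>

/* Non-zero when the client appl_ptr does not sit on a period boundary,
 * i.e. when the remainder checked by the hack is greater than zero. */
int off_period_boundary(unsigned long appl_ptr, unsigned long period_size)
{
    assert(period_size > 0);
    return appl_ptr % period_size != 0;
}

/* Count how many of a run of equal-sized writes leave appl_ptr off a
 * period boundary. Pointer wrap-around is ignored in this sketch. */
unsigned count_off_boundary_writes(unsigned long write_size,
                                   unsigned long period_size,
                                   unsigned writes)
{
    unsigned long appl_ptr = 0;
    unsigned hits = 0;

    for (unsigned i = 0; i < writes; i++) {
        appl_ptr += write_size;   /* appl_ptr follows what the app wrote */
        hits += (unsigned)off_period_boundary(appl_ptr, period_size);
    }
    return hits;
}
```

Writes of exactly one period (2756 into 2756) never leave a remainder, while the mpg123 pattern from the trace (4096 into a period of 5512) leaves one after every write in the first few calls. The sketch does not model the partial writes discussed in the follow-ups.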
As mentioned, many apps do check the available spaces before write, so the partial write doesn't happen practically.
I don't think this will help, but I'll check with aplay later.
No, what I meant is that the app should always check the return value of the write properly. Even in blocking mode, write *may* return without writing the whole data. Well, it's off-topic right now.
I know that, but I can't follow the logic. If write returns earlier, then the app will have to do a partial write next time, so you lose the alignment. So by saying that, you basically say that arbitrary writes are unavoidable, and so I can't follow the logic.
The app can wait via poll until the period size becomes available. In that way, partial write can be avoided practically.
I'm not satisfied because I have no testcase to reproduce the practical bug. The patch looks fine, but I cannot test it because the serious bug doesn't occur to me. OK? That's why I claim a testcase.
I'll try to help with that, but I don't think this will be successful (and you should try my asound.conf first). But we can always ask someone else with the problem to test the patch, right? I don't think there is a shortage of such people - Google makes me believe so.
Sorry, no, testing the patch with specific apps alone isn't enough in this case.
If the problem persists only with two-period cases, the real fix is to change the hw_refine in rate plugin not to allow periods <= 2 for unaligned sample rates, because two periods don't work anyway. Then a non-working configuration wouldn't be allowed instead of jerky sounds.
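The proposed restriction can be expressed as a simple predicate. This is only a sketch of the idea, not the hw_refine code of the rate plugin: the names are hypothetical, and "unaligned" is interpreted here as "one slave period does not convert to a whole number of client frames", which is an assumption.

```c
#include <assert.h>

/* Non-zero when one slave period converts to a whole number of client
 * frames, so no rounding is needed between the two period sizes. */
int rates_aligned(unsigned long slave_period,
                  unsigned long client_rate,
                  unsigned long slave_rate)
{
    assert(slave_rate > 0);
    return ((unsigned long long)slave_period * client_rate) % slave_rate == 0;
}

/* Sketch of the proposed rule: with unaligned rates, refuse any
 * configuration that has two periods or fewer. */
int rate_config_allowed(unsigned periods,
                        unsigned long slave_period,
                        unsigned long client_rate,
                        unsigned long slave_rate)
{
    if (rates_aligned(slave_period, client_rate, slave_rate))
        return 1;
    return periods > 2;
}
```

Under this rule a two-period setup with a 1024-frame slave period, 44100 Hz client and 48000 Hz slave would be rejected, while the same rates with four periods, or a 960-frame slave period (which converts to exactly 882 client frames), would be accepted.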
Takashi
Hello.
Takashi Iwai wrote:
Not really. It checks appl_ptr of the client, not hw_ptr. appl_ptr is what app writes. So, as long as app writes only the period size, the hack isn't triggered.
No, it's not that simple. When the partially filled period on the slave appears, it then gets propagated back. Namely, snd_pcm_write_areas() takes it via snd_pcm_avail_update(). It then adjusts the "frames" accordingly, and then "size" gets infected too. Then the hack triggers.
At Thu, 08 Nov 2007 12:13:29 +0300, Stas Sergeev wrote:
Hello.
Takashi Iwai wrote:
Not really. It checks appl_ptr of the client, not hw_ptr. appl_ptr is what app writes. So, as long as app writes only the period size, the hack isn't triggered.
No, it's not that simple. When the partially filled period on the slave appears, it then gets propagated back. Namely, snd_pcm_write_areas() takes it via snd_pcm_avail_update(). It then adjusts the "frames" accordingly, and then "size" gets infected too. Then the hack triggers.
Yeah, but it's the case of partial writes again, i.e. when apps cannot write a full period size. When apps check its availability via poll, the hack isn't triggered.
Takashi
Takashi Iwai wrote:
The problem is that the hardware is woken up only at the period size of the slave side. Assume the slave (hardware playback) is running 48kHz and the client (input) is 44.1kHz. When dmix is used, usually the period size = 1024 in the h/w side. Then the period size of the client side is supposed to be 940. Here, note that 940 != 1024 * 44.1 / 48.0 exactly. This rounding causes the drift of wake-up time at each period and the delay is accumulated.
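The accumulation described here can be put into numbers. The sketch below assumes the client period is the truncated value (940 for the figures above) and compares it with the exact conversion over many periods; the function name is illustrative and not part of alsa-lib.

```c
#include <assert.h>

/* Client frames by which the truncated per-period size falls behind the
 * exact rate conversion after the given number of slave periods. */
unsigned long long client_drift_frames(unsigned long periods,
                                       unsigned long slave_period,
                                       unsigned long client_rate,
                                       unsigned long slave_rate)
{
    unsigned long long exact_total, per_period;

    assert(slave_rate > 0);
    /* exact number of client frames for all periods, truncated once */
    exact_total = (unsigned long long)periods * slave_period * client_rate / slave_rate;
    /* client frames per period when every period is truncated on its own */
    per_period = (unsigned long long)slave_period * client_rate / slave_rate;
    return exact_total - per_period * periods;
}
```

With a 1024-frame slave period, 48000 Hz on the hardware side and 44100 Hz on the client side, each period loses 0.8 of a client frame. The gap is 4 frames after 5 periods and a full 940-frame client period after 1175 periods.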
Let me propose a solution: let the "rate" plugin return a rate slightly different from the requested one, adjusted based on the period size, so that the mismatch and rounding error doesn't happen (i.e.: use 44062 Hz instead of 44100 in calculations of pointer positions in the above example, and resample the client data from that rate instead of exact 44100). This would not be a regression anyway, as there are cards (e.g., ens1371) that can do 48 kHz only approximately.
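The adjusted rate in this proposal follows directly from the two period sizes. A minimal sketch, with hypothetical function names, of how the 44062 Hz figure and its deviation from the requested rate could be computed:

```c
#include <assert.h>

/* Client rate at which client_period frames last exactly as long as
 * slave_period frames at slave_rate (truncated to whole Hz). */
unsigned long adjusted_client_rate(unsigned long client_period,
                                   unsigned long slave_period,
                                   unsigned long slave_rate)
{
    assert(slave_period > 0);
    return (unsigned long)((unsigned long long)client_period * slave_rate / slave_period);
}

/* Deviation of the adjusted rate from the requested one, in parts per
 * million (negative when the adjusted rate is lower). */
long rate_deviation_ppm(unsigned long requested, unsigned long adjusted)
{
    assert(requested > 0);
    return (long)(((long long)adjusted - (long long)requested) * 1000000LL
                  / (long long)requested);
}
```

For a client period of 940, a slave period of 1024 and a slave rate of 48000 Hz the exact value is 44062.5 Hz, which truncates to the 44062 Hz mentioned above. Against the requested 44100 Hz that is a deviation of roughly 860 ppm.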
Stas Sergeev wrote:
The patch is attached, any comments?
I am actually a bit surprised by the lack of response to this, especially given the existence of threads like this one: http://www.mail-archive.com/alsa-user@lists.sourceforge.net/msg15494.html (btw, can someone please e-mail me the address of Scott Waye, so that I can ask him to test the patch?) and the number of results Google gives for an "alsa underrun" query. I also have one private reply, which says that my patch also fixes portaudio+espeak, but that's all.
That patch may not be correct, but at least I think it is worth some attention. Without the patch, I am (and apparently many other users are) getting a constant stream of underruns and choppy sound from pretty much anything. With the patch, it's almost perfect. Well, it is perfect, but I can still get an underrun by rapid console switching between X and text. And I think I see the source of these underruns too: snd_pcm_wait() checks for avail>=avail_min in userspace, and then proceeds to poll(). snd_pcm_playback_poll() (in kernel) doesn't check for anything and calls poll_wait(). If the fragment was completed (by an irq handler) between these events, this poll_wait() will miss the right time. I guess that the fix is to use wait_for_completion() here, but I am not sure.
Anyway. If this work is ignored now, then I can bet the problems will stay for the next few years or more, until someone else is inspired to start fixing them. Which would probably be a bit disappointing.
Stas Sergeev wrote:
Hi.
I spent 4 week-ends debugging the constant underruns and lock-ups with the libao-based programs.
I think I'm seeing the same problem. When I play certain video files with mplayer and "-ao oss", the video plays much faster than normal, the audio cuts out, and my audio driver receives the STOP command at the end of a period.
See my thread, "Any OSS changes from kernel 2.6.21 to 2.6.23? Something broke.", although I no longer believe that this is a problem only with newer kernels.
Timur Tabi wrote:
I think I'm seeing the same problem. When I play certain video files with mplayer and "-ao oss", the video plays much faster than normal, the audio cuts out, and my audio driver receives the STOP command at the end of a period.
Update: when I tell mplayer to set the sample rate to 48000, the problem appears to go away completely. So it looks like sample rate conversion and OSS don't work well together.
participants (7)
- Alexander E. Patrakov
- Clemens Ladisch
- James Courtier-Dutton
- Lee Revell
- Stas Sergeev
- Takashi Iwai
- Timur Tabi