[alsa-devel] Master Plan on rewinding
Hello.
(TL;DR: nothing really new except the strawman proposal about threads and the note about interaction of variable sample rate with rewindability)
As Takashi Iwai told me that the audio miniconference is for discussion only, and not for presentation of anything, I guess that I need to present the plans and options now. That's why this e-mail. The goal here, besides merely presenting the plan, is to identify points that everyone agrees upon, so that they are not discussed pointlessly at the miniconference. Also, this e-mail serves as a justification for the pending seemingly-destructive work.
First, the status quo. If anyone disagrees with the facts below, please complain loudly, before I make any conclusions!
1. PulseAudio does not call snd_pcm_rewindable(), because for some ALSA plugins it crashed. This crash is completely fixed in alsa-lib 1.0.28, but in some cases snd_pcm_rewindable() still returns wrong results.
2. PulseAudio blindly assumes that it can rewind up to hwbuf_frames - (snd_pcm_avail() + rewind_safeguard) frames. The rewind safeguard is needed due to reasons that I don't completely understand, but one of them is imprecise reporting of the hardware pointer, and another one is that the hardware transfers several bytes at a time, and the bytes we need to overwrite may be already cached by the hardware.
3. On the hw plugin, I could demonstrate two other bugs regarding snd_pcm_rewindable(): stale data and bogus negative return values.
4. There are ALSA plugins like "a52" or the "bluetooth" plugin from old version of bluez that return non-zero for snd_pcm_rewindable() and snd_pcm_rewind, but actually don't rewind, and where rewinding is almost impossible to implement (e.g. because this would seemingly involve unsending of already-sent bluetooth packets). See below why "almost", you need to search for "Alternative strawman proposal".
5. PulseAudio contains a workaround for non-rewindability of the a52 plugin that also happens to apply to other ioplug-based plugins. However, this workaround does not match extplug-based plugins (including "dca"), and my patch extending the workaround has been rejected with the argument that this has to be fixed in ALSA.
6. The rate plugin has been made non-rewindable in alsa-lib 1.0.28 because the old implementation was wrong and no simple fix exists.
7. Issues regarding rewindability of PulseAudio internals such as virtual sinks will be discussed separately.
And now the topics for discussion.
My immediate plan is to fix (3) and, if anyone replies to the rewind safeguard proposal, maybe start fixing (2) from the alsa-lib side.
For all proposals below, unless I say otherwise, I intend to write the required alsa-lib code myself if the proposal is accepted.
=== On the rewind safeguard ===
The consensus is that rewind_safeguard is a workaround for an ALSA bug. This information should come from snd_pcm_rewindable() from the hw plugin. I.e., on a hw device, it should not be equal to snd_pcm_mmap_hw_avail(). While the method of dealing with stale data and negative returns is obvious to me, I am not completely sure about the safeguard. Where should this value come from?
Proposal (credit goes to Raymond Yau): the safeguard should be ideally equal to the granularity of hardware pointer updates. However, we don't know this granularity a priori, and cannot know this because nobody will find out this information for old cards. Since we can't get this, let's instead use the upper bound: the minimum possible period size for the given sample rate, sample format and the number of channels.
On my desktop PC, on snd-hda-intel with analog outputs for S16LE stereo, the granularity is 32 bytes (= 8 samples), and I get the pointer granularity of 64 bytes (=16 samples) over HDMI. The minimum period size is 32 samples in both cases. Thus Raymond's method will overestimate the required safeguard on my cards, but it is still better than no safeguard at all, and less than the hard-coded value in PulseAudio.
On ymfpci, the pointer granularity is 5 ms, thus the value currently hard-coded in PA is insufficient. If I understand the card's limitations correctly, Raymond's method will yield the spot-on result for this card.
For the allegedly existing cards where the pointer granularity is always the same as the period size, Raymond's method will not work. However, I don't know any concrete example of such card (I didn't look at snd-firewire in detail, though). And in any case, my opinion is that this "pointer granularity = period size" limitation on such cards can be treated as a driver bug that needs to be fixed (by someone who knows the driver, not by me). On cards where pointer updates happen only on interrupts, the driver should not configure the card in such a way that one period visible to the userspace corresponds to one interrupt. Instead, it should always configure the card for the minimum possible period size, and report only part of the period interrupts to period_elapsed(). I.e.: userspace asked for 2 periods 64 ms each, but let's configure the card for, say, 64 periods 2 ms each, and use the "extra" interrupts only for updating the pointer.
Hopefully, the above heuristic and driver fix will also let PulseAudio get rid of special treatment of the "batch" cards, simply because they (in the "pointer granularity = period size" definition which is currently used in PulseAudio) will no longer exist.
If the heuristic (or something else) is accepted, then the question remains how to enable the use of snd_pcm_rewindable in PulseAudio. The problem is compatibility with older ALSA versions that, let me remind you, crash if snd_pcm_rewindable is called. I don't have proposals or opinions here. The obvious options are: compile-time alsa-lib version check with fallback to old code on buggy versions, run-time alsa-lib version check with the same fallback, add a (both compile-time and run-time) hard requirement of a fully-fixed alsa-lib version, or do nothing at all.
Possible discussion: alternative heuristics and, as mentioned, migration path in PulseAudio.
=== On non-rewindability of the rate plugin ===
I intend to write a rewindable resampler eventually, but don't have time now. I understand that it is an important task, but issues below (and the dayjob which you can change by offering me a new one) have higher priority for me. However, I want everyone to understand the following point now:
"The resampler has to be written from scratch for the reasons explained in http://permalink.gmane.org/gmane.linux.alsa.devel/122179 , and similar arguments apply to all other kinds of sound processing code that needs history."
For PulseAudio, it is also needed to figure out the desired interaction between variable rate and rewindability. Should rewinding other than "discard everything completely" be allowed at all on variable rate streams when the rewind crosses the sample rate change point? I.e., write 100 samples, change rate, write 50 samples, rewind 100 samples, what should be the resulting rate? Should we special-case small changes vs big ones?
Slightly off-topic, but still a discussion point.
=== On possibly-incomplete rewindability of the file plugin ===
The file plugin has quite hairy code involving the use of the write buffer. I have not verified its correctness or studied the code in detail. Anyway, it looks like it allows rewinding over that buffer only, even though its slave may allow rewinding further. If true, this would make the apparent rewindability useless, as the applications depend on being able to rewind almost everything, and won't like the apparently-random limitation (random because it depends on some obscure "wbuf" implementation detail).
I think that the plugin has to be changed to avoid this limitation if it really exists (which I still need to verify). Namely, I propose keeping a shadow copy of the slave's buffer, and, at the beginning of each API call, querying the slave about its hardware pointer. Only the part of the shadow buffer that corresponds to the distance covered by the slave's hardware pointer since the last call has to be written to the file.
The same applies to recording. As the code is much simpler there, it is obvious that the problem exists: if one rewinds in order to look at the past recorded samples again, the plugin will read further samples from the file instead of supplying the already-looked-at samples. The same solution with the slave hardware pointer should be applicable.
The possible discussion here amounts to suggesting alternative proposals.
=== On bogus rewindability of ladspa and extplug plugins ===
For ioplug, see below, because the situation is different enough.
Neither LADSPA nor snd_pcm_extplug_t have an API that allows to tell the corresponding plugin to forget the last N samples. We cannot fix LADSPA, as it is beyond our control. And we cannot really fix extplug, because this just means moving the problem down one level. External libraries (which are depended upon by the extplug-based plugins) are almost universally not rewindable, are beyond our control, and will not be made rewindable, because rewindability is hard and is not needed by anyone except ALSA and PulseAudio. Still, neither .rewindable not .rewind currently return 0.
So, I'd argue that the only correct result from snd_pcm_rewindable() is 0 for ladspa and extplug plugins. However, right now, they forward the result from the slave. This needs to be fixed, and I have a patch.
With snd_pcm_rewind(), the situation is a bit different. It, obviously, does nothing to tell the LADSPA or extplug plugin that the rewind happened (because it can't). It also performs a rewind on the slave. So, assuming that the rewind succeeds and the DSP algorithm in the plugin uses N samples from the past, we'll end up with the output of the correct duration but with N corrupted samples. Not ideal. But if we don't perform the rewind at all and return 0, we'll end up with regressing PulseAudio: huge latency when applying software volume changes or starting new streams. By setting up the large buffer size, it essentially assumes that any rewind will succeed.
So, my hackish proposal is: make the .rewindable callbacks on ladspa and extplug return 0, keep the current implementation of the .rewind callback for compatibility with the current versions of PulseAudio, document this hack. I already have patch for this, will send out if we agree that the proposal is good.
An alternative proposal (that I would implement if the proposal above is rejected and someone actively confirms this one - but I don't like it) would be to keep the current versions of the .rewindable and .rewind callbacks and add new versions of them (and new user-visible ALSA API) for the situation when imperfect results are unacceptable.
A completely strawman proposal would be to introduce a low-latency thread, see below in the ioplug section.
Same for .forwardable and .forward.
Any other alternatives that I missed?
=== On bogus rewindability of some ioplug-based plugins ===
Ioplug-based plugins currently report rewindability according to the same "appl.ptr - hw.ptr" rule that is used by the hw plugin. However, this is incorrect.
In fact, there are two types of ioplug-based plugins. The jack plugin does not even have a .transfer callback, and does all the work of irrevocably submitting samples to the JACK server in a thread. So, it is almost completely rewindable (with that "almost" being equal to JACK buffer size). Proposal: keep this logic for all plugins without the .transfer callback. Expose the JACK buffer size as the minimum period size. Then the same rule proposed in the section about the hw plugin will work for .rewindable.
Other plugins do their irrevocable work in the .transfer callback and thus are not really rewindable. Proposal: return 0 from .rewindable and rewind ioplug callbacks for all plugins with a .transfer callback (but see the next section for an amendment). Document that the best practice is to have no .transfer callback.
Alternative strawman proposal (speculative): if a .transfer callback exists, call it from an alsa-lib-created thread (not from snd_pcm_writei!) at the last possible moment, using the least possible amount of data (that truly cannot be rewound) in the mmap-style buffer. The new mantra is: .transfer is the thing that moves the hardware pointer. This way, previously non-rewindable plugins could become rewindable. Not sure if this is possible without changing the ABI or possible at all, though - it essentially forces all plugins to declare mmap support. Also not sure what to do with the .pointer callback.
For the strawman proposal, the benefit is a possibility to have dmix on top of a52 (because freewheeling will now work), the downside is that all programs (not only those intending to do rewinds) now pay the cost of the background low-latency thread. Can we avoid this (e.g. with a new flag for snd_pcm_open), so that only programs that intend to do rewinds pay the price? Can this new flag apply to the batch hw drivers? Won't that inflate the test matrix for plugin code?
For discussion: do we implement the first proposal or try the strawman proposal? Any other suggestions regarding ioplug internals? Anyone else has a problem with two completely different interfaces (with and without .transfer) being exposed under the same ioplug framework?
=== On the pulse plugin ===
PulseAudio natively supports rewinding on the wire. However, the ioplug-based plugin doesn't. It pretends that it rewound something, but always passes 0, 0 as the last two parameters to pa_stream_write(). That's bad.
It looks possible (but I haven't checked) to figure out the correct arguments (i.e. to detect the rewinds done before the call to snd_pcm_writei) by looking at the application pointer. This way, it looks possible to support rewinds, but does not look possible to get the correct result (and not the result implied by the circular-buffer model) from the snd_pcm_rewindable() function.
Also, we apply proposals from the previous section, we'll either end up with a plugin that is truly non-rewindable and doesn't pretend to be rewindable, or (with a strawman proposal) a rewindable plugin that forces low latency on the PulseAudio side and causes a lot of wakeups. Both situations are suboptimal.
So, I see no way around adding the .rewind and .rewindable callbacks to snd_pcm_ioplug_callback, and treating the plugins with them as rewindable. I guess .forward and .forwardable should be added, too.
For discussion: do we do the above or declare that ioplug is not really a suitable infrastructure for the pulse plugin? What other alternatives can be proposed?
=== On communication of non-rewindability to the program ===
PulseAudio attempts to use timer-based scheduling and rewinds. It makes a big hardware buffer in expectation that it will be able to rewind - with the exception of a workaround for ioplug-based plugins. It should not set big latency for non-rewindable plugins, but currently has no idea how to get this bit of information. See http://lists.freedesktop.org/archives/pulseaudio-discuss/2014-April/020457.h... for the initial problem statement.
If all of the proposals above are implemented, then we'll be at a point where each ALSA plugin is either almost fully rewindable, or not rewindable at all. So, contrary to the end of http://permalink.gmane.org/gmane.linux.alsa.devel/122191 , I think we need just snd_pcm_hw_params_is_seekable() (or rename it if you wish) with an essentially boolean result.
For discussion:
1. Decide, finally, on this bit of API, so that I can start working.
2. Decide what to do with old alsa-lib versions in PulseAudio. I.e. transition plan.
=== On the programmer expectations ===
(social issue)
Some people, including at least Andrew Eikum and Clemens Ladisch, at least once in the past expressed the opinion that amounts to "any plugin that does not allow random access is, as far as the ALSA API is concerned, buggy" (quoting http://permalink.gmane.org/gmane.linux.alsa.devel/122159 ), i.e. they are maybe asking for the impossible. We need to do something about that. Choices:
1. Document clearly that there exist non-rewindable plugins, and that it is not a bug until someone implements a working time machine.
2. Implement a strawman proposal with the background low-latency thread if it is doable (and I am not sure that it is doable).
3. Remove ioplug, extplug and ladspa as unfixable, fix the rate and adpcm plugins (or remove adpcm). Do something about pulse and jack because they rely on the to-be-removed ioplug infrastructure. Then all remaining plugins will indeed be fully rewindable.
4. Something else?
Sorry for the negative tone here, I know I am bad at formulating and resolving social issues.
=== Conclusion ===
When all the proposals are implemented, we'll have correct implementations of the rewind operation in all plugins (where "return 0" counts as correct), and "return 0" only where it is truly unavoidable. PulseAudio will be able to use all of that, and avoid doing rewinds on non-rewindable plugins. On this positive note, I'd like to finish writing this long e-mail.
On Sun, 2014-09-07 at 21:16 +0600, Alexander E. Patrakov wrote:
=== On non-rewindability of the rate plugin ===
I intend to write a rewindable resampler eventually, but don't have time now. I understand that it is an important task, but issues below (and the dayjob which you can change by offering me a new one) have higher priority for me. However, I want everyone to understand the following point now:
"The resampler has to be written from scratch for the reasons explained in http://permalink.gmane.org/gmane.linux.alsa.devel/122179 , and similar arguments apply to all other kinds of sound processing code that needs history."
For PulseAudio, it is also needed to figure out the desired interaction between variable rate and rewindability. Should rewinding other than "discard everything completely" be allowed at all on variable rate streams when the rewind crosses the sample rate change point? I.e., write 100 samples, change rate, write 50 samples, rewind 100 samples, what should be the resulting rate? Should we special-case small changes vs big ones?
This last paragraph isn't related to the rate plugin, right? So you're talking about PulseAudio internals only? If so, perhaps the best approach would be to make the current stream buffer contents non-rewindable when a rate change occurs, at least until someone points out a real use case where it is important to be able to rewind past rate change points in the buffer. Without any example use cases, I don't feel qualified to answer the question what should happen if a rewind crosses a rate change point (or possibly several!).
Another question is that should we do something to previously buffered stream data when the rate changes. If the audio rate changes completely, e.g. from 44.1 kHz to 8 kHz, any previously buffered audio was probably meant to be played at 44.1 kH, but with the current code it will be played at 8 kHz. I don't know if there are any applications that (ab)use the dynamic rate feature this way, though. Maybe we could just document that the dynamic rate feature is only meant for small adjustments.
08.09.2014 00:38, Tanu Kaskinen wrote:
On Sun, 2014-09-07 at 21:16 +0600, Alexander E. Patrakov wrote:
=== On non-rewindability of the rate plugin ===
I intend to write a rewindable resampler eventually, but don't have time now. I understand that it is an important task, but issues below (and the dayjob which you can change by offering me a new one) have higher priority for me. However, I want everyone to understand the following point now:
"The resampler has to be written from scratch for the reasons explained in http://permalink.gmane.org/gmane.linux.alsa.devel/122179 , and similar arguments apply to all other kinds of sound processing code that needs history."
For PulseAudio, it is also needed to figure out the desired interaction between variable rate and rewindability. Should rewinding other than "discard everything completely" be allowed at all on variable rate streams when the rewind crosses the sample rate change point? I.e., write 100 samples, change rate, write 50 samples, rewind 100 samples, what should be the resulting rate? Should we special-case small changes vs big ones?
This last paragraph isn't related to the rate plugin, right?
Right.
So you're talking about PulseAudio internals only?
Yes, in the last paragraph only.
If so, perhaps the best approach would be to make the current stream buffer contents non-rewindable when a rate change occurs, at least until someone points out a real use case where it is important to be able to rewind past rate change points in the buffer. Without any example use cases, I don't feel qualified to answer the question what should happen if a rewind crosses a rate change point (or possibly several!).
Another question is that should we do something to previously buffered stream data when the rate changes. If the audio rate changes completely, e.g. from 44.1 kHz to 8 kHz, any previously buffered audio was probably meant to be played at 44.1 kH, but with the current code it will be played at 8 kHz. I don't know if there are any applications that (ab)use the dynamic rate feature this way, though. Maybe we could just document that the dynamic rate feature is only meant for small adjustments.
Thanks for the feedback. Let's see if others say anything else on this topic.
Alexander E. Patrakov wrote:
The consensus is that rewind_safeguard is a workaround for an ALSA bug.
I do not agree with this consensus.
This information should come from snd_pcm_rewindable() from the hw plugin. I.e., on a hw device, it should not be equal to snd_pcm_mmap_hw_avail(). [...] the safeguard should be ideally equal to the granularity of hardware pointer updates.
When snd_pcm_rewindable() is called, it uses the last reported hardware pointer position, which is the boundary between safe-to-rewrite and already-used data. Subtracting the amount of one pointer update step makes it safe for the next pointer update, but it is not known how much time will elapse until the hardware does this update. This time might be too small for the software to have any chance, or even zero, but it is also possible that there is still enough time to do the rewriting up until the reported boundary. And if the software is too slow, it is even possible that two or more pointer updates happen, which would have required a larger safeguard.
In other words: the snd_pcm_mmap_hw_avail() value is possibly ouf of date, but due to the real-time nature of hardware pointer updates, it is not possible to define a safeguard that would be more correct. Writing the buffer near the DMA pointer is always racy.
The only somewhat reliable way to determine if a rewrite is successful is to check _afterwards_ whether the hardware pointer has advanced by too much.
[...] On cards where pointer updates happen only on interrupts, the driver should not configure the card in such a way that one period visible to the userspace corresponds to one interrupt. Instead, it should always configure the card for the minimum possible period size, and report only part of the period interrupts to period_elapsed().
Such stupid DMA controllers are typically found on mobile devices, which cannot afford extraneous interrupts.
=== On the programmer expectations ===
Some people, including at least Andrew Eikum and Clemens Ladisch, at least once in the past expressed the opinion that amounts to "any plugin that does not allow random access is, as far as the ALSA API is concerned, buggy" (quoting http://permalink.gmane.org/gmane.linux.alsa.devel/122159 ), i.e. they are maybe asking for the impossible.
This was merely a description of the current API and its implementation, which assume that all devices/plugins have a rewritable ring buffer. I am fully aware that this assumption is wrong.
Regards, Clemens
The consensus is that rewind_safeguard is a workaround for an ALSA bug.
I do not agree with this consensus.
This information should come from snd_pcm_rewindable() from the hw plugin. I.e., on a hw device, it should not be equal to snd_pcm_mmap_hw_avail(). [...] the safeguard should be ideally equal to the granularity of hardware pointer updates.
When snd_pcm_rewindable() is called, it uses the last reported hardware pointer position, which is the boundary between safe-to-rewrite and already-used data. Subtracting the amount of one pointer update step makes it safe for the next pointer update, but it is not known how much time will elapse until the hardware does this update. This time might be too small for the software to have any chance, or even zero, but it is also possible that there is still enough time to do the rewriting up until the reported boundary. And if the software is too slow, it is even possible that two or more pointer updates happen, which would have required a larger safeguard.
In other words: the snd_pcm_mmap_hw_avail() value is possibly ouf of date, but due to the real-time nature of hardware pointer updates, it is not possible to define a safeguard that would be more correct. Writing the buffer near the DMA pointer is always racy.
The only somewhat reliable way to determine if a rewrite is successful is to check _afterwards_ whether the hardware pointer has advanced by too much.
snd_pcm_rewindable is properly not any SAFE value for rewind, it should return zero for those hardware which don't support rewind(e.g, non interleaved or cs46xx)
It is just a theoretical value for rewind since the application don't rewrite immediately
So the application need to estimate the safe guard value from the processing time of the amount of data need to be reprocess /rewrite and CPU speed
Should pulseaudio enable timer scheduling only on those cards which support disable of period interrupt (e.g. hda-Intel and oxygen )?
08.09.2014 02:51, Clemens Ladisch wrote:
Alexander E. Patrakov wrote:
The consensus is that rewind_safeguard is a workaround for an ALSA bug.
I do not agree with this consensus.
OK, disagreements are what this thread is for :)
This information should come from snd_pcm_rewindable() from the hw plugin. I.e., on a hw device, it should not be equal to snd_pcm_mmap_hw_avail(). [...] the safeguard should be ideally equal to the granularity of hardware pointer updates.
When snd_pcm_rewindable() is called, it uses the last reported hardware pointer position, which is the boundary between safe-to-rewrite and already-used data. Subtracting the amount of one pointer update step makes it safe for the next pointer update, but it is not known how much time will elapse until the hardware does this update. This time might be too small for the software to have any chance, or even zero, but it is also possible that there is still enough time to do the rewriting up until the reported boundary. And if the software is too slow, it is even possible that two or more pointer updates happen, which would have required a larger safeguard.
I disagree with the words "boundary between safe-to-rewrite and already-used data". They describe a point that hardware knows, but software cannot know. My viewpoint is that, even with a perfect scheduler, known-safe-to-rewrite and known-already-used data are not separated with a point. See below for an example.
You indeed have a strong argument for keeping the safeguard in PulseAudio as a protection against scheduler glitches, but look what happens on ymfpci. Having read David's email, I understand that it is not the same reason why the safeguard has been originally added.
On ymfpci, interrupts (that don't correspond 1:1 to periods) happen every 5 ms, and pointer (which is a hardware register) is updated only during interrupts by the hardware.
|---------|---------P----h----p---------|-a-------|---------|
Here each character corresponds to 0.5 ms, "-" means nothing interesting, "|", "P" and "p" mark interrupt positions, "h" marks the place where the card reads the sound data from (i.e. the true boundary between safe-to-rewrite and already-used data). Thus, the interrupt at "P" has already happened, but the interrupt at "p" hasn't. "a" is the application write pointer.
So, what should alsa-lib return for snd_pcm_avail() and snd_pcm_rewind()? The driver only knows that "P" is already used, can infer that "p" isn't used yet, and knows nothing about samples in the middle.
For snd_pcm_avail(), it can only say that the application may only write up to "P" from the left. Everything else would be a possible lie leading to not-yet-played data possibly being overwritten.
For snd_pcm_rewindable(), it can only say that the application may try to go to "p" from the right. Everything further to the left is at risk of being already-played, even with a perfect scheduler and infinitely fast CPU.
In other words: the snd_pcm_mmap_hw_avail() value is possibly ouf of date, but due to the real-time nature of hardware pointer updates, it is not possible to define a safeguard that would be more correct. Writing the buffer near the DMA pointer is always racy.
Correct. The problem is that, on some cards, the uncertainty here (described above) is an order of magnitude bigger than typical scheduler glitches, and depends on sound card hardware. I don't want to introduce an unnecessarily-big safeguard for all sound cards because of a few bad ones.
So it looks like we need both the hardware-independent safeguard in PulseAudio (against scheduler glitches) and the hardware-dependent safeguard in alsa-lib (against known imprecise reporting by the card).
The only somewhat reliable way to determine if a rewrite is successful is to check _afterwards_ whether the hardware pointer has advanced by too much.
PulseAudio does that in addition to the safeguard.
[...] On cards where pointer updates happen only on interrupts, the driver should not configure the card in such a way that one period visible to the userspace corresponds to one interrupt. Instead, it should always configure the card for the minimum possible period size, and report only part of the period interrupts to period_elapsed().
Such stupid DMA controllers are typically found on mobile devices, which cannot afford extraneous interrupts.
Thanks for the information, I didn't know that, and hereby retract the quoted proposal.
Still, to support this type of stupid DMA controllers properly, we need a vocabulary of such weirdnesses and an agreed-upon way to communicate them from the kernel to userspace. I don't think that I can build such vocabulary.
=== On the programmer expectations ===
Some people, including at least Andrew Eikum and Clemens Ladisch, at least once in the past expressed the opinion that amounts to "any plugin that does not allow random access is, as far as the ALSA API is concerned, buggy" (quoting http://permalink.gmane.org/gmane.linux.alsa.devel/122159 ), i.e. they are maybe asking for the impossible.
This was merely a description of the current API and its implementation, which assume that all devices/plugins have a rewritable ring buffer. I am fully aware that this assumption is wrong.
OK. Now I see that in your case it was a misinterpretation from my side - sorry for that! But this doesn't cancel the attitude "if a function is documented without any caveats, it must always work, usefully" that I have seen from others. So I'll put a documentation patch on my TODO list.
Alexander E. Patrakov wrote:
|---------|---------P----h----p---------|-a-------|---------|
So, what should alsa-lib return for snd_pcm_avail() and snd_pcm_rewind()? The driver only knows that "P" is already used, can infer that "p" isn't used yet, and knows nothing about samples in the middle.
Indeed. However, the DMA pointer moves asynchronously, so it is possible that it has already moved beyond p when snd_pcm_rewindable() returns. For the samples between P and p, the risk is larger than for those after p, but p is not a boundary where the risk abruptly decreases.
It would make sense to report the pointer update granularity, but not to adjust the return value of snd_pcm_avail/rewindable().
Regards, Clemens
09.09.2014 14:43, Clemens Ladisch wrote:
Alexander E. Patrakov wrote:
|---------|---------P----h----p---------|-a-------|---------|
So, what should alsa-lib return for snd_pcm_avail() and snd_pcm_rewind()? The driver only knows that "P" is already used, can infer that "p" isn't used yet, and knows nothing about samples in the middle.
Indeed. However, the DMA pointer moves asynchronously, so it is possible that it has already moved beyond p when snd_pcm_rewindable() returns. For the samples between P and p, the risk is larger than for those after p, but p is not a boundary where the risk abruptly decreases.
It would make sense to report the pointer update granularity, but not to adjust the return value of snd_pcm_avail/rewindable().
OK, I understand your viewpoint, and the phrase "some indicator of the actual rewind granularity and/or safeguard ... should be enough for PA to be able to pick a suitable default latency" from David indicates that he has a similar opinion.
Now the remaining question is: can the proposed heuristic (minimum period size for a given sample rate, number of channels and sample format) be useful as an upper-bound approximation of the pointer update granularity for cards that are "rewindable even further than the nearest period"?
On 2014-09-09 10:55, Alexander E. Patrakov wrote:
09.09.2014 14:43, Clemens Ladisch wrote:
Alexander E. Patrakov wrote:
|---------|---------P----h----p---------|-a-------|---------|
So, what should alsa-lib return for snd_pcm_avail() and snd_pcm_rewind()? The driver only knows that "P" is already used, can infer that "p" isn't used yet, and knows nothing about samples in the middle.
Indeed. However, the DMA pointer moves asynchronously, so it is possible that it has already moved beyond p when snd_pcm_rewindable() returns. For the samples between P and p, the risk is larger than for those after p, but p is not a boundary where the risk abruptly decreases.
It would make sense to report the pointer update granularity, but not to adjust the return value of snd_pcm_avail/rewindable().
OK, I understand your viewpoint, and the phrase "some indicator of the actual rewind granularity and/or safeguard ... should be enough for PA to be able to pick a suitable default latency" from David indicates that he has a similar opinion.
Now the remaining question is: can the proposed heuristic (minimum period size for a given sample rate, number of channels and sample format) be useful as an upper-bound approximation of the pointer update granularity for cards that are "rewindable even further than the nearest period"?
Aha, thanks for the explanation. Now I understand that approximation idea. I don't know if that's a reasonable approximation, but even if it is, how would you determine if a card actually has that pointer granularity, or if the pointer granularity varies with period size? (I e without actually running a stream and measure it)
09.09.2014 15:08, David Henningsson wrote:
On 2014-09-09 10:55, Alexander E. Patrakov wrote:
09.09.2014 14:43, Clemens Ladisch wrote:
Alexander E. Patrakov wrote:
|---------|---------P----h----p---------|-a-------|---------|
So, what should alsa-lib return for snd_pcm_avail() and snd_pcm_rewind()? The driver only knows that "P" is already used, can infer that "p" isn't used yet, and knows nothing about samples in the middle.
Indeed. However, the DMA pointer moves asynchronously, so it is possible that it has already moved beyond p when snd_pcm_rewindable() returns. For the samples between P and p, the risk is larger than for those after p, but p is not a boundary where the risk abruptly decreases.
It would make sense to report the pointer update granularity, but not to adjust the return value of snd_pcm_avail/rewindable().
OK, I understand your viewpoint, and the phrase "some indicator of the actual rewind granularity and/or safeguard ... should be enough for PA to be able to pick a suitable default latency" from David indicates that he has a similar opinion.
Now the remaining question is: can the proposed heuristic (minimum period size for a given sample rate, number of channels and sample format) be useful as an upper-bound approximation of the pointer update granularity for cards that are "rewindable even further than the nearest period"?
Aha, thanks for the explanation. Now I understand that approximation idea. I don't know if that's a reasonable approximation, but even if it is, how would you determine if a card actually has that pointer granularity, or if the pointer granularity varies with period size? (I e without actually running a stream and measure it)
Currently, as you have already said, we have no such information. This information is, however, static for a given card model and should, in the future, come from the kernel. Therefore:
1. We need a new flag alongside SNDRV_PCM_INFO_BATCH that kernel drivers would set, and alsa-lib to act upon. As indicated in the following posts, SNDRV_PCM_INFO_BATCH means a different and not-useful-here thing:
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-March/073816.html http://mailman.alsa-project.org/pipermail/alsa-devel/2014-March/073817.html
2. We need a volunteer to crawl through kernel sources and mark drivers that cannot report the pointer position with a better-than-one-period granularity.
3. Until this is done, we have to either assume that all cards are good, or that all cards are bad, or maybe misuse the SNDRV_PCM_INFO_BATCH flag as a pessimistic approximation of what we want (and document this approximation) if anyone thinks that such misuse will be beneficial in the short term.
This leaves the question of "old kernel + new alsa-lib" open.
|---------|---------P----h----p---------|-a-------|---------|
So, what should alsa-lib return for snd_pcm_avail() and snd_pcm_rewind()? The driver only knows that "P" is already used, can infer that "p" isn't used yet, and knows nothing about samples in the middle.
Indeed. However, the DMA pointer moves asynchronously, so it is possible that it has already moved beyond p when snd_pcm_rewindable() returns. For the samples between P and p, the risk is larger than for those
after
p, but p is not a boundary where the risk abruptly decreases.
It would make sense to report the pointer update granularity, but not to adjust the return value of snd_pcm_avail/rewindable().
OK, I understand your viewpoint, and the phrase "some indicator of the actual rewind granularity and/or safeguard ... should be enough for PA to be able to pick a suitable default latency" from David indicates that he has a similar opinion.
Now the remaining question is: can the proposed heuristic (minimum period size for a given sample rate, number of channels and sample format) be useful as an upper-bound approximation of the pointer update granularity for cards that are "rewindable even further than the nearest period"?
Aha, thanks for the explanation. Now I understand that approximation
idea.
I don't know if that's a reasonable approximation, but even if it is, how would you determine if a card actually has that pointer granularity, or if the pointer granularity varies with period size? (I e without actually running a stream and measure it)
Currently, as you have already said, we have no such information. This
information is, however, static for a given card model and should, in the future, come from the kernel. Therefore:
- We need a new flag alongside SNDRV_PCM_INFO_BATCH that kernel drivers
would set, and alsa-lib to act upon. As indicated in the following posts, SNDRV_PCM_INFO_BATCH means a different and not-useful-here thing:
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-March/073816.html
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-March/073817.html
- We need a volunteer to crawl through kernel sources and mark drivers
that cannot report the pointer position with a better-than-one-period granularity.
- Until this is done, we have to either assume that all cards are good,
or that all cards are bad, or maybe misuse the SNDRV_PCM_INFO_BATCH flag as a pessimistic approximation of what we want (and document this approximation) if anyone thinks that such misuse will be beneficial in the short term.
This leaves the question of "old kernel + new alsa-lib" open.
--
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/commit/sound/so...
a) Set the SNDRV_PCM_INFO_BATCH if the granularity is per period or worse. b) Fallback to the (race condition prone) period counting if the driver does not support any residue reporting.
Seem soc already have this granularity
How can the granularity worse more than one period ?
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/tree/include/li...
enum dma_residue_granularity { DMA_RESIDUE_GRANULARITY_DESCRIPTOR = 0, DMA_RESIDUE_GRANULARITY_SEGMENT = 1, DMA_RESIDUE_GRANULARITY_BURST = 2, };
There are three type of granularity
Does this mean the those sound card can report DMA_RESIDUE_GRANULARITY_BURST and driver use readl in pcm pointer callback ?
A few PCI sound cards use SG buffer including hda
It seem that pulseaudio expect the driver support DMA_RESIDUE_GRANULARITY_BURST for rewind/ timer scheduling
/** * enum dma_residue_granularity - Granularity of the reported transfer residue * @DMA_RESIDUE_GRANULARITY_DESCRIPTOR: Residue reporting is not support. The DMA channel is only able to tell whether a descriptor has been completed or not, which means residue reporting is not supported by this channel. The residue field of the dma_tx_state field will always be 0. * @DMA_RESIDUE_GRANULARITY_SEGMENT: Residue is updated after each successfully completed segment of the transfer (For cyclic transfers this is after each period). This is typically implemented by having the hardware generate an interrupt after each transferred segment and then the drivers updates the outstanding residue by the size of the segment. Another possibility is if the hardware supports scatter-gather and the segment descriptor has a field which gets set after the segment has been completed. The driver then counts the number of segments without the flag set to compute the residue. * @DMA_RESIDUE_GRANULARITY_BURST: Residue is updated after each transferred burst. This is typically only supported if the hardware has a progress register of some sort (E.g. a register with the current read/write address or a register with the amount of bursts/beats/bytes that have been transferred or still need to be transferred). */
On 09/21/2014 04:02 AM, Raymond Yau wrote: [...]
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/commit/sound/so...
a) Set the SNDRV_PCM_INFO_BATCH if the granularity is per period or worse. b) Fallback to the (race condition prone) period counting if the driver does not support any residue reporting.
Seem soc already have this granularity
How can the granularity worse more than one period ?
The granularity is the granularity that the DMA driver is able to report. In some cases the DMA driver is only able to tell whether a transfer has finished or not, but is not able tell any intermediate position. In case of a cyclic transfer the transfer never stops, so the DMA driver will only ever tell us that it hasn't finished the transfer yet. But the DMA driver will still generate a interrupt after every finished period. So we use this as a fallback mechanism and simply increase the pointer position after each interrupt by one period size. So userspace will never see a granularity that is worse than one descriptor. This is just something kernel internal.
On a side node, this period counting scheme is prone to race conditions and we strongly discourage new drivers from using it. Its mainly for supporting old DMA drivers which do not implement progress reporting (yet).
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/tree/include/li...
enum dma_residue_granularity { DMA_RESIDUE_GRANULARITY_DESCRIPTOR = 0, DMA_RESIDUE_GRANULARITY_SEGMENT = 1, DMA_RESIDUE_GRANULARITY_BURST = 2, };
There are three type of granularity
Only SEGMENT and BURST granularity are relevant for audio userspace.
For ALSA we'd probably see different kind of categories of granularity though. E.g. one field which is the unit of the granularity
PERIOD: The granularity is a multiple of the period size FRAME: The granularity is a multiple of the frame size BYTE: The granularity is a fixed number of bytes MS: The granularity is a fixed amount of time
and then a second field that has a integer number of the granularity in that unit.
And then there is of course the issue that for some chips you can trade granularity for e.g. power efficiency or similar (e.g. by changing the transfer burst size). And this needs to be modeled somehow as well.
Does this mean the those sound card can report DMA_RESIDUE_GRANULARITY_BURST and driver use readl in pcm pointer callback ?
A few PCI sound cards use SG buffer including hda
It seem that pulseaudio expect the driver support DMA_RESIDUE_GRANULARITY_BURST for rewind/ timer scheduling
Yes. This is why we set the BATCH flag if the granularity is not DMA_RESIDUE_GRANULARITY_BURST so for example pulse audio can disable timer scheduling.
- Lars
22.09.2014 19:20, Lars-Peter Clausen wrote:
Does this mean the those sound card can report DMA_RESIDUE_GRANULARITY_BURST and driver use readl in pcm pointer callback ?
A few PCI sound cards use SG buffer including hda
It seem that pulseaudio expect the driver support DMA_RESIDUE_GRANULARITY_BURST for rewind/ timer scheduling
Yes. This is why we set the BATCH flag if the granularity is not DMA_RESIDUE_GRANULARITY_BURST so for example pulse audio can disable timer scheduling.
For the record, disabling timer-based scheduling is IMHO a matter of further discussion. As long as there is enough safeguard, I think that timer-based scheduling can still be used, and is useful. A living proof is the whole story with the snd-usb-audio driver where (justified) addition of the BATCH flag was perceived as a performance regression and not as a fix to some real and obvious problem.
I do agree that such devices need to be marked up with the BATCH flag, so that PulseAudio chooses reasonable hardware parameters.
On 09/22/2014 03:36 PM, Alexander E. Patrakov wrote:
22.09.2014 19:20, Lars-Peter Clausen wrote:
Does this mean the those sound card can report DMA_RESIDUE_GRANULARITY_BURST and driver use readl in pcm pointer callback ?
A few PCI sound cards use SG buffer including hda
It seem that pulseaudio expect the driver support DMA_RESIDUE_GRANULARITY_BURST for rewind/ timer scheduling
Yes. This is why we set the BATCH flag if the granularity is not DMA_RESIDUE_GRANULARITY_BURST so for example pulse audio can disable timer scheduling.
For the record, disabling timer-based scheduling is IMHO a matter of further discussion. As long as there is enough safeguard, I think that timer-based scheduling can still be used, and is useful. A living proof is the whole story with the snd-usb-audio driver where (justified) addition of the BATCH flag was perceived as a performance regression and not as a fix to some real and obvious problem.
I do agree that such devices need to be marked up with the BATCH flag, so that PulseAudio chooses reasonable hardware parameters.
When I tried things it looks like it was more than just the safeguard threshold. It looked as if the algorithm that adjusts the wake-up time gets confused if the granularity is not good enough and just wakes up at the wrong point which leads to glitches. This can probably be fixed though by modifying the algorithm to take the granularity into account.
- Lars
Does this mean the those sound card can report DMA_RESIDUE_GRANULARITY_BURST and driver use readl in pcm pointer callback ?
A few PCI sound cards use SG buffer including hda
It seem that pulseaudio expect the driver support DMA_RESIDUE_GRANULARITY_BURST for rewind/ timer scheduling
Yes. This is why we set the BATCH flag if the granularity is not DMA_RESIDUE_GRANULARITY_BURST so for example pulse audio can disable timer scheduling.
The resolution of pulseaudio volume is higher than the number of steps of the hardware volume control, this mean any volume change by user force pulseaudio to rewind because of the change in software volume
As user won't expect the volume change is delayed by one second
Those drivers should not use 2 periods as graunulaity is one period which is 170ms ~1 second if you are running video conference (e.g. Google hangout) when video is 15~30 frames per second
The safeguard can only be decreased by reduce the period size
Is it feasible for pulseaudio to use more periods with suitable period size/time for the requested latency when there is one and only one client when the sound card cannot provide precise DMA position instead of maximum period size/time ?
For the record, disabling timer-based scheduling is IMHO a matter of
further discussion. As long as there is enough safeguard, I think that timer-based scheduling can still be used, and is useful. A living proof is the whole story with the snd-usb-audio driver where (justified) addition of the BATCH flag was perceived as a performance regression and not as a fix to some real and obvious problem.
The point is some drivers use .periods_min = 1
Pulseaudio select minimum number of period
interrupt driven mode should not use one period per buffer since granularity is one period
one period per buffer work only when sound card can report granularity which is less than one period
aplay won't use one period per buffer , default is four but fallback to two when driver use .periods_max = 2
http://0pointer.net/blog/projects/pulse-glitch-free.html
NFRAGS must >= 2
23.09.2014 14:29, Raymond Yau wrote:
Does this mean the those sound card can report DMA_RESIDUE_GRANULARITY_BURST and driver use readl in pcm pointer callback ?
A few PCI sound cards use SG buffer including hda
It seem that pulseaudio expect the driver support DMA_RESIDUE_GRANULARITY_BURST for rewind/ timer scheduling
Yes. This is why we set the BATCH flag if the granularity is not DMA_RESIDUE_GRANULARITY_BURST so for example pulse audio can disable timer scheduling.
The resolution of pulseaudio volume is higher than the number of steps of the hardware volume control, this mean any volume change by user force pulseaudio to rewind because of the change in software volume
As user won't expect the volume change is delayed by one second
Absolutely correct, and a similar thing was already discussed. The interesting part of the discussion starts here:
http://lists.freedesktop.org/archives/pulseaudio-discuss/2014-April/020462.h...
Please disregard my "factor of 1000" statement - it is no longer true.
Those drivers should not use 2 periods as graunulaity is one period which is 170ms ~1 second if you are running video conference (e.g. Google hangout) when video is 15~30 frames per second
The safeguard can only be decreased by reduce the period size
Also correct for the BATCH cards. However, please note that for the BATCH cards PulseAudio looks at the default-fragment-size-msec setting from daemon.conf by default.
Is it feasible for pulseaudio to use more periods with suitable period size/time for the requested latency when there is one and only one client when the sound card cannot provide precise DMA position instead of maximum period size/time ?
Probably not, because the rewinds are exposed in the public PulseAudio APIs (in particular, via the last two parameters of pa_stream_write()). So it definitely should not use the maximum period time, but should also not derive one from the client-requested latency. A hard-coded default or a default from the config (i.e. the current situation) is therefore as good as one can get on BATCH cards regarding the period size.
For the record, disabling timer-based scheduling is IMHO a matter of
further discussion. As long as there is enough safeguard, I think that timer-based scheduling can still be used, and is useful. A living proof is the whole story with the snd-usb-audio driver where (justified) addition of the BATCH flag was perceived as a performance regression and not as a fix to some real and obvious problem.
The point is some drivers use .periods_min = 1
Pulseaudio select minimum number of period
It does not do that on BATCH cards. Or if it does, it is a bug.
As for the rest of your arguments, you are just stating obvious and correct things, so I see no point in quoting them.
Indeed, most of the work on PulseAudio side would mean choosing the correct period size, number of periods, wakeup threshold and rewind granularity for each possible situation. I should just do it when the needed API appears on the ALSA side :)
Alexander E. Patrakov wrote:
can the proposed heuristic (minimum period size for a given sample rate, number of channels and sample format) be useful as an upper- bound approximation of the pointer update granularity for cards that are "rewindable even further than the nearest period"?
No; USB precsion depends on the URB size, which is roughly proportional to (but smaller than) the period size.
Regards, Clemens
09.09.2014 19:45, Clemens Ladisch wrote:
Alexander E. Patrakov wrote:
can the proposed heuristic (minimum period size for a given sample rate, number of channels and sample format) be useful as an upper- bound approximation of the pointer update granularity for cards that are "rewindable even further than the nearest period"?
No; USB precsion depends on the URB size, which is roughly proportional to (but smaller than) the period size.
Thanks for the explanation. It was very useful.
At this point, I am tempted to not express this "less than the period size but depends on it" rule, and classify USB audio devices as "Rewindable down to the period size", even though it is a pessimization.
Also (sorry for the off-topic) this explanation completely crystallizes my opinion about possible misuse of the SNDRV_PCM_INFO_BATCH flag in the snd-usb-audio driver or PulseAudio. My opinion now is: there is no misuse on either side.
At Tue, 09 Sep 2014 21:55:05 +0600, Alexander E. Patrakov wrote:
09.09.2014 19:45, Clemens Ladisch wrote:
Alexander E. Patrakov wrote:
can the proposed heuristic (minimum period size for a given sample rate, number of channels and sample format) be useful as an upper- bound approximation of the pointer update granularity for cards that are "rewindable even further than the nearest period"?
No; USB precsion depends on the URB size, which is roughly proportional to (but smaller than) the period size.
Thanks for the explanation. It was very useful.
At this point, I am tempted to not express this "less than the period size but depends on it" rule, and classify USB audio devices as "Rewindable down to the period size", even though it is a pessimization.
Also (sorry for the off-topic) this explanation completely crystallizes my opinion about possible misuse of the SNDRV_PCM_INFO_BATCH flag in the snd-usb-audio driver or PulseAudio. My opinion now is: there is no misuse on either side.
Well, it's not entirely a "misuse" in either side. The flag is merely a hint, and the current attitude of PA is in a safer side of the behavior this flag may indicate. Whether it's the best choice is another question, of course.
After all, we certainly need some API to expose the granularity. This has been asked since long time ago, but it wasn't done simply because the demands (and specifications) haven't been clarified.
Also, for rewinding, we may need to consider about the FIFO or other things on top of signal processing path. If the granularity defines the granularity of the hwptr *update*, it wouldn't be enough for defining a safe guard range. Currently, most of hardware have a small amount of FIFO that can be mostly ignored, fortunately, though.
Takashi
On 2014-09-07 17:16, Alexander E. Patrakov wrote:
Hello.
(TL;DR: nothing really new except the strawman proposal about threads and the note about interaction of variable sample rate with rewindability)
As Takashi Iwai told me that the audio miniconference is for discussion only, and not for presentation of anything, I guess that I need to present the plans and options now. That's why this e-mail. The goal here, besides merely presenting the plan, is to identify points that everyone agrees upon, so that they are not discussed pointlessly at the miniconference. Also, this e-mail serves as a justification for the pending seemingly-destructive work.
First, the status quo. If anyone disagrees with the facts below, please complain loudly, before I make any conclusions!
- PulseAudio does not call snd_pcm_rewindable(), because for some ALSA
plugins it crashed. This crash is completely fixed in alsa-lib 1.0.28, but in some cases snd_pcm_rewindable() still returns wrong results.
I don't know if we ever tried snd_pcm_rewindable() in PulseAudio.
- PulseAudio blindly assumes that it can rewind up to hwbuf_frames -
(snd_pcm_avail() + rewind_safeguard) frames. The rewind safeguard is needed due to reasons that I don't completely understand, but one of them is imprecise reporting of the hardware pointer, and another one is that the hardware transfers several bytes at a time, and the bytes we need to overwrite may be already cached by the hardware.
Pierre added the rewind safeguard due to DMA controller problems. IIRC, some DMA controllers go nuts (such as breaking the stream, causing interrupt storms, or something else seriously buggy) when trying to write to data that the DMA controller is just about to transfer. Pierre (now cc:ed) would know more about this than I do, though.
In my world, since this is a very hardware near problem, ALSA rather than PulseAudio should take these kinds of problems into account when reporting back snd_pcm_rewindable() so PulseAudio does not have to.
- PulseAudio blindly assumes that it can rewind up to hwbuf_frames -
(snd_pcm_avail() + rewind_safeguard) frames. The rewind safeguard is needed due to reasons that I don't completely understand, but one of them is imprecise reporting of the hardware pointer, and another one is that the hardware transfers several bytes at a time, and the bytes we need to overwrite may be already cached by the hardware.
Pierre added the rewind safeguard due to DMA controller problems. IIRC, some DMA controllers go nuts (such as breaking the stream, causing interrupt storms, or something else seriously buggy) when trying to write to data that the DMA controller is just about to transfer. Pierre (now cc:ed) would know more about this than I do, though.
In my world, since this is a very hardware near problem, ALSA rather than PulseAudio should take these kinds of problems into account when reporting back snd_pcm_rewindable() so PulseAudio does not have to.
Sorry to chime-in late on this, my real job keeps me busy. I would like to highlight that there is a fundamental conflict between requirements here. - for power consumption optimization, you want to let the hardware prefetch audio samples opportunistically and buffer as much as possible as close as possible to the serial link. Given the pressure on power optimizations these days, no one should expect the hardware buffering to reduce - and this buffering could vary depending on the platform power modes with the position of the DMA read pointer becoming less predictable. - for low-latency and user interaction, you want to rewind the write pointer as much as possible.
The question is 'how much'. The reason why i introduced this 'rewind_safeguard' was to factor in a good-enough latency that no one would ever complain about and yet large enough to avoid using stale data. We could ask drivers to provide a conservative estimate of this safeguard, but asking for a dynamic query of a safe position or an empirical determination based on the min period size is really asking for trouble. At a given point, if you really need very low-latency, i.e. sub ms, you will need a dedicated configuration that probably doesn't rely on rewinds and you probably don't use PulseAudio in the first place.
10.09.2014 01:56, Pierre-Louis Bossart wrote:
- PulseAudio blindly assumes that it can rewind up to hwbuf_frames -
(snd_pcm_avail() + rewind_safeguard) frames. The rewind safeguard is needed due to reasons that I don't completely understand, but one of them is imprecise reporting of the hardware pointer, and another one is that the hardware transfers several bytes at a time, and the bytes we need to overwrite may be already cached by the hardware.
Pierre added the rewind safeguard due to DMA controller problems. IIRC, some DMA controllers go nuts (such as breaking the stream, causing interrupt storms, or something else seriously buggy) when trying to write to data that the DMA controller is just about to transfer. Pierre (now cc:ed) would know more about this than I do, though.
In my world, since this is a very hardware near problem, ALSA rather than PulseAudio should take these kinds of problems into account when reporting back snd_pcm_rewindable() so PulseAudio does not have to.
Sorry to chime-in late on this, my real job keeps me busy.
Better late than never.
I would like to highlight that there is a fundamental conflict between requirements here.
- for power consumption optimization, you want to let the hardware
prefetch audio samples opportunistically and buffer as much as possible as close as possible to the serial link. Given the pressure on power optimizations these days, no one should expect the hardware buffering to reduce - and this buffering could vary depending on the platform power modes with the position of the DMA read pointer becoming less predictable.
- for low-latency and user interaction, you want to rewind the write
pointer as much as possible.
Spot-on.
The question is 'how much'. The reason why i introduced this 'rewind_safeguard' was to factor in a good-enough latency that no one would ever complain about and yet large enough to avoid using stale data. We could ask drivers to provide a conservative estimate of this safeguard, but asking for a dynamic query of a safe position or an empirical determination based on the min period size is really asking for trouble. At a given point, if you really need very low-latency, i.e. sub ms, you will need a dedicated configuration that probably doesn't rely on rewinds and you probably don't use PulseAudio in the first place.
Well, now there are two people criticizing the heuristic. I would like to retract it, but I currently can't, because there are no other constructive proposals.
The best solution would indeed be if a driver provides a conservative estimate of the rewind safeguard, but the problem is that nobody will add this to old drivers for no-longer-manufactured hardware. So we either need some other heuristic here, or we have to allow rewinding further than the nearest period only on cards where drivers explicitly say OK by providing a safeguard value. But the later proposal kills power-efficient PulseAudio on old kernels that simply don't know how to provide this information. Surely you won't like it.
On 09/07/2014 05:16 PM, Alexander E. Patrakov wrote: [...]
- PulseAudio blindly assumes that it can rewind up to hwbuf_frames -
(snd_pcm_avail() + rewind_safeguard) frames. The rewind safeguard is needed due to reasons that I don't completely understand, but one of them is imprecise reporting of the hardware pointer, and another one is that the hardware transfers several bytes at a time, and the bytes we need to overwrite may be already cached by the hardware.
Also at the time you will process the result form snd_pcm_avail() will be more than what was reported as the audio DMA keeps going in the background.
[...]
For the allegedly existing cards where the pointer granularity is always the same as the period size, Raymond's method will not work. However, I don't know any concrete example of such card (I didn't look at snd-firewire in detail, though). And in any case, my opinion is that this "pointer granularity = period size" limitation on such cards can be treated as a driver bug that needs to be fixed (by someone who knows the driver, not by me). On cards where pointer updates happen only on interrupts, the driver should not configure the card in such a way that one period visible to the userspace corresponds to one interrupt. Instead, it should always configure the card for the minimum possible period size, and report only part of the period interrupts to period_elapsed(). I.e.: userspace asked for 2 periods 64 ms each, but let's configure the card for, say, 64 periods 2 ms each, and use the "extra" interrupts only for updating the pointer.
There are quite a few ASoC drivers that do have this restriction. For some of the drivers the restriction comes from the hardware itself for others it is while the hardware in theory supports it nobody bothered so far to implement this.
I do not agree that this should be treated as a driver bug and that those drivers should increase the interrupt rate. Mainly because this is policy and the interrupt rate is configured by the application by setting the period size and the driver should not have to guess what kind of latency requirements the application has. If the application has lower latency requirements it needs to set a smaller period size. Unnecessarily increasing the interrupt rate has for example has a impact on power consumption which is something you'd like to keep low on battery powered devices.
We should though expose the granularity with which the pointer is updated to userspace so it can make educated decisions on how to configure the period size.
- Lars
On 2014-09-07 17:16, Alexander E. Patrakov wrote:
Hello.
(TL;DR: nothing really new except the strawman proposal about threads and the note about interaction of variable sample rate with rewindability)
So, having looked this through another time, it looks like we have three categories of ALSA devices, from rewindability point of view:
1) Not rewindable at all.
2) Rewindable down to the period size.
3) Rewindable even further than the nearest period, down to DMA transfer sizes or something else. This also requires the .pointer to have better granularity than the period size.
So I think any is_seekable() call or flag should indicate whether the device is case 1), 2) or 3). And for case 3) perhaps also some indicator of the actual rewind granularity and/or safeguard. This should be enough for PA to be able to pick a suitable default latency.
Case 1) is simple. Just let snd_pcm_rewindable return 0 and snd_pcm_rewind fail. If it doesn't, just fix it. (The extplug problem could be solved by PA having ifdefs depending on alsa-lib version, rather than making snd_pcm_rewind and snd_pcm_rewindable behave inconsistent.)
For case 2) you seem to suggest to emulate case 3) by using either a low-latency thread, or by increasing the number of interrupts from the hardware. Either method will inevitably increase power consumption, and the former might also increase the risk of glitches. Therefore I think this is replacing something bad with something worse, because I would value low power consumption higher than better rewinding.
Could we enable this functionality by explicit request by the application? Probably. E g, if the application sets a low period size but also sets the "disable period interrupts" flag, that could be an indicator that it wants lots of interrupts just to update the pointer, but nothing else. Maybe that's even the behaviour today (haven't checked).
The low-latency thread approach could be implemented by a separate ioplug layer, so that people who want it could open "flexiblerewind:plug:hw:0" instead of "plug:hw:0".
However, with my PA hat on, I would still say no to having PA use either of those by default, for power consumption reasons.
With that in mind I suggest, just as you, that we add .rewind/.rewindable callbacks to the ioplug layer. Any ioplug using .transfer needs to implement that if it wants to support rewinding, otherwise that ioplug would fall into category 1).
Does that make sense? It was a long email, so I might have missed something. :-)
08.09.2014 13:59, David Henningsson wrote:
(First of all, a big thanks for being the first to talk constructively about the important problem of rewindability classes!)
On 2014-09-07 17:16, Alexander E. Patrakov wrote:
Hello.
(TL;DR: nothing really new except the strawman proposal about threads and the note about interaction of variable sample rate with rewindability)
So, having looked this through another time, it looks like we have three categories of ALSA devices, from rewindability point of view:
Not rewindable at all.
Rewindable down to the period size.
Rewindable even further than the nearest period, down to DMA
transfer sizes or something else. This also requires the .pointer to have better granularity than the period size.
So I think any is_seekable() call or flag should indicate whether the device is case 1), 2) or 3). And for case 3) perhaps also some indicator of the actual rewind granularity and/or safeguard. This should be enough for PA to be able to pick a suitable default latency.
So far, I agree. The big question now is: do we have enough correct data from the kernel to decide between cases 2 and 3? I am definitely not qualified enough to answer this or to add such interface, though.
Case 1) is simple. Just let snd_pcm_rewindable return 0 and snd_pcm_rewind fail. If it doesn't, just fix it. (The extplug problem could be solved by PA having ifdefs depending on alsa-lib version, rather than making snd_pcm_rewind and snd_pcm_rewindable behave inconsistent.)
OK. But I don't think that the ifdef proposal is sufficient. If I understand correctly, it only covers the "new PA on old alsa-lib" problem, because we can't add these ifdefs to old PA. That's still progress! But the "keep the old implementation of rewind" was for the opposite problem - old PA on new alsa-lib which rightfully refuses to rewind.
For case 2) you seem to suggest to emulate case 3) by using either a low-latency thread, or by increasing the number of interrupts from the hardware. Either method will inevitably increase power consumption, and the former might also increase the risk of glitches. Therefore I think this is replacing something bad with something worse, because I would value low power consumption higher than better rewinding.
There is a slight misunderstanding above. I indeed proposed (and retracted the proposal when I knew that it happens on mobile devices) to emulate case 3 in case 2 by increasing the number of interrupts from the hardware. The proposal of a low-latency background thread was about emulating case 3 on case 1, and, so far, I have not retracted it, but I may do so in the future.
As for power consumption increase, the argument is essentially a duplicate of the phrase "the downside is that all programs (not only those intending to do rewinds) now pay the cost of the background low-latency thread" from my original mail. In other words, I partially agree. This, if enabled unconditionally, indeed would lead to unnecessary power consumption increase for those applications that don't use rewinds. But for applications that want low latency or dynamic latency, there is no other way to get it, so it is wrong to talk about power consumption increase for such applications.
Could we enable this functionality by explicit request by the application? Probably. E g, if the application sets a low period size but also sets the "disable period interrupts" flag, that could be an indicator that it wants lots of interrupts just to update the pointer, but nothing else. Maybe that's even the behaviour today (haven't checked).
A very good idea.
The low-latency thread approach could be implemented by a separate ioplug layer, so that people who want it could open "flexiblerewind:plug:hw:0" instead of "plug:hw:0".
Also a good idea.
However, with my PA hat on, I would still say no to having PA use either of those by default, for power consumption reasons.
I would also say no, but for a different reason (because, as I wrote above, the power-consumption reason is valid only for software that never wants to rewind and does not need low/dynamic latency, and PA does not fall into this category). My argument is that, in the case of PulseAudio, these emulations would just create an unnecessary layer of complexity. PulseAudio actively wants to know the device type (via the is_seekable() call), so it is reasonable for it to also deal with consequences, without the need for additional processing in the driver or for an extra thread.
With that in mind I suggest, just as you, that we add .rewind/.rewindable callbacks to the ioplug layer. Any ioplug using .transfer needs to implement that if it wants to support rewinding, otherwise that ioplug would fall into category 1).
Does that make sense? It was a long email, so I might have missed something. :-)
In fact, _I_ missed something when writing the original e-mail. I forgot to address your phrase from http://permalink.gmane.org/gmane.linux.alsa.devel/122191:
I understand that you have a mathematically perfect approach to this, as well as other algorithms. This would indeed be the best goal, but given an imperfect world, where we're forced to choose between
- no rewinding at all
- imperfect rewinding in the sense that it sometimes can produce
hearable artifacts
...I'm not sure 1) is always the right choice...
Should we have a snd_pcm_open (or whatever else) flag for accepting imperfect rewind results? [anyway, I would say "no" to using it in PulseAudio]
On 2014-09-08 10:46, Alexander E. Patrakov wrote:
08.09.2014 13:59, David Henningsson wrote:
(First of all, a big thanks for being the first to talk constructively about the important problem of rewindability classes!)
On 2014-09-07 17:16, Alexander E. Patrakov wrote:
Hello.
(TL;DR: nothing really new except the strawman proposal about threads and the note about interaction of variable sample rate with rewindability)
So, having looked this through another time, it looks like we have three categories of ALSA devices, from rewindability point of view:
Not rewindable at all.
Rewindable down to the period size.
Rewindable even further than the nearest period, down to DMA
transfer sizes or something else. This also requires the .pointer to have better granularity than the period size.
So I think any is_seekable() call or flag should indicate whether the device is case 1), 2) or 3). And for case 3) perhaps also some indicator of the actual rewind granularity and/or safeguard. This should be enough for PA to be able to pick a suitable default latency.
So far, I agree. The big question now is: do we have enough correct data from the kernel to decide between cases 2 and 3? I am definitely not qualified enough to answer this or to add such interface, though.
I'm quite certain that we don't have enough information coming through from the kernel. It's something we need to add if we end up deciding to implement hw_params_is_seekable.
Case 1) is simple. Just let snd_pcm_rewindable return 0 and snd_pcm_rewind fail. If it doesn't, just fix it. (The extplug problem could be solved by PA having ifdefs depending on alsa-lib version, rather than making snd_pcm_rewind and snd_pcm_rewindable behave inconsistent.)
OK. But I don't think that the ifdef proposal is sufficient. If I understand correctly, it only covers the "new PA on old alsa-lib" problem, because we can't add these ifdefs to old PA. That's still progress! But the "keep the old implementation of rewind" was for the opposite problem - old PA on new alsa-lib which rightfully refuses to rewind.
For case 2) you seem to suggest to emulate case 3) by using either a low-latency thread, or by increasing the number of interrupts from the hardware. Either method will inevitably increase power consumption, and the former might also increase the risk of glitches. Therefore I think this is replacing something bad with something worse, because I would value low power consumption higher than better rewinding.
There is a slight misunderstanding above. I indeed proposed (and retracted the proposal when I knew that it happens on mobile devices) to emulate case 3 in case 2 by increasing the number of interrupts from the hardware. The proposal of a low-latency background thread was about emulating case 3 on case 1, and, so far, I have not retracted it, but I may do so in the future.
Right. I guess the low-latency background thread can be used to emulate 3) on top of both 1) and 2).
This, if enabled unconditionally, indeed would lead to unnecessary power consumption increase for those applications that don't use rewinds. But for applications that want low latency or dynamic latency, there is no other way to get it, so it is wrong to talk about power consumption increase for such applications.
Uhm, is the set of applications that want "low or dynamic latency" necessarily the same as the set that want rewinds?
But that might be more of an academic question, I guess PA is the only one that does rewinds currently. Or are there more apps out there that do rewinds?
I'm not sure I buy this point completely, but I can't come up with a good counterexample either, so maybe you're right.
My argument is that, in the case of PulseAudio, these emulations would just create an unnecessary layer of complexity. PulseAudio actively wants to know the device type (via the is_seekable() call), so it is reasonable for it to also deal with consequences, without the need for additional processing in the driver or for an extra thread.
Agreed.
I understand that you have a mathematically perfect approach to this, as well as other algorithms. This would indeed be the best goal, but given an imperfect world, where we're forced to choose between
- no rewinding at all
- imperfect rewinding in the sense that it sometimes can produce
hearable artifacts
...I'm not sure 1) is always the right choice...
Should we have a snd_pcm_open (or whatever else) flag for accepting imperfect rewind results? [anyway, I would say "no" to using it in PulseAudio]
Well, if we had one, and it defaulted to the old behaviour, I suppose that would solve the new alsa-lib on old PA problem? But I'm not sure it's worth the added complexity.
08.09.2014 15:26, David Henningsson wrote:
On 2014-09-08 10:46, Alexander E. Patrakov wrote:
08.09.2014 13:59, David Henningsson wrote:
(First of all, a big thanks for being the first to talk constructively about the important problem of rewindability classes!)
On 2014-09-07 17:16, Alexander E. Patrakov wrote:
Hello.
(TL;DR: nothing really new except the strawman proposal about threads and the note about interaction of variable sample rate with rewindability)
So, having looked this through another time, it looks like we have three categories of ALSA devices, from rewindability point of view:
Not rewindable at all.
Rewindable down to the period size.
Rewindable even further than the nearest period, down to DMA
transfer sizes or something else. This also requires the .pointer to have better granularity than the period size.
So I think any is_seekable() call or flag should indicate whether the device is case 1), 2) or 3). And for case 3) perhaps also some indicator of the actual rewind granularity and/or safeguard. This should be enough for PA to be able to pick a suitable default latency.
So far, I agree. The big question now is: do we have enough correct data from the kernel to decide between cases 2 and 3? I am definitely not qualified enough to answer this or to add such interface, though.
I'm quite certain that we don't have enough information coming through from the kernel. It's something we need to add if we end up deciding to implement hw_params_is_seekable.
OK, but, given that we need this interface in order to avoid seeking on the DTS encoder (and to avoid creating a big buffer on it), I'd rather have this sooner than later. As a compromise, let's design the API with case 2 in mind, but only return cases 1 and 3 until we have enough information from the kernel to detect case 2 reliably. And, unless someone sends out a constructive proposal, let's defer the discussion of case 2 detection to the miniconference.
(And users do complain about rewinding on dcaenc, because this rewind creates one or two invalid DTS frames in the middle of every volume change, and some receivers are slow to resynchronize).
This, if enabled unconditionally, indeed would lead to unnecessary power consumption increase for those applications that don't use rewinds. But for applications that want low latency or dynamic latency, there is no other way to get it, so it is wrong to talk about power consumption increase for such applications.
Uhm, is the set of applications that want "low or dynamic latency" necessarily the same as the set that want rewinds?
It is a superset. Applications that want low latency don't necessarily want rewinds, but they want high interrupt rate => high power consumption anyway. Applications that want dynamic latency will do rewinds, because this is the only way to get it, except always defaulting to sometimes-unnecessary low latency.
But that might be more of an academic question, I guess PA is the only one that does rewinds currently. Or are there more apps out there that do rewinds?
Right now it is indeed an academic question. However, earlier in this year I was contacted by Andrew Eikum who was trying to use rewinds in Wine and insisted that it is an ALSA bug when I told him that they cannot work on the default device. I am not sure whether I convinced him, because I did not use the "unsend bluetooth packets" wording back then. I cannot quote him now, because this was via IM, so quoting would be impolite. And, the log is on the other PC anyway.
I'm not sure I buy this point completely, but I can't come up with a good counterexample either, so maybe you're right.
My argument is that, in the case of PulseAudio, these emulations would just create an unnecessary layer of complexity. PulseAudio actively wants to know the device type (via the is_seekable() call), so it is reasonable for it to also deal with consequences, without the need for additional processing in the driver or for an extra thread.
Agreed.
I understand that you have a mathematically perfect approach to this, as well as other algorithms. This would indeed be the best goal, but given an imperfect world, where we're forced to choose between
- no rewinding at all
- imperfect rewinding in the sense that it sometimes can produce
hearable artifacts
...I'm not sure 1) is always the right choice...
Should we have a snd_pcm_open (or whatever else) flag for accepting imperfect rewind results? [anyway, I would say "no" to using it in PulseAudio]
Well, if we had one, and it defaulted to the old behaviour, I suppose that would solve the new alsa-lib on old PA problem? But I'm not sure it's worth the added complexity.
Understood. Yes, it would solve the problem, and I have no opinion so far whether it is worth the cost of complexity and of the bad default.
This phrase from the documentation of snd_pcm_rewindable is also something to think about:
""" Note: The snd_pcm_rewind() can accept bigger value than returned by this function. But it is not guaranteed that output stream will be consistent with bigger value. """
Alexander E. Patrakov wrote:
This phrase from the documentation of snd_pcm_rewindable is also something to think about:
""" Note: The snd_pcm_rewind() can accept bigger value than returned by this function. But it is not guaranteed that output stream will be consistent with bigger value.
This phrase again assumes a ring buffer, i.e., it says that 1) there is a buffer, and applications can write _anywhere_ in it, but 2) if a plugin reduces the allowed range (with snd_pcm_rewindable()), it is likely that the plugin will not notice that some part of the buffer has changed.
(The first part is not true for most unrewindable plugins.)
Regards, Clemens
On my desktop PC, on snd-hda-intel with analog outputs for S16LE stereo,
the granularity is 32 bytes (= 8 samples), and I get the pointer granularity of 64 bytes (=16 samples) over HDMI. The minimum period size is 32 samples in both cases.
Do you mean hda-Intel does not support arbritray period size when you say the granularity is 32 bytes ?
However the granularity of the emulated hda sound card inside any VM depend on the vm and the backend audio system and sound card
However the granularity of the emulated hda sound card inside any VM
depend on the vm and the backend audio system and sound card
Obviously it is a "virtual granularity".
ACC On Sep 11, 2014 12:49 AM, "Raymond Yau" superquad.vortex2@gmail.com wrote:
On my desktop PC, on snd-hda-intel with analog outputs for S16LE stereo,
the granularity is 32 bytes (= 8 samples), and I get the pointer granularity of 64 bytes (=16 samples) over HDMI. The minimum period size is 32 samples in both cases.
Do you mean hda-Intel does not support arbritray period size when you say the granularity is 32 bytes ?
However the granularity of the emulated hda sound card inside any VM depend on the vm and the backend audio system and sound card _______________________________________________ Alsa-devel mailing list Alsa-devel@alsa-project.org http://mailman.alsa-project.org/mailman/listinfo/alsa-devel
However the granularity of the emulated hda sound card inside any VM
depend on the vm and the backend audio system and sound card
Obviously it is a "virtual granularity".
Seem emulate dma transfer
http://git.kernel.org/cgit/virt/kvm/qemu-kvm.git/plain/hw/intel-hda.c
But not all backend can provide same granularity
11.09.2014 09:49, Raymond Yau wrote:
On my desktop PC, on snd-hda-intel with analog outputs for S16LE
stereo, the granularity is 32 bytes (= 8 samples), and I get the pointer granularity of 64 bytes (=16 samples) over HDMI. The minimum period size is 32 samples in both cases.
Do you mean hda-Intel does not support arbritray period size when you say the granularity is 32 bytes ?
However the granularity of the emulated hda sound card inside any VM depend on the vm and the backend audio system and sound card
The precise meaning is defined here:
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-May/076475.html
Of course the VM result would be different.
2014-9-11 下午1:28 於 "Alexander E. Patrakov" patrakov@gmail.com 寫道:
11.09.2014 09:49, Raymond Yau wrote:
On my desktop PC, on snd-hda-intel with analog outputs for S16LE
stereo, the granularity is 32 bytes (= 8 samples), and I get the pointer granularity of 64 bytes (=16 samples) over HDMI. The minimum period size is 32 samples in both cases.
Do you mean hda-Intel does not support arbritray period size when you say the granularity is 32 bytes ?
However the granularity of the emulated hda sound card inside any VM depend on the vm and the backend audio system and sound card
The precise meaning is defined here:
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-May/076475.html
Of course the VM result would be different.
http://git.kernel.org/cgit/virt/kvm/qemu-kvm.git/plain/hw/hda-audio.c
/* audio in/out widget */ case AC_VERB_SET_CHANNEL_STREAMID: st = a->st + node->stindex; if (st->node == NULL) { goto fail; } hda_audio_set_running(st, false); st->stream = (payload >> 4) & 0x0f; st->channel = payload & 0x0f; dprint(a, 2, "%s: stream %d, channel %d\n", st->node->name, st->stream, st->channel); hda_audio_set_running(st, a->running_real[st->output * 16 + st->stream]); hda_codec_response(hda, true, 0); break;
It seem some vm just emulate start/stop of playback/capture , it does not update any hda controller registers, this mean that it won't support rewind
On my desktop PC, on snd-hda-intel with analog outputs for S16LE
stereo, the granularity is 32 bytes (= 8 samples), and I get the pointer granularity of 64 bytes (=16 samples) over HDMI. The minimum period size is 32 samples in both cases.
Do you mean hda-Intel does not support arbritray period size when you say the granularity is 32 bytes ?
However the granularity of the emulated hda sound card inside any VM depend on the vm and the backend audio system and sound card
The precise meaning is defined here:
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-May/076475.html
Do you mean the different hda controllers may have different granularity ?
Do your two hda controllers have different Fifo size ?
13.09.2014 14:57, Raymond Yau wrote:
On my desktop PC, on snd-hda-intel with analog outputs for S16LE
stereo, the granularity is 32 bytes (= 8 samples), and I get the pointer granularity of 64 bytes (=16 samples) over HDMI. The minimum period size is 32 samples in both cases.
Do you mean hda-Intel does not support arbritray period size when you say the granularity is 32 bytes ?
However the granularity of the emulated hda sound card inside any VM depend on the vm and the backend audio system and sound card
The precise meaning is defined here:
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-May/076475.html
Do you mean the different hda controllers may have different granularity ?
Yes. My two hda controllers have different granularity.
Do your two hda controllers have different Fifo size ?
If you mean the result of snd_pcm_hw_params_get_fifo_size(), then both cards return 0. I call this function after snd_pcm_hw_params(), as recommended by the documentation.
2014-9-13 下午6:43 於 "Alexander E. Patrakov" patrakov@gmail.com 寫道:
13.09.2014 14:57, Raymond Yau wrote:
On my desktop PC, on snd-hda-intel with analog outputs for S16LE
stereo, the granularity is 32 bytes (= 8 samples), and I get the
pointer
granularity of 64 bytes (=16 samples) over HDMI. The minimum period
size
is 32 samples in both cases.
Do you mean hda-Intel does not support arbritray period size when you say the granularity is 32 bytes ?
However the granularity of the emulated hda sound card inside any VM depend on the vm and the backend audio system and sound card
The precise meaning is defined here:
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-May/076475.html
Do you mean the different hda controllers may have different granularity
?
Yes. My two hda controllers have different granularity.
Do your two hda controllers have different Fifo size ?
If you mean the result of snd_pcm_hw_params_get_fifo_size(), then both
cards return 0. I call this function after snd_pcm_hw_params(), as recommended by the documentation.
Do your hda controllers OSDnFIFOS register match the granularity ?
3.3.40 Offset 90: {IOB}SDnFIFOS – Input/Output/Bidirectional Stream Descriptor n FIFO Size
Length: 2 bytes
Table 39. Stream Descriptor n FIFO Size Bit Type Reset Description 15:0 RO Imp.Dep
FIFO Size (FIFOS): Indicates the maximum number of bytes that could be fetched by the controller at one time. This is the maximum number of bytes that may have been DMA‟d into memory but not yet transmitted on the link, and is also the maximum possible value that the LPIB count will increase by at one time. This number may be static to indicate a static buffer size, or may change after the data format has been programmed if the controller is able to vary its FIFO size based on the stream format. If it is able to change value after the data format has been programmed, the value update must happen immediately before the next read of the FIFOS register, and remain static until the next programming of data format.
13.09.2014 17:33, Raymond Yau wrote:
2014-9-13 下午6:43 於 "Alexander E. Patrakov" <patrakov@gmail.com mailto:patrakov@gmail.com> 寫道:
13.09.2014 14:57, Raymond Yau wrote:
On my desktop PC, on snd-hda-intel with analog outputs for S16LE
stereo, the granularity is 32 bytes (= 8 samples), and I get the
pointer
granularity of 64 bytes (=16 samples) over HDMI. The minimum
period size
is 32 samples in both cases.
Do you mean hda-Intel does not support arbritray period size
when you
say the granularity is 32 bytes ?
However the granularity of the emulated hda sound card inside any VM depend on the vm and the backend audio system and sound card
The precise meaning is defined here:
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-May/076475.html
Do you mean the different hda controllers may have different
granularity ?
Yes. My two hda controllers have different granularity.
Do your two hda controllers have different Fifo size ?
If you mean the result of snd_pcm_hw_params_get_fifo_size(), then
both cards return 0. I call this function after snd_pcm_hw_params(), as recommended by the documentation.
Do your hda controllers OSDnFIFOS register match the granularity ?
I don't know. Please send a program or a kernel patch that prints this value (including units - samples or bytes), otherwise I won't be able to answer.
07.09.2014 21:16, Alexander E. Patrakov wrote:
- PulseAudio does not call snd_pcm_rewindable(), because for some ALSA
plugins it crashed. This crash is completely fixed in alsa-lib 1.0.28, but in some cases snd_pcm_rewindable() still returns wrong results.
Bad news: even in 1.0.28, snd_pcm_rewindable() crashes for the file plugin due to recursion. So I was wrong when saying "fixed completely" :(
- On the hw plugin, I could demonstrate two other bugs regarding
snd_pcm_rewindable(): stale data and bogus negative return values.
For all of the above issues, I have sent patches.
=== On the rewind safeguard ===
Result 1: it has been decided that the return value of snd_pcm_rewindable() is not changed, and the safeguard is returned by a separate function. This would require documentation changes for snd_pcm_rewindable(), though, as it officially no longer returns "safe count of frames which can be rewinded". I have difficulty designing a better wording what the function actually means now.
Result 2: the proposed heuristic has been rightfully and convincingly busted, but no alternatives were proposed. It is the top priority to get some constructive proposal here where to get the data, or a fallback plan for alsa-lib. A fallback plan is also needed for old kernels if the constructive proposal involves kernel changes.
=== On non-rewindability of the rate plugin ===
Nothing more to discuss.
=== On possibly-incomplete rewindability of the file plugin ===
Nothing more to discuss.
=== On bogus rewindability of ladspa and extplug plugins ===
No conclusion. The proposed hack to leave the .rewind implementation in place for use by old PulseAudio has met some opposition, but not such definite and convincing opposition as to the rewind-safeguard and low-latency-thread proposals. The proposed alternative (to make a "disallow imperfect rewinds" flag, off by default) means more work for no gain, and David also doubts whether such flag is worth the complexity. It's also notable that nobody explicitly said "let's regress old pulseaudio on new alsa-lib on the obscure dca plugin in the name of clean code" - which would have also closed the question :)
In fact, I am not really sure that everyone understands the problem.
=== On bogus rewindability of some ioplug-based plugins ===
The "low-latency-thread in ioplug" idea has been rightfully busted, and transformed into the "flexiblerewind" plugin idea. However, that plugin found no users (as PulseAudio won't use it), and thus there is no incentive for me or anyone else to implement it. Nothing left to discuss.
=== On the pulse plugin ===
We have reached an agreement that .rewindable and .rewind should be added to ioplug and used by the pulse plugin. Nothing more to discuss.
=== On communication of non-rewindability to the program ===
We have very good progress here: recognition of three rewindability classes that seemingly nobody objects to. The wanted behavior of PulseAudio for each rewindability class is not something we currently fully agree upon, though, but I think that it will be more appropriate to discuss further when someone sends a PulseAudio patch making use of these classes. I.e. not now. It is very good that there is at least one more person (Pierre Bossart) who understands the inherent conflict of requirements.
Transition plan remains to be discussed, but that's the same discussion as the one called for in the "rewind safeguard" part of the post.
=== On the programmer expectations ===
(social issue)
Resolved (was misinterpretation on my side). I have to submit the documentation patch.
=== On the rewind safeguard ===
Result 1: it has been decided that the return value of
snd_pcm_rewindable() is not changed, and the safeguard is returned by a separate function.
It is unlikely to return any value which is safe, it is the responsiability of the application to decide how much can be rewind
If pulseaudio assume 20ms process time is require to process two seconds of audio and sleep for 1980ms, it should not assume cpu have infinite power
what is the purpose of the timestamp ?
Can the timestamp use to predict when will next period update occur if the timestamp is obtained at previous period update ?
Why snd_pcm_rewind cannot return error when application just set stop threshold to buffer size and rewind more than the stop threshold ?
:This would require documentation changes for snd_pcm_rewindable(), though, as it officially no longer returns "safe count of frames which can be rewinded". I have difficulty designing a better wording what the function actually means now.
The word "safe" should be removed
14.09.2014 17:37, Raymond Yau wrote:
=== On the rewind safeguard ===
Result 1: it has been decided that the return value of
snd_pcm_rewindable() is not changed, and the safeguard is returned by a separate function.
It is unlikely to return any value which is safe, it is the responsiability of the application to decide how much can be rewind
You are placing a responsibility on an application without giving it any means to make an informed decision. E.g. 4 ms is OK on snd-hda-intel, but definitely not OK on ymfpci even on infinitely fast CPU (because of the fixed 5 ms interrupt interval). The whole question here is: how is an application supposed to know that?
If pulseaudio assume 20ms process time is require to process two seconds of audio and sleep for 1980ms, it should not assume cpu have infinite power
You are right. The safeguard interval is a sum of the CPU-independent but card-specific part (which is what is being talked about) and a CPU-specific part (which the application indeed should know).
what is the purpose of the timestamp ?
Can the timestamp use to predict when will next period update occur if the timestamp is obtained at previous period update ?
See http://0pointer.de/blog/projects/pulse-glitch-free.html
Why snd_pcm_rewind cannot return error when application just set stop threshold to buffer size and rewind more than the stop threshold ?
I think it does. The problem (due to which we need a safeguard) is that the card-specific minimum number of not-really-rewindable samples exists. E.g., for ymfpci, that would be 5 ms.
:This would require documentation changes for snd_pcm_rewindable(), though, as it officially no longer returns "safe count of frames which can be rewinded". I have difficulty designing a better wording what the function actually means now.
The word "safe" should be removed
That's insufficient. The returned value is valid only in the ring-buffer model for a card that fetches samples absolutely uniformly one at a time, i.e. without any batching, updates the pointer at every played-back sample, and doesn't have any "don't overwrite what I am DMA-ing or I will kill you with IRQs" quirk. I.e., even with the infinitely fast CPU, an attempt to rewind as much as that function returns (and then overwrite) will yield a glitch (that may or may not be detected as xrun even if xrun detection is enabled).
=== On the rewind safeguard ===
Result 1: it has been decided that the return value of
snd_pcm_rewindable() is not changed, and the safeguard is returned by a separate function.
It is unlikely to return any value which is safe, it is the responsiability of the application to decide how much can be rewind
You are placing a responsibility on an application without giving it any
means to make an informed decision. E.g. 4 ms is OK on snd-hda-intel, but definitely not OK on ymfpci even on infinitely fast CPU (because of the fixed 5 ms interrupt interval). The whole question here is: how is an application supposed to know that?
Take a look at patent US 20100131783
System and Method of Dynamically Switching Queue Threshold
HDA may have different fifo threshold in different power states, the granularity is not fixed
Twice the minimum period size/time is not any over estimate
Glitching still occurs at switch sink / change in power state when you allow sound card run with lowest latency ?
I don't think your proposal of having three different class of granularity is good idea
At Sun, 14 Sep 2014 18:07:23 +0600, Alexander E. Patrakov wrote:
14.09.2014 17:37, Raymond Yau wrote:
=== On the rewind safeguard ===
Result 1: it has been decided that the return value of
snd_pcm_rewindable() is not changed, and the safeguard is returned by a separate function.
It is unlikely to return any value which is safe, it is the responsiability of the application to decide how much can be rewind
You are placing a responsibility on an application without giving it any means to make an informed decision. E.g. 4 ms is OK on snd-hda-intel, but definitely not OK on ymfpci even on infinitely fast CPU (because of the fixed 5 ms interrupt interval). The whole question here is: how is an application supposed to know that?
Well, maybe the word "safeguard" is somewhat confusing to be used as a driver API. There is no "safety" at all there. There is only "theoretically minimal" (and it often lies even if the hardware chip says so). How much value to be taken as "safeguard" is rather a choice by each application or sound backend.
Right now, the only information we give from the sound driver is INFO_BATCH flag. And I agree with a bit more detailed information to be exposed from the driver -- but only if possible. This must be an optional information and not mandatory.
Takashi
15.09.2014 15:19, Takashi Iwai wrote:
At Sun, 14 Sep 2014 18:07:23 +0600, Alexander E. Patrakov wrote:
14.09.2014 17:37, Raymond Yau wrote:
=== On the rewind safeguard ===
Result 1: it has been decided that the return value of
snd_pcm_rewindable() is not changed, and the safeguard is returned by a separate function.
It is unlikely to return any value which is safe, it is the responsiability of the application to decide how much can be rewind
You are placing a responsibility on an application without giving it any means to make an informed decision. E.g. 4 ms is OK on snd-hda-intel, but definitely not OK on ymfpci even on infinitely fast CPU (because of the fixed 5 ms interrupt interval). The whole question here is: how is an application supposed to know that?
Well, maybe the word "safeguard" is somewhat confusing to be used as a driver API. There is no "safety" at all there. There is only "theoretically minimal" (and it often lies even if the hardware chip says so). How much value to be taken as "safeguard" is rather a choice by each application or sound backend.
I agree with this "bad wording" remark. Let's talk about the "theoretically minimal non-rewindable amount of samples" from now on.
Right now, the only information we give from the sound driver is INFO_BATCH flag. And I agree with a bit more detailed information to be exposed from the driver -- but only if possible. This must be an optional information and not mandatory.
Here I mostly agree.
Indeed, this INFO_BATCH flag exists. However, its documentation is so vague that IMHO a documentation patch is needed to legitimize its use as a "you must not leave less than one period when rewinding" indicator. Also, the documentation says about snd_pcm_hw_params_is_batch(): "This function should only be called when the configuration space contains a single configuration" - I believe that it is an error (and PulseAudio calls it without obeying this restriction), because this function is helpful exactly for choosing hardware parameters such as period size. May I write "if the configuration space contains more than one configuration, the result indicates whether a configuration exists where such double-buffering is done"? Also the documentation talks about "hardware", should I extend it to the whole plugin chain?
As for the desire to export other information if available, it is certainly good.
What remains not fully understood for me is the claim that the information already exposed by every driver (in the form of the minimal period size) is not useful. I understand that two people are against this idea, so it must be bad. But I must understand why. Is it because the minimum period size reported by some drivers (which ones are suspected, if any?) may be a lie?
Clemens Ladisch mentioned USB as a counterexample, but it is a batch device for which the current period size is more relevant than the minimal one. Pierre-Louis Bossart just said that it is "really asking for trouble" without much explanation.
Can anyone name a sound controller that is not of a batch variety, where the theoretically minimal non-rewindable amount of samples is, at some settings, higher than the reported minimum period size at the same number of channels, sample rate and sample format?
[and yes, I understand that I can always cheat and say: let's change the definition of INFO_BATCH so that it includes such cases]
At Mon, 15 Sep 2014 15:58:42 +0600, Alexander E. Patrakov wrote:
15.09.2014 15:19, Takashi Iwai wrote:
At Sun, 14 Sep 2014 18:07:23 +0600, Alexander E. Patrakov wrote:
14.09.2014 17:37, Raymond Yau wrote:
=== On the rewind safeguard ===
Result 1: it has been decided that the return value of
snd_pcm_rewindable() is not changed, and the safeguard is returned by a separate function.
It is unlikely to return any value which is safe, it is the responsiability of the application to decide how much can be rewind
You are placing a responsibility on an application without giving it any means to make an informed decision. E.g. 4 ms is OK on snd-hda-intel, but definitely not OK on ymfpci even on infinitely fast CPU (because of the fixed 5 ms interrupt interval). The whole question here is: how is an application supposed to know that?
Well, maybe the word "safeguard" is somewhat confusing to be used as a driver API. There is no "safety" at all there. There is only "theoretically minimal" (and it often lies even if the hardware chip says so). How much value to be taken as "safeguard" is rather a choice by each application or sound backend.
I agree with this "bad wording" remark. Let's talk about the "theoretically minimal non-rewindable amount of samples" from now on.
Right now, the only information we give from the sound driver is INFO_BATCH flag. And I agree with a bit more detailed information to be exposed from the driver -- but only if possible. This must be an optional information and not mandatory.
Here I mostly agree.
Indeed, this INFO_BATCH flag exists. However, its documentation is so vague that IMHO a documentation patch is needed to legitimize its use as a "you must not leave less than one period when rewinding" indicator. Also, the documentation says about snd_pcm_hw_params_is_batch(): "This function should only be called when the configuration space contains a single configuration" - I believe that it is an error (and PulseAudio calls it without obeying this restriction), because this function is helpful exactly for choosing hardware parameters such as period size. May I write "if the configuration space contains more than one configuration, the result indicates whether a configuration exists where such double-buffering is done"? Also the documentation talks about "hardware", should I extend it to the whole plugin chain?
The whole hw_params_*() descriptions are something like bot-style tweets, we need a bit better texts there :)
The batch parameter was (and partly is) indeed very ambiguous. Its usage has been materialized a few years ago.
As for the desire to export other information if available, it is certainly good.
What remains not fully understood for me is the claim that the information already exposed by every driver (in the form of the minimal period size) is not useful. I understand that two people are against this idea, so it must be bad. But I must understand why. Is it because the minimum period size reported by some drivers (which ones are suspected, if any?) may be a lie?
A kind of yes. Many drivers, especially the old ones, set the minimal period size without actually knowing the real limit. It tends to be smaller than the hardware really supports. This is, in most cases, just because no hardware spec defines that. So, it can't be blindly taken as the bottom line, unfortunately. That's why I suggested the new field would be optional; we simply don't know the value.
Takashi
Clemens Ladisch mentioned USB as a counterexample, but it is a batch device for which the current period size is more relevant than the minimal one. Pierre-Louis Bossart just said that it is "really asking for trouble" without much explanation.
Can anyone name a sound controller that is not of a batch variety, where the theoretically minimal non-rewindable amount of samples is, at some settings, higher than the reported minimum period size at the same number of channels, sample rate and sample format?
[and yes, I understand that I can always cheat and say: let's change the definition of INFO_BATCH so that it includes such cases]
-- Alexander E. Patrakov
What remains not fully understood for me is the claim that the information already exposed by every driver (in the form of the minimal period size) is not useful. I understand that two people are against this idea, so it must be bad. But I must understand why. Is it because the minimum period size reported by some drivers (which ones are suspected, if any?) may be a lie?
A kind of yes. Many drivers, especially the old ones, set the minimal period size without actually knowing the real limit. It tends to be smaller than the hardware really supports. This is, in most cases, just because no hardware spec defines that. So, it can't be blindly taken as the bottom line, unfortunately. That's why I suggested the new field would be optional; we simply don't know the value.
Agree with Takashi, even recent audio IP tend to be reused in various ways (buses, system agents, arbiters, DMA controllers, DDR controllers) and no one really knows what the rewind granularity is, it's not a metric that's tracked.
I actually liked the heuristic that's present in PulseAudio: constrain the safeguard to 256 bytes or 1ms (the last part would actually be fixed, it's currently not dependent on the sink actual rate and number of channels). we could introduce an optional query drivers for this but I wonder if it's worth the effort. -Pierre
15.09.2014 23:01, Pierre-Louis Bossart wrote:
What remains not fully understood for me is the claim that the information already exposed by every driver (in the form of the minimal period size) is not useful. I understand that two people are against this idea, so it must be bad. But I must understand why. Is it because the minimum period size reported by some drivers (which ones are suspected, if any?) may be a lie?
A kind of yes. Many drivers, especially the old ones, set the minimal period size without actually knowing the real limit. It tends to be smaller than the hardware really supports. This is, in most cases, just because no hardware spec defines that. So, it can't be blindly taken as the bottom line, unfortunately. That's why I suggested the new field would be optional; we simply don't know the value.
Agree with Takashi, even recent audio IP tend to be reused in various ways (buses, system agents, arbiters, DMA controllers, DDR controllers) and no one really knows what the rewind granularity is, it's not a metric that's tracked.
I actually liked the heuristic that's present in PulseAudio: constrain the safeguard to 256 bytes or 1ms (the last part would actually be fixed, it's currently not dependent on the sink actual rate and number of channels). we could introduce an optional query drivers for this but I wonder if it's worth the effort.
OK, a direct question then.
The "256 bytes or 1 ms" heuristic is known not to work on ymfpci. Should we bump it to "5.3 ms or 256 samples", or make configurable, or ask the ymfpci maintainer to add the INFO_BATCH flag to that driver?
At Mon, 15 Sep 2014 23:14:41 +0600, Alexander E. Patrakov wrote:
15.09.2014 23:01, Pierre-Louis Bossart wrote:
What remains not fully understood for me is the claim that the information already exposed by every driver (in the form of the minimal period size) is not useful. I understand that two people are against this idea, so it must be bad. But I must understand why. Is it because the minimum period size reported by some drivers (which ones are suspected, if any?) may be a lie?
A kind of yes. Many drivers, especially the old ones, set the minimal period size without actually knowing the real limit. It tends to be smaller than the hardware really supports. This is, in most cases, just because no hardware spec defines that. So, it can't be blindly taken as the bottom line, unfortunately. That's why I suggested the new field would be optional; we simply don't know the value.
Agree with Takashi, even recent audio IP tend to be reused in various ways (buses, system agents, arbiters, DMA controllers, DDR controllers) and no one really knows what the rewind granularity is, it's not a metric that's tracked.
I actually liked the heuristic that's present in PulseAudio: constrain the safeguard to 256 bytes or 1ms (the last part would actually be fixed, it's currently not dependent on the sink actual rate and number of channels). we could introduce an optional query drivers for this but I wonder if it's worth the effort.
OK, a direct question then.
The "256 bytes or 1 ms" heuristic is known not to work on ymfpci. Should we bump it to "5.3 ms or 256 samples", or make configurable, or ask the ymfpci maintainer to add the INFO_BATCH flag to that driver?
Adding INFO_BATCH would be a quick solution, indeed, although this isn't 100% right.
Takashi
What remains not fully understood for me is the claim that the information already exposed by every driver (in the form of the
minimal
period size) is not useful. I understand that two people are against this idea, so it must be bad. But I must understand why. Is it
because
the minimum period size reported by some drivers (which ones are suspected, if any?) may be a lie?
Does this mean the granularity of most drivers are only one period since most of them cannot reporte the dma position in realtime ?
(e.g. pointer callback of Intel8x0 use a timeout loop to read the last valid index)
The safeguard will be two periods
If some HDA have FIFO size of 192 bytes which is more than the minimum period bytes
Should we limit the start threshold to FIFO threshold or FIFO size ?
Does it need Brust length = 1 for those hda controller to support arbitrary period size ?
Seem only some hda controller can trigger DMA transfer at 1/3 or 2/3 FIFO buffer at different power states and report the dma position in pointer callback
Does snd-oxygen provide this position with granularity which is less than the minimum period size ?
18.09.2014 07:15, Raymond Yau wrote:
What remains not fully understood for me is the claim that the information already exposed by every driver (in the form of the
minimal
period size) is not useful. I understand that two people are
against
this idea, so it must be bad. But I must understand why. Is it
because
the minimum period size reported by some drivers (which ones are suspected, if any?) may be a lie?
Does this mean the granularity of most drivers are only one period since most of them cannot reporte the dma position in realtime ?
I guess you are right.
(e.g. pointer callback of Intel8x0 use a timeout loop to read the last valid index)
The safeguard will be two periods
Here I don't see how that can be right. Even if the position is updated at each period interrupt, we always know which period is currently playing. So the theoretical minimum safeguard is exactly one period. Add 1 ms on top if you want to account for scheduler glitches.
If some HDA have FIFO size of 192 bytes which is more than the minimum period bytes
Should we limit the start threshold to FIFO threshold or FIFO size ?
I think that in this case it is a bug to report such small minimum period size.
Does it need Brust length = 1 for those hda controller to support arbitrary period size ?
Seem only some hda controller can trigger DMA transfer at 1/3 or 2/3 FIFO buffer at different power states and report the dma position in pointer callback
I am not a specialist in HDA hardware.
Does snd-oxygen provide this position with granularity which is less than the minimum period size ?
I have a friend with a Xonar card, will ask him to perform a test for you when he appears online.
Raymond Yau wrote:
Does snd-oxygen provide this position with granularity which is less than the minimum period size ?
Yes. Its DMA controller uses a burst size of 32 bytes. The pointer registers are documented as reporting the position with 4-byte accuracy, but there were problems when the period/buffer sizes were not aligned to 32 bytes. I've set the minimum period size to 64 bytes, just to be safe.
Regards, Clemens
21.09.2014 15:53, Clemens Ladisch wrote:
Raymond Yau wrote:
Does snd-oxygen provide this position with granularity which is less than the minimum period size ?
Yes. Its DMA controller uses a burst size of 32 bytes. The pointer registers are documented as reporting the position with 4-byte accuracy, but there were problems when the period/buffer sizes were not aligned to 32 bytes. I've set the minimum period size to 64 bytes, just to be safe.
Well, my friend has tested the card anyway, and I am afraid that the situation is in fact more complex than documented. Look how fast (8 samples per ioctl in a busy loop!) the card eats the first 256 samples. This is very suspicious, and I am not really sure whether we can actually perform a rewind over these 256 samples.
The "usual" step of the pointer updates (via snd_pcm_avail) is indeed 8 samples = 32 bytes with some fine-grained updates sometimes showing up in the middle, which matches the documented burst size.
Does snd-oxygen provide this position with granularity which is less
than
the minimum period size ?
Yes. Its DMA controller uses a burst size of 32 bytes. The pointer
registers
are documented as reporting the position with 4-byte accuracy, but there
were
problems when the period/buffer sizes were not aligned to 32 bytes. I've
set
the minimum period size to 64 bytes, just to be safe.
The brust size is usually added to the constriants but the application cannot deduce this value
err = snd_pcm_hw_constraint_step(runtime, 0, SNDRV_PCM_HW_PARAM_PERIOD_BYTES, 32);
For those audio controllers have FIFO buffer, the application must keep the FIFO buffer full at any time for glitch free.
Someone may argue that keeping at least one frame above FIFO threshold is just enough
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-September/081509.h...
I don't understand why the minimum buffer size can be less than FIFO size
participants (9)
-
A. C. Censi
-
Alexander E. Patrakov
-
Clemens Ladisch
-
David Henningsson
-
Lars-Peter Clausen
-
Pierre-Louis Bossart
-
Raymond Yau
-
Takashi Iwai
-
Tanu Kaskinen