[alsa-devel] Testing rewindability of a pcm
Hello.
[David: if I misrepresent the result of our discussion, please correct me.]
On the pulseaudio-discuss list, there was a thread started by me:
http://lists.freedesktop.org/archives/pulseaudio-discuss/2014-April/020449.h...
The initial patch attempted to work around known brokenness of rewind handling in certain ALSA plugins by keeping a whitelist of ALSA plugin types known to implement full or almost-full rewinds absolutely correctly, and falling back to the low-latency no-rewinds mode on others (such as extplug, which only pretends to support rewinds). While David accepted that it would be a good idea to use a not-so-high latency (he proposed ~150-200 ms) if rewinds are not available, keeping a plugin whitelist inside PulseAudio code was shot down as a workaround for an ALSA bug. The alternative proposal is to fix ALSA so that it correctly reports rewindability, and use the reported data, which looks like a good idea. However, there remain some disagreements on the details how ALSA needs to be fixed, that's why this e-mail.
David suggests that PulseAudio should use the snd_pcm_rewindable() function when all plugins are fixed to either implement rewinds correctly or to return 0 if they don't actually support rewinds. And for ioplug and extplug, supporting rewinds is not really an option, because the backend library (which can be an AC3 or DTS encoder) does not necessarily have a rewindable API. <offtopic><troll>And we don't even have a rewindable resampler implementation!</troll></offtopic>
While fixing snd_pcm_rewindable() is indeed a valid task that I take for some plugins, I have a question.
""" Should PulseAudio indeed use a complex dance with snd_pcm_rewindable() to get the essentially-static bit of information whether a given PCM supports full rewinds? Isn't there a better solution? """
The problem is that the information should be used to set the maximum allowed latency of the sink: a PCM that supports full rewinds can be set to 2 seconds of latency, while for a PCM that doesn't support rewinds, 150-200 ms is more appropriate. In other words, ideally, one bit of information about the future full-rewind support should be available at the server startup without any complex dance with test audio streams. Ideally, even before setting hw_params, but "just after" also seems to be acceptable. And at this stage (before or just after setting hw_params), snd_pcm_rewindable() is useless, because it returns 0 even for the hw plugin which does support full rewinds (rightfully, because the card initially is just about to underrun). Therefore, if the solution is indeed to use snd_pcm_rewindable(), then it is needed to write some test data to the pcm, and only then test for rewindability.
Alternatives are below, please help choosing:
1) Use a test stream and call snd_pcm_rewindable() when there is a non-zero amount of data in the buffer: I don't like it. Too complex dance, with side effects, to get one bit of essentially-static information that the plugin knows for sure even before setting hw_params.
2) Use snd_pcm_forwardable() with an empty buffer just after setting hw_params: works (once the implementation in ioplug/extplug is fixed to return 0), doesn't require API additions, but David doesn't like it. He thinks that the sets of PCMs that support forwarding and rewinding may be different (here I disagree, but can't provide arguments). Additionally, forwarding the application pointer without writing anything and rewinding over that beforehand looks like undefined behaviour (which looks like a valid argument, but I am not 100% sure, and I am not actually forwarding, I am just asking how much I can forward).
3) Add a new function to ALSA that gets this bit of static information for a given pcm handle, use it. Drawbacks: a new function is needed, and it's a bad idea to send a patch that adds a new public ALSA API function without discussing here first (that's why the e-mail).
4) Write your own alternative here if you have one.
Alexander E. Patrakov wrote:
And we don't even have a rewindable resampler implementation!
The rate plugin implements rewinding. (I haven't tested if this implementation is correct. What is the problem?)
Should PulseAudio indeed use a complex dance with snd_pcm_rewindable() to get the essentially-static bit of information whether a given PCM supports full rewinds?
As far as I can tell, ALSA assumes that _every_ device allows random access to its ring buffer.
In particular, both alsa-lib's internal API and the explug/ioplug APIs, when writing samples, have no concept of "current position" and write to an explicitly specified position in the buffer. The implementation if snd_pcm_rewindable() assumes that the entire buffer is accessible (the only plugin to change this is the file plugin).
snd_pcm_rewindable/forwardable() are just for determining the possible amount of movement, not whether it's possible at all.
In other words: any plugin that does not allow random access is, as far as the ALSA API is concerned, buggy.
- Use snd_pcm_forwardable() with an empty buffer just after setting hw_params: works (once the implementation in ioplug/extplug is fixed to return 0), doesn't require API additions,
If ALSA is changed to track the information whether a device allows random access or not, this would be the correct way.
but David doesn't like it. He thinks that the sets of PCMs that support forwarding and rewinding may be different (here I disagree, but can't provide arguments).
It's possible to imagine devices that implement forwarding by just inserting zero samples, but cannot go back in time. (I'm not sure if implemting that would make any sense.)
- Add a new function to ALSA that gets this bit of static information for a given pcm handle, use it.
I guess this will happen.
Regards, Clemens
24.04.2014 18:00, Clemens Ladisch wrote:
Alexander E. Patrakov wrote:
And we don't even have a rewindable resampler implementation!
The rate plugin implements rewinding. (I haven't tested if this implementation is correct. What is the problem?)
I was going to demonstrate a program that produces incorrect output, but instead got a program that crashes. Still, on extplug with upmix, it does demonstrate essentially the same issue with internal state nicely, search for "=====" below. And the speculation text below is the one about incorrect output that I wrote after looking at the pcm_rate.c code but which is not testable right now. So much for the failed trolling attempt :(
<speculation> The implementation of rewinds in the rate plugin is indeed incorrect. I will explain this now and show the bigger problem then (after a "+++++").
When resampling, each of the input samples affects more that one output sample, and each output sample is affected by more than one input sample. In the windowed-sinc approach to resampling (which is implemented by speex, libsamplerate and ffmpeg), this is determined by the width of the window, which is typically several (8..100) samples and depends on the rate ratio and resampler quality setting.
As a consequence, each output fragment (received from rate->ops.convert_s16 or from rate->ops.convert in do_convert() into dst_areas) is affected not only by the samples in the input fragment (src_areas), but also by the samples in the previous input fragment. I.e. by a few samples that are logically just before src_offset. Each sample rate converter, however, keeps a copy of them in its own history buffer (which is called "magic_samples" in speex), instead of looking before src_offset.
The important part of the above is that a sample rate converter is not stateless, and that this state left over from the previous input can yield non-silent output fragment as a result of conversion of a fragment of silence.
Let's also note an implicit precondition that do_convert is based on. The precondition is that the internal state of the resampler (i.e rate->obj) should match what was there after converting the previous samples (i.e. those that are just before src_offset now). Without rewinds, this precondition is trivially satisfied, as nobody touches rate->obj between converting the previous fragment and this fragment.
With rewinds, this precondition is violated: consider a program that uses 4 periods, prefills them with silence, and then, in a loop, rewinds a period, writes a period of silence, and writes a period of square waves. For your convenience, this program is attached. To test the rate plugin, edit it in order to use the correct device (such as plughw:2) and a rate (such as 32000 Hz) that the card doesn't support. As it always overwrites square waves with silence before the hardware pointer goes over them, the only correct output from this program on any device that properly supports rewinds is pure silence (unless the program complains about anything).
If I understand the code correctly (which, given the crash, is false), the rate plugin sometimes would give the square wave to the resampler as an input. And it never tells the resampler to forget that - there is no such op in snd_pcm_rate_ops_t! In fact, the rewind path (snd_pcm_rate_rewind, snd_pcm_rate_move_applptr) calls only read-only functions on obj, which is what I object to. However, see below why my requirements (which are indeed valid requirements if the rate plugin must implement rewinds correctly) are actually unreasonable. </speculation>
Now to the crash that was totally unexpected. The problem is that, in snd_pcm_rate_fast_ops, the .rewindable callback gets defaulted to NULL and that got dereferenced through the snd_pcm_rewindable() call in my test program. OK, we can comment out the call to snd_pcm_rewindable() and the check. But then, with alsa-lib 1.0.27.2, snd_pcm_rewind() would return negative numbers close to minus buffer size! I do see a commit by Andrew Eikum in the recent git history, but that only papers over the problem, as it was never explained where these negative numbers came from.
Installation of alsa-lib from git yields snd_pcm_rewind() that forgets ~4090 samples out of 1024 requested. Also not usable. And worse, the program produces a lot of steady sound - so here is obviously a bug with accounting in addition to the fundamental problem with state that I speculated about above.
So please understand my position: rewinding in the rate plugin never worked, so it cannot have any users that won't also be satisfied by being told that it should not work, always returning 0 and ignoring the rewind. Especially since the fundamental problem with state actually requires writing a new resampler implementation from scratch (see below), and since there are other plugins (such as AC3 encoder) where implementing proper rewinds is impossible without even larger efforts, so a program must be prepared to deal with this situation anyway. Of course, this is only my position, you are free to ignore it.
+++++
And now let's return to that fundamental problem with state. Inside the speculation block, I have suggested that the problem is that we don't have ops in snd_pcm_rate_ops_t that forward rewinds to the resampler implementation, so that it can restore its state to the one corresponding to an earlier moment in time. The real big problem (and the one I really meant in the original trolling attempt) is that even if you define new ops in snd_pcm_rate_ops_t, one can't implement them using existing open-source resamplers. Their APIs just don't have a function to rewind and to restore the prior magic samples.
=====
Now to the upmix plugin which does demonstrate a similar bug successfully.
With the following .asoundrc, the test program, when modified to run on the "upmix" pcm, produces silence in front channels and non-silence in back channels, indicating a bug in upmix.
pcm.!upmix { type upmix channels 6 slave.pcm "hw:2" }
The bug is now that the snd_pcm_extplug_callback structure does not have a callback that would notify the implementation about the rewind, so that the implementation restores its internal state (for upmix - a delay buffer) to a prior moment of time. Of course, implementability of this callback is again a valid question.
Should PulseAudio indeed use a complex dance with snd_pcm_rewindable() to get the essentially-static bit of information whether a given PCM supports full rewinds?
As far as I can tell, ALSA assumes that _every_ device allows random access to its ring buffer.
In particular, both alsa-lib's internal API and the explug/ioplug APIs, when writing samples, have no concept of "current position" and write to an explicitly specified position in the buffer. The implementation if snd_pcm_rewindable() assumes that the entire buffer is accessible (the only plugin to change this is the file plugin).
snd_pcm_rewindable/forwardable() are just for determining the possible amount of movement, not whether it's possible at all.
Understood.
In other words: any plugin that does not allow random access is, as far as the ALSA API is concerned, buggy.
See (3).
- Use snd_pcm_forwardable() with an empty buffer just after setting hw_params: works (once the implementation in ioplug/extplug is fixed to return 0), doesn't require API additions,
If ALSA is changed to track the information whether a device allows random access or not, this would be the correct way.
OK, I am willing to do this change, but I am a bit confused. Is the suggeestion to do (3), and, in addition, to add restrictions to snd_pcm_forwardable() that use the same underlying information?
but David doesn't like it. He thinks that the sets of PCMs that support forwarding and rewinding may be different (here I disagree, but can't provide arguments).
It's possible to imagine devices that implement forwarding by just inserting zero samples, but cannot go back in time. (I'm not sure if implemting that would make any sense.)
No comments here.
- Add a new function to ALSA that gets this bit of static information for a given pcm handle, use it.
I guess this will happen.
Do you agree with this prototype, and the fact that the function should be exportable?
int snd_pcm_hw_params_is_seekable(const snd_pcm_hw_params_t *params);
Check if the device fully supports rewinds and forwards.
Parameters: params Configuration space
Return values: 0 The whole plugin chain is not guaranteed to support arbitrary rewinds and forwards. 1 The whole plugin chain is guaranteed, under all remaining configurations in the configuration space, to support any rewinds and forwards that don't move the application pointer through the hardware pointer.
If this function is called when there is more than one configuration exists in the configuration set (e.g. when the rate is not fixed), it may return pessimistic results.
On 2014-04-24 23:01, Alexander E. Patrakov wrote:
+++++
And now let's return to that fundamental problem with state. Inside the speculation block, I have suggested that the problem is that we don't have ops in snd_pcm_rate_ops_t that forward rewinds to the resampler implementation, so that it can restore its state to the one corresponding to an earlier moment in time. The real big problem (and the one I really meant in the original trolling attempt) is that even if you define new ops in snd_pcm_rate_ops_t, one can't implement them using existing open-source resamplers. Their APIs just don't have a function to rewind and to restore the prior magic samples.
=====
First, thanks for doing this research. As you can see, nobody really went into the rewinding business in depth...
I understand that you have a mathematically perfect approach to this, as well as other algorithms. This would indeed be the best goal, but given an imperfect world, where we're forced to choose between 1) no rewinding at all 2) imperfect rewinding in the sense that it sometimes can produce hearable artifacts
...I'm not sure 1) is always the right choice...
int snd_pcm_hw_params_is_seekable(const snd_pcm_hw_params_t *params);
Check if the device fully supports rewinds and forwards.
Parameters: params Configuration space
Return values: 0 The whole plugin chain is not guaranteed to support arbitrary rewinds and forwards. 1 The whole plugin chain is guaranteed, under all remaining configurations in the configuration space, to support any rewinds and forwards that don't move the application pointer through the hardware pointer.
If this function is called when there is more than one configuration exists in the configuration set (e.g. when the rate is not fixed), it may return pessimistic results.
I wonder if this is flexible enough. I think it would be not too uncommon for hardware drivers to e g transfer data to the hardware one period at a time. Hence what would make sense to rewind could be everything but the current period. A similar approach could potentially be done with resamplers (save state data every period so you can rewind correctly to exactly those points but nowhere else).
This would be good to indicate somehow. I e, I'd like to see snd_pcm_rewindable take into account how much actually makes sense to rewind, taking DMA buffer sizes and other things into account where appropriate. Today, we add our own margin of 1.33 ms in PulseAudio because (if my understanding is correct) "some DMAs don't like us writing exactly the block it's busy transferring", but it would be better if this was indicated by ALSA.
25.04.2014 12:19, David Henningsson wrote:
I understand that you have a mathematically perfect approach to this, as well as other algorithms. This would indeed be the best goal, but given an imperfect world, where we're forced to choose between
- no rewinding at all
- imperfect rewinding in the sense that it sometimes can produce
hearable artifacts
...I'm not sure 1) is always the right choice...
Understood. But IMHO (2) with a warning would be better than the current situation of (2) without any indication of the problem.
int snd_pcm_hw_params_is_seekable(const snd_pcm_hw_params_t *params);
Check if the device fully supports rewinds and forwards.
Parameters: params Configuration space
Return values: 0 The whole plugin chain is not guaranteed to support arbitrary rewinds and forwards. 1 The whole plugin chain is guaranteed, under all remaining configurations in the configuration space, to support any rewinds and forwards that don't move the application pointer through the hardware pointer.
If this function is called when there is more than one configuration exists in the configuration set (e.g. when the rate is not fixed), it may return pessimistic results.
I wonder if this is flexible enough. I think it would be not too uncommon for hardware drivers to e g transfer data to the hardware one period at a time. Hence what would make sense to rewind could be everything but the current period. A similar approach could potentially be done with resamplers (save state data every period so you can rewind correctly to exactly those points but nowhere else).
This would be good to indicate somehow. I e, I'd like to see snd_pcm_rewindable take into account how much actually makes sense to rewind, taking DMA buffer sizes and other things into account where appropriate. Today, we add our own margin of 1.33 ms in PulseAudio because (if my understanding is correct) "some DMAs don't like us writing exactly the block it's busy transferring", but it would be better if this was indicated by ALSA.
This looks like two valid feature requests. Regarding the limited flexibility, this can be solved by returning a bitmask of rewind-related limitations, but the current hw_params API doesn't have any functions that return bitmasks. Instead, for each new problem, a new function is created. So it may make sense to create more than one function. However, I am afraid to create a duplicate of an existing function. Does anyone know whether the "must not rewind to the current period" limitation is the same as already indicated by snd_pcm_hw_params_get_fifo_size() or snd_pcm_hw_params_is_batch() or snd_pcm_hw_params_is_block_transfer()?
Of course, these limitations should be honoured by snd_pcm_rewindable().
As for the applicability of limiting the rewinds to the period boundaries (where the resampler state is saved), then we have already discussed something similar in person during LinuxCon Europe 2013 in the context of PulseAudio and IIR filters. If access to the original samples since the beginning of that period is not lost, then we can rewind to the period boundary earlier than needed, then restore the state and reapply the samples between that period boundary and the point we actually wanted to reply to. So the net effect would be "can rewind to anything except the currently playing period". OTOH I know another approach that would make only a few samples non-rewindable (i.e. will look similar to the 1.33ms limitation), and I will probably try to implement it on my vacation as a new proof-of-concept library.
2014-04-25 22:09 GMT+08:00 Alexander E. Patrakov patrakov@gmail.com:
25.04.2014 12:19, David Henningsson wrote:
I understand that you have a mathematically perfect approach to this, as well as other algorithms. This would indeed be the best goal, but given an imperfect world, where we're forced to choose between
- no rewinding at all
- imperfect rewinding in the sense that it sometimes can produce
hearable artifacts
...I'm not sure 1) is always the right choice...
Understood. But IMHO (2) with a warning would be better than the current situation of (2) without any indication of the problem.
Does extplug or ioplug plugin support rewind ?
http://www.alsa-project.org/alsa-doc/alsa-lib/pcm_external_plugins.html
The filter-type plugin is a plugin to convert the PCM signals from the input and feeds to the output. Thus, this plugin always needs a slave PCM as its output.
The plugin can modify the format and the channels of the input/output PCM. It can *not* modify the sample rate (because of simplicity reason).
1) DCA plugin convert 6 channel to compress audio at fixed sample rate
seem only allow sequential write and no random write 2) The I/O-type plugin is a PCM plugin to work as the input or output terminal point, i.e. as a user-space PCM driver.
mmap_rw specifies whether the plugin behaves in the pseudo mmap mode. When this value is set to 1, the plugin creates always a local buffer and performs read/write calls using this buffer as if it's mmapped.
does this mean ioplug can have its own local buffer which is different from the slave ?
e.g. a52_pointer seem use delay of the slave
3) Do pulseaudio keep a copy of client 's buffer ?
when do pulseauido mix two or more client's buffer ( before write to the sound card or after receive data from client )
seem pulse plugin is not rewindable too
26.04.2014 07:32, Raymond Yau wrote:
2014-04-25 22:09 GMT+08:00 Alexander E. Patrakov <patrakov@gmail.com mailto:patrakov@gmail.com>:
25.04.2014 12:19, David Henningsson wrote: I understand that you have a mathematically perfect approach to this, as well as other algorithms. This would indeed be the best goal, but given an imperfect world, where we're forced to choose between 1) no rewinding at all 2) imperfect rewinding in the sense that it sometimes can produce hearable artifacts ...I'm not sure 1) is always the right choice... Understood. But IMHO (2) with a warning would be better than the current situation of (2) without any indication of the problem.
Does extplug or ioplug plugin support rewind ?
http://www.alsa-project.org/alsa-doc/alsa-lib/pcm_external_plugins.html
Both plugins currently pretend to support rewind. Extplug shouldn't do that at all if we are talking about perfect handling of rewinds, and is OK as it is if we accept an imperfect output of the correct length (but I don't want to accept this). Ioplug shouldn't support rewind if mmap_rw == 0, and I can't say yet what to do with mmap_rw == 1. Running the test program posted earlier in this thread on top of lfloat + jack produces silence (good!), but, depending on the jack settings, the program either hangs or runs faster than it should. So still a bug, but I am not sure if it is of some fundamental nature.
- DCA plugin convert 6 channel to compress audio at fixed sample rate
seem only allow sequential write and no random write
Right. The same applies to any LADSPA plugin.
Still, there is a funny thing that I want to share. Even though the dcaenc library only allows sequential writes, here is what happens on ALSA rewind followed by a write. When processing a rewind, extplug moves the slave application pointer. Upon reveiving a write, extplug submits the newly-written data for encoding, and then writes the result (which unfortunately contains some samples from the rewound part of the sound) over the previously encoded data, thus correctly discarding part of the old data. So the end result is of the correct length, with one or two broken DTS frames (copy-pasted from old and new DTS data) in the middle. Still bad, but this is different from the ioplug-based a52 plugin, which simply ignores rewinds, but pretends that it succeeds.
- The I/O-type plugin is a PCM plugin to work as the input or output
terminal point, i.e. as a user-space PCM driver.
mmap_rw specifies whether the plugin behaves in the pseudo mmap mode. When this value is set to 1, the plugin creates always a local buffer and performs read/write calls using this buffer as if it's mmapped.
does this mean ioplug can have its own local buffer which is different from the slave ?
Ioplug doesn't have a slave. It just transfers data to somewhere.
e.g. a52_pointer seem use delay of the slave
What a52 has is not really a slave in the traditional ALSA sense. And the encoder in ffmpeg allows only sequential writes.
- Do pulseaudio keep a copy of client 's buffer ?
when do pulseauido mix two or more client's buffer ( before write to the sound card or after receive data from client )
seem pulse plugin is not rewindable too
That's a limitation of the pulse plugin, originating from ioplug design. The pulse API does allow rewinds (see the last two parameters of pa_stream_write), but ioplug with mmap_rw = 0 essentially ignores ALSA rewinds, i.e. it is impossible to know in alsa-plugins/pulse/*.c that a rewind has to be processed.
- Do pulseaudio keep a copy of client 's buffer ?
when do pulseauido mix two or more client's buffer ( before write to
the sound card or after receive data from client )
seem pulse plugin is not rewindable too
That's a limitation of the pulse plugin, originating from ioplug design.
The pulse API does allow rewinds (see the last two parameters of pa_stream_write), but ioplug with mmap_rw = 0 essentially ignores ALSA rewinds, i.e. it is impossible to know in alsa-plugins/pulse/*.c that a rewind has to be processed.
Once the appl_ptr is rewinded to the nearest position from hw_ptr, it is the responbility of the application to write at least one period of audio immediately for some card. http://freedesktop.org/software/pulseaudio/doxygen/streams.html
Data is routed, converted and mixed from several sources before it is passed along to a final output
...
Playback and record streams always have a server-side buffer as part of the data flow. The size of this buffer needs to be chosen in a compromise between low latency and sensitivity for buffer overflows/underruns.
as pulseaudio only recommend application to use a reasonable size (e.g. pulse plugin allow client to select extreme small buffer size which sound card cannot support) pulse plugin should increase the constraint to a reasonable size which are supported by most sound card
(e.g. minimum period time of snd-ymfpci is 5ms)
... The server-side per-stream playback buffers are indexed by a write and a read index. The application writes to the write index and the sound device reads from the read index. The read index is increased monotonically,
How do the sound device reads from the read index ?
while the write index may be freely controlled by the application. Subtracting the read index from the write index will give you the current fill level of the buffer.
look like this buffer still contain the audio in original format sent by the client if the difference is the fill level and can be used for per stream softvolume ...
Seeking in the Playback Buffer
A client application may freely seek in the playback buffer.
PA_SEEK_RELATIVE_ON_READ - seek relative to the current read index. Use this to write data to the output buffer that should be played as soon as possible
what is current read index ?
do the read index mean the place which pulseaudio start to perform reasampling, downmixing/upmixing and mixing of all clients ?
Does pulseaudio internal resampler support rewind ?
28.04.2014 12:57, Raymond Yau wrote:
- Do pulseaudio keep a copy of client 's buffer ?
when do pulseauido mix two or more client's buffer ( before write
to the sound card or after receive data from client )
seem pulse plugin is not rewindable too
That's a limitation of the pulse plugin, originating from ioplug
design. The pulse API does allow rewinds (see the last two parameters of pa_stream_write), but ioplug with mmap_rw = 0 essentially ignores ALSA rewinds, i.e. it is impossible to know in alsa-plugins/pulse/*.c that a rewind has to be processed.
Once the appl_ptr is rewinded to the nearest position from hw_ptr, it is the responbility of the application to write at least one period of audio immediately for some card.
Yes, of course, provided that the rewind succeeds.
http://freedesktop.org/software/pulseaudio/doxygen/streams.html
Data is routed, converted and mixed from several sources before it is passed along to a final output
...
Playback and record streams always have a server-side buffer as part of the data flow. The size of this buffer needs to be chosen in a compromise between low latency and sensitivity for buffer overflows/underruns.
as pulseaudio only recommend application to use a reasonable size (e.g. pulse plugin allow client to select extreme small buffer size which sound card cannot support) pulse plugin should increase the constraint to a reasonable size which are supported by most sound card
(e.g. minimum period time of snd-ymfpci is 5ms)
On most sound cards, PulseAudio uses timer-based scheduling and thus is not subject to any period-size limitations. As far as I can see, snd_ymfpci does not include SNDRV_PCM_INFO_BATCH, and thus will be used in this mode. In addition, it has an implementation of snd_ymfpci_playback_pointer that seems to look at the hardware, which is good. So effective period times less than 5 ms should work on this card.
... The server-side per-stream playback buffers are indexed by a write and a read index. The application writes to the write index and the sound device reads from the read index. The read index is increased monotonically,
How do the sound device reads from the read index ?
read index is just another word for hw_ptr if one talks about the sink buffer. When that buffer becomes almost empty, it is refilled by mixing all currently playing streams. You can try to follow the code starting from pa_sink_render().
while the write index may be freely controlled by the application. Subtracting the read index from the write index will give you the current fill level of the buffer.
look like this buffer still contain the audio in original format sent by the client if the difference is the fill level and can be used for per stream softvolume ...
Seeking in the Playback Buffer
A client application may freely seek in the playback buffer.
PA_SEEK_RELATIVE_ON_READ - seek relative to the current read index. Use this to write data to the output buffer that should be played as soon as possible
what is current read index ?
It is the point in the buffer that corresponds to the point in time that the hardware currently plays. Or maybe a better (but not equivalent) definition would be a point rewinding past which does not make sense because of the immediate underrun in the final hardware.
Seeking to 0 samples relatively to the current read index means "I want to play this buffer right now instead of whatever junk of unknown length that I sent before".
do the read index mean the place which pulseaudio start to perform reasampling, downmixing/upmixing and mixing of all clients ?
No. PulseAudio tries to resample and mix as much as possible in advance, in hope that it will be able to undo all that work if needed.
Does pulseaudio internal resampler support rewind ?
PulseAudio uses libraries such as speex and libsamplerate, and has a copy of the ffmpeg resampler as well. None of these libraries support rewinds. But PulseAudio has some code to work around that, and does achieve an imperfect result of approximately correct length. See pa_sink_input_process_rewind() - it resets the resampler, resulting in an audible click, if a seek is performed.
On most sound cards, PulseAudio uses timer-based scheduling and thus is
not subject to any period-size limitations. As far as I can see, snd_ymfpci does not include SNDRV_PCM_INFO_BATCH, and thus will be used in this mode. In addition, it has an implementation of snd_ymfpci_playback_pointer that seems to look at the hardware, which is good. So effective period times less than 5 ms should work on this card.
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/log/?qt=grep&am...
Is there any reason to use timer scheduling for the sound card which cannot disable period wake up ?
The sound cards are still wake up by period and the timer
As the DSP of ymfpci can mix up to 32 voices of different sample rate, mono/stereo , u8/s16 .
The driver just setup the card's timer to increase the fake hw_ptr on 5ms interval by (48000Hz * 5ms) 240 samples
Unlike most sound cards which periods_min equal to 2, the periods_min of snd_ymfpci is 3
As the fake hw_ptr increase by 240 samples on 5 ms interval, I think it is a bad idea to allow any application to perform rewind
29.04.2014 22:01, Raymond Yau wrote:
On most sound cards, PulseAudio uses timer-based scheduling and thus
is not subject to any period-size limitations. As far as I can see, snd_ymfpci does not include SNDRV_PCM_INFO_BATCH, and thus will be used in this mode. In addition, it has an implementation of snd_ymfpci_playback_pointer that seems to look at the hardware, which is good. So effective period times less than 5 ms should work on this card.
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/log/?qt=grep&am...
Is there any reason to use timer scheduling for the sound card which cannot disable period wake up ?
Yes (if "cannot disable period wakeups" is the only bad thing about the card, which the further text in your e-mail contradicts to). Key words: dynamic latency.
As you surely know, running with low latency has a high chance of underruns due to CPU spikes, so this should be avoided if reasonable. However, some applications (e.g. VoIP) require low latency. By ignoring period wakeups and relying on time-based scheduling, one can effectively change the period size on the fly, obtaining low latency only when it is needed.
The sound cards are still wake up by period and the timer
As the DSP of ymfpci can mix up to 32 voices of different sample rate, mono/stereo , u8/s16 .
Understood.
The driver just setup the card's timer to increase the fake hw_ptr on 5ms interval by (48000Hz * 5ms) 240 samples
And that's bad. As the precision of hw_ptr is only 5 ms (and not 1 sample), the driver must set SNDRV_PCM_INFO_BATCH. This would indeed avoid rewinds in the current PulseAudio code.
However, I don't see any code that backs up your claim. Is that fake pointer increased in hardware (as opposed to in the kernel) with such bad granularity?
See snd_ymfpci_memalloc(): both chip->voices[voice].bank and chip->bank_playback[voice][bank] are assigned a pointer that is based on chip->work_ptr.area, which is itself obtained through snd_dma_alloc_pages. So all bank pointers definitely point into the DMA-able area.
Also, snd_ymfpci_playback_pointer is based on voice->bank[chip->active_bank].start only, and the same construction is used in the interrupt handler to obtain the current playback position. So it is definitely based on what the hardware wrote into the DMA-able area, and that's why I thought that the .pointer function is good. Does it write there only just before the interrupt, or every 256 samples independently on the period size, or on some other condition?
Unlike most sound cards which periods_min equal to 2, the periods_min of snd_ymfpci is 3
OK, but I don't see why it is relevant. Surely we can set up as many periods as we want, if all we are going to do with period interrupts is to ignore them in userspace.
As the fake hw_ptr increase by 240 samples on 5 ms interval, I think it is a bad idea to allow any application to perform rewind
It is a bad idea to rewind into the uncertain area that we don't know whether it is already playing or not. If we have 6 periods 5 ms each, and always keep 5 of them full by promptly reacting to the interrupt notifications, then surely we can rewind one of them without much risk. But indeed, a full rewind is impossible for a fake hw_ptr with such bad granularity.
29.04.2014 23:17, I wrote:
And that's bad. As the precision of hw_ptr is only 5 ms (and not 1 sample), the driver must set SNDRV_PCM_INFO_BATCH. This would indeed avoid rewinds in the current PulseAudio code.
Oops. This will only avoid timer-based scheduling, but not rewinds.
On most sound cards, PulseAudio uses timer-based scheduling and thus
is not subject to any period-size limitations. As far as I can see, snd_ymfpci does not include SNDRV_PCM_INFO_BATCH, and thus will be used in this mode.
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-March/073816.html
Do you mean sound card which cannot provide sample granularity need to use that flag or you want the driver to provide this granularity to application ?
In addition, it has an implementation of
snd_ymfpci_playback_pointer that seems to look at the hardware, which is good. So effective period times less than 5 ms should work on this card.
Yes (if "cannot disable period wakeups" is the only bad thing about the
card, which the further text in your e-mail contradicts to). Key words: dynamic latency.
As you surely know, running with low latency has a high chance of
underruns due to CPU spikes, so this should be avoided if reasonable. However, some applications (e.g. VoIP) require low latency. By ignoring period wakeups and relying on time-based scheduling, one can effectively change the period size on the fly, obtaining low latency only when it is needed.
Are there any different reqiurment bewteen VOIP and other application in addition to low latency ?
Voip does not really need stereo . The web cam Mic usually capture at 16000Hz mono while pulseaudio use same default rate for playback and capture
However, I don't see any code that backs up your claim. Is that fake
pointer increased in hardware (as opposed to in the kernel) with such bad granularity?
See snd_ymfpci_memalloc(): both chip->voices[voice].bank and
chip->bank_playback[voice][bank] are assigned a pointer that is based on chip->work_ptr.area, which is itself obtained through snd_dma_alloc_pages. So all bank pointers definitely point into the DMA-able area.
Also, snd_ymfpci_playback_pointer is based on
voice->bank[chip->active_bank].start only, and the same construction is used in the interrupt handler to obtain the current playback position. So it is definitely based on what the hardware wrote into the DMA-able area, and that's why I thought that the .pointer function is good. Does it write there only just before the interrupt, or every 256 samples independently on the period size, or on some other condition?
http://mailman.alsa-project.org/pipermail/alsa-devel/2011-September/043715.h...
The author of the driver told me that
The period interrupts are not accurate at all. The ymfpci hardware internally uses fixed periods of 256 frames at 48 kHz. the driver reports a period interrupt when the next hardware interrupt at or after a period boundary occurs. The current position reported by the hardware is the position at the time of the last hardware interrupt
OK, but I don't see why it is relevant. Surely we can set up as many
periods as we want, if all we are going to do with period interrupts is to ignore them in userspace.
Do you mean pulseaudio will use more periods instead of periods_min for those sound cards which cannot disable period wake-up ?
Do dynamic latency mean pulseaudio use 1ms and double the times until the sound card ?
It may also be obtained by double the number of periods and restrict hwbuf_unused in multiple of samples in a period instead of any arbituary number of sample ?
It is a bad idea to rewind into the uncertain area that we don't know
whether it is already playing or not. If we have 6 periods 5 ms each, and always keep 5 of them full by promptly reacting to the interrupt notifications, then surely we can rewind one of them without much risk. But indeed, a full rewind is impossible for a fake hw_ptr with such bad granularity.
/* FIXME? True value is 256/48 = 5.33333 ms */ err = snd_pcm_hw_constraint_minmax(runtime, SNDRV_PCM_HW_PARAM_PERIOD_TIME, 5334, UINT_MAX); if (err < 0) return err; err = snd_pcm_hw_rule_noresample(runtime, 48000);
Do you mean when noresample is used, the period byte constraint should be used instead of period time to avoid rounding error ?
To obtain dynamic latency, you can just keep half of them or at least two periods full at any time
02.05.2014 07:39, Raymond Yau wrote:
On most sound cards, PulseAudio uses timer-based scheduling and thus
is not subject to any period-size limitations. As far as I can see, snd_ymfpci does not include SNDRV_PCM_INFO_BATCH, and thus will be used in this mode.
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-March/073816.html
Do you mean sound card which cannot provide sample granularity need to use that flag or you want the driver to provide this granularity to application ?
I acknowledge that there was some confusion regarding the precise meaning of the flag (especially given three possible situations - accurate to just one period, accurate to better than one period but worse than one sample, accurate to one sample), and I am not the authority here.
My own opinion so far is that, if the card cannot provide sample-accurate position reports at arbitrary moments in time, then the SNDRV_PCM_INFO_BATCH flag needs to be set. Of course, the driver should not attempt to provide sample-accurate information if such information does not exist in the first place.
In addition, there seems to be some confusion here between the hardware periods (which are always 256 samples) and the period size in hw params. If I understand the other emails correctly, regardless of what the user set in hw_params, the reported position on ymfpci will be accurate to within 256 samples.
PulseAudio has the following consideration here: if the card cannot report the position accurately, we need to disable the timestamp-based scheduling, as this breaks module-combine-sink (or any successor of it), because it relies on very accurate estimations of the actual sample rate ratio between two non-identical cards.
In addition, it has an implementation of
snd_ymfpci_playback_pointer that seems to look at the hardware, which is good. So effective period times less than 5 ms should work on this card.
Yes (if "cannot disable period wakeups" is the only bad thing about
the card, which the further text in your e-mail contradicts to). Key words: dynamic latency.
As you surely know, running with low latency has a high chance of
underruns due to CPU spikes, so this should be avoided if reasonable. However, some applications (e.g. VoIP) require low latency. By ignoring period wakeups and relying on time-based scheduling, one can effectively change the period size on the fly, obtaining low latency only when it is needed.
Are there any different reqiurment bewteen VOIP and other application in addition to low latency ?
Voip does not really need stereo . The web cam Mic usually capture at 16000Hz mono while pulseaudio use same default rate for playback and capture
In practice, only the latency matters here, as PulseAudio will open the device at 48 kHz stereo anyway.
However, I don't see any code that backs up your claim. Is that fake
pointer increased in hardware (as opposed to in the kernel) with such bad granularity?
See snd_ymfpci_memalloc(): both chip->voices[voice].bank and
chip->bank_playback[voice][bank] are assigned a pointer that is based on chip->work_ptr.area, which is itself obtained through snd_dma_alloc_pages. So all bank pointers definitely point into the DMA-able area.
Also, snd_ymfpci_playback_pointer is based on
voice->bank[chip->active_bank].start only, and the same construction is used in the interrupt handler to obtain the current playback position. So it is definitely based on what the hardware wrote into the DMA-able area, and that's why I thought that the .pointer function is good. Does it write there only just before the interrupt, or every 256 samples independently on the period size, or on some other condition?
http://mailman.alsa-project.org/pipermail/alsa-devel/2011-September/043715.h...
The author of the driver told me that
The period interrupts are not accurate at all. The ymfpci hardware internally uses fixed periods of 256 frames at 48 kHz. the driver reports a period interrupt when the next hardware interrupt at or after a period boundary occurs. The current position reported by the hardware is the position at the time of the last hardware interrupt
IMHO, this fits the purpose of the BATCH flag, but is precisely the middle case (possibly more accurate than one period, but definitely less accurate than one sample) which was disputed at http://mailman.alsa-project.org/pipermail/alsa-devel/2014-March/073816.html .
OK, but I don't see why it is relevant. Surely we can set up as many
periods as we want, if all we are going to do with period interrupts is to ignore them in userspace.
To avoid further misunderstandings: I was speaking above about possible changes, not about the current behaviour. The answers below are for the current situation, which may be suboptimal.
Do you mean pulseaudio will use more periods instead of periods_min for those sound cards which cannot disable period wake-up ?
For cards that are not batch, it will attempt to set both the period size and the buffer size to something large (tsched_size, the default is 2s) and let the driver adjust that to something more sensible for the card. I think that in practice this means periods_min, although it is never mentioned explicitly in PulseAudio source.
For batch cards, the desired period (or, in PA speak, "fragment") time and the number of periods come from the config, of course subject to adjustments made by the ALSA library to fit the hardware requirements. The defaults are:
; default-fragments = 4 ; default-fragment-size-msec = 25
Do dynamic latency mean pulseaudio use 1ms and double the times until the sound card ?
If you are talking about the wakeup latency, the mechanism is indeed as you described (start with something small, increase if xruns or near-xruns appear, decrease if all runs well for some time), but the 1ms number is wrong. See the defines at the top of src/modules/alsa/alsa-sink.c for exact numbers.
It may also be obtained by double the number of periods and restrict hwbuf_unused in multiple of samples in a period instead of any arbituary number of sample ?
I don't understand here.
It is a bad idea to rewind into the uncertain area that we don't know
whether it is already playing or not. If we have 6 periods 5 ms each, and always keep 5 of them full by promptly reacting to the interrupt notifications, then surely we can rewind one of them without much risk. But indeed, a full rewind is impossible for a fake hw_ptr with such bad granularity.
/* FIXME? True value is 256/48 = 5.33333 ms */ err = snd_pcm_hw_constraint_minmax(runtime, SNDRV_PCM_HW_PARAM_PERIOD_TIME, 5334, UINT_MAX); if (err < 0) return err; err = snd_pcm_hw_rule_noresample(runtime, 48000);
Do you mean when noresample is used, the period byte constraint should be used instead of period time to avoid rounding error ?
Yes. And maybe even constrain the period size to be a multiple of 256 samples in this case.
To obtain dynamic latency, you can just keep half of them or at least two periods full at any time
Yes if we talk about hardware periods (which are always 256 samples) and not about the period size which is settable through calling snd_pcm_hw_params_set_period_size_near() or something similar.
I acknowledge that there was some confusion regarding the precise meaning
of the flag (especially given three possible situations - accurate to just one period, accurate to better than one period but worse than one sample, accurate to one sample), and I am not the authority here.
Can this granularity be measured by
1) open a playback stream with 2 or more periods 2) fill the full buffer and start playback 3) check the value returned by snd_pcm_rewindable () is half of period at half period time interval 4) repeat the test with 1/4 , 1/8,..., period time interval
In addition, there seems to be some confusion here between the hardware
periods (which are always 256 samples) and the period size in hw params. If I understand the other emails correctly, regardless of what the user set in hw_params, the reported position on ymfpci will be accurate to within 256 samples.
http://www.alsa-project.org/~tiwai/writing-an-alsa-driver/ch05s06.html#pcm-i...
This callback is called when the PCM middle layer inquires the current hardware position on the buffer. The position must be returned in frames, ranging from 0 to buffer_size - 1.
This does not implies all drivers must give accurate position at any time, some drivers may use timer interrupt and increase the pointer by one periods
http://www.alsa-project.org/~tiwai/writing-an-alsa-driver/ch05s07.html#pcm-i...
High frequency timer interrupts
This happens when the hardware doesn't generate interrupts at the period boundary but issues timer interrupts at a fixed timer rate (e.g. es1968 or ymfpci drivers). In this case, you need to check the current hardware position and accumulate the processed sample length at each interrupt. When the accumulated size exceeds the period size, call snd_pcm_period_elapsed() and reset the accumulator.
I am also confuse about ymfpci really use timer interrupts.
it does not start timer in snd_ymfpci_hw_start and stop timer in snd_ymfpci_hw_stop , it call snd_pcm_timer() inside snd_ymfpci_interrupt()
it read YDSXGR_CTRLSELECT to determine active bank and call voice->interrupt
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/commit/sound/pc...
it use per voice volume control to update voice->bank[next_bank].volume during the interrupt
YMF7xx are just AC97 controllers which DSP send the mixed audiothrough AC97 link @ 48000Hz to AC97 codec
Refer to Block Diagram and System Diagram, Audio is fetched by Bus Master DMA controller , Rate Converter/Mixer resample each voices's audio to 48000Hz if necessary and mix the audio streams , send to AC97 Interface
http://www.4front-tech.com/pguide/audio2.html
DSP_CAP_REALTIME bit tells if the device/operating system supports precise reporting of output pointer position using SNDCtl_DSP_GETxPTR. Precise means that accuracy of the reported playback pointer (time) is around few samples. Without this capability the playback/recording position is reported using precision of one fragment.
http://thread.gmane.org/gmane.linux.alsa.devel/5597
hw_ptr granularity is defined only by period_bytes_min (and additional
constraints if any).
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/commit/sound/so...
hw.period_bytes_min = 256; hw.fifo_size = dma_data->fifo_size;
Does soc drivers transfer audio in 256 bytes ?
seem twice the values of snd-hda-Intel (pcie bus)
PulseAudio has the following consideration here: if the card cannot
report the position accurately, we need to disable the timestamp-based scheduling, as this breaks module-combine-sink (or any successor of it), because it relies on very accurate estimations of the actual sample rate ratio between two non-identical cards.
https://bugs.freedesktop.org/show_bug.cgi?id=47899
if you want to hear sound from two snd-hda-intel at the same time using combined sink, you may need driver provide the output delay in hda codec
7.3.4.5 Audio Function Group Capabilities
Output Delay is a four bit value representing the number of samples between when the sample is received from the Link and when it appears as an analog signal at the pin. This may be a “typical” value. If this is 0, the widgets along the critical path should be queried, and each individual widget must report its individual delay.
Figure 85. Audio Function Group Capabilities Response Format
7.3.4.6 Audio Widget Capabilities
Delay indicates the number of sample delays through the widget. This may be 0 if the delay value in the Audio Function Parameters is supplied to represent the entire path.
http://git.kernel.org/cgit/linux/kernel/git/tiwai/hda-emu.git/tree/codecs
some hda codecs report delay in audio output/input widgets and the ranges of delay vary from 3 to 13 samples, hda_proc.c did not show output/input delay in the audio function group
Does it mean that pulseaudio need to add back the difference of output delay of two hda codecs ?
If yes, this look like combined sink is also not rewindable too when difference of output delay is not zero
http://git.alsa-project.org/?p=alsa-plugins.git;a=blob;f=doc/upmix.txt;hb=HE...
for upmix plugin, this mean that snd_pcm_rewindable may need to return zero when user specify delay > 0 if you want "glitch free" rewind
Other pulseaudio modules seen does not support rewind (e.g. jack, tunnel, Bluetooth,...
http://git.alsa-project.org/?p=alsa-plugins.git;a=tree
Other alsa plugins (e.g. Jack, oss,...) seem not support rewind
How about route and plug plugins ?
For cards that are not batch, it will attempt to set both the period size
and the buffer size to something large (tsched_size, the default is 2s) and let the driver adjust that to something more sensible for the card. I think that in practice this means periods_min, although it is never mentioned explicitly in PulseAudio source.
it does not mean all drivers uses periods_min when applicaition did not explicitly set period , period time or period bytes
http://git.alsa-project.org/?p=alsa-lib.git;a=commit;h=09879a4bb58199f64abcb...
Some driver 's period_bytes_max can be less than (buffer_bytes_max / periods_min)
For batch cards, the desired period (or, in PA speak, "fragment") time
and the number of periods come from the config, of course subject to adjustments made by the ALSA library to fit the hardware requirements.
as long as the application use two periods , no safe rewind is possible as the application need to keep at least two periods fill in the buffer if the pointer is not accurate.
do you mean that pulseaudio still perform rewind when using AC3 passthrough on Didital playback device ?
if not , the value of snd_pcm_rewindable() of digital playback device depends on whether no audio bit is set or not
The defaults are:
; default-fragments = 4 ; default-fragment-size-msec = 25
Does this mean that pulseaudio use this values for all sound card including hda when user specify this value ?
Seem the values are comment out and not used by pulseaudio as default , users have to explicitly set these in the following report
https://bugs.launchpad.net/ubuntu/+source/pulseaudio/+bug/428619/comments/96
10.05.2014 06:46, Raymond Yau wrote:
I acknowledge that there was some confusion regarding the precise
meaning of the flag (especially given three possible situations - accurate to just one period, accurate to better than one period but worse than one sample, accurate to one sample), and I am not the authority here.
Can this granularity be measured by
- open a playback stream with 2 or more periods
- fill the full buffer and start playback
- check the value returned by snd_pcm_rewindable () is half of period
at half period time interval 4) repeat the test with 1/4 , 1/8,..., period time interval
Yes and no.
Instead of what you are proposing above, I wrote a loop that repeatedly calls snd_pcm_rewindable() 7000000 times and prints the result if it differs from the previous one. With snd-hda-intel (PCH), hw plugin, stereo, S16_LE, 48 kHz, 6 periods, and a period size of 1024, I get this:
Rewindable: 6119, loop iteration: 0 Rewindable: 5119, loop iteration: 5389434
So snd_pcm_rewindable() can return weird values that are updated every period size or so. As such, I wouldn't believe its return value out of the box even for hw devices. At loop iteration 5389433, the CPU chewed enough time for almost one period, but snd_pcm_rewindable() said that almost 6 periods are rewindable. Probably a missing sync_ptr() somewhere, or a documentation bug.
With snd_pcm_avail() inserted (which does synchronize the position) before each call to snd_pcm_rewindable(), I get:
Rewindable: 6119, loop iteration: 0 Rewindable: 6112, loop iteration: 2 Rewindable: 6104, loop iteration: 42 Rewindable: 6096, loop iteration: 76 Rewindable: 6088, loop iteration: 125 Rewindable: 6080, loop iteration: 173 Rewindable: 6072, loop iteration: 222 Rewindable: 6064, loop iteration: 270
(and an underrun in the end).
With 4 channels:
Rewindable: 6112, loop iteration: 0 Rewindable: 6108, loop iteration: 2 Rewindable: 6104, loop iteration: 14 Rewindable: 6100, loop iteration: 36 Rewindable: 6096, loop iteration: 58 Rewindable: 6092, loop iteration: 63
With 8 channels:
Rewindable: 6104, loop iteration: 0 Rewindable: 6098, loop iteration: 1 Rewindable: 6096, loop iteration: 2 Rewindable: 6094, loop iteration: 9 Rewindable: 6092, loop iteration: 24 Rewindable: 6090, loop iteration: 32 Rewindable: 6088, loop iteration: 41
So on my snd-hda-intel, the granularity of the pointer is 32 bytes.
For Haswell HDMI (on another snd-hda-intel), stereo, S16_LE:
Rewindable: 6128, loop iteration: 0 Rewindable: 6112, loop iteration: 129 Rewindable: 6096, loop iteration: 339 Rewindable: 6080, loop iteration: 551 Rewindable: 6064, loop iteration: 753 Rewindable: 6048, loop iteration: 966 Rewindable: 6032, loop iteration: 1180
so the resulting granularity is 64 bytes.
An unfortunate observation is that, without snd_pcm_avail(), even on hw just after an underrun snd_pcm_rewindable() can return negative numbers such as -16 or -25 that lead to nonsense error codes (EBUSY or ENOTTY).
In addition, there seems to be some confusion here between the
hardware periods (which are always 256 samples) and the period size in hw params. If I understand the other emails correctly, regardless of what the user set in hw_params, the reported position on ymfpci will be accurate to within 256 samples.
http://www.alsa-project.org/~tiwai/writing-an-alsa-driver/ch05s06.html#pcm-i...
This callback is called when the PCM middle layer inquires the current hardware position on the buffer. The position must be returned in frames, ranging from 0 to buffer_size - 1.
This does not implies all drivers must give accurate position at any time, some drivers may use timer interrupt and increase the pointer by one periods
That's why we have snd_pcm_htimestamp().
http://www.alsa-project.org/~tiwai/writing-an-alsa-driver/ch05s07.html#pcm-i...
High frequency timer interrupts
This happens when the hardware doesn't generate interrupts at the period boundary but issues timer interrupts at a fixed timer rate (e.g. es1968 or ymfpci drivers). In this case, you need to check the current hardware position and accumulate the processed sample length at each interrupt. When the accumulated size exceeds the period size, call snd_pcm_period_elapsed() and reset the accumulator.
This clears the confusion.
I am also confuse about ymfpci really use timer interrupts.
Well, that's easy. According to your own words, the card sends an interrupt every 256 samples and has no real notion of the user-defined period size. From ALSA viewpoint, this 256-sample interrupt is just a timer (but not a timer that is managed through functions that have "timer" in the name).
<snip information on the OSS DSP_CAP_REALTIME and DSP_CAP_BATCH capability bits>
Interesting, but does not really clarify anything.
hw_ptr granularity is defined only by period_bytes_min (and
additional constraints if any).
Well, this disagrees with my experiments. For S16_LE stereo, snd_pcm_hw_params_get_period_size_min() says 32 samples for both PCH and HDMI, while the measured granularity is different (8 and 16 samples).
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/commit/sound/so...
hw.period_bytes_min = 256; hw.fifo_size = dma_data->fifo_size;
Does soc drivers transfer audio in 256 bytes ?
seem twice the values of snd-hda-Intel (pcie bus)
No comment on this, I am not an expert in this area.
PulseAudio has the following consideration here: if the card cannot
report the position accurately, we need to disable the timestamp-based scheduling, as this breaks module-combine-sink (or any successor of it), because it relies on very accurate estimations of the actual sample rate ratio between two non-identical cards.
This is something to investigate, I am not ready to provide any useful comment. Although in comment #2 bluetooth is mentioned, and this is indeed an example where even somewhat accurate timing information is not available.
if you want to hear sound from two snd-hda-intel at the same time using combined sink, you may need driver provide the output delay in hda codec
7.3.4.5 Audio Function Group Capabilities
Output Delay is a four bit value representing the number of samples between when the sample is received from the Link and when it appears as an analog signal at the pin. This may be a “typical” value. If this is 0, the widgets along the critical path should be queried, and each individual widget must report its individual delay.
Figure 85. Audio Function Group Capabilities Response Format
7.3.4.6 Audio Widget Capabilities
Delay indicates the number of sample delays through the widget. This may be 0 if the delay value in the Audio Function Parameters is supplied to represent the entire path.
http://git.kernel.org/cgit/linux/kernel/git/tiwai/hda-emu.git/tree/codecs
some hda codecs report delay in audio output/input widgets and the ranges of delay vary from 3 to 13 samples, hda_proc.c did not show output/input delay in the audio function group
Interesting, implementable for someone with the skills in this area, but probably not relevant for the above freedesktop bug. What you are talking about is just a constant offset in the snd_pcm_delay() return values. That's bad, but I guess not bad enough for PulseAudio to stutter. What PulseAudio doesn't tolerate is jitter.
Does it mean that pulseaudio need to add back the difference of output delay of two hda codecs ?
Yes.
If yes, this look like combined sink is also not rewindable too when difference of output delay is not zero
Indeed, it is not fully rewindable. But what's worse is that the combined sink contains a variable-rate resampler that is used in order to compensate adaptively for slightly different sample rates of the two cards.
http://git.alsa-project.org/?p=alsa-plugins.git;a=blob;f=doc/upmix.txt;hb=HE...
for upmix plugin, this mean that snd_pcm_rewindable may need to return zero when user specify delay > 0 if you want "glitch free" rewind
Absolutely correct.
Other pulseaudio modules seen does not support rewind (e.g. jack, tunnel, Bluetooth,...
http://git.alsa-project.org/?p=alsa-plugins.git;a=tree
Other alsa plugins (e.g. Jack, oss,...) seem not support rewind
Jack is interesting here: it is the only ioplug-based plugin which sets mmap_rw = 1. As such, ALSA treats it as something that has mmapped buffer with the same semantics as an ordinary hardware sound card, and performs rewinds using this buffer. There is also a "hardware" position callback. The actual transfer of samples from that buffer to JACK is performed in a separate realtime thread which is implicitly created in jack_activate(). The porition is updated every JACK period.
The whole construction should support rewinds, with the non-rewindable remainder being one JACK period (which may be different from one ALSA period). If the JACK period is 256 samples, this plugin should behave very much like one voice of ymfpci.
However, there is still a bug (possibly fixable). While running my test program over this plugin does produce silence, it produces a wrong amount of silence. As if there is something wrong with snd_pcm_jack_poll_revents() or snd_pcm_jack_pointer(). I have not tested this patch yet, although I should:
http://comments.gmane.org/gmane.linux.alsa.devel/122554
As for ioplug-based plugins with mmap_rw = 0, they are indeed not rewindable at all.
How about route and plug plugins ?
Route should in theory support rewind, but there may be bugs. The best way to test is to run a program that I attached to an earlier message in this thread and listen - any non-silence indicates a non-rewindable plugin.
Plug is not really a plugin, it just inserts other plugins and then disappears. It can insert adpcm and rate plugins which are not really rewindable.
For cards that are not batch, it will attempt to set both the period
size and the buffer size to something large (tsched_size, the default is 2s) and let the driver adjust that to something more sensible for the card. I think that in practice this means periods_min, although it is never mentioned explicitly in PulseAudio source.
it does not mean all drivers uses periods_min when applicaition did not explicitly set period , period time or period bytes
http://git.alsa-project.org/?p=alsa-lib.git;a=commit;h=09879a4bb58199f64abcb...
Some driver 's period_bytes_max can be less than (buffer_bytes_max / periods_min)
Please see this code and make your own conclusions.
http://cgit.freedesktop.org/pulseaudio/pulseaudio/tree/src/modules/alsa/alsa...
PulseAudio does try several strategies, among those is "set nothing at all if everything else failed".
For batch cards, the desired period (or, in PA speak, "fragment")
time and the number of periods come from the config, of course subject to adjustments made by the ALSA library to fit the hardware requirements.
as long as the application use two periods , no safe rewind is possible as the application need to keep at least two periods fill in the buffer if the pointer is not accurate.
Correct if we use the viewpoint that the pointer granularity is one period for such cards. Which is not the case for ymfpci and for USB audio.
do you mean that pulseaudio still perform rewind when using AC3 passthrough on Didital playback device ?
I don't know.
if not , the value of snd_pcm_rewindable() of digital playback device depends on whether no audio bit is set or not
While I understand the idea, I would like to see more details. The only consideration that comes to my mind is that it should not be possible to rewind into the middle of the currently playing IEC 61937 frame. But there are also stupid receivers (e.g. JVC TH-A25) that mute their outputs if a non-IEC-wrapped DTS stream (as found on DTS CDs) is coming with AES0 = 6.
The defaults are:
; default-fragments = 4 ; default-fragment-size-msec = 25
Does this mean that pulseaudio use this values for all sound card including hda when user specify this value ?
This is the default for all cards that are autodetected as incompatible with tsched or where the user configured tsched=0. These defaults can still be overridden with module-alsa-card parameters.
Seem the values are comment out and not used by pulseaudio as default , users have to explicitly set these in the following report
The commented-out values are used by default when PulseAudio decides that it needs a traditional (not tsched based) playback model. They are in the source:
http://cgit.freedesktop.org/pulseaudio/pulseaudio/tree/src/daemon/daemon-con...
https://bugs.launchpad.net/ubuntu/+source/pulseaudio/+bug/428619/comments/96
The key in that comment is the explicit tsched=0. Without tsched=0, these settings are not used.
I acknowledge that there was some confusion regarding the precise
meaning of the flag (especially given three possible situations - accurate to just one period, accurate to better than one period but worse than one sample, accurate to one sample), and I am not the authority here.
https://bugs.launchpad.net/ubuntu/+source/pulseaudio/+bug/1188425
I: [pulseaudio] alsa-sink.c: Successfully opened device a52:0. I: [pulseaudio] alsa-sink.c: Selected mapping 'Digital Surround 5.1 (IEC958/AC3)' (iec958-ac3-surround-51). I: [pulseaudio] alsa-sink.c: Cannot enable timer-based scheduling, falling back to sound IRQ scheduling. I: [pulseaudio] alsa-sink.c: Successfully enabled mmap() mode.
Seem only hw device report whether it support disable period wake-up
ioplug did not support disable period wake-up and you need A52 plugin to provide a parameter to disable the period wakeup of the slave
do you mean pulseaudio can disable period wakeup of the hda-intel through extplug ?
D: [alsa-sink] alsa-util.c: PCM state is RUNNING I: [alsa-sink] alsa-sink.c: Starting playback. I: [alsa-sink] (alsa-lib)pcm_hw.c: SNDRV_PCM_IOCTL_START failed (-77)
does SNDRV_PCM_IOCTL_START fail mean pcm state is no longer running ?
Instead of what you are proposing above, I wrote a loop that repeatedly
calls snd_pcm_rewindable() 7000000 times and prints the result if it differs from the previous one. With snd-hda-intel (PCH), hw plugin, stereo, S16_LE, 48 kHz, 6 periods, and a period size of 1024, I get this:
Rewindable: 6119, loop iteration: 0 Rewindable: 5119, loop iteration: 5389434
the method can be improved
instead of wake up in half period time to check the value of snd_pcm_rewindable()
1) set the timer to wakeup at 1/16 period time intervals, if the values does not change , this mean that it does not provide accuracy of 1/16 of period time and you can know whether it support 1/8 when the next wakeup occur at 1/8 period time, ...until you get 16 values for the first period
2) if the value of snd_pcm_rewindable change at every 1/16 period time intervals , set the timer to wakeup at 1/256 period time at the second period
So snd_pcm_rewindable() can return weird values that are updated every
period size or so. As such, I wouldn't believe its return value out of the box even for hw devices. At loop iteration 5389433, the CPU chewed enough time for almost one period, but snd_pcm_rewindable() said that almost 6 periods are rewindable. Probably a missing sync_ptr() somewhere, or a documentation bug.
With snd_pcm_avail() inserted (which does synchronize the position)
before each call to snd_pcm_rewindable(), I get:
Rewindable: 6119, loop iteration: 0 Rewindable: 6112, loop iteration: 2 Rewindable: 6104, loop iteration: 42 Rewindable: 6096, loop iteration: 76 Rewindable: 6088, loop iteration: 125 Rewindable: 6080, loop iteration: 173 Rewindable: 6072, loop iteration: 222 Rewindable: 6064, loop iteration: 270
(and an underrun in the end).
With 4 channels:
Rewindable: 6112, loop iteration: 0 Rewindable: 6108, loop iteration: 2 Rewindable: 6104, loop iteration: 14 Rewindable: 6100, loop iteration: 36 Rewindable: 6096, loop iteration: 58 Rewindable: 6092, loop iteration: 63
With 8 channels:
Rewindable: 6104, loop iteration: 0 Rewindable: 6098, loop iteration: 1 Rewindable: 6096, loop iteration: 2 Rewindable: 6094, loop iteration: 9 Rewindable: 6092, loop iteration: 24 Rewindable: 6090, loop iteration: 32 Rewindable: 6088, loop iteration: 41
So on my snd-hda-intel, the granularity of the pointer is 32 bytes.
For Haswell HDMI (on another snd-hda-intel), stereo, S16_LE:
Rewindable: 6128, loop iteration: 0 Rewindable: 6112, loop iteration: 129 Rewindable: 6096, loop iteration: 339 Rewindable: 6080, loop iteration: 551 Rewindable: 6064, loop iteration: 753 Rewindable: 6048, loop iteration: 966 Rewindable: 6032, loop iteration: 1180
so the resulting granularity is 64 bytes.
An unfortunate observation is that, without snd_pcm_avail(), even on hw
just after an underrun snd_pcm_rewindable() can return negative numbers such as -16 or -25 that lead to nonsense error codes (EBUSY or ENOTTY).
pcm_rewind2.c use period size instead of buffer size as start_threshold , pcm is already started before you fill the buffer full and pcm can be stopped at underrun if your program does not use boundary as stop_threshold
this affect your timing if your test program behave like pcm_rewind2.c static snd_pcm_sframes_t snd_pcm_hw_rewindable(snd_pcm_t *pcm)
{ return snd_pcm_mmap_hw_avail(pcm); }
if this function return the safe value, Do you mean it must hw_sync the pointer and should return zero if snd_pcm_mmap_hw_avail is negative and check the pcm state to return negative error code ?
http://www.alsa-project.org/~tiwai/writing-an-alsa-driver/ch05s07.html#pcm-i...
High frequency timer interrupts
This happens when the hardware doesn't generate interrupts at the period
boundary but issues timer interrupts at a fixed timer rate (e.g. es1968 or ymfpci drivers).
both es1968 and ymfpci use ac97 codec, there is an external clock source (oscillator) to provide the timing to both sound chips(ac97 controller) and ac97 codec to sync the transfer of audio through ac97 link at 48000Hz
it depends on whether the chip can count the clock ticks to provide a timer interrupt
I am also confuse about ymfpci really use timer interrupts.
Well, that's easy. According to your own words, the card sends an
interrupt every 256 samples and has no real notion of the user-defined period size. From ALSA viewpoint, this 256-sample interrupt is just a timer (but not a timer that is managed through functions that have "timer" in the name).
Unlike other hardware-mixing sound cards desgined for playing game , the multi voices of ymfpci is designed for playing MIDI which MIDI notes of a sound are usually start at same tempo
some subdevices can have unpredictable delay if is it not the subdevice which start the hardware
it depends on whether the hardware can provide registers for the driver to start each subdevice independently when receiving SNDRV_PCM_TRIGGER_START in pcm_trigger callback
hw_ptr granularity is defined only by period_bytes_min (and
additional constraints if any).
Well, this disagrees with my experiments. For S16_LE stereo,
snd_pcm_hw_params_get_period_size_min() says 32 samples for both PCH and HDMI, while the measured granularity is different (8 and 16 samples).
should you use period_bytes_min instead of period_size_min ?
128 bytes / (8 x 2) = 8 samples for 8 channels
for 6 channels playback , the period does not fit exactly the pcie playload size 128 bytes
PulseAudio has the following consideration here: if the card cannot
report the position accurately, we need to disable the timestamp-based scheduling, as this breaks module-combine-sink (or any successor of it), because it relies on very accurate estimations of the actual sample rate ratio between two non-identical cards.
This is something to investigate, I am not ready to provide any useful
comment. Although in comment #2 bluetooth is mentioned, and this is indeed an example where even somewhat accurate timing information is not available.
if you want to hear sound from two snd-hda-intel at the same time using combined sink, you may need driver provide the output delay in hda codec
7.3.4.5 Audio Function Group Capabilities
Output Delay is a four bit value representing the number of samples between when the sample is received from the Link and when it appears as an analog signal at the pin. This may be a “typical” value. If this is 0, the widgets along the critical path should be queried, and each individual widget must report its individual delay.
Figure 85. Audio Function Group Capabilities Response Format
7.3.4.6 Audio Widget Capabilities
Delay indicates the number of sample delays through the widget. This may be 0 if the delay value in the Audio Function Parameters is supplied to represent the entire path.
http://git.kernel.org/cgit/linux/kernel/git/tiwai/hda-emu.git/tree/codecs
some hda codecs report delay in audio output/input widgets and the ranges of delay vary from 3 to 13 samples, hda_proc.c did not show output/input delay in the audio function group
Did snd_hda_param_read(codec, codec->afg, AC_PAR_AUDIO_FG_CAP) return any values for your hda codecs ?
what is critical path ?
since some driver can enable/disable loopback mixing which the audio pass through less widgets when loopback mixing is disabled
some idt codecs have a 5 bands equalizer in the path of Port D(not in the pin complex widget or mixer widget but setup using vendor specific verb to audio function group or vendor specific widget) but not in the path of Port A (headphone)
Interesting, implementable for someone with the skills in this area, but
probably not relevant for the above freedesktop bug. What you are talking about is just a constant offset in the snd_pcm_delay() return values. That's bad, but I guess not bad enough for PulseAudio to stutter. What PulseAudio doesn't tolerate is jitter.
The two hda controllers of the reporter does not use same buffer size (buffer time)
Do the timer based scheduling wakeup 20ms before the the buffer is empty ? the timer eventually wakeup at different time if buffer time of two hda-controllers are not the same
does this mean the pulseaudio still keep the audio data until the pulseaudio client close the stream ?
Other pulseaudio modules seen does not support rewind (e.g. jack, tunnel, Bluetooth,...
http://git.alsa-project.org/?p=alsa-plugins.git;a=tree
Other alsa plugins (e.g. Jack, oss,...) seem not support rewind
Jack is interesting here: it is the only ioplug-based plugin which sets
mmap_rw = 1. As such, ALSA treats it as something that has mmapped buffer with the same semantics as an ordinary hardware sound card, and performs rewinds using this buffer. There is also a "hardware" position callback. The actual transfer of samples from that buffer to JACK is performed in a separate realtime thread which is implicitly created in jack_activate(). The porition is updated every JACK period.
The whole construction should support rewinds, with the non-rewindable
remainder being one JACK period (which may be different from one ALSA period). If the JACK period is 256 samples, this plugin should behave very much like one voice of ymfpci.
https://github.com/jackaudio/jack2/blob/master/linux/alsa/alsa_driver.c
jackd server does not use snd_pcm_rewind, support non-interleaved mode sound cards and sound cards with 10 or more channels (e.g iec1712, hdsp and hammerfall, ...) more than two playback ports
the jack client has no info about how many periods or channels used by jackd server
http://jackaudio.org/routing_alsa
jack client only specify how many channels and which playback ports
you can specify the stereo output to the grey jack if the jackd server use 8 channels playback or mix the stereo output to the right channel
http://cgit.freedesktop.org/pulseaudio/pulseaudio/tree/src/modules/jack/modu...
seem module-jack-sink.c use fixed latency
does it mean that pulseaudio only rewind those sink which support dynamic latency ? i.e. it won't rewind the sink if the sink used fixed latency
12.05.2014 09:11, Raymond Yau wrote:
https://bugs.launchpad.net/ubuntu/+source/pulseaudio/+bug/1188425
I: [pulseaudio] alsa-sink.c: Successfully opened device a52:0. I: [pulseaudio] alsa-sink.c: Selected mapping 'Digital Surround 5.1 (IEC958/AC3)' (iec958-ac3-surround-51). I: [pulseaudio] alsa-sink.c: Cannot enable timer-based scheduling, falling back to sound IRQ scheduling. I: [pulseaudio] alsa-sink.c: Successfully enabled mmap() mode.
Seem only hw device report whether it support disable period wake-up
ioplug did not support disable period wake-up and you need A52 plugin to provide a parameter to disable the period wakeup of the slave
No, or maybe "not yet". PulseAudio will not try to enable timer-based scheduling on a52 anyway, because of the following source lines.
http://cgit.freedesktop.org/pulseaudio/pulseaudio/tree/src/modules/alsa/alsa... http://cgit.freedesktop.org/pulseaudio/pulseaudio/tree/src/modules/alsa/alsa...
do you mean pulseaudio can disable period wakeup of the hda-intel through extplug ?
Yes. That's a difference between ioplug and extplug. But I don't really care about disabling period interrupts.
D: [alsa-sink] alsa-util.c: PCM state is RUNNING I: [alsa-sink] alsa-sink.c: Starting playback. I: [alsa-sink] (alsa-lib)pcm_hw.c: SNDRV_PCM_IOCTL_START failed (-77)
does SNDRV_PCM_IOCTL_START fail mean pcm state is no longer running ?
Note that this comes from pcm_hw.c. As PulseAudio does not use the hw: device in this particular use case, I have to conclude that it comes through the a52 or ioplug code. I am not really familiar with this code.
Instead of what you are proposing above, I wrote a loop that
repeatedly calls snd_pcm_rewindable() 7000000 times and prints the result if it differs from the previous one. With snd-hda-intel (PCH), hw plugin, stereo, S16_LE, 48 kHz, 6 periods, and a period size of 1024, I get this:
Rewindable: 6119, loop iteration: 0 Rewindable: 5119, loop iteration: 5389434
the method can be improved
instead of wake up in half period time to check the value of snd_pcm_rewindable()
- set the timer to wakeup at 1/16 period time intervals, if the values
does not change , this mean that it does not provide accuracy of 1/16 of period time and you can know whether it support 1/8 when the next wakeup occur at 1/8 period time, ...until you get 16 values for the first period
- if the value of snd_pcm_rewindable change at every 1/16 period time
intervals , set the timer to wakeup at 1/256 period time at the second period
No need to do this. I have already made enough conclusions. Unfortunately, I forgot to attach the new test program (intentionally modified to produce an underrun), doing it now.
The output here is:
$ ./a.out Hardware PCM card 2 'HDA Intel PCH' device 0 subdevice 0 Its setup is: stream : PLAYBACK access : RW_INTERLEAVED format : S16_LE subformat : STD channels : 4 rate : 48000 exact rate : 48000 (48000/1) msbits : 16 buffer_size : 4096 period_size : 1024 period_time : 21333 tstamp_mode : NONE period_step : 1 avail_min : 1024 period_event : 0 start_threshold : 1024 stop_threshold : 4096 silence_threshold: 0 silence_size : 0 boundary : 4611686018427387904 appl_ptr : 0 hw_ptr : 0 Playing silence =================== Hardware PCM card 2 'HDA Intel PCH' device 0 subdevice 0 Its setup is: stream : PLAYBACK access : RW_INTERLEAVED format : S16_LE subformat : STD channels : 4 rate : 48000 exact rate : 48000 (48000/1) msbits : 16 buffer_size : 4096 period_size : 1024 period_time : 21333 tstamp_mode : NONE period_step : 1 avail_min : 1024 period_event : 0 start_threshold : 1024 stop_threshold : 4096 silence_threshold: 0 silence_size : 0 boundary : 4611686018427387904 appl_ptr : 4096 hw_ptr : 0 Rewindable: 4096, loop iteration: 0 =================== Hardware PCM card 2 'HDA Intel PCH' device 0 subdevice 0 Its setup is: stream : PLAYBACK access : RW_INTERLEAVED format : S16_LE subformat : STD channels : 4 rate : 48000 exact rate : 48000 (48000/1) msbits : 16 buffer_size : 4096 period_size : 1024 period_time : 21333 tstamp_mode : NONE period_step : 1 avail_min : 1024 period_event : 0 start_threshold : 1024 stop_threshold : 4096 silence_threshold: 0 silence_size : 0 boundary : 4611686018427387904 appl_ptr : 4096 hw_ptr : 1048 Rewindable: 3048, loop iteration: 1288389 =================== Hardware PCM card 2 'HDA Intel PCH' device 0 subdevice 0 Its setup is: stream : PLAYBACK access : RW_INTERLEAVED format : S16_LE subformat : STD channels : 4 rate : 48000 exact rate : 48000 (48000/1) msbits : 16 buffer_size : 4096 period_size : 1024 period_time : 21333 tstamp_mode : NONE period_step : 1 avail_min : 1024 period_event : 0 start_threshold : 1024 stop_threshold : 4096 silence_threshold: 0 silence_size : 0 boundary : 4611686018427387904 appl_ptr : 4096 hw_ptr : 2049 Rewindable: 2047, loop iteration: 3010739 =================== Hardware PCM card 2 'HDA Intel PCH' device 0 subdevice 0 Its setup is: stream : PLAYBACK access : RW_INTERLEAVED format : S16_LE subformat : STD channels : 4 rate : 48000 exact rate : 48000 (48000/1) msbits : 16 buffer_size : 4096 period_size : 1024 period_time : 21333 tstamp_mode : NONE period_step : 1 avail_min : 1024 period_event : 0 start_threshold : 1024 stop_threshold : 4096 silence_threshold: 0 silence_size : 0 boundary : 4611686018427387904 appl_ptr : 4096 hw_ptr : 3092 Rewindable: 1004, loop iteration: 5251015 =================== Hardware PCM card 2 'HDA Intel PCH' device 0 subdevice 0 Its setup is: stream : PLAYBACK access : RW_INTERLEAVED format : S16_LE subformat : STD channels : 4 rate : 48000 exact rate : 48000 (48000/1) msbits : 16 buffer_size : 4096 period_size : 1024 period_time : 21333 tstamp_mode : NONE period_step : 1 avail_min : 1024 period_event : 0 start_threshold : 1024 stop_threshold : 4096 silence_threshold: 0 silence_size : 0 boundary : 4611686018427387904 appl_ptr : 4096 hw_ptr : 4136 Rewindable: -40, loop iteration: 7807909 This means Too many levels of symbolic links
So snd_pcm_rewindable() can return weird values that are updated
every period size or so. As such, I wouldn't believe its return value out of the box even for hw devices. At loop iteration 5389433, the CPU chewed enough time for almost one period, but snd_pcm_rewindable() said that almost 6 periods are rewindable. Probably a missing sync_ptr() somewhere, or a documentation bug.
With snd_pcm_avail() inserted (which does synchronize the position)
before each call to snd_pcm_rewindable(), I get:
Rewindable: 6119, loop iteration: 0 Rewindable: 6112, loop iteration: 2 Rewindable: 6104, loop iteration: 42 Rewindable: 6096, loop iteration: 76 Rewindable: 6088, loop iteration: 125 Rewindable: 6080, loop iteration: 173 Rewindable: 6072, loop iteration: 222 Rewindable: 6064, loop iteration: 270
(and an underrun in the end).
With 4 channels:
Rewindable: 6112, loop iteration: 0 Rewindable: 6108, loop iteration: 2 Rewindable: 6104, loop iteration: 14 Rewindable: 6100, loop iteration: 36 Rewindable: 6096, loop iteration: 58 Rewindable: 6092, loop iteration: 63
With 8 channels:
Rewindable: 6104, loop iteration: 0 Rewindable: 6098, loop iteration: 1 Rewindable: 6096, loop iteration: 2 Rewindable: 6094, loop iteration: 9 Rewindable: 6092, loop iteration: 24 Rewindable: 6090, loop iteration: 32 Rewindable: 6088, loop iteration: 41
So on my snd-hda-intel, the granularity of the pointer is 32 bytes.
For Haswell HDMI (on another snd-hda-intel), stereo, S16_LE:
Rewindable: 6128, loop iteration: 0 Rewindable: 6112, loop iteration: 129 Rewindable: 6096, loop iteration: 339 Rewindable: 6080, loop iteration: 551 Rewindable: 6064, loop iteration: 753 Rewindable: 6048, loop iteration: 966 Rewindable: 6032, loop iteration: 1180
so the resulting granularity is 64 bytes.
An unfortunate observation is that, without snd_pcm_avail(), even on
hw just after an underrun snd_pcm_rewindable() can return negative numbers such as -16 or -25 that lead to nonsense error codes (EBUSY or ENOTTY).
pcm_rewind2.c use period size instead of buffer size as start_threshold , pcm is already started before you fill the buffer full and pcm can be stopped at underrun if your program does not use boundary as stop_threshold
this affect your timing if your test program behave like pcm_rewind2.c
I agree with the above. As long as it serves as a testcase for a bug, it is good.
static snd_pcm_sframes_t snd_pcm_hw_rewindable(snd_pcm_t *pcm)
{ return snd_pcm_mmap_hw_avail(pcm); }
if this function return the safe value, Do you mean it must hw_sync the pointer and should return zero if snd_pcm_mmap_hw_avail is negative and check the pcm state to return negative error code ?
Answering by parts.
Must hw_sync the pointer - yes.
Should return zero if snd_pcm_mmap_hw_avail is negative - not sure, for two reasons. First, I am not sure if snd_pcm_mmap_hw_avail is indeed allowed to return negative values due to yet-undetected xruns. Second, negative snd_pcm_mmap_hw_avail means an xrun, so I am not sure whether 0 is a valid return code here.
Should check the pcm state to return negative error code - yes at least for non-xrun states such as SND_PCM_STATE_SUSPENDED or SND_PCM_STATE_DISCONNECTED, and I am not sure whether to return 0 or -EPIPE on a known xrun.
Also I am not sure about interaction with a very large stop_threshold (i.e. settings that ignore underruns), and the above (except hw_sync) is for playback only. We need a separate discussion about capture, but I am not yet ready to start it.
http://www.alsa-project.org/~tiwai/writing-an-alsa-driver/ch05s07.html#pcm-i...
High frequency timer interrupts
This happens when the hardware doesn't generate interrupts at the
period boundary but issues timer interrupts at a fixed timer rate (e.g. es1968 or ymfpci drivers).
both es1968 and ymfpci use ac97 codec, there is an external clock source (oscillator) to provide the timing to both sound chips(ac97 controller) and ac97 codec to sync the transfer of audio through ac97 link at 48000Hz
it depends on whether the chip can count the clock ticks to provide a timer interrupt
I am also confuse about ymfpci really use timer interrupts.
Well, that's easy. According to your own words, the card sends an
interrupt every 256 samples and has no real notion of the user-defined period size. From ALSA viewpoint, this 256-sample interrupt is just a timer (but not a timer that is managed through functions that have "timer" in the name).
Unlike other hardware-mixing sound cards desgined for playing game , the multi voices of ymfpci is designed for playing MIDI which MIDI notes of a sound are usually start at same tempo
some subdevices can have unpredictable delay if is it not the subdevice which start the hardware
it depends on whether the hardware can provide registers for the driver to start each subdevice independently when receiving SNDRV_PCM_TRIGGER_START in pcm_trigger callback
OK
hw_ptr granularity is defined only by period_bytes_min (and
additional constraints if any).
Well, this disagrees with my experiments. For S16_LE stereo,
snd_pcm_hw_params_get_period_size_min() says 32 samples for both PCH and HDMI, while the measured granularity is different (8 and 16 samples).
should you use period_bytes_min instead of period_size_min ?
128 bytes / (8 x 2) = 8 samples for 8 channels
for 6 channels playback , the period does not fit exactly the pcie playload size 128 bytes
Will retest later today.
PulseAudio has the following consideration here: if the card cannot
report the position accurately, we need to disable the timestamp-based scheduling, as this breaks module-combine-sink (or any successor of it), because it relies on very accurate estimations of the actual sample rate ratio between two non-identical cards.
This is something to investigate, I am not ready to provide any
useful comment. Although in comment #2 bluetooth is mentioned, and this is indeed an example where even somewhat accurate timing information is not available.
if you want to hear sound from two snd-hda-intel at the same time using combined sink, you may need driver provide the output delay in hda codec
7.3.4.5 Audio Function Group Capabilities
Output Delay is a four bit value representing the number of samples between when the sample is received from the Link and when it appears as an analog signal at the pin. This may be a “typical” value. If this is 0, the widgets along the critical path should be queried, and each individual widget must report its individual delay.
Figure 85. Audio Function Group Capabilities Response Format
7.3.4.6 Audio Widget Capabilities
Delay indicates the number of sample delays through the widget. This may be 0 if the delay value in the Audio Function Parameters is supplied to represent the entire path.
http://git.kernel.org/cgit/linux/kernel/git/tiwai/hda-emu.git/tree/codecs
some hda codecs report delay in audio output/input widgets and the ranges of delay vary from 3 to 13 samples, hda_proc.c did not show output/input delay in the audio function group
Did snd_hda_param_read(codec, codec->afg, AC_PAR_AUDIO_FG_CAP) return any values for your hda codecs ?
what is critical path ?
How do I test this? Could you please post some userspace test code or a kernel patch, together with the instructions?
since some driver can enable/disable loopback mixing which the audio pass through less widgets when loopback mixing is disabled
some idt codecs have a 5 bands equalizer in the path of Port D(not in the pin complex widget or mixer widget but setup using vendor specific verb to audio function group or vendor specific widget) but not in the path of Port A (headphone)
Interesting, implementable for someone with the skills in this area,
but probably not relevant for the above freedesktop bug. What you are talking about is just a constant offset in the snd_pcm_delay() return values. That's bad, but I guess not bad enough for PulseAudio to stutter. What PulseAudio doesn't tolerate is jitter.
The two hda controllers of the reporter does not use same buffer size (buffer time)
Do the timer based scheduling wakeup 20ms before the the buffer is empty ? the timer eventually wakeup at different time if buffer time of two hda-controllers are not the same
does this mean the pulseaudio still keep the audio data until the pulseaudio client close the stream ?
Not ready to answer this yet.
Other pulseaudio modules seen does not support rewind (e.g. jack, tunnel, Bluetooth,...
http://git.alsa-project.org/?p=alsa-plugins.git;a=tree
Other alsa plugins (e.g. Jack, oss,...) seem not support rewind
Jack is interesting here: it is the only ioplug-based plugin which
sets mmap_rw = 1. As such, ALSA treats it as something that has mmapped buffer with the same semantics as an ordinary hardware sound card, and performs rewinds using this buffer. There is also a "hardware" position callback. The actual transfer of samples from that buffer to JACK is performed in a separate realtime thread which is implicitly created in jack_activate(). The porition is updated every JACK period.
The whole construction should support rewinds, with the
non-rewindable remainder being one JACK period (which may be different from one ALSA period). If the JACK period is 256 samples, this plugin should behave very much like one voice of ymfpci.
https://github.com/jackaudio/jack2/blob/master/linux/alsa/alsa_driver.c
jackd server does not use snd_pcm_rewind, support non-interleaved mode sound cards and sound cards with 10 or more channels (e.g iec1712, hdsp and hammerfall, ...) more than two playback ports
the jack client has no info about how many periods or channels used by jackd server
http://jackaudio.org/routing_alsa
jack client only specify how many channels and which playback ports
you can specify the stereo output to the grey jack if the jackd server use 8 channels playback or mix the stereo output to the right channel
http://cgit.freedesktop.org/pulseaudio/pulseaudio/tree/src/modules/jack/modu...
seem module-jack-sink.c use fixed latency
does it mean that pulseaudio only rewind those sink which support dynamic latency ? i.e. it won't rewind the sink if the sink used fixed latency
No. PulseAudio can, in theory, rewind fixed-latency sinks, but it will never usefully rewind this one (i.e. will truncate all rewind requests to 0), because it never sets max_rewind, and thus max_rewind gets defaulted to 0:
http://cgit.freedesktop.org/pulseaudio/pulseaudio/tree/src/pulsecore/sink.c#...
PulseAudio here uses a different rendering strategy from the ALSA sink. For the ALSA sink, PulseAudio renders aggressively as much as possible and then rewinds if necessary. For the JACK sink, PulseAudio renders only the minimum required portion of data and only when strictly necessary (when JACK has asked for it).
Note that, in PulseAudio, sink inputs also have buffers (in the form of memblockq, that's where pa_sink_render reads from), and client rewinds can be done using these buffers even if the sink itself is not rewindable.
12.05.2014 10:52, I wrote:
12.05.2014 09:11, Raymond Yau wrote:
D: [alsa-sink] alsa-util.c: PCM state is RUNNING I: [alsa-sink] alsa-sink.c: Starting playback. I: [alsa-sink] (alsa-lib)pcm_hw.c: SNDRV_PCM_IOCTL_START failed (-77)
does SNDRV_PCM_IOCTL_START fail mean pcm state is no longer running ?
Note that this comes from pcm_hw.c. As PulseAudio does not use the hw: device in this particular use case, I have to conclude that it comes through the a52 or ioplug code. I am not really familiar with this code.
This error code means -EBADFD. There is indeed some ifdefed-out code that prints the state on -EBADFD, so probably you are right. OTOH there is something strange in the code that may or may not be relevant. I have not tested my findings.
The a52 plugin sets the start threshold of the "slave" to its own start threshold. However, it doesn't write anything to the "slave" until it has enough data for one A52 frame. So, if the start threshold is too low, or if the pcm is started manually, it is quite possible that the "slave" is started without anything written into it. Result: immediate underrun. The dca plugin deals with this situation by inserting one DTS frame of silence before everything else, and writing as many "samples" to the slave as the application samples written into it.
Note that PulseAudio always starts the stream manually after writing the first full buffer.
And the log that you quoted also says:
I: [alsa-sink] (alsa-lib)pcm_hw.c: SNDRV_PCM_IOCTL_START failed (-77) D: [alsa-sink] alsa-util.c: Got POLLERR from ALSA D: [alsa-sink] alsa-util.c: Got POLLOUT from ALSA D: [alsa-sink] alsa-util.c: PCM state is RUNNING
The failure is for the hw plugin, however, the a52 plugin reports that it is running. Quite obvious desynchronization of state between the a52 plugin and its "slave".
hw_ptr granularity is defined only by period_bytes_min (and
additional constraints if any).
Well, this disagrees with my experiments. For S16_LE stereo,
snd_pcm_hw_params_get_period_size_min() says 32 samples for both PCH and HDMI, while the measured granularity is different (8 and 16 samples).
should you use period_bytes_min instead of period_size_min ?
128 bytes / (8 x 2) = 8 samples for 8 channels
for 6 channels playback , the period does not fit exactly the pcie playload size 128 bytes
Will retest later today.
Sorry, there is no userspace API for that. The only way to get the minimum period size is via snd_pcm_hw_params_get_period_size_min, which obtains the result in terms of frames.
does this mean the pulseaudio still keep the audio data until the pulseaudio client close the stream ?
Not ready to answer this yet.
Still don't know for sure.
No need to do this. I have already made enough conclusions. Unfortunately, I forgot to attach the new test program (intentionally
modified to produce an underrun), doing it now.
The output here is:
Hardware PCM card 2 'HDA Intel PCH' device 0 subdevice 0 Its setup is: stream : PLAYBACK access : RW_INTERLEAVED format : S16_LE subformat : STD channels : 4 rate : 48000 exact rate : 48000 (48000/1) msbits : 16 buffer_size : 4096 period_size : 1024 period_time : 21333 tstamp_mode : NONE period_step : 1 avail_min : 1024 period_event : 0 start_threshold : 1024 stop_threshold : 4096 silence_threshold: 0 silence_size : 0 boundary : 4611686018427387904 appl_ptr : 4096 hw_ptr : 4136 Rewindable: -40, loop iteration: 7807909 This means Too many levels of symbolic links
appl_ptr is 40 samples behind hw_ptr since your program stop writing data to the sound card and perform rewind , snd_pcm_state was change from runnning (3) to xrun (4)
The appl_ptr can be placed in any position in the ring buffer for the application to write data but the sound card fetch data from this ring buffer sequentially, however snd_pcm_write() assume the maximum distance between appl_ptr and hwptr is only one buffer
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/commit/sound/pc...
Do you mean hwptr does not decrease by one period when you use arbitrary period sizes for hda-Intel ?
e.g. 48 samples (192 bytes) when using 1ms period time and stereo instead of 4 channels
you program seen hang when using pulse plugin
( 11.615| 0.000) I: [pulseaudio] sink-input.c: application.language = "C" ( 11.615| 0.000) I: [pulseaudio] sink-input.c: window.x11.display = ":0" ( 11.615| 0.000) I: [pulseaudio] sink-input.c: application.process.machine_id = "6ca4132d684a1184ca3cfd3b51bb2316" ( 11.615| 0.000) I: [pulseaudio] sink-input.c: application.process.session_id = "c2" ( 11.615| 0.000) I: [pulseaudio] sink-input.c: module-stream-restore.id = "sink-input-by-application-name:ALSA plug-in [pcm_rewindable]" ( 11.615| 0.000) I: [pulseaudio] protocol-native.c: Requested tlength=85.33 ms, minreq=21.33 ms ( 11.615| 0.000) D: [pulseaudio] protocol-native.c: Early requests mode enabled, configuring sink latency to minreq. ( 11.615| 0.000) D: [pulseaudio] protocol-native.c: Requested latency=21.33 ms, Received latency=80.00 ms ( 11.615| 0.000) D: [pulseaudio] memblockq.c: memblockq requested: maxlength=4194304, tlength=46080, base=4, prebuf=4096, minreq=15360 maxrewind=0 ( 11.615| 0.000) D: [pulseaudio] memblockq.c: memblockq sanitized: maxlength=4194304, tlength=46080, base=4, prebuf=4096, minreq=15360 maxrewind=0 ( 11.615| 0.000) I: [pulseaudio] protocol-native.c: Final latency 320.00 ms = 80.00 ms + 2*80.00 ms + 80.00 ms ( 11.615| 0.000) D: [alsa-sink-Intel ICH] alsa-sink.c: Requested volume: 0: 71% 1: 71% ( 11.615| 0.000) D: [alsa-sink-Intel ICH] alsa-sink.c: in dB: 0: -9.00 dB 1: -9.00 dB ( 11.615| 0.000) D: [alsa-sink-Intel ICH] alsa-sink.c: Got hardware volume: 0: 71% 1: 71% ( 11.615| 0.000) D: [alsa-sink-Intel ICH] alsa-sink.c: in dB: 0: -9.00 dB 1: -9.00 dB ( 11.615| 0.000) D: [alsa-sink-Intel ICH] alsa-sink.c: Calculated software volume: 0: 100% 1: 100% (accurate-enough=yes) ( 11.615| 0.000) D: [alsa-sink-Intel ICH] alsa-sink.c: in dB: 0: 0.00 dB 1: 0.00 dB ( 11.615| 0.000) D: [alsa-sink-Intel ICH] sink.c: Volume not changing ( 11.616| 0.000) D: [pulseaudio] core-subscribe.c: Dropped redundant event due to change event. ( 11.620| 0.004) D: [alsa-sink-Intel ICH] protocol-native.c: Requesting rewind due to end of underrun. ( 11.620| 0.000) D: [alsa-sink-Intel ICH] alsa-sink.c: Requested to rewind 14112 bytes. ( 11.620| 0.000) D: [alsa-sink-Intel ICH] alsa-sink.c: Limited to 13856 bytes. ( 11.620| 0.000) D: [alsa-sink-Intel ICH] alsa-sink.c: before: 3464 ( 11.620| 0.000) D: [alsa-sink-Intel ICH] alsa-sink.c: after: 3464 ( 11.620| 0.000) D: [alsa-sink-Intel ICH] alsa-sink.c: Rewound 13856 bytes. ( 11.620| 0.000) D: [alsa-sink-Intel ICH] sink.c: Processing rewind... ( 11.620| 0.000) D: [alsa-sink-Intel ICH] sink.c: latency = 15434 ( 11.620| 0.000) D: [alsa-sink-Intel ICH] sink-input.c: Have to rewind 13856 bytes on render memblockq. ( 11.620| 0.000) D: [alsa-sink-Intel ICH] source.c: Processing rewind... ( 11.624| 0.004) D: [alsa-sink-Intel ICH] protocol-native.c: Implicit underrun of 'ALSA Playback' ( 11.624| 0.000) D: [alsa-sink-Intel ICH] sink.c: Found underrun 6516 bytes ago (7596 bytes ahead in playback buffer) ( 11.630| 0.005) D: [alsa-sink-Intel ICH] sink.c: Found underrun 6516 bytes ago (7596 bytes ahead in playback buffer) ( 11.630| 0.000) D: [alsa-sink-Intel ICH] sink.c: Found underrun 6516 bytes ago (7596 bytes ahead in playback buffer) ( 11.646| 0.016) D: [alsa-sink-Intel ICH] sink.c: Found underrun 6516 bytes ago (7596 bytes ahead in playback buffer) ( 11.647| 0.000) D: [alsa-sink-Intel ICH] sink.c: Found underrun 6516 bytes ago (7596 bytes ahead in playback buffer) ( 11.652| 0.005) D: [alsa-sink-Intel ICH] sink.c: Found underrun 8436 bytes ago (5676 bytes ahead in playback buffer) ( 11.663| 0.010) D: [alsa-sink-Intel ICH] sink.c: Found underrun 10356 bytes ago (3756 bytes ahead in playback buffer) ( 11.667| 0.004) D: [alsa-sink-Intel ICH] sink.c: Found underrun 10356 bytes ago (3756 bytes ahead in playback buffer) ( 11.667| 0.000) D: [alsa-sink-Intel ICH] sink.c: Found underrun 10356 bytes ago (3756 bytes ahead in playback buffer) ( 11.674| 0.006) D: [alsa-sink-Intel ICH] sink.c: Found underrun 12276 bytes ago (1836 bytes ahead in playback buffer) ( 12.272| 0.598) E: [alsa-sink-Intel ICH] alsa-sink.c: ALSA woke us up to write new data to the device, but there was actually nothing to write! ( 12.272| 0.598) E: [alsa-sink-Intel ICH] alsa-sink.c: Most likely this is a bug in the ALSA driver 'snd_intel8x0'. Please report this issue to the ALSA developers. ( 12.272| 0.598) E: [alsa-sink-Intel ICH] alsa-sink.c: We were woken up with POLLOUT set -- however a subsequent snd_pcm_avail() returned 0 or another value < min_avail.
Sorry, there is no userspace API for that. The only way to get the
minimum period size is via snd_pcm_hw_params_get_period_size_min, which obtains the result in terms of frames.
Have you set channels, rate and format first and you have to call snd_pcm_hw_params_get_period_size_min before you set the period size ? http://git.qemu.org/?p=qemu.git;a=blob;f=hw/audio/ac97.c;hb=HEAD
http://git.qemu.org/?p=qemu.git;a=blob;f=hw/audio/hda-codec.c;hb=HEAD
However those emulated AC97 and HDA sound card inside virtual machine won't transfer audio like the real sound card, it mainly depend on the backend audio driver of the host
http://git.qemu.org/?p=qemu.git;a=tree;f=audio;hb=HEAD
Can your program use period size but pulseaudio server user period time, all rates supported by the sink (at least default rate and alternate rate used by pulseaudio) ?
the supported channels ( at least 2 and 6 which is commonly used)
How do I test this? Could you please post some userspace test code or a
kernel patch, together with the instructions?
Attach the patch to dump the values of the audio function group capability
There are three cases
1) delay in analog output > delay in digital output e.g, idt codecs 2) delay in analog output < delay in digital output e.g. adi codecs 3) no delay in audio widgets , digital output and analog output have no delay difference when output delay in audio function group is non zero ?
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/commit/sound/pc...
Audio send to both Analog and digital output at the same time when iec958 default pcm switch is on
It is unlikely for ordinary user to measure the delay without using oscilloscope since the Analog speaker and digital receiver also have delay
The whole construction should support rewinds, with the
non-rewindable remainder being one JACK period (which may be different
from one ALSA period). If the JACK period is 256 samples, this plugin should behave very much like one voice of ymfpci.
does support snd_pcm_rewind imply support snd_pcm_forward ?
(err = snd_pcm_ioplug_set_param_minmax(&jack->io, SND_PCM_IOPLUG_HW_PERIOD_BYTES, 128, 64*1024)) < 0 ||
Like pulse plugin which use same period bytes min as snd-hda-intel
The application can use arbitrary period size but hwptr seem not always change at one period time, this is much worse than those drivers with SNDRV_PCM_INFO_BATCH
No. PulseAudio can, in theory, rewind fixed-latency sinks, but it will
never usefully rewind this one (i.e. will truncate all rewind requests to 0), because it never sets max_rewind, and thus max_rewind gets defaulted to 0:
http://cgit.freedesktop.org/pulseaudio/pulseaudio/tree/src/pulsecore/sink.c#...
PulseAudio here uses a different rendering strategy from the ALSA sink.
For the ALSA sink, PulseAudio renders aggressively as much as possible and then rewinds if necessary. For the JACK sink, PulseAudio renders only the minimum required portion of data and only when strictly necessary (when JACK has asked for it).
what other events pulseaudio need to rewind beside
1) change in volume of sink/source 2) change in volume of streams 3) change in latency 4) sync slaves of combined sinks
The defaults are: ; default-fragments = 4 ; default-fragment-size-msec = 25
Do you mean if volume change is delayed by one buffer( 0.1 seconds ) is still unacceptable if the user choose 4 x 25ms buffer time for the sink ?
23.05.2014 08:03, Raymond Yau wrote:
No need to do this. I have already made enough conclusions.
Unfortunately, I forgot to attach the new test program (intentionally
modified to produce an underrun), doing it now.
The output here is:
Hardware PCM card 2 'HDA Intel PCH' device 0 subdevice 0 Its setup is: stream : PLAYBACK access : RW_INTERLEAVED format : S16_LE subformat : STD channels : 4 rate : 48000 exact rate : 48000 (48000/1) msbits : 16 buffer_size : 4096 period_size : 1024 period_time : 21333 tstamp_mode : NONE period_step : 1 avail_min : 1024 period_event : 0 start_threshold : 1024 stop_threshold : 4096 silence_threshold: 0 silence_size : 0 boundary : 4611686018427387904 appl_ptr : 4096 hw_ptr : 4136 Rewindable: -40, loop iteration: 7807909 This means Too many levels of symbolic links
appl_ptr is 40 samples behind hw_ptr since your program stop writing data to the sound card and perform rewind , snd_pcm_state was change from runnning (3) to xrun (4)
Yes.
The appl_ptr can be placed in any position in the ring buffer for the application to write data but the sound card fetch data from this ring buffer sequentially, however snd_pcm_write() assume the maximum distance between appl_ptr and hwptr is only one buffer
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/commit/sound/pc...
Do you mean hwptr does not decrease by one period when you use arbitrary period sizes for hda-Intel ?
I cannot comment on this commit. But "snd_hda_intel.align_buffer_size=1" indeed existed on my kernel command line (for no good reason now - so removed), and I don't use a strange period size.
Still, the bug (negative reported rewindable amount) also exists without align_buffer_size=1.
e.g. 48 samples (192 bytes) when using 1ms period time and stereo instead of 4 channels
Not tested.
you program seen hang when using pulse plugin
I am not interested in any more rewind-related bug reports for the pulse plugin. This particular bug will be fixed, together with many others, by always returning 0 from the .rewindable callback for ioplug if mmap_rw is false.
Sorry, there is no userspace API for that. The only way to get the
minimum period size is via snd_pcm_hw_params_get_period_size_min, which obtains the result in terms of frames.
Have you set channels, rate and format first and you have to call snd_pcm_hw_params_get_period_size_min before you set the period size ?
Yes. The test program is attached. For stereo S16_LE output, it reports 32 samples (i.e. 128 bytes) as the minimum period size. Still, hw_ptr assumes odd values during the test runs. Maybe due to bdl_pos_adj, which is 1 for my analog card. OTOH, with snd_pcm_avail inserted, it assumes even values with a step of 8.
http://git.qemu.org/?p=qemu.git;a=blob;f=hw/audio/ac97.c;hb=HEAD
http://git.qemu.org/?p=qemu.git;a=blob;f=hw/audio/hda-codec.c;hb=HEAD
However those emulated AC97 and HDA sound card inside virtual machine won't transfer audio like the real sound card, it mainly depend on the backend audio driver of the host
These source files are large. What should I look for?
Can your program use period size but pulseaudio server user period time, all rates supported by the sink (at least default rate and alternate rate used by pulseaudio) ?
the supported channels ( at least 2 and 6 which is commonly used)
Sorry, I don't understand. Besides, the program is only intended to demonstrate a bug in snd_pcm_rewindable when applied to the hw plugin, and, before fixing hw, I am not ready to discuss any other bugs that it may find in any other plugin.
How do I test this? Could you please post some userspace test code or
a kernel patch, together with the instructions?
Attach the patch to dump the values of the audio function group capability
There was no attachment.
There are three cases
- delay in analog output > delay in digital output e.g, idt codecs
- delay in analog output < delay in digital output e.g. adi codecs
- no delay in audio widgets , digital output and analog output have no
delay difference when output delay in audio function group is non zero ?
Yes, that's logical.
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/commit/sound/pc...
Audio send to both Analog and digital output at the same time when iec958 default pcm switch is on
This only affects connections inside the sound card. The same samples will be sent to both outputs (i.e. there will be one stream), thus, any difference in delays will manifest itself.
It is unlikely for ordinary user to measure the delay without using oscilloscope since the Analog speaker and digital receiver also have delay
Correct. Also, while delay in analog speakers can be often rightfully assumed to be 0 samples, this is not the case for digital receivers. In other words, the delay on the digital path is in fact unknown.
However, your considerations above are limited to timing differences between several outputs of the same card. The original use case for module_combine_sink was to combine several analog cards into one, even though they have independent clocks and slightly different (and even changing with temperature) ideas about the meaning of "44100 Hz".
The whole construction should support rewinds, with the
non-rewindable remainder being one JACK period (which may be
different from one ALSA period). If the JACK period is 256 samples, this plugin should behave very much like one voice of ymfpci.
does support snd_pcm_rewind imply support snd_pcm_forward ?
I thought so, but, after thinking a bit more about ADPCM and attempting to fix rewinds there, I am no longer sure. In any case, remembering and reprocessing the past input is a valid way to implement forwards.
(err = snd_pcm_ioplug_set_param_minmax(&jack->io,
SND_PCM_IOPLUG_HW_PERIOD_BYTES, 128, 64*1024)) < 0 ||
Like pulse plugin which use same period bytes min as snd-hda-intel
The application can use arbitrary period size but hwptr seem not always change at one period time, this is much worse than those drivers with SNDRV_PCM_INFO_BATCH
For the jack plugin, it is a bug that it hardcodes 128 bytes and thus exposes period sizes less than one jack period. If I have time, I will look into fixing this, too.
what other events pulseaudio need to rewind beside
- change in volume of sink/source
- change in volume of streams
- change in latency
- sync slaves of combined sinks
I don't think that (4) is valid. It is handled by resampling, not rewinding.
5) explicit rewrite request from the client (if the sink has already rendered the rewritten data into its buffer) 6) state change ("If we are added for the first time...") of the sink input 7) corking/uncorking 8) end of underrun (for specific sink inputs) 9) moving streams between sinks
The defaults are: ; default-fragments = 4 ; default-fragment-size-msec = 25
Do you mean if volume change is delayed by one buffer( 0.1 seconds ) is still unacceptable if the user choose 4 x 25ms buffer time for the sink ?
I didn't explicitly mean this. However, please look at this old bug resulting from changing the volume at the wrong time: http://lists.freedesktop.org/archives/pulseaudio-bugs/2010-January/003766.ht... (which is now fixed for all cards that implement deferred volume changes).
The appl_ptr can be placed in any position in the ring buffer for the application to write data but the sound card fetch data from this ring buffer sequentially, however snd_pcm_write() assume the maximum distance between appl_ptr and hwptr is only one buffer
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/commit/sound/pc...
Do you mean hwptr does not decrease by one period when you use arbitrary period sizes for hda-Intel ?
I cannot comment on this commit. But "snd_hda_intel.align_buffer_size=1"
indeed existed on my kernel command line (for no good reason now - so removed), and I don't use a strange period size.
3.6.2 Buffer Descriptor List
There must be at least two entries in the list, with a maximum of 256 entries.
3.6.3 Buffer Descriptor List Entry
the buffers described by BDLE must start on 128 bytes boundary
refer to azx_setup_periods, if one period represent one BDLE , the buffers described by two periods must start on 128 bytes boundary
with default prealloc_max = 64, pulseaudio are forced to use maximum buffer size / period size which are also 128 bytes aligned
when you specifiy prealloc_max = 4096, one second 48000Hz is also aligned to 128 bytes boundary but one second 44100 Hz is not
Still, the bug (negative reported rewindable amount) also exists without
align_buffer_size=1.
e.g. 48 samples (192 bytes) when using 1ms period time and stereo instead of 4 channels
Not tested.
The implementation dependent FIFO Size affect the number of the bytes that could be fetched by the controller at one time.
3.3.40 Offset 90: {IOB}SDnFIFOS – Input/Output/Bidirectional Stream Descriptor n FIFO Size
FIFO Size (FIFOS): Indicates the maximum number of bytes that could be fetched by the controller at one time. This is the maximum number of bytes that may have been DMA‟d into memory but not yet transmitted on the link, and is also the maximum possible value that the LPIB count will increase by at one time.
you program seen hang when using pulse plugin
I am not interested in any more rewind-related bug reports for the pulse
plugin. This particular bug will be fixed, together with many others, by always returning 0 from the .rewindable callback for ioplug if mmap_rw is false.
why do you assume rewind is supported if mmap_rw is true ? any example
How do I test this? Could you please post some userspace test code or
a kernel patch, together with the instructions?
Attach the patch to dump the values of the audio function group
capability
There was no attachment.
There are three cases
- delay in analog output > delay in digital output e.g, idt codecs
- delay in analog output < delay in digital output e.g. adi codecs
- no delay in audio widgets , digital output and analog output have no
delay difference when output delay in audio function group is non zero ?
Yes, that's logical.
It is unlikely for ordinary user to measure the delay without using oscilloscope since the Analog speaker and digital receiver also have
delay
Correct. Also, while delay in analog speakers can be often rightfully
assumed to be 0 samples, this is not the case for digital receivers. In other words, the delay on the digital path is in fact unknown.
If 13 samples delay in analog output is due to the five bands equalizer in IDT codecs, the headphone should not has same delay since equalizer in not present in the headphone path, may need to implement multi-channel capture to find out any delay between headphone and line out
24.05.2014 11:10, Raymond Yau wrote:
The appl_ptr can be placed in any position in the ring buffer for the application to write data but the sound card fetch data from this ring buffer sequentially, however snd_pcm_write() assume the maximum distance between appl_ptr and hwptr is only one buffer
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/commit/sound/pc...
Do you mean hwptr does not decrease by one period when you use arbitrary period sizes for hda-Intel ?
I cannot comment on this commit. But
"snd_hda_intel.align_buffer_size=1" indeed existed on my kernel command line (for no good reason now - so removed), and I don't use a strange period size.
3.6.2 Buffer Descriptor List
There must be at least two entries in the list, with a maximum of 256 entries.
3.6.3 Buffer Descriptor List Entry
the buffers described by BDLE must start on 128 bytes boundary
refer to azx_setup_periods, if one period represent one BDLE , the buffers described by two periods must start on 128 bytes boundary
with default prealloc_max = 64, pulseaudio are forced to use maximum buffer size / period size which are also 128 bytes aligned
when you specifiy prealloc_max = 4096, one second 48000Hz is also aligned to 128 bytes boundary but one second 44100 Hz is not
I am not sure I can continue this line of discussion usefully, because I don't understand the purpose. If this is an attempt to understand the granularity of hw_ptr (which would indeed be useful), then I cannot help. If this is a report of a possible non-rewind-related bug in PulseAudio, please start a new thread.
Still, the bug (negative reported rewindable amount) also exists
without align_buffer_size=1.
e.g. 48 samples (192 bytes) when using 1ms period time and stereo instead of 4 channels
Not tested.
The implementation dependent FIFO Size affect the number of the bytes that could be fetched by the controller at one time.
3.3.40 Offset 90: {IOB}SDnFIFOS – Input/Output/Bidirectional Stream Descriptor n FIFO Size
FIFO Size (FIFOS): Indicates the maximum number of bytes that could be fetched by the controller at one time. This is the maximum number of bytes that may have been DMA‟d into memory but not yet transmitted on the link, and is also the maximum possible value that the LPIB count will increase by at one time.
OK, this looks very relevant. Is this the same value as would be returned by snd_pcm_hw_params_get_fifo_size()? If not, why, and how do I view this value?
you program seen hang when using pulse plugin
I am not interested in any more rewind-related bug reports for the
pulse plugin. This particular bug will be fixed, together with many others, by always returning 0 from the .rewindable callback for ioplug if mmap_rw is false.
why do you assume rewind is supported if mmap_rw is true ? any example
The example is jack plugin (in fact, the only plugin known to me that sets mmap_rw to true). It does support rewinds, as I have already explained and tested. It works because the periodic transfer of samples to JACK is done in a separate realtime thread. Application writes samples into a circular mmap-style buffer, ioplug uses the generic mmap-style functions for rewinding that buffer, and the thread reads from it, just as a real sound card would do. So an application can safely rewind any samples that it has written to that buffer but that the thread hasn't yet copied to JACK.
Of course it is possible to write a buggy ioplug-based plugin that doesn't really support rewinds even though it sets mmap_rw to true (e.g. by implementing the transfer callback - the real problem here, if my understanding is correct, is that it has no access to the application pointer). But in reality I don't know any such plugin.
Still, you are right, and a better idea would be to say: an ioplug-based plugin can be assumed to support rewinds if and only if it sets mmap_rw to true and does not provide a transfer callback. I say so because such architecture forces the plugin to use a low-latency thread to do the actual transfers and also avoids the need to care about the application pointer altogether. In other words, such plugin implements an architecture similar to one of a real DMA-based sound card.
How do I test this? Could you please post some userspace test code or
a kernel patch, together with the instructions?
Attach the patch to dump the values of the audio function group
capability
There was no attachment.
There are three cases
- delay in analog output > delay in digital output e.g, idt codecs
- delay in analog output < delay in digital output e.g. adi codecs
- no delay in audio widgets , digital output and analog output have no
delay difference when output delay in audio function group is non
zero ?
Yes, that's logical.
It is unlikely for ordinary user to measure the delay without using oscilloscope since the Analog speaker and digital receiver also have
delay
Correct. Also, while delay in analog speakers can be often rightfully
assumed to be 0 samples, this is not the case for digital receivers. In other words, the delay on the digital path is in fact unknown.
If 13 samples delay in analog output is due to the five bands equalizer in IDT codecs, the headphone should not has same delay since equalizer in not present in the headphone path, may need to implement multi-channel capture to find out any delay between headphone and line out
For me an easier way would be to go to the nearest electronic components shop and buy three 3.5mm jacks and some wires to do a non-standard interconnection. You only need to capture two channels anyway: one from headphones and one from line output.
The appl_ptr can be placed in any position in the ring buffer for
the
application to write data but the sound card fetch data from this
ring
buffer sequentially, however snd_pcm_write() assume the maximum
distance
between appl_ptr and hwptr is only one buffer
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/commit/sound/pc...
Do you mean hwptr does not decrease by one period when you use
arbitrary
period sizes for hda-Intel ?
I cannot comment on this commit. But
"snd_hda_intel.align_buffer_size=1" indeed existed on my kernel command line (for no good reason now - so removed), and I don't use a strange period size.
Loaded sound module options !!--------------------------
!!Module: snd_hda_intel align_buffer_size : -1 bdl_pos_adj : 32,32,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
Why only one entry for align_buffer_size if it is specific to hda controller when the computer have two hda controllers ?
https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/1324426
https://launchpadlibrarian.net/176540072/Symptom_PulseAudioLog.txt
May 29 06:57:49 Lihkin pulseaudio[1874]: [alsa-sink-92HD81B1X5 Analog] alsa-sink.c: ALSA woke us up to write new data to the device, but there was actually nothing to write!
Most likely this is a bug in the ALSA driver 'snd_hda_intel'. Please report this issue to the ALSA developers. We were woken up with POLLOUT set -- however a subsequent snd_pcm_avail() returned 0 or another value < min_avail.
snd_pcm_avail() returned a value that is exceptionally large: 181132 bytes (1026 ms). Most likely this is a bug in the ALSA driver 'snd_hda_intel'. Please report this issue to the ALSA developers.
Seem pulseaudio can use strange period size
May 29 11:36:52 Lihkin pulseaudio[1306]: [alsa-sink-HDMI 0] alsa-util.c: Its setup is: stream : PLAYBACK access : MMAP_INTERLEAVED format : S16_LE subformat : STD channels : 2 rate : 44100 exact rate : 44100 (44100/1) msbits : 16 buffer_size : 3520 period_size : 352 period_time : 7981 tstamp_mode : ENABLE period_step : 1 avail_min : 352 period_event : 1 start_threshold : -1 stop_threshold : 7926335344172072960 silence_threshold: 0 silence_size : 0 boundary : 7926335344172072960 appl_ptr : 2916662 hw_ptr : 2958425
start_threshold : -1
This mean the playback will automatically start if pulseaudio write any audio data instead of start after write pre buffer
Try aplay with any arbitrary period size/period time and xrun_debug , check whether hw_ptr is updated on every period update in the system log
aplay -v --period-size=352 --buffer-size=3520 -Dhw:0,0 stereo.wav
http://www.alsa-project.org/main/index.php/XRUN_Debug
/proc/asound/card#/pcm0p/xrun_debug
Replace '#' with your card number (usually 0). This proc file can enable various debugging tools. The CONFIG_SND_PCM_XRUN_DEBUG, CONFIG_SND_VERBOSE_PROCFS, CONFIG_SND_DEBUG options must be enabled in your kernel (if xrun_debug proc file is present - this feature is enabled).
# Enable basic debugging, do jiffies check and dump position on each period and hardware pointer update calls # Usefull when the lowlevel (specific) hardware driver is somehow broken echo 29 > /proc/asound/card0/pcm0p/xrun_debug
Yes. The test program is attached. For stereo S16_LE output, it reports
32 samples (i.e. 128 bytes) as the minimum period size.
your program segfault when using NULL plugin
what other events pulseaudio need to rewind beside
- change in volume of sink/source
Volume change of the sink is supposed to be real time event
User are allowed to change the volume slider even when the sink is not playing or source is not capturing
If the requested volume cannot be satisfied by changing hardware volume controls , does pulseaudio keep the calculated software volume adjustment when the sink start playing while the hardware volume is supposed to be updated immediately ?
when user specify ignore_DB=1 or codec does not have any volume control (e.g. SPDIF stereo playback) which force pulseaudio to use software volume. The software volume will be same as the requested volume(volume slider in sound preference) even when there is no playing stream.
Do this software volume setting saved in pulseaudio database when shutdown and restore on pulseaudio next startup by module-device-restore for the SPDIF stereo profile ?
- change in volume of streams
since there may be multiple streams (clients) playing at the same time, Do pulseaudio still keep copies of unmixed audio of all connected clients ?
- change in latency
- sync slaves of combined sinks
I don't think that (4) is valid. It is handled by resampling, not
rewinding.
- explicit rewrite request from the client (if the sink has already
rendered the rewritten data into its buffer)
- state change ("If we are added for the first time...") of the sink
input
If the stream is the first client ,
How about a low latency client connect to server when a high latency client is already playing ?
Do pulseaudio still rewind the sink when low latency client is playing and a high latency client connect to server ?
- corking/uncorking
Do you mean pause of the playing stream while other stream is still playing ?
- end of underrun (for specific sink inputs)
if appl_ptr is behind hw_ptr , you need to use snd_pcm_forward to advance the appl_ptr to a suitable position which you write enough data so that final appl_ptr is at least one period ahead of hwptr
Why do end of underrun need rewind ?
- moving streams between sinks
What happen when the sinks have different buffer size ?
Do all playing streams need to re-adjust latency ?
28.05.2014 13:58, Raymond Yau wrote:
Yes. The test program is attached. For stereo S16_LE output, it
reports 32 samples (i.e. 128 bytes) as the minimum period size.
your program segfault when using NULL plugin
Bug in the null plugin. snd_pcm_null_fast_ops lacks the .rewindable field initialization, thus this callback gets defaulted to NULL. Thanks for reporting.
what other events pulseaudio need to rewind beside
- change in volume of sink/source
Volume change of the sink is supposed to be real time event
User are allowed to change the volume slider even when the sink is not playing or source is not capturing
If the requested volume cannot be satisfied by changing hardware volume controls , does pulseaudio keep the calculated software volume adjustment when the sink start playing while the hardware volume is supposed to be updated immediately ?
The answer depends on the volume mode.
In the flat-volume mode, there is in fact no independent sink volume. If at least one stream is playing, then the volume of the loudest stream is applied as a sink volume, and other streams are attenuated accordingly. So the answer to your question is "no", because starting the stream changes the total volume of the sink, including the hardware part.
In the non-flat volume mode, the answer is "yes".
when user specify ignore_DB=1 or codec does not have any volume control (e.g. SPDIF stereo playback) which force pulseaudio to use software volume.
The software volume will be same as the requested volume(volume slider in sound preference) even when there is no playing stream.
Do this software volume setting saved in pulseaudio database when shutdown and restore on pulseaudio next startup by module-device-restore for the SPDIF stereo profile ?
The volume is saved and restored as a whole. module-device-restore does not care what part of the volume is software and what part is hardware.
- change in volume of streams
since there may be multiple streams (clients) playing at the same time, Do pulseaudio still keep copies of unmixed audio of all connected clients ?
Yes. Each sink input keeps some past audio in a memblockq, just in case if the sink re-asks it.
- change in latency
- sync slaves of combined sinks
I don't think that (4) is valid. It is handled by resampling, not
rewinding.
- explicit rewrite request from the client (if the sink has already
rendered the rewritten data into its buffer)
- state change ("If we are added for the first time...") of the sink
input
If the stream is the first client ,
How about a low latency client connect to server when a high latency client is already playing ?
Do pulseaudio still rewind the sink when low latency client is playing and a high latency client connect to server ?
Every new connection (and by "added for the first time" the comment means exactly this) causes a full rewind.
- corking/uncorking
Do you mean pause of the playing stream while other stream is still playing ?
Yes. The other case is when a music stream is paused automatically while another stream (e.g. VoIP) is about to be played.
- end of underrun (for specific sink inputs)
if appl_ptr is behind hw_ptr , you need to use snd_pcm_forward to advance the appl_ptr to a suitable position which you write enough data so that final appl_ptr is at least one period ahead of hwptr
Why do end of underrun need rewind ?
Because we mean different kinds of underrun. You mean ALSA-level underrun. The comment means PulseAudio-level underrun - i.e. one of the streams failed to supply data in time, while the ALSA card got data without the contribution from that stream.
So PulseAudio mixed a lot of data ahead without that stream, filled in the ALSA buffer, and new data for that stream shows up. Obviously, it needs to be heard right away, and thus already-mixed data should be discarded by an ALSA-level rewind.
- moving streams between sinks
What happen when the sinks have different buffer size ?
Do all playing streams need to re-adjust latency ?
In theory, yes. The total latency is the sum of the latency of the sink and the latency specific to the sink input. This sum needs to be redistributed amongst these two parts.
However, the big comment after "case PA_SINK_MESSAGE_START_MOVE:" in src/pulsecore/sink.c contains some FIXMEs about known bugs in this process.
participants (4)
-
Alexander E. Patrakov
-
Clemens Ladisch
-
David Henningsson
-
Raymond Yau