[alsa-devel] safe support for rewind in ALSA
All, I'd like to reopen a thread on concerns with the current implementation of rewind in ALSA. This was already discussed on the mailing list last year without any progress (see http://thread.gmane.org/gmane.linux.alsa.devel/65759/focus=65773 and http://article.gmane.org/gmane.linux.alsa.devel/66902).
The problem is that snd_pcm_rewind() in theory allows going back to the next sample to be rendered. This, however, isn't possible on any existing hardware: DMAs prefetch samples ahead of time, and output interfaces have FIFOs on top of this. This isn't too bad in the PC world with HDAudio, since the prefetching is limited. In the embedded space, however, it has become a headache. To reduce power consumption, most recent embedded interfaces include some local internal memory (5kB for TI's McBSP3, a lot more for others), and we should provide a means for driver developers to set a bound on how much can be rewound safely when power-saving features are enabled.
Note that there is already a set of definitions for FIFOs, but they are either obsolete or unused (HDAudio actually uses this value internally in azx_dev, but doesn't export it).
My suggestion would be to add a 'fifo' parameter to the runtime structure (so that it's visible to the core), let the driver developer fill the value in the .open routine, and use this parameter in rewind to prevent rewinding too much. This parameter would include both hardware FIFOs and DMA buffers.
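To make the proposal concrete, here is a minimal C sketch of the arithmetic a core-side rewind check could do. It assumes a hypothetical `fifo_frames` value standing in for the proposed 'fifo' parameter (no such field exists in struct snd_pcm_runtime), a FIFO size known in bytes, and a `hw_avail` count of frames between hw_ptr and appl_ptr that, under this interpretation, still includes the frames already prefetched by the DMA/FIFO.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: 'fifo_frames' stands in for the proposed 'fifo'
 * runtime parameter (hardware FIFO + DMA buffering); it is not an
 * existing ALSA field. 'hw_avail' counts frames between hw_ptr and
 * appl_ptr, which here still includes frames already prefetched. */
struct rewind_model {
    size_t hw_avail;
    size_t fifo_frames;
};

/* Convert a FIFO size in bytes to frames, rounding up so the bound is
 * never underestimated. */
size_t fifo_bytes_to_frames(size_t fifo_bytes, unsigned channels,
                            unsigned bytes_per_sample)
{
    size_t frame_bytes = (size_t)channels * bytes_per_sample;

    assert(frame_bytes > 0);
    return (fifo_bytes + frame_bytes - 1) / frame_bytes;
}

/* Frames the application could rewind without touching data the
 * hardware may already have fetched. */
size_t safe_rewindable(const struct rewind_model *m)
{
    return m->hw_avail > m->fifo_frames ? m->hw_avail - m->fifo_frames : 0;
}
```

For example, taking 5kB as 5120 bytes and a stereo 16-bit stream, fifo_bytes_to_frames(5120, 2, 2) gives 1280 frames, so with 4096 frames queued only 2816 could be rewound safely.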
Comments welcome. - Pierre
On Mon, 1 Feb 2010, pl bossart wrote:
All, I'd like to reopen a thread on concerns with the current implementation of rewind in ALSA. This was already discussed on the mailing list last year without any progress.
I don't think there has been no progress. The queued samples can now be stored in runtime->delay, so snd_pcm_delay() returns a correct value.
The snd_pcm_rewind() and snd_pcm_forward() functions should operate in the ring buffer and it's ok. See the USB driver for an example. Just add support for runtime->delay to the lowlevel drivers and use snd_pcm_delay() correctly in the user space and everything will work as expected.
In other words - for hardware with large FIFOs, the runtime->delay should be used for queued samples and the hw_ptr in the ring buffer should be increased as soon as the FIFO is filled with samples from the ring buffer.
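A simplified C model of the accounting described above, as I understand it: the delay is the frames still waiting in the ring buffer plus runtime->delay, and rewinding is confined to the ring-buffer part. Field names are illustrative (loosely following struct snd_pcm_runtime) and this is a sketch, not the kernel code; pointers are frame counters that wrap at a boundary, as runtime->boundary does.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative playback state; names loosely follow struct snd_pcm_runtime.
 * Pointers are frame counters that wrap at 'boundary'. */
struct pb_state {
    uint64_t appl_ptr;   /* next frame the application will write */
    uint64_t hw_ptr;     /* next frame the hardware will fetch from the ring buffer */
    uint64_t boundary;   /* wrap point shared by both pointers */
    uint64_t hw_delay;   /* runtime->delay: frames already queued in FIFO/DMA */
};

/* Frames written by the application but not yet fetched by the hardware. */
uint64_t playback_hw_avail(const struct pb_state *s)
{
    assert(s->boundary > 0 && s->appl_ptr < s->boundary && s->hw_ptr < s->boundary);
    return (s->appl_ptr + s->boundary - s->hw_ptr) % s->boundary;
}

/* What snd_pcm_delay() should report: ring-buffer backlog + hardware queue. */
uint64_t playback_delay(const struct pb_state *s)
{
    return playback_hw_avail(s) + s->hw_delay;
}

/* Rewinding may only reclaim frames still in the ring buffer; frames
 * counted in hw_delay will be played as is. */
uint64_t playback_rewindable(const struct pb_state *s)
{
    return playback_hw_avail(s);
}
```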
Jaroslav
----- Jaroslav Kysela perex@perex.cz Linux Kernel Sound Maintainer ALSA Project, Red Hat, Inc.
In other words - for hardware with large FIFOs, the runtime->delay should be used for queued samples and the hw_ptr in the ring buffer should be increased as soon as the FIFO is filled with samples from the ring buffer.
Thanks Jaroslav. I was under the impression that runtime->delay indicated the time needed for transmission and D/A conversion. If it is intended to hold the queued samples, then it plays the same role as my 'fifo' proposal. There are several consequences, though:
- This means rewind() can only happen within the ring buffer, and all samples previously queued will be played as is.
- The 'only' change is to make sure the hw_ptr reported in .pointer is NOT the number of samples pushed out to the interface. hw_ptr should really represent the next read position in the ring buffer, and this isn't uniform across drivers. This means, for example, that the HDAudio implementation needs to be modified so that the LPIB value is increased by runtime->delay.
- How do you specify the time for transmission and D/A, so that applications can know when a sample will actually be played? With your explanation, applications can only know when a sample will be pushed out; there is an additional latency that is not accounted for.
Thanks for your feedback.
- Pierre
2010/2/2 pl bossart bossart.nospam@gmail.com
- the 'only' change is to make sure the hw_ptr reported in .pointer is NOT the number of samples pushed out to the interface. [...]
If the snd_xxx_pointer callback returns the next read position, the hw_ptr will always be a multiple of the PCI/PCIe burst size.
Hi,
On Wed, 3 Feb 2010, Raymond Yau wrote:
- the 'only' change is to make sure the hw_ptr reported in .pointer is NOT the number of samples pushed out to the interface. hw_ptr should
[...]
If the snd_xxx_pointer callback returns the next read position, the hw_ptr will always be a multiple of the PCI/PCIe burst size.
yep, that's the point I was trying to make in my longish earlier mail yesterday: http://article.gmane.org/gmane.linux.alsa.devel/70306
I also presented two solutions for this. So let hw_ptr be a multiple of the burst/batch size, but compensate by:
1) the driver updating runtime->delay in its pointer() callback -> fine-grained playout delay can be reported even if hw_ptr is a multiple of the burst/batch size, or
2) using snd_pcm_status_get_tstamp() (original idea from Eero) -> assumes runtime->delay is updated in sync with hw_ptr
Both are possible with the existing ALSA infrastructure, and driver and app developers could just use them; what is missing is consensus on whether this is a valid/allowed thing to do and to rely on.
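A sketch of solution 1), assuming two hypothetical running totals the driver can read from its hardware: `fetched`, the frames the DMA has pulled from the ring buffer (always a whole number of bursts), and `played`, the frames actually sent to the codec. The driver reports the burst-aligned position as hw_ptr and puts the difference into runtime->delay, so the total delay seen by the application stays sample-accurate.

```c
#include <assert.h>
#include <stdint.h>

struct pointer_report {
    uint64_t hw_ptr_pos;  /* position a .pointer callback would return */
    uint64_t delay;       /* value the driver would store in runtime->delay */
};

/* 'fetched' and 'played' are hypothetical running totals in frames. */
struct pointer_report report_pointer(uint64_t fetched, uint64_t played,
                                     uint64_t buffer_size)
{
    struct pointer_report r;

    assert(buffer_size > 0 && played <= fetched);
    r.hw_ptr_pos = fetched % buffer_size;  /* moves in whole bursts */
    r.delay = fetched - played;            /* fetched but not yet played out */
    return r;
}
```

With a burst size of 1024, a buffer of 8192 frames, 5000 frames written, 4096 fetched and 3500 played, hw_ptr reads 4096 and runtime->delay is 596, so the application's total delay (5000 - 4096) + 596 = 1500 equals the 5000 - 3500 frames not yet played.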
Hello,
me again..
On Mon, 1 Feb 2010, pl bossart wrote:
In other words - for hardware with large FIFOs, the runtime->delay should be used for queued samples and the hw_ptr in the ring buffer should be increased as soon as the FIFO is filled with samples from the ring buffer.
[...]
- the 'only' change is to make sure the hw_ptr reported in .pointer is NOT the number of samples pushed out to the interface. hw_ptr should [...]
Although, I'm not entirely sure this still makes rewinds fully safe. After the driver has called period elapsed and hw_ptr jumps ahead one period's worth of samples, the DMA for the next burst/batch is already programmed and possibly ongoing. And with some drivers the burst size (of a single DMA transaction) may be fairly large, while others transfer a sample at a time, at codec rate.
This might lead to undefined behaviour when the application rewinds up to hw_ptr and starts to refill the segment of the ringbuffer just after hw_ptr, while at the same time the DMA engine is already transferring data out of that same ringbuffer segment.
So a safer bet would be to limit rewinds to hw_ptr+X, where X is highly driver/hw specific. At the minimum, 'X >= dma_get_cache_alignment()' (see linux/Documentation/DMA-API.txt) to get deterministic results on different platforms. A sane conservative assumption is 'X >= period-size'.
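A minimal sketch of how X could be expressed in frames under the two rules of thumb above. The helper name is hypothetical; it assumes `cache_align_bytes` would come from dma_get_cache_alignment() on the kernel side and that the frame size in bytes is known from the hw_params.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: rewind margin X in frames. 'cache_align_bytes'
 * would come from dma_get_cache_alignment() on the kernel side. */
size_t rewind_margin_frames(size_t cache_align_bytes, size_t frame_bytes,
                            size_t period_frames, bool conservative)
{
    size_t x;

    assert(frame_bytes > 0);
    x = (cache_align_bytes + frame_bytes - 1) / frame_bytes;  /* round up */
    if (conservative && period_frames > x)
        x = period_frames;                                     /* X >= period-size */
    return x;
}
```

Rewinds would then stop at hw_ptr + X, i.e. at most (hw_avail - X) frames could be reclaimed.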
[...] So a safer bet would be to limit rewinds to hw_ptr+X, where X is highly driver/hw specific. At the minimum, 'X >= dma_get_cache_alignment()' (see linux/Documentation/DMA-API.txt) to get deterministic results on different platforms. A sane conservative assumption is 'X >= period-size'.
Well, we went from my interpretation that was completely broken to something that can still be broken at times... If you really want to be safe, you'd need a means to specify this X value for your system. Actually it would make a lot of sense to do so. On most embedded systems the DMA bursts and buffering in FIFOs can be programmed. It'd be nice to have the ability to set different values for DMA bursts and delay depending on the mode (low-power, low-latency, etc).
2010/2/4 pl bossart bossart.nospam@gmail.com
[...] Well, we went from my interpretation that was completely broken to something that can still be broken at times... [...]
Will your proposed hw_ptr move when an underrun occurs?
(i.e. if the 'only' change is to make sure the hw_ptr reported in .pointer is NOT the number of samples pushed out to the interface)
hw_ptr should really represent the next read position in the ring buffer, and this isn't uniform across drivers. This means, for example, that the HDAudio implementation needs to be modified so that the LPIB value is increased by runtime->delay.
2010/2/2 Jaroslav Kysela perex@perex.cz
[...] The snd_pcm_rewind() and snd_pcm_forward() functions should operate in the ring buffer and it's ok. See the USB driver for an example. [...]
Are there any demo programs using snd_pcm_rewind()/snd_pcm_forward() to verify that a driver meets your proposal?
In the interrupt-driven model with two periods per buffer, the application fills one period while the driver plays the other, so drivers don't need to be accurate to within a few samples.
However, it seems to me that your proposal gives application developers the impression that all ALSA drivers can provide sample accuracy.
2010/2/2 Jaroslav Kysela perex@perex.cz
[...] Just add support for runtime->delay to the lowlevel drivers and use snd_pcm_delay() correctly in the user space and everything will work as expected. [...]
What about the ALSA plugins (e.g. pulse, jack, oss, ...)?
Can an ALSA application use snd_pcm_rewind() with the pulse or oss plugin?
On Sat, 06.02.10 19:59, Raymond Yau (superquad.vortex2@gmail.com) wrote:
What about the ALSA plugins (e.g. pulse, jack, oss, ...)?
Can an ALSA application use snd_pcm_rewind() with the pulse or oss plugin?
If your question is whether the PA plugin for libasound supports snd_pcm_rewind(), then the answer is no, it doesn't. I am not even sure the ioplug stuff would allow that as it stands now.
Lennart
2010/2/17 Lennart Poettering mznyfn@0pointer.de
[...] If your question is whether the PA plugin for libasound supports snd_pcm_rewind(), then the answer is no, it doesn't. [...]
So this means that snd_pcm_rewindable() should return zero when using the pulse device?
On Thu, 18.02.10 09:31, Raymond Yau (superquad.vortex2@gmail.com) wrote:
So this means that snd_pcm_rewindable() should return zero when using the pulse device?
Yes, it should.
Lennart
2010/2/18 Lennart Poettering mznyfn@0pointer.de
So this means that snd_pcm_rewindable() should return zero when using the pulse device?
Yes, it should.
What about the ladspa, mulaw, alaw, a52, oss and jack plugins?
On Mon, Feb 01, 2010 at 11:20:25AM -0600, pl bossart wrote:
My suggestion would be to add a 'fifo' parameter to the runtime structure (so that it's visible to the core), let the driver developer fill the value in the .open routine, and use this parameter in rewind to prevent rewinding too much. This parameter would include both hardware FIFOs and DMA buffers.
I think we want to split these concepts out a bit - with some devices, the data that's in the FIFO is not represented to the ALSA data management code because it's past where the DMA controller says it's working, and trying to retrofit it into what the DMA controller reports gets too complicated. I'd also expect a degree of lumpiness in the memory regions the DMA controller is working on, which it might be useful to convey to applications. Perhaps for the DMA we could add a pointer for the first area the CPU can currently access safely?
For FIFO style latencies I think we'd want to allow the value reported to be variable (since for example changes in any DSP algorithms that are active could change the latency), and since it's not entirely joined up with the DMA itself it might be better to just explicitly report to applications and let them figure out what they think is a sensible way of dealing with it. This seems easier than trying to integrate the information with the DMA data and more useful for latencies which we know as actual time delays rather than buffer sizes in the hardware.
On Mon, 1 Feb 2010, Mark Brown wrote:
[...] it might be better to just explicitly report to applications and let them figure out what they think is a sensible way of dealing with it. [...]
+1 - the runtime->delay is exactly for this information (number of samples queued in hardware) and can be changed in the lowlevel driver at runtime depending on the actual hardware state.
Jaroslav
Hi all,
btw, another related thread: James Courtier-Dutton - "DMA and delay feedback" http://article.gmane.org/gmane.linux.alsa.devel/69262
So it seems this is a fairly frequently raised topic here nowadays. ;)
But then some actual comments inline:
On Mon, 1 Feb 2010, Jaroslav Kysela wrote:
[...] This seems easier than trying to integrate the information with the DMA data and more useful for latencies which we know as actual time delays rather than buffer sizes in the hardware.
+1 - the runtime->delay is exactly for this information (number of samples queued in hardware) and can be changed in the lowlevel driver at runtime depending on the actual hardware state.
As I already ack'ed in one of the older threads, this (btw, a fairly recent addition to ALSA) already solves part of the puzzle.
As James pointed out in the above thread from December, there are quite a few similar, but not quite identical, use cases in play here. I'll now focus solely on the accurate a/v sync case with buffering audio HW:
- at T1, a DMA interrupt occurs, a period has elapsed, hw_ptr is incremented by period-size, and the driver can update runtime->delay
- at T2, the application wakes up (due to ALSA, or possibly e.g. a system-timer interrupt)
- at T3, the application calls snd_pcm_delay() to query how many samples of delay there currently are (e.g. if it writes a sample to the ALSA PCM device now, how long before it hits the speaker) - note that this is what snd_pcm_delay() is specifically for
... now note that this is a different problem from the rewind() case, or getting more accurate pcm_avail() figures, although these are all related.
Anyway, the main problem is that snd_pcm_delay() accuracy is limited by the transfer/burst size used to move samples from main memory to the sound chip, _although_ the hardware _is_ able to tell the exact current position (not related to the status of the DMA transfer, but to what is currently played out to the codec).
Do you see the problem here?
In the same December thread, Eero Nurkkala posted one workaround for this issue: http://article.gmane.org/gmane.linux.alsa.devel/69287
So while snd_pcm_delay() provides a snapshot of the delay at the last DMA burst/block transfer (when hw_ptr+runtime->delay were last updated in the driver), the information may be refined with snd_pcm_status_get_tstamp(), which essentially tells the difference between T1 and T3. So essentially what the application is looking for is 'snd_pcm_delay()-(T3-T1)'.
Now this does work, and a big plus is that it works on top of the existing ALSA driver interface. But for generic applications, this is a bit of a mess... when to use snd_pcm_delay() and when to augment the result with snd_pcm_status_get_tstamp()? With most "desktop audio drivers", the above calculation will provide the wrong result. So in the end this is a hack, and applications must be customized for a specific piece of hardware/driver.
One idea is to tie this to the existing SNDRV_PCM_INFO_BATCH flag (e.g. quoting existing documentation in pcm_local.h -> "device transfers samples in batch"). So if the PCM has this flag set, application should interpret snd_pcm_delay() results as referring to the last batch.
What do you think? If this seems ok, an obvious next step is to provide a helper function (snd_pcm_delay_foo()) which hides all of this from the apps (so that no if-else stuff for different types of drivers is needed in application code). Or.. (prepares to take cover).. modify snd_pcm_delay() to in fact do this implicitly.
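A minimal sketch of what such a helper could compute. The name `refined_delay_frames` is hypothetical (like snd_pcm_delay_foo() above). It assumes the caller already has the snd_pcm_delay() value, the time T1 of the last hw_ptr/delay update (per the workaround, from snd_pcm_status_get_tstamp()), the current time T3 on the same clock, both converted to nanoseconds, and whether the PCM is flagged as batch (alsa-lib exposes this via snd_pcm_hw_params_is_batch()). Non-batch devices get the delay unchanged; batch devices get snd_pcm_delay() - (T3 - T1), converted to frames.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical helper: refine a snd_pcm_delay() snapshot taken at T1
 * (last hw_ptr/runtime->delay update) to the current time T3.
 * Timestamps are nanoseconds on the same clock. */
int64_t refined_delay_frames(int64_t delay_at_t1, int64_t t1_ns, int64_t t3_ns,
                             unsigned rate, bool is_batch)
{
    int64_t elapsed_ns, elapsed_frames, d;

    assert(rate > 0);
    if (!is_batch)
        return delay_at_t1;      /* assume the value is already up to date */

    elapsed_ns = t3_ns > t1_ns ? t3_ns - t1_ns : 0;
    elapsed_frames = elapsed_ns * (int64_t)rate / NSEC_PER_SEC;
    d = delay_at_t1 - elapsed_frames;  /* snd_pcm_delay() - (T3 - T1) */
    return d > 0 ? d : 0;
}
```

For example, a 4800-frame snapshot at 48 kHz read 10 ms after T1 on a batch device would be refined to 4320 frames.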
Hi all,
and then about the "other problem", i.e. rewinding with drivers/HW that do burst transfers of samples from the ALSA ringbuffer to a separate HW buffer. Uh oh, and what the subject says this thread ought to be about. :)
On Mon, 1 Feb 2010, Kai Vehmanen wrote:
So while snd_pcm_delay() provides a snapshot of the delay at the last DMA burst/block-transfer (when hw_ptr+runtime->delay were last updated in the driver), the information may be refined with snd_pcm_status_get_tstamp(), which essentially tells the diff between T1&T3. So essentially what the application is looking for is 'snd_pcm_delay()-(T3-T1)'.
[...]
One idea is to tie this to the existing SNDRV_PCM_INFO_BATCH flag (e.g. quoting existing documentation in pcm_local.h -> "device transfers samples in batch"). So if the PCM has this flag set, application should interpret snd_pcm_delay() results as referring to the last batch.
Maybe the same INFO_BATCH flag could be used to help solve the rewind problem as well. If set, it is a signal that a segment of the ringbuffer (N samples after the current hw_ptr) may have already been transferred, or is currently in transfer, and cannot be rewound (without stopping the stream and causing a glitch), even though the elapsed callback and hw_ptr update have not yet occurred. And most importantly, it signals that the pointer() callback reports hw_ptr jumping in bursts, so the current snd_pcm_rewindable() implementation may not be accurate with these drivers.
But then how much is N? I guess we can't assume N=period-size (does not apply for e.g. how pulseaudio uses ALSA in glitch free mode). Sw-params:xfer-align is not the same thing, plus it's now deprecated. Any ideas?
Hmm, or on second thought, maybe N=period-size is a good idea after all. E.g. drivers would configure the DMA transfers according to hwparams:period-size, and apps such as pulseaudio could decide (by setting the period-size) how close to hw_ptr they want to live (and still rewind if needed). Of course, it's not obvious how useful PA glitch-free is if used in this way...
For applications, this would be hidden in the implementation of snd_pcm_rewindable() and snd_pcm_rewind() (they would check for INFO_BATCH and limit rewinding appropriately).
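A sketch of how snd_pcm_rewindable() could apply this, assuming N = period size as suggested above. `is_batch` stands for the SNDRV_PCM_INFO_BATCH flag; the function name is hypothetical and this is not the current alsa-lib implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy: on batch devices, the first period_size frames
 * after hw_ptr may already be in transfer, so keep them out of reach. */
uint64_t batch_aware_rewindable(uint64_t hw_avail, bool is_batch,
                                uint64_t period_size)
{
    uint64_t guard;

    assert(period_size > 0);
    guard = is_batch ? period_size : 0;
    return hw_avail > guard ? hw_avail - guard : 0;
}
```

Under this policy, an application that wants to rewind close to hw_ptr would pick a smaller period size, which is the trade-off mentioned for PulseAudio's glitch-free mode.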
PS Like with my earlier mail, I'm not 100% convinced this is a generic enough approach (or if the wider community thinks this is a serious enough of a problem), but with these proposals I'm thinking what can be done within the scope of current (driver) APIs...
2010/2/2 Kai Vehmanen kvehmanen@eca.cx
- at T3, the application calls snd_pcm_delay() to query how many samples of delay there currently are (e.g. if it writes a sample to the ALSA PCM device now, how long before it hits the speaker)
No, your assumption (how long before it hits the speaker) is wrong; refer to
http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m.html
For playback the delay is defined as the time that a frame that is written to the PCM stream shortly after this call will take to be actually audible. It is as such the overall latency from the write call to the final DAC.
For capture the delay is defined as the time that a frame that was digitized by the audio device takes until it can be read from the PCM stream shortly after this call returns. It is as such the overall latency from the initial ADC to the read call. (e.g. the latency of decoding AC3 or DTS passed through S/PDIF by the digital receiver is not included)
What about sound cards which only support non-interleaved mode?
Why does PA insist on using one period per buffer when only the ISA drivers and intel8x0 have periods_min = 1, while the most common HDA driver and most sound cards have periods_min = 2?
Hi,
On Tue, 9 Feb 2010, Raymond Yau wrote:
- at T3, the application calls snd_pcm_delay() to query how many samples of delay there currently are (e.g. if it writes a sample to the ALSA PCM device now, how long before it hits the speaker)
No, your assumption (how long before it hits the speaker) is wrong; refer to
http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m.html
I have trouble following your logic here. I mean:
For playback the delay is defined as the time that a frame that is written to the PCM stream shortly after this call will take to be actually audible. It is as such the overall latency from the write call to the final DAC.
Isn't that _exactly_ what I wrote above?
I.e. this is the purpose of snd_pcm_delay() and why application developers use it for e.g. audio/video sync. So what's the difference? Do you mean that speaker!=DAC, or...?
Why does PA insist on using one period per buffer when only the ISA drivers and intel8x0 have periods_min = 1, while the most common HDA driver and most sound cards have periods_min = 2?
That is discussed at length here: http://0pointer.de/blog/projects/pulse-glitch-free.html
2010/2/10 Kai Vehmanen kvehmanen@eca.cx
Hi,
On Tue, 9 Feb 2010, Raymond Yau wrote:
Why does PA insist on using one period per buffer when only the ISA drivers and intel8x0 have periods_min = 1, while the most common HDA driver and most sound cards have periods_min = 2?
That is discussed at length here: http://0pointer.de/blog/projects/pulse-glitch-free.html
Most motherboards that have an ISA bus for ISA sound cards do not have a high-resolution timer (PA glitch-free mode requires a high-resolution timer).
The remaining sound cards with periods_min = 1 (e.g. intel8x0, emu10k1, ens1371, ...) are regarded as broken by the PA developers, and aplay does not work with one period per buffer either.
What is the hardware requirement for a driver to support one period per buffer?
If none of those drivers can be fixed, why don't we change the period_min of all those broken drivers from 1 to 2?
Raymond Yau wrote:
... The remaining sound cards with periods_min = 1 (e.g. intel8x0, emu10k1, ens1371, ...) are regarded as broken by the PA developers, and aplay does not work with one period per buffer either.
What is the hardware requirement for a driver to support one period per buffer?
That the hardware can be programmed to generate only one interrupt per buffer.
If none of those drivers can be fixed, why don't we change the period_min of all those broken drivers from 1 to 2?
Because period_min has nothing to do with the brokenness. PA wants to use as few periods as possible because it does not use period interrupts but the DMA pointer, and it's the latter that is broken.
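A rough sketch of the timer-based approach described here, to show why the accuracy of the DMA pointer (not the number of periods) is what matters: the application derives how many frames are still queued from the pointer and programs its own timer to wake up before the queue drains to a safety watermark. The function and the watermark are illustrative assumptions, not PulseAudio's actual algorithm.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* How long to sleep (ns) before 'queued' frames drain down to
 * 'watermark' frames at 'rate' Hz. If the pointer that 'queued' is
 * derived from is inaccurate, this estimate is wrong too. */
uint64_t sleep_ns_until_watermark(uint64_t queued, uint64_t watermark,
                                  unsigned rate)
{
    assert(rate > 0);
    if (queued <= watermark)
        return 0;  /* refill immediately */
    return (queued - watermark) * NSEC_PER_SEC / rate;
}
```

For example, with 4800 frames queued, a 960-frame watermark and 48 kHz, the application would sleep 80 ms regardless of how many period interrupts the card generates in the meantime.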
Regards, Clemens
2010/2/10 Clemens Ladisch clemens@ladisch.de
[...] PA wants to use as few periods as possible because it does not use period interrupts but the DMA pointer, and it's the latter that is broken.
Is the one-period-per-buffer mode designed for a specific purpose, such as playing/looping a pre-loaded soundfont?
Other than PulseAudio, are there any applications that use this mode properly without glitches?
Do you mean that PA wakes up only once when the sound card is configured to use two periods per buffer?
If a sound card supports only non-interleaved mode and exactly two periods, then even limiting the rewind of the application pointer to the period boundary is not 100% safe if the sound card prefetches audio using DMA.
The hardware registers of most sound cards only provide the number of samples processed.
Raymond Yau wrote:
Is the one-period-per-buffer mode designed for a specific purpose, such as playing/looping a pre-loaded soundfont?
No, it's just a consequence of how the card's DMA controller is designed.
Other than PulseAudio, are there any applications that use this mode properly without glitches?
The number of periods has nothing to do with glitching.
Do you mean that PA wakes up only once when the sound card is configured to use two periods per buffer?
When using two periods per buffer, ALSA tries to wake up PA two times. However, PA ignores the sound card's interrupts and is woken up by its own timer.
Regards, Clemens
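To make the timer-driven scheme concrete, here is a minimal sketch of the arithmetic a timer-scheduled client performs, under the assumption that the hardware pointer advances linearly with the sample rate. The function name and the watermark parameter are hypothetical illustrations, not PulseAudio's actual implementation.

```c
#include <stdint.h>

/* Hypothetical helper (not PulseAudio code): how long a timer-scheduled
 * client could sleep before the queued audio drains down to its safety
 * watermark, ASSUMING the hardware consumes frames linearly at `rate` Hz. */
static uint64_t sleep_until_watermark_us(uint64_t frames_queued,
                                         uint64_t watermark_frames,
                                         unsigned rate)
{
    if (rate == 0 || frames_queued <= watermark_frames)
        return 0; /* at or below the watermark already: refill now */
    return (frames_queued - watermark_frames) * 1000000ULL / rate;
}
```

For example, with 2 s of audio queued at 48 kHz (96000 frames) and a 20 ms watermark (960 frames), the client would sleep 1980000 µs, regardless of how many period interrupts the card raises in between.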
On Thu, 11 Feb 2010, Clemens Ladisch wrote:
Do you mean that PA wakes up only once when the sound card is configured to use two periods per buffer?
When using two periods per buffer, ALSA tries to wake up PA two times. However, PA ignores the sound card's interrupts and is woken up by its own timer.
PA can drive the wake-ups using the avail_min sw parameter. If this value is high enough, no userspace wake-up happens; only the interrupt is processed and the internal ring buffer pointers in the driver are updated.
Jaroslav
----- Jaroslav Kysela perex@perex.cz Linux Kernel Sound Maintainer ALSA Project, Red Hat, Inc.
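The effect of avail_min can be illustrated with a toy model (not ALSA code; in alsa-lib the threshold is set with snd_pcm_sw_params_set_avail_min()). It assumes every period interrupt frees exactly one period and that the application refills everything available when it is woken. Note that the interrupt count stays the same; only the userspace wake-ups drop.

```c
/* Toy model: over n_irqs period interrupts, count how often userspace is
 * woken when poll() only returns once avail >= avail_min. */
static unsigned count_user_wakeups(unsigned n_irqs, unsigned period_size,
                                   unsigned avail_min)
{
    unsigned avail = 0, wakeups = 0;
    for (unsigned i = 0; i < n_irqs; i++) {
        avail += period_size;       /* interrupt: hw_ptr advanced one period */
        if (avail >= avail_min) {   /* threshold reached: wake the app */
            wakeups++;
            avail = 0;              /* app is assumed to refill everything */
        }
    }
    return wakeups;
}
```

With 12 interrupts of 1024-frame periods, avail_min = 1024 wakes the application 12 times, while avail_min = 3072 wakes it only 4 times.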
2010/2/11 Jaroslav Kysela perex@perex.cz
On Thu, 11 Feb 2010, Clemens Ladisch wrote:
Do you mean that PA wakes up only once when the sound card is configured to use two periods per buffer?
When using two periods per buffer, ALSA tries to wake up PA two times. However, PA ignores the sound card's interrupts and is woken up by its own timer.
PA can drive the wake-ups using the avail_min sw parameter. If this value is high enough, no userspace wake-up happens; only the interrupt is processed and the internal ring buffer pointers in the driver are updated.
Jaroslav
Even when using a high-resolution timer, the application still cannot achieve latency better than the configured period size.
In the USB case, the driver cannot give an accurate hw pointer position: in the current implementation the hw pointer increases in steps (i.e. if you plot the hardware pointer position against elapsed time, the graph is a step function).
This means that the wake-up time cannot be calculated as number of samples / rate, since the function is not linear, especially when using the maximum buffer size with the minimum number of periods, where the period size is much greater than the watermark.
The glitch is most likely an underrun.
Refer to
http://thread.gmane.org/gmane.linux.alsa.devel/60371/focus=60535
The delay value cannot be used for buffer filling, of course (also in the standard, no-XRUN case, the avail functions should be used).
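The step-function behaviour can be sketched by comparing what the codec has actually played with what a burst-based pointer callback would report. This is a simplified model: the burst size is a hypothetical parameter, since ALSA currently offers no way to query it.

```c
#include <stdint.h>

/* Frames the codec has actually played after elapsed_us at `rate` Hz. */
static uint64_t frames_played(uint64_t elapsed_us, unsigned rate)
{
    return elapsed_us * rate / 1000000ULL;
}

/* What a burst-based pointer callback would report under this model:
 * only whole bursts, so the value lags by up to burst - 1 frames. */
static uint64_t frames_reported(uint64_t elapsed_us, unsigned rate,
                                uint64_t burst)
{
    return frames_played(elapsed_us, rate) / burst * burst;
}
```

After 10 ms at 48 kHz, 480 frames have been played, but with 256-frame bursts the pointer still reports 256, so a wake-up time derived from the reported position as samples / rate would be off by up to one burst.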
On Mon, 15.02.10 11:03, Raymond Yau (superquad.vortex2@gmail.com) wrote:
In the USB case, the driver cannot give an accurate hw pointer position: in the current implementation the hw pointer increases in steps (i.e. if you plot the hardware pointer position against elapsed time, the graph is a step function).
The granularity of write transfer blocks and of playback pointer updates is currently neither queryable nor configurable in ALSA. We only have periods, which try to cover both in a way.
For use in PA it is really important to get better APIs for this. Ideally both for querying the latency and the transfer granularity and for configuring them, but most importantly for querying them.
This has been discussed before on this mailing list, btw.
Lennart
Hi,
On Mon, 15 Feb 2010, Raymond Yau wrote:
Even when using a high-resolution timer, the application still cannot achieve latency better than the configured period size.
In the USB case, the driver cannot give an accurate hw pointer position: in the current implementation the hw pointer increases in steps (i.e. the graph is
... but that's a real hardware limitation for the USB-driver, right? And even in the USB case, hw pointer is incremented in steps of URB transfer size, so even in this case, latency of a highres timer based application is not limited by the set period-size.
Of course, there is no ALSA API to query the burst size (e.g. the granularity of hw_ptr updates), which is a real problem for generic apps (that are not hardcoded to work with just one ALSA driver).
But you can query the SNDRV_PCM_INFO_BATCH flag (it's set by e.g. the USB driver) and adjust your hrtimer-based application's logic based on that (if set, assume hw_ptr will jump in bursts). I know, not really ideal (you essentially have to e.g. disable PA glitch-free for these cards currently).
This means that the wake-up time cannot be calculated as number of samples / rate, since the function is not linear, especially when using the maximum buffer size with the minimum number of periods, where the period size is much greater than the watermark.
Yep, that's what SNDRV_PCM_INFO_BATCH flag warns you about.
On Sun, 21.02.10 12:35, Kai Vehmanen (kvehmanen@eca.cx) wrote:
But you can query the SNDRV_PCM_INFO_BATCH flag (it's set by e.g. the USB driver) and adjust your hrtimer-based application's logic based on that (if set, assume hw_ptr will jump in bursts). I know, not really ideal (you essentially have to e.g. disable PA glitch-free for these cards currently).
This means that the wake-up time cannot be calculated as number of samples / rate, since the function is not linear, especially when using the maximum buffer size with the minimum number of periods, where the period size is much greater than the watermark.
Yep, that's what SNDRV_PCM_INFO_BATCH flag warns you about.
Hmm, could you elaborate a little about SNDRV_PCM_INFO_BATCH? What exactly does that mean in general and especially for the timer based audio scheduling?
I am currently not making use of this, but should I?
I cannot find much documentation about this, can you enlighten me?
I assume this is exposed to userspace via snd_pcm_hw_params_is_batch()?
Lennart
Hi,
On Sun, 21 Feb 2010, Lennart Poettering wrote:
But you can query the SNDRV_PCM_INFO_BATCH flag (it's set by e.g. the USB driver) and adjust your hrtimer-based application's logic based on that
[...]
This means that the wake-up time cannot be calculated as number of samples / rate, since the function is not linear, especially when using the maximum buffer size with the minimum number of periods, where the period size is much greater than the watermark.
Yep, that's what SNDRV_PCM_INFO_BATCH flag warns you about.
Hmm, could you elaborate a little about SNDRV_PCM_INFO_BATCH? What exactly does that mean in general and especially for the timer based audio scheduling?
I am currently not making use of this, but should I?
I'm also looking at this topic from an application developer perspective, so I can only provide you my interpretation of reading current ALSA code, but here goes.
I'll also add Eliot (who asked about the same thing, in http://mailman.alsa-project.org/pipermail/alsa-devel/2010-February/025262.ht... ) to the cc, and also Takashi, who git-annotate reveals has made most recent PCM_INFO_BATCH related changes to linux/sound/core/pcm_lib.c. ;)
Anyway, SNDRV_PCM_INFO_BATCH is set by the following drivers:
- isa/msnd
- pci/bt87x
- pci/korg1212
- pcmcia/pdaudiocf
- soc/au1x
- soc/fsl (mpc52xx/8610)
- sparc/dbri
- usbaudio
Common to all these drivers is that audio samples are moved to/from the ALSA ringbuffer, in bursts, to an intermediary block of memory, and from there to/from the codec. And most importantly, pointer callback is based on last ack'ed burst. IOW, hw_ptr jumps in bursts with these drivers and granularity of hw_ptr is more coarse (affecting snd_pcm_avail(), snd_pcm_delay() and so forth).
Now the existing documentation for the flag (and user-space snd_pcm_hw_params_is_batch() API) is rather brief as you've noted, and AFAIK very few apps use these, so I'm not sure, how much we as app developers should rely on these semantics (and to the fact that future drivers will use the BATCH flag in a similar way).
But anyways, the fact remains that snd_pcm_hw_params_is_batch() exists, and based on a review of current (2.6.33-rc) drivers, majority/all of the drivers use the flag in the same way.
So for apps like PA, snd_pcm_hw_params_is_batch() is a hint that app should not rewind closer to hw_ptr than period-size. It also hints that hw_ptr granularity is more coarse, which might be useful in setting sane defaults for buffer watermarks and such.
PS Eliot: I didn't respond to your original mail, as I don't really have even good reading-the-code interpretations for the others. SNDRV_PCM_INFO_BLOCK_TRANSFER/snd_pcm_hw_params_is_block_transfer() is set by many, many drivers, but I'm not sure what the semantics really are (why cs4281 does not set the flag, while ens1370 sets it for instance). Then again SNDRV_PCM_INFO_BLOCK_DOUBLE seems very close to the BATCH flag, but drivers setting DOUBLE (basically just rme9652) _do_ support accurate pointer/hw_ptr reporting, so there is a difference.
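Kai's interpretation can be turned into a small rewind policy. This is a sketch under stated assumptions, not an ALSA API. `rewindable` would come from snd_pcm_rewindable() and `is_batch` from snd_pcm_hw_params_is_batch(). On a BATCH device the policy keeps one period between the rewind target and hw_ptr, since the burst size is at most the period size. `prefetch_frames` is a hypothetical driver-specific FIFO/DMA prefetch size of the kind proposed at the start of this thread, which ALSA does not currently export.

```c
/* Hypothetical policy: how many frames an application may safely rewind. */
static long safe_rewind_frames(long rewindable, int is_batch,
                               long period_size, long prefetch_frames)
{
    /* BATCH: hw_ptr may lag the real position by up to one burst,
     * bounded here by one period. */
    long margin = is_batch ? period_size : 0;

    /* Never rewrite frames the DMA/FIFO has (assumed) already fetched. */
    if (prefetch_frames > margin)
        margin = prefetch_frames;

    return rewindable > margin ? rewindable - margin : 0;
}
```

The larger of the two margins wins, so a device that is both BATCH and has a deep FIFO is protected against both effects.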
2010/2/23 Kai Vehmanen kvehmanen@eca.cx
Hi,
On Sun, 21 Feb 2010, Lennart Poettering wrote:
But you can query the SNDRV_PCM_INFO_BATCH flag (it's set by e.g. the USB driver) and adjust your hrtimer-based application's logic based on that
It is not easy to calculate the correct wake-up time if the function is a step function, since you must keep track of the elapsed time instead of relying on the hw pointer position.
[...]
This means that the wake-up time cannot be calculated as number of samples / rate, since the function is not linear, especially when using the maximum buffer size with the minimum number of periods, where the period size is much greater than the watermark.
Yep, that's what SNDRV_PCM_INFO_BATCH flag warns you about.
Hmm, could you elaborate a little about SNDRV_PCM_INFO_BATCH? What exactly does that mean in general and especially for the timer based audio scheduling?
I am currently not making use of this, but should I?
I'm also looking at this topic from an application developer perspective, so I can only provide you my interpretation of reading current ALSA code, but here goes.
I'll also add Eliot (who asked about the same thing, in http://mailman.alsa-project.org/pipermail/alsa-devel/2010-February/025262.ht...) to the cc, and also Takashi, who git-annotate reveals has made most recent PCM_INFO_BATCH related changes to linux/sound/core/pcm_lib.c. ;)
Anyway, SNDRV_PCM_INFO_BATCH is set by the following drivers:
- isa/msnd
- pci/bt87x
- pci/korg1212
- pcmcia/pdaudiocf
- soc/au1x
- soc/fsl (mpc52xx/8610)
- sparc/dbri
- usbaudio
What about cs46xx?
Common to all these drivers is that audio samples are moved to/from the ALSA ringbuffer, in bursts, to an intermediary block of memory, and from there to/from the codec. And most importantly, pointer callback is based on last ack'ed burst. IOW, hw_ptr jumps in bursts with these drivers and granularity of hw_ptr is more coarse (affecting snd_pcm_avail(), snd_pcm_delay() and so forth).
Now the existing documentation for the flag (and user-space snd_pcm_hw_params_is_batch() API) is rather brief as you've noted, and AFAIK very few apps use these, so I'm not sure, how much we as app developers should rely on these semantics (and to the fact that future drivers will use the BATCH flag in a similar way).
But anyways, the fact remains that snd_pcm_hw_params_is_batch() exists, and based on a review of current (2.6.33-rc) drivers, majority/all of the drivers use the flag in the same way.
So for apps like PA, snd_pcm_hw_params_is_batch() is a hint that app should not rewind closer to hw_ptr than period-size. It also hints that hw_ptr granularity is more coarse, which might be useful in setting sane defaults for buffer watermarks and such.
PS Eliot: I didn't respond to your original mail, as I don't really have even good reading-the-code interpretations for the others. SNDRV_PCM_INFO_BLOCK_TRANSFER/snd_pcm_hw_params_is_block_transfer() is set by many, many drivers, but I'm not sure what the semantics really are (why cs4281 does not set the flag, while ens1370 sets it for instance). Then again SNDRV_PCM_INFO_BLOCK_DOUBLE seems very close to the BATCH flag, but drivers setting DOUBLE (basically just rme9652) _do_ support accurate pointer/hw_ptr reporting, so there is a difference.
2010/2/21 Kai Vehmanen kvehmanen@eca.cx
Hi,
On Mon, 15 Feb 2010, Raymond Yau wrote:
Even when using a high-resolution timer, the application still cannot achieve latency better than the configured period size.
In the USB case, the driver cannot give an accurate hw pointer position: in the current implementation the hw pointer increases in steps (i.e. the graph is
... but that's a real hardware limitation for the USB-driver, right? And even in the USB case, hw pointer is incremented in steps of URB transfer size, so even in this case, latency of a highres timer based application is not limited by the set period-size.
Of course, there is no ALSA API to query the burst size (e.g. the granularity of hw_ptr updates), which is a real problem for generic apps (that are not hardcoded to work with just one ALSA driver).
The burst size is less than or equal to the minimum period size.
2010/2/21 Kai Vehmanen kvehmanen@eca.cx
Hi,
On Mon, 15 Feb 2010, Raymond Yau wrote:
Even when using a high-resolution timer, the application still cannot achieve latency better than the configured period size.
In the USB case, the driver cannot give an accurate hw pointer position: in the current implementation the hw pointer increases in steps (i.e. the graph is
... but that's a real hardware limitation for the USB-driver, right? And even in the USB case, hw pointer is incremented in steps of URB transfer size, so even in this case, latency of a highres timer based application is not limited by the set period-size.
I don't think so: since the sound card is still interrupt driven, the latency cannot be better than the period size.
To achieve lower latency, you have to set a smaller period size, even if you are using a high-resolution timer.
The glitch is most likely an underrun, since the watermark may be too low on a machine with a slow CPU: you assume that every CPU has enough processing power and disk I/O to write (buffer time - 20 ms) of audio data in 20 ms.
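The arithmetic behind this claim is easy to check. Under the assumption stated in the message, that after waking at the watermark the application must write (buffer time - watermark) of audio before the remaining watermark of queued audio runs out, the required production speed relative to real time is:

```c
/* Speed-up over real time needed to refill (buffer_ms - watermark_ms) of
 * audio within watermark_ms, under the refill assumption stated above. */
static double required_speedup(double buffer_ms, double watermark_ms)
{
    return (buffer_ms - watermark_ms) / watermark_ms;
}
```

With a 2000 ms buffer and a 20 ms watermark this gives 99, i.e. the application would have to produce audio 99 times faster than real time, which a slow CPU or disk may not manage.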
2010/2/21 Kai Vehmanen kvehmanen@eca.cx
Hi,
On Mon, 15 Feb 2010, Raymond Yau wrote:
Even when using a high-resolution timer, the application still cannot achieve latency better than the configured period size.
In the USB case, the driver cannot give an accurate hw pointer position: in the current implementation the hw pointer increases in steps (i.e. the graph is
... but that's a real hardware limitation for the USB-driver, right? And even in the USB case, hw pointer is incremented in steps of URB transfer size, so even in this case, latency of a highres timer based application is not limited by the set period-size.
I guess the ALSA pulse plugin also has the same problem, since the value returned by the pointer callback of the pulse device is not accurate either.
On Thu, 11.02.10 08:27, Jaroslav Kysela (perex@perex.cz) wrote:
On Thu, 11 Feb 2010, Clemens Ladisch wrote:
Do you mean that PA only wake up once when configure sound card to use two periods per buffer ?
When using two periods per buffer, ALSA tries to wake up PA two times. However, PA ignores the sound card's interrupts and is woken up by its own timer.
PA can drive the wake-ups using the avail_min sw parameter. If this value is high enough, no userspace wake-up happens; only the interrupt is processed and the internal ring buffer pointers in the driver are updated.
We actually do set avail_min and update it depending on the latency requirements of the connected clients. Note that we set it to a value that is not necessarily a multiple of the period size.
That said, our primary way to wake up is and stays the system timer, not the sound card clock. We set avail_min only as a safety net.
Lennart
Hi,
On Thu, 11 Feb 2010, Jaroslav Kysela wrote:
However, PA ignores the sound card's interrupts and is woken up by its own timer.
PA can drive the wake-ups using the avail_min sw parameter. If this value is high enough, no userspace wake-up happens; only the interrupt is processed and the internal ring buffer pointers in the driver are updated.
But that's unfortunately not enough. AFAIK glitch-free aims, among other things, to minimize power usage on battery-powered devices, and to do that, you need to minimize hardware interrupts [1]. And for that, avail_min won't help.
[1] this is exactly the same thing that is driving Linux tickless development: http://www.lesswatts.org/projects/tickless/
PS I do agree that avail_min is a very useful, and often overlooked, feature of ALSA. But in this specific case it won't help...
On Sun, 21.02.10 12:06, Kai Vehmanen (kvehmanen@eca.cx) wrote:
Hi,
On Thu, 11 Feb 2010, Jaroslav Kysela wrote:
However, PA ignores the sound card's interrupts and is woken up by its own timer.
PA can drive the wake-ups using the avail_min sw parameter. If this value is high enough, no userspace wake-up happens; only the interrupt is processed and the internal ring buffer pointers in the driver are updated.
But that's unfortunately not enough. AFAIK glitch-free aims, among other things, to minimize power usage on battery-powered devices, and to do that, you need to minimize hardware interrupts [1]. And for that, avail_min won't help.
[1] this is exactly the same thing that is driving Linux tickless development: http://www.lesswatts.org/projects/tickless/
PS I do agree that avail_min is a very useful, and often overlooked, feature of ALSA. But in this specific case it won't help...
We try our best to minimize wakeups by setting the minimal number of periods in PA. Unfortunately that still means one gets 2 or 1 irqs per buffer iteration.
However, even if we get those two wakeups, using avail_min allows us to minimize the number of processes that are woken up on that IRQ, i.e. if the CPU is woken up, it is still better when this only means that some IRQ in the kernel is processed, rather than passing it into userspace.
Lennart
2010/2/22 Lennart Poettering mznyfn@0pointer.de
On Sun, 21.02.10 12:06, Kai Vehmanen (kvehmanen@eca.cx) wrote:
Hi,
On Thu, 11 Feb 2010, Jaroslav Kysela wrote:
However, PA ignores the sound card's interrupts and is woken up by its own timer.
PA can drive the wake-ups using the avail_min sw parameter. If this value is high enough, no userspace wake-up happens; only the interrupt is processed and the internal ring buffer pointers in the driver are updated.
But that's unfortunately not enough. AFAIK glitch-free aims, among other things, to minimize power usage on battery-powered devices, and to do that, you need to minimize hardware interrupts [1]. And for that, avail_min won't help.
[1] this is exactly the same thing that is driving Linux tickless development: http://www.lesswatts.org/projects/tickless/
PS I do agree that avail_min is a very useful, and often overlooked, feature of ALSA. But in this specific case it won't help...
We try our best to minimize wakeups by setting the minimal number of periods in PA. Unfortunately that still means one gets 2 or 1 irqs per buffer iteration.
However, even if we get those two wakeups, using avail_min allows us to minimize the number of processes that are woken up on that IRQ, i.e. if the CPU is woken up, it is still better when this only means that some IRQ in the kernel is processed, rather than passing it into userspace.
Lennart
Has anyone tried the patch posted at the following URL? http://thread.gmane.org/gmane.comp.audio.pulseaudio.general/6671
Rewinding the ring buffer completely causes audible issues with DMAs.
The previous solution didn't work with tsched=0 and used tsched_watermark as the guard band, which isn't linked to the hardware and could become really high if underflows occurred.
Why does PA always rewind the buffer when a stream connects?
When do you rewind the application pointer to the hardware pointer?
On Thu, 11.02.10 14:52, Raymond Yau (superquad.vortex2@gmail.com) wrote:
Other than PulseAudio, are there any applications that use this mode properly without glitches?
To my knowledge PA is the only implementation of timer-based audio scheduling for free operating systems. Heck, I even think it might be the only user of snd_pcm_rewind() at all.
Lennart
On Wed, 10.02.10 21:19, Raymond Yau (superquad.vortex2@gmail.com) wrote:
Most motherboards that have an ISA bus for ISA sound cards do not have a high-resolution timer (PA's glitch-free mode requires a high-resolution timer).
The remaining sound cards with periods_min = 1 (e.g. intel8x0, emu10k1, ens1371, ...) are regarded as broken by the PA developers, and aplay did not work with one period per buffer on them either.
Actually most of these drivers work fine these days with PA.
Lennart
On Tue, 09.02.10 06:59, Raymond Yau (superquad.vortex2@gmail.com) wrote:
Why does PA insist on using one period per buffer when only the ISA drivers and intel8x0 have periods_min = 1, while the most common HDA driver and most sound cards have periods_min = 2?
PA actually doesn't insist. We just try to minimize wakeups and hence pick the lowest value we can get. Which means 2 frags on most devices and 1 on very few others.
Lennart
Hello,
spamming with one more mail.
On Mon, 1 Feb 2010, Kai Vehmanen wrote:
- at T3, application calls snd_pcm_delay() to query how many samples of delay there is currently (e.g. if it write a sample to ALSA PCM device now, how long before it hits the speaker)
- note that this is what snd_pcm_delay() is specifically for
[...]
Anyways, the main problem is that snd_pcm_delay() accuracy is limited by the transfer/burst size used to move samples from main memory to the sound chip, _although_ the hardware _is_ able to tell the exact current position
It just occurred to me: could drivers update 'runtime->delay' in their pointer callback...? :P And if so, would it make sense?
Now this would fix snd_pcm_delay() right away (as it already takes runtime->delay into consideration). Although I'm definitely not sure whether adding standardized side-effects to pointer() is a good idea...
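Why updating runtime->delay would be enough can be seen from how the playback delay is composed. The following is a userspace model based on a reading of snd_pcm_delay() in sound/core/pcm_native.c; treat it as an approximation, not the kernel code itself. The reported delay is the data queued between hw_ptr and appl_ptr plus whatever extra the driver stores in runtime->delay (e.g. samples already sitting in a FIFO or an in-flight USB URB).

```c
/* Model of the playback delay: frames written but not yet passed by
 * hw_ptr, plus the driver-reported extra delay (runtime->delay). */
static long playback_delay(long buffer_size, long avail, long runtime_delay)
{
    long hw_avail = buffer_size - avail; /* queued between hw_ptr and appl_ptr */
    return hw_avail + runtime_delay;
}
```

With a 4096-frame buffer, 1024 frames available and runtime->delay = 128, the delay would be 3200 frames. A driver that refreshes runtime->delay from the exact hardware position in its pointer callback would therefore make snd_pcm_delay() accurate even when hw_ptr itself only moves in bursts.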
participants (7)
- Clemens Ladisch
- Jaroslav Kysela
- Kai Vehmanen
- Lennart Poettering
- Mark Brown
- pl bossart
- Raymond Yau