[alsa-devel] PulseAudio and SNDRV_PCM_INFO_BATCH
Hi folks, I'd like to bring this one up again, since we are currently in the sub-optimal position of forcing ~100 ms latency on USB devices. The original thread is here -- http://mailman.alsa-project.org/pipermail/alsa-devel/2013-December/069666.ht...
I see two flags that are possibly of consequence here: SNDRV_PCM_INFO_BATCH and SNDRV_PCM_INFO_BLOCK_TRANSFER. I'm not sure what these mean -- the documentation mentions "double buffering" for the batch flag, and just that the block transfer means "block transfer". :-)
We've spoken about batch meaning either transfers in period size chunks, or some fixed chunk size. It seems that it would make more sense for it to mean the former, and block transfer to mean the latter.
So I guess the first thing that would be nice to have is a clear meaning of these two flags. With this done, we could potentially get to the API to report the transfer size from the driver.
I did notice that there is a snd_pcm_hw_params_get_fifo_size(). Is this something we could use for the purpose of transfer size reporting, by any chance?
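For concreteness, here is a minimal sketch (my own, untested) of how these two info bits and the fifo size can be queried from userspace with alsa-lib; "hw:0" is just a placeholder device name:

  #include <alsa/asoundlib.h>
  #include <stdio.h>

  int main(void)
  {
      snd_pcm_t *pcm;
      snd_pcm_hw_params_t *params;

      if (snd_pcm_open(&pcm, "hw:0", SND_PCM_STREAM_PLAYBACK, 0) < 0)
          return 1;
      snd_pcm_hw_params_alloca(&params);
      snd_pcm_hw_params_any(pcm, params);  /* fill with the full configuration space */

      printf("BATCH:          %d\n", snd_pcm_hw_params_is_batch(params));
      printf("BLOCK_TRANSFER: %d\n", snd_pcm_hw_params_is_block_transfer(params));
      printf("fifo_size:      %d\n", snd_pcm_hw_params_get_fifo_size(params));

      snd_pcm_close(pcm);
      return 0;
  }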
Cheers, Arun
On 12 June 2015 at 17:59, Arun Raghavan arun@accosted.net wrote:
Hi folks, I'd like to bring this one up again, since we are currently in the sub-optimal position of forcing ~100 ms latency on USB devices. The original thread is here -- http://mailman.alsa-project.org/pipermail/alsa-devel/2013-December/069666.ht...
While we sort this out, though, is there an upper bound on the USB transfer size (that we could then use as a rewind safety margin)? We might be able to use this as a workaround till this can be fixed properly.
-- Arun
12.06.2015 17:32, Arun Raghavan wrote:
On 12 June 2015 at 17:59, Arun Raghavan arun@accosted.net wrote:
Hi folks, I'd like to bring this one up again, since we are currently in the sub-optimal position of forcing ~100 ms latency on USB devices. The original thread is here -- http://mailman.alsa-project.org/pipermail/alsa-devel/2013-December/069666.ht...
While we sort this out, though, is there an upper bound on the USB transfer size (that we could then use as a rewind safety margin)? We might be able to use this as a workaround till this can be fixed properly.
Hello Arun,
thanks for bringing up the issue again. However, I think that the "~100 ms latency on batch cards" problem can and should also be solved from the other end, independently from the USB special case. Namely, I think that the default buffer and period sizes that PulseAudio selects are way too conservative. The power-saving argument was used in the past as a justification, and I am calling for a reevaluation. Here is why.
1. Android's AudioFlinger uses 2 periods, 5 ms each. It is a mobile OS, and the developers think it is good enough.
2. I have measured the battery lifetime of my SandyBridge-based laptop and found that, with pure ALSA on the hw:0 device, a similar low-latency setup loses less than 5% of the battery life (935 seconds were lost out of 25742). With PulseAudio, the difference is worse, but let's treat this as a missed optimization in PulseAudio.
Let me make my viewpoint more explicit: from now on, I will reject CPU usage measurements as evidence of "real power-saving problems", unless a correlation factor between them and the battery life has also been measured on the same device. Direct measurements of the battery life in reproducible conditions are of course welcome.
Raw data for the experiment (2) with pure ALSA are attached. The first column is the time since boot, in seconds, and the second column is the remaining energy in the battery, in microwatt-hours, as reported in the energy_now sysfs attribute. report-1428125745.txt is with --period-size=44100 --buffer-size=88200, report-1428161181.txt is with --buffer-size=440 --period-size=220, the test file is a 44100 Hz, S16, stereo wav file.
To guarantee the reproducibility of this experiment, the entire system (Gentoo stage3 plus PulseAudio plus laptop-mode-tools) has been put in the initramfs, together with a script that turns off the backlight and then repeatedly plays a wav file from the same initramfs through aplay. The wi-fi card has been turned off with a hardware toggle. So everything (including the SSD) during the test is unused, except CPU, memory, and the onboard sound card. That's why the test exhibits unrealistically long battery life.
If my result gets confirmed on a laptop that is not a former flagship from Sony, then I would argue for the following:
1. Change of the default buffer and period sizes for batch cards in PulseAudio to values that represent, on modern hardware, say, 2.5% of battery life reduction as compared to very large periods.
2. Limiting of the sleep time in the timer-based scheduling logic to a similar value. If this ends up below 30 ms, then we can simplify PulseAudio by removing all traces of the rewind logic.
sorry, I need to clarify some of my words
12.06.2015 18:43, Alexander E. Patrakov wrote:
To guarantee the reproducibility of this experiment, the entire system (Gentoo stage3 plus PulseAudio plus laptop-mode-tools) has been put in the initramfs,
Clarification due to a possible "why PulseAudio" question. I have actually reused an old initramfs that I put together in order to measure the effect of various resamplers on battery life. Result back then: with 1 s latency, when resampling from 44.1 to 48 kHz, speex-float-5 robs 734 seconds of battery life out of 26731, as compared to speex-float-1, i.e. less than the battery life lost due to 6 months of aging.
The scripts are attached, measure.sh is called from a script in /etc/local.d.
- Limiting of the sleep time in the timer-based scheduling logic to a similar value. If this ends up below 30 ms, then we can simplify PulseAudio by removing all traces of the rewind logic.
I should point out that CRAS (another sound server that implements timer-based scheduling, from ChromeOS) has no rewind logic at all, and relies on clients not to request insanely large buffer size. Also, it contains no batch-card logic.
Hi folks, I'd like to bring this one up again, since we are currently in the sub-optimal position of forcing ~100 ms latency on USB devices. The original thread is here --
http://mailman.alsa-project.org/pipermail/alsa-devel/2013-December/069666.ht...
While we sort this out, though, is there an upper bound on the USB transfer size (that we could then use as a rewind safety margin)? We might be able to use this as a workaround till this can be fixed properly.
Hello Arun,
thanks for bringing up the issue again. However, I think that the "~100 ms latency on batch cards" problem can and should also be solved from the other end, independently from the USB special case. Namely, I think that the default buffer and period sizes that PulseAudio selects are way too conservative. The power-saving argument was used in the past as a justification, and I am calling for a reevaluation. Here is why.
- Android's AudioFlinger uses 2 periods, 5 ms each. It is a mobile OS, and the developers think it is good enough.
- I have measured the battery lifetime of my SandyBridge-based laptop and found that, with pure ALSA on the hw:0 device, a similar low-latency setup loses less than 5% of the battery life (935 seconds were lost out of 25742). With PulseAudio, the difference is worse, but let's treat this as a missed optimization in PulseAudio.
http://www.intel.com/content/www/us/en/chipsets/high-definition-audio-energy...
Does your HDA controller support the Energy Efficient Audio (EEAudio) mechanism?
On 17.06.2015 08:04, Raymond Yau wrote:
http://www.intel.com/content/www/us/en/chipsets/high-definition-audio-energy...
Does your HDA controller support the Energy Efficient Audio (EEAudio) mechanism?
I don't know.
I'd like to bring this one up again, since we are currently in the sub-optimal position of forcing ~100 ms latency on USB devices. The original thread is here --
http://mailman.alsa-project.org/pipermail/alsa-devel/2013-December/069666.ht...
While we sort this out, though, is there an upper bound on the USB transfer size (that we could then use as a rewind safety margin)? We might be able to use this as a workaround till this can be fixed properly.
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/commit/sound/us...
It seems there are differences between high-speed devices and others.
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/commit/sound/us...
Is USB wireless audio different from the others?
On 06/12/2015 02:29 PM, Arun Raghavan wrote:
Hi folks, I'd like to bring this one up again, since we are currently in the sub-optimal position of forcing ~100 ms latency on USB devices. The original thread is here -- http://mailman.alsa-project.org/pipermail/alsa-devel/2013-December/069666.ht...
I see two flags that are possibly of consequence here: SNDRV_PCM_INFO_BATCH and SNDRV_PCM_INFO_BLOCK_TRANSFER. I'm not sure what these mean -- the documentation mentions "double buffering" for the batch flag, and just that the block transfer means "block transfer". :-)
We've spoken about batch meaning either transfers in period size chunks, or some fixed chunk size. It seems that it would make more sense for it to mean the former, and block transfer to mean the latter.
So I guess the first thing that would be nice to have is a clear meaning of these two flags. With this done, we could potentially get to the API to report the transfer size from the driver.
Yeah, the meaning of those flags is somewhat fuzzy and may have changed over time as well. Here is my understanding of the flags, might not necessarily be 100% correct.
SNDRV_PCM_INFO_BLOCK_TRANSFER means that the data is copied from the user-accessible buffer in blocks of one period. Typically these kinds of devices have some dedicated audio memory that is not accessible via normal memory access, and a DMA is set up to copy data from main memory to the dedicated memory. This DMA transfers the data from the main memory to the dedicated memory in chunks of period size. But otherwise the controller might still be capable of reporting an accurate pointer position down to the sample/frame level.
So SNDRV_PCM_INFO_BLOCK_TRANSFER is mainly important for rewind handling and devices with that flag set might need additional headroom since the data up to one period after the pointer position has already been copied to the dedicated memory and hence can no longer be overwritten.
SNDRV_PCM_INFO_BATCH, on the other hand, has come to mean that the device is only capable of reporting the audio pointer with a coarse granularity. Typically this means a period-sized granularity, but there are some other cases as well.
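To illustrate where these bits come from, here is a hypothetical kernel-side snippet (not taken from any real driver, values invented) showing how a driver advertises them in its snd_pcm_hardware description:

  #include <sound/pcm.h>

  /* Invented example hardware description; this flag combination is what a
   * device with period-granular DMA and a coarse pointer might advertise. */
  static const struct snd_pcm_hardware example_hw = {
          .info = SNDRV_PCM_INFO_MMAP |
                  SNDRV_PCM_INFO_MMAP_VALID |
                  SNDRV_PCM_INFO_INTERLEAVED |
                  SNDRV_PCM_INFO_BLOCK_TRANSFER | /* data moved to device memory per period */
                  SNDRV_PCM_INFO_BATCH,           /* pointer only period-accurate */
          .formats = SNDRV_PCM_FMTBIT_S16_LE,
          .rates = SNDRV_PCM_RATE_48000,
          .rate_min = 48000,
          .rate_max = 48000,
          .channels_min = 2,
          .channels_max = 2,
          .buffer_bytes_max = 64 * 1024,
          .period_bytes_min = 256,
          .period_bytes_max = 8 * 1024,
          .periods_min = 2,
          .periods_max = 32,
  };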
I did notice that there is a snd_pcm_hw_params_get_fifo_size(). Is this something we could use for the purpose of transfer size reporting, by any chance?
I think the idea with fifo_size() is mainly to aid with audio latency measurement, i.e. how many samples before the pointer() position are still inside the audio pipeline and have not yet reached the analog domain. But since there are no users that rely on it, a lot of drivers don't really bother with setting it to the correct value. And for the use case here the data is on the wrong side of the pointer() value as well. What we need is some kind of indication of the fuzziness of the pointer() value, e.g. how far it can be away from the actual position in the stream.
- Lars
Hi folks, I'd like to bring this one up again, since we are currently in the sub-optimal position of forcing ~100 ms latency on USB devices. The original thread is here --
http://mailman.alsa-project.org/pipermail/alsa-devel/2013-December/069666.ht...
I see two flags that are possibly of consequence here: SNDRV_PCM_INFO_BATCH and SNDRV_PCM_INFO_BLOCK_TRANSFER. I'm not sure what these mean -- the documentation mentions "double buffering" for the batch flag, and just that the block transfer means "block transfer". :-)
We've spoken about batch meaning either transfers in period size chunks, or some fixed chunk size. It seems that it would make more sense for it to mean the former, and block transfer to mean the latter.
So I guess the first thing that would be nice to have is a clear meaning of these two flags. With this done, we could potentially get to the API to report the transfer size from the driver.
Yeah, the meaning of those flags is somewhat fuzzy and may have changed over time as well. Here is my understanding of the flags, might not necessarily be 100% correct.
SNDRV_PCM_INFO_BLOCK_TRANSFER means that the data is copied from the user-accessible buffer in blocks of one period. Typically these kinds of devices have some dedicated audio memory that is not accessible via normal memory access, and a DMA is set up to copy data from main memory to the dedicated memory. This DMA transfers the data from the main memory to the dedicated memory in chunks of period size. But otherwise the controller might still be capable of reporting an accurate pointer position down to the sample/frame level.
So SNDRV_PCM_INFO_BLOCK_TRANSFER is mainly important for rewind handling, and devices with that flag set might need additional headroom since the data up to one period after the pointer position has already been copied to the dedicated memory and hence can no longer be overwritten.
SNDRV_PCM_INFO_BATCH, on the other hand, has come to mean that the device is only capable of reporting the audio pointer with a coarse granularity. Typically this means a period-sized granularity, but there are some other cases as well.
DSP_CAP_REALTIME bit tells if the device/operating system supports precise reporting of output pointer position using SNDCTL_DSP_GETxPTR. Precise means that accuracy of the reported playback pointer (time) is around few samples. Without this capability the playback/recording position is reported using precision of one fragment.
DSP_CAP_BATCH bit means that the device has some kind of local storage for recording and/or playback. For this reason the information reported by SNDCTL_DSP_GETxPTR is very inaccurate.
Do those ALSA caps have a similar meaning to those OSS caps?
On 06/15/2015 01:39 PM, Raymond Yau wrote:
Hi folks, I'd like to bring this one up again, since we are currently in the sub-optimal position of forcing ~100 ms latency on USB devices. The original thread is here --
http://mailman.alsa-project.org/pipermail/alsa-devel/2013-December/069666.ht...
I see two flags that are possibly of consequence here: SNDRV_PCM_INFO_BATCH and SNDRV_PCM_INFO_BLOCK_TRANSFER. I'm not sure what these mean -- the documentation mentions "double buffering" for the batch flag, and just that the block transfer means "block transfer". :-)
We've spoken about batch meaning either transfers in period size chunks, or some fixed chunk size. It seems that it would make more sense for it to mean the former, and block transfer to mean the latter.
So I guess the first thing that would be nice to have is a clear meaning of these two flags. With this done, we could potentially get to the API to report the transfer size from the driver.
Yeah, the meaning of those flags is somewhat fuzzy and may have changed over time as well. Here is my understanding of the flags, might not necessarily be 100% correct.
SNDRV_PCM_INFO_BLOCK_TRANSFER means that the data is copied from the user-accessible buffer in blocks of one period. Typically these kinds of devices have some dedicated audio memory that is not accessible via normal memory access, and a DMA is set up to copy data from main memory to the dedicated memory. This DMA transfers the data from the main memory to the dedicated memory in chunks of period size. But otherwise the controller might still be capable of reporting an accurate pointer position down to the sample/frame level.
So SNDRV_PCM_INFO_BLOCK_TRANSFER is mainly important for rewind handling, and devices with that flag set might need additional headroom since the data up to one period after the pointer position has already been copied to the dedicated memory and hence can no longer be overwritten.
SNDRV_PCM_INFO_BATCH, on the other hand, has come to mean that the device is only capable of reporting the audio pointer with a coarse granularity. Typically this means a period-sized granularity, but there are some other cases as well.
DSP_CAP_REALTIME bit tells if the device/operating system supports precise reporting of output pointer position using SNDCTL_DSP_GETxPTR. Precise means that accuracy of the reported playback pointer (time) is around few samples. Without this capability the playback/recording position is reported using precision of one fragment.
DSP_CAP_BATCH bit means that the device has some kind of local storage for recording and/or playback. For this reason the information reported by SNDCTL_DSP_GETxPTR is very inaccurate.
Do those ALSA caps have a similar meaning to those OSS caps?
I would say historically SNDRV_PCM_INFO_BATCH had the same meaning as DSP_CAP_BATCH, but it has also come to have the inverse meaning of DSP_CAP_REALTIME. By default applications can assume that the pointer is accurate down to a few samples, but if SNDRV_PCM_INFO_BATCH is set it is less accurate and can be anything up to period-size precision.
SNDRV_PCM_INFO_BLOCK_TRANSFER is different from all of them though.
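For comparison, the OSS side of this can be queried roughly as in the following sketch (untested; /dev/dsp is a placeholder):

  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <sys/soundcard.h>
  #include <unistd.h>

  int main(void)
  {
      int caps = 0;
      int fd = open("/dev/dsp", O_WRONLY);
      if (fd < 0)
          return 1;
      if (ioctl(fd, SNDCTL_DSP_GETCAPS, &caps) == 0) {
          /* REALTIME set: pointer accurate to a few samples;
             BATCH set: device buffers locally, pointer very inaccurate */
          printf("DSP_CAP_REALTIME: %d\n", !!(caps & DSP_CAP_REALTIME));
          printf("DSP_CAP_BATCH:    %d\n", !!(caps & DSP_CAP_BATCH));
      }
      close(fd);
      return 0;
  }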
Hi folks, I'd like to bring this one up again, since we are currently in the sub-optimal position of forcing ~100 ms latency on USB devices. The original thread is here --
http://mailman.alsa-project.org/pipermail/alsa-devel/2013-December/069666.ht...
I see two flags that are possibly of consequence here: SNDRV_PCM_INFO_BATCH and SNDRV_PCM_INFO_BLOCK_TRANSFER. I'm not sure what these mean -- the documentation mentions "double buffering" for the batch flag, and just that the block transfer means "block transfer". :-)
We've spoken about batch meaning either transfers in period size chunks, or some fixed chunk size. It seems that it would make more sense for it to mean the former, and block transfer to mean the latter.
So I guess the first thing that would be nice to have is a clear meaning of these two flags. With this done, we could potentially get to the API to report the transfer size from the driver.
Yeah, the meaning of those flags is somewhat fuzzy and may have changed over time as well. Here is my understanding of the flags, might not necessarily be 100% correct.
SNDRV_PCM_INFO_BLOCK_TRANSFER means that the data is copied from the user-accessible buffer in blocks of one period. Typically these kinds of devices have some dedicated audio memory that is not accessible via normal memory access, and a DMA is set up to copy data from main memory to the dedicated memory. This DMA transfers the data from the main memory to the dedicated memory in chunks of period size. But otherwise the controller might still be capable of reporting an accurate pointer position down to the sample/frame level.
So SNDRV_PCM_INFO_BLOCK_TRANSFER is mainly important for rewind handling, and devices with that flag set might need additional headroom since the data up to one period after the pointer position has already been copied to the dedicated memory and hence can no longer be overwritten.
SNDRV_PCM_INFO_BATCH, on the other hand, has come to mean that the device is only capable of reporting the audio pointer with a coarse granularity. Typically this means a period-sized granularity, but there are some other cases as well.
DSP_CAP_REALTIME bit tells if the device/operating system supports precise reporting of output pointer position using SNDCTL_DSP_GETxPTR. Precise means that accuracy of the reported playback pointer (time) is around few samples. Without this capability the playback/recording position is reported using precision of one fragment.
DSP_CAP_BATCH bit means that the device has some kind of local storage for recording and/or playback. For this reason the information reported by SNDCTL_DSP_GETxPTR is very inaccurate.
Do those ALSA caps have a similar meaning to those OSS caps?
I would say historically SNDRV_PCM_INFO_BATCH had the same meaning as DSP_CAP_BATCH, but it has also come to have the inverse meaning of DSP_CAP_REALTIME. By default applications can assume that the pointer is accurate down to a few samples, but if SNDRV_PCM_INFO_BATCH is set it is less accurate and can be anything up to period-size precision.
SNDRV_PCM_INFO_BLOCK_TRANSFER is different from all of them though.
DMA_RESIDUE_GRANULARITY_DESCRIPTOR DMA_RESIDUE_GRANULARITY_SEGMENT DMA_RESIDUE_GRANULARITY_BURST
Can these be regarded as bad, normal, and good accuracy?
https://bugs.freedesktop.org/show_bug.cgi?id=86262
Although the resolution seems to be less than the period size, the deviation is quite large; can we still regard this result as accurate up to one period?
http://lists.freedesktop.org/archives/pulseaudio-discuss/2015-June/024022.ht...
This is accurate up to one period
Why do we need a flag meaning "exactly one period", since most drivers that use an IRQ should increment the pointer by one period when the interrupt occurs?
We only need to know which devices have good enough accuracy to use timer-based scheduling, and which have bad accuracy and need special care.
On 06/15/2015 03:34 PM, Raymond Yau wrote:
Hi folks, I'd like to bring this one up again, since we are currently in the sub-optimal position of forcing ~100 ms latency on USB devices. The original thread is here --
http://mailman.alsa-project.org/pipermail/alsa-devel/2013-December/069666.ht...
I see two flags that are possibly of consequence here: SNDRV_PCM_INFO_BATCH and SNDRV_PCM_INFO_BLOCK_TRANSFER. I'm not sure what these mean -- the documentation mentions "double buffering" for the batch flag, and just that the block transfer means "block transfer". :-)
We've spoken about batch meaning either transfers in period size chunks, or some fixed chunk size. It seems that it would make more sense for it to mean the former, and block transfer to mean the latter.
So I guess the first thing that would be nice to have is a clear meaning of these two flags. With this done, we could potentially get to the API to report the transfer size from the driver.
Yeah, the meaning of those flags is somewhat fuzzy and may have changed over time as well. Here is my understanding of the flags, might not necessarily be 100% correct.
SNDRV_PCM_INFO_BLOCK_TRANSFER means that the data is copied from the user-accessible buffer in blocks of one period. Typically these kinds of devices have some dedicated audio memory that is not accessible via normal memory access, and a DMA is set up to copy data from main memory to the dedicated memory. This DMA transfers the data from the main memory to the dedicated memory in chunks of period size. But otherwise the controller might still be capable of reporting an accurate pointer position down to the sample/frame level.
So SNDRV_PCM_INFO_BLOCK_TRANSFER is mainly important for rewind handling, and devices with that flag set might need additional headroom since the data up to one period after the pointer position has already been copied to the dedicated memory and hence can no longer be overwritten.
SNDRV_PCM_INFO_BATCH, on the other hand, has come to mean that the device is only capable of reporting the audio pointer with a coarse granularity. Typically this means a period-sized granularity, but there are some other cases as well.
DSP_CAP_REALTIME bit tells if the device/operating system supports precise reporting of output pointer position using SNDCTL_DSP_GETxPTR. Precise means that accuracy of the reported playback pointer (time) is around few samples. Without this capability the playback/recording position is reported using precision of one fragment.
DSP_CAP_BATCH bit means that the device has some kind of local storage for recording and/or playback. For this reason the information reported by SNDCTL_DSP_GETxPTR is very inaccurate.
Do those ALSA caps have a similar meaning to those OSS caps?
I would say historically SNDRV_PCM_INFO_BATCH had the same meaning as DSP_CAP_BATCH, but it has also come to have the inverse meaning of DSP_CAP_REALTIME. By default applications can assume that the pointer is accurate down to a few samples, but if SNDRV_PCM_INFO_BATCH is set it is less accurate and can be anything up to period-size precision.
SNDRV_PCM_INFO_BLOCK_TRANSFER is different from all of them though.
DMA_RESIDUE_GRANULARITY_DESCRIPTOR DMA_RESIDUE_GRANULARITY_SEGMENT DMA_RESIDUE_GRANULARITY_BURST
Can these be regarded as bad, normal, and good accuracy?
DMA_RESIDUE_GRANULARITY_DESCRIPTOR is more like completely and utterly useless. It means the driver can only tell whether the descriptor has finished or not; in a cyclic transfer the descriptor will never finish, so the driver will always report the same. Currently we fall back to counting the number of period completion callbacks we get in this mode. This is slightly prone to race conditions since it is legal to coalesce two completion callbacks if a second one is scheduled before the first has run. This will only happen with a very high system load, but it can happen. Luckily most DMA drivers that are used for audio do support DMA_RESIDUE_GRANULARITY_SEGMENT now, and there is a deprecation plan, including eventually dropping support altogether, for drivers that only support DMA_RESIDUE_GRANULARITY_DESCRIPTOR.
DMA_RESIDUE_GRANULARITY_SEGMENT means the hardware/driver can report the pointer position with period precision. In this case the BATCH flag will be set for the PCM.
DMA_RESIDUE_GRANULARITY_BURST means it can report the position with a granularity of a few samples. Typically half the audio FIFO size.
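A hedged sketch of how that mapping can look on the kernel side (this is a simplification of what the generic dmaengine PCM code does, not a copy of it):

  #include <linux/dmaengine.h>
  #include <sound/pcm.h>

  static unsigned int example_pcm_info_from_dma(struct dma_chan *chan)
  {
          struct dma_slave_caps caps;
          unsigned int info = SNDRV_PCM_INFO_MMAP | SNDRV_PCM_INFO_INTERLEAVED;

          /* Anything coarser than per-burst residue reporting means the
           * pointer is at best period-accurate, so mark the PCM as BATCH. */
          if (dma_get_slave_caps(chan, &caps) != 0 ||
              caps.residue_granularity <= DMA_RESIDUE_GRANULARITY_SEGMENT)
                  info |= SNDRV_PCM_INFO_BATCH;

          return info;
  }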
https://bugs.freedesktop.org/show_bug.cgi?id=86262
Although the resolution seems to be less than the period size, the deviation is quite large; can we still regard this result as accurate up to one period?
This looks like a separate issue that's just made visible by the BATCH flag patch with that particular hardware.
http://lists.freedesktop.org/archives/pulseaudio-discuss/2015-June/024022.ht...
This is accurate up to one period
Why do we need a flag meaning "exactly one period", since most drivers that use an IRQ should increment the pointer by one period when the interrupt occurs?
We only need to know which devices have good enough accuracy to use timer-based scheduling, and which have bad accuracy and need special care.
Yes, and that's what the BATCH flag tells us.
DMA_RESIDUE_GRANULARITY_DESCRIPTOR DMA_RESIDUE_GRANULARITY_SEGMENT DMA_RESIDUE_GRANULARITY_BURST
Can these be regarded as bad, normal, and good accuracy?
DMA_RESIDUE_GRANULARITY_DESCRIPTOR is more like completely and utterly useless. It means the driver can only tell whether the descriptor has finished or not; in a cyclic transfer the descriptor will never finish, so the driver will always report the same. Currently we fall back to counting the number of period completion callbacks we get in this mode. This is slightly prone to race conditions since it is legal to coalesce two completion callbacks if a second one is scheduled before the first has run. This will only happen with a very high system load, but it can happen. Luckily most DMA drivers that are used for audio do support DMA_RESIDUE_GRANULARITY_SEGMENT now, and there is a deprecation plan, including eventually dropping support altogether, for drivers that only support DMA_RESIDUE_GRANULARITY_DESCRIPTOR.
DMA_RESIDUE_GRANULARITY_SEGMENT means the hardware/driver can report the pointer position with period precision. In this case the BATCH flag will be set for the PCM.
DMA_RESIDUE_GRANULARITY_BURST means it can report the position with a granularity of a few samples, typically half the audio FIFO size.
Residue is updated after each transferred burst. This is typically only supported if the hardware has a progress register of some sort (e.g. a register with the current read/write address or a register with the amount of bursts/beats/bytes that have been transferred or still need to be transferred).
The pointer callbacks of both snd-hda-intel and snd-oxygen read from a hardware register.
Does this mean that only those drivers whose pointer callback reads from a hardware register are suitable for timer scheduling?
Do you need an additional hardware-specific feature (e.g. the ability to disable the period interrupt)?
https://bugs.freedesktop.org/show_bug.cgi?id=86262
Although the resolution seems to be less than the period size, the deviation is quite large; can we still regard this result as accurate up to one period?
This looks like a separate issue that's just made visible by the BATCH flag patch with that particular hardware.
https://bugs.freedesktop.org/show_bug.cgi?id=86262#c19
The hw_ptr of those two USB audio devices seems to increment in 5 to 6 ms steps, which is quite different from
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/commit/sound/us...
For example, testing shows that a high-speed device can handle 32 frames/period and 3 periods/buffer at 48 KHz, whereas the current driver starts to get glitchy at 64 frames/period and 2 periods/buffer.
Does the result mean that only high-speed devices support 1.5 ms latency, while the other USB audio devices may support 15 to 18 ms?
http://lists.freedesktop.org/archives/pulseaudio-discuss/2015-June/024022.ht...
This is accurate up to one period
Why do we need a flag meaning "exactly one period", since most drivers that use an IRQ should increment the pointer by one period when the interrupt occurs?
We only need to know which devices have good enough accuracy to use timer-based scheduling, and which have bad accuracy and need special care.
Yes, and that's what the BATCH flag tells us.
For those sound cards that support low latency, the application must use a smaller period size that matches the requested latency. Most drivers only support a fixed latency. Only those sound cards with good accuracy can adjust the latency without changing the period size.
On 06/16/2015 04:33 AM, Raymond Yau wrote:
DMA_RESIDUE_GRANULARITY_DESCRIPTOR DMA_RESIDUE_GRANULARITY_SEGMENT DMA_RESIDUE_GRANULARITY_BURST
Can these be regarded as bad, normal, and good accuracy?
DMA_RESIDUE_GRANULARITY_DESCRIPTOR is more like completely and utterly useless. It means the driver can only tell whether the descriptor has finished or not; in a cyclic transfer the descriptor will never finish, so the driver will always report the same. Currently we fall back to counting the number of period completion callbacks we get in this mode. This is slightly prone to race conditions since it is legal to coalesce two completion callbacks if a second one is scheduled before the first has run. This will only happen with a very high system load, but it can happen. Luckily most DMA drivers that are used for audio do support DMA_RESIDUE_GRANULARITY_SEGMENT now, and there is a deprecation plan, including eventually dropping support altogether, for drivers that only support DMA_RESIDUE_GRANULARITY_DESCRIPTOR.
DMA_RESIDUE_GRANULARITY_SEGMENT means the hardware/driver can report the pointer position with period precision. In this case the BATCH flag will be set for the PCM.
DMA_RESIDUE_GRANULARITY_BURST means it can report the position with a granularity of a few samples, typically half the audio FIFO size.
Residue is updated after each transferred burst. This is typically only supported if the hardware has a progress register of some sort (e.g. a register with the current read/write address or a register with the amount of bursts/beats/bytes that have been transferred or still need to be transferred).
The pointer callbacks of both snd-hda-intel and snd-oxygen read from a hardware register.
Does this mean that only those drivers whose pointer callback reads from a hardware register are suitable for timer scheduling?
Yes, more or less, but not exclusively. The pointer callback needs to have a fine enough precision for timer-based scheduling to work. Where exactly the threshold between working and not working lies is a bit hard to say. Period-based resolution is not good enough; a resolution of a few samples is very good. USB seems to be somewhere in the middle between those two: the resolution is better than a full period, but coarser than just a few samples. But it seems to be good enough for the timer-based scheduling algorithm to work. It still gets disabled for USB though, since the USB driver sets the BATCH flag.
The main issue is that there is currently no method for a driver to advertise the exact pointer granularity to userspace applications. Either the BATCH flag is not set, which means very good accuracy, or the BATCH flag is set, which means very bad accuracy. USB, which does not have very good accuracy but still reasonably good accuracy, is somewhere in the middle between those two extremes.
Do you need an additional hardware-specific feature (e.g. the ability to disable the period interrupt)?
No. Disabling the period interrupt is a feature that can be used to reduce power consumption and is optional.
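For reference, a sketch (untested) of how a timer-scheduled client can opt out of period wakeups when the driver allows it; the caller is assumed to install params with snd_pcm_hw_params() afterwards:

  #include <alsa/asoundlib.h>

  static int maybe_disable_period_wakeups(snd_pcm_t *pcm, snd_pcm_hw_params_t *params)
  {
      if (!snd_pcm_hw_params_can_disable_period_wakeup(params))
          return 0;  /* hardware insists on period interrupts, keep them */
      /* request zero wakeups per period; position is then tracked purely
         by the application's own timer */
      return snd_pcm_hw_params_set_period_wakeup(pcm, params, 0);
  }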
https://bugs.freedesktop.org/show_bug.cgi?id=86262
Although the resolution seems to be less than the period size, the deviation is quite large; can we still regard this result as accurate up to one period?
This looks like a separate issue that's just made visible by the BATCH flag patch with that particular hardware.
https://bugs.freedesktop.org/show_bug.cgi?id=86262#c19
The hw_ptr of those two USB audio devices seems to increment in 5 to 6 ms steps, which is quite different from
https://git.kernel.org/cgit/linux/kernel/git/tiwai/sound.git/commit/sound/us...
For example, testing shows that a high-speed device can handle 32 frames/period and 3 periods/buffer at 48 KHz, whereas the current driver starts to get glitchy at 64 frames/period and 2 periods/buffer.
Does the result mean that only high-speed devices support 1.5 ms latency, while the other USB audio devices may support 15 to 18 ms?
http://lists.freedesktop.org/archives/pulseaudio-discuss/2015-June/024022.ht...
This is accurate up to one period
Why do we need a flag meaning "exactly one period", since most drivers that use an IRQ should increment the pointer by one period when the interrupt occurs?
We only need to know which devices have good enough accuracy to use timer-based scheduling, and which have bad accuracy and need special care.
Yes, and that's what the BATCH flag tells us.
For those sound cards that support low latency, the application must use a smaller period size that matches the requested latency. Most drivers only support a fixed latency. Only those sound cards with good accuracy can adjust the latency without changing the period size.
Yes, without a good-precision pointer the lower bound for the latency will be determined by the period size. A smaller period size means less latency.
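(As a rough illustration: at 48 kHz a 1024-frame period is about 21 ms and a 240-frame period is 5 ms, so with the usual two-period buffer the total buffering is twice that.)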
At Wed, 17 Jun 2015 10:27:08 +0200, Lars-Peter Clausen wrote:
On 06/16/2015 04:33 AM, Raymond Yau wrote:
DMA_RESIDUE_GRANULARITY_DESCRIPTOR DMA_RESIDUE_GRANULARITY_SEGMENT DMA_RESIDUE_GRANULARITY_BURST
Can these be regarded as bad, normal, and good accuracy?
DMA_RESIDUE_GRANULARITY_DESCRIPTOR is more like completely and utterly useless. It means the driver can only tell whether the descriptor has finished or not; in a cyclic transfer the descriptor will never finish, so the driver will always report the same. Currently we fall back to counting the number of period completion callbacks we get in this mode. This is slightly prone to race conditions since it is legal to coalesce two completion callbacks if a second one is scheduled before the first has run. This will only happen with a very high system load, but it can happen. Luckily most DMA drivers that are used for audio do support DMA_RESIDUE_GRANULARITY_SEGMENT now, and there is a deprecation plan, including eventually dropping support altogether, for drivers that only support DMA_RESIDUE_GRANULARITY_DESCRIPTOR.
DMA_RESIDUE_GRANULARITY_SEGMENT means the hardware/driver can report the pointer position with period precision. In this case the BATCH flag will be set for the PCM.
DMA_RESIDUE_GRANULARITY_BURST means it can report the position with a granularity of a few samples, typically half the audio FIFO size.
Residue is updated after each transferred burst. This is typically only supported if the hardware has a progress register of some sort (e.g. a register with the current read/write address or a register with the amount of bursts/beats/bytes that have been transferred or still need to be transferred).
The pointer callbacks of both snd-hda-intel and snd-oxygen read from a hardware register.
Does this mean that only those drivers whose pointer callback reads from a hardware register are suitable for timer scheduling?
Yes, more or less, but not exclusively. The pointer callback needs to have a fine enough precision for timer-based scheduling to work. Where exactly the threshold between working and not working lies is a bit hard to say. Period-based resolution is not good enough; a resolution of a few samples is very good. USB seems to be somewhere in the middle between those two: the resolution is better than a full period, but coarser than just a few samples. But it seems to be good enough for the timer-based scheduling algorithm to work. It still gets disabled for USB though, since the USB driver sets the BATCH flag.
The main issue is that there is currently no method for a driver to advertise the exact pointer granularity to userspace applications. Either the BATCH flag is not set, which means very good accuracy, or the BATCH flag is set, which means very bad accuracy. USB, which does not have very good accuracy but still reasonably good accuracy, is somewhere in the middle between those two extremes.
Well, USB-audio has another problem. USB-audio uses an intermediate audio ring buffer, and the samples are copied to each URB buffer. At each packet completion, the driver copies the rest of the sample chunk again, and advances the hwptr as the packets are queued. So, the hwptr of USB-audio is in advance of the actual sample position. But we provide the runtime delay information for user-space to correct to the more accurate sample position. So far, so good.
A missing piece in this picture is, however, the position of the not-yet-queued samples in the ring buffer. Basically you can rewrite / rewind samples at most up to this point, but not farther -- such in-flight samples can't be modified any longer. This can be seen as a kind of hardware FIFO with a pretty big and non-continuously variable size during operation.
In that sense, get_fifo() looks like a candidate for giving such information, indeed. But reviving the old (and rather badly working) API appears dangerous to me. I'd prefer creating a new API function instead, if any.
BTW, because of its design like above, a large (or no) period size doesn't help for saving power at all with USB-audio. This should be considered before the further discussion...
Takashi
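A sketch (my own, untested) of the userspace side of that correction, using the delay that the driver reports; frames_written is whatever count the application keeps itself:

  #include <alsa/asoundlib.h>

  static snd_pcm_sframes_t playing_position(snd_pcm_t *pcm,
                                            snd_pcm_sframes_t frames_written)
  {
      snd_pcm_sframes_t delay;

      if (snd_pcm_delay(pcm, &delay) < 0)
          return -1;
      /* delay = distance from the last written sample to the speaker, so the
         sample that is audible right now is roughly: */
      return frames_written - delay;
  }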
On 2015-06-17 11:19, Takashi Iwai wrote:
Well, USB-audio has another problem. USB-audio uses an intermediate audio ring buffer, and the samples are copied to each URB buffer. At each packet completion, the driver copies the rest of the sample chunk again, and advances the hwptr as the packets are queued. So, the hwptr of USB-audio is in advance of the actual sample position. But we provide the runtime delay information for user-space to correct to the more accurate sample position. So far, so good.
A missing piece in this picture is, however, the position of the not-yet-queued samples in the ring buffer. Basically you can rewrite / rewind samples at most up to this point, but not farther -- such in-flight samples can't be modified any longer. This can be seen as a kind of hardware FIFO with a pretty big and non-continuously variable size during operation.
In that sense, get_fifo() looks like a candidate for giving such information, indeed. But reviving the old (and rather badly working) API appears dangerous to me. I'd prefer creating a new API function instead, if any.
BTW, because of its design like above, a large (or no) period size doesn't help for saving power at all with USB-audio. This should be considered before the further discussion...
Hmm...I was trying to understand this power save argument. I tried to figure out a "typical" URB size by just plugging my headset in, and I saw wMaxPacketSize being 96 and/or 192 bytes. Then, MAX_PACKS is set to either 6 (or 48 for USB 2.0 devices, but this is just a headset).
Can this be correct? Does it mean that we are getting interrupts every 192 * 6 bytes (i e, every 6 ms for a 48kHz/stereo/16bit stream)?
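(Checking the arithmetic, assuming S16 stereo, i.e. 4 bytes per frame: 192 bytes is 48 frames, or 1 ms at 48 kHz, so 6 packets per URB would indeed mean one completion every 6 ms.)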
On 17.06.2015 20:09, David Henningsson wrote:
On 2015-06-17 11:19, Takashi Iwai wrote:
Well, USB-audio has another problem. USB-audio uses an intermediate audio ring buffer, and the samples are copied to each URB buffer. At each packet completion, the driver copies the rest of the sample chunk again, and advances the hwptr as the packets are queued. So, the hwptr of USB-audio is in advance of the actual sample position. But we provide the runtime delay information for user-space to correct to the more accurate sample position. So far, so good.
A missing piece in this picture is, however, the position of the not-yet-queued samples in the ring buffer. Basically you can rewrite / rewind samples at most up to this point, but not farther -- such in-flight samples can't be modified any longer. This can be seen as a kind of hardware FIFO with a pretty big and non-continuously variable size during operation.
In that sense, get_fifo() looks like a candidate for giving such information, indeed. But reviving the old (and rather badly working) API appears dangerous to me. I'd prefer creating a new API function instead, if any.
BTW, because of its design like above, a large (or no) period size doesn't help for saving power at all with USB-audio. This should be considered before the further discussion...
Hmm...I was trying to understand this power save argument. I tried to figure out a "typical" URB size by just plugging my headset in, and I saw wMaxPacketSize being 96 and/or 192 bytes. Then, MAX_PACKS is set to either 6 (or 48 for USB 2.0 devices, but this is just a headset).
Can this be correct? Does it mean that we are getting interrupts every 192 * 6 bytes (i e, every 6 ms for a 48kHz/stereo/16bit stream)?
At least that's how often the position gets updated in the worst case. I have attached a report for a Logitech USB headset and a test program where you can modify the buffer and period sizes. A line is logged every time snd_pcm_avail() returns a different value.
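For readers without the attachment, here is a rough reconstruction of the idea (not the attached program, untested; device name and sizes are placeholders): play silence and log every change of snd_pcm_avail().

  #include <alsa/asoundlib.h>
  #include <stdio.h>

  int main(void)
  {
      snd_pcm_t *pcm;
      snd_pcm_sframes_t avail, last = -1;
      unsigned long iter = 0;
      short silence[48 * 2] = { 0 };  /* 1 ms of S16 stereo at 48 kHz */

      if (snd_pcm_open(&pcm, "hw:1", SND_PCM_STREAM_PLAYBACK, 0) < 0)
          return 1;
      if (snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
                             SND_PCM_ACCESS_RW_INTERLEAVED,
                             2, 48000, 0, 20000) < 0)  /* ~20 ms buffer */
          return 1;

      for (;;) {
          iter++;
          avail = snd_pcm_avail(pcm);
          if (avail < 0)
              break;  /* underrun or error; keep the sketch simple */
          if (avail != last) {
              printf("Available: %ld, loop iteration: %lu\n", (long)avail, iter);
              last = avail;
          }
          if (avail >= 48)
              snd_pcm_writei(pcm, silence, 48);
      }
      snd_pcm_close(pcm);
      return 0;
  }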
Well, USB-audio has another problem. USB-audio uses an intermediate audio ring buffer, and the samples are copied to each URB buffer. At each packet completion, the driver copies the rest of the sample chunk again, and advances the hwptr as the packets are queued. So, the hwptr of USB-audio is in advance of the actual sample position. But we provide the runtime delay information for user-space to correct to the more accurate sample position. So far, so good.
A missing piece in this picture is, however, the position of the not-yet-queued samples in the ring buffer. Basically you can rewrite / rewind samples at most up to this point, but not farther -- such in-flight samples can't be modified any longer. This can be seen as a kind of hardware FIFO with a pretty big and non-continuously variable size during operation.
In that sense, get_fifo() looks like a candidate for giving such information, indeed. But reviving the old (and rather badly working) API appears dangerous to me. I'd prefer creating a new API function instead, if any.
BTW, because of its design like above, a large (or no) period size doesn't help for saving power at all with USB-audio. This should be considered before the further discussion...
Hmm...I was trying to understand this power save argument. I tried to figure out a "typical" URB size by just plugging my headset in, and I saw wMaxPacketSize being 96 and/or 192 bytes. Then, MAX_PACKS is set to either 6 (or 48 for USB 2.0 devices, but this is just a headset).
Can this be correct? Does it mean that we are getting interrupts every 192 * 6 bytes (i e, every 6 ms for a 48kHz/stereo/16bit stream)?
At least that's how often the position gets updated in the worst case. I have attached a report for a Logitech USB headset and a test program where you can modify the buffer and period sizes. A line is logged every time snd_pcm_avail() returns a different value.
How about a result using a multiple of 48 frames as the period size, since the minimum period size is 48 frames (1 ms)? It seems that your tests are not using the optimal period size.
18.06.2015 08:15, Raymond Yau wrote:
How about a result using a multiple of 48 frames as the period size, since the minimum period size is 48 frames (1 ms)? It seems that your tests are not using the optimal period size.
Attached. See the second and the third tests in the file. Also, on request from David, I have added logging of the delay in addition to avail.
Hmm...I was trying to understand this power save argument. I tried to figure out a "typical" URB size by just plugging my headset in, and I saw wMaxPacketSize being 96 and/or 192 bytes. Then, MAX_PACKS is set to either 6 (or 48 for USB 2.0 devices, but this is just a headset).
Can this be correct? Does it mean that we are getting interrupts every 192 * 6 bytes (i e, every 6 ms for a 48kHz/stereo/16bit stream)?
Does this mean that the driver reports exactly one period only when the period size is a multiple of wMaxPacketSize?
Using other period sizes gives bad results; the driver uses a variable period size.
The hw_ptr is not always at a period boundary; this seems more like DSP_CAP_BATCH.
From the result, the minimum and maximum time difference between hw_ptr changes can vary from 20% to 400%.
At Wed, 17 Jun 2015 17:09:41 +0200, David Henningsson wrote:
On 2015-06-17 11:19, Takashi Iwai wrote:
Well, USB-audio has another problem. USB-audio uses an intermediate audio ring buffer, and the samples are copied to each URB buffer. At each packet completion, the driver copies the rest of the sample chunk again, and advances the hwptr as the packets are queued. So, the hwptr of USB-audio is in advance of the actual sample position. But we provide the runtime delay information for user-space to correct to the more accurate sample position. So far, so good.
A missing piece in this picture is, however, the position of the not-yet-queued samples in the ring buffer. Basically you can rewrite / rewind samples at most up to this point, but not farther -- such in-flight samples can't be modified any longer. This can be seen as a kind of hardware FIFO with a pretty big and non-continuously variable size during operation.
In that sense, get_fifo() looks like a candidate for giving such information, indeed. But reviving the old (and rather badly working) API appears dangerous to me. I'd prefer creating a new API function instead, if any.
BTW, because of its design like above, a large (or no) period size doesn't help for saving power at all with USB-audio. This should be considered before the further discussion...
Hmm...I was trying to understand this power save argument. I tried to figure out a "typical" URB size by just plugging my headset in, and I saw wMaxPacketSize being 96 and/or 192 bytes. Then, MAX_PACKS is set to either 6 (or 48 for USB 2.0 devices, but this is just a headset).
Can this be correct? Does it mean that we are getting interrupts every 192 * 6 bytes (i e, every 6 ms for a 48kHz/stereo/16bit stream)?
The driver can build up a URB containing multiple packets, so the wakeups can be reduced to some extent. But then the hwptr update also suffers, and, worse, the in-flight size also increases -- both are bad for sample mixing, obviously.
Takashi
Well, USB-audio has another problem. USB-audio uses an intermediate audio ring buffer, and the samples are copied to each URB buffer. At each packet completion, the driver copies the rest of the sample chunk again, and advances the hwptr as the packets are queued. So, the hwptr of USB-audio is in advance of the actual sample position. But we provide the runtime delay information for user-space to correct to the more accurate sample position. So far, so good.
A missing piece in this picture is, however, the position of the not-yet-queued samples in the ring buffer. Basically you can rewrite / rewind samples at most up to this point, but not farther -- such in-flight samples can't be modified any longer. This can be seen as a kind of hardware FIFO with a pretty big and non-continuously variable size during operation.
In that sense, get_fifo() looks like a candidate for giving such information, indeed. But reviving the old (and rather badly working) API appears dangerous to me. I'd prefer creating a new API function instead, if any.
BTW, because of its design like above, a large (or no) period size doesn't help for saving power at all with USB-audio. This should be considered before the further discussion...
Hmm...I was trying to understand this power save argument. I tried to figure out a "typical" URB size by just plugging my headset in, and I saw wMaxPacketSize being 96 and/or 192 bytes. Then, MAX_PACKS is set to either 6 (or 48 for USB 2.0 devices, but this is just a headset).
Can this be correct? Does it mean that we are getting interrupts every 192 * 6 bytes (i e, every 6 ms for a 48kHz/stereo/16bit stream)?
The driver can build up a URB containing multiple packets, so the wakeups can be reduced to some extent. But then the hwptr update also suffers, and, worse, the in-flight size also increases -- both are bad for sample mixing, obviously.
http://lxr.free-electrons.com/ident?i=SNDRV_PCM_INFO_BATCH
If the BATCH info flag represents exactly one period, do you mean snd-intel8x0 is worse or better?
How about FireWire, which also uses packets?
Hmm...I was trying to understand this power save argument. I tried to figure out a "typical" URB size by just plugging my headset in, and I saw wMaxPacketSize being 96 and/or 192 bytes. Then, MAX_PACKS is set to either 6 (or 48 for USB 2.0 devices, but this is just a headset).
Can this be correct? Does it mean that we are getting interrupts every 192 * 6 bytes (i e, every 6 ms for a 48kHz/stereo/16bit stream)?
The driver can build up a URB containing multiple packets, so the wakeups can be reduced to some extent. But then the hwptr update also suffers, and, worse, the in-flight size also increases -- both are bad for sample mixing, obviously.
Does it mean that period_bytes_max is wMaxPacketSize * MAX_PACKS, and the driver only supports small period sizes?
On 06/22/2015 04:35 AM, Raymond Yau wrote:
DMA_RESIDUE_GRANULARITY_BURST means it can report the position with a granularity of a few samples. Typically half the audio FIFO size.
How many SoC driver/platform DMA engines support DMA_RESIDUE_GRANULARITY_BURST?
Not sure, maybe half of them. But most of the time it is a software restriction rather than a hardware restriction and more and more drivers are supporting it.
Do all SoC audio drivers use cyclic DMA?
I'm not sure I understand the question. All audio drivers use some kind of cyclic DMA. ASoC platforms which do not have a dedicated audio DMA typically use dmaengine for this, while others which have dedicated audio DMA do the DMA as part of the audio driver.
On 2015-06-22 at 14:43, "Lars-Peter Clausen" lars@metafoo.de wrote:
On 06/22/2015 04:35 AM, Raymond Yau wrote:
DMA_RESIDUE_GRANULARITY_BURST means it can report the position with a granularity of a few samples. Typically half the audio FIFO size.
How many SoC driver/platform DMA engines support DMA_RESIDUE_GRANULARITY_BURST?
Not sure, maybe half of them. But most of the time it is a software restriction rather than a hardware restriction and more and more drivers are supporting it.
https://bugs.freedesktop.org/attachment.cgi?id=107650
snd_rpi_hifiberry_amp
The granularity seems to be 4 frames most of the time, but sometimes there is a large jump between 3584 and 3988; it seems it cannot reach 4096.
Available: 3580, loop iteration: 420
Available: 3584, loop iteration: 421
Available: 3988, loop iteration: 422
Available: 3996, loop iteration: 423
Available: 4004, loop iteration: 424
Available: 4012, loop iteration: 425
Does the pointer callback really use a value read from a hardware register, or a value calculated from a clock?
Lars-Peter Clausen wrote:
On 06/22/2015 04:35 AM, Raymond Yau wrote:
Do all SoC audio drivers use cyclic DMA?
I'm not sure I understand the question. All audio drivers use some kind of cyclic DMA.
The ALSA API requires the driver to provide a cyclic sample buffer (or something that behaves like one).
However, not all hardware works this way. USB and FireWire require the driver to continually queue new packets, whose size and timing are determined by the bus clock and are not directly related to the ALSA ring buffer. These drivers use double buffering; the actual DMA happens from those packets, not from the ring buffer.
Regards, Clemens
Do all SoC audio drivers use cyclic DMA?
I'm not sure I understand the question. All audio drivers use some kind of cyclic DMA.
The ALSA API requires the driver to provide a cyclic sample buffer (or something that behaves like one).
However, not all hardware works this way. USB and FireWire require the driver to continually queue new packets, whose size and timing are determined by the bus clock and are not directly related to the ALSA ring buffer. These drivers use double buffering; the actual DMA happens from those packets, not from the ring buffer.
If those queued packets/URBs cannot be rewound, snd_pcm_rewindable should return zero for those drivers.
22.06.2015 16:54, Raymond Yau wrote:
Do all SoC audio drivers use cyclic DMA?
I'm not sure I understand the question. All audio drivers use some kind of cyclic DMA.
The ALSA API requires the driver to provide a cyclic sample buffer (or something that behaves like one).
However, not all hardware works this way. USB and FireWire require the driver to continually queue new packets, whose size and timing are determined by the bus clock and are not directly related to the ALSA ring buffer. These drivers use double buffering; the actual DMA happens from those packets, not from the ring buffer.
If those queued packets/URBs cannot be rewound, snd_pcm_rewindable should return zero for those drivers.
Not really.
As I understand it, the kernel periodically converts a piece of the ring buffer (located in RAM) into an URB, and it gets sent through the USB bus. Parts of the buffer that are not yet converted to URB are perfectly rewindable.
In other words, for USB devices, the kernel already implements the "low-latency background thread that makes unrewindable devices rewindable" idea that I discussed (as a strawman proposal) here for userspace:
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-September/080868.h...
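A sketch (untested) of how a client would use this from userspace; whether the USB driver's snd_pcm_rewindable() really stops at the URB boundary is exactly the open question above:

  #include <alsa/asoundlib.h>

  static snd_pcm_sframes_t safe_rewind(snd_pcm_t *pcm, snd_pcm_uframes_t wanted)
  {
      snd_pcm_sframes_t max = snd_pcm_rewindable(pcm);

      if (max <= 0)
          return 0;  /* nothing can be taken back right now */
      if ((snd_pcm_uframes_t)max < wanted)
          wanted = max;  /* cap at what the driver says is still ours */
      return snd_pcm_rewind(pcm, wanted);
  }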
The ALSA API requires the driver to provide a cyclic sample buffer (or something that behaves like one).
However, not all hardware works this way. USB and FireWire require the driver to continually queue new packets, whose size and timing are determined by the bus clock and are not directly related to the ALSA ring buffer. These drivers use double buffering; the actual DMA happens from those packets, not from the ring buffer.
If those queued packets/URBs cannot be rewound, snd_pcm_rewindable should return zero for those drivers.
Not really.
As I understand it, the kernel periodically converts a piece of the ring buffer (located in RAM) into an URB, and it gets sent through the USB bus. Parts of the buffer that are not yet converted to URB are perfectly rewindable.
In other words, for USB devices, the kernel already implements the "low-latency background thread that makes unrewindable devices rewindable" idea that I discussed (as a strawman proposal) here for userspace:
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-September/080868.h...
This means that "SNDRV_PCM_INFO_BATCH represents exactly one period" is not correct for USB and FireWire, since hw_ptr does not increment in period-size steps.
Does this mean that .period_bytes_min of snd-usb-audio is incorrect, since .period_bytes_min should be at least the size of a URB/packet?
22.06.2015 17:34, Raymond Yau wrote:
The ALSA API requires the driver to provide a cyclic sample buffer (or something that behaves like one).
However, not all hardware works this way. USB and FireWire require the driver to continually queue new packets, whose size and timing are determined by the bus clock and are not directly related to the ALSA ring buffer. These drivers use double buffering; the actual DMA happens from those packets, not from the ring buffer.
If those queued packets/URBs cannot be rewound, snd_pcm_rewindable should return zero for those drivers.
Not really.
As I understand it, the kernel periodically converts a piece of the ring buffer (located in RAM) into an URB, and it gets sent through the USB bus. Parts of the buffer that are not yet converted to URB are perfectly rewindable.
In other words, for USB devices, the kernel already implements the "low-latency background thread that makes unrewindable devices rewindable" idea that I discussed (as a strawman proposal) here for userspace:
http://mailman.alsa-project.org/pipermail/alsa-devel/2014-September/080868.h...
This means that "SNDRV_PCM_INFO_BATCH represents exactly one period" is not correct for USB and FireWire, since hw_ptr does not increment in period-size steps.
Well, according to the new definition, "SNDRV_PCM_INFO_BATCH on the other hand has come to mean that the device is only capable of reporting the audio pointer with a coarse granularity". In the USB case, we indeed have coarse granularity (6 ms in the worst case), but not as bad as one period.
Does this mean that .period_bytes_min of snd-usb-audio is incorrect, since .period_bytes_min should be at least the size of a URB/packet?
I don't see anything wrong here. With the USB device that my colleague has here at work, the minimum period size is 48 samples, i.e. 1 ms, which looks exactly like one USB data packet.
What is the smallest buffer time at which your USB audio device can play back without underruns, using aplay with two periods (2 ms or 12 ms)?
22.06.2015 20:50, Raymond Yau wrote:
What is the smallest buffer time at which your USB audio device can play back without underruns, using aplay with two periods (2 ms or 12 ms)?
That device is at work and is not mine, so the test below has been done with a different USB device that I have at home, namely a ROTEL RA-1570 integrated amplifier. It has a menu option to select either the 1.0 or 2.0 USB audio class; I have set it to 1.0 to match that C-Media device. By default, with a large-enough period size, it has an avail granularity that jumps between 3 and 4 ms, and a delay granularity of 1 ms (even if I select a better period size, like 960 frames).
======= testing hw:3 =======
min_period_size: 48 frames, dir: 0
FIFO size is 0
Hardware PCM card 3 'Rotel PC-USB' device 0 subdevice 0
Its setup is:
  stream           : PLAYBACK
  access           : RW_INTERLEAVED
  format           : S16_LE
  subformat        : STD
  channels         : 2
  rate             : 48000
  exact rate       : 48000 (48000/1)
  msbits           : 16
  buffer_size      : 4096
  period_size      : 1024
  period_time      : 21333
  tstamp_mode      : NONE
  tstamp_type      : MONOTONIC
  period_step      : 1
  avail_min        : 1024
  period_event     : 0
  start_threshold  : 1024
  stop_threshold   : 4096
  silence_threshold: 0
  silence_size     : 0
  boundary         : 4611686018427387904
  appl_ptr         : 0
  hw_ptr           : 0
Playing silence
Available: 0, loop iteration: 0, diff: 0, timestamp diff: 1 usec
Available: 192, loop iteration: 32070, diff: 32070, timestamp diff: 3923 usec
Available: 336, loop iteration: 37315, diff: 5245, timestamp diff: 4003 usec
Available: 480, loop iteration: 42710, diff: 5395, timestamp diff: 3999 usec
Available: 624, loop iteration: 48292, diff: 5582, timestamp diff: 3999 usec
Available: 768, loop iteration: 53880, diff: 5588, timestamp diff: 4000 usec
Available: 912, loop iteration: 57858, diff: 3978, timestamp diff: 3000 usec
Available: 1056, loop iteration: 62125, diff: 4267, timestamp diff: 3000 usec
Available: 1200, loop iteration: 66536, diff: 4411, timestamp diff: 3000 usec
Available: 1344, loop iteration: 70679, diff: 4143, timestamp diff: 3001 usec
Available: 1488, loop iteration: 75091, diff: 4412, timestamp diff: 2999 usec
Available: 1632, loop iteration: 79368, diff: 4277, timestamp diff: 3000 usec
Available: 1776, loop iteration: 83675, diff: 4307, timestamp diff: 3000 usec
Available: 1920, loop iteration: 87964, diff: 4289, timestamp diff: 3000 usec
Available: 2064, loop iteration: 92384, diff: 4420, timestamp diff: 2999 usec
Available: 2208, loop iteration: 96638, diff: 4254, timestamp diff: 3001 usec
Available: 2353, loop iteration: 100900, diff: 4262, timestamp diff: 3010 usec
Available: 2497, loop iteration: 105000, diff: 4100, timestamp diff: 2991 usec
Available: 2641, loop iteration: 109291, diff: 4291, timestamp diff: 2999 usec
Available: 2785, loop iteration: 113714, diff: 4423, timestamp diff: 3000 usec
Available: 2977, loop iteration: 117955, diff: 4241, timestamp diff: 3000 usec
Available: 3074, loop iteration: 122323, diff: 4368, timestamp diff: 2999 usec
Available: 3266, loop iteration: 126556, diff: 4233, timestamp diff: 3001 usec
Available: 3410, loop iteration: 130983, diff: 4427, timestamp diff: 3000 usec
Available: 3554, loop iteration: 136790, diff: 5807, timestamp diff: 4000 usec
Available: 3698, loop iteration: 139699, diff: 2909, timestamp diff: 2000 usec
Available: 3842, loop iteration: 145320, diff: 5621, timestamp diff: 4000 usec
Available: 3986, loop iteration: 149502, diff: 4182, timestamp diff: 3000 usec
This command produces no xruns with a 48 kHz stereo S16_LE file:
aplay --buffer-size 96 --period-size 48 -D hw:3 test1.wav
$ cat /proc/asound/PCUSB/pcm0p/sub0/info
card: 3
device: 0
subdevice: 0
stream: PLAYBACK
id: USB Audio
name: USB Audio
subname: subdevice #0
class: 0
subclass: 0
subdevices_count: 1
subdevices_avail: 0
$ cat /proc/asound/PCUSB/pcm0p/sub0/hw_params
access: RW_INTERLEAVED
format: S16_LE
subformat: STD
channels: 2
rate: 48000 (48000/1)
period_size: 48
buffer_size: 96
$ cat /proc/asound/PCUSB/pcm0p/sub0/sw_params
tstamp_mode: NONE
period_step: 1
avail_min: 48
start_threshold: 96
stop_threshold: 96
silence_threshold: 0
silence_size: 0
boundary: 6917529027641081856
After several minutes, aplay did not report any xruns, and the sound is still clean:
$ cat /proc/asound/PCUSB/pcm0p/sub0/status
state: RUNNING
owner_pid   : 2505
trigger_time: 4025.921768556
tstamp      : 0.000000000
delay       : 145
avail       : 0
avail_max   : 48
-----
hw_ptr      : 25506151
appl_ptr    : 25506247
The minimum reported period size at 48 kHz is 48 frames.
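For reference, output of the kind shown above can be produced by a simple busy-polling loop. The following is only a minimal sketch under that assumption (it is not the actual test program used here), printing a line whenever snd_pcm_avail() advances, for a handle that has already been configured and started:

#include <alsa/asoundlib.h>
#include <stdio.h>
#include <time.h>

/* Busy-poll the avail counter and report each change together with the
 * elapsed time, to show how coarsely the driver advances the pointer. */
static void watch_avail(snd_pcm_t *pcm)
{
        snd_pcm_sframes_t prev = -1;
        struct timespec prev_ts;
        unsigned long iter = 0;

        clock_gettime(CLOCK_MONOTONIC, &prev_ts);
        for (;;) {
                snd_pcm_sframes_t avail = snd_pcm_avail(pcm);
                struct timespec ts;

                if (avail < 0)
                        break;                     /* xrun or other error */
                clock_gettime(CLOCK_MONOTONIC, &ts);
                if (avail != prev) {
                        long us = (ts.tv_sec - prev_ts.tv_sec) * 1000000L +
                                  (ts.tv_nsec - prev_ts.tv_nsec) / 1000L;
                        printf("Available: %ld, loop iteration: %lu, timestamp diff: %ld usec\n",
                               (long)avail, iter, us);
                        prev = avail;
                        prev_ts = ts;
                }
                iter++;
        }
}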
Looking at the data above, the timestamp diff seems to vary between 2 and 4 ms:
Available: 3410, loop iteration: 130983, diff: 4427, timestamp diff: 3000 usec
Available: 3554, loop iteration: 136790, diff: 5807, timestamp diff: 4000 usec
Available: 3698, loop iteration: 139699, diff: 2909, timestamp diff: 2000 usec
The avail diff can be an odd number, too:
Available: 2353, loop iteration: 100900, diff: 4262, timestamp diff: 3010 usec
Available: 2497, loop iteration: 105000, diff: 4100, timestamp diff: 2991 usec
Available: 2641, loop iteration: 109291, diff: 4291, timestamp diff: 2999 usec
Available: 2785, loop iteration: 113714, diff: 4423, timestamp diff: 3000 usec
Available: 2977, loop iteration: 117955, diff: 4241, timestamp diff: 3000 usec
https://bugs.freedesktop.org/show_bug.cgi?id=86262
The driver does not ensure that the granularity is smaller than the period size; the granularity can be larger than the period size. So the reported position is not just coarse but inaccurate.
min_period_size: 6 frames, dir: 0
FIFO size is 0
Hardware PCM card 1 'Scarlett 18i8 USB' device 0 subdevice 0
Its setup is:
  stream           : PLAYBACK
  access           : RW_INTERLEAVED
  format           : S32_LE
  subformat        : STD
  channels         : 8
  rate             : 48000
  exact rate       : 48000 (48000/1)
  msbits           : 32
  buffer_size      : 4096
  period_size      : 64
  period_time      : 1333
  tstamp_mode      : NONE
  tstamp_type      : MONOTONIC
  period_step      : 1
  avail_min        : 64
  period_event     : 0
  start_threshold  : 64
  stop_threshold   : 4096
  silence_threshold: 0
  silence_size     : 0
  boundary         : 4611686018427387904
  appl_ptr         : 0
  hw_ptr           : 0
Playing silence
Available: 0, loop iteration: 0, diff: 0, timestamp diff: 3 usec
Available: 66, loop iteration: 9066, diff: 9066, timestamp diff: 1395 usec
Available: 132, loop iteration: 11289, diff: 2223, timestamp diff: 1622 usec
Available: 192, loop iteration: 13351, diff: 2062, timestamp diff: 1625 usec
Available: 258, loop iteration: 15600, diff: 2249, timestamp diff: 1625 usec
Available: 325, loop iteration: 17822, diff: 2222, timestamp diff: 1625 usec
Available: 385, loop iteration: 20092, diff: 2270, timestamp diff: 1626 usec
Available: 451, loop iteration: 22045, diff: 1953, timestamp diff: 1625 usec
Available: 512, loop iteration: 24239, diff: 2194, timestamp diff: 1630 usec
Available: 578, loop iteration: 26305, diff: 2066, timestamp diff: 1619 usec
Available: 644, loop iteration: 28447, diff: 2142, timestamp diff: 1626 usec
On Jun 22 2015 21:10, Alexander E. Patrakov wrote:
22.06.2015 16:54, Raymond Yau wrote:
Do all SoC audio drivers use cyclic DMA?
I'm not sure I understand the question. All audio drivers use some kind of cyclic DMA.
If those queued packets/URBs cannot be rewound, shouldn't snd_pcm_rewindable() return zero for those drivers?
There is no need at all for the ALSA FireWire drivers, because PCM frames which have not been transferred yet are still rewindable.
In the current ALSA FireWire stack implementation, 16 packets are queued in one callback from software interrupt context. On the IEEE 1394 bus, 8,000 packets are transmitted per second. Therefore, one callback processes roughly 2 msec worth of PCM frames. (Actually, IEC 61883-6 defines two ways to transfer PCM frames in these packets, so one callback cannot process exactly 2 msec of PCM frames.)
Let's consider the actual timing, for example, on my system:
* The software interrupt interval is 1.986-2.015 msec.
* The processing of one callback takes 0.024-0.026 msec.
(Intel DH77DF, Core i3-2120T, 4GB RAM, VT6315 as 1394 OHCI controller)
While processing one callback, PCM frames are copied to the packet buffer. The 'hw_ptr' is updated each time the number of copied PCM frames crosses a period boundary; therefore, 'hw_ptr' suddenly jumps by one period. This is the reason to use 'SNDRV_PCM_INFO_BLOCK_TRANSFER'.
Meanwhile, snd-firewire-lib has an internal representation that counts the number of transferred PCM frames, and 'struct snd_pcm_ops.pointer()' returns that counter. The counter is updated during callback processing and does not change between callbacks. Therefore, it changes during the 0.024-0.026 msec processing window and stays still during the 1.986-2.015 msec interval; in short, it does not move continuously. Roughly, this is observed as the value returned by 'struct snd_pcm_ops.pointer()' changing every 1.986-2.015 msec. This is the reason to use 'SNDRV_PCM_INFO_BATCH'.
(I think we use these two flags in the same way as Lars explained: http://mailman.alsa-project.org/pipermail/alsa-devel/2015-June/093750.html)
(In my understanding, 'SNDRV_PCM_INFO_BATCH' means that 'struct snd_pcm_ops.pointer()' returns a correct value within the PCM period/buffer, but the value does not advance smoothly against the actual time frame. Therefore, several periods must be included in the buffer.)
Here I describe two aspects: one is period size and boundary crossing, the other is rewindability.
* Period size and boundary crossing: all ALSA FireWire drivers have a restriction that the minimum period size is equivalent to 5 msec. Our intention is that at least one callback from software interrupt context occurs while processing PCM frames equivalent to one period. (5 msec is for safety; 3 msec may be OK, I think. See the query sketch after this list.)
* Rewindability: during the 0.024-0.026 msec processing window, snd_pcm_rewindable() returns a value based on the current pointer position in snd-firewire-lib's internal representation, but that value changes immediately, so the retrieved value quickly becomes stale. During the 1.986-2.015 msec interval between callbacks, snd_pcm_rewindable() returns a frozen value, because no PCM frames are being processed for transmission.
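The 5 msec minimum mentioned in the first item is visible to applications before they choose a configuration. Here is a minimal sketch (the device name is only an example) that queries the lower bound through the hw_params interface:

#include <alsa/asoundlib.h>
#include <stdio.h>

/* Query the smallest period size/time the driver will accept, before
 * narrowing the configuration space.  "hw:1" is only an example name. */
int main(void)
{
        snd_pcm_t *pcm;
        snd_pcm_hw_params_t *hw;
        snd_pcm_uframes_t min_frames;
        unsigned int min_us;
        int dir;

        if (snd_pcm_open(&pcm, "hw:1", SND_PCM_STREAM_PLAYBACK, 0) < 0)
                return 1;
        snd_pcm_hw_params_alloca(&hw);
        snd_pcm_hw_params_any(pcm, hw);            /* full configuration space */
        snd_pcm_hw_params_get_period_size_min(hw, &min_frames, &dir);
        snd_pcm_hw_params_get_period_time_min(hw, &min_us, &dir);
        printf("minimum period: %lu frames, %u usec\n",
               (unsigned long)min_frames, min_us);
        snd_pcm_close(pcm);
        return 0;
}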
By the way, about the packetization and PCM frame processing, I've written a report: https://github.com/takaswie/alsa-firewire-report
If you want to post questions about the ALSA FireWire stack, please read this report and consider your issues before posting, Raymond. Of course, I welcome any questions or remarks about the content of my report.
Regards
Takashi Sakamoto
15.06.2015 13:03, Lars-Peter Clausen wrote:
On 06/12/2015 02:29 PM, Arun Raghavan wrote:
So I guess the first thing that would be nice to have is a clear meaning of these two flags. With this done, we could potentially get to the API to report the transfer size from the driver.
Yeah, the meaning of those flags is somewhat fuzzy and may have changed over time as well. Here is my understanding of the flags; it might not necessarily be 100% correct.
SNDRV_PCM_INFO_BLOCK_TRANSFER means that the data is copied from the user-accessible buffer in blocks of one period. Typically these kinds of devices have some dedicated audio memory that is not accessible via normal memory access, and a DMA is set up to copy data from main memory to the dedicated memory. This DMA transfers the data from main memory to the dedicated memory in chunks of period size. But otherwise the controller might still be capable of reporting an accurate pointer position down to the sample/frame level.
So SNDRV_PCM_INFO_BLOCK_TRANSFER is mainly important for rewind handling and devices with that flag set might need additional headroom since the data up to one period after the pointer position has already been copied to the dedicated memory and hence can no longer be overwritten.
SNDRV_PCM_INFO_BATCH on the other hand has come to mean that the device is only capable of reporting the audio pointer with a coarse granularity. Typically this means a period-sized granularity, but there are some other cases as well.
I have tried to convert this text into something that can be added to the doxygen documentation in alsa-lib, but failed to verify the text. Here is why.
In kernel sources, sound/pci/hda/hda_controller.c mentions SNDRV_PCM_INFO_BLOCK_TRANSFER. However, sub-period rewinds work fine on this driver, and the avail granularity is something like 64 bytes. So, either the description is wrong (i.e., "large blocks, possibly up to one period size, but 64 bytes still counts as large" is actually meant - but that would be useless), or the flag is wrongly set in the driver.
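As an aside, both flags (and the FIFO size mentioned earlier in the thread) are visible to applications through alsa-lib. A minimal sketch, assuming an already-configured PCM handle:

#include <alsa/asoundlib.h>
#include <stdio.h>

/* Report how the driver describes itself after the hw_params have been
 * installed.  "pcm" is an already-configured handle. */
static void report_caps(snd_pcm_t *pcm)
{
        snd_pcm_hw_params_t *hw;

        snd_pcm_hw_params_alloca(&hw);
        if (snd_pcm_hw_params_current(pcm, hw) < 0)
                return;
        printf("BATCH (coarse pointer):        %d\n",
               snd_pcm_hw_params_is_batch(hw));
        printf("BLOCK_TRANSFER (block copies): %d\n",
               snd_pcm_hw_params_is_block_transfer(hw));
        printf("FIFO size (frames):            %d\n",
               snd_pcm_hw_params_get_fifo_size(hw));
}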
Alexander E. Patrakov wrote:
15.06.2015 13:03, Lars-Peter Clausen wrote:
So SNDRV_PCM_INFO_BLOCK_TRANSFER is mainly important for rewind handling and devices with that flag set might need additional headroom since the data up to one period after the pointer position has already been copied to the dedicated memory and hence can no longer be overwritten.
In kernel sources, sound/pci/hda/hda_controller.c mentions SNDRV_PCM_INFO_BLOCK_TRANSFER. However, sub-period rewinds work fine on this driver, and the avail granularity is something like 64 bytes.
HDA is a very typical PCI controller; if this flag were correct here, pretty much _every_ driver would need it.
Some (older) HDA controllers have problems with position reporting (with workarounds in the drivers), but those problems are with the timing, not with the granularity.
As far as I can see, snd-hda-intel should just drop this flag.
Regards, Clemens
27.06.2015 22:15, Clemens Ladisch wrote:
HDA is a very typical PCI controller; if this flag were correct here, pretty much _every_ driver would need it.
Well, there are more PCI drivers with this flag than without it. So it looks like a "typical error".
aep@aep-haswell /usr/src/linux/sound $ grep -l SNDRV_PCM_INFO_BLOCK_TRANSFER `grep -rl SNDRV_PCM_INFO_MMAP_VALID pci` | wc -l
48
aep@aep-haswell /usr/src/linux/sound $ grep -L SNDRV_PCM_INFO_BLOCK_TRANSFER `grep -rl SNDRV_PCM_INFO_MMAP_VALID pci` | wc -l
15
I guess I should write a program that tests sub-period rewinds and post it to the list.
In kernel sources, sound/pci/hda/hda_controller.c mentions SNDRV_PCM_INFO_BLOCK_TRANSFER. However, sub-period rewinds work fine on this driver, and the avail granularity is something like 64 bytes.
This is a hardware-specific feature which is essential for timer-based scheduling.
Even though snd-usb-audio uses a default number of packets for larger period sizes, it still needs to switch to a small period size for the lowest latency.
snd-hda-intel also sets .periods_max = 32, which forces the driver not to use a small period size when the application uses the maximum buffer size, so it still cannot adjust latency dynamically.
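To illustrate the arithmetic behind that last point, here is a minimal sketch (not taken from any driver) that derives the period-size floor implied by the maximum buffer size and .periods_max:

#include <alsa/asoundlib.h>
#include <stdio.h>

/* With periods_max limited (e.g. to 32), choosing the maximum buffer size
 * forces the period size to be at least buffer_max / periods_max. */
static void show_period_floor(snd_pcm_t *pcm)
{
        snd_pcm_hw_params_t *hw;
        snd_pcm_uframes_t buf_max;
        unsigned int periods_max;
        int dir;

        snd_pcm_hw_params_alloca(&hw);
        snd_pcm_hw_params_any(pcm, hw);
        snd_pcm_hw_params_get_buffer_size_max(hw, &buf_max);
        snd_pcm_hw_params_get_periods_max(hw, &periods_max, &dir);
        printf("buffer max: %lu frames, periods max: %u -> period >= %lu frames at full buffer\n",
               (unsigned long)buf_max, periods_max,
               (unsigned long)(buf_max / periods_max));
}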
http://lxr.free-electrons.com/ident?i=SNDRV_PCM_INFO_BLOCK_TRANSFER
SNDRV_PCM_INFO_BLOCK_TRANSFER is not present in those ISA sound cards which use I/O ports to transfer data.
Why does alsa-time-test.c use
if (cap == 0)
        r = snd_pcm_sw_params_set_avail_min(pcm, swparams, 1);
else
        r = snd_pcm_sw_params_set_avail_min(pcm, swparams, 0);
assert(r == 0);
What is an OSS wakeup point?
snd_pcm_sw_params_set_avail_min
This is similar to setting an OSS wakeup point. The valid values for 'val' are determined by the specific hardware. Most PC sound cards can only accept power of 2 frame counts (i.e. 512, 1024, 2048).
Why do the sw_params use a hardware-specific value?
Can this value be set to a granularity smaller than the period size, given that the function silently forces avail_min to be not less than the period size?
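For context, this is how avail_min is normally set from userspace, and reading it back shows what the PCM layer actually accepted; a minimal sketch (the 48-frame value is only an example):

#include <alsa/asoundlib.h>
#include <stdio.h>

/* Request a 48-frame wakeup point and read back the value the PCM layer
 * actually accepted.  "pcm" is an already-configured handle. */
static void set_wakeup(snd_pcm_t *pcm)
{
        snd_pcm_sw_params_t *sw;
        snd_pcm_uframes_t accepted;

        snd_pcm_sw_params_alloca(&sw);
        snd_pcm_sw_params_current(pcm, sw);
        snd_pcm_sw_params_set_avail_min(pcm, sw, 48);
        if (snd_pcm_sw_params(pcm, sw) < 0)
                return;
        snd_pcm_sw_params_current(pcm, sw);
        snd_pcm_sw_params_get_avail_min(sw, &accepted);
        printf("effective avail_min: %lu frames\n", (unsigned long)accepted);
}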
participants (8)
- Alexander E. Patrakov
- Arun Raghavan
- Clemens Ladisch
- David Henningsson
- Lars-Peter Clausen
- Raymond Yau
- Takashi Iwai
- Takashi Sakamoto