Re: [alsa-devel] Proposal for more reliable audio DMA.

On 24 Jun 2009, at 17:07, Jon Smirl <jonsmirl@gmail.com> wrote:

> Trying to directly expose the ring buffer to the app seems like a good
> way to avoid a copy, but it isn't achieving that. Pulse is not decoding audio straight into the ring buffer; it decodes first and then copies into the buffer. Pulse is using the timers to estimate the destination for the copy. Move this copy down into the drivers; the drivers know the correct destination for the copy.

Pulse isn't just doing straight playback; a large part of what it's there for is to do software mixing. When you have multiple sources active, Pulse is going to be forced to do the copy as part of the mixing process, so putting that bit of the buffer management in the kernel won't help in the way you think it does. Part of what's going on here is that the kernel code is trying to give userspace access to the data for as long as possible.
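
For illustration, the mixing copy being discussed amounts to roughly the loop below. This is only a sketch, not PulseAudio code, and the function name is invented.

#include <stdint.h>
#include <stddef.h>

/* Illustrative only: mix two signed 16-bit PCM streams into an output
 * buffer, clamping the sum to avoid wrap-around on overflow. This is the
 * copy Pulse cannot avoid once more than one stream is active. */
static void mix_s16(const int16_t *a, const int16_t *b,
                    int16_t *out, size_t nsamples)
{
    for (size_t i = 0; i < nsamples; i++) {
        int32_t sum = (int32_t)a[i] + (int32_t)b[i];
        if (sum > INT16_MAX)
            sum = INT16_MAX;
        else if (sum < INT16_MIN)
            sum = INT16_MIN;
        out[i] = (int16_t)sum;
    }
}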

On Wed, Jun 24, 2009 at 12:28 PM, Mark Brown <broonie@opensource.wolfsonmicro.com> wrote:

> On 24 Jun 2009, at 17:07, Jon Smirl <jonsmirl@gmail.com> wrote:
>> Trying to directly expose the ring buffer to the app seems like a good
>> way to avoid a copy, but it isn't achieving that. Pulse is not decoding audio straight into the ring buffer; it decodes first and then copies into the buffer. Pulse is using the timers to estimate the destination for the copy. Move this copy down into the drivers; the drivers know the correct destination for the copy.
>
> Pulse isn't just doing straight playback; a large part of what it's there for is to do software mixing. When you have multiple sources active, Pulse is going to be forced to do the copy as part of the mixing process, so putting that bit of the buffer management in the kernel won't help in the way you think it does. Part of what's going on here is that the kernel code is trying to give userspace access to the data for as long as possible.

Does this work as a use case?

Pulse is playing music in the background. A game wants to play a laser blast. Pulse has already sent a buffer into the kernel for the background music. Pulse makes a new buffer that contains the background mixed with the laser. It sends this new buffer into the kernel and says play with minimum latency.

The problem is knowing which sample in the background music to start mixing the low-latency laser blast into. ALSA will need to know this index to figure out where to switch onto the replacement buffer. This offset is dynamic and it depends on how much work Pulse is doing.

So you need two things: an estimate of the current playing sample and an estimate of the system latency to know where to start mixing.

Estimating the current sample can be done accurately by using a high-frequency, free-running counter. Drift can be compensated for by recording the count at the moments when the hardware accurately knows what sample it is on. Knowing how many samples to delay before mixing is a function of app latency, and it needs to be measured in a feedback loop.

I'm starting to think the OSS model is right and mixing belongs in the kernel.
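
A rough sketch of the estimator described above, with invented names; CLOCK_MONOTONIC stands in for whatever free-running hardware counter the platform really provides.

#include <stdint.h>
#include <time.h>

/* Sketch only: extrapolate the current playback sample from a free-running
 * timebase, resynchronising whenever the hardware reports an exact position
 * (e.g. at a descriptor or period boundary). */

struct pos_estimate {
    uint64_t ref_ns;      /* timebase reading at the last exact position */
    uint64_t ref_sample;  /* sample index the hardware reported then     */
    unsigned int rate;    /* sample rate in Hz                           */
};

static uint64_t timebase_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

/* Record a known-good (counter, sample) pair; this is what compensates for
 * drift between the audio clock and the timebase. */
static void pos_resync(struct pos_estimate *p, uint64_t exact_sample)
{
    p->ref_ns = timebase_ns();
    p->ref_sample = exact_sample;
}

/* Estimate which sample the hardware is playing right now. */
static uint64_t pos_now(const struct pos_estimate *p)
{
    uint64_t elapsed_ns = timebase_ns() - p->ref_ns;
    return p->ref_sample + elapsed_ns * p->rate / 1000000000ull;
}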

At Wed, 24 Jun 2009 15:07:17 -0400, Jon Smirl wrote:

> On Wed, Jun 24, 2009 at 12:28 PM, Mark Brown <broonie@opensource.wolfsonmicro.com> wrote:
>> On 24 Jun 2009, at 17:07, Jon Smirl <jonsmirl@gmail.com> wrote:
>>> Trying to directly expose the ring buffer to the app seems like a good
>>> way to avoid a copy, but it isn't achieving that. Pulse is not decoding audio straight into the ring buffer; it decodes first and then copies into the buffer. Pulse is using the timers to estimate the destination for the copy. Move this copy down into the drivers; the drivers know the correct destination for the copy.
>>
>> Pulse isn't just doing straight playback; a large part of what it's there for is to do software mixing. When you have multiple sources active, Pulse is going to be forced to do the copy as part of the mixing process, so putting that bit of the buffer management in the kernel won't help in the way you think it does. Part of what's going on here is that the kernel code is trying to give userspace access to the data for as long as possible.
>
> Does this work as a use case?
>
> Pulse is playing music in the background. A game wants to play a laser blast. Pulse has already sent a buffer into the kernel for the background music. Pulse makes a new buffer that contains the background mixed with the laser. It sends this new buffer into the kernel and says play with minimum latency.

When it's mmapped and can be updated on the fly, there is no need to resend the buffer. You can just rewrite it.

> The problem is knowing which sample in the background music to start mixing the low-latency laser blast into.

That's why querying the accurate hwptr is important in PA.

Takashi
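
A minimal sketch of the rewrite-in-place approach, using the alsa-lib mmap API (snd_pcm_rewindable, snd_pcm_rewind, snd_pcm_mmap_begin/commit); this is not PulseAudio's actual code.

#include <errno.h>
#include <alsa/asoundlib.h>

/* Sketch: sync the hw pointer, rewind over as much already-queued audio as
 * the driver allows, then let the caller write the newly mixed data over it.
 * A real implementation would loop until all rewound frames have been
 * rewritten and committed; this commits only the first contiguous chunk. */
static int rewrite_queued_audio(snd_pcm_t *pcm,
                                void (*remix)(const snd_pcm_channel_area_t *areas,
                                              snd_pcm_uframes_t offset,
                                              snd_pcm_uframes_t frames))
{
    const snd_pcm_channel_area_t *areas;
    snd_pcm_uframes_t offset, frames;
    snd_pcm_sframes_t rewindable, rewound, committed;
    int err;

    /* Make sure the hw pointer is up to date before deciding how far back
     * it is safe to go. */
    if ((err = snd_pcm_hwsync(pcm)) < 0)
        return err;

    rewindable = snd_pcm_rewindable(pcm);
    if (rewindable <= 0)
        return (int)rewindable;

    rewound = snd_pcm_rewind(pcm, (snd_pcm_uframes_t)rewindable);
    if (rewound <= 0)
        return (int)rewound;

    frames = (snd_pcm_uframes_t)rewound;
    if ((err = snd_pcm_mmap_begin(pcm, &areas, &offset, &frames)) < 0)
        return err;

    remix(areas, offset, frames);   /* caller mixes the new sound in here */

    committed = snd_pcm_mmap_commit(pcm, offset, frames);
    if (committed < 0 || (snd_pcm_uframes_t)committed != frames)
        return committed < 0 ? (int)committed : -EPIPE;
    return 0;
}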

On Wed, Jun 24, 2009 at 5:11 PM, Takashi Iwai <tiwai@suse.de> wrote:

>> The problem is knowing which sample in the background music to start mixing the low-latency laser blast into.
>
> That's why querying the accurate hwptr is important in PA.

I'm still not convinced that all of this logic should be exposed to PA. Exposing these details is what makes ALSA hard to use. We should be able to better isolate user space from this. If mixing were moved into the kernel, these details could be hidden. The in-kernel code could then be customized for the various sound DMA hardware. This would also go a long way toward getting rid of latency issues by removing the need for real-time response from PA.

My hardware doesn't have the capability of querying the hwptr, and the hwptr speed is not linear because of the FIFO and burst transfers. Non-linear speed means I can't use a clock to estimate the hwptr. I do, however, have the capability of redirecting the DMA into a new buffer. Another thing I could try is setting up DMA descriptor chain blocks for every 16 bytes; these descriptors get marked as they are used and they don't have to cause an interrupt (see the sketch below).

We are evaluating a processor change from PPC to ARM, so all of this may change for me.
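
A hypothetical sketch of the descriptor-scanning idea: the structure layout and flag bit are invented (not a real BestComm/SDMA format), and endianness and cache handling are ignored.

#include <linux/types.h>

/* The controller sets a done bit in each small descriptor as it consumes
 * it, so the driver's .pointer callback can locate the hardware position
 * by scanning the ring instead of taking an interrupt per block. */

#define DESC_BYTES  16          /* one descriptor per 16-byte burst */
#define DESC_DONE   0x1         /* written back by the DMA engine   */

struct small_desc {
    u32 flags;
    u32 src, dst, next;
};

/* Return the byte offset of the first descriptor the controller has not
 * completed yet, starting the scan at the last known position. */
static size_t scan_hw_position(struct small_desc *ring, size_t ndesc,
                               size_t last_idx)
{
    size_t i = last_idx;
    size_t scanned;

    for (scanned = 0; scanned < ndesc; scanned++) {
        if (!(ring[i].flags & DESC_DONE))
            break;
        ring[i].flags &= ~DESC_DONE;    /* re-arm for the next lap */
        i = (i + 1) % ndesc;
    }
    return i * DESC_BYTES;
}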

On 06/24/2009 10:26 PM, Jon Smirl wrote:

> On Wed, Jun 24, 2009 at 5:11 PM, Takashi Iwai <tiwai@suse.de> wrote:
>>> The problem is knowing which sample in the background music to start mixing the low-latency laser blast into.
>>
>> That's why querying the accurate hwptr is important in PA.
>
> I'm still not convinced that all of this logic should be exposed to PA. Exposing these details is what makes ALSA hard to use. We should be able to better isolate user space from this. If mixing were moved into the kernel, these details could be hidden. The in-kernel code could then be customized for the various sound DMA hardware. This would also go a long way toward getting rid of latency issues by removing the need for real-time response from PA.

Mixing really does not belong in the kernel. Moving it there doesn't remove any complication or problem; it just moves it to a different place where it's more difficult to program and less debuggable. Most OSes (Windows included) are moving in the direction of moving mixing out of the kernel, not into it.

For what PulseAudio is trying to do, it needs this kind of information because it wants to be able to rewrite the buffer the card is reading out of at any time, and it needs to know how far along in the buffer the card has read so it knows where it can start rewriting (see the sketch below). It's somewhat complicated for sure, but most normal applications don't have to deal with these kinds of details.

> My hardware doesn't have the capability of querying the hwptr, and the hwptr speed is not linear because of the FIFO and burst transfers. Non-linear speed means I can't use a clock to estimate the hwptr. I do, however, have the capability of redirecting the DMA into a new buffer. Another thing I could try is setting up DMA descriptor chain blocks for every 16 bytes; these descriptors get marked as they are used and they don't have to cause an interrupt.
>
> We are evaluating a processor change from PPC to ARM, so all of this may change for me.
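
For illustration, one way the "how far can I still rewrite" bookkeeping can be expressed with alsa-lib calls; the safety margin is arbitrary and this is not PulseAudio's implementation.

#include <alsa/asoundlib.h>

/* Sketch: figure out how much already-written audio is still ahead of the
 * card's read position, minus a margin covering scheduling jitter. */
static snd_pcm_sframes_t rewritable_frames(snd_pcm_t *pcm,
                                           snd_pcm_uframes_t buffer_size,
                                           snd_pcm_uframes_t safety_margin)
{
    snd_pcm_sframes_t avail, queued;

    /* Frames the application could write right now; everything else in the
     * buffer is queued data the card has not played yet. */
    avail = snd_pcm_avail_update(pcm);
    if (avail < 0)
        return avail;

    queued = (snd_pcm_sframes_t)buffer_size - avail;

    /* Leave the region just ahead of the hw pointer alone: by the time we
     * finish remixing, the card will have read a little further. */
    if (queued <= (snd_pcm_sframes_t)safety_margin)
        return 0;
    return queued - (snd_pcm_sframes_t)safety_margin;
}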

On Thu, Jun 25, 2009 at 1:36 AM, Robert Hancock <hancockrwd@gmail.com> wrote:

> On 06/24/2009 10:26 PM, Jon Smirl wrote:
>> On Wed, Jun 24, 2009 at 5:11 PM, Takashi Iwai <tiwai@suse.de> wrote:
>>>> The problem is knowing which sample in the background music to start mixing the low-latency laser blast into.
>>>
>>> That's why querying the accurate hwptr is important in PA.
>>
>> I'm still not convinced that all of this logic should be exposed to PA. Exposing these details is what makes ALSA hard to use. We should be able to better isolate user space from this. If mixing were moved into the kernel, these details could be hidden. The in-kernel code could then be customized for the various sound DMA hardware. This would also go a long way toward getting rid of latency issues by removing the need for real-time response from PA.
>
> Mixing really does not belong in the kernel. Moving it there doesn't remove any complication or problem; it just moves it to a different place where it's more difficult to program and less debuggable. Most OSes (Windows included) are moving in the direction of moving mixing out of the kernel, not into it.

Mixing has a real-time component to it. Currently desktop Linux doesn't have real-time support; that's why Pulse is developing RealTimeKit. Buggy real-time code can easily lock your machine up to the point where you need to hit the reset button.

User space code that is locked down with real-time priority and servicing interrupts is effectively kernel code; it might as well be in the kernel where it can get rid of the process overhead (see the sketch below).

http://git.0pointer.de/?p=rtkit.git;a=blob;f=README

> For what PulseAudio is trying to do, it needs this kind of information because it wants to be able to rewrite the buffer the card is reading out of at any time, and it needs to know how far along in the buffer the card has read so it knows where it can start rewriting. It's somewhat complicated for sure, but most normal applications don't have to deal with these kinds of details.
>
>> My hardware doesn't have the capability of querying the hwptr, and the hwptr speed is not linear because of the FIFO and burst transfers. Non-linear speed means I can't use a clock to estimate the hwptr. I do, however, have the capability of redirecting the DMA into a new buffer. Another thing I could try is setting up DMA descriptor chain blocks for every 16 bytes; these descriptors get marked as they are used and they don't have to cause an interrupt.
>>
>> We are evaluating a processor change from PPC to ARM, so all of this may change for me.
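
As a sketch of what "locked down with real-time priority" means at the API level: the plain POSIX route below needs CAP_SYS_NICE or an appropriate rtprio rlimit, which is exactly the gap RealTimeKit fills by granting the priority over D-Bus (the rtkit call itself is not shown here).

#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Request SCHED_FIFO for an audio thread. Fails with EPERM for an
 * unprivileged process, hence rtkit. */
static int make_thread_realtime(pthread_t thread, int priority)
{
    struct sched_param param;

    memset(&param, 0, sizeof(param));
    param.sched_priority = priority;    /* e.g. somewhere in 1..99 */

    return pthread_setschedparam(thread, SCHED_FIFO, &param);
}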

On 06/25/2009 08:20 AM, Jon Smirl wrote:

> On Thu, Jun 25, 2009 at 1:36 AM, Robert Hancock <hancockrwd@gmail.com> wrote:
>> On 06/24/2009 10:26 PM, Jon Smirl wrote:
>>> On Wed, Jun 24, 2009 at 5:11 PM, Takashi Iwai <tiwai@suse.de> wrote:
>>>>> The problem is knowing which sample in the background music to start mixing the low-latency laser blast into.
>>>>
>>>> That's why querying the accurate hwptr is important in PA.
>>>
>>> I'm still not convinced that all of this logic should be exposed to PA. Exposing these details is what makes ALSA hard to use. We should be able to better isolate user space from this. If mixing were moved into the kernel, these details could be hidden. The in-kernel code could then be customized for the various sound DMA hardware. This would also go a long way toward getting rid of latency issues by removing the need for real-time response from PA.
>>
>> Mixing really does not belong in the kernel. Moving it there doesn't remove any complication or problem; it just moves it to a different place where it's more difficult to program and less debuggable. Most OSes (Windows included) are moving in the direction of moving mixing out of the kernel, not into it.
>
> Mixing has a real-time component to it. Currently desktop Linux doesn't have real-time support; that's why Pulse is developing RealTimeKit. Buggy real-time code can easily lock your machine up to the point where you need to hit the reset button.
>
> User space code that is locked down with real-time priority and servicing interrupts is effectively kernel code; it might as well be in the kernel where it can get rid of the process overhead.

Just because it runs with real-time priority does not mean it is effectively kernel code, or that it belongs there. Putting the code into the kernel adds a bunch of extra challenges for little reason, and it also locks all users into a mixing scheme that may not meet their needs (ahem, Windows kernel mixer...).

On Wed, Jun 24, 2009 at 03:07:17PM -0400, Jon Smirl wrote:

> On Wed, Jun 24, 2009 at 12:28 PM, Mark Brown <broonie@opensource.wolfsonmicro.com> wrote:
>> it does. Part of what's going on here is that the kernel code is trying to give userspace access to the data for as long as possible.
>
> The problem is knowing which sample in the background music to start mixing the low-latency laser blast into. ALSA will need to know this index to figure out where to switch onto the replacement buffer. This offset is dynamic and it depends on how much work Pulse is doing.

Of course, some hardware is not going to allow the DMA controller to be reprogrammed while active, so it would need to either wait for a buffer boundary or update the data in the current buffer as is currently done.

> I'm starting to think the OSS model is right and mixing belongs in the kernel.

This isn't a kernel/user problem. Exactly the same issues come up if the code pushing data into the driver is in the kernel; it will still want as much information as possible about what the current status is.

Moving any non-hardware stuff into the kernel would create more problems than it solves. Remember that ALSA supports arbitrary plugin stacks - users could be doing signal processing on the data post-mix, for example doing soft EQ or 3D enhancement. Some of this can be done pre-mix, but it will always be less efficient and in some cases would interfere with the operation of the algorithms.

Remember also that hardware output is just one option for ALSA. You can also have output plugins that do things like send data over the network.

participants (4)
- Jon Smirl
- Mark Brown
- Robert Hancock
- Takashi Iwai