[alsa-devel] Proposal for more reliable audio DMA.
This algorithm is fairly similar to what currently exists in ALSA, with a few modifications. I'm trying to come up with a guaranteed way to play glitch-free audio. Does this algorithm work, or could it be modified to work?
The main change is to track a high resolution timer as the means of estimating where in the buffer to insert new, low latency data. This tracking is done in a timer tick interrupt. Jiffies are too coarse, and not all hardware allows you to query the current DMA position.
The other change is to keep all of the buffers filled with silence if there isn't any pending data (not doing this caused problems in my mpc5200 AC97 driver).
Three buffers are used as a way to deterministically bound the DMA pointer without needing interrupts.
---
Use three chained buffers. Buffer size is samples/tick (or maybe 1.5 samples/tick). Initialize the buffers to silence. Set the end of buffer three to automatically terminate DMA.
Set the FIFO to the minimum the bus allows. Fill buffer one with the samples available, then start playing.
On each timer tick, call into the driver. If buffer one has finished playing, move it to the third position. The new last buffer is set to automatically stop DMA. Return the swap status to ALSA. If a swap happened, fill buffer three with silence or pending data. It is important to fill this buffer with silence if there is no pending data.
The size of the buffers needs to be large enough to cover the worst-case timer tick latency. Buffer two has to be large enough to ensure that the tick routine will notice buffer one has been played before buffer three starts.
If two buffers have been completed when the tick runs, let the third buffer finish without swapping. Since you don't know where you are in the third buffer, it is unsafe to swap at this point.
Playing the last buffer causes the end-of-buffer interrupt to fire. Call back into ALSA to alert it of the underrun and the need to restart DMA. If this callback happens, make the buffers larger.
Record the high accuracy timer (HPET) on each tick. Long term observation of buffer completion status in the tick handler will allow accurate computation of samples per HPET unit.
New low latency data that arrives can be inserted into the buffers dynamically. Use the high resolution timer source to estimate where to place it. Minimum FIFO ensures low latency play.
Design a minimum-power mode that allows a huge FIFO to be loaded (e.g. 128KB). Call into the driver to reset to the minimum latency, small FIFO mode.
Advantages:
1) If ALSA goes away, the hardware will stop on its own with no noise.
2) No need to know the current position of the DMA hardware.
3) Both low and high latency modes.
4) Audio interrupts are only generated as an error condition.
5) Behavior is deterministic. Nothing is left to guesswork.
2009/6/21 Jon Smirl jonsmirl@gmail.com:
New last buffer is set to automatically stop DMA.
How do you do this? DMA transfers on sound cards are a ring buffer. There is no automatic stop feature. You set the DMA pointers, start DMA going in a loop, and that is it. You can then stop the DMA on command, but not as a result of a buffer end.
On Mon, Jun 22, 2009 at 12:27 PM, James Courtier-Dutton james.dutton@gmail.com wrote:
2009/6/21 Jon Smirl jonsmirl@gmail.com:
New last buffer is set to automatically stop DMA.
How do you do this? DMA transfers on sound cards are a ring buffer. There is no automatic stop feature.
I don't know about all hardware, but all of the hardware I've worked with works both ways, ring or stop at the end.
DMA transfers for network packets wouldn't work in the ring buffer model, you need the stop at the end capability.
You set the DMA pointers, start DMA going in a loop, and that is it. You can then stop the DMA on command, but not as a result of a buffer end.
On Mon, Jun 22, 2009 at 12:43:52PM -0400, Jon Smirl wrote:
On Mon, Jun 22, 2009 at 12:27 PM, James
DMA transfers on sound cards are a ring buffer. There is no automatic stop feature.
I don't know about all hardware, but all of the hardware I've worked with works both ways, ring or stop at the end.
DMA transfers for network packets wouldn't work in the ring buffer model, you need the stop at the end capability.
Remember, you're working with a general purpose SoC which shares the DMA controller with a large selection of other hardware. A DMA controller that's part of a sound device and can't be used in anything else doesn't need to worry about any other applications.
On Tue, Jun 23, 2009 at 5:54 AM, Mark Brown broonie@opensource.wolfsonmicro.com wrote:
On Mon, Jun 22, 2009 at 12:43:52PM -0400, Jon Smirl wrote:
On Mon, Jun 22, 2009 at 12:27 PM, James
DMA transfers on sound cards are a ring buffer. There is no automatic stop feature.
I don't know about all hardware, but all of the hardware I've worked with works both ways, ring or stop at the end.
DMA transfers for network packets wouldn't work in the ring buffer model, you need the stop at the end capability.
Remember, you're working with a general purpose SoC which shares the DMA controller with a large selection of other hardware. A DMA controller that's part of a sound device and can't be used in anything else doesn't need to worry about any other applications.
From what I have observed, the current ALSA DMA design does not reliably deal with over/underrun. On the hardware I'm using it is possible to construct a system which will always behave predictably, but I can't build it using the ALSA driver interface.
These issues probably indicate that the DMA interface between ALSA and the driver has been designed at the wrong level. For example, those timers trying to fix glitches in HDA belong down in the HDA driver, not the core. Why did my DMA code need to peek back into the ALSA core at the appl pointer? The proliferation of flags on the DMA interface is also an indication that it is too low level.
I'm still working on solutions for my embedded application, but I may be forced to add private IOCTLs to the driver and bypass ALSA. That will work for me since I'm not building a general purpose system.
-- 
Jon Smirl jonsmirl@gmail.com
At Wed, 24 Jun 2009 10:10:35 -0400, Jon Smirl wrote:
On Tue, Jun 23, 2009 at 5:54 AM, Mark Brown broonie@opensource.wolfsonmicro.com wrote:
On Mon, Jun 22, 2009 at 12:43:52PM -0400, Jon Smirl wrote:
On Mon, Jun 22, 2009 at 12:27 PM, James
DMA transfers on sound cards are a ring buffer. There is no automatic stop feature.
I don't know about all hardware, but all of the hardware I've worked with works both ways, ring or stop at the end.
DMA transfers for network packets wouldn't work in the ring buffer model, you need the stop at the end capability.
Remember, you're working with a general purpose SoC which shares the DMA controller with a large selection of other hardware. A DMA controller that's part of a sound device and can't be used in anything else doesn't need to worry about any other applications.
From what I have observed the current ALSA DMA design does not reliably deal with over/underrun. On the hardware I'm using it is possible to construct a system which will always behave predictably but I can't build it using the ALSA driver interface.
That's true for your hardware. But not for most hardware with simple "setup-go-and-dont-touch-anymore" style DMA.
These issues probably indicates that the DMA interface between ALSA and the driver has been designed at the wrong level.
Partly true. ALSA PCM was designed for the typical ISA/PCI DMA transfer model, not for embedded devices. (BTW, this means that your proposal can't be applied easily to most of these devices, because their DMA setup cannot be changed at all while DMA is running...)
For example those timers trying to fix glitches in HDA belong down in the HDA driver, not the core.
Basically, XRUN can be avoided very easily: simply use a large enough buffer. The remaining question is how to fill the buffer. If you don't want hardware interrupts from the sound chip, you need some other timing source to sync with the position. But XRUN is simply a matter of the buffer size and the latency of the system.
The glitch-free problem of PA comes from the fact that PA assumes that the driver returns the current hw position accurately at any time. But on much hardware, including HDA, this is not true. The hardware lies. It doesn't report the right position at all. Thus, there are many workarounds implemented on the HD-audio side.
So, before a discussion goes chaotic, I'd like to separate two issues:
- how to avoid XRUN
- how to detect XRUN
The former is what I mentioned in the above.
The latter pretty depends on the hardware, and your proposal would help (if it were possible for the target hardware).
Actually, for me, your proposal looks rather like a redesign of PCM core to fit better with specific embedded devices. That's fine, and I've been thinking of a way to improve the core model. But, it would merely help for "reliability" in general, if you look at all devices we must support.
thanks,
Takashi
On Wed, Jun 24, 2009 at 10:39 AM, Takashi Iwai tiwai@suse.de wrote:
The glitch-free problem of PA comes from the fact that PA assumes that the driver returns the current hw position accurately at any time. But on much hardware, including HDA, this is not true. The hardware lies. It doesn't report the right position at all. Thus, there are many workarounds implemented on the HD-audio side.
Why does pulse need to know the DMA position?
If it is so that it can write into the buffer with minimal latency there are other ways to accomplish that. The simplest is to just add an entry into ALSA core that says, play this buffer with minimal latency. That would let the transfer be pushed down into the specific driver and that driver could handle it in an optimal way.
Optimal on my hardware would be to reprogram the DMA hardware to immediately start playing from the new buffer. No copies involved. The FIFO will hide the buffer swap from the audio hardware.
Optimal on ring buffer hardware would be to locate where DMA was in the ring and copy to a position in front of it.
These hardware differences should be hidden from pulse.
At Wed, 24 Jun 2009 11:14:33 -0400, Jon Smirl wrote:
On Wed, Jun 24, 2009 at 10:39 AM, Takashi Iwai tiwai@suse.de wrote:
The glitch-free problem of PA comes from the fact that PA assumes that the driver returns the current hw position accurately at any time. But on much hardware, including HDA, this is not true. The hardware lies. It doesn't report the right position at all. Thus, there are many workarounds implemented on the HD-audio side.
Why does pulse need to know the DMA position?
If it is so that it can write into the buffer with minimal latency there are other ways to accomplish that. The simplest is to just add an entry into ALSA core that says, play this buffer with minimal latency. That would let the transfer be pushed down into the specific driver and that driver could handle it in an optimal way.
To get minimum latency, you need to program the sound hardware to notify you. And most hardware can't do anything but issue IRQs periodically at buffer, fragment or period or whatever boundary.
Instead of the hardware IRQs (which can wake up too often), PA uses the timer. Then it must check the position, because the clock on the hardware isn't quite accurate and very likely you'll get drift sooner or later if you use another timer source.
Yes, there can be optimizations for hardware that is capable of notifying for arbitrarily sized free chunks. This is a missing piece.
Optimal on my hardware would be to reprogram the DMA hardware to immediately start playing from the new buffer. No copies involved. The FIFO will hide the buffer swap from the audio hardware.
Yes. But it's rather a rare case that you can do such an operation.
Optimal on ring buffer hardware would be to locate where DMA was in the ring and copy to a position in front of it.
That's exactly what PA does. But you must know where to copy beforehand, and the hardware lies about where the current position is.
Takashi
On Wed, Jun 24, 2009 at 11:24 AM, Takashi Iwai tiwai@suse.de wrote:
At Wed, 24 Jun 2009 11:14:33 -0400, Jon Smirl wrote:
On Wed, Jun 24, 2009 at 10:39 AM, Takashi Iwai tiwai@suse.de wrote:
The glitch-free problem of PA comes from the fact that PA assumes that the driver returns the current hw position accurately at any time. But on much hardware, including HDA, this is not true. The hardware lies. It doesn't report the right position at all. Thus, there are many workarounds implemented on the HD-audio side.
Why does pulse need to know the DMA position?
If it is so that it can write into the buffer with minimal latency there are other ways to accomplish that. The simplest is to just add an entry into ALSA core that says, play this buffer with minimal latency. That would let the transfer be pushed down into the specific driver and that driver could handle it in an optimal way.
To get minimum latency, you need to program the sound hardware to notify you. And most hardware can't do anything but issue IRQs periodically at buffer, fragment or period or whatever boundary.
Instead of the hardware IRQs (which can wake up too often), PA uses the timer. Then it must check the position, because the clock on the hardware isn't quite accurate and very likely you'll get drift sooner or later if you use another timer source.
Pulse should not need to mess with this. Doesn't pulse need to do just two things?
1) play this buffer as soon as possible - abandon previously queued samples
2) ALSA callback saying I have room for more data in my queue
2a) ALSA callback saying an underrun happened
When pulse gets new low latency data from a client app, it should make a new buffer, call into the ALSA core and say, play this buffer ASAP. This new buffer would replace the old one. Pulse does not need to know where the DMA pointer is or mess with timers. Messing with those should be internal to ALSA and its drivers. The ring buffer hardware model should not be exposed to pulse, especially since not all hardware has ring buffers.
Inside ALSA this play-ASAP buffer should just be passed into the driver. Only the driver really knows how to play something ASAP. The implementation of play-ASAP will be quite different on ring buffer hardware, scatter/gather DMA hardware, or hardware that needs indirect access to the buffer.
Trying to directly expose the ring buffer to the app seems like a good way to avoid a copy, but it isn't achieving that. Pulse is not decoding audio straight into the ring buffer; it decodes first and then copies into the buffer. Pulse is using the timers to estimate the destination for the copy. Move this copy down into the drivers; the drivers know the correct destination for the copy.
Yes, there can be optimizations for hardware that is capable of notifying for arbitrarily sized free chunks. This is a missing piece.
Optimal on my hardware would be to reprogram the DMA hardware to immediately start playing from the new buffer. No copies involved. The FIFO will hide the buffer swap from the audio hardware.
Yes. But it's rather a rare case that you can do such an operation.
Optimal on ring buffer hardware would be to locate where DMA was in the ring and copy to a position in front of it.
That's exactly what PA does. But you must know where to copy beforehand, and the hardware lies about where the current position is.
Takashi
2009/6/24 Jon Smirl jonsmirl@gmail.com:
If it is so that it can write into the buffer with minimal latency there are other ways to accomplish that. The simplest is to just add an entry into ALSA core that says, play this buffer with minimal latency. That would let the transfer be pushed down into the specific driver and that driver could handle it in an optimal way.
"minimal latency" is not the only requirement. Another is "play this sample at a predictable time." in order to ensure it plays in sync with video for example. The real problem is ensuring that the application reacts in time to fill up the buffer before it empties. So, the ideal would be a way to ensure that "process X gets woken just before the buffer empties". One way to trigger this is via the sound card hardware interrupt, another is by using the global timer to wake one up at a particular time. Using the timer is probably better because the wake up is then more granular, but one then needs a way to sync the timer with the hardware clock on the sound card. I would like a kernel scheduling api that could reliably do "wake me up at nanosecond X" but none exists. I can do "nanosleep(X)" which will maybe wake me up in X nanoseconds, but it is not very reliable. Another useful scheduling api would be "make sure process X is woken up after me".