[alsa-devel] TWL4030 low-latency

Dear Alsa Devel List,
I would like to send some feedback on the TWL4030 driver when used on the Beagle Board. We have tested it under Ubuntu and Angstrom, and we find it does not work well at low latencies (e.g. input-to-output in 5ms, 10ms, or 20ms). Perhaps it was never tested extensively at these latencies. I thought I would send you this information in case you would look into it, or at least pass it on to the relevant contact. I can also run further tests if someone informs me what I should try.
1) To get the drivers to run at particularly low latencies (e.g. 20ms), it is necessary to operate the drivers in NON-realtime mode. It would interest me to know whether this means that the realtime functionality of the Beagle Board xM hardware or OS is hampered, or whether the TWL4030 driver is not operating up to specification. (If instead the TWL4030 driver is operating in realtime mode, then Jack stops running when an unrelated low-latency USB-serial device is opened.)
2) Some audio software (e.g. Pure Data, ChucK, etc.) will not connect to the TWL4030 ALSA drivers directly, instead they will only start when jack intermediates between the audio software and the TWL4030 ALSA drivers.
3) In some situations, the TWL4030 drivers appear not to start using the requested settings. For example, when we start jack requesting two periods per buffer, it is apparently getting initialized at eight periods per buffer, see the final lines below:
Calling $ jackd -r -dalsa -dhw:0 -p128 -n2 -i2 -o2 -s -S -r on the Beagle under Ubuntu 10.10 and Angstrom demo image results in the following
jackd 0.118.0
Copyright 2001-2009 Paul Davis, Stephane Letz, Jack O'Quinn, Torben Hohn and ot.
jackd comes with ABSOLUTELY NO WARRANTY
This is free software, and you are welcome to redistribute it
under certain conditions; see the file COPYING for details
JACK compiled with System V SHM support.
loading driver ..
apparent rate = 44100
creating alsa driver ... hw:0|hw:0|128|2|44100|2|2|nomon|swmeter|soft-mode|16bit
control device hw:0
configuring for 44100Hz, period = 128 frames (2.9 ms), buffer = 2 periods
ALSA: final selected sample format for capture: 16bit little-endian
ALSA: use **8** periods for capture
ALSA: final selected sample format for playback: 16bit little-endian
ALSA: use **8** periods for playback
Thanks very much for listening!
Best, Edgar Berdahl
Lecturer, Center for Computer Research in Music and Acoustics (CCRMA), Stanford University
http://ccrma.stanford.edu/courses/250a

Hello Edgar,
Please find my comments embedded.
On Fri, Oct 14, 2011 at 9:24 PM, Edgar Berdahl eberdahl@ccrma.stanford.edu wrote:
Dear Alsa Devel List,
I would like to send some feedback on the TWL4030 driver when used on the Beagle Board. We have tested it under Ubuntu and Angstrom, and we find it does not work well at low latencies (e.g. input-to-output in 5ms, 10ms, or 20ms). Perhaps it was never tested extensively at these latencies. I thought I would send you this information in case you would look into it, or at least pass it on to the relevant contact. I can also run further tests if someone informs me what I should try.
As the first question: what are the kernel versions in these distros? Just one clarification: the twl4030 driver itself is passive during audio activity - we set it up, and let it run. It needs no interaction. omap-pcm and omap-mcbsp are "running" during audio. McBSP feeds the audio via the I2S bus to twl4030, and sDMA takes care of the samples coming from/going to SDRAM.
- To get the drivers to run at particularly low latencies (e.g. 20ms), it is necessary to operate the drivers in NON-realtime mode. It would interest me to know whether this means that the realtime functionality of the Beagle Board xM hardware or OS is hampered, or whether the TWL4030 driver is not operating up to specification. (If instead the TWL4030 driver is operating in realtime mode, then Jack stops running when an unrelated low-latency USB-serial device is opened.)
I'm not sure what you mean by realtime/non-realtime. Is it something to do with jack? It is possible that jack does not like it if the pcm period interrupt has been delayed. This can happen when some driver disables interrupts for a long time, thus delaying the DMA interrupt handler, etc. It is not easy to figure out what is happening.
- Some audio software (e.g. Pure Data, ChucK, etc.) will not connect to the TWL4030 ALSA drivers directly, instead they will only start when jack intermediates between the audio software and the TWL4030 ALSA drivers.
No comment. I have used mplayer, pulseaudio, ecasound on Beagle without issues.
- In some situations, the TWL4030 drivers appear not to start using the requested settings. For example, when we start jack requesting two periods per buffer, it is apparently getting initialized at eight periods per buffer, see the final lines below:
Calling $ jackd -r -dalsa -dhw:0 -p128 -n2 -i2 -o2 -s -S -r on the Beagle under Ubuntu 10.10 and Angstrom demo image results in the following
jackd 0.118.0
Copyright 2001-2009 Paul Davis, Stephane Letz, Jack O'Quinn, Torben Hohn and ot.
jackd comes with ABSOLUTELY NO WARRANTY
This is free software, and you are welcome to redistribute it
under certain conditions; see the file COPYING for details
JACK compiled with System V SHM support.
loading driver ..
apparent rate = 44100
creating alsa driver ... hw:0|hw:0|128|2|44100|2|2|nomon|swmeter|soft-mode|16bit
control device hw:0
configuring for 44100Hz, period = 128 frames (2.9 ms), buffer = 2 periods
ALSA: final selected sample format for capture: 16bit little-endian
ALSA: use **8** periods for capture
ALSA: final selected sample format for playback: 16bit little-endian
ALSA: use **8** periods for playback
I think I can explain this: McBSP2 has a 1280-word-long buffer. We place a constraint that the ALSA buffer must not be smaller than this, because if it is smaller we can end up spinning on the buffer at stream start: sDMA will fill up the FIFO, and if the buffer is smaller than the FIFO, it will copy the buffer several times, or partially, depending on the size you have chosen.
You are requesting a period size of 128 frames, which occupies 256 words (I assume stereo samples). With that period size you will need at least 5 periods to cover the McBSP2 FIFO: 1280 / 256 = 5. It is another question why jack selects 8 periods when it could use 6, if it likes to have an even number of periods...
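The period arithmetic above can be sketched in a few lines of Python. This is only an illustration: it assumes one FIFO word per sample per channel, and `min_periods_for_fifo` is a hypothetical helper name, not something taken from the driver.

```python
def min_periods_for_fifo(fifo_words, period_frames, channels):
    """Smallest number of periods whose combined size covers the FIFO.

    Assumption: each sample of each channel occupies one FIFO word.
    """
    words_per_period = period_frames * channels
    # Integer ceiling division, so a partly needed period counts as a whole one.
    return -(-fifo_words // words_per_period)


# Values quoted in the thread: 1280-word FIFO, 128-frame periods, stereo.
print(min_periods_for_fifo(1280, 128, 2))
```

With the thread's values this gives 5, matching 1280 / 256 = 5.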

On 10/14/2011 02:45 PM, Ujfalusi, Peter wrote:
- To get the drivers to run at particularly low latencies (e.g. 20ms), it is necessary to operate the drivers in NON-realtime mode. It would interest me to know whether this means that the realtime functionality of the Beagle Board xM hardware or OS is hampered, or whether the TWL4030 driver is not operating up to specification. (If instead the TWL4030 driver is operating in realtime mode, then Jack stops running when an unrelated low-latency USB-serial device is opened.)
I'm not sure what you mean by realtime/non-realtime. Is it something to do with jack?
That's probably what he means. When you run JACK in "realtime mode", it runs the audio thread with SCHED_FIFO (realtime scheduling).
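To check whether a process is actually running under a realtime policy such as SCHED_FIFO, something like the following sketch may help. It is Linux-only, uses only the standard `os` module, and `scheduling_summary` is a hypothetical helper name; it only reads the current policy and does not change it.

```python
import os


def scheduling_summary(pid=0):
    """Describe the scheduling policy of a process (pid 0 is the caller)."""
    names = {
        os.SCHED_OTHER: "SCHED_OTHER",
        os.SCHED_FIFO: "SCHED_FIFO",
        os.SCHED_RR: "SCHED_RR",
    }
    policy = os.sched_getscheduler(pid)
    return {
        "policy": names.get(policy, str(policy)),
        # SCHED_FIFO and SCHED_RR are the realtime policies.
        "realtime": policy in (os.SCHED_FIFO, os.SCHED_RR),
        "fifo_priority_range": (
            os.sched_get_priority_min(os.SCHED_FIFO),
            os.sched_get_priority_max(os.SCHED_FIFO),
        ),
    }


print(scheduling_summary())
```

Passing the pid of a running jackd instead of 0 would show the policy of its main thread only; the audio thread may have a different policy.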
It is possible that jack does not like it if the pcm period interrupt has been delayed.
Yes, when running RT... jack will not like this at all. If the interrupt comes too late for the audio cycle, then the whole audio cycle is compromised.
This can happen when some driver disables interrupts for a long time, thus delaying the DMA interrupt handler, etc. It is not easy to figure out what is happening.
Edgar, are you guys running a vanilla kernel or an -rt kernel?
- Some audio software (e.g. Pure Data, ChucK, etc.) will not connect to the TWL4030 ALSA drivers directly, instead they will only start when jack intermediates between the audio software and the TWL4030 ALSA drivers.
No comment. I have used mplayer, pulseaudio, ecasound on Beagle without issues.
It's probably some manner of format negotiation that's the problem here. I.e. pd or ChucK is probably asking for an audio format that the device doesn't support. (E.g. they might be asking for F32_LE or an unsupported sample rate.) In contrast, "consumer" type apps tend to be less picky / more flexible about the sound card format.
- In some situations, the TWL4030 drivers appear not to start using the requested settings. For example, when we start jack requesting two periods per buffer, it is apparently getting initialized at eight periods per buffer, see the final lines below:
Calling $ jackd -r -dalsa -dhw:0 -p128 -n2 -i2 -o2 -s -S -r on the Beagle under Ubuntu 10.10 and Angstrom demo image results in the following
jackd 0.118.0
Copyright 2001-2009 Paul Davis, Stephane Letz, Jack O'Quinn, Torben Hohn and ot.
jackd comes with ABSOLUTELY NO WARRANTY
This is free software, and you are welcome to redistribute it
under certain conditions; see the file COPYING for details
JACK compiled with System V SHM support.
loading driver ..
apparent rate = 44100
creating alsa driver ... hw:0|hw:0|128|2|44100|2|2|nomon|swmeter|soft-mode|16bit
control device hw:0
configuring for 44100Hz, period = 128 frames (2.9 ms), buffer = 2 periods
ALSA: final selected sample format for capture: 16bit little-endian
ALSA: use **8** periods for capture
ALSA: final selected sample format for playback: 16bit little-endian
ALSA: use **8** periods for playback
I think I can explain this: McBSP2 has a 1280-word-long buffer. We place a constraint that the ALSA buffer must not be smaller than this, because if it is smaller we can end up spinning on the buffer at stream start: sDMA will fill up the FIFO, and if the buffer is smaller than the FIFO, it will copy the buffer several times, or partially, depending on the size you have chosen.
Hrm, yuck. Is S32_LE an option? This may be one way to compensate.
You are requesting a period size of 128 frames, which occupies 256 words (I assume stereo samples). With that period size you will need at least 5 periods to cover the McBSP2 FIFO: 1280 / 256 = 5. It is another question why jack selects 8 periods when it could use 6, if it likes to have an even number of periods...
No, jack only requires that it be more than 2. E.g. a lot of usb devices require 3. The suggestion that it should be 8 is probably just a best guess on behalf of jack's code.
-gabriel

Dear Peter,
On Oct 14, 2011, at 12:45 PM, Ujfalusi, Peter wrote:
As the first question: what are the kernel versions in these distros? Just one clarification: the twl4030 driver itself is passive during audio activity - we set it up, and let it run. It needs no interaction. omap-pcm and omap-mcbsp are "running" during audio. McBSP feeds the audio via the I2S bus to twl4030, and sDMA takes care of the samples coming from/going to SDRAM.
Thanks for clarifying that!
- To get the drivers to run at particularly low latencies (e.g. 20ms), it is necessary to operate the drivers in NON-realtime mode. It would interest me to know whether this means that the realtime functionality of the Beagle Board xM hardware or OS is hampered, or whether the TWL4030 driver is not operating up to specification. (If instead the TWL4030 driver is operating in realtime mode, then Jack stops running when an unrelated low-latency USB-serial device is opened.)
I'm not sure what you mean by realtime/non-realtime. Is it something to do with jack? It is possible that jack does not like it if the pcm period interrupt has been delayed. This can happen when some driver disables interrupts for a long time, thus delaying the DMA interrupt handler, etc. It is not easy to figure out what is happening.
I'm just starting to get experience with embedded linux systems. I assume that since they have less computational power, interrupt processing could use up a larger percentage of that CPU power on average.
- Some audio software (e.g. Pure Data, ChucK, etc.) will not connect to the TWL4030 ALSA drivers directly, instead they will only start when jack intermediates between the audio software and the TWL4030 ALSA drivers.
No comment. I have used mplayer, pulseaudio, ecasound on Beagle without issues.
Yes, as far as I know these programs work great! Audio latency is probably less of a concern for them.
- In some situations, the TWL4030 drivers appear not to start using the requested settings. For example, when we start jack requesting two periods per buffer, it is apparently getting initialized at eight periods per buffer, see the final lines below:
Calling $ jackd -r -dalsa -dhw:0 -p128 -n2 -i2 -o2 -s -S -r on the Beagle under Ubuntu 10.10 and Angstrom demo image results in the following
jackd 0.118.0
Copyright 2001-2009 Paul Davis, Stephane Letz, Jack O'Quinn, Torben Hohn and ot.
jackd comes with ABSOLUTELY NO WARRANTY
This is free software, and you are welcome to redistribute it
under certain conditions; see the file COPYING for details
JACK compiled with System V SHM support.
loading driver ..
apparent rate = 44100
creating alsa driver ... hw:0|hw:0|128|2|44100|2|2|nomon|swmeter|soft-mode|16bit
control device hw:0
configuring for 44100Hz, period = 128 frames (2.9 ms), buffer = 2 periods
ALSA: final selected sample format for capture: 16bit little-endian
ALSA: use **8** periods for capture
ALSA: final selected sample format for playback: 16bit little-endian
ALSA: use **8** periods for playback
I think I can explain this: McBSP2 has a 1280-word-long buffer.
Wow, that is a long buffer! There is no way to reduce the buffer size?
We place a constraint that the ALSA buffer must not be smaller than this, because if it is smaller we can end up spinning on the buffer at stream start: sDMA will fill up the FIFO, and if the buffer is smaller than the FIFO, it will copy the buffer several times, or partially, depending on the size you have chosen.
You are requesting a period size of 128 frames, which occupies 256 words (I assume stereo samples). With that period size you will need at least 5 periods to cover the McBSP2 FIFO: 1280 / 256 = 5. It is another question why jack selects 8 periods when it could use 6, if it likes to have an even number of periods...
I bet this is where a lot of the latency comes from!
Best, Edgar
PS. We have tried several USB audio interfaces with the Beagle, with varying degrees of success. For one of them (Guitar Link UCG102), I put it on a scope just to check that the Beagle could actually go as low as 6ms of input-output audio latency with a USB audio interface. For that interface, the Pure Data software was able to open the audio interface directly using ALSA.

Hi Edgar,
On Mon, Oct 17, 2011 at 4:39 AM, Edgar Berdahl eberdahl@ccrma.stanford.edu wrote:
I'm just starting to get experience with embedded linux systems. I assume that since they have less computational power, interrupt processing could use up a larger percentage of that CPU power on average.
It is not just processing power. We have peripherals attached through slow buses. Some drivers just disable interrupts while they are communicating (when they should not). The embedded environment is quite different from normal desktops, and some tweaking might be needed to get optimal results on a given configuration.
- Some audio software (e.g. Pure Data, ChucK, etc.) will not connect to the TWL4030 ALSA drivers directly, instead they will only start when jack intermediates between the audio software and the TWL4030 ALSA drivers.
No comment. I have used mplayer, pulseaudio, ecasound on Beagle without issues.
Yes, as far as I know these programs work great! Audio latency is probably less of a concern for them.
BTW: PA also has a 'real time' mode, which basically means that it is using the kernel scheduling for audio timing. It does not use the audio interrupts (this should work fine now).
I think I can explain this: McBSP2 has a 1280-word-long buffer.
Wow, that is a long buffer! There is no way to reduce the buffer size?
I have some ideas, but it is a long shot. Basically by design it is not possible, but I might be able to get around it by misusing/configuring the McBSP and sDMA. I will try to hack something up for kernel 3.3.
I bet this is where a lot of the latency comes from!
I think you will see periods elapsing rapidly at the start of the playback, which should settle down after ~640 samples...

Ujfalusi, Peter wrote:
BTW: PA also has a 'real time' mode, which basically means that it is using the kernel scheduling for audio timing.
That mode isn't called 'real time' and has nothing to do with real time. It just uses the kernel timer instead of the sound card clock so that it can be woken up at irregular intervals. This mode doesn't work so well if the sound card doesn't allow the current sample position to be read precisely (many embedded DMA controllers have this problem).
Regards, Clemens

Ujfalusi, Peter <peter.ujfalusi <at> ti.com> writes:
McBSP2 has a 1280-word-long buffer.
Wow, that is a long buffer! There is no way to reduce the buffer size?
I have some ideas, but it is a long shot. Basically by design it is not possible, but I might be able to get around it by misusing/configuring the McBSP and sDMA. I will try to hack something up for kernel 3.3.
Hi Peter, is there a set of steps/hints/source that you could share for developers to experiment with these 'hacks' to the McBSP/sDMA? Is it a matter of recompiling a particular kernel module or lib?
I have verified Edgar's statements; I am not using Jack, just straight ALSA pcm commands, and I cannot get the stream to run at anything less than 23 ms: 128 frames x 8 periods. I can use a longer period size with fewer periods, but ALSA will always throw an error if I try a lower combined latency.
Your feedback will be greatly appreciated! -Jacinto
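The 23 ms figure follows directly from the buffer size. A minimal sketch of that arithmetic, assuming the 44.1 kHz rate shown in the jackd output earlier in the thread (`alsa_buffer_ms` is a hypothetical helper name):

```python
def alsa_buffer_ms(period_frames, periods, rate_hz):
    """Duration of the whole ALSA buffer in milliseconds."""
    return 1000.0 * period_frames * periods / rate_hz


# 128 frames x 8 periods at 44.1 kHz, the smallest configuration reported to work.
print(round(alsa_buffer_ms(128, 8, 44100), 2))
# 128 frames x 2 periods, the configuration originally requested from jackd.
print(round(alsa_buffer_ms(128, 2, 44100), 2))
```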

Hi Jacinto,
On 01/04/2012 08:35 PM, Jacinto Alvarez wrote:
Hi Peter, is there a set of steps/hints/source that you could share for developers to experiment with these 'hacks' to the McBSP/sDMA? Is it a matter of recompiling a particular kernel module or lib?
Unfortunately I was not able to spend time on this :(
I have verified Edgar's statements; I am not using Jack, just straight ALSA pcm commands, and I cannot get the stream to run at anything less than 23 ms: 128 frames x 8 periods. I can use a longer period size with fewer periods, but ALSA will always throw an error if I try a lower combined latency.
Not sure what you mean by latency. For me, the latency is the time needed to play out the sample at the DMA pointer in main memory. In the case of McBSP2 (which has a 1280-word-long buffer), its maximum is: (1280 / <number of channels>) / <sampling frequency>. For stereo samples that is 14.51 ms at 44.1 kHz and 13.3 ms at 48 kHz.
There's not that much we can do to reduce this. The ALSA buffer size needs to be at least 1280 / <number of channels> to avoid an error at stream start.
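The FIFO latency figures can be reproduced with a short calculation. This is a sketch of the formula as stated in this message, assuming one FIFO word per sample per channel; `fifo_latency_ms` is a hypothetical helper name.

```python
def fifo_latency_ms(fifo_words, channels, rate_hz):
    """Maximum time for a sample to pass through a full FIFO, in milliseconds."""
    frames_in_fifo = fifo_words / channels
    return 1000.0 * frames_in_fifo / rate_hz


# McBSP2 values from the thread: 1280-word FIFO, stereo.
print(round(fifo_latency_ms(1280, 2, 44100), 2))
print(round(fifo_latency_ms(1280, 2, 48000), 2))
```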
In McBSP element mode the McBSP FIFO will be kept full all the time (threshold is 0). This means that the McBSP will request one word from the DMA if there's a single free slot in the FIFO.
If you switch to threshold mode, things will be different under the hood (providing better power saving): the McBSP threshold value will be calculated according to the period size. In this mode the DMA will request a chunk of data based on the threshold value, so the FIFO will be filled with DMA bursts. You still have the FIFO-caused latency, but the system can rest between bursts. I think if you use McBSP threshold mode with 3x 5ms periods you should be fine (the ALSA buffer is going to be 15ms). The only thing you need to make sure of is that you fill up the 15ms ALSA buffer before you start the playback. At the start of the playback you will most likely see that 2 periods elapse, and pretty soon the third will also. From this point the time between periods will settle, and will keep the 5ms distance.
I hope this helps.
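A quick arithmetic check of the 3 x 5 ms suggestion, as a sketch only. It assumes the period is rounded down to whole frames and one FIFO word per sample per channel; `threshold_buffer_covers_fifo` is a hypothetical helper, not part of ALSA or the driver.

```python
def threshold_buffer_covers_fifo(period_ms, periods, rate_hz, channels,
                                 fifo_words=1280):
    """Return True if the ALSA buffer holds at least as many frames as the FIFO."""
    period_frames = rate_hz * period_ms // 1000  # whole frames per period
    buffer_frames = period_frames * periods
    fifo_frames = fifo_words / channels
    return buffer_frames >= fifo_frames


# 3 periods of 5 ms, stereo, at the two rates mentioned in the thread.
print(threshold_buffer_covers_fifo(5, 3, 44100, 2))
print(threshold_buffer_covers_fifo(5, 3, 48000, 2))
# Two 5 ms periods would fall short of the 640-frame FIFO.
print(threshold_buffer_covers_fifo(5, 2, 44100, 2))
```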

Dear Peter,
Sorry for the delay in getting back to you. I just wanted to follow up now that I got the Angstrom SDHC card and booted it to check the kernel version.
Edgar, are you guys running a vanilla kernel or an -rt kernel?
We are running vanilla kernels:
In Ubuntu:
ccrma@satellite:~$ uname -a
Linux omap 2.6.38.4-x3 #1 SMP Wed Apr 27 00:42:20 UTC 2011 armv7l GNU/Linux
In Angstrom:
root@beagleboard:~# uname -a
Linux beagleboard 2.6.32 #3 PREEMPT Wed Aug 18 15:53:03 UTC 2010 armv7l unknown
I think I can explain this: McBSP2 has a 1280-word-long buffer.
Wow, that is a long buffer! There is no way to reduce the buffer size?
I have some ideas, but it is a long shot. Basically by design it is not possible, but I might be able to get around it by misusing/configuring the McBSP and sDMA.
I would imagine that you could, in order to decrease the latency, initialize the buffer with all zeros and then only change part of the buffer with each interrupt. Or is there some hardware restriction?
I will try to hack something up for kernel 3.3.
I'm looking forward to testing it!
Best regards, Edgar
participants (6): Clemens Ladisch; Edgar Berdahl; Gabriel M. Beddingfield; Jacinto Alvarez; Peter Ujfalusi; Ujfalusi, Peter