[alsa-devel] Some questions related to ALSA based duplex audio processing
Hi,
I have just started to develop a duplex audio processing application based on ALSA. My development goals are - of course - maximum stability as well as lowest possible delay (latency).
So, here is what I did on my Ubuntu laptop:
1) Setup of device/soundcard "hw:0,0" for processing at a samplerate of 44.1 kHz, buffersize 128 samples.
2) I created one input PCM device handle and one output PCM device handle.
3) I use the function "snd_async_add_pcm_handler" to install a callback function (ASYNC mode), one for input, one for output.
4) I use "snd_pcm_link" to synchronize both pcm handles.
5) I use the mmap'ed area access.
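In code, my skeleton looks roughly like this (just a sketch with error handling omitted; the S16 stereo format and the empty callback bodies are placeholders):

#include <alsa/asoundlib.h>
#include <unistd.h>

static void capture_cb(snd_async_handler_t *h)  { (void)h; /* read from the mmap'ed area */ }
static void playback_cb(snd_async_handler_t *h) { (void)h; /* write to the mmap'ed area  */ }

static int set_params(snd_pcm_t *pcm)
{
    snd_pcm_hw_params_t *hw;
    snd_pcm_uframes_t period = 128;

    snd_pcm_hw_params_alloca(&hw);
    snd_pcm_hw_params_any(pcm, hw);
    snd_pcm_hw_params_set_access(pcm, hw, SND_PCM_ACCESS_MMAP_INTERLEAVED);
    snd_pcm_hw_params_set_format(pcm, hw, SND_PCM_FORMAT_S16_LE);
    snd_pcm_hw_params_set_channels(pcm, hw, 2);
    snd_pcm_hw_params_set_rate(pcm, hw, 44100, 0);
    snd_pcm_hw_params_set_period_size_near(pcm, hw, &period, NULL);
    return snd_pcm_hw_params(pcm, hw);
}

int main(void)
{
    snd_pcm_t *cap, *play;
    snd_async_handler_t *cap_h, *play_h;

    /* note: mode 0, not SND_PCM_ASYNC - see question b) below */
    snd_pcm_open(&cap,  "hw:0,0", SND_PCM_STREAM_CAPTURE,  0);
    snd_pcm_open(&play, "hw:0,0", SND_PCM_STREAM_PLAYBACK, 0);
    set_params(cap);
    set_params(play);

    snd_async_add_pcm_handler(&cap_h,  cap,  capture_cb,  NULL);
    snd_async_add_pcm_handler(&play_h, play, playback_cb, NULL);

    snd_pcm_link(cap, play);   /* state changes now affect both handles  */
    snd_pcm_prepare(play);     /* prepares the linked pair               */
    /* a real program would pre-fill the playback buffer with silence here */
    snd_pcm_start(play);       /* starts capture and playback together   */

    pause();                   /* all audio work happens in the callbacks */
    return 0;
}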
My motivation for working this way is to maximize control over the processing behavior and to minimize the latency. E.g., I have understood from the docs that the snd_pcm_read/snd_pcm_write functions do little more than access the memory mapped areas, so I implement this access myself to have more control over it. And by using the async callbacks, I do not have to deal with blocking or non-blocking read functions or polling related issues.
So far, I have implemented the playback part and noticed some aspects which are not clear to me from the ALSA documentation:
a) What is the most efficient way to realize duplex audio processing - the async way with mmap'ed read/write that I follow?
b) At first, I created the pcm handles as follows: "snd_pcm_open (.., SND_PCM_ASYNC);". When starting to process audio (call of "snd_pcm_start"), I noticed that the registered callback functions do not get called. I had to change this to "snd_pcm_open (.., 0);". Is that really intended? It seems contradictory.
c) The function "snd_pcm_info_get_sync" is supposed to return a description of the synchronization behavior of a soundcard. In my case, I called this function for two soundcards (one USB and the laptop's integrated soundcard). In both cases, the returned content is all zeros. Should this not differ between the two devices?
d) By default, it seems that the callbacks for audio frames arrive in a thread with normal priority. On Windows, I am used to increasing the thread priority whenever starting my threads, but it is commonly not a good idea to increase the process priority. In the ALSA latency example, the PROCESS priority is increased (which can be done only with superuser privileges). What is the recommended way in Linux to achieve lower latencies?
In the future I will have to deal with synchronization of input and output, since the samples arrive at and leave my application in two independent callback functions. My plan: once input samples are available, I will store them in a buffer, and these samples will be output the next time there is space in the output ring buffer (a sketch of that buffer follows after question I below). In order to minimize the latency, I would need to know more about the intended exact behavior of the ALSA lib, which I did not find in the documentation:
I) If linking input and output with "snd_pcm_link", I have understood that the state changes for the input and output PCM handles will occur synchronously. That is, when performing an operation such as "snd_pcm_prepare" on one of the handles, both handles will be affected. However, what does this mean for audio processing on the frame level: if I use two callbacks for signal processing, installed for input and output respectively via "snd_async_add_pcm_handler", will these callbacks occur simultaneously? On Windows (speaking of ASIO sound), there is one callback in which input and output are handled simultaneously. Can I somehow set up ALSA to get similar behavior?
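The buffer mentioned above would be a minimal single-writer/single-reader ring; something like this sketch (the size and the interleaved stereo format are assumptions, and real code would still have to deal with concurrent access from the two callbacks):

#include <stdint.h>

#define RB_FRAMES 4096                    /* power of two (assumption) */

typedef struct {
    int16_t  buf[RB_FRAMES * 2];          /* interleaved stereo        */
    unsigned wr, rd;                      /* free-running counters     */
} ring_t;

static unsigned rb_filled(const ring_t *r) { return r->wr - r->rd; }
static unsigned rb_space(const ring_t *r)  { return RB_FRAMES - rb_filled(r); }

/* capture callback side: store one frame if there is room */
static int rb_push(ring_t *r, const int16_t fr[2])
{
    if (rb_space(r) == 0)
        return -1;
    unsigned i = (r->wr & (RB_FRAMES - 1)) * 2;
    r->buf[i] = fr[0];
    r->buf[i + 1] = fr[1];
    r->wr++;
    return 0;
}

/* playback callback side: fetch one frame if one is available */
static int rb_pop(ring_t *r, int16_t fr[2])
{
    if (rb_filled(r) == 0)
        return -1;
    unsigned i = (r->rd & (RB_FRAMES - 1)) * 2;
    fr[0] = r->buf[i];
    fr[1] = r->buf[i + 1];
    r->rd++;
    return 0;
}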
Thank you for any assistance and best regards
HK
On Sun, Jan 29, 2012 at 08:19:45PM +0100, public-hk wrote:
I have just started to develop a duplex audio processing application based on ALSA. My development goals are - of course - maximum stability as well as lowest possible delay (latency).
Have a look at the zita-alsa-pcmi library. It probably does all you need - it takes care of the zillions of calls necessary to initialise mmap access, and provides/accepts floating point audio samples regardless of the soundcard's format. No documentation ATM, but the examples should get you going. It's C++, but uses only the class mechanism - no templates, no dependency on C++ libraries, etc.
http://kokkinizita.linuxaudio.org/linuxaudio/downloads
Hi,
thank you very much for this hint. I noticed that there are also some other interesting publications on the referred webpage!
Best regards
HK
On 29.01.2012 21:31, Fons Adriaensen wrote:
On Sun, Jan 29, 2012 at 08:19:45PM +0100, public-hk wrote:
I have just started to develop a duplex audio processing application based on ALSA. My development goals are - of course - maximum stability as well as lowest possible delay (latency).
Have a look at the zita-alsa-pcmi library. It probably does all you need - it takes care of the zillions of calls necessary to initialise mmap access, and provides/accepts floating point audio samples regardless of the soundcard's format. No documentation ATM, but the examples should get you going. It's C++, but uses only the class mechanism - no templates, no dependency on C++ libraries, etc.
public-hk wrote:
So, here is what I did on my Ubuntu laptop:
- Setup of device/soundcard "hw:0,0" for processing at a samplerate of 44.1 kHz, buffersize 128 samples.
Please note that not all hardware supports these specific parameters.
- I created one input PCM device and one output PCM device handle
- I use the function "snd_async_add_pcm_handler" to install a callback function (ASYNC mode), one for input, one for output.
- I use "snd_pcm_link" to synchronize both pcm handles.
- I use the mmap'ed area access.
My motivation for working this way is to maximize control over the processing behavior and to minimize the latency. E.g., I have understood from the docs that the snd_pcm_read/snd_pcm_write functions do little more than access the memory mapped areas, so I implement this access myself to have more control over it.
Why do you want to duplicate snd_pcm_read/write? What do you "control" there, i.e., what are you doing differently?
And by using the async callbacks, I do not have to deal with blocking or non-blocking read functions or polling related issues.
But instead you have to deal with signal delivery. Besides being nonportable, you are not allowed to do anything useful inside a signal handler.
a) What is the most efficient way to realize duplex audio processing - the async way with mmap'ed read/write that I follow?
There isn't much difference in efficiency. If you count programmer time, these choices are the worst.
c) The function "snd_pcm_info_get_sync" is supposed to return a description of the synchronization behavior of a soundcard. In my case, I called this function for two soundcards (one USB and the laptop's integrated soundcard). In both cases, the returned content is all zeros. Should this not differ between the two devices?
These functions return useful values only if snd_pcm_hw_params_can_sync_start().
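I.e., check something like this before relying on the sync info (a sketch):

#include <alsa/asoundlib.h>

/* nonzero if the configured device reports synchronized-start support */
static int supports_sync_start(snd_pcm_t *pcm)
{
    snd_pcm_hw_params_t *hw;
    snd_pcm_hw_params_alloca(&hw);
    if (snd_pcm_hw_params_current(pcm, hw) < 0)
        return 0;
    return snd_pcm_hw_params_can_sync_start(hw);
}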
d) By default, it seems that the callbacks for audio frames arrive in a thread with normal priority.
Please don't mix signals and threads.
What is the recommended way in Linux to achieve lower latencies?
Use a small buffer size. Anything else doesn't really matter.
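For example (a fragment continuing from a typical hw-params setup; 'pcm' and 'hw' are assumed to be set up already, and the sizes are example values):

snd_pcm_uframes_t period = 64, buffer = 128;   /* 2 x 64 frames (example) */

snd_pcm_hw_params_set_period_size_near(pcm, hw, &period, NULL);
snd_pcm_hw_params_set_buffer_size_near(pcm, hw, &buffer);
/* the driver may round these up; 'period' and 'buffer' now hold the
   actually granted sizes, which determine the achievable latency */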
I) If linking input and output with "snd_pcm_link", I have understood that the state changes for the input and output PCM handles will occur synchronously. That is, when performing an operation such as "snd_pcm_prepare" on one of the handles, both handles will be affected.
And if the hardware doesn't have special hardware support, ALSA will just call both devices' start function one after the other.
However, what does this mean for audio processing on the frame level: if I use two callbacks for signal processing, installed for input and output respectively via "snd_async_add_pcm_handler", will these callbacks occur simultaneously?
This depends. If both buffers are configured with the same parameters, and if both devices run from the same sample clock, then both devices should be ready at approximately the same time. (A playback device needs to fill its FIFO before playing these samples, while a capture device needs to write its FIFO to memory after recording the samples, so you could expect the capture notification to be a little bit later, unless the hardware specifically avoids this.)
I have no clue how two simultaneous signals behave. You should use poll() so that you can wait for both devices being ready.
Regards, Clemens
Hi, thank you very much for your comments!
So, here is what I did on my Ubuntu laptop:
- Setup of device/soundcard "hw:0,0" for processing at a samplerate of 44.1 kHz, buffersize 128 samples.
Please note that not all hardware supports these specific parameters.
Yes, in the future I will use the core audio processing functionality within a GUI-based application, so that other devices with different configurations can be set up by the user.
- I created one input PCM device and one output PCM device handle
- I use the function "snd_async_add_pcm_handler" to install a callback function (ASYNC mode), one for input, one for output.
- I use "snd_pcm_link" to synchronize both pcm handles.
- I use the mmap'ed area access.
My motivation for working this way is to maximize control over the processing behavior and to minimize the latency. E.g., I have understood from the docs that the snd_pcm_read/snd_pcm_write functions do little more than access the memory mapped areas, so I implement this access myself to have more control over it.
Why do you want to duplicate snd_pcm_read/write? What do you "control" there, i.e., what are you doing differently?
The additional degree of freedom is that I can see the number of samples available in the mmap'ed buffer, whereas with read and write I only get notified that a specific number of samples has become available (based on the number of samples to be read/written specified when calling the function). I had the feeling that I can react in a more flexible way using mmap'ed buffers.
And by using the async callbacks, I do not have to deal with blocking or non-blocking read functions or polling related issues.
But instead you have to deal with signal delivery. Besides being nonportable, you are not allowed to do anything useful inside a signal handler.
Why is this nonportable? The sound APIs that I have dealt with before work, more or less by definition, based on callback mechanisms (ASIO, CoreAudio). What is the restriction on the processing that I plan to do within the signal handler? Is that a documented restriction? My understanding is that unless my code is too slow to be ready in time for the next delivery, there should be no problem.
A possible alternative realization would be to start a thread in which I do 1) pcm_read, 2) process audio samples, 3) pcm_write, in an infinite loop. In this case, however, the "read" would also block the "write" for a specific time. This architecture would a) introduce additional delay if I miss the next required "write" due to the blocking "read", and b) possibly reduce the available processing time (for "process audio samples") since I have "wasted" time in the blocking "read".
And using two distinct threads for input and output, one looping over "pcm_read" and the other looping over "process audio samples" followed by "pcm_write", would be a third option. This, however, would not be so different from my approach, but would increase the programming effort from my point of view (the effort to start and manage threads).
c) The function "snd_pcm_info_get_sync" is supposed to return a description of the synchronization behavior of a soundcard. In my case, I called this function for two soundcards (one USB and the laptop's integrated soundcard). In both cases, the returned content is all zeros. Should this not differ between the two devices?
These functions return useful values only if snd_pcm_hw_params_can_sync_start().
Ah, ok, I will test that.
d) By default, it seems that the callbacks for audio frames arrive in a thread with normal priority.
Please don't mix signals and threads.
Maybe I should be more precise at this point: I assume that the asynchronous callback handler functions are triggered repeatedly from within a thread which is started by the ALSA lib on processing startup (that is, one thread for input callbacks, one for output callbacks). I would want to raise the priority of these two threads. Maybe my assumption is wrong?
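For threads I create myself, I would have tried something like this (a sketch; the priority value is an assumption, and SCHED_FIFO needs root or a suitable RLIMIT_RTPRIO):

#include <pthread.h>
#include <sched.h>

/* raise the calling thread to realtime scheduling (error checks omitted) */
static void make_realtime(void)
{
    struct sched_param sp = { .sched_priority = 70 };
    pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
}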
However, what does this mean for audio processing on the frame level: if I use two callbacks for signal processing, installed for input and output respectively via "snd_async_add_pcm_handler", will these callbacks occur simultaneously?
This depends. If both buffers are configured with the same parameters, and if both devices run from the same sample clock, then both devices should be ready at approximately the same time. (A playback device needs to fill its FIFO before playing these samples, while a capture device needs to write its FIFO to memory after recording the samples, so you could expect the capture notification to be a little bit later, unless the hardware specifically avoids this.)
I have no clue how two simultaneous signals behave. You should use poll() so that you can wait for both devices being ready.
So, the conclusion is that it is not really defined and I have to deal with synchronization of input and output myself, is that the right interpretation?
Thank you again and best regards
HK
On Tue, Jan 31, 2012 at 12:39:55PM +0100, public-hk wrote:
So, the conclusion is that it is not really defined and I have to deal with synchronization of input and output myself, is that the right interpretation?
If the capture and playback devices use the same sample clock (which will be the case if they refer to the same soundcard), and they are started together, they will remain in sync and both sides will be ready at the same time - any difference will be trivially small compared to the period time.
If they don't use the same sample clock you have a difficult problem: you need adaptive resampling (which is the subject of my proposed LAC paper).
So in practice, if you use one and the same soundcard for input and output, there is nothing to be gained from keeping the two separate, and in most cases the driver will allow them to be 'linked', that is, started and stopped with a single call.
Waiting for ALSA devices using poll() can be complicated. In the general case there will be more than one filedesc, the poll events returned don't always correspond to the logical direction and have to be reworked, you need to take care of error recovery, etc. This is one of the many things zita-alsa-pcmi will do for you; just call pcm_wait(). When it returns you can read and write one period (in theory there could be more than one, but that is quite exceptional - I've never seen it happen).
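For one handle the raw version looks roughly like this (a sketch, with the error recovery left out):

#include <alsa/asoundlib.h>
#include <poll.h>

/* block until 'pcm' is ready; returns the translated poll events */
static unsigned short wait_for_pcm(snd_pcm_t *pcm)
{
    struct pollfd fds[8];                 /* assume at most 8 descriptors */
    unsigned short revents = 0;
    int n = snd_pcm_poll_descriptors_count(pcm);

    snd_pcm_poll_descriptors(pcm, fds, n);
    poll(fds, n, -1);
    /* the raw bits must be translated back into a PCM event */
    snd_pcm_poll_descriptors_revents(pcm, fds, n, &revents);
    return revents;                       /* POLLIN / POLLOUT / POLLERR   */
}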
Ciao,
Dear FA, dear Clemens,
thank you a lot for the valuable comments, I have learned a lot about Linux and ALSA from your comments!
Best regards
HK
public-hk wrote:
I have understood from the docs that the snd_pcm_read/snd_pcm_write functions do little more than access the memory mapped areas, so I implement this access myself to have more control over it.
Why do you want to duplicate snd_pcm_read/write? What do you "control" there, i.e., what are you doing differently?
The additional degree of freedom is that I can see the number of samples available in the mmap'ed buffer, whereas with read and write I only get notified that a specific number of samples has become available (based on the number of samples to be read/written specified when calling the function).
snd_pcm_avail()
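I.e. (a fragment; 'pcm' assumed):

snd_pcm_sframes_t n = snd_pcm_avail(pcm);  /* frames available right now */
if (n < 0)
    snd_pcm_recover(pcm, (int)n, 0);       /* e.g. recover from an xrun  */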
And by using the async callbacks, I do not have to deal with blocking or non-blocking read functions or polling related issues.
But instead you have to deal with signal delivery. Besides being nonportable, you are not allowed to do anything useful inside a signal handler.
Why is this nonportable?
Not every sound device type supports async callbacks.
The sound APIs that I have dealt with before work, more or less by definition, based on callback mechanisms (ASIO, CoreAudio). What is the restriction on the processing that I plan to do within the signal handler?
Signals are _not_ threads; they are sent to an existing process or thread, and interrupt it.
Most functions are not reentrant and therefore cannot be called from a signal handler. The only useful thing you _can_ do is to write to a pipe to wake up another thread that waits with poll() for this notification, but then you could just as well wait for the device itself.
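Roughly like this (a sketch of that pipe pattern; 'wake_pipe' is assumed to be created with pipe() during setup):

#include <alsa/asoundlib.h>
#include <unistd.h>

static int wake_pipe[2];   /* created once with pipe(wake_pipe) */

/* signal handler: write() is one of the few async-signal-safe calls */
static void async_cb(snd_async_handler_t *h)
{
    char c = 0;
    (void)h;
    write(wake_pipe[1], &c, 1);
}

/* a worker thread then poll()s wake_pipe[0] and performs the real
   PCM I/O outside of signal context */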
Is that a documented restriction?
http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html
(ALSA uses SIGIO. SIGEV_THREAD is not supported.)
A possible alternative realization would be to start a thread in which I do
- pcm_read
- process audio samples
- pcm_write
in an infinite loop. In this case, however, the "read" would also block the "write" for a specific time. This architecture would a) introduce additional delay if I miss the next required "write" due to the blocking "read", and b) possibly reduce the available processing time (for "process audio samples") since I have "wasted" time in the blocking "read".
Then use non-blocking mode. (Or use a library that does this for you.)
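E.g. (a fragment; 'pcm', 'buf' and 'frames' assumed):

#include <errno.h>

snd_pcm_nonblock(pcm, 1);                 /* switch the handle to non-blocking */

snd_pcm_sframes_t n = snd_pcm_readi(pcm, buf, frames);
if (n == -EAGAIN) {
    /* nothing to read yet: service the playback side instead of blocking */
} else if (n < 0) {
    snd_pcm_recover(pcm, (int)n, 0);
}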
Regards, Clemens
participants (3)
- Clemens Ladisch
- Fons Adriaensen
- public-hk