[alsa-devel] need help with io plugin programming: how to add delay ?

Hi,
I would like to write a simple alsa io plugin that takes PCM samples and sends them to a socket in a timely manner (this would be playback). Later on it should work the other way round as well (recording). (Well, actually I have been working on this for a month already, but so far I have not managed to get a working solution.)
However, if the PCM stream comes from something prerecorded (e.g. a wav file), there needs to be some kind of delay. If, for example, the wav file has a sampling rate of 8 kHz with 16 bit resolution and one period consists of 160 samples, it is necessary to write exactly one period every 20 ms (160 samples / 8000 samples per second = 20 ms). For this reason delay needs to be added somewhere so that exactly one period is sent to the socket every 20 ms. Otherwise all the PCM samples would just be written to the socket immediately, without any delay.
I read a lot of other alsa plugin code but still I have found no way to properly solve the delay problem.
* Where and how do I need to add delay ??
My first approach was to add delay by sleeping in the transfer callback function. After I had implemented that solution (and it did not work that well) I was told that this is not the proper way to do it.
* What is the proper way then ?
I was told that the alsa plugin framework is poll based and I should come up with a poll based solution. However, I can not just let the alsa io plugin poll for POLLOUT on the socket as this would not add any delay at all. (It would just wait until new data can be written to the socket.)
After looking at the bluetooth alsa plugin I came up with another idea (and my second approach):
Basically it should be possible to create an independent timing thread in the alsa plugin that writes dummy data (e.g. a single byte) to a pipe each period (thus in my case each 20ms).
The plugin would then poll for POLLIN on the above mentioned pipe and poll for POLLOUT on the socket at the same time. This way the transfer callback function would be called if and only if the following two conditions are met:
* 20 ms period time has passed
* the socket can accept new data
* Would this solution be the proper way ?
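(For concreteness, a minimal sketch of such a timer thread is shown below; the pipe, the 20 ms period and the use of clock_nanosleep() with an absolute deadline are illustrative choices, not code taken from the bluetooth plugin. A real plugin would poll the pipe's read end together with POLLOUT on the socket.)

/* sketch: timer thread that "ticks" a pipe every 20 ms, so that
 * poll() on the read end wakes up once per period */
#define _POSIX_C_SOURCE 200112L
#include <pthread.h>
#include <poll.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static int tick_pipe[2];                    /* [0] = read end, [1] = write end */

static void *timer_thread(void *arg)
{
    struct timespec next;
    (void)arg;
    clock_gettime(CLOCK_MONOTONIC, &next);

    for (;;) {
        /* advance the absolute deadline by one period (20 ms) */
        next.tv_nsec += 20 * 1000000L;
        if (next.tv_nsec >= 1000000000L) {
            next.tv_sec++;
            next.tv_nsec -= 1000000000L;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);

        /* one dummy byte per period */
        write(tick_pipe[1], "x", 1);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    struct pollfd fds[1];
    char dummy;

    pipe(tick_pipe);
    pthread_create(&tid, NULL, timer_thread, NULL);

    /* a real plugin would also poll POLLOUT on the socket here */
    fds[0].fd = tick_pipe[0];
    fds[0].events = POLLIN;

    for (;;) {
        poll(fds, 1, -1);
        read(tick_pipe[0], &dummy, 1);      /* consume the tick */
        printf("period elapsed, transfer one period now\n");
    }
}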
Another solution (my third one already) would be to introduce *no* delay in the alsa plugin itself, but do everything in the application that is listening on the socket. For this reason the application would read one period, process the data and then sleep for the rest of the time until the time for one period (i.e. 20ms) is over. After that the next period is read from the socket and so forth. The alsa plugin would then "automatically" write new data to the socket each time the application listening to that socket reads data and there is free space in the socket send buffer.
* Once again, would this be a proper solution ?
* If so, how do I know when to increment the hardware pointer ?
  - After the send() call in the transfer callback function ?
  - Using an independent thread in the plugin that just increments the hardware pointer every 20 ms ?
  - Something even more complex ?
My guess is that the third solution (delay is added in the application that listens to the socket) might work well as the delay is added at the end of the "audio processing chain" which makes it less sensitive to delays that are introduced in between.
* Is this correct ?
I would appreciate any help as I have invested way too much time in this already.
cheers, stefan

Hi,
Stefan Schoenleitner wrote:
Another solution (my third one already) would be to introduce *no* delay in the alsa plugin itself, but do everything in the application that is listening on the socket. For this reason the application would read one period, process the data and then sleep for the rest of the time until the time for one period (i.e. 20ms) is over. After that the next period is read from the socket and so forth. The alsa plugin would then "automatically" write new data to the socket each time the application listening to that socket reads data and there is free space in the socket send buffer.
I now tested this idea by writing a simple socket reader and writer (see code below).
The writer just sends data (160 bytes each) over the socket as soon as the socket is available for writing. (For this reason if no one is reading from the other end of the socket, it just fills up the socket buffer and then blocks).
The counterpart of the writer is the reader application that should control the timing (as I mentioned in my previous post). For this reason it basically just reads a chunk of data (160 bytes each) from the socket and then sleeps until a certain amount of time (20ms, i.e. one period) is over.
However, the result is not satisfying at all as the delay between the received chunks varies and sometimes even takes a whole 15ms (!) longer (i.e. 35ms delay instead of 20ms).
Here is some example output of the reader:
last recv 23.691140ms ago
sleeping for 9ms
last recv 20.127908ms ago
sleeping for 9ms
last recv 20.105699ms ago
sleeping for 9ms
last recv 20.052133ms ago
sleeping for 9ms
last recv 20.116666ms ago
sleeping for 3ms
last recv 25.770387ms ago
sleeping for 6ms
last recv 21.651706ms ago
sleeping for 9ms
last recv 36.320541ms ago
sleeping for 5ms
last recv 20.348258ms ago
As the calculation of the sleep time uses an *absolute* point in time as reference (which is taken before starting to poll() and recv() from the socket), any delay after the time measurement and before the actual sleep does not influence the precision, right ? For this reason I have no idea where the high delay variations are coming from. I mean how should it work with an alsa plugin if not even this simple simulation code works in a timely and precise manner ?
* What am I doing wrong ?
I thought of adding buffering so that the socket read operations do not have a strict timing schedule. However, as I said earlier, if the time for socket polling or recv varies, it does not influence the precision as an absolute point of time is used as timeout and the (absolute) start time measurement is done *before* all the socket operations. Thus I think that for this reason adding buffering would not make much difference here, right ?
Maybe you can have a quick look at the code snippets ? The reader is pretty much what I would be doing in my daemon that should receive the PCM samples from the alsa plugin.
cheers, stefan
---------------------------- reader.c ----------------------------
#define _POSIX_C_SOURCE 199309L
#define _BSD_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <string.h>
#include <errno.h>
#include <pthread.h>
#include <poll.h>
#include <assert.h>

#define SOCK_PATH      "/tmp/pcm.socket"
#define PCM_BUFFERSIZE 160

/* adapted from glibc sys/time.h timersub() macro */
#define priv_timespecsub(a, b, result)                      \
    do {                                                    \
        (result)->tv_sec  = (a)->tv_sec  - (b)->tv_sec;     \
        (result)->tv_nsec = (a)->tv_nsec - (b)->tv_nsec;    \
        if ((result)->tv_nsec < 0) {                        \
            --(result)->tv_sec;                             \
            (result)->tv_nsec += 1000000000;                \
        }                                                   \
    } while (0)

#define priv_timespecadd(a, b, result)                      \
    do {                                                    \
        (result)->tv_sec  = (a)->tv_sec  + (b)->tv_sec;     \
        (result)->tv_nsec = (a)->tv_nsec + (b)->tv_nsec;    \
        if ((result)->tv_nsec >= 1000000000L) {             \
            ++(result)->tv_sec;                             \
            (result)->tv_nsec -= 1000000000L;               \
        }                                                   \
    } while (0)

pthread_mutex_t timer_mutex     = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  timer_condition = PTHREAD_COND_INITIALIZER;

int main(void)
{
    int s, s2, t, len, ret;
    struct sockaddr_un local, remote;
    char buf[PCM_BUFFERSIZE];
    struct timespec start, now, period_time, delta, timeout;
    struct timespec prev, delta_prev;
    struct pollfd fds;
    unsigned int poll_timeout;

    // one period is 20ms
    period_time.tv_sec  = 0;
    period_time.tv_nsec = 20 * 1000000L;
    poll_timeout = period_time.tv_nsec / 1000000L;

    if ((s = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) {
        perror("socket");
        exit(1);
    }

    local.sun_family = AF_UNIX;
    strcpy(local.sun_path, SOCK_PATH);
    unlink(local.sun_path);
    len = strlen(local.sun_path) + sizeof(local.sun_family);
    if (bind(s, (struct sockaddr *)&local, len) == -1) {
        perror("bind");
        exit(1);
    }

    if (listen(s, 0) == -1) {
        perror("listen");
        exit(1);
    }

    start.tv_sec  = 0;
    start.tv_nsec = 0;

    for (;;) {
        int done, n;

        printf("Waiting for a connection...\n");
        t = (int)sizeof(remote);
        if ((s2 = accept(s, (struct sockaddr *)&remote, (unsigned int *)&t)) == -1) {
            perror("accept");
            exit(1);
        }

        printf("Connected.\n");

        done = 0;

        fds.fd     = s2;
        fds.events = POLLIN;

        // sleep for some time so that socket buffer fills up
        usleep(80000);

        clock_gettime(CLOCK_REALTIME, &start);

        do {
            memcpy(&prev, &start, sizeof(struct timespec));
            clock_gettime(CLOCK_REALTIME, &start);

            // use poll to check out when we can read
            if ((ret = poll(&fds, 1, poll_timeout)) < 0) {
                perror("poll");
                exit(1);
            }

            if (ret == 0) {
                printf("timeout --> underflow\n");
                continue;
            }

            assert(ret == 1 && (fds.revents & POLLIN));

            priv_timespecadd(&start, &period_time, &timeout);
            //printf("timeout at %li:%li\n", timeout.tv_sec, timeout.tv_nsec/1000000L);

            priv_timespecsub(&start, &prev, &delta_prev);
            printf("last recv %fms ago\n",
                   delta_prev.tv_sec*1000.0 + delta_prev.tv_nsec/1000000.0);

            n = recv(s2, buf, PCM_BUFFERSIZE, MSG_DONTWAIT);
            if (n <= 0) {
                if (n < 0)
                    perror("recv");
                done = 1;
                break;
            }

            // -------------------- BEGIN PROCESSING BLOCK --------------------
            assert(n == PCM_BUFFERSIZE);
            //printf("received %i bytes at %li:%li\n", n, start.tv_sec, start.tv_nsec/1000000L);

            // process data (requires some amount of time)
            usleep(10000);
            // -------------------- END PROCESSING BLOCK ----------------------

            // sleep for the rest of the time until the period is over
            clock_gettime(CLOCK_REALTIME, &now);
            //printf("now: %li:%li\n", now.tv_sec, now.tv_nsec/1000000L);
            priv_timespecsub(&timeout, &now, &delta);

            if ((now.tv_sec >= timeout.tv_sec) && (now.tv_nsec >= timeout.tv_nsec)) {
                printf("we're late, don't sleep\n");
            } else {
                printf("sleeping for %lims\n",
                       delta.tv_sec*1000 + delta.tv_nsec/1000000L);
                //nanosleep(&delta, NULL);

                // wait for absolute timeout using a fake pthread condition
                // which is never signaled
                pthread_mutex_lock(&timer_mutex);
                pthread_cond_timedwait(&timer_condition, &timer_mutex, &timeout);
                pthread_mutex_unlock(&timer_mutex);
            }
        } while (!done);

        close(s2);
    }

    return 0;
}
------------------------------------------------------------------
---------------------------- writer.c ----------------------------
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <poll.h>
#include <assert.h>

#define SOCK_PATH   "/tmp/pcm.socket"
#define PCM_BUFSIZE 160

int main(void)
{
    int s, len;
    struct sockaddr_un remote;
    char str[PCM_BUFSIZE];
    struct pollfd fds;
    int ret;
    int value;
    unsigned int optlen;

    if ((s = socket(AF_UNIX, SOCK_STREAM, 0)) == -1) {
        perror("socket");
        exit(1);
    }

    // set send buffer size
    // man 7 socket --> The minimum value for this option is 2048.
    value = 2048;
    if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &value, sizeof(value)) < 0) {
        perror("setsockopt()");
        return -1;
    }

    // check size
    optlen = sizeof(value);
    if (getsockopt(s, SOL_SOCKET, SO_SNDBUF, &value, &optlen) < 0) {
        perror("getsockopt()");
        return -1;
    }

    printf("optlen = %i, value: %i\n", optlen, value);
    assert(optlen == sizeof(value));

    printf("send buffer size is %i bytes\n", value);

    printf("Trying to connect...\n");

    remote.sun_family = AF_UNIX;
    strcpy(remote.sun_path, SOCK_PATH);
    len = strlen(remote.sun_path) + sizeof(remote.sun_family);
    if (connect(s, (struct sockaddr *)&remote, len) == -1) {
        perror("connect");
        exit(1);
    }

    printf("Connected.\n");

    // set up poll fds
    fds.fd     = s;
    fds.events = POLLOUT;       // writing now will not block

    while (1) {
        // poll socket to see if it's ready, timeout: 20 ms
        ret = poll(&fds, 1, 20);

        if (ret < 0) {
            perror("poll");
            exit(1);
        }

        if (ret == 0) {
            printf("timeout\n");
            continue;
        }

        if (fds.revents & (POLLERR | POLLHUP | POLLNVAL)) {
            printf("POLLERR | POLLHUP | POLLNVAL\n");
            exit(1);
        }

        if (fds.revents == POLLOUT)
            printf("POLLOUT, we can write now ..\n");
        else
            printf("revent is not POLLOUT !\n");

        // send PCM_BUFSIZE bytes
        if ((ret = send(s, str, PCM_BUFSIZE, MSG_DONTWAIT)) < 0) {
            perror("send");
            exit(1);
        }

        printf("sent %i bytes\n", ret);
    }

    // never reached
    close(s);

    return 0;
}
------------------------------------------------------------------

Do you really need a socket, or are you just trying to move audio data from one app to another? Check out jack and netjack. They may already provide the functionality you need.

Alex Austin wrote:
Do you really need a socket, or are you just trying to move audio data from one app to another? Check out jack and netjack. They may already provide the functionality you need.
I need to move PCM samples to/from alsa from/to a DSP board. As the DSP board uses a serial interface (UART), it would be rather cumbersome to implement this functionality as a kernel driver. My solution should be a little bit similar to bluetooth audio, which also uses an alsa io plugin and a daemon that actually talks to the different (hardware) devices.
cheers, stefan

The simplest solution would probably be completely userspace. Write a jackd client which connects to the UART, gives itself RT priority using capabilities, and exposes however many input/output ports you have. Google FFADO for an example of the programming model used.
If you need normal apps to drive into it, set up an .asoundrc with a device using the jack plugin. I still don't see why you need a kernel driver at all.
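(For illustration, a rough skeleton of such a jackd client is sketched below, using the standard libjack calls; the client name, the port name and the placeholder that is supposed to pull data from the DSP/UART are made up for the example.)

/* sketch: minimal jackd client skeleton (libjack API) */
#include <stdio.h>
#include <unistd.h>
#include <jack/jack.h>

static jack_port_t *out_port;

/* called by jackd once per jack period, with jackd's own timing */
static int process(jack_nframes_t nframes, void *arg)
{
    jack_default_audio_sample_t *out = jack_port_get_buffer(out_port, nframes);
    (void)arg;

    /* placeholder: fill 'out' with decompressed samples coming from the
     * DSP/UART, converting from 8 kHz 16 bit as needed */
    for (jack_nframes_t i = 0; i < nframes; i++)
        out[i] = 0.0f;

    return 0;
}

int main(void)
{
    jack_client_t *client = jack_client_open("dsp_bridge", JackNullOption, NULL);
    if (!client) {
        fprintf(stderr, "could not connect to jackd\n");
        return 1;
    }

    jack_set_process_callback(client, process, NULL);
    out_port = jack_port_register(client, "output",
                                  JACK_DEFAULT_AUDIO_TYPE, JackPortIsOutput, 0);

    if (jack_activate(client)) {
        fprintf(stderr, "cannot activate client\n");
        return 1;
    }

    /* jackd drives the process() callback; nothing to do here */
    for (;;)
        sleep(1);
}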

Alex Austin wrote:
The simplest solution would probably be completely userspace. Write a jackd client which connects to the UART, gives itself RT priority using capabilities, and exposes however many input/output ports you have. Google FFADO for an example of the programming model used.
Just before starting to write yet another implementation (the fourth one, this time with jack) I was wondering if the following criteria are met:
* Does this work on embedded ARM, on an ARM9 AT91 CPU running at 180 MHz ?
* Has jack been ported to ARM EABI ?
* Can I add a serial protocol if I decide to use jack ?
* Can I use serial port hardware flow control with it ? The jack application would need to be aware of the UART and of the pipeline fill state of the DSP. (The DSP wants to have exactly 160 audio samples every 20 ms.)
* Is there a bluetooth jack client (as it does a similar job compared to my application) ?
If you need normal apps to drive into it, set up an .asoundrc with a device using the jack plugin. I still don't see why you need a kernel driver at all.
The previous three solutions I implemented were all in userspace, as they consisted of a (userspace) alsa io plugin and some daemon that processes the data coming from/going to alsa. I never wrote kernel code to solve this.
If I choose to use jack now, it means a complete rewrite of all my previous solutions. I will have to start from scratch (again) and all the work I have done in the last month would be useless :(
Just to make sure that jack is the right choice for this I thought I give you some more insight of my design.
Basically it should be possible to use any source and sink with my application. For example this could be a microphone and a headphone, but it could also be some prerecorded wav file. The audio format should be 8kHz 16bit and the whole thing is running on an arm9 AT91 board.
Then there is this DSP board that does speech compression. It is connected over UART at 460800 baud and requires exactly 160 16bit PCM samples with 8kHz sampling rate each 20ms for speech compression. At the same time it also sends 160 PCM samples each 20ms (speech decompression). All data is exchanged using a simple packet protocol with a header. The compressed speech packets also go in and out over the same protocol (again every 20 ms). For this reason not only the uncompressed PCM samples need to be exchanged on time, but the compressed packets need to be handled in a timely manner as well.
Thus in the end one could, for example, talk into the microphone and real-time speech compression would be performed so that the compressed speech comes out of my application. At the same time one should hear the decompressed speech coming from the DSP in the headphones.
Now that you know the gory details, do you think that jack is the best and simplest solution for this problem ?
I really hope that this is the last solution I'm going to implement, as I have already implemented three previous solutions that took a lot of time but do not work satisfactorily.
thanks for helping, cheers, stefan

Stefan Schoenleitner wrote, On 12/23/2009 12:08 AM:
However, the result is not satisfying at all as the delay between the received chunks varies and sometimes even takes a whole 15ms (!) longer (i.e. 35ms delay instead of 20ms).
...
As the calculation of the sleep time uses an *absolute* point in time as reference (which is taken before starting to poll() and recv() from the socket), any delay after the time measurement and before the actual sleep does not influence the precision, right ? For this reason I have no idea where the high delay variations are coming from. I mean how should it work with an alsa plugin if not even this simple simulation code works in a timely and precise manner ?
- What am I doing wrong ?
I thought of adding buffering so that the socket read operations do not have a strict timing schedule. However, as I said earlier, if the time for socket polling or recv varies, it does not influence the precision as an absolute point of time is used as timeout and the (absolute) start time measurement is done *before* all the socket operations. Thus I think that for this reason adding buffering would not make much difference here, right ?
AFAIK ALSA guarantees no deadlines, so I think you will have to add a "sufficiently large" buffer somewhere - and make sure to handle over/underruns somehow anyway.
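(A minimal sketch of such a buffer is shown below: a plain ring buffer sitting between the socket reader and the timed UART sender. The chosen size, and the convention that a short read means an underrun while a full buffer means an overrun, are just assumptions for the example; a real implementation would also need locking if reader and writer run in different threads.)

/* sketch: simple jitter buffer between socket reader and UART sender */
#include <stdio.h>
#include <string.h>

#define RING_SIZE (160 * 2 * 50)      /* roughly 1 s of 8 kHz, 16 bit mono */

struct ring {
    unsigned char data[RING_SIZE];
    size_t head, tail, fill;
};

/* returns bytes actually stored; anything beyond a full buffer is an overrun */
size_t ring_put(struct ring *r, const unsigned char *src, size_t n)
{
    size_t done = 0;
    while (done < n && r->fill < RING_SIZE) {
        r->data[r->head] = src[done++];
        r->head = (r->head + 1) % RING_SIZE;
        r->fill++;
    }
    return done;
}

/* returns bytes actually read; less than n means an underrun */
size_t ring_get(struct ring *r, unsigned char *dst, size_t n)
{
    size_t done = 0;
    while (done < n && r->fill > 0) {
        dst[done++] = r->data[r->tail];
        r->tail = (r->tail + 1) % RING_SIZE;
        r->fill--;
    }
    return done;
}

int main(void)
{
    static struct ring r;                      /* zero-initialized */
    unsigned char period[160 * 2], out[160 * 2];
    size_t got;

    memset(period, 0, sizeof(period));
    ring_put(&r, period, sizeof(period));      /* e.g. data arriving from the socket */
    got = ring_get(&r, out, sizeof(out));      /* e.g. the 20 ms UART sender */
    printf("got %zu bytes from the buffer\n", got);
    return 0;
}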
Just before starting to write yet another implementation (the fourth one,
I did the same exercise. It _is_ really hard to get started writing sound applications and figure out which abstraction level and implementation to use and how to use it correctly.
I ended up coding for the pulseaudio API. My case was for http://bitbucket.org/kiilerix/tcpcam/ . The result isn't pretty, but it could have been worse. The result isn't very stable or understood in all details, but it is good enough for my case. This case is different from yours, but I had problems similar to yours when I tried to code for the ALSA api. My numbers also happen to be exactly the same as in your case.
this time with jack)
Jack to some extent builds on top of ALSA, so jack can't do it any better than ALSA. But jack might use ALSA the right way and thus give you something close to what you want. AFAIK.
It is connected over UART at 460800 baud and requires exactly 160 16bit PCM samples with 8kHz sampling rate each 20ms for speech compression.
It is my experience that it matters that two different oscillators never have exactly the same frequency, so over time you will get an over- or under-run.
/Mads

Hi,
Mads Kiilerich wrote:
AFAIK ALSA guarantees no deadlines, so I think you will have to add a "sufficiently large" buffer somewhere - and make sure to handle over/underruns somehow anyway.
I already thought that this would help. So the alsa plugin would then fill some large buffer in my daemon application and instead of receiving the data from a socket I would have it available in some application buffer already. Thus I could just read the frames from the application buffer each 20ms and send them over UART to the DSP.
However, right now I found out that there seems to be no way to execute *anything* each 20ms +/- 1ms.
So far I tried
* nanosleep()
* pthread_cond_timedwait()
* usleep()
I'm looking forward to try
* HPET
* RTC
* setitimer()
* alsa timers ?
* anything else ?
I don't understand why all the functions I have tried so far have microsecond or even nanosecond precision and in the end I'm off not by some nano- or microseconds, but by a full 15ms ! This is really bad :(
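(One more candidate that may be worth adding to the list: clock_nanosleep() with TIMER_ABSTIME, where the deadline is advanced by exactly one period each iteration, so that individual wake-up errors do not accumulate. It is no guarantee of better worst-case jitter, since the scheduler still decides when the task actually runs, but a minimal sketch looks like this:)

/* sketch: periodic 20 ms loop with an absolute, non-drifting deadline */
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <time.h>

#define PERIOD_NS (20 * 1000000L)

int main(void)
{
    struct timespec next, now;
    int i;

    clock_gettime(CLOCK_MONOTONIC, &next);

    for (i = 0; i < 50; i++) {
        /* advance the deadline by exactly one period */
        next.tv_nsec += PERIOD_NS;
        if (next.tv_nsec >= 1000000000L) {
            next.tv_sec++;
            next.tv_nsec -= 1000000000L;
        }

        /* sleep until the absolute deadline; late wake-ups do not accumulate */
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);

        clock_gettime(CLOCK_MONOTONIC, &now);
        printf("woke up %.3f ms after the deadline\n",
               (now.tv_sec - next.tv_sec) * 1000.0 +
               (now.tv_nsec - next.tv_nsec) / 1000000.0);

        /* do the 20 ms worth of work here */
    }
    return 0;
}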
Just before starting to write yet another implementation (the fourth one,
I did the same exercise. It _is_ really hard to get started writing sound applications and figure out which abstraction level and implementation to use and how to use it correctly.
Indeed.
I ended up coding for the pulseaudio API. My case was for http://bitbucket.org/kiilerix/tcpcam/ . The result isn't pretty, but it could have been worse. The result isn't very stable or understood in all details, but it is good enough for my case. This case is different from yours, but I had problems similar to yours when I tried to code for the ALSA api. My numbers also happen to be exactly the same as in your case.
As you said it's a different case.
this time with jack)
Jack to some extent builds on top of ALSA, so jack can't do it any better than ALSA. But jack might use ALSA the right way and thus give you something close to what you want. AFAIK.
I looked at some code already. Generally speaking it looks good, but it also uses 32 bit floating point format, for example (while I would need 8 kHz 16 bit LE). It seems a little bit of a detour if I use:
ALSA --> jack io plugin --> jackd --> jack-client --> resample and change format --> my application
instead of
ALSA --> my plugin --> my application
As I said above, right now the problem is not even fully alsa related. It is how I can execute something (e.g. my uart send code) each 20ms +/- 1ms.
It is connected over UART at 460800 baud and requires exactly 160 16bit PCM samples with 8kHz sampling rate each 20ms for speech compression.
It is my experience that it matters that two different oscillators never have exactly the same frequency, so over time you will get an over- or under-run.
Yes, I will have to deal with this as well :( However, from time to time an overflow/underrun should be ok as long as it is generally working, I guess.
Thanks for your inspiring words, stefan

On Wed, Dec 23, 2009 at 06:08:06PM +0100, Stefan Schoenleitner wrote:
However, right now I found out that there seems to be no way to execute *anything* each 20ms +/- 1ms.
So far I tried
- nanosleep()
- pthread_cond_timedwait()
- usleep()
I'm looking forward to try
- HPET
- RTC
- setitimer()
- alsa timers ?
- anything else ?
I'd be somewhat surprised if any of them do any better to be honest. A brief glance at the AT91 RTC drivers suggests they don't implement any sort of high resolution tick, HPET is an x86 thing and the others are likely to be implemented in terms of the same underlying constructs as the things you've tried already.
I don't understand why all the functions I have tried so far have microsecond or even nanosecond precision and in the end I'm off not by some nano- or microseconds, but by a full 15ms ! This is really bad :(
These APIs are all standards based ones. The time units they use are deliberately chosen to be much smaller than might have realistically been used on systems when they were specified to allow room for systems with highly accurate timers that might come along, but they're all specified in terms of a minimum requested delay rather than a guaranteed accuracy.
Much of the timing in Linux is based on HZ, which is often set quite low for power reasons. As well as the scheduler, it's probably also worth asking the AT91 port people how to get the best out of the hardware. It may be that some work is needed to hook the port into the kernel time framework to better use the capabilities of the hardware, for example.
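(A quick, board-independent way to see what granularity the kernel actually advertises is clock_getres(); on a purely tick-based kernel this typically reports something on the order of 1/HZ, while with high-resolution timers it reports 1 ns. A small check program:)

/* sketch: print the advertised resolution of the POSIX clocks */
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec res;

    clock_getres(CLOCK_REALTIME, &res);
    printf("CLOCK_REALTIME  resolution: %ld ns\n", res.tv_nsec);

    clock_getres(CLOCK_MONOTONIC, &res);
    printf("CLOCK_MONOTONIC resolution: %ld ns\n", res.tv_nsec);

    return 0;
}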

Mark Brown wrote:
On Wed, Dec 23, 2009 at 06:08:06PM +0100, Stefan Schoenleitner wrote:
However, right now I found out that there seems to be no way to execute *anything* each 20ms +/- 1ms. [..]
I'd be somewhat surprised if any of them do any better to be honest. A brief glance at the AT91 RTC drivers suggests they don't implement any sort of high resolution tick, HPET is an x86 thing and the others are likely to be implemented in terms of the same underlying constructs as the things you've tried already.
That's not good. I guess I will have to find a whole different approach then ?
I don't understand why all the functions I have tried so far have microsecond or even nanosecond precision and in the end I'm off not by some nano- or microseconds, but by a full 15ms ! This is really bad :(
These APIs are all standards based ones. The time units they use are deliberately chosen to be much smaller than might have realistically been used on systems when they were specified to allow room for systems with highly accurate timers that might come along, but they're all specified in terms of a minimum requested delay rather than a guaranteed accuracy.
I see, thanks for explaining.
Much of the timing in Linux is based on HZ, which is often set quite low for power reasons. As well as the scheduler, it's probably also worth asking the AT91 port people how to get the best out of the hardware. It may be that some work is needed to hook the port into the kernel time framework to better use the capabilities of the hardware, for example.
In your earlier post you mentioned that it might be a good idea to reduce the dependency on accurate timing (as the speech codec chip requires 160 samples every 20 ms).
I found out that the chip can also be used in a "best effort" kind of way in the sense that one just sends packets as fast as possible and it will de-/compress them on the fly. This way I would no longer need accurate timing for this, right ? The data flow could be controlled by hardware handshaking as the chip signals to the UART when it is ready to receive the next packet.
(Another way would be to just wait until a packet has been received from the chip and then send the next one. The chip has a FIFO for 2x 160 samples.)
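(If the hardware handshaking route is taken, the termios side of it is sketched below; B460800 is available on Linux, and the device path mentioned in the comment is only an example.)

/* sketch: open the UART raw at 460800 baud with RTS/CTS flow control */
#include <fcntl.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int open_dsp_uart(const char *dev)          /* e.g. "/dev/ttyS1" (example path) */
{
    struct termios tio;
    int fd = open(dev, O_RDWR | O_NOCTTY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    tcgetattr(fd, &tio);
    cfmakeraw(&tio);                        /* raw 8N1, no line discipline processing */
    cfsetispeed(&tio, B460800);
    cfsetospeed(&tio, B460800);
    tio.c_cflag |= CRTSCTS | CLOCAL | CREAD;    /* hardware handshaking */

    if (tcsetattr(fd, TCSANOW, &tio) < 0) {
        perror("tcsetattr");
        close(fd);
        return -1;
    }
    return fd;
}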
At a first glance this seems to solve my problems to some amount I guess.
However, in the end, for example when decompressing, I will end up with lots of PCM samples that need to be played back. Can I just send all these samples to ALSA (using the alsa io plugin) and it will automatically play them correctly if the sampling rate and format have been set up ?
I guess the framework will just wait until the PCM sample count threshold has been reached and then start playing back the samples ?
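(For comparison, a sketch of what the plain alsa-lib playback path looks like; this uses snd_pcm_set_params() and the "default" device rather than the io-plugin API, and the 500 ms buffer target is an arbitrary choice. The point is that once rate and format are configured, snd_pcm_writei() blocks until the device has room, so the application itself does not have to time the writes.)

/* sketch: blocking playback of 8 kHz / 16 bit mono frames with alsa-lib */
#include <alsa/asoundlib.h>

int main(void)
{
    snd_pcm_t *pcm;
    short buf[160];                          /* one 20 ms period */
    int err;

    if ((err = snd_pcm_open(&pcm, "default", SND_PCM_STREAM_PLAYBACK, 0)) < 0) {
        fprintf(stderr, "open: %s\n", snd_strerror(err));
        return 1;
    }

    /* format, access, channels, rate, soft-resample, latency (us) */
    if ((err = snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
                                  SND_PCM_ACCESS_RW_INTERLEAVED,
                                  1, 8000, 1, 500000)) < 0) {
        fprintf(stderr, "set_params: %s\n", snd_strerror(err));
        return 1;
    }

    for (;;) {
        /* placeholder: fill buf[] with 160 decompressed samples from the DSP */
        memset(buf, 0, sizeof(buf));

        /* blocks until the device has room, i.e. ALSA paces the writes */
        snd_pcm_sframes_t n = snd_pcm_writei(pcm, buf, 160);
        if (n == -EPIPE)                     /* underrun */
            snd_pcm_prepare(pcm);
        else if (n < 0)
            break;
    }

    snd_pcm_close(pcm);
    return 0;
}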
Thanks for taking me in the right direction ;)
cheers, stefan

On Wed, Dec 23, 2009 at 05:39:16PM +0100, Mads Kiilerich wrote:
Stefan Schoenleitner wrote, On 12/23/2009 12:08 AM:
I mean how should it work with an alsa plugin if not even this simple simulation code works in a timely and precise manner ?
- What am I doing wrong ?
I thought of adding buffering so that the socket read operations do not have a strict timing schedule.
At this point you're not really dealing with ALSA issues at all; the scheduling of tasks in Linux is handled by the core kernel scheduler code. Rather than looking at the audio APIs, it seems like you'd be better off looking at either reducing your dependency on highly accurate scheduling or looking at how to work best with the scheduler to get the best possible performance from it.
It is connected over UART at 460800 baud and requires exactly 160 16bit PCM samples with 8kHz sampling rate each 20ms for speech compression.
It is my experience that it matters that two different oscillators never have exactly the same frequency, so over time you will get an over- or under-run.
Indeed. If you've got two different clock domains and no way of lining them up you'll eventually get drift between the two sufficient to cause an under or overrun if you run for long enough without a break.
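(To get a feeling for the magnitude, a small worked example; the 100 ppm clock offset is an assumed figure, not a measured one.)

/* sketch: how long until a one-period (160 sample) buffer slips by a full
 * period, given an assumed offset between the two oscillators */
#include <stdio.h>

int main(void)
{
    double rate_hz = 8000.0;      /* nominal sample rate */
    double offset  = 100e-6;      /* assumed 100 ppm difference between clocks */
    double period  = 160.0;       /* samples per period */

    double drift_per_s = rate_hz * offset;          /* samples gained/lost per second */
    printf("drift: %.2f samples/s -> one period of slack lasts %.0f s\n",
           drift_per_s, period / drift_per_s);      /* 0.80 samples/s, 200 s */
    return 0;
}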

Hi,
I tracked down the timing problem further and wrote another piece of test code (see below). The code should just do something every 20 ms. I discovered that using printf() in the critical section is not a good idea, as it introduces unwanted delay that is not accounted for in the timeout. You can test this by setting DO_PRINTF to 1.
However, if no printf()'s are performed I still do not get a precise delay of 20ms +/- 1ms (even with realtime priority). See example output here:
----------------------------------------------------------------------
$ sudo nice -n-19 ./test-timer

delay[ 0]: 0.007194ms
delay[ 1]: 21.125031ms    --> thats too long :(
delay[ 2]: 20.112892ms
delay[ 3]: 20.113100ms
delay[ 4]: 20.121551ms
delay[ 5]: 22.959203ms    --> thats too long :(
delay[ 6]: 20.131611ms
delay[ 7]: 20.145507ms
delay[ 8]: 20.131889ms
delay[ 9]: 20.145649ms
delay[10]: 20.142365ms
----------------------------------------------------------------------
* How can I deal with the problem ?
* Where does the unwanted delay come from ?
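(One detail worth noting here: "nice -n -19" only raises the priority within the normal SCHED_OTHER class; it does not give real-time scheduling. A sketch of requesting SCHED_FIFO instead is shown below; it needs root or CAP_SYS_NICE, and whether it actually shrinks the jitter depends on the kernel configuration.)

/* sketch: switch the calling process to SCHED_FIFO real-time scheduling */
#include <sched.h>
#include <stdio.h>

static int go_realtime(int priority)        /* e.g. 50; valid range is 1..99 */
{
    struct sched_param sp;

    sp.sched_priority = priority;
    if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0) {
        perror("sched_setscheduler");       /* typically EPERM without root */
        return -1;
    }
    return 0;
}

int main(void)
{
    if (go_realtime(50) == 0)
        printf("running with SCHED_FIFO priority 50\n");
    /* ... periodic loop goes here ... */
    return 0;
}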
cheers, stefan
---------------------------- timer-test.c ----------------------------
#define _POSIX_C_SOURCE 199309L
#define _BSD_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <string.h>
#include <errno.h>
#include <pthread.h>
#include <poll.h>
#include <assert.h>

#define DELAYSTAMPS 30
#define DO_PRINTF   0

/* adapted from glibc sys/time.h timersub() macro */
#define priv_timespecsub(a, b, result)                      \
    do {                                                    \
        (result)->tv_sec  = (a)->tv_sec  - (b)->tv_sec;     \
        (result)->tv_nsec = (a)->tv_nsec - (b)->tv_nsec;    \
        if ((result)->tv_nsec < 0) {                        \
            --(result)->tv_sec;                             \
            (result)->tv_nsec += 1000000000;                \
        }                                                   \
    } while (0)

#define priv_timespecadd(a, b, result)                      \
    do {                                                    \
        (result)->tv_sec  = (a)->tv_sec  + (b)->tv_sec;     \
        (result)->tv_nsec = (a)->tv_nsec + (b)->tv_nsec;    \
        if ((result)->tv_nsec >= 1000000000L) {             \
            ++(result)->tv_sec;                             \
            (result)->tv_nsec -= 1000000000L;               \
        }                                                   \
    } while (0)

pthread_mutex_t timer_mutex     = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  timer_condition = PTHREAD_COND_INITIALIZER;

int main(void)
{
    struct timespec start, now, period_time, delta, timeout;
    struct timespec prev, delta_prev;
    double delaystamps[DELAYSTAMPS];
    int i;

    // one period is 20ms
    period_time.tv_sec  = 0;
    period_time.tv_nsec = 20 * 1000000L;

    start.tv_sec  = 0;
    start.tv_nsec = 0;

    clock_gettime(CLOCK_REALTIME, &start);

    for (i = 0; i < DELAYSTAMPS; i++) {
        memcpy(&prev, &start, sizeof(struct timespec));
        clock_gettime(CLOCK_REALTIME, &start);

        priv_timespecadd(&start, &period_time, &timeout);
        //printf("timeout at %li:%li\n", timeout.tv_sec, timeout.tv_nsec/1000000L);

        priv_timespecsub(&start, &prev, &delta_prev);

#if (DO_PRINTF==1)
        printf("last action %fms ago\n",
               delta_prev.tv_sec*1000.0 + delta_prev.tv_nsec/1000000.0);
#endif

        delaystamps[i] = delta_prev.tv_sec*1000.0 + delta_prev.tv_nsec/1000000.0;

        // -------------------- BEGIN PROCESSING BLOCK --------------------

        // process data (requires some amount of time)
        usleep(10000);

        // -------------------- END PROCESSING BLOCK ----------------------

        // sleep for the rest of the time until the period is over
        clock_gettime(CLOCK_REALTIME, &now);
        priv_timespecsub(&timeout, &now, &delta);

        if ((now.tv_sec >= timeout.tv_sec) && (now.tv_nsec >= timeout.tv_nsec)) {
            //printf("we're late, don't sleep\n");
        } else {
#if (DO_PRINTF==1)
            printf("sleeping for %lims\n",
                   delta.tv_sec*1000 + delta.tv_nsec/1000000L);
#endif
            //nanosleep(&delta, NULL);

            // wait for timeout using a fake pthread condition which is never signaled
            pthread_mutex_lock(&timer_mutex);
            pthread_cond_timedwait(&timer_condition, &timer_mutex, &timeout);
            pthread_mutex_unlock(&timer_mutex);
        }
    }

    // print delay timestamps
    for (i = 0; i < DELAYSTAMPS; i++) {
        printf("delay[%2i]: %fms", i, delaystamps[i]);
        if (delaystamps[i] > 21)
            printf("\t--> thats too long :(");
        printf("\n");
    }

    return 0;
}
----------------------------------------------------------------------
participants (4):
- Alex Austin
- Mads Kiilerich
- Mark Brown
- Stefan Schoenleitner