[alsa-devel] Too many snd_pcm_mmap_readi calls
Hi ALSA-Dev-List
I am very new to all this ALSA-Dev-stuff. For a few decades I have been just an alsa user but now i try to develop a tiny sound recorder like millions or thousands did before. On the recorder there is a small while-loop for capturing the pcm data, encode them and send them to a server every 10ms. On my desktop-pc and on a Freescale i.mx 233 (ARMv5) the recorder is working quite well. Now i run into a problem where the recorder is using all the cpu-time on the arm-device for a few minutes. I tried to debug this behaviour without any luck. Next i tried to trace the problem and removed as many calls as possible and put a "gettimeofday" between every remaining call. Here is the remaining code: while(1) {
sleep(1); gettimeofday(&time1,NULL);
snd_pcm_wait( pcm_handle, REC_BUFFER_ELEMENT_SIZE_MS + 10); gettimeofday(&time2,NULL);
snd_pcm_mmap_begin(pcm_handle, &my_areas, &offset, &frames); gettimeofday(&time2,NULL);
snd_pcm_avail_update ( pcm_handle ); gettimeofday(&time2,NULL);
snd_pcm_mmap_readi( pcm_handle, network_data.data_UC, ALSA_READ_FRAMES); gettimeofday(&time3,NULL);
snd_pcm_mmap_commit(pcm_handle, offset, frames); gettimeofday(&time4,NULL); }
This is the output from strace:
0.001062 [401b418c] nanosleep({1, 0}, 0xbebd017c) = 0 1.001190 [401a4dac] gettimeofday({1399380864, 459211}, NULL) = 0 0.001032 [401de46c] ioctl(3, SNDRV_PCM_IOCTL_SYNC_PTR, 0x4e1b0) = 0 0.002780 [401a4dac] gettimeofday({1399380864, 463023}, NULL) = 0 0.001063 [401a4dac] gettimeofday({1399380864, 464086}, NULL) = 0 0.001031 [401de46c] ioctl(3, SNDRV_PCM_IOCTL_SYNC_PTR, 0x4e1b0) = 0 0.000968 [401a4dac] gettimeofday({1399380864, 466085}, NULL) = 0 0.001032 [401de46c] ioctl(3, SNDRV_PCM_IOCTL_SYNC_PTR, 0x4e1b0) = 0 0.000968 [401de46c] ioctl(3, SNDRV_PCM_IOCTL_SYNC_PTR, 0x4e1b0) = 0 0.000969 [401de46c] ioctl(3, SNDRV_PCM_IOCTL_SYNC_PTR, 0x4e1b0) = 0 0.001000 [401de46c] ioctl(3, SNDRV_PCM_IOCTL_SYNC_PTR, 0x4e1b0) = 0 0.002249 [401a4dac] gettimeofday({1399380864, 472678}, NULL) = 0 0.001438 [401a4dac] gettimeofday({1399380864, 473740}, NULL) = 0 0.001031 [40149518] rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 0.001218 [40149400] rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0 0.001562 [40149518] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 0.001032 [401b418c] nanosleep({1, 0}, ^CProcess 14219 detached
As you can see, the output starts with the sleep. I just added the sleep to find my loop-start in the trace-outpus. Before every gettimeofday there is one call to a snd_pcm-function (ok, besides the sleep). There is a block of four ioctl calls between the fourth and the fifth gettimeofday. This sould be the sound_pcm_mmap_readi. In the while loop there is only one call to sound_pcm_mmap_read and on my desktop-pc trace even shows only one call to sound_pcm_mmap_readi.
Maybe anyone can explain to me why the function is called that often on the arm-device? Maybe you can give some hints on what to check for the high cpu utilization?
Any hints would be kindly appreciated.
Thanks in advance, Jan Homann
Jan Homann wrote:
while(1) { sleep(1); snd_pcm_wait( pcm_handle, REC_BUFFER_ELEMENT_SIZE_MS + 10); snd_pcm_mmap_begin(pcm_handle, &my_areas, &offset, &frames); snd_pcm_avail_update ( pcm_handle ); snd_pcm_mmap_readi( pcm_handle, network_data.data_UC, ALSA_READ_FRAMES); snd_pcm_mmap_commit(pcm_handle, offset, frames); }
The sleep will result in an overrun, which will throw off the timing of all following calls.
You are not checking the return value of any of these functions.
Why are you calling snd_pcm_avail_update? And why are you doing this _after_ calling mmap_begin?
You are ignoring the number of frames returned by mmap_begin. This value could be smaller or larger than ALSA_READ_FRAMES.
Why are you using mmap in the first place, when you don't even try to access the buffer directly?
Regards, Clemens
Arg, i am sorry for the enormouse amount of errors in eight lines of code. Is there any "Errors per line"-List? Maybe there is a new leader by now.
I added the sleep to see the start of one while-loop in the strace output. Not every function appears in the strace. Just to see where the loop starts i added a function and sleep came into my mind first. In my real function i am checking the return values of snd_pcm_wait and snd_pcm_mmap_readi. The return value of snd_pcm_mmap_readi is used to encode the amount of captured samples with opus.
In my opinion mmap_begin just tells the alsa-lib that someone is about to access the buffer. With avail_update all buffer states (whatever they are) are updated and mmap_readi gets the samples to network_data.data_UC. After that, mmap_commit tells the alsa-lib that i am ready with the buffer. Maybe i am wrong with that. Ok, for sure i am wrong with that.
I am using mmap because of its good performance. Under normal condition with snd_pcm_mmap_readi my program uses about 1% of cpu-time where snd_pcm_readi uses about 75%.
Do you know a good tutorial about using mmap_readi?
Am 07.05.2014 11:01, schrieb Clemens Ladisch:
Jan Homann wrote:
while(1) { sleep(1); snd_pcm_wait( pcm_handle, REC_BUFFER_ELEMENT_SIZE_MS + 10); snd_pcm_mmap_begin(pcm_handle, &my_areas, &offset, &frames); snd_pcm_avail_update ( pcm_handle ); snd_pcm_mmap_readi( pcm_handle, network_data.data_UC, ALSA_READ_FRAMES); snd_pcm_mmap_commit(pcm_handle, offset, frames); }
The sleep will result in an overrun, which will throw off the timing of all following calls.
You are not checking the return value of any of these functions.
Why are you calling snd_pcm_avail_update? And why are you doing this _after_ calling mmap_begin?
You are ignoring the number of frames returned by mmap_begin. This value could be smaller or larger than ALSA_READ_FRAMES.
Why are you using mmap in the first place, when you don't even try to access the buffer directly?
Regards, Clemens
Date 7.5.2014 11:52, Jan Homann wrote:
Arg, i am sorry for the enormouse amount of errors in eight lines of code. Is there any "Errors per line"-List? Maybe there is a new leader by now.
I added the sleep to see the start of one while-loop in the strace output. Not every function appears in the strace. Just to see where the loop starts i added a function and sleep came into my mind first. In my real function i am checking the return values of snd_pcm_wait and snd_pcm_mmap_readi. The return value of snd_pcm_mmap_readi is used to encode the amount of captured samples with opus.
In my opinion mmap_begin just tells the alsa-lib that someone is about to access the buffer. With avail_update all buffer states (whatever they are) are updated and mmap_readi gets the samples to network_data.data_UC. After that, mmap_commit tells the alsa-lib that i am ready with the buffer. Maybe i am wrong with that. Ok, for sure i am wrong with that.
I am using mmap because of its good performance. Under normal condition with snd_pcm_mmap_readi my program uses about 1% of cpu-time where snd_pcm_readi uses about 75%.
The difference should not be so big between MMAP and normal I/O. It seems like a loop somewhere or you set bad thresholds so kernel wakes up very often. You may also use a different schedule timer for the non-block mode.
The ARM architecture does not have coherent caches, so the alsa-lib tries to sychronize the pointers using ioctl.
Do you know a good tutorial about using mmap_readi?
Am 07.05.2014 11:01, schrieb Clemens Ladisch:
Jan Homann wrote:
while(1) { sleep(1); snd_pcm_wait( pcm_handle, REC_BUFFER_ELEMENT_SIZE_MS + 10); snd_pcm_mmap_begin(pcm_handle, &my_areas, &offset, &frames); snd_pcm_avail_update ( pcm_handle ); snd_pcm_mmap_readi( pcm_handle, network_data.data_UC, ALSA_READ_FRAMES); snd_pcm_mmap_commit(pcm_handle, offset, frames); }
Remove snd_pcm_mmap_begin(), snd_pcm_mmap_commit() calls.
The snd_pcm_mmap_readi() does this in a loop for you. Only update+readi is enough.
Jaroslav
Jan Homann wrote:
In my opinion mmap_begin just tells the alsa-lib that someone is about to access the buffer.
With mmap_begin, you ask the device how many frames are available in the buffer (and get a pointer to them). With mmap_commit, you tell the device how many frames you actually read.
I am using mmap because of its good performance.
Using mmap makes sense only when you are accessing the samples in the buffer directly. If you just copy the frames into your own buffer, there is *no* difference whatsoever to snd_pcm_readi.
Under normal condition with snd_pcm_mmap_readi my program uses about 1% of cpu-time where snd_pcm_readi uses about 75%.
It's likely that you do not correctly wait for frames being available in your real program.
Do you know a good tutorial about using mmap_readi?
This: use snd_pcm_readi instead.
Regards, Clemens
Am 07.05.2014 14:58, schrieb Clemens Ladisch:
Jan Homann wrote:
In my opinion mmap_begin just tells the alsa-lib that someone is about to access the buffer.
With mmap_begin, you ask the device how many frames are available in the buffer (and get a pointer to them). With mmap_commit, you tell the device how many frames you actually read.
I am using mmap because of its good performance.
Using mmap makes sense only when you are accessing the samples in the buffer directly. If you just copy the frames into your own buffer, there is *no* difference whatsoever to snd_pcm_readi.
Under normal condition with snd_pcm_mmap_readi my program uses about 1% of cpu-time where snd_pcm_readi uses about 75%.
It's likely that you do not correctly wait for frames being available in your real program.
Do you know a good tutorial about using mmap_readi?
This: use snd_pcm_readi instead.
Regards, Clemens
Thank you for your explanation. Obviously, there was an error in my initialization which caused the high cpu-load while using snd_pcm_readi. After fixing this my programm is running nice and quiet. For almost 8 minutes. After this time the cpu-load goes up again. The alsa buffer should contain 480 samples but contains something between 1920 and 6720 samples. The debug output prints several timestamps. The time for one cycle through the while-loop is captured in looptime. Under normal circumstances this one is around 10ms as I expect. After about 46.500 loops (which is something like 7.75minutes) the looptime is up to 120ms. The time is used not at the snd_pcm_wait but at the snd_pcm_readi. My ARM-Device is running a Preempt Linux Kernel 2.6.35-8 and Arch Linux.
Any hints would be kindly appreciated.
Thanks in advance, Jan Homann
P.S.: i appended the while-loop which is just in fact doing nothing then waiting and reading. The above described behaviour occures even when doing nothing with the captured pcm-data. The debug-output appears never in my log-files.
while(1) {
cnt++;
err = snd_pcm_wait( sound_handler, 20 ); switch( err ) { case 1: //normal operation break; case 0: //timeout occurred - go on with a new loop if( global.debug ) printf("%s:%u snd_pcm_wait timeout (%s), error-code: %d\n", __FILE__, __LINE__, snd_strerror(err), err); continue; break; case -EPIPE: if( global.debug ) printf("%s:%u snd_pcm_wait XRUN (%s), error-code: %d\n", __FILE__, __LINE__, snd_strerror(err), err); break; case -ESTRPIPE: if( global.debug ) printf("%s:%u snd_pcm_wait suspended (%s), error-code: %d\n", __FILE__, __LINE__, snd_strerror(err), err); break; default: if( global.debug ) printf("%s:%u snd_pcm_wait error (%s), error-code: %d\n", __FILE__, __LINE__, snd_strerror(err), err); break; }
if( global.debug ) gettimeofday(&time2,NULL); //time needed for alsa wait testtime1 = (time2.tv_usec - time1.tv_usec + (time2.tv_sec - time1.tv_sec) * 1000000);
//get the frames out of the soundcardbuffer memset( alsa_out_pcm_bytes_U8, 0, sizeof(alsa_out_pcm_bytes_U8) );
frames_to_deliver = snd_pcm_avail_update(sound_handler);
frames_to_deliver = min(frames_to_deliver, 480); network_data.read_frames = snd_pcm_readi( sound_handler, alsa_out_pcm_bytes_U8, frames_to_deliver );
if( network_data.read_frames < 0 ) { switch( network_data.read_frames ) { case -EAGAIN: //there is no data in driverbuffer available - wait again by starting the main loop again if( global.debug ) printf("%s:%u EAGAIN on reading - driverbuffer empty! (%s), error-code: %ld\n", __FILE__, __LINE__,\ snd_strerror(network_data.read_frames), network_data.read_frames); continue; break; case -EPIPE: if( global.debug ) printf("<<<<<<<<<<<<<<< Buffer READ Overrun >>>>>>>>>>>>>>>\n"); if( global.debug ) printf("%s:%u (%s), error-code: %ld\n",__FILE__, __LINE__,\ snd_strerror(network_data.read_frames), network_data.read_frames); continue; break; default: if( global.debug ) printf("%s:%u error reading (%s), error-code: %ld\n", __FILE__, __LINE__,\ snd_strerror(network_data.read_frames), network_data.read_frames); continue; break; } }
if( global.debug ) {
gettimeofday(&time3,NULL); //time needed for read testtime2 = (time3.tv_usec - time2.tv_usec + (time3.tv_sec - time2.tv_sec) * 1000000);
//sum of times for one cycle looptime = (time3.tv_usec - time1.tv_usec + (time3.tv_sec - time1.tv_sec) * 1000000);
//looptime: Time for one cycle //testtime1: Time for alsa wait //testtime2: Time for alsa read printf("%06d L:%05d T1:%05d T2:%05d %ld %ld %d\n", \ cnt, \ looptime, testtime1,testtime2, \ network_data.read_frames, frames_to_deliver, err);
if (looptime > 15000){ printf("ERR %06d L:%05d T1:%05d T2:%05d %ld %ld %d\n", \ cnt, \ looptime, testtime1,testtime2, \ network_data.read_frames, frames_to_deliver, err); }
if( global.debug ) gettimeofday(&time1,NULL); } } }
participants (3)
-
Clemens Ladisch
-
Jan Homann
-
Jaroslav Kysela