[alsa-devel] Assertion failure in alsa-lib crashing applications (snd_pcm_area_copy: Assertion `dst < src || dst >= src + bytes' failed)
Hi,
When using Ring [1], an audio-video application, it sometimes crashes because of an assertion in libasound.so:
pcm.c:2758: snd_pcm_area_copy: Assertion `dst < src || dst >= src + bytes' failed.
The complete upstream bug report is at [2]. The (not-so-useful) stack trace is the following:
#0 0x00007fc356e462a8 raise (libc.so.6) #1 0x00007fc356e4772a abort (libc.so.6) #2 0x00007fc356e3f1b7 __assert_fail_base (libc.so.6) #3 0x00007fc356e3f262 __assert_fail (libc.so.6) #4 0x00007fc35ba65c84 snd_pcm_area_copy (libasound.so.2) #5 0x00007fc35ba65ebc snd_pcm_areas_copy (libasound.so.2) #6 0x00007fc35baa6d95 n/a (libasound.so.2) #7 0x00007fc35ba77c1f n/a (libasound.so.2) #8 0x00007fc35ba77a59 n/a (libasound.so.2) #9 0x000000000051ba97 n/a (dring) #10 0x000000000051c118 n/a (dring) #11 0x000000000051cfad n/a (dring) #12 0x00007fc357787c40 execute_native_thread_routine (libstdc++.so.6) #13 0x00007fc359ec2424 start_thread (libpthread.so.0) #14 0x00007fc356efacbd __clone (libc.so.6)
The next time it crashes, I will have debug symbols in dring. I can also recompile alsa-lib with debug symbols, what is the best way to do that?
Looking around, it seems other projects have run into the same issue:
https://aur.archlinux.org/packages/ultrastardx-git/?comments=all#comment-435... https://aur.archlinux.org/packages/zoom/#comment-544696 http://ubuntuforums.org/showthread.php?t=2248373 https://github.com/js-platform/node-webrtc/issues/110 https://fedorahosted.org/fldigi/ticket/70
The output of the alsa-info.sh script on my system is at [3]. What else can I provide to debug this issue further?
Thanks, Baptiste
[1] https://ring.cx [2] https://tuleap.ring.cx/plugins/tracker/?aid=502 [3] http://www.alsa-project.org/db/?f=0dd2ba1021b3d535f30f07c55dc18e2ef60db26d
Hi,
On Tue, Jun 07, 2016 at 07:29:35PM +0200, Baptiste Jonglez wrote:
When using Ring [1], an audio-video application, it sometimes crashes because of an assertion in libasound.so:
pcm.c:2758: snd_pcm_area_copy: Assertion `dst < src || dst >= src + bytes' failed.
The complete upstream bug report is at [2]. The (not-so-useful) stack trace is the following:
<snip>
The next time it crashes, I will have debug symbols in dring. I can also recompile alsa-lib with debug symbols, what is the best way to do that?
As promised, here is a new stack trace with debug symbols:
#0 0x00007f899af94295 in raise () from /usr/lib/libc.so.6 #1 0x00007f899af956da in abort () from /usr/lib/libc.so.6 #2 0x00007f899af8d297 in __assert_fail_base () from /usr/lib/libc.so.6 #3 0x00007f899af8d342 in __assert_fail () from /usr/lib/libc.so.6 #4 0x00007f89a003b04d in snd_pcm_area_copy (dst_area=0x7f89847f3360, dst_offset=<optimized out>, src_area=0x7f89847f3350, src_offset=<optimized out>, samples=<optimized out>, format=<optimized out>) at pcm.c:2758 #5 0x00007f89a003b27c in snd_pcm_areas_copy (dst_areas=0x7f896c002dd0, dst_areas@entry=0x7f896c002db0, dst_offset=248, src_areas=0x7f896c002dd0, src_areas@entry=0x610, src_offset=0, channels=2, channels@entry=0, frames=1552, frames@entry=0, format=SND_PCM_FORMAT_S32_LE) at pcm.c:2904 #6 0x00007f89a007c2d9 in softvol_convert_stereo_vol (svol=svol@entry=0x7f896c0013c0, dst_areas=dst_areas@entry=0x7f896c002db0, dst_offset=dst_offset@entry=248, src_areas=0x610, src_areas@entry=0x7f896c002db0, src_offset=src_offset@entry=0, channels=0, frames=<optimized out>) at pcm_softvol.c:291 #7 0x00007f89a007c3f6 in snd_pcm_softvol_read_areas (pcm=0x7f896c001b10, areas=0x7f896c002db0, offset=248, size=1552, slave_areas=0x7f896c002db0, slave_offset=0, slave_sizep=0x7f89847f3480) at pcm_softvol.c:630 #8 0x00007f89a004d0e1 in snd_pcm_plugin_avail_update (pcm=0x7f896c001b10) at pcm_plugin.c:490 #9 0x00007f89a004cf19 in snd_pcm_plugin_avail_update (pcm=0x7f896c001eb0) at pcm_plugin.c:460 #10 0x00000000005486d4 in ring::AlsaLayer::capture (this=0x1a9ef00) at alsalayer.cpp:694 #11 0x0000000000545de4 in ring::AlsaThread::run (this=0x1b00500) at alsalayer.cpp:137 <snip> (horrible C++ stack trace from the application itself)
Is there anything in there that could explain the assertion failure? Perhaps a wrong API usage?
Thanks, Baptiste
Looking around, it seems other projects have run into the same issue:
https://aur.archlinux.org/packages/ultrastardx-git/?comments=all#comment-435458 https://aur.archlinux.org/packages/zoom/#comment-544696 http://ubuntuforums.org/showthread.php?t=2248373 https://github.com/js-platform/node-webrtc/issues/110 https://fedorahosted.org/fldigi/ticket/70
The output of the alsa-info.sh script on my system is at [3]. What else can I provide to debug this issue further?
Thanks, Baptiste
[1] https://ring.cx [2] https://tuleap.ring.cx/plugins/tracker/?aid=502 [3] http://www.alsa-project.org/db/?f=0dd2ba1021b3d535f30f07c55dc18e2ef60db26d
Alsa-devel mailing list Alsa-devel@alsa-project.org http://mailman.alsa-project.org/mailman/listinfo/alsa-devel
On Tue, Jun 14, 2016 at 11:02:21PM +0200, Baptiste Jonglez wrote:
Hi,
On Tue, Jun 07, 2016 at 07:29:35PM +0200, Baptiste Jonglez wrote:
When using Ring [1], an audio-video application, it sometimes crashes because of an assertion in libasound.so:
pcm.c:2758: snd_pcm_area_copy: Assertion `dst < src || dst >= src + bytes' failed.
The complete upstream bug report is at [2]. The (not-so-useful) stack trace is the following:
<snip>
The next time it crashes, I will have debug symbols in dring. I can also recompile alsa-lib with debug symbols, what is the best way to do that?
As promised, here is a new stack trace with debug symbols:
<snip>
Sorry, this stack trace still had some arguments optimised out. Here is the stack trace without optimisation:
#0 0x00007f42a8696295 in raise () from /usr/lib/libc.so.6 #1 0x00007f42a86976da in abort () from /usr/lib/libc.so.6 #2 0x00007f42a868f297 in __assert_fail_base () from /usr/lib/libc.so.6 #3 0x00007f42a868f342 in __assert_fail () from /usr/lib/libc.so.6 #4 0x00007f42ad7463da in snd_pcm_area_copy (dst_area=0x7f4290ced250, dst_offset=144, src_area=0x7f4290ced260, src_offset=0, samples=0, format=SND_PCM_FORMAT_S32_LE) at pcm.c:2758 #5 0x00007f42ad746918 in snd_pcm_areas_copy (dst_areas=0x7f427c000e90, dst_offset=72, src_areas=0x7f427c000e90, src_offset=0, channels=2, frames=1904, format=SND_PCM_FORMAT_S32_LE) at pcm.c:2904 #6 0x00007f42ad79cb66 in softvol_convert_stereo_vol (svol=0x7f427c02f500, dst_areas=0x7f427c000e70, dst_offset=72, src_areas=0x7f427c000e70, src_offset=0, channels=2, frames=1904) at pcm_softvol.c:291 #7 0x00007f42ad79e081 in snd_pcm_softvol_read_areas (pcm=0x7f427c031080, areas=0x7f427c000e70, offset=72, size=1904, slave_areas=0x7f427c000e70, slave_offset=0, slave_sizep=0x7f4290ced408) at pcm_softvol.c:630 #8 0x00007f42ad75e367 in snd_pcm_plugin_avail_update (pcm=0x7f427c031080) at pcm_plugin.c:490 #9 0x00007f42ad745c3e in snd_pcm_avail_update (pcm=0x7f427c031080) at pcm.c:2508 #10 0x00007f42ad75e214 in snd_pcm_plugin_avail_update (pcm=0x7f427c02fc70) at pcm_plugin.c:460 #11 0x00007f42ad745c3e in snd_pcm_avail_update (pcm=0x7f427c02faa0) at pcm.c:2508 #12 0x00000000005486d4 in ring::AlsaLayer::capture (this=0x1978220) at alsalayer.cpp:694 #13 0x0000000000545de4 in ring::AlsaThread::run (this=0x19706f0) at alsalayer.cpp:137
Attached is a full stack trace with local variables. I can provide the core dump privately.
Also note that this happens when the application is under high CPU pressure (for instance other programs with higher priority are using all available CPU cores).
The calling code into the lib is here, when capturing from a device:
https://github.com/savoirfairelinux/ring-daemon/blob/master/src/media/audio/...
Baptiste
Is there anything in there that could explain the assertion failure? Perhaps a wrong API usage?
Thanks, Baptiste
Looking around, it seems other projects have run into the same issue:
https://aur.archlinux.org/packages/ultrastardx-git/?comments=all#comment-435458 https://aur.archlinux.org/packages/zoom/#comment-544696 http://ubuntuforums.org/showthread.php?t=2248373 https://github.com/js-platform/node-webrtc/issues/110 https://fedorahosted.org/fldigi/ticket/70
The output of the alsa-info.sh script on my system is at [3]. What else can I provide to debug this issue further?
Thanks, Baptiste
[1] https://ring.cx [2] https://tuleap.ring.cx/plugins/tracker/?aid=502 [3] http://www.alsa-project.org/db/?f=0dd2ba1021b3d535f30f07c55dc18e2ef60db26d
Alsa-devel mailing list Alsa-devel@alsa-project.org http://mailman.alsa-project.org/mailman/listinfo/alsa-devel
Alsa-devel mailing list Alsa-devel@alsa-project.org http://mailman.alsa-project.org/mailman/listinfo/alsa-devel
Is there anything else I can provide to help getting this issue fixed?
The main issue is that snd_pcm_area_copy() gets called with overlapping source and destination buffers, which triggers this assertion before a memcpy:
http://git.alsa-project.org/?p=alsa-lib.git;a=blob;f=src/pcm/pcm.c;h=0d0d093...
This happens when the system is really loaded (CPU-wise).
Do you think the problem is likely to come from an incorrect API usage from the application? Knowing this would significantly narrow down the search, because the problem could also come from within alsa itself.
The callling code from the application is quite complex, see here:
https://github.com/savoirfairelinux/ring-daemon/blob/master/src/media/audio/...
In particular, the assertion is triggered by the call to snd_pcm_avail_update() at line 694 (see the stack trace below).
Thanks, Baptiste
On Wed, Jun 15, 2016 at 12:39:39AM +0200, Baptiste Jonglez wrote:
On Tue, Jun 14, 2016 at 11:02:21PM +0200, Baptiste Jonglez wrote:
Hi,
On Tue, Jun 07, 2016 at 07:29:35PM +0200, Baptiste Jonglez wrote:
When using Ring [1], an audio-video application, it sometimes crashes because of an assertion in libasound.so:
pcm.c:2758: snd_pcm_area_copy: Assertion `dst < src || dst >= src + bytes' failed.
The complete upstream bug report is at [2]. The (not-so-useful) stack trace is the following:
<snip>
The next time it crashes, I will have debug symbols in dring. I can also recompile alsa-lib with debug symbols, what is the best way to do that?
As promised, here is a new stack trace with debug symbols:
<snip>
Sorry, this stack trace still had some arguments optimised out. Here is the stack trace without optimisation:
#0 0x00007f42a8696295 in raise () from /usr/lib/libc.so.6 #1 0x00007f42a86976da in abort () from /usr/lib/libc.so.6 #2 0x00007f42a868f297 in __assert_fail_base () from /usr/lib/libc.so.6 #3 0x00007f42a868f342 in __assert_fail () from /usr/lib/libc.so.6 #4 0x00007f42ad7463da in snd_pcm_area_copy (dst_area=0x7f4290ced250, dst_offset=144, src_area=0x7f4290ced260, src_offset=0, samples=0, format=SND_PCM_FORMAT_S32_LE) at pcm.c:2758 #5 0x00007f42ad746918 in snd_pcm_areas_copy (dst_areas=0x7f427c000e90, dst_offset=72, src_areas=0x7f427c000e90, src_offset=0, channels=2, frames=1904, format=SND_PCM_FORMAT_S32_LE) at pcm.c:2904 #6 0x00007f42ad79cb66 in softvol_convert_stereo_vol (svol=0x7f427c02f500, dst_areas=0x7f427c000e70, dst_offset=72, src_areas=0x7f427c000e70, src_offset=0, channels=2, frames=1904) at pcm_softvol.c:291 #7 0x00007f42ad79e081 in snd_pcm_softvol_read_areas (pcm=0x7f427c031080, areas=0x7f427c000e70, offset=72, size=1904, slave_areas=0x7f427c000e70, slave_offset=0, slave_sizep=0x7f4290ced408) at pcm_softvol.c:630 #8 0x00007f42ad75e367 in snd_pcm_plugin_avail_update (pcm=0x7f427c031080) at pcm_plugin.c:490 #9 0x00007f42ad745c3e in snd_pcm_avail_update (pcm=0x7f427c031080) at pcm.c:2508 #10 0x00007f42ad75e214 in snd_pcm_plugin_avail_update (pcm=0x7f427c02fc70) at pcm_plugin.c:460 #11 0x00007f42ad745c3e in snd_pcm_avail_update (pcm=0x7f427c02faa0) at pcm.c:2508 #12 0x00000000005486d4 in ring::AlsaLayer::capture (this=0x1978220) at alsalayer.cpp:694 #13 0x0000000000545de4 in ring::AlsaThread::run (this=0x19706f0) at alsalayer.cpp:137
Attached is a full stack trace with local variables. I can provide the core dump privately.
Also note that this happens when the application is under high CPU pressure (for instance other programs with higher priority are using all available CPU cores).
The calling code into the lib is here, when capturing from a device:
https://github.com/savoirfairelinux/ring-daemon/blob/master/src/media/audio/...
Baptiste
Is there anything in there that could explain the assertion failure? Perhaps a wrong API usage?
Thanks, Baptiste
Looking around, it seems other projects have run into the same issue:
https://aur.archlinux.org/packages/ultrastardx-git/?comments=all#comment-435458 https://aur.archlinux.org/packages/zoom/#comment-544696 http://ubuntuforums.org/showthread.php?t=2248373 https://github.com/js-platform/node-webrtc/issues/110 https://fedorahosted.org/fldigi/ticket/70
The output of the alsa-info.sh script on my system is at [3]. What else can I provide to debug this issue further?
Thanks, Baptiste
[1] https://ring.cx [2] https://tuleap.ring.cx/plugins/tracker/?aid=502 [3] http://www.alsa-project.org/db/?f=0dd2ba1021b3d535f30f07c55dc18e2ef60db26d
Alsa-devel mailing list Alsa-devel@alsa-project.org http://mailman.alsa-project.org/mailman/listinfo/alsa-devel
Alsa-devel mailing list Alsa-devel@alsa-project.org http://mailman.alsa-project.org/mailman/listinfo/alsa-devel
d7463da in snd_pcm_area_copy (dst_area=0x7f4290ced250, dst_offset=144, src_area=0x7f4290ced260, src_offset=0, samples=0, format=SND_PCM_FORMAT_S32_LE) at pcm.c:2758 bytes = 15232 src = 0x7f427c0316b0 "" dst = 0x7f427c0318f0 "" width = 32 src_step = 32578 dst_step = -1865493936 __PRETTY_FUNCTION__ = "snd_pcm_area_copy" __FUNCTION__ = "snd_pcm_area_copy" #5 0x00007f42ad746918 in snd_pcm_areas_copy (dst_areas=0x7f427c000e90, dst_offset=72, src_areas=0x7f427c000e90, src_offset=0, channels=2, frames=1904, format=SND_PCM_FORMAT_S32_LE) at pcm.c:2904 s = {addr = 0x7f427c0316b0, first = 0, step = 32} d = {addr = 0x7f427c0316b0, first = 0, step = 32} step = 64 src_addr = 0x7f427c0316b0 src_start = 0x7f427c000e70 channels1 = 0 chns = 2 dst_addr = 0x7f427c0316b0 dst_start = 0x7f427c000e70 width = 32 __PRETTY_FUNCTION__ = "snd_pcm_areas_copy" __FUNCTION__ = "snd_pcm_areas_copy" #6 0x00007f42ad79cb66 in softvol_convert_stereo_vol (svol=0x7f427c02f500, dst_areas=0x7f427c000e70, dst_offset=72, src_areas=0x7f427c000e70, src_offset=0, channels=2, frames=1904) at pcm_softvol.c:291 dst_area = 0x7f427c02f560 src_area = 0x7f427c031380 src_step = 32578 dst_step = 2909997239 vol_scale = 32578 vol = {2429473552, 1} vol_c = 2080568576 #7 0x00007f42ad79e081 in snd_pcm_softvol_read_areas (pcm=0x7f427c031080, areas=0x7f427c000e70, offset=72, size=1904, slave_areas=0x7f427c000e70, slave_offset=0, slave_sizep=0x7f4290ced408) at pcm_softvol.c:630 svol = 0x7f427c02f500 #8 0x00007f42ad75e367 in snd_pcm_plugin_avail_update (pcm=0x7f427c031080) at pcm_plugin.c:490 frames = 1904 cont = 1976 slave_frames = 2048 err = 0 slave_areas = 0x7f427c000e70 slave_offset = 0 result = 72 areas = 0x7f427c000e70 xfer = 144 hw_offset = 72 size = 1904 plugin = 0x7f427c02f500 slave = 0x7f427c030640 slave_size = 2472 #9 0x00007f42ad745c3e in snd_pcm_avail_update (pcm=0x7f427c031080) at pcm.c:2508 No locals. #10 0x00007f42ad75e214 in snd_pcm_plugin_avail_update (pcm=0x7f427c02fc70) at pcm_plugin.c:460 plugin = 0x7f427c031310 slave = 0x7f427c031080 slave_size = 9255600 #11 0x00007f42ad745c3e in snd_pcm_avail_update (pcm=0x7f427c02faa0) at pcm.c:2508 No locals. #12 0x00000000005486d4 in ring::AlsaLayer::capture (this=0x1978220) at alsalayer.cpp:694 mainBufferFormat = {sample_rate = 48000, nb_channels = 2, static DEFAULT_SAMPLE_RATE = 48000} toGetFrames = 32578 framesPerBufferAlsa = 5540553 #13 0x0000000000545de4 in ring::AlsaThread::run (this=0x19706f0) at alsalayer.cpp:137 No locals. d7463da in snd_pcm_area_copy (dst_area=0x7f4290ced250, dst_offset=144, src_area=0x7f4290ced260, src_offset=0, samples=0, format=SND_PCM_FORMAT_S32_LE) at pcm.c:2758 bytes = 15232 src = 0x7f427c0316b0 "" dst = 0x7f427c0318f0 "" width = 32 src_step = 32578 dst_step = -1865493936 __PRETTY_FUNCTION__ = "snd_pcm_area_copy" __FUNCTION__ = "snd_pcm_area_copy" #5 0x00007f42ad746918 in snd_pcm_areas_copy (dst_areas=0x7f427c000e90, dst_offset=72, src_areas=0x7f427c000e90, src_offset=0, channels=2, frames=1904, format=SND_PCM_FORMAT_S32_LE) at pcm.c:2904 s = {addr = 0x7f427c0316b0, first = 0, step = 32} d = {addr = 0x7f427c0316b0, first = 0, step = 32} step = 64 src_addr = 0x7f427c0316b0 src_start = 0x7f427c000e70 channels1 = 0 chns = 2 dst_addr = 0x7f427c0316b0 dst_start = 0x7f427c000e70 width = 32 __PRETTY_FUNCTION__ = "snd_pcm_areas_copy" __FUNCTION__ = "snd_pcm_areas_copy" #6 0x00007f42ad79cb66 in softvol_convert_stereo_vol (svol=0x7f427c02f500, dst_areas=0x7f427c000e70, dst_offset=72, src_areas=0x7f427c000e70, src_offset=0, channels=2, frames=1904) at pcm_softvol.c:291 dst_area = 0x7f427c02f560 src_area = 0x7f427c031380 src_step = 32578 dst_step = 2909997239 vol_scale = 32578 vol = {2429473552, 1} vol_c = 2080568576 #7 0x00007f42ad79e081 in snd_pcm_softvol_read_areas (pcm=0x7f427c031080, areas=0x7f427c000e70, offset=72, size=1904, slave_areas=0x7f427c000e70, slave_offset=0, slave_sizep=0x7f4290ced408) at pcm_softvol.c:630 svol = 0x7f427c02f500 #8 0x00007f42ad75e367 in snd_pcm_plugin_avail_update (pcm=0x7f427c031080) at pcm_plugin.c:490 frames = 1904 cont = 1976 slave_frames = 2048 err = 0 slave_areas = 0x7f427c000e70 slave_offset = 0 result = 72 areas = 0x7f427c000e70 xfer = 144 hw_offset = 72 size = 1904 plugin = 0x7f427c02f500 slave = 0x7f427c030640 slave_size = 2472 #9 0x00007f42ad745c3e in snd_pcm_avail_update (pcm=0x7f427c031080) at pcm.c:2508 No locals. #10 0x00007f42ad75e214 in snd_pcm_plugin_avail_update (pcm=0x7f427c02fc70) at pcm_plugin.c:460 plugin = 0x7f427c031310 slave = 0x7f427c031080 slave_size = 9255600 #11 0x00007f42ad745c3e in snd_pcm_avail_update (pcm=0x7f427c02faa0) at pcm.c:2508 No locals. #12 0x00000000005486d4 in ring::AlsaLayer::capture (this=0x1978220) at alsalayer.cpp:694 mainBufferFormat = {sample_rate = 48000, nb_channels = 2, static DEFAULT_SAMPLE_RATE = 48000} toGetFrames = 32578 framesPerBufferAlsa = 5540553 #13 0x0000000000545de4 in ring::AlsaThread::run (this=0x19706f0) at alsalayer.cpp:137 No locals.
Alsa-devel mailing list Alsa-devel@alsa-project.org http://mailman.alsa-project.org/mailman/listinfo/alsa-devel
On Sun, 26 Jun 2016 20:28:32 +0200, Baptiste Jonglez wrote:
Is there anything else I can provide to help getting this issue fixed?
The main issue is that snd_pcm_area_copy() gets called with overlapping source and destination buffers, which triggers this assertion before a memcpy:
http://git.alsa-project.org/?p=alsa-lib.git;a=blob;f=src/pcm/pcm.c;h=0d0d093...
Right, this isn't supposed to happen.
This happens when the system is really loaded (CPU-wise).
Do you think the problem is likely to come from an incorrect API usage from the application? Knowing this would significantly narrow down the search, because the problem could also come from within alsa itself.
The callling code from the application is quite complex, see here:
https://github.com/savoirfairelinux/ring-daemon/blob/master/src/media/audio/...
In particular, the assertion is triggered by the call to snd_pcm_avail_update() at line 694 (see the stack trace below).
I have too little time to investigate on this issue, but judging from the fact that there has been no such a report until now, I guess it's specific to your code. But I can't say sure, of course.
Is the relevant code path accessed concurrently? The possible race is the first suspect I'd always think of in such a case.
Takashi
Hello Takashi,
On Wed, Jun 29, 2016 at 03:39:37PM +0200, Takashi Iwai wrote:
Do you think the problem is likely to come from an incorrect API usage from the application? Knowing this would significantly narrow down the search, because the problem could also come from within alsa itself.
The callling code from the application is quite complex, see here:
https://github.com/savoirfairelinux/ring-daemon/blob/master/src/media/audio/...
In particular, the assertion is triggered by the call to snd_pcm_avail_update() at line 694 (see the stack trace below).
I have too little time to investigate on this issue, but judging from the fact that there has been no such a report until now, I guess it's specific to your code. But I can't say sure, of course.
Well, I did find other projects having this issue:
https://aur.archlinux.org/packages/zoom/#comment-544696 http://ubuntuforums.org/showthread.php?t=2248373 https://github.com/js-platform/node-webrtc/issues/110 https://fedorahosted.org/fldigi/ticket/70
Although in most of these cases, the issue seems 100% reproducible (e.g. crash at startup). In our case, the assertion failure only happens when there is significant CPU load.
Is the relevant code path accessed concurrently? The possible race is the first suspect I'd always think of in such a case.
Good idea, I will check. Given it happens when the CPU is loaded, my guess is that "something" is done too late, which means that it works on "invalidated" data (this is all very vague, sorry).
Would you know a simple program using alsa-lib for sound capture, with which I could try to reproduce?
Baptiste
On Wed, 29 Jun 2016 15:59:59 +0200, Baptiste Jonglez wrote:
Hello Takashi,
On Wed, Jun 29, 2016 at 03:39:37PM +0200, Takashi Iwai wrote:
Do you think the problem is likely to come from an incorrect API usage from the application? Knowing this would significantly narrow down the search, because the problem could also come from within alsa itself.
The callling code from the application is quite complex, see here:
https://github.com/savoirfairelinux/ring-daemon/blob/master/src/media/audio/...
In particular, the assertion is triggered by the call to snd_pcm_avail_update() at line 694 (see the stack trace below).
I have too little time to investigate on this issue, but judging from the fact that there has been no such a report until now, I guess it's specific to your code. But I can't say sure, of course.
Well, I did find other projects having this issue:
https://aur.archlinux.org/packages/zoom/#comment-544696 http://ubuntuforums.org/showthread.php?t=2248373 https://github.com/js-platform/node-webrtc/issues/110 https://fedorahosted.org/fldigi/ticket/70
Oh well. They have been never reported to upstream... In anyway, your report is the first one, and with a stack trace.
Although in most of these cases, the issue seems 100% reproducible (e.g. crash at startup). In our case, the assertion failure only happens when there is significant CPU load.
Is the relevant code path accessed concurrently? The possible race is the first suspect I'd always think of in such a case.
Good idea, I will check. Given it happens when the CPU is loaded, my guess is that "something" is done too late, which means that it works on "invalidated" data (this is all very vague, sorry).
Would you know a simple program using alsa-lib for sound capture, with which I could try to reproduce?
arecord? But it's no multi-thread, so not suitable for the concurrency tests.
How is your PCM plugin setup? Is softvol over dmix?
Takashi
On Wed, 29 Jun 2016 16:16:03 +0200, Takashi Iwai wrote:
On Wed, 29 Jun 2016 15:59:59 +0200, Baptiste Jonglez wrote:
Hello Takashi,
On Wed, Jun 29, 2016 at 03:39:37PM +0200, Takashi Iwai wrote:
Do you think the problem is likely to come from an incorrect API usage from the application? Knowing this would significantly narrow down the search, because the problem could also come from within alsa itself.
The callling code from the application is quite complex, see here:
https://github.com/savoirfairelinux/ring-daemon/blob/master/src/media/audio/...
In particular, the assertion is triggered by the call to snd_pcm_avail_update() at line 694 (see the stack trace below).
I have too little time to investigate on this issue, but judging from the fact that there has been no such a report until now, I guess it's specific to your code. But I can't say sure, of course.
Well, I did find other projects having this issue:
https://aur.archlinux.org/packages/zoom/#comment-544696 http://ubuntuforums.org/showthread.php?t=2248373 https://github.com/js-platform/node-webrtc/issues/110 https://fedorahosted.org/fldigi/ticket/70
Oh well. They have been never reported to upstream... In anyway, your report is the first one, and with a stack trace.
Although in most of these cases, the issue seems 100% reproducible (e.g. crash at startup). In our case, the assertion failure only happens when there is significant CPU load.
Is the relevant code path accessed concurrently? The possible race is the first suspect I'd always think of in such a case.
Good idea, I will check. Given it happens when the CPU is loaded, my guess is that "something" is done too late, which means that it works on "invalidated" data (this is all very vague, sorry).
Would you know a simple program using alsa-lib for sound capture, with which I could try to reproduce?
arecord? But it's no multi-thread, so not suitable for the concurrency tests.
Now I reread the code again, and I guess it's really the racy accesses. Basically you can't call alsa-lib functions concurrently. For example, calling snd_pcm_avail_update() from multiple threads concurrently may lead to such an error. Internally it copies / converts the content (e.g. for softvol plugin), and this would conflict when called in parallel.
You can try a hackish patch below to see whether it emits any messages. It has no mutex, so the code itself is racy, but it should be enough just as a check.
Meanwhile, the crash itself might be avoided by disabling "pcm->mmap_shadow = 1" line in pcm_softvol.c. Then it'll be copied to the external buffer.
Takashi
--- diff --git a/src/pcm/pcm_plugin.c b/src/pcm/pcm_plugin.c index 8527783c3569..571defad6e12 100644 --- a/src/pcm/pcm_plugin.c +++ b/src/pcm/pcm_plugin.c @@ -483,17 +483,25 @@ snd_pcm_plugin_mmap_commit(snd_pcm_t *pcm,
static snd_pcm_sframes_t snd_pcm_plugin_avail_update(snd_pcm_t *pcm) { + static snd_pcm_t *check = NULL; snd_pcm_plugin_t *plugin = pcm->private_data; snd_pcm_t *slave = plugin->gen.slave; snd_pcm_sframes_t slave_size; int err;
+ if (!check) + check = pcm; + else if (pcm == check) + fprintf(stderr, "XXX RACY CALL\n"); + slave_size = snd_pcm_avail_update(slave); if (pcm->stream == SND_PCM_STREAM_CAPTURE && pcm->access != SND_PCM_ACCESS_RW_INTERLEAVED && pcm->access != SND_PCM_ACCESS_RW_NONINTERLEAVED) goto _capture; *pcm->hw.ptr = *slave->hw.ptr; + if (pcm == check) + check = NULL; return slave_size; _capture: { @@ -545,11 +553,15 @@ static snd_pcm_sframes_t snd_pcm_plugin_avail_update(snd_pcm_t *pcm) slave_size -= slave_frames; xfer += frames; } + if (pcm == check) + check = NULL; return (snd_pcm_sframes_t)xfer;
error_atomic: snd_atomic_write_end(&plugin->watom); error: + if (pcm == check) + check = NULL; return xfer > 0 ? (snd_pcm_sframes_t)xfer : err; } }
On Wed, Jun 29, 2016 at 06:02:58PM +0200, Takashi Iwai wrote:
Now I reread the code again, and I guess it's really the racy accesses. Basically you can't call alsa-lib functions concurrently. For example, calling snd_pcm_avail_update() from multiple threads concurrently may lead to such an error. Internally it copies / converts the content (e.g. for softvol plugin), and this would conflict when called in parallel.
You can try a hackish patch below to see whether it emits any messages. It has no mutex, so the code itself is racy, but it should be enough just as a check.
Meanwhile, the crash itself might be avoided by disabling "pcm->mmap_shadow = 1" line in pcm_softvol.c. Then it'll be copied to the external buffer.
Thank you Takashi, I will give it a try!
Baptiste
diff --git a/src/pcm/pcm_plugin.c b/src/pcm/pcm_plugin.c index 8527783c3569..571defad6e12 100644 --- a/src/pcm/pcm_plugin.c +++ b/src/pcm/pcm_plugin.c @@ -483,17 +483,25 @@ snd_pcm_plugin_mmap_commit(snd_pcm_t *pcm,
static snd_pcm_sframes_t snd_pcm_plugin_avail_update(snd_pcm_t *pcm) {
static snd_pcm_t *check = NULL; snd_pcm_plugin_t *plugin = pcm->private_data; snd_pcm_t *slave = plugin->gen.slave; snd_pcm_sframes_t slave_size; int err;
if (!check)
check = pcm;
else if (pcm == check)
fprintf(stderr, "XXX RACY CALL\n");
slave_size = snd_pcm_avail_update(slave); if (pcm->stream == SND_PCM_STREAM_CAPTURE && pcm->access != SND_PCM_ACCESS_RW_INTERLEAVED && pcm->access != SND_PCM_ACCESS_RW_NONINTERLEAVED) goto _capture; *pcm->hw.ptr = *slave->hw.ptr;
if (pcm == check)
check = NULL; return slave_size;
_capture: {
@@ -545,11 +553,15 @@ static snd_pcm_sframes_t snd_pcm_plugin_avail_update(snd_pcm_t *pcm) slave_size -= slave_frames; xfer += frames; }
if (pcm == check)
check = NULL;
return (snd_pcm_sframes_t)xfer;
error_atomic: snd_atomic_write_end(&plugin->watom); error:
if (pcm == check)
check = NULL;
return xfer > 0 ? (snd_pcm_sframes_t)xfer : err; }
}
On Fri, 01 Jul 2016 12:19:08 +0200, Baptiste Jonglez wrote:
On Wed, Jun 29, 2016 at 06:02:58PM +0200, Takashi Iwai wrote:
Now I reread the code again, and I guess it's really the racy accesses. Basically you can't call alsa-lib functions concurrently. For example, calling snd_pcm_avail_update() from multiple threads concurrently may lead to such an error. Internally it copies / converts the content (e.g. for softvol plugin), and this would conflict when called in parallel.
You can try a hackish patch below to see whether it emits any messages. It has no mutex, so the code itself is racy, but it should be enough just as a check.
Meanwhile, the crash itself might be avoided by disabling "pcm->mmap_shadow = 1" line in pcm_softvol.c. Then it'll be copied to the external buffer.
Thank you Takashi, I will give it a try!
To be noted: clearing mmap_show won't "fix" but merely avoid the crash. You may still get garbled sounds when such a race happens.
The best would be to fix the race. But it's often difficult to fix due to the design of apps.
I created an experimental version of alsa-lib to make PCM thread-safe by forcibly applying pthread mutex. The code is found in thread-safety branch in the git tree below: https://github.com/tiwai/alsa-lib
You need to build with --enable-thread-safety configure option with this code. I haven't tested whether it really avoids the race yet. I'd need to write a test case. So, take it as a proof-of-concept, so far.
Takashi
participants (2)
-
Baptiste Jonglez
-
Takashi Iwai