On Wed, 29 Jun 2016 16:16:03 +0200, Takashi Iwai wrote:
On Wed, 29 Jun 2016 15:59:59 +0200, Baptiste Jonglez wrote:
Hello Takashi,
On Wed, Jun 29, 2016 at 03:39:37PM +0200, Takashi Iwai wrote:
Do you think the problem is likely to come from an incorrect API usage from the application? Knowing this would significantly narrow down the search, because the problem could also come from within alsa itself.
The calling code from the application is quite complex, see here:
https://github.com/savoirfairelinux/ring-daemon/blob/master/src/media/audio/...
In particular, the assertion is triggered by the call to snd_pcm_avail_update() at line 694 (see the stack trace below).
I have too little time to investigate this issue, but judging from the fact that there has been no such report until now, I guess it's specific to your code. But I can't say for sure, of course.
Well, I did find other projects having this issue:
https://aur.archlinux.org/packages/zoom/#comment-544696
http://ubuntuforums.org/showthread.php?t=2248373
https://github.com/js-platform/node-webrtc/issues/110
https://fedorahosted.org/fldigi/ticket/70
Oh well. They have never been reported upstream... Anyway, your report is the first one, and with a stack trace.
Although in most of these cases, the issue seems 100% reproducible (e.g. crash at startup). In our case, the assertion failure only happens when there is significant CPU load.
Is the relevant code path accessed concurrently? The possible race is the first suspect I'd always think of in such a case.
Good idea, I will check. Given it happens when the CPU is loaded, my guess is that "something" is done too late, which means that it works on "invalidated" data (this is all very vague, sorry).
Would you know a simple program using alsa-lib for sound capture, with which I could try to reproduce?
arecord? But it's not multi-threaded, so it isn't suitable for concurrency tests.
Now I reread the code, and I guess the cause really is racy access. Basically you can't call alsa-lib functions concurrently. For example, calling snd_pcm_avail_update() from multiple threads concurrently may lead to such an error. Internally it copies / converts the content (e.g. for the softvol plugin), and this would conflict when called in parallel.
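To illustrate the point above, here is a minimal sketch of serializing such calls on the application side with a single pthread mutex. It is only an illustration under stated assumptions, not code from ring-daemon or alsa-lib: fake_avail_update() is a hypothetical stand-in for a non-reentrant call such as snd_pcm_avail_update(), and locked_avail_update(), pcm_lock and run_demo() are names invented for the example.

```c
#include <pthread.h>
#include <stdio.h>

#define MAX_THREADS 16

/* Hypothetical stand-in for a non-reentrant library call such as
 * snd_pcm_avail_update(). It only records whether two callers overlapped. */
static int in_call = 0;   /* 1 while a caller is inside the function */
static int overlaps = 0;  /* number of overlapping entries detected */

static long fake_avail_update(void)
{
    if (in_call)
        overlaps++;       /* another caller is still inside */
    in_call = 1;
    /* ... the real function would copy / convert buffer contents here ... */
    in_call = 0;
    return 0;
}

/* One lock guards every call into the non-reentrant function. */
static pthread_mutex_t pcm_lock = PTHREAD_MUTEX_INITIALIZER;

static long locked_avail_update(void)
{
    long ret;

    pthread_mutex_lock(&pcm_lock);
    ret = fake_avail_update();
    pthread_mutex_unlock(&pcm_lock);
    return ret;
}

static void *worker(void *arg)
{
    int iters = *(int *)arg;

    for (int i = 0; i < iters; i++)
        locked_avail_update();
    return NULL;
}

/* Run nthreads workers (capped at MAX_THREADS) and return how many
 * overlapping calls were detected. */
int run_demo(int nthreads, int iters)
{
    pthread_t tids[MAX_THREADS];
    int started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    overlaps = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[started], NULL, worker, &iters) != 0)
            break;        /* stop creating threads on failure */
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return overlaps;
}
```

Calling run_demo(4, 20000) from a test program exercises the wrapper from four threads. Because every entry into the stand-in happens with pcm_lock held, no overlap should be recorded. In a real application the same lock would have to cover all alsa-lib calls made on the shared PCM handle, not only this one.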
You can try a hackish patch below to see whether it emits any messages. It has no mutex, so the code itself is racy, but it should be enough just as a check.
Meanwhile, the crash itself might be avoided by disabling the "pcm->mmap_shadow = 1" line in pcm_softvol.c. Then it'll be copied to the external buffer.
Takashi
---
diff --git a/src/pcm/pcm_plugin.c b/src/pcm/pcm_plugin.c
index 8527783c3569..571defad6e12 100644
--- a/src/pcm/pcm_plugin.c
+++ b/src/pcm/pcm_plugin.c
@@ -483,17 +483,25 @@ snd_pcm_plugin_mmap_commit(snd_pcm_t *pcm,
 
 static snd_pcm_sframes_t snd_pcm_plugin_avail_update(snd_pcm_t *pcm)
 {
+	static snd_pcm_t *check = NULL;
 	snd_pcm_plugin_t *plugin = pcm->private_data;
 	snd_pcm_t *slave = plugin->gen.slave;
 	snd_pcm_sframes_t slave_size;
 	int err;
 
+	if (!check)
+		check = pcm;
+	else if (pcm == check)
+		fprintf(stderr, "XXX RACY CALL\n");
+
 	slave_size = snd_pcm_avail_update(slave);
 	if (pcm->stream == SND_PCM_STREAM_CAPTURE &&
 	    pcm->access != SND_PCM_ACCESS_RW_INTERLEAVED &&
 	    pcm->access != SND_PCM_ACCESS_RW_NONINTERLEAVED)
 		goto _capture;
 	*pcm->hw.ptr = *slave->hw.ptr;
+	if (pcm == check)
+		check = NULL;
 	return slave_size;
  _capture:
 	{
@@ -545,11 +553,15 @@ static snd_pcm_sframes_t snd_pcm_plugin_avail_update(snd_pcm_t *pcm)
 			slave_size -= slave_frames;
 			xfer += frames;
 		}
+		if (pcm == check)
+			check = NULL;
 		return (snd_pcm_sframes_t)xfer;
 
 	error_atomic:
 		snd_atomic_write_end(&plugin->watom);
 	error:
+		if (pcm == check)
+			check = NULL;
 		return xfer > 0 ? (snd_pcm_sframes_t)xfer : err;
 	}
 }