[alsa-devel] locking looks odd
Hello,
We are having odd issues with libasound 1.1.2 which we didn't have with libasound 1.1.1, more precisely
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=833950
so I'm having a look at the locking API introduced in 1.1.2, and there are some oddities:
- snd_pcm_new seems to initialize pcm->thread_safe to 0 by default, this does not seem safe. The attached patch initializes it to 1, which fixes the bug in our tests.
- snd_pcm_hw_open_fd forces it to 1, thus ignoring what snd_pcm_new set.
- one can find both __snd_pcm_lock and snd_pcm_lock functions, what is the expected difference between them?
- __snd_pcm_lock takes locks when thread_safe >= 0, while snd_pcm_lock takes locks when thread_safe == 0, this looks really odd.
- libasound could just not link against libpthread, pthread_mutex_lock/unlock are already provided as empty stubs by libc, the overhead will thus only be hit when the application links against libpthread (libasound will then properly use pthread locks).
Samuel
Hi,
On Aug 17 2016 06:03, Samuel Thibault wrote:
Hello,
We are having odd issues with libasound 1.1.2 which we didn't have with libasound 1.1.1, more precisely
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=833950
so I'm having a look at the locking API introduced in 1.1.2, and there are some oddities:
snd_pcm_new seems to initialize pcm->thread_safe to 0 by default, this does not seem safe. The attached patch initializes it to 1, which fixes the bug in our tests.
snd_pcm_hw_open_fd forces it to 1, thus ignoring what snd_pcm_new set.
one can find both __snd_pcm_lock and snd_pcm_lock functions, what is the expected difference between them?
__snd_pcm_lock takes locks when thread_safe >= 0, while snd_pcm_lock takes locks when thread_safe == 0, this looks really odd.
libasound could just not link against libpthread, pthread_mutex_lock/unlock are already provided as empty stubs by libc, the overhead will thus only be hit when the application links against libpthread (libasound will then properly use pthread locks).
Thanks for this information. Unfortunately, committer of this modification (a subsystem maintainer) has a summer vacation, so it's delayed for you to get replies for to questions.
In my opinion, the second item you addressed to is a bug of this commit, to introduce the environment variable.
pcm: Add LIBASOUND_THREAD_SAFE env variable check http://git.alsa-project.org/?p=alsa-lib.git;a=commitdiff;h=c4b690278eeaf9a68...
I don't know exactly the others. Anyway, I'll continue to investigate these issues and cooperate to solve them.
Thanks
Takashi Sakamoto
Dne 16.8.2016 v 23:03 Samuel Thibault napsal(a):
Hello,
We are having odd issues with libasound 1.1.2 which we didn't have with libasound 1.1.1, more precisely
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=833950
so I'm having a look at the locking API introduced in 1.1.2, and there are some oddities:
snd_pcm_new seems to initialize pcm->thread_safe to 0 by default, this does not seem safe. The attached patch initializes it to 1, which fixes the bug in our tests.
snd_pcm_hw_open_fd forces it to 1, thus ignoring what snd_pcm_new set.
The thread_safe has this meaning:
0 - the pcm plugin is not thread safe 1 - the pcm plugin is thread safe (actually only the hw plugin) -1 - disable thread safety
Your patch does not look correct. It's necessary to determine where the mutex is locked the second time (use gdb and backtrace for all threads for that). Note that plugins may be chained.
- one can find both __snd_pcm_lock and snd_pcm_lock functions, what is the expected difference between them?
__snd_pcm_lock/unlock is for forced lock
snd_pcm_lock/unlock skips locking for safe plugins (only hw plugin)
- __snd_pcm_lock takes locks when thread_safe >= 0, while snd_pcm_lock takes locks when thread_safe == 0, this looks really odd.
See meaning of this variable.
- libasound could just not link against libpthread, pthread_mutex_lock/unlock are already provided as empty stubs by libc, the overhead will thus only be hit when the application links against libpthread (libasound will then properly use pthread locks).
Good point, but this should be configurable through the configure script.
Jaroslav
Jaroslav Kysela, on Wed 17 Aug 2016 19:46:42 +0200, wrote:
Dne 16.8.2016 v 23:03 Samuel Thibault napsal(a):
snd_pcm_new seems to initialize pcm->thread_safe to 0 by default, this does not seem safe. The attached patch initializes it to 1, which fixes the bug in our tests.
snd_pcm_hw_open_fd forces it to 1, thus ignoring what snd_pcm_new set.
The thread_safe has this meaning:
0 - the pcm plugin is not thread safe 1 - the pcm plugin is thread safe (actually only the hw plugin) -1 - disable thread safety
So now with rethinking all of this, I'm starting to understand: from reading the variable name, I would have thought "thread_safe=1" means "I want thread safety thanks to a mutex", while apparently it means "the plugin is already thread-safe, there is no need for a mutex"... Really, all of this should be documented clearly along the source code, otherwise people will get it wrong.
I'd just like to check something: do we agree that libasound must be thread-safe by default (otherwise it breaks the application assumption that it's thread-safe)? If so, then there are thread-safety bugs: the mentioned Debian report is far from alone, the upgrade to the newer libasound has severely broken quite a few applications, I'm at the point of advising the Debian maintainer to just revert to the previous version for Stretch, otherwize we'll be shipping just very-buggy software.
Samuel
On Sat, 20 Aug 2016 14:12:05 +0200, Samuel Thibault wrote:
Jaroslav Kysela, on Wed 17 Aug 2016 19:46:42 +0200, wrote:
Dne 16.8.2016 v 23:03 Samuel Thibault napsal(a):
snd_pcm_new seems to initialize pcm->thread_safe to 0 by default, this does not seem safe. The attached patch initializes it to 1, which fixes the bug in our tests.
snd_pcm_hw_open_fd forces it to 1, thus ignoring what snd_pcm_new set.
The thread_safe has this meaning:
0 - the pcm plugin is not thread safe 1 - the pcm plugin is thread safe (actually only the hw plugin) -1 - disable thread safety
So now with rethinking all of this, I'm starting to understand: from reading the variable name, I would have thought "thread_safe=1" means "I want thread safety thanks to a mutex", while apparently it means "the plugin is already thread-safe, there is no need for a mutex"... Really, all of this should be documented clearly along the source code, otherwise people will get it wrong.
Sorry for unclearness. An improvement patch is always welcome :)
I'd just like to check something: do we agree that libasound must be thread-safe by default (otherwise it breaks the application assumption that it's thread-safe)? If so, then there are thread-safety bugs: the mentioned Debian report is far from alone, the upgrade to the newer libasound has severely broken quite a few applications, I'm at the point of advising the Debian maintainer to just revert to the previous version for Stretch, otherwize we'll be shipping just very-buggy software.
Basically the alsa-lib is not thread safe. The new thread-safe implementation was introduced just because there are way too many buggy applications that work casually for now. It's not meant as the rock-solid thread-safety alone.
If the application deadlocks with the alsa-lib with the thread-safe option enabled, it's most likely that the application does already accessing alsa-lib in a wrong, racy way.
In anyway, it'd be helpful to get stack trace of threads at the hanging moment. It'll clarify how the deadlock is triggered. The plugin may become a complex chain, and there might be something missing in the big picture.
thanks,
Takashi
Samuel Thibault, on Sat 20 Aug 2016 14:12:05 +0200, wrote:
I'd just like to check something: do we agree that libasound must be thread-safe by default (otherwise it breaks the application assumption that it's thread-safe)? If so, then there are thread-safety bugs:
Ok, now with the discussion on the portaudio list starting at
https://lists.columbia.edu/pipermail/portaudio/2016-August/000599.html
it seems the issue was in portaudio, which was hit by the locking behavior change.
Samuel
(Adding Alan to Cc)
On Fri, 26 Aug 2016 21:02:12 +0200, Samuel Thibault wrote:
Samuel Thibault, on Sat 20 Aug 2016 14:12:05 +0200, wrote:
I'd just like to check something: do we agree that libasound must be thread-safe by default (otherwise it breaks the application assumption that it's thread-safe)? If so, then there are thread-safety bugs:
Ok, now with the discussion on the portaudio list starting at
https://lists.columbia.edu/pipermail/portaudio/2016-August/000599.html
it seems the issue was in portaudio, which was hit by the locking behavior change.
Good to hear that it's been addressed in portaudio side.
But I still wonder in which code path the deadlock was triggered. Can anyone give the stack traces of deadlocked (multi-)threads?
thanks,
Takashi
On Tuesday 30 August 2016 15:20, Takashi Iwai wrote:
(Adding Alan to Cc)
On Fri, 26 Aug 2016 21:02:12 +0200,
Samuel Thibault wrote:
Samuel Thibault, on Sat 20 Aug 2016 14:12:05 +0200, wrote:
I'd just like to check something: do we agree that libasound must be thread-safe by default (otherwise it breaks the application assumption that it's thread-safe)? If so, then there are thread-safety bugs:
Ok, now with the discussion on the portaudio list starting at
https://lists.columbia.edu/pipermail/portaudio/2016-August/000599.html
it seems the issue was in portaudio, which was hit by the locking behavior change.
Good to hear that it's been addressed in portaudio side.
But I still wonder in which code path the deadlock was triggered. Can anyone give the stack traces of deadlocked (multi-)threads?
Actually, it wasn't so much the locking itself, but thread cancellation during one call (snd_pcm_mmap_commit()) leaving the lock held and not released, which then balked when a cleanup call to snd_pcm_drop() attempted to aquire the lock. This occurred as a result of an 'Abort' action in Portaudio. It can be avoided by disabling cancelablity at all times except when waiting on poll(), though I have yet to be completely satisfied there are no snags in this approach, as cancelability is new to me!
Regards
Alan
On Tue, 30 Aug 2016 17:54:22 +0200, Alan Horstmann wrote:
On Tuesday 30 August 2016 15:20, Takashi Iwai wrote:
(Adding Alan to Cc)
On Fri, 26 Aug 2016 21:02:12 +0200,
Samuel Thibault wrote:
Samuel Thibault, on Sat 20 Aug 2016 14:12:05 +0200, wrote:
I'd just like to check something: do we agree that libasound must be thread-safe by default (otherwise it breaks the application assumption that it's thread-safe)? If so, then there are thread-safety bugs:
Ok, now with the discussion on the portaudio list starting at
https://lists.columbia.edu/pipermail/portaudio/2016-August/000599.html
it seems the issue was in portaudio, which was hit by the locking behavior change.
Good to hear that it's been addressed in portaudio side.
But I still wonder in which code path the deadlock was triggered. Can anyone give the stack traces of deadlocked (multi-)threads?
Actually, it wasn't so much the locking itself, but thread cancellation during one call (snd_pcm_mmap_commit()) leaving the lock held and not released, which then balked when a cleanup call to snd_pcm_drop() attempted to aquire the lock. This occurred as a result of an 'Abort' action in Portaudio. It can be avoided by disabling cancelablity at all times except when waiting on poll(), though I have yet to be completely satisfied there are no snags in this approach, as cancelability is new to me!
OK, that relieves me. Let me know if you still find anything odd about the locking in alsa-lib, though.
thanks,
Takashi
On Wed, 17 Aug 2016 19:46:42 +0200, Jaroslav Kysela wrote:
Dne 16.8.2016 v 23:03 Samuel Thibault napsal(a):
Hello,
We are having odd issues with libasound 1.1.2 which we didn't have with libasound 1.1.1, more precisely
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=833950
so I'm having a look at the locking API introduced in 1.1.2, and there are some oddities:
snd_pcm_new seems to initialize pcm->thread_safe to 0 by default, this does not seem safe. The attached patch initializes it to 1, which fixes the bug in our tests.
snd_pcm_hw_open_fd forces it to 1, thus ignoring what snd_pcm_new set.
The thread_safe has this meaning:
0 - the pcm plugin is not thread safe 1 - the pcm plugin is thread safe (actually only the hw plugin) -1 - disable thread safety
Your patch does not look correct. It's necessary to determine where the mutex is locked the second time (use gdb and backtrace for all threads for that). Note that plugins may be chained.
- one can find both __snd_pcm_lock and snd_pcm_lock functions, what is the expected difference between them?
__snd_pcm_lock/unlock is for forced lock
snd_pcm_lock/unlock skips locking for safe plugins (only hw plugin)
- __snd_pcm_lock takes locks when thread_safe >= 0, while snd_pcm_lock takes locks when thread_safe == 0, this looks really odd.
See meaning of this variable.
Thanks Jaroslav for explaining details!
- libasound could just not link against libpthread, pthread_mutex_lock/unlock are already provided as empty stubs by libc, the overhead will thus only be hit when the application links against libpthread (libasound will then properly use pthread locks).
Good point, but this should be configurable through the configure script.
The thread-safety lock API usage is enabled only when pthread is enabled in configure script. So the least check in the configure script is already there.
Takashi
(Sorry if this email reaches you twice, had some issues)
On 2016-08-17 19:46, Jaroslav Kysela wrote:
Dne 16.8.2016 v 23:03 Samuel Thibault napsal(a):
Hello,
We are having odd issues with libasound 1.1.2 which we didn't have with libasound 1.1.1, more precisely
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=833950
so I'm having a look at the locking API introduced in 1.1.2, and there are some oddities:
snd_pcm_new seems to initialize pcm->thread_safe to 0 by default, this does not seem safe. The attached patch initializes it to 1, which fixes the bug in our tests.
snd_pcm_hw_open_fd forces it to 1, thus ignoring what snd_pcm_new set.
The thread_safe has this meaning:
0 - the pcm plugin is not thread safe 1 - the pcm plugin is thread safe (actually only the hw plugin) -1 - disable thread safety
Your patch does not look correct. It's necessary to determine where the mutex is locked the second time (use gdb and backtrace for all threads for that). Note that plugins may be chained.
Still Samuel's point about -1 being ignored is valid, so I just sent a patch about that.
But I'm quite sceptic about having that environment variable in the first place - it seems to me that new apps will start to rely on alsa-lib doing the locking for them, second a blog post comes along that tells people to set that environment variable to 0 to maximize performance, third those apps will crash and the user doesn't understand why.
- one can find both __snd_pcm_lock and snd_pcm_lock functions, what is the expected difference between them?
__snd_pcm_lock/unlock is for forced lock
snd_pcm_lock/unlock skips locking for safe plugins (only hw plugin)
These are quite confusing names, one would expect them to be the same (snd_pcm_lock being a compatibility wrapper around some internal __snd_pcm_lock).
I'm not sure about better names, but maybe something like snd_pcm_lock_all and snd_pcm_lock_ts0 would at least indicate that they are different functions.
On Thu, 01 Sep 2016 08:27:07 +0200, David Henningsson wrote:
(Sorry if this email reaches you twice, had some issues)
On 2016-08-17 19:46, Jaroslav Kysela wrote:
Dne 16.8.2016 v 23:03 Samuel Thibault napsal(a):
Hello,
We are having odd issues with libasound 1.1.2 which we didn't have with libasound 1.1.1, more precisely
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=833950
so I'm having a look at the locking API introduced in 1.1.2, and there are some oddities:
snd_pcm_new seems to initialize pcm->thread_safe to 0 by default, this does not seem safe. The attached patch initializes it to 1, which fixes the bug in our tests.
snd_pcm_hw_open_fd forces it to 1, thus ignoring what snd_pcm_new set.
The thread_safe has this meaning:
0 - the pcm plugin is not thread safe 1 - the pcm plugin is thread safe (actually only the hw plugin) -1 - disable thread safety
Your patch does not look correct. It's necessary to determine where the mutex is locked the second time (use gdb and backtrace for all threads for that). Note that plugins may be chained.
Still Samuel's point about -1 being ignored is valid, so I just sent a patch about that.
But I'm quite sceptic about having that environment variable in the first place - it seems to me that new apps will start to rely on alsa-lib doing the locking for them, second a blog post comes along that tells people to set that environment variable to 0 to maximize performance, third those apps will crash and the user doesn't understand why.
Setting the env variable to 0 doesn't mean to give you the maximum performance. The hw plugin is always without locking, and it's only with other plugins. If you need the "maximum" performance, you need to use only hw. Using other plugins already contradicts your purpose from the very beginning.
That said, the current implementation shouldn't matter for JACK or such systems where they expect more or less the direct access to hw plugin.
Still, for buggy pthread implementation in applications, the locking inside alsa-lib may cause an issue. The env variable is a help to identify that. It's not provided as a solution.
- one can find both __snd_pcm_lock and snd_pcm_lock functions, what is the expected difference between them?
__snd_pcm_lock/unlock is for forced lock
snd_pcm_lock/unlock skips locking for safe plugins (only hw plugin)
These are quite confusing names, one would expect them to be the same (snd_pcm_lock being a compatibility wrapper around some internal __snd_pcm_lock).
I'm not sure about better names, but maybe something like snd_pcm_lock_all and snd_pcm_lock_ts0 would at least indicate that they are different functions.
Well, both snd_pcm_lock_all() and snd_pcm_lock_ts0() would be confusing, too, since these don't match with the behavior as well.
__snd_pcm_lock() -> lock a plugin when it's not disabled by env variable (no matter whether it declares itself as safe or unsafe). snd_pcm_lock() -> lock a plugin when it's unsafe and not disabled by env variable
I'm open for any better names. Patches are welcome.
thanks,
Takashi
participants (6)
-
Alan Horstmann
-
David Henningsson
-
Jaroslav Kysela
-
Samuel Thibault
-
Takashi Iwai
-
Takashi Sakamoto