[PATCH alsa-lib 1/2] pcm: dmix: Fix semaphore usage
As Maarten Baert recently reported, the current dmix code applies the semaphore unnecessarily around mixing streams even when the lockless mix operation is used on x86. This was rather introduced mistakenly at the commit 267d7c728196 ("Add support of little-endian on i386/x86_64 dmix") where the generic dmix code was included on x86, too.
For achieving the original performance back, this patch changes the semaphore handling to be checked at run time instead of statically at compile time.
Signed-off-by: Takashi Iwai tiwai@suse.de --- src/pcm/pcm_direct.h | 1 + src/pcm/pcm_dmix.c | 18 +++++++++++------- src/pcm/pcm_dmix_generic.c | 2 +- src/pcm/pcm_dmix_i386.c | 1 + src/pcm/pcm_dmix_x86_64.c | 1 + 5 files changed, 15 insertions(+), 8 deletions(-)
diff --git a/src/pcm/pcm_direct.h b/src/pcm/pcm_direct.h index 8a236970a3a1..2150bce15449 100644 --- a/src/pcm/pcm_direct.h +++ b/src/pcm/pcm_direct.h @@ -186,6 +186,7 @@ struct snd_pcm_direct { mix_areas_32_t *remix_areas_32; mix_areas_24_t *remix_areas_24; mix_areas_u8_t *remix_areas_u8; + unsigned int use_sem; } dmix; struct { unsigned long long chn_mask; diff --git a/src/pcm/pcm_dmix.c b/src/pcm/pcm_dmix.c index 843fa3168756..e9343b19a536 100644 --- a/src/pcm/pcm_dmix.c +++ b/src/pcm/pcm_dmix.c @@ -292,13 +292,17 @@ static void remix_areas(snd_pcm_direct_t *dmix, * the area via semaphore */ #ifndef DOC_HIDDEN -#ifdef NO_CONCURRENT_ACCESS -#define dmix_down_sem(dmix) snd_pcm_direct_semaphore_down(dmix, DIRECT_IPC_SEM_CLIENT) -#define dmix_up_sem(dmix) snd_pcm_direct_semaphore_up(dmix, DIRECT_IPC_SEM_CLIENT) -#else -#define dmix_down_sem(dmix) -#define dmix_up_sem(dmix) -#endif +static void dmix_down_sem(snd_pcm_direct_t *dmix) +{ + if (dmix->u.dmix.use_sem) + snd_pcm_direct_semaphore_down(dmix, DIRECT_IPC_SEM_CLIENT); +} + +static void dmix_up_sem(snd_pcm_direct_t *dmix) +{ + if (dmix->u.dmix.use_sem) + snd_pcm_direct_semaphore_up(dmix, DIRECT_IPC_SEM_CLIENT); +} #endif
/* diff --git a/src/pcm/pcm_dmix_generic.c b/src/pcm/pcm_dmix_generic.c index 40c08747a74a..8a5b6f148556 100644 --- a/src/pcm/pcm_dmix_generic.c +++ b/src/pcm/pcm_dmix_generic.c @@ -43,7 +43,6 @@ static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old, #ifndef ARCH_ADD #define ARCH_ADD(p,a) (*(p) += (a)) #define ARCH_CMPXCHG(p,a,b) (*(p)) /* fake */ -#define NO_CONCURRENT_ACCESS /* use semaphore to avoid race */ #define IS_CONCURRENT 0 /* no race check */ #endif
@@ -530,6 +529,7 @@ static void generic_mix_select_callbacks(snd_pcm_direct_t *dmix) dmix->u.dmix.mix_areas_u8 = generic_mix_areas_u8; dmix->u.dmix.remix_areas_24 = generic_remix_areas_24; dmix->u.dmix.remix_areas_u8 = generic_remix_areas_u8; + dmix->u.dmix.use_sem = 1; }
#endif diff --git a/src/pcm/pcm_dmix_i386.c b/src/pcm/pcm_dmix_i386.c index 1ab983a8a373..82a91c5c2897 100644 --- a/src/pcm/pcm_dmix_i386.c +++ b/src/pcm/pcm_dmix_i386.c @@ -135,4 +135,5 @@ static void mix_select_callbacks(snd_pcm_direct_t *dmix) dmix->u.dmix.mix_areas_24 = smp > 1 ? mix_areas_24_smp: mix_areas_24; dmix->u.dmix.remix_areas_24 = smp > 1 ? remix_areas_24_smp: remix_areas_24; } + dmix->u.dmix.use_sem = 0; } diff --git a/src/pcm/pcm_dmix_x86_64.c b/src/pcm/pcm_dmix_x86_64.c index 34c40d4e9d1d..4d882bfd01bf 100644 --- a/src/pcm/pcm_dmix_x86_64.c +++ b/src/pcm/pcm_dmix_x86_64.c @@ -102,4 +102,5 @@ static void mix_select_callbacks(snd_pcm_direct_t *dmix) dmix->u.dmix.remix_areas_32 = smp > 1 ? remix_areas_32_smp : remix_areas_32; dmix->u.dmix.mix_areas_24 = smp > 1 ? mix_areas_24_smp : mix_areas_24; dmix->u.dmix.remix_areas_24 = smp > 1 ? remix_areas_24_smp : remix_areas_24; + dmix->u.dmix.use_sem = 0; }
The recent overlooked bug about the unconditional semaphore usage in the dmix implies that basically we've had no problem with the locking in the practical usages over years. Although the lockless operation has a clear merit, it's a much higher CPU usage (especially on some uncached pages), and it might lead to a potential deadlock in theory (which is hard to reproduce at will, though).
This patch introduces a new configure option "--enable-lockless-dmix" or "--disable-lockless-dmix" to let user choose the default dmix operation mode, then the default value for the dmix "direct_memory_access" option is set based on it.
A big question is which operation mode should be default: it's hard to decide, but in this patch, I bet for disabling the lockless in the end as the performance loss is significant.
But the user can enable the lockless operation at any time; at build time, with the configure option above, and at run time, by specifying the dmix "direct_memory_access" option, too.
Signed-off-by: Takashi Iwai tiwai@suse.de --- configure.ac | 13 +++++++++++++ src/pcm/pcm_direct.c | 4 ++++ 2 files changed, 17 insertions(+)
diff --git a/configure.ac b/configure.ac index 4577c417037a..75d037d5509a 100644 --- a/configure.ac +++ b/configure.ac @@ -629,6 +629,19 @@ if test "$build_pcm_mmap_emul" = "yes"; then AC_DEFINE([BUILD_PCM_PLUGIN_MMAP_EMUL], "1", [Build PCM mmap-emul plugin]) fi
+if test "$build_pcm_dmix" = "yes"; then +AC_MSG_CHECKING(for default lockless dmix) +AC_ARG_ENABLE(lockless-dmix, + AS_HELP_STRING([--enable-lockless-dmix], + [use lockless dmix as default on x86]), + lockless_dmix="$enableval", lockless_dmix="no") +if test "$lockless_dmix" = "yes"; then + AC_MSG_RESULT(yes) + AC_DEFINE([LOCKLESS_DMIX_DEFAULT], "1", [Lockless dmix as default]) +else + AC_MSG_RESULT(no) +fi +fi
dnl Create PCM plugin symbol list for static library rm -f "$srcdir"/src/pcm/pcm_symbols_list.c diff --git a/src/pcm/pcm_direct.c b/src/pcm/pcm_direct.c index 665340954cf3..c38ba3190f1a 100644 --- a/src/pcm/pcm_direct.c +++ b/src/pcm/pcm_direct.c @@ -1892,7 +1892,11 @@ int snd_pcm_direct_parse_open_conf(snd_config_t *root, snd_config_t *conf, rec->slowptr = 1; rec->max_periods = 0; rec->var_periodsize = 0; +#ifdef LOCKLESS_DMIX_DEFAULT rec->direct_memory_access = 1; +#else + rec->direct_memory_access = 0; +#endif rec->hw_ptr_alignment = SND_PCM_HW_PTR_ALIGNMENT_AUTO; rec->tstamp_type = -1;
On Fri, 19 Jun 2020, Takashi Iwai wrote:
The recent overlooked bug about the unconditional semaphore usage in the dmix implies that basically we've had no problem with the locking in the practical usages over years. Although the lockless operation has a clear merit, it's a much higher CPU usage (especially on some uncached pages), and it might lead to a potential deadlock in theory (which is hard to reproduce at will, though).
This patch introduces a new configure option "--enable-lockless-dmix" or "--disable-lockless-dmix" to let user choose the default dmix operation mode, then the default value for the dmix "direct_memory_access" option is set based on it.
A big question is which operation mode should be default: it's hard to decide, but in this patch, I bet for disabling the lockless in the end as the performance loss is significant.
But the user can enable the lockless operation at any time; at build time, with the configure option above, and at run time, by specifying the dmix "direct_memory_access" option, too.
I would like to express some caution here.
Seems it must be essential (not just choice) that the semaphore implementation is the default. As per Maarten's information, there are libasound embedded in applications and containers. Differing defaults results in broken audio between applications.
Sadly there is, in effect, an ABI here; a practical risk that users suffering the consequences eventually.
Because of the history of sepahores, an application would need to signal its intent to use atomics, which is not a good thing as that is complex.
Instead I think it is smart here to consider the opportunity which Maarten has come here with.
Patches here are just the beginning to bring alive a lot of dormant functionality. It assumes hand-written assembly code will run concurrently that appears to not yet have been tested in that way. It is a joy to see hand written assembly, but my worry is that is influencing the decision making.
I am only recently looking at dmix/snoop code, so I do not aim to stand in the way. But I think it would be prudent to consider that bringing alive dormant functionality (vs. opportunity to remove code) as if it were adding the code explicitly. Would ALSA developers review and accept a 1000+ line patch adding architecture-specific assembly, changes to the ABI, based on the benchmarks which Maarten has presented?
Where I am more certain is: if options are to be provided to users then it should be because a user is in the best position to decide. In this case I think ALSA developers must equip users in understanding the pros/cons. That's why ideally there's no option and code just does the right thing. If not, at very least documentation must explain the tradeoff (and I think a better name should be chosen.)
I can certainly see interesting positives for mixing based on atomics. But there are many years without it, and this feels hasty and there are risks.
And Maarten has presented and benchmarked an excellent opportunity to simplify, which could be missed. It is one thing to leave the code dormant until a decision or clearer picture. But these patches risk meeting that opportunity and transforming it into complexity for developers and users.
Signed-off-by: Takashi Iwai tiwai@suse.de --- configure.ac | 13 +++++++++++++ src/pcm/pcm_direct.c | 4 ++++ 2 files changed, 17 insertions(+)
diff --git a/configure.ac b/configure.ac index 4577c417037a..75d037d5509a 100644 --- a/configure.ac +++ b/configure.ac @@ -629,6 +629,19 @@ if test "$build_pcm_mmap_emul" = "yes"; then AC_DEFINE([BUILD_PCM_PLUGIN_MMAP_EMUL], "1", [Build PCM mmap-emul plugin]) fi
+if test "$build_pcm_dmix" = "yes"; then +AC_MSG_CHECKING(for default lockless dmix) +AC_ARG_ENABLE(lockless-dmix,
- AS_HELP_STRING([--enable-lockless-dmix],
- [use lockless dmix as default on x86]),
- lockless_dmix="$enableval", lockless_dmix="no")
+if test "$lockless_dmix" = "yes"; then
- AC_MSG_RESULT(yes)
- AC_DEFINE([LOCKLESS_DMIX_DEFAULT], "1", [Lockless dmix as default])
+else
- AC_MSG_RESULT(no)
+fi +fi
dnl Create PCM plugin symbol list for static library rm -f "$srcdir"/src/pcm/pcm_symbols_list.c diff --git a/src/pcm/pcm_direct.c b/src/pcm/pcm_direct.c index 665340954cf3..c38ba3190f1a 100644 --- a/src/pcm/pcm_direct.c +++ b/src/pcm/pcm_direct.c @@ -1892,7 +1892,11 @@ int snd_pcm_direct_parse_open_conf(snd_config_t *root, snd_config_t *conf, rec->slowptr = 1; rec->max_periods = 0; rec->var_periodsize = 0; +#ifdef LOCKLESS_DMIX_DEFAULT rec->direct_memory_access = 1; +#else
- rec->direct_memory_access = 0;
+#endif rec->hw_ptr_alignment = SND_PCM_HW_PTR_ALIGNMENT_AUTO; rec->tstamp_type = -1;
Dne 22. 06. 20 v 11:31 Mark Hills napsal(a):
On Fri, 19 Jun 2020, Takashi Iwai wrote:
The recent overlooked bug about the unconditional semaphore usage in the dmix implies that basically we've had no problem with the locking in the practical usages over years. Although the lockless operation has a clear merit, it's a much higher CPU usage (especially on some uncached pages), and it might lead to a potential deadlock in theory (which is hard to reproduce at will, though).
This patch introduces a new configure option "--enable-lockless-dmix" or "--disable-lockless-dmix" to let user choose the default dmix operation mode, then the default value for the dmix "direct_memory_access" option is set based on it.
A big question is which operation mode should be default: it's hard to decide, but in this patch, I bet for disabling the lockless in the end as the performance loss is significant.
But the user can enable the lockless operation at any time; at build time, with the configure option above, and at run time, by specifying the dmix "direct_memory_access" option, too.
I would like to express some caution here.
Seems it must be essential (not just choice) that the semaphore implementation is the default. As per Maarten's information, there are libasound embedded in applications and containers. Differing defaults results in broken audio between applications.
Sadly there is, in effect, an ABI here; a practical risk that users suffering the consequences eventually.
Because of the history of sepahores, an application would need to signal its intent to use atomics, which is not a good thing as that is complex.
Instead I think it is smart here to consider the opportunity which Maarten has come here with.
Patches here are just the beginning to bring alive a lot of dormant functionality. It assumes hand-written assembly code will run concurrently that appears to not yet have been tested in that way. It is a joy to see hand written assembly, but my worry is that is influencing the decision making.
I am only recently looking at dmix/snoop code, so I do not aim to stand in the way. But I think it would be prudent to consider that bringing alive dormant functionality (vs. opportunity to remove code) as if it were adding the code explicitly. Would ALSA developers review and accept a 1000+ line patch adding architecture-specific assembly, changes to the ABI, based on the benchmarks which Maarten has presented?
Where I am more certain is: if options are to be provided to users then it should be because a user is in the best position to decide. In this case I think ALSA developers must equip users in understanding the pros/cons. That's why ideally there's no option and code just does the right thing. If not, at very least documentation must explain the tradeoff (and I think a better name should be chosen.)
I can certainly see interesting positives for mixing based on atomics. But there are many years without it, and this feels hasty and there are risks.
And Maarten has presented and benchmarked an excellent opportunity to simplify, which could be missed. It is one thing to leave the code dormant until a decision or clearer picture. But these patches risk meeting that opportunity and transforming it into complexity for developers and users.
Some comments: This code was designed when we worked with SMP machines with 2 CPUs (not cores - physical CPUs). It's old, but I feel it's worth to keep it at least for the reference and testing. The one instruction racy time window described by Maarten seems true, but as I wrote, the errors were mostly zero (no hearable).
The locked variant (without atomic instructions) should be the default without any questions. It's more effective for the multi-cpu-core architectures used nowadays.
I would propose this:
1) use the locked variant as default (no atomic code even for x86) 2) add a configuration option to select the mixing code 3) change SND_PCM_DIRECT_MAGIC which will follow the mixing code selection to avoid locking / atomic mixing code mismatch / misuse
Jaroslav
participants (3)
-
Jaroslav Kysela
-
Mark Hills
-
Takashi Iwai