On 19/05/2020 15:46, Takashi Iwai wrote:
Hm, I'm not sure whether whether this really leads to the deadlock, but I can't exclude such a possibility in some corner cases.
It is not a deadlock, rather there exist scenarios where the buffer is cleared more than once, and since clearing is done by subtracting the old value, this results in incorrect audio.
Consider the scenario where before mixing starts, *dst = 0 and *sum = 100. Now consider that application A plays back 10 and application B plays back 0. The expected result is 10. However this can happen:
A: sample = *src; (A:sample == 10) A: old_sample = *sum; (A:old_sample == 100)
<< here A gets interrupted by B>
B: sample = *src; (B:sample == 0) B: old_sample = *sum; (B:old_sample == 100) B: if (ARCH_CMPXCHG(dst, 0, 1) == 0) (*dst == 1) B: sample -= old_sample; (B:sample == -100) B: ARCH_ADD(sum, sample); (*sum == 0) B: [copy sum to dst] (*dst == 0)
<< here A resumes >>
A: if (ARCH_CMPXCHG(dst, 0, 1) == 0) (*dst == 1) A: sample -= old_sample; (A:sample == -90) A: ARCH_ADD(sum, sample); (*sum == -90) A: [copy sum to dst] (*dst == -90)
In the end, *sum and *dst are -90 instead of 10.
The advantage of lockless dmix implementation isn't about its CPU usage but the nature where a stream isn't prevented by other streams, which assures the very low latency, too. That is, with the generic dmix, a stream can be still halted when another stream stalls in the middle, and there is no way to recover from it.
I agree, however considering that the lock has (unintentionally) been enabled for 13 years, and no one noticed until now, it doesn't seem to be a problem. I think there is a higher risk that removing the lock will expose new bugs due to flaws in the lock-free algorithm that were previously hidden by the lock.
So, IMO, we can start with a dynamic configuration to choose the behavior and add a configure option to choose the implementations. The default behavior should be still an open question, though.
I can do that too, but monitoring is significantly easier to implement in the locked case. Also, I see no way to fix the lock-free algorithm without making changes to the shared memory layout (which would break compatibility between ALSA versions, which is a problem for anyone who uses chroots or for some reason has multiple versions of libasound on their system (e.g. Steam includes its own version of libasound). And I would rather not implement a lock-free implementation which I know is not fully correct.
Best regards, Maarten Baert