On 24/07/17 04:46, Takashi Sakamoto wrote:
Hi,
Thanks for the regression report, and I'm sorry for the inconvenience.
As far as I can tell from the call trace, the issue is indeed a deadlock between the process and softIRQ (tasklet) contexts, involving the group lock for the ALSA PCM substream and the tasklet for the OHCI 1394 IT context.
A. In the process context
  * (lock A) Acquiring spin_lock by snd_pcm_stream_lock_irq() in snd_pcm_status64()
  * (lock B) Then attempt to enter tasklet

B. In the softIRQ context
  * (lock B) Enter tasklet
  * (lock A) Attempt to acquire spin_lock by snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed()
It is the same issue that you reported against the test branch for the bh workqueue [1].
I think users rarely hit the issue when working with either PipeWire or PulseAudio, since these processes run the PCM substream runtime in no-period-wakeup mode (and thus with fewer hardIRQs).
In any case, one solution is to revert both commit b5b519965c4c ("ALSA: firewire-lib: obsolete workqueue for period update") and commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event in process context"). The restored workqueue then becomes responsible for taking lock A, thus:
A. In the process context
  * (lock A) Acquiring spin_lock by snd_pcm_stream_lock_irq() in snd_pcm_status64()
  * (lock B) Then attempt to enter tasklet

B. In the softIRQ context
  * (lock B) Enter tasklet
  * schedule workqueue

C. another process context (workqueue)
  * (lock A) Attempt to acquire spin_lock by snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed()
The deadlock would not occur.
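
The two orderings above can be modelled in user space. What follows is only a rough sketch in Python, not kernel code: threads stand in for the process, softIRQ and workqueue contexts, a plain mutex stands in for lock A, and an event stands in for "the tasklet has finished" (lock B). All names (simulate, lock_a, tasklet_done and so on) are hypothetical, and the timeouts exist only so that the model terminates; in the kernel both sides would simply spin forever.

```python
import queue
import threading


def simulate(defer_to_workqueue: bool, timeout: float = 0.5) -> bool:
    """Return True if the period-elapsed handler managed to take lock A."""
    lock_a = threading.Lock()            # models the PCM substream group lock
    process_holds_a = threading.Event()  # process context has taken lock A
    tasklet_done = threading.Event()     # models "lock B": the tasklet has finished
    work = queue.Queue()                 # models the workqueue
    outcome = {}

    def period_elapsed() -> None:
        # Models snd_pcm_period_elapsed(): it needs lock A.
        got_lock = lock_a.acquire(timeout=timeout)
        if got_lock:
            lock_a.release()
        outcome["lock_a_taken"] = got_lock

    def process_context() -> None:
        # Models snd_pcm_status64(): take lock A, then wait for the tasklet
        # while still holding it.
        with lock_a:
            process_holds_a.set()
            tasklet_done.wait(timeout=4 * timeout)

    def tasklet() -> None:
        # Force the problematic interleaving: run once lock A is already held.
        process_holds_a.wait()
        if defer_to_workqueue:
            work.put(period_elapsed)     # scenario with the reverts applied
        else:
            period_elapsed()             # scenario that deadlocks
        tasklet_done.set()

    def workqueue() -> None:
        work.get()()                     # run the deferred handler

    threads = [
        threading.Thread(target=process_context),
        threading.Thread(target=tasklet),
    ]
    if defer_to_workqueue:
        threads.append(threading.Thread(target=workqueue))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome["lock_a_taken"]


if __name__ == "__main__":
    print("direct call from tasklet:", simulate(defer_to_workqueue=False))
    print("deferred to workqueue:   ", simulate(defer_to_workqueue=True))
```

In this model the direct call should fail to take lock A (the timeout marks the point where the real system would hang), while the deferred call succeeds because the tasklet finishes without ever touching lock A.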
[1] https://github.com/allenpais/for-6.9-bh-conversions/issues/1
Regards
Takashi Sakamoto
Thank you for taking the issue seriously! Yes, indeed it was the same issue reported to the test branch for bh workqueue!
It was "fun" living with this "hilarious" bug for years without knowing where it came from. Having it solved is almost like Christmas to me; I am very glad I was able to do so.
Your explanation of what was happening here also helped me understand the issue better, so thank you.
Of course there will be better solutions in the future, but for now the kernel freeze is banished, I hope [2].
Trying to implement my "fix" on the latest kernel (I had only been testing with 6.9.9) revealed that 6.10.0 introduced another regression [3], resulting in heavy digital distortion. I'd like to ask you to look into it. Despite the horrible distortion, I'm happy to report that the patch [2] also works on the latest kernel!
Thank you for your hard work on the firewire sound drivers!
[2] https://lore.kernel.org/linux-sound/20240718115637.12816-1-edmund.raile@prot...
[3] https://lore.kernel.org/linux-sound/n4jdkizinqfoztqn2cwv7uqqqnvkyu2xk32qebaz...