On Tue, 12 Oct 2021 15:45:41 +0200, Pierre-Louis Bossart wrote:
And indeed there's a deadlock!
snd_pcm_period_elapsed() takes the FE pcm stream lock, and will call snd_pcm_trigger.
Indeed, this would deadlock.
So if we also take the pcm stream lock in the BE trigger, there's a conceptual deadlock: we painted ourselves in a corner by using the same lock twice.
Takashi, are you really sure we should protect these for_each_dpcm_be() loops with the same pcm lock?
The call within the FE lock is done only in dpcm_dai_trigger_fe_be(), and this should call dpcm_be_dai_trigger() as is. In other places, the calls are without FE lock, hence they can take the lock, e.g. create a variant dpcm_dai_trigger_fe_be_lock().
Or rather it was the code path of snd_soc_dpcm_check_state()? The function could be called from dpcm_be_dai_trigger(). Currently dpcm_lock seems to be added at places casually covering some of the for_each_dpcm_be() or whatever... The whole lock scheme has to be revisited.
it seems like asking for trouble to revisit the ALSA core just to walking through a list of BEs? Would you object to changing dpcm_lock as I suggested, but not interfering with stream handling?
That would work, too, it's just a pity to degrade the fine-grained locks that have been already taken into global locks...
For the performance problem, at least, you can make it rwlock and rwsem types (depending on nonatomic) so that the concurrent accesses would work. The only exclusive lock is the case for adding and removing the entries, which should be done with write lock / sem down, while other link traversals can be executed in the read lock / sem.
Thanks Takashi for your feedback.
Let's first get the locking right. We can optimize performance later.
I will continue with the idea of using existing fine-grained locks a bit more, I am with you that this dpcm_lock was not added in a consistent way and reusing the concept is really building on sand.
We can remove the lock in snd_soc_dpcm_check_state, I already did the change in other versions. But what I'll need to check further is if indeed dpcm_be_dai_trigger() is called with the FE lock taken already. Adding a lockdep_assert_help() or something would help I guess.
The part where I am still not clear is that snd_pcm_period_elapsed uses the irqsave/irqrestore version, but in the initial code you suggested the vanilla irq version is fine. I am not sure I get the nuance here, and btw in the case of SOF the snd_pcm_period_elapsed is called from a workqueue, not an interrupt handler, as a work-around to avoid an IPC deadlock, so we probably don't need the irqsave/irqrestore version anyways.
In a nutshell: * in the code paths that are already with the stream lock (i.e. trigger, pointer and ack PCM callbacks), you need no extra lock for the stream itself. But if you want additional locks (e.g. for BE), use either *_spin_lock() or *_spin_lock_irqsave(), but not *_spin_lock_irq().
* In other code paths, *_spin_lock_irq().
In doubt, you can use always use *_irqsave(), of course.
Takashi