On Thu, Oct 22, 2020 at 11:50:41AM +0200, Maxime Ripard wrote:
> This is caused by the HDMI driver polling some status bit that reports that the infoframes have been properly sent, and calling usleep_range between each iteration[1], and that is done in our trigger callback that seems to be run with a spinlock taken and the interrupt disabled (snd_pcm_action_lock_irq) as part of snd_pcm_start_lock_irq. This is the entire stack trace:
That doesn't sound like something I would expect you to be doing in the trigger callback TBH - it feels like if this is something that could block then the setup should have been done during parameter configuration or something rather than in trigger.
> It looks like the snd_soc_dai_link structure has a nonatomic flag that seems to be made to address more or less that issue, taking a mutex instead of a spinlock. However setting that flag results in another lockdep issue, since the dmaengine controller doing the DMA transfer would call snd_pcm_period_elapsed on completion, in a tasklet, this time taking a mutex in an atomic context which is just as bad as the initial issue. This is the stacktrace this time:
Like Jaroslav says you could punt to a workqueue here. I'd be more inclined to move the sleeping stuff out of the trigger operations but that'd avoid the issue too. There are some drivers doing this already IIRC.
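To make the workqueue idea concrete, here is a rough userspace model in C rather than real driver code. Every name in it is hypothetical: queue_work_item() stands in for schedule_work(), fake_sleep() for usleep_range(), and the in_atomic_ctx flag for "spinlock held, interrupts off". The point is only that trigger queues the work and returns, and the polling loop runs later where sleeping is allowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 4

/* Hypothetical stand-in for the kernel's struct work_struct. */
struct work_item {
    void (*fn)(struct work_item *w);
    bool pending;
};

static bool in_atomic_ctx;     /* models "spinlock held, IRQs disabled" */
static int sleeps_in_atomic;   /* counts sleeps that lockdep would flag */
static int infoframes_sent;    /* models the polled status bit */

static struct work_item *pending_work[MAX_PENDING];
static size_t pending_len;

/* Models usleep_range(): records a violation if called while atomic. */
static void fake_sleep(void)
{
    if (in_atomic_ctx)
        sleeps_in_atomic++;
}

/* Models schedule_work(): never sleeps, so it is safe from trigger. */
static bool queue_work_item(struct work_item *w)
{
    if (w->pending || pending_len == MAX_PENDING)
        return false;
    w->pending = true;
    pending_work[pending_len++] = w;
    return true;
}

/* Models the worker thread: runs queued items in process context. */
static void run_pending_work(void)
{
    assert(!in_atomic_ctx);
    for (size_t i = 0; i < pending_len; i++) {
        pending_work[i]->pending = false;
        pending_work[i]->fn(pending_work[i]);
    }
    pending_len = 0;
}

/* The blocking part: poll the status bit, sleeping between polls. */
static void infoframe_work(struct work_item *w)
{
    (void)w;
    for (int polls = 0; polls < 3; polls++)
        fake_sleep();
    infoframes_sent = 1;
}

static struct work_item infoframe_item = { .fn = infoframe_work };

/* Trigger stays fast: it only queues the work and returns. */
static int trigger_start(void)
{
    in_atomic_ctx = true;    /* PCM core took the lock before calling us */
    queue_work_item(&infoframe_item);
    in_atomic_ctx = false;   /* lock dropped once trigger returns */
    return 0;
}
```

Note that with this shape trigger returns before the polling has finished, so anything that depends on the infoframes having been sent has to cope with that.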
> So, I'm not really sure what I'm supposed to do here. The drivers involved don't appear to be doing anything extraordinary, but the issues lockdep report are definitely valid too. What are the expectations in terms of context from ALSA when running the callbacks, and how can we fix it?
To me, having something in the trigger that needs waiting for is the bit that feels like the most awkward fit here; trigger is supposed to run very quickly.