On 01/07/2014 01:29 PM, Fabio Estevam wrote:
Hi Lars-Peter,
On Tue, Jan 7, 2014 at 10:07 AM, Lars-Peter Clausen lars@metafoo.de wrote:
We've had problems like this before. Typically this is caused by the dmaengine driver calling the complete callback when it shouldn't, e.g. after dmaengine_terminate_all() has already been called for the channel. There is also unfortunately a small race condition (which needs some extensions to the dmaengine API before we can fix it). But the chance for this race condition to occur is really small and has only been observed on a 4 (or 8, not sure) core system so far. Is the problem reproducible in your case? Also
Thanks for your comments. Yes, I get this issue on a mx6 quad all the time running linux-next, but not on a mx6solo.
Looking at the imx-sdma dmaengine driver there is a whole bunch of issues in regard to concurrency. The problem might have existed before and only surfaced now due to some random other changes.
For starters there is absolutely no protection against concurrent access to the driver's state struct. Access to buf_tail, bd, num_bd and possibly other fields should be protected by a lock. The second problem is that the active descriptor is not invalidated when dmaengine_terminate_all() is called, so if the tasklet runs after dmaengine_terminate_all() has been called, the DMA descriptor callback will be called even though it shouldn't.
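To illustrate the two points above, here is a minimal userspace sketch, not the imx-sdma code. All names (model_chan, model_desc, model_terminate_all, model_tasklet, model_count_cb) are hypothetical, and a pthread mutex stands in for whatever lock the driver would actually use. It assumes a single "active" descriptor pointer that terminate clears under the lock and that the tasklet checks under the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a DMA channel (not the real driver structs). */
typedef void (*dma_callback_t)(void *param);

struct model_desc {
	dma_callback_t callback;
	void *callback_param;
};

struct model_chan {
	pthread_mutex_t lock;       /* protects every field below */
	struct model_desc *active;  /* NULL when no transfer is in flight */
	unsigned int buf_tail;
	unsigned int num_bd;
};

/* Example callback: increments the int that param points to. */
static void model_count_cb(void *param)
{
	++*(int *)param;
}

static void model_chan_init(struct model_chan *c, unsigned int num_bd)
{
	assert(num_bd > 0);
	pthread_mutex_init(&c->lock, NULL);
	c->active = NULL;
	c->buf_tail = 0;
	c->num_bd = num_bd;
}

/* Make a descriptor the active one. */
static void model_submit(struct model_chan *c, struct model_desc *d)
{
	pthread_mutex_lock(&c->lock);
	c->active = d;
	c->buf_tail = 0;
	pthread_mutex_unlock(&c->lock);
}

/* Models terminate_all: invalidate the active descriptor under the lock. */
static void model_terminate_all(struct model_chan *c)
{
	pthread_mutex_lock(&c->lock);
	c->active = NULL;
	c->buf_tail = 0;
	pthread_mutex_unlock(&c->lock);
}

/* Models the completion tasklet. Returns true if the callback was invoked. */
static bool model_tasklet(struct model_chan *c)
{
	dma_callback_t cb = NULL;
	void *param = NULL;

	pthread_mutex_lock(&c->lock);
	if (c->active) {
		/* Snapshot the callback and advance state while holding the lock. */
		cb = c->active->callback;
		param = c->active->callback_param;
		c->buf_tail = (c->buf_tail + 1) % c->num_bd;
	}
	pthread_mutex_unlock(&c->lock);

	/* Invoke outside the lock; a terminate racing with this call is not covered. */
	if (cb)
		cb(param);
	return cb != NULL;
}
```

With this shape, a tasklet run that starts after model_terminate_all() finds no active descriptor and skips the callback. A terminate that lands between the unlock and the callback invocation is still not handled here.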
And another problem is that the imx-sdma driver doesn't really conform to the semantics of the dmaengine API with regard to the prepare, submit and issue_pending steps. But I don't think this is the source of the problem in this case.
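For reference, a rough userspace model of the split the dmaengine API expects between submit and issue_pending: submit only queues a descriptor and hands back a cookie, and nothing is started until issue_pending is called. The names (txq_model, txq_submit, txq_issue_pending) and the fixed-size arrays are hypothetical and for illustration only. This is not how any real driver stores its descriptors.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_MAX 8

/* Hypothetical model: cookies stand in for descriptors. */
struct txq_model {
	int pending[QUEUE_MAX];  /* submitted, not yet started */
	size_t n_pending;
	int active[QUEUE_MAX];   /* handed to the (imaginary) hardware */
	size_t n_active;
	int next_cookie;
};

static void txq_init(struct txq_model *q)
{
	assert(q != NULL);
	q->n_pending = 0;
	q->n_active = 0;
	q->next_cookie = 1;
}

/*
 * Models submit: assign a cookie and put the descriptor on the pending
 * queue. It does not start anything. Returns the cookie, or -1 if full.
 */
static int txq_submit(struct txq_model *q)
{
	int cookie;

	if (q->n_pending == QUEUE_MAX)
		return -1;
	cookie = q->next_cookie++;
	q->pending[q->n_pending++] = cookie;
	return cookie;
}

/*
 * Models issue_pending: move pending descriptors to the active queue in
 * submission order. Returns how many were started.
 */
static size_t txq_issue_pending(struct txq_model *q)
{
	size_t moved = 0;
	size_t i;

	while (moved < q->n_pending && q->n_active < QUEUE_MAX)
		q->active[q->n_active++] = q->pending[moved++];

	/* Keep anything that did not fit at the front of the pending queue. */
	for (i = 0; i + moved < q->n_pending; i++)
		q->pending[i] = q->pending[i + moved];
	q->n_pending -= moved;

	return moved;
}
```

In this model two calls to txq_submit() leave the active queue empty. Only the following txq_issue_pending() moves both entries over, in order.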
- Lars