On 25/03/2019 09:59, Takashi Iwai wrote:
On Thu, 21 Mar 2019 15:28:23 +0100, Jon Hunter wrote:
On 20/03/2019 17:59, Jon Hunter wrote:
On 20/03/2019 17:19, Takashi Iwai wrote:
On Wed, 20 Mar 2019 17:04:14 +0100, Jon Hunter wrote:
From: Jonathan Hunter jonathanh@nvidia.com
This issue is not observed on the latest mainline kernel since the 'ALSA: PCM suspend cleanup' series has been applied and the snd_pcm_suspend_all() function call in the HDA codec driver has been removed. However, this issue impacts stable kernel branches and so I am trying to find a solution for these branches.
When stress testing audio playback across multiple HDA streams simultaneously, the following failure is sometimes observed when starting playback ...
Unable to set hw params for playback: File descriptor in bad state
The problem is caused because ...
- Playback is starting for one HDA codec and so the chip->open_mutex in azx_pcm_open() has been acquired.
- Playback for another HDA codec is starting but is blocked waiting to acquire the chip->open_mutex in azx_pcm_open().
- For the HDA codec that is blocked, its runtime-pm status is still enabled from a previous playback session that has since completed and been closed, however its autosuspend timeout has not expired yet.
- For the HDA codec that is blocked, the runtime-pm autosuspend timeout now occurs calling the runtime-pm suspend callback and this calls snd_pcm_suspend_all() placing all PCM streams in the suspended state.
- The block HDA codec is now unblocked and fails to set the HW params because the PCM stream is in the suspended state.
The above has been observed on Tegra platforms where the autosuspend delay is set to 1 second and there is a delay to an AZX command. This highlights that there is a window of time where autosuspend can place the HDA PCM stream in the suspended state while opening the stream and cause playback to fail.
Looking at commit 3d21ef0b49f8 ('ALSA: pcm: Suspend streams globally via device type PM ops') it appears that the call to snd_pcm_suspend_all() has now been moved to the to the suspend handler for PCM streams and so I am wondering if it would be equivalent to make the following change to the HDA codec driver to achieve the same behaviour but only for the HDA driver. So far the issue has not been observed with this change.
Please note that I am not sending this as a formal patch as I wanted to get some feedback/comments on the above.
I guess just backporting 3d21ef0b49f8 should be OK even for older kernels. This assures the PCM stream gets suspended before entering any parent device suspend call.
The remaining changes (except for the one for atiixp) are merely cleanup of superfluous calls, and keeping them are harmless.
Could you check whether my theory above correct?
I believe that you will need that change and 17bc4815de58 because we need to prevent snd_pcm_suspend_all() being called in the pm-runtime suspend callback. Applying only 3d21ef0b49f8 will means that at runtime snd_pcm_suspend_all() can still be called from within the HDA codec driver when the autosuspend timeout occurs. I will test this and confirm. Maybe both could be backported?
I confirmed today that with just 3d21ef0b49f8 the issue can occur, but with both 3d21ef0b49f8 and 17bc4815de58 the issue is not seen.
Do you think that it would be appropriate to include these in the stable branches? They apply cleanly to v5.0, but we would probably need to back-port for early kernels such as v4.19, v4.9, etc.
I reconsidered the problem again, and noticed that the very same problem may appear with the system PM, not only with runtime PM. The suspend can happen at any time, so even a stream in OPEN state may go to suspend, and you'll hit the same problem. It's just a corner case so no one really cared much.
Yes I was not sure if it could also happen in the system suspend case or not.
So, I think the patch like below should fix the problem. This can be easily backported to all stable trees, and it alone should work without backporting the else intrusive changes.
Could you check whether my theory is correct?
Do you want me to test this on the same stable branch I have been testing with where commits 3d21ef0b49f8 and 17bc4815de58 are not present?
Cheers Jon -- nvpublic