[alsa-devel] [REGRESSION 5.3] WARNING: pulseaudio/1297 still has locks held!

When upgrading from 5.2 to 5.3, I now encounter this error after boot:
==================================== WARNING: pulseaudio/1297 still has locks held! 5.3.0+ #1826 Not tainted ------------------------------------ 1 lock held by pulseaudio/1297: #0: ee815308 (&hcp->lock){....}, at: hdmi_codec_startup+0x20/0x130
stack backtrace: CPU: 0 PID: 1297 Comm: pulseaudio Not tainted 5.3.0+ #1826 Hardware name: Marvell Dove (Cubox) [<c0017b4c>] (unwind_backtrace) from [<c0014d10>] (show_stack+0x10/0x14) [<c0014d10>] (show_stack) from [<c00a2d18>] (futex_wait_queue_me+0x13c/0x19c) [<c00a2d18>] (futex_wait_queue_me) from [<c00a3630>] (futex_wait+0x184/0x24c) [<c00a3630>] (futex_wait) from [<c00a5e1c>] (do_futex+0x334/0x598) [<c00a5e1c>] (do_futex) from [<c00a62e8>] (sys_futex_time32+0x118/0x180) [<c00a62e8>] (sys_futex_time32) from [<c0009000>] (ret_fast_syscall+0x0/0x54) Exception stack(0xebd31fa8 to 0xebd31ff0) 1fa0: 00000000 ffffffff 000c8748 00000189 00000001 00000000 1fc0: 00000000 ffffffff 00000000 000000f0 00000000 00000000 00000000 00056200 1fe0: 000000f0 beac03a8 b6d6c835 b6d6f456
This seems to be caused by:
eb1ecadb7f67 ("ASoC: hdmi-codec: re-introduce mutex locking")
which holds a mutex and returns to userspace. Meanwhile, userspace issues a futex operation, which ends up in futex_wait_queue_me().
futex_wait_queue_me() can wait, and if the system will be hibernating, is a potential hibernation point. One of the requirements there is that no locks shall be held, and the freezer code verifies that by calling debug_check_no_locks_held(). With lockdep enabled, it reveals that hdmi-codec is indeed holding a lock.
Reverting this commit solves the problem.

On Tue 22 Oct 2019 at 15:57, Russell King - ARM Linux admin linux@armlinux.org.uk wrote:
When upgrading from 5.2 to 5.3, I now encounter this error after boot:
==================================== WARNING: pulseaudio/1297 still has locks held! 5.3.0+ #1826 Not tainted
1 lock held by pulseaudio/1297: #0: ee815308 (&hcp->lock){....}, at: hdmi_codec_startup+0x20/0x130
stack backtrace: CPU: 0 PID: 1297 Comm: pulseaudio Not tainted 5.3.0+ #1826 Hardware name: Marvell Dove (Cubox) [<c0017b4c>] (unwind_backtrace) from [<c0014d10>] (show_stack+0x10/0x14) [<c0014d10>] (show_stack) from [<c00a2d18>] (futex_wait_queue_me+0x13c/0x19c) [<c00a2d18>] (futex_wait_queue_me) from [<c00a3630>] (futex_wait+0x184/0x24c) [<c00a3630>] (futex_wait) from [<c00a5e1c>] (do_futex+0x334/0x598) [<c00a5e1c>] (do_futex) from [<c00a62e8>] (sys_futex_time32+0x118/0x180) [<c00a62e8>] (sys_futex_time32) from [<c0009000>] (ret_fast_syscall+0x0/0x54) Exception stack(0xebd31fa8 to 0xebd31ff0) 1fa0: 00000000 ffffffff 000c8748 00000189 00000001 00000000 1fc0: 00000000 ffffffff 00000000 000000f0 00000000 00000000 00000000 00056200 1fe0: 000000f0 beac03a8 b6d6c835 b6d6f456
This seems to be caused by:
eb1ecadb7f67 ("ASoC: hdmi-codec: re-introduce mutex locking")
which holds a mutex and returns to userspace. Meanwhile, userspace issues a futex operation, which ends up in futex_wait_queue_me().
futex_wait_queue_me() can wait, and if the system will be hibernating, is a potential hibernation point. One of the requirements there is that no locks shall be held, and the freezer code verifies that by calling debug_check_no_locks_held(). With lockdep enabled, it reveals that hdmi-codec is indeed holding a lock.
Reverting this commit solves the problem.
Hi Mark,
Just before this patch, I had reworked the hdmi-codec to use an atomic variable to track if the codec is busy. This was done to solve some other locking issue.
When you asked me to move back to a mutex, I have probably been a bit brutal about it. The mutex is indeed held as long as the mutex is busy, on purpose.
AFAIK, the revert is safe. We can keep the codec this way if it is ok with you. IMO, it is simpler that way
I can also submit a follow-up to use a mutex in a safer/cleaner way (mutex + busy flag) if you prefer.
Let me know how you wish to proceed. Regards
Jerome

On Tue, Oct 22, 2019 at 04:14:49PM +0200, Jerome Brunet wrote:
AFAIK, the revert is safe. We can keep the codec this way if it is ok with you. IMO, it is simpler that way
Let's just do the revert, at least initially, so it can be backported simply.
I can also submit a follow-up to use a mutex in a safer/cleaner way (mutex
- busy flag) if you prefer.
That'd definitely be worth looking at, both for the extra safety and in case we discover on review that the revert is causing problems.
participants (3)
-
Jerome Brunet
-
Mark Brown
-
Russell King - ARM Linux admin