[alsa-devel] sound: spinlock lockup in snd_seq_oss_write

Takashi Iwai tiwai at suse.de
Tue Feb 9 12:14:58 CET 2016


On Thu, 04 Feb 2016 20:37:13 +0100,
Dmitry Vyukov wrote:
> 
> Hello,
> 
> The following program hangs machine:
> https://gist.githubusercontent.com/dvyukov/1c9eeb67414a9e7c5e52/raw/127b660cd2efbce0e3d8a01862d38a2bc0a16bcb/gistfile1.txt
> 
> 
> BUG: spinlock lockup suspected on CPU#2, a.out/7470
>  lock: 0xffff880036a9ef50, .magic: dead4ead, .owner: a.out/7466, .owner_cpu: 3
> CPU: 2 PID: 7470 Comm: a.out Not tainted 4.5.0-rc2+ #319
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  00000000ffffffff ffff88005f5ff660 ffffffff82be30ad ffff880036a9ef50
>  ffff88005d3d5f00 ffff88005d0b4740 ffff88005f5ff698 ffffffff8146768d
>  ffffffff85241dce ffff880000000000 ffff880036a9ef50 ffff880036a9ef60
> Call Trace:
>  [<     inline     >] __dump_stack lib/dump_stack.c:15
>  [<ffffffff82be30ad>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
>  [<ffffffff8146768d>] spin_dump+0x14d/0x280 kernel/locking/spinlock_debug.c:67
>  [<     inline     >] __spin_lock_debug kernel/locking/spinlock_debug.c:117
>  [<ffffffff814679ad>] do_raw_spin_lock+0x15d/0x2b0
> kernel/locking/spinlock_debug.c:137
>  [<     inline     >] __raw_spin_lock_irqsave
> include/linux/spinlock_api_smp.h:119
>  [<ffffffff8665ea57>] _raw_spin_lock_irqsave+0xa7/0xd0
> kernel/locking/spinlock.c:159
>  [<ffffffff85241dce>] snd_timer_start+0x10e/0x2b0 sound/core/timer.c:491
>  [<     inline     >] seq_timer_start sound/core/seq/seq_timer.c:393
>  [<ffffffff852bf860>] snd_seq_timer_start+0x1a0/0x2b0
> sound/core/seq/seq_timer.c:405
>  [<     inline     >] snd_seq_queue_process_event sound/core/seq/seq_queue.c:687
>  [<ffffffff852ba514>] snd_seq_control_queue+0x304/0x8b0
> sound/core/seq/seq_queue.c:748
>  [<ffffffff852c01b5>] event_input_timer+0x25/0x30
> sound/core/seq/seq_system.c:118
>  [<ffffffff852b2de4>]
> snd_seq_deliver_single_event.constprop.11+0x3f4/0x740
> sound/core/seq/seq_clientmgr.c:634
>  [<ffffffff852b3248>] snd_seq_deliver_event+0x118/0x800
> sound/core/seq/seq_clientmgr.c:831
>  [<ffffffff852b4636>] snd_seq_kernel_client_dispatch+0x126/0x170
> sound/core/seq/seq_clientmgr.c:2404
>  [<ffffffff852c92eb>] send_timer_event.isra.0+0x10b/0x150
> sound/core/seq/oss/seq_oss_timer.c:153
>  [<ffffffff852c981a>] snd_seq_oss_timer_start+0x1ca/0x310
> sound/core/seq/oss/seq_oss_timer.c:174
>  [<ffffffff852c9c04>] snd_seq_oss_process_timer_event+0x2a4/0x340
> sound/core/seq/oss/seq_oss_timer.c:106
>  [<     inline     >] insert_queue sound/core/seq/oss/seq_oss_rw.c:170
>  [<ffffffff852cf7ab>] snd_seq_oss_write+0x20b/0x810
> sound/core/seq/oss/seq_oss_rw.c:148
>  [<ffffffff852c74f9>] odev_write+0x59/0xa0 sound/core/seq/oss/seq_oss.c:179
>  [<ffffffff817bacd3>] __vfs_write+0x113/0x4b0 fs/read_write.c:528
>  [<ffffffff817bc797>] vfs_write+0x167/0x4a0 fs/read_write.c:577
>  [<     inline     >] SYSC_write fs/read_write.c:624
>  [<ffffffff817bfa81>] SyS_write+0x111/0x220 fs/read_write.c:616
>  [<ffffffff8665f0b6>] entry_SYSCALL_64_fastpath+0x16/0x7a
> arch/x86/entry/entry_64.S:185
> Sending NMI to all CPUs:
> NMI backtrace for cpu 0
> CPU: 0 PID: 7473 Comm: a.out Not tainted 4.5.0-rc2+ #319
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: ffff88003376df00 ti: ffff880032440000 task.ti: ffff880032440000
> RIP: 0010:[<ffffffff82c103a7>]  [<ffffffff82c103a7>] delay_tsc+0x27/0x70
> RSP: 0018:ffff8800324479f8  EFLAGS: 00000006
> RAX: 00000000a0066ab3 RBX: ffff880036a9ef50 RCX: 000000000000001e
> RDX: 0000000000000100 RSI: 00000100a0066a95 RDI: 0000000000000001
> RBP: ffff8800324479f8 R08: 0000000000000000 R09: 0000000000000001
> R10: ffff88003376df00 R11: ffff880036a9ef68 R12: ffff880036a9ef60
> R13: 000000009a9d2d40 R14: ffff880036a9ef58 R15: 0000000099db9d6d
> FS:  00007f94a370e700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 000000c82010e120 CR3: 000000003334f000 CR4: 00000000000006f0
> Stack:
>  ffff880032447a08 ffffffff82c102ea ffff880032447a40 ffffffff81467999
>  ffff880036a9ef50 ffff880036a9ee20 0000000000000282 ffff880035d8ab70
>  0000000000000000 ffff880032447a60 ffffffff8665dfbb ffffffff8523dd64
> Call Trace:
>  [<ffffffff82c102ea>] __delay+0xa/0x10 arch/x86/lib/delay.c:153
>  [<     inline     >] __spin_lock_debug kernel/locking/spinlock_debug.c:114
>  [<ffffffff81467999>] do_raw_spin_lock+0x149/0x2b0
> kernel/locking/spinlock_debug.c:137
>  [<     inline     >] __raw_spin_lock include/linux/spinlock_api_smp.h:145
>  [<ffffffff8665dfbb>] _raw_spin_lock+0x3b/0x50 kernel/locking/spinlock.c:151
>  [<     inline     >] spin_lock include/linux/spinlock.h:302
>  [<ffffffff8523dd64>] snd_timer_start_slave+0xd4/0x200 sound/core/timer.c:460
>  [<ffffffff85241880>] snd_timer_continue+0x200/0x290 sound/core/timer.c:598
>  [<     inline     >] snd_timer_user_continue sound/core/timer.c:1822
>  [<     inline     >] __snd_timer_user_ioctl sound/core/timer.c:1889
>  [<ffffffff85246a9d>] snd_timer_user_ioctl+0xe1d/0x25f0 sound/core/timer.c:1904
>  [<     inline     >] vfs_ioctl fs/ioctl.c:43
>  [<ffffffff817faf7c>] do_vfs_ioctl+0x18c/0xfb0 fs/ioctl.c:674
>  [<     inline     >] SYSC_ioctl fs/ioctl.c:689
>  [<ffffffff817fbe2f>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:680
>  [<ffffffff8665f0b6>] entry_SYSCALL_64_fastpath+0x16/0x7a
> arch/x86/entry/entry_64.S:185
> Code: 00 00 00 00 55 65 ff 05 c8 4a 40 7d 48 89 e5 65 44 8b 05 8d 2d
> 40 7d 0f ae e8 0f 31 48 c1 e2 20 48 89 d6 48 09 c6 0f ae e8 0f 31 <48>
> c1 e2 20 48 09 d0 48 89 c1 48 29 f1 48 39 f9 73 27 65 ff 0d
> NMI backtrace for cpu 1
> CPU: 1 PID: 7421 Comm: stress Not tainted 4.5.0-rc2+ #319
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: ffff88003330df00 ti: ffff880032230000 task.ti: ffff880032230000
> RIP: 0010:[<ffffffff82c103a7>]  [<ffffffff82c103a7>] delay_tsc+0x27/0x70
> RSP: 0018:ffff88003ed07c80  EFLAGS: 00000006
> RAX: 00000000a0066d38 RBX: ffff880036a9ef50 RCX: 000000000000001e
> RDX: 0000000000000100 RSI: 00000100a0066d0f RDI: 0000000000000001
> RBP: ffff88003ed07c80 R08: 0000000000000001 R09: 0000000000000001
> R10: ffff88003330df00 R11: ffff880036a9ef68 R12: ffff880036a9ef60
> R13: 000000009a9d2d40 R14: ffff880036a9ef58 R15: 000000009a82fe90
> FS:  000000c820127700(0000) GS:ffff88003ed00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000c8201004e0 CR3: 0000000032076000 CR4: 00000000000006e0
> Stack:
>  ffff88003ed07c90 ffffffff82c102ea ffff88003ed07cc8 ffffffff81467999
>  ffff880036a9ef50 0000000000000082 ffff880036a9ee20 00000000000f4240
>  ffff880032795068 ffff88003ed07cf0 ffffffff8665ea57 ffffffff85242019
> Call Trace:
> d [<ffffffff82c102ea>] __delay+0xa/0x10 arch/x86/lib/delay.c:153
>  [<     inline     >] __spin_lock_debug kernel/locking/spinlock_debug.c:114
>  [<ffffffff81467999>] do_raw_spin_lock+0x149/0x2b0
> kernel/locking/spinlock_debug.c:137
>  [<     inline     >] __raw_spin_lock_irqsave
> include/linux/spinlock_api_smp.h:119
>  [<ffffffff8665ea57>] _raw_spin_lock_irqsave+0xa7/0xd0
> kernel/locking/spinlock.c:159
>  [<ffffffff85242019>] snd_timer_interrupt+0xa9/0xc80 sound/core/timer.c:718
>  [<ffffffff85248cb9>] snd_hrtimer_callback+0x169/0x230 sound/core/hrtimer.c:54
>  [<     inline     >] __run_hrtimer kernel/time/hrtimer.c:1248
>  [<ffffffff814b7dc3>] __hrtimer_run_queues+0x363/0xc20
> kernel/time/hrtimer.c:1312
>  [<ffffffff814b9de2>] hrtimer_interrupt+0x182/0x430 kernel/time/hrtimer.c:1346
>  [<ffffffff8124e1d2>] local_apic_timer_interrupt+0x72/0xe0
> arch/x86/kernel/apic/apic.c:907
>  [<ffffffff812516f9>] smp_apic_timer_interrupt+0x79/0xa0
> arch/x86/kernel/apic/apic.c:931
>  [<ffffffff8665feac>] apic_timer_interrupt+0x8c/0xa0
> arch/x86/entry/entry_64.S:520
>  [<ffffffff817ab804>] create_object+0x34/0x2d0 mm/kmemleak.c:546
>  [<ffffffff86641fb3>] kmemleak_alloc+0x63/0xa0 mm/kmemleak.c:916
>  [<     inline     >] kmemleak_alloc_recursive include/linux/kmemleak.h:47
>  [<     inline     >] slab_post_alloc_hook mm/slub.c:1337
>  [<     inline     >] slab_alloc_node mm/slub.c:2596
>  [<     inline     >] slab_alloc mm/slub.c:2604
>  [<ffffffff81760843>] kmem_cache_alloc+0x153/0x2e0 mm/slub.c:2609
>  [<ffffffff8181b1b5>] dup_fd+0x85/0xa90 fs/file.c:298
>  [<     inline     >] copy_files kernel/fork.c:1033
>  [<ffffffff8134efed>] copy_process.part.35+0x188d/0x57a0 kernel/fork.c:1436
>  [<     inline     >] copy_process kernel/fork.c:1274
>  [<ffffffff8135322c>] _do_fork+0x1bc/0xcb0 kernel/fork.c:1723
>  [<     inline     >] SYSC_clone kernel/fork.c:1832
>  [<ffffffff81353df7>] SyS_clone+0x37/0x50 kernel/fork.c:1826
>  [<ffffffff8665f0b6>] entry_SYSCALL_64_fastpath+0x16/0x7a
> arch/x86/entry/entry_64.S:185
> Code: 00 00 00 00 55 65 ff 05 c8 4a 40 7d 48 89 e5 65 44 8b 05 8d 2d
> 40 7d 0f ae e8 0f 31 48 c1 e2 20 48 89 d6 48 09 c6 0f ae e8 0f 31 <48>
> c1 e2 20 48 09 d0 48 89 c1 48 29 f1 48 39 f9 73 27 65 ff 0d
> NMI backtrace for cpu 2
> CPU: 2 PID: 7470 Comm: a.out Not tainted 4.5.0-rc2+ #319
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: ffff88005d0b4740 ti: ffff88005f5f8000 task.ti: ffff88005f5f8000
> RIP: 0010:[<ffffffff81262066>]  [<ffffffff81262066>]
> flat_send_IPI_mask+0x156/0x290
> RSP: 0018:ffff88005f5ff5d0  EFLAGS: 00000046
> RAX: 0000000000000000 RBX: 0000000000000c00 RCX: 0000000000000000
> RDX: 0000000000000c00 RSI: 0000000000000002 RDI: ffffffffff5fc300
> RBP: ffff88005f5ff5f8 R08: 0000000000000001 R09: 0000000000000000
> R10: fffffbfff11c14aa R11: 1ffffffff13552ed R12: 0000000000000082
> BUG: spinlock lockup suspected on CPU#1, stress/7421
>  lock: 0xffff880036a9ef50, .magic: dead4ead, .owner: a.out/7466, .owner_cpu: 3
> CPU: 1 PID: 7421 Comm: stress Not tainted 4.5.0-rc2+ #319
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  00000000ffffffff ffff88003ed07c58 ffffffff82be30ad ffff880036a9ef50
>  ffff88005d3d5f00 ffff88003330df00 ffff88003ed07c90 ffffffff8146768d
>  ffffffff85242019 0000000000000000 ffff880036a9ef50 ffff880036a9ef60
> Call Trace:
>  <IRQ>  [<     inline     >] __dump_stack lib/dump_stack.c:15
>  <IRQ>  [<ffffffff82be30ad>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
>  [<ffffffff8146768d>] spin_dump+0x14d/0x280 kernel/locking/spinlock_debug.c:67
>  [<     inline     >] __spin_lock_debug kernel/locking/spinlock_debug.c:117
>  [<ffffffff814679ad>] do_raw_spin_lock+0x15d/0x2b0
> kernel/locking/spinlock_debug.c:137
>  [<     inline     >] __raw_spin_lock_irqsave
> include/linux/spinlock_api_smp.h:119
>  [<ffffffff8665ea57>] _raw_spin_lock_irqsave+0xa7/0xd0
> kernel/locking/spinlock.c:159
>  [<ffffffff85242019>] snd_timer_interrupt+0xa9/0xc80 sound/core/timer.c:718
>  [<ffffffff85248cb9>] snd_hrtimer_callback+0x169/0x230 sound/core/hrtimer.c:54
>  [<     inline     >] __run_hrtimer kernel/time/hrtimer.c:1248
>  [<ffffffff814b7dc3>] __hrtimer_run_queues+0x363/0xc20
> kernel/time/hrtimer.c:1312
>  [<ffffffff814b9de2>] hrtimer_interrupt+0x182/0x430 kernel/time/hrtimer.c:1346
>  [<ffffffff8124e1d2>] local_apic_timer_interrupt+0x72/0xe0
> arch/x86/kernel/apic/apic.c:907
>  [<ffffffff812516f9>] smp_apic_timer_interrupt+0x79/0xa0
> arch/x86/kernel/apic/apic.c:931
>  [<ffffffff8665feac>] apic_timer_interrupt+0x8c/0xa0
> arch/x86/entry/entry_64.S:520
>  [<ffffffff817ab804>] create_object+0x34/0x2d0 mm/kmemleak.c:546
>  [<ffffffff86641fb3>] kmemleak_alloc+0x63/0xa0 mm/kmemleak.c:916
>  [<     inline     >] kmemleak_alloc_recursive include/linux/kmemleak.h:47
>  [<     inline     >] slab_post_alloc_hook mm/slub.c:1337
>  [<     inline     >] slab_alloc_node mm/slub.c:2596
>  [<     inline     >] slab_alloc mm/slub.c:2604
>  [<ffffffff81760843>] kmem_cache_alloc+0x153/0x2e0 mm/slub.c:2609
>  [<ffffffff8181b1b5>] dup_fd+0x85/0xa90 fs/file.c:298
>  [<     inline     >] copy_files kernel/fork.c:1033
>  [<ffffffff8134efed>] copy_process.part.35+0x188d/0x57a0 kernel/fork.c:1436
>  [<     inline     >] copy_process kernel/fork.c:1274
>  [<ffffffff8135322c>] _do_fork+0x1bc/0xcb0 kernel/fork.c:1723
>  [<     inline     >] SYSC_clone kernel/fork.c:1832
>  [<ffffffff81353df7>] SyS_clone+0x37/0x50 kernel/fork.c:1826
>  [<ffffffff8665f0b6>] entry_SYSCALL_64_fastpath+0x16/0x7a
> arch/x86/entry/entry_64.S:185
> R13: 000000000f000000 R14: ffffffff87b00c80 R15: 0000000000000002
> FS:  00007fcabc427700(0000) GS:ffff88006d600000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000020028073 CR3: 000000005c829000 CR4: 00000000000006e0
> Stack:
>  ffffffff87b00c80 ffffffff888be0a0 0000000000000007 fffffbfff111761c
>  ffff88006d71a540 ffff88005f5ff618 ffffffff81257bfe ffffffff86ac7020
>  dffffc0000000000 ffff88005f5ff688 ffffffff82bee6ac ffffffff811aa610
> Call Trace:
>  [<ffffffff81257bfe>] nmi_raise_cpu_backtrace+0x5e/0x80
> arch/x86/kernel/apic/hw_nmi.c:33
>  [<ffffffff82bee6ac>] nmi_trigger_all_cpu_backtrace+0x54c/0x5f0
> lib/nmi_backtrace.c:85
>  [<ffffffff81257c54>] arch_trigger_all_cpu_backtrace+0x14/0x20
> arch/x86/kernel/apic/hw_nmi.c:38
>  [<     inline     >] trigger_all_cpu_backtrace include/linux/nmi.h:41
>  [<     inline     >] __spin_lock_debug kernel/locking/spinlock_debug.c:119
>  [<ffffffff814679b7>] do_raw_spin_lock+0x167/0x2b0
> kernel/locking/spinlock_debug.c:137
>  [<     inline     >] __raw_spin_lock_irqsave
> include/linux/spinlock_api_smp.h:119
>  [<ffffffff8665ea57>] _raw_spin_lock_irqsave+0xa7/0xd0
> kernel/locking/spinlock.c:159
>  [<ffffffff85241dce>] snd_timer_start+0x10e/0x2b0 sound/core/timer.c:491
>  [<     inline     >] seq_timer_start sound/core/seq/seq_timer.c:393
>  [<ffffffff852bf860>] snd_seq_timer_start+0x1a0/0x2b0
> sound/core/seq/seq_timer.c:405
>  [<     inline     >] snd_seq_queue_process_event sound/core/seq/seq_queue.c:687
>  [<ffffffff852ba514>] snd_seq_control_queue+0x304/0x8b0
> sound/core/seq/seq_queue.c:748
>  [<ffffffff852c01b5>] event_input_timer+0x25/0x30
> sound/core/seq/seq_system.c:118
>  [<ffffffff852b2de4>]
> snd_seq_deliver_single_event.constprop.11+0x3f4/0x740
> sound/core/seq/seq_clientmgr.c:634
>  [<ffffffff852b3248>] snd_seq_deliver_event+0x118/0x800
> sound/core/seq/seq_clientmgr.c:831
> 
> On commit b37a05c083c85c2657dca9bbe1f5d79dccf756d5 + latest Takashi fixes.

This took really long time to finally figure out what goes wrong.
Despite it showing OSS sequencer, the real culprit of this lockup was
a typo: a wrong pointer was passed in the loop.

@@ -422,7 +422,7 @@ static void snd_timer_notify1(struct snd_timer_instance *ti, int event)
        spin_lock_irqsave(&timer->lock, flags);
        list_for_each_entry(ts, &ti->slave_active_head, active_list)
                if (ts->ccallback)
-                       ts->ccallback(ti, event + 100, &tstamp, resolution);
+                       ts->ccallback(ts, event + 100, &tstamp, resolution);
        spin_unlock_irqrestore(&timer->lock, flags);

The proper patch is found in the first attachment.

However, this alone isn't enough.  I still got a list corruption even
after that fix.  Chasing this also took long time, and it resulted in
yet another trivial patch, as found in the second attachment.  It's
basically a missing spinlock for a slave timer at stopping it.

@@ -518,9 +518,13 @@ static int _snd_timer_stop(struct snd_timer_instance *timeri, int event)
                        spin_unlock_irqrestore(&slave_active_lock, flags);
                        return -EBUSY;
                }
+               if (timeri->timer)
+                       spin_lock(&timeri->timer->lock);
                timeri->flags &= ~SNDRV_TIMER_IFLG_RUNNING;
                list_del_init(&timeri->ack_list);
                list_del_init(&timeri->active_list);
+               if (timeri->timer)
+                       spin_unlock(&timeri->timer->lock);
                spin_unlock_irqrestore(&slave_active_lock, flags);
                goto __end;
        }

Could you try these two patches below?  These cured the test case on
my local machine, at least.  If it's not enough, I have yet more
patches to try, but these are the minimally needed changes I figured
out.


thanks,

Takashi

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-ALSA-timer-Fix-wrong-instance-passed-to-slave-callba.patch
Type: application/octet-stream
Size: 1383 bytes
Desc: not available
URL: <http://mailman.alsa-project.org/pipermail/alsa-devel/attachments/20160209/32c258a5/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-ALSA-timer-Fix-race-between-stop-and-interrupt.patch
Type: application/octet-stream
Size: 1602 bytes
Desc: not available
URL: <http://mailman.alsa-project.org/pipermail/alsa-devel/attachments/20160209/32c258a5/attachment-0001.obj>


More information about the Alsa-devel mailing list