Crash in acpi_ns_validate_handle triggered by soundwire on Linux 5.10
Marcin Ślusarz
marcin.slusarz at gmail.com
Fri Jan 29 19:59:52 CET 2021
czw., 28 sty 2021 o 15:32 Marcin Ślusarz <marcin.slusarz at gmail.com> napisał(a):
>
> czw., 28 sty 2021 o 13:39 Rafael J. Wysocki <rafael at kernel.org> napisał(a):
> > The only explanation for that I can think about (and which does not
> > involve supernatural intervention so to speak) is a stack corruption
> > occurring between these two calls in sdw_intel_acpi_cb(). IOW,
> > something scribbles on the handle in the meantime, but ATM I have no
> > idea what that can be.
>
> I tried KASAN but it didn't find anything and kernel actually booted
> successfully.
I investigated this and it looks like a compiler bug (or something nastier),
but I can't find where exactly registers get corrupted because if I add printks
the corruption seems on the printk side, but if I don't add them it seems
the value gets corrupted earlier.
Here's what I tried:
1) If I add printk(KERN_INFO "%s handle: %p\n", __func__, handle); to
acpi_ns_validate_handle before and after call to ACPI_GET_DESCRIPTOR_TYPE
I get this:
acpi_ns_validate_handle handle: 00000000433f39ec
BUG: kernel NULL pointer dereference, address: 0000000000000050
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 3 PID: 473 Comm: systemd-udevd Tainted: G W E 5.10.11+ #13
Hardware name: HP HP Pavilion Laptop 15-cs3xxx/86E2, BIOS F.05 01/01/2020
RIP: 0010:acpi_ns_validate_handle+0x39/0x62
Code: 49 8d 44 24 ff 48 83 f8 fd 76 09 4c 8b 25 4a b4 89 01 eb 39 4c
89 e2 48 c7 c6 c0 a4 6c 8f 48 c7 c7 46 6d 94 8f e8 eb 87 31 00 <41> 80
7c 24 08 0f 75 18 4c 89 e2 48 c7 c6 c0 a4 6c 8f 48 c7 c7 46
RSP: 0018:ffff9f488060bb00 EFLAGS: 00010246
RAX: 0000000000000030 RBX: ffff9f488060bb68 RCX: ffff889d5fad8a08
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff889d5fad8a00
RBP: 0000000000000048 R08: 0000000000000000 R09: ffff9f488060b928
R10: ffff9f488060b920 R11: ffffffff8fcdc3f8 R12: 0000000000000048
R13: 0000000000000000 R14: ffffffff8ed400f0 R15: 0000000000000000
FS: 00007f811fdbe8c0(0000) GS:ffff889d5fac0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000050 CR3: 000000026877c001 CR4: 0000000000770ee0
PKRU: 55555554
Call Trace:
acpi_get_data_full+0x81/0xdf
acpi_bus_get_device+0x32/0xa0
sdw_intel_acpi_scan.cold+0x23/0x21c [soundwire_intel]
snd_intel_dsp_driver_probe.cold+0x187/0x1b2 [snd_intel_dspcfg]
azx_probe+0x7a/0x970 [snd_hda_intel]
local_pci_probe+0x42/0x80
? _cond_resched+0x16/0x40
pci_device_probe+0xfd/0x1b0
really_probe+0xf2/0x440
driver_probe_device+0xe1/0x150
device_driver_attach+0xa1/0xb0
__driver_attach+0x8a/0x150
? device_driver_attach+0xb0/0xb0
? device_driver_attach+0xb0/0xb0
bus_for_each_dev+0x78/0xc0
bus_add_driver+0x12b/0x1e0
driver_register+0x8b/0xe0
? 0xffffffffc128a000
do_one_initcall+0x44/0x1d0
? do_init_module+0x23/0x260
? kmem_cache_alloc_trace+0xf5/0x200
do_init_module+0x5c/0x260
Generic FE-GE Realtek PHY r8169-1000:00: attached PHY driver [Generic
FE-GE Realtek PHY] (mii_bus:phy_addr=r8169-1000:00, irq=IGNORE)
__do_sys_finit_module+0xb1/0x110
do_syscall_64+0x33/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
$ gdb vmlinux
(...)
(gdb) disassemble acpi_ns_validate_handle
Dump of assembler code for function acpi_ns_validate_handle:
0xffffffff815781cd <+0>: call 0xffffffff81062c40 <__fentry__>
0xffffffff815781d2 <+5>: push %r12
0xffffffff815781d4 <+7>: mov %rdi,%r12
0xffffffff815781d7 <+10>: call 0xffffffff81584a44
<acpi_ut_track_stack_ptr>
0xffffffff815781dc <+15>: lea -0x1(%r12),%rax
0xffffffff815781e1 <+20>: cmp $0xfffffffffffffffd,%rax
0xffffffff815781e5 <+24>: jbe 0xffffffff815781f0
<acpi_ns_validate_handle+35>
0xffffffff815781e7 <+26>: mov 0x189b44a(%rip),%r12 #
0xffffffff82e13638 <acpi_gbl_root_node>
0xffffffff815781ee <+33>: jmp 0xffffffff81578229
<acpi_ns_validate_handle+92>
0xffffffff815781f0 <+35>: mov %r12,%rdx
0xffffffff815781f3 <+38>: mov $0xffffffff81eca4c0,%rsi
0xffffffff815781fa <+45>: mov $0xffffffff82146d46,%rdi
0xffffffff81578201 <+52>: call 0xffffffff818909f1 <printk>
0xffffffff81578206 <+57>: cmpb $0xf,0x8(%r12)
0xffffffff8157820c <+63>: jne 0xffffffff81578226
<acpi_ns_validate_handle+89>
0xffffffff8157820e <+65>: mov %r12,%rdx
0xffffffff81578211 <+68>: mov $0xffffffff81eca4c0,%rsi
0xffffffff81578218 <+75>: mov $0xffffffff82146d46,%rdi
0xffffffff8157821f <+82>: call 0xffffffff818909f1 <printk>
0xffffffff81578224 <+87>: jmp 0xffffffff81578229
<acpi_ns_validate_handle+92>
0xffffffff81578226 <+89>: xor %r12d,%r12d
0xffffffff81578229 <+92>: mov %r12,%rax
0xffffffff8157822c <+95>: pop %r12
0xffffffff8157822e <+97>: ret
End of assembler dump.
(...)
(gdb) print (char *)0xffffffff82146d46
$2 = 0xffffffff82146d46 "\001\066%s handle: %p\n"
(gdb) print (char *)0xffffffff81eca4c0
$3 = 0xffffffff81eca4c0 <__func__.3> "acpi_ns_validate_handle"
The crash is on cmpb $0xf,0x8(%r12).
%r12 is 0x48, so 0x48 + 0x8 == 0x50 and that's the address we are crashing on.
However, how can %r12 become 0x48 when few instructions above
we called printk and the value we see in the kernel log is different?
This probably means that printk is corrupting it... (or it's a CPU
bug, yeah right,
but with my luck for such issues who knows ;)
I don't dare to debug printk.
2) Without printks I get this:
BUG: kernel NULL pointer dereference, address: 0000000000000050
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 0 PID: 500 Comm: systemd-udevd Tainted: G W E 5.10.11+ #14
Hardware name: HP HP Pavilion Laptop 15-cs3xxx/86E2, BIOS F.05 01/01/2020
RIP: 0010:acpi_ns_validate_handle+0x23/0x34
Code: 41 5d 41 5e 41 5f c3 0f 1f 44 00 00 41 54 49 89 fc e8 8d c7 00
00 49 8d 44 24 ff 48 83 f8 fd 76 09 4c 8b 25 4a b4 89 01 eb 0b <41> 80
7c 24 08 0f 74 03 45 31 e4 4c 89 e0 41 5c c3 0f 1f 44 00 00
RSP: 0018:ffffbfc0c08dfb08 EFLAGS: 00010213
RAX: 0000000000000047 RBX: ffffbfc0c08dfb70 RCX: 0000000000000000
RDX: ffffffffc11204b1 RSI: 0000000000000246 RDI: 0000000000000048
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffffc11204b1 R11: 0000000000000001 R12: 0000000000000048
R13: ffffffffa37400f0 R14: 0000000000000048 R15: 0000000000000000
FS: 00007fbfdfd478c0(0000) GS:ffff9cd01fa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000050 CR3: 00000001401de003 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
acpi_get_data_full+0x4d/0x92
acpi_bus_get_device+0x32/0xa0
sdw_intel_acpi_scan+0x59/0x230 [soundwire_intel]
? strstr+0x22/0x60
snd_intel_dsp_driver_probe.cold+0xaf/0x163 [snd_intel_dspcfg]
azx_probe+0x7a/0x970 [snd_hda_intel]
local_pci_probe+0x42/0x80
? _cond_resched+0x16/0x40
pci_device_probe+0xfd/0x1b0
really_probe+0xf2/0x440
driver_probe_device+0xe1/0x150
device_driver_attach+0xa1/0xb0
__driver_attach+0x8a/0x150
? device_driver_attach+0xb0/0xb0
? device_driver_attach+0xb0/0xb0
bus_for_each_dev+0x78/0xc0
bus_add_driver+0x12b/0x1e0
driver_register+0x8b/0xe0
? 0xffffffffc1112000
do_one_initcall+0x44/0x1d0
? do_init_module+0x23/0x260
? kmem_cache_alloc_trace+0xf5/0x200
do_init_module+0x5c/0x260
__do_sys_finit_module+0xb1/0x110
do_syscall_64+0x33/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
$ gdb vmlinux
(...)
(gdb) disassemble acpi_ns_validate_handle
Dump of assembler code for function acpi_ns_validate_handle:
0xffffffff815781cd <+0>: call 0xffffffff81062c40 <__fentry__>
0xffffffff815781d2 <+5>: push %r12
0xffffffff815781d4 <+7>: mov %rdi,%r12
0xffffffff815781d7 <+10>: call 0xffffffff81584969
<acpi_ut_track_stack_ptr>
0xffffffff815781dc <+15>: lea -0x1(%r12),%rax
0xffffffff815781e1 <+20>: cmp $0xfffffffffffffffd,%rax
0xffffffff815781e5 <+24>: jbe 0xffffffff815781f0
<acpi_ns_validate_handle+35>
0xffffffff815781e7 <+26>: mov 0x189b44a(%rip),%r12 #
0xffffffff82e13638 <acpi_gbl_root_node>
0xffffffff815781ee <+33>: jmp 0xffffffff815781fb
<acpi_ns_validate_handle+46>
0xffffffff815781f0 <+35>: cmpb $0xf,0x8(%r12)
0xffffffff815781f6 <+41>: je 0xffffffff815781fb
<acpi_ns_validate_handle+46>
0xffffffff815781f8 <+43>: xor %r12d,%r12d
0xffffffff815781fb <+46>: mov %r12,%rax
0xffffffff815781fe <+49>: pop %r12
0xffffffff81578200 <+51>: ret
End of assembler dump.
Again, we crash on cmpb $0xf,0x8(%r12).
%r12 == 0x48, so again we crash on address 0x50.
Why is %r12 == 0x48? It was either corrupted by acpi_ut_track_stack_ptr
or before we even entered this function.
It's weird that the value is exactly the same when corruption
has a different origin...
3) Without printks and with ACPI_DEBUG=n
BUG: kernel NULL pointer dereference, address: 0000000000000050
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 7 PID: 473 Comm: systemd-udevd Tainted: G W E 5.10.11+ #15
Hardware name: HP HP Pavilion Laptop 15-cs3xxx/86E2, BIOS F.05 01/01/2020
RIP: 0010:acpi_ns_validate_handle+0x1a/0x23
Code: 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 44 00 00
48 8d 57 ff 48 89 f8 48 83 fa fd 76 08 48 8b 05 8c 28 6a 01 c3 <80> 7f
08 0f 74 02 31 c0 c3 0f 1f 44 00 00 48 8b 3d 76 28 6a 01 e8
RSP: 0018:ffffa96ac0847b20 EFLAGS: 00010213
RAX: 0000000000000048 RBX: ffffa96ac0847b70 RCX: 0000000000000000
RDX: 0000000000000047 RSI: 0000000000000246 RDI: 0000000000000048
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffffc10ee4b1 R11: ffffffffa14e3268 R12: 0000000000001001
R13: ffffffffa073f150 R14: 0000000000000048 R15: 0000000000000000
FS: 00007f5a27c588c0(0000) GS:ffff91555fbc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000050 CR3: 000000010ab64004 CR4: 0000000000770ee0
PKRU: 55555554
Call Trace:
acpi_get_data_full+0x4d/0x92
acpi_bus_get_device+0x26/0x50
sdw_intel_acpi_scan+0x59/0x230 [soundwire_intel]
? strstr+0x22/0x60
snd_intel_dsp_driver_probe.cold+0xaf/0x163 [snd_intel_dspcfg]
azx_probe+0x7a/0x970 [snd_hda_intel]
local_pci_probe+0x42/0x80
? _cond_resched+0x16/0x40
pci_device_probe+0xfd/0x1b0
really_probe+0xf2/0x440
driver_probe_device+0xe1/0x150
device_driver_attach+0xa1/0xb0
__driver_attach+0x8a/0x150
? device_driver_attach+0xb0/0xb0
? device_driver_attach+0xb0/0xb0
bus_for_each_dev+0x78/0xc0
bus_add_driver+0x12b/0x1e0
driver_register+0x8b/0xe0
? 0xffffffffc1217000
do_one_initcall+0x44/0x1d0
? do_init_module+0x23/0x260
? kmem_cache_alloc_trace+0xf5/0x200
do_init_module+0x5c/0x260
__do_sys_finit_module+0xb1/0x110
do_syscall_64+0x33/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xa9
$ gdb vmlinux
(...)
(gdb) disassemble acpi_ns_validate_handle
Dump of assembler code for function acpi_ns_validate_handle:
0xffffffff8156518b <+0>: call 0xffffffff81062c40 <__fentry__>
0xffffffff81565190 <+5>: lea -0x1(%rdi),%rdx
0xffffffff81565194 <+9>: mov %rdi,%rax
0xffffffff81565197 <+12>: cmp $0xfffffffffffffffd,%rdx
0xffffffff8156519b <+16>: jbe 0xffffffff815651a5
<acpi_ns_validate_handle+26>
0xffffffff8156519d <+18>: mov 0x16a288c(%rip),%rax #
0xffffffff82c07a30 <acpi_gbl_root_node>
0xffffffff815651a4 <+25>: ret
0xffffffff815651a5 <+26>: cmpb $0xf,0x8(%rdi)
0xffffffff815651a9 <+30>: je 0xffffffff815651ad
<acpi_ns_validate_handle+34>
0xffffffff815651ab <+32>: xor %eax,%eax
0xffffffff815651ad <+34>: ret
End of assembler dump.
Again, we crash on cmpb $0xf,0x8(%rdi).
%rdi == 0x48, it can only come from outside.
All logs (+result of some gdb commands I didn't include here) can be found here:
https://people.freedesktop.org/~mslusarz/tmp/
I'm using gcc 10.2.1 from Debian testing.
I hope I didn't make a fatal mistake in my analysis...^W^W^W^W^W^W^W^W^W^W^W^W
I hope I made a mistake in my analysis and the bug can be explained easily ;)
Marcin
More information about the Alsa-devel
mailing list