Crash in acpi_ns_validate_handle triggered by soundwire on Linux 5.10

Marcin Ślusarz marcin.slusarz at gmail.com
Thu Feb 4 13:48:54 CET 2021


On Thu, 4 Feb 2021 at 13:11, Marcin Ślusarz <marcin.slusarz at gmail.com> wrote:
>
> On Mon, 1 Feb 2021 at 13:16, Marcin Ślusarz <marcin.slusarz at gmail.com> wrote:
> >
> > On Mon, 1 Feb 2021 at 12:43, Rafael J. Wysocki <rafael at kernel.org> wrote:
> > >
> > > On Fri, Jan 29, 2021 at 9:03 PM Marcin Ślusarz <marcin.slusarz at gmail.com> wrote:
> > > >
> > > > On Fri, 29 Jan 2021 at 19:59, Marcin Ślusarz <marcin.slusarz at gmail.com> wrote:
> > > > >
> > > > > On Thu, 28 Jan 2021 at 15:32, Marcin Ślusarz <marcin.slusarz at gmail.com> wrote:
> > > > > >
> > > > > > On Thu, 28 Jan 2021 at 13:39, Rafael J. Wysocki <rafael at kernel.org> wrote:
> > > > > > > The only explanation for that I can think about (and which does not
> > > > > > > involve supernatural intervention so to speak) is a stack corruption
> > > > > > > occurring between these two calls in sdw_intel_acpi_cb().  IOW,
> > > > > > > something scribbles on the handle in the meantime, but ATM I have no
> > > > > > > idea what that can be.
> > > > > >
> > > > > > I tried KASAN, but it didn't find anything and the kernel actually
> > > > > > booted successfully.
> > > > >
> > > > > I investigated this and it looks like a compiler bug (or something nastier),
> > > > > but I can't find where exactly the registers get corrupted: if I add printks,
> > > > > the corruption seems to happen on the printk side, but if I don't add them,
> > > > > the value seems to get corrupted earlier.
> > > > (...)
> > > > > I'm using gcc 10.2.1 from Debian testing.
> > > >
> > > > Someone on IRC, after hearing only that "gcc miscompiles the kernel",
> > > > suggested disabling CONFIG_STACKPROTECTOR_STRONG.
> > > > It helped indeed and it matches my observations, so it's quite likely it
> > > > is the culprit.
> > > >
> > > > What do we do now?
> > >
> > > Figure out why the stack protection kicks in, I suppose.
> > >
> > > The target object is not on the stack, so if the pointer to it is
> > > valid (we need to verify somehow that it is indeed), dereferencing it
> > > shouldn't cause the stack protection to trigger.
> >
> > Well, the problem is not that the stack protector finds something, but
> > that the feature itself corrupts some registers.
>
> I retract this statement.
>
> Originally I based it on this piece of code:
>    0xffffffff815781f0 <+35>:    mov    %r12,%rdx
>    0xffffffff815781f3 <+38>:    mov    $0xffffffff81eca4c0,%rsi
>    0xffffffff815781fa <+45>:    mov    $0xffffffff82146d46,%rdi
>    0xffffffff81578201 <+52>:    call   0xffffffff818909f1 <printk>
>    0xffffffff81578206 <+57>:    cmpb   $0xf,0x8(%r12)
> where the crash is on the last line and I supposedly could see the message
> printed by printk with the correct value of %r12.
> However, after attaching kgdb+kgdboe (it's such a pain...) to the kernel,
> I discovered that something corrupts memory so much that the format
> string becomes "", which means that I don't actually see the output of printk.

Oh crap, I can't reproduce it anymore. I might have tried this before
I disabled KASLR, which would explain why I saw "" as the format
string (because 0xffffffff82146d46 would not have been its real address).
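
To spell out the address mismatch: with kernel address space layout
randomization enabled, the kernel image is shifted by a random offset at
boot, so an address taken from the static disassembly no longer points at
the same data at runtime. A minimal sketch in Python, where the helper
names and the slide value are made up purely for illustration (the actual
slide on the affected machine is not known from this thread):

```python
# Link-time address of the printk format string, as it appears in the
# disassembly quoted above (mov $0xffffffff82146d46,%rdi).
LINK_TIME_FMT_ADDR = 0xFFFFFFFF82146D46

# Hypothetical slide, chosen only for illustration; the real value is
# picked at boot and differs from one boot to the next.
EXAMPLE_SLIDE = 0x1A000000


def runtime_address(link_time_addr: int, slide: int) -> int:
    """Return where a kernel symbol ends up once the image is shifted by `slide`."""
    return (link_time_addr + slide) & 0xFFFFFFFFFFFFFFFF  # stay within 64 bits


def static_address_is_valid(link_time_addr: int, slide: int) -> bool:
    """True only if reading the link-time address hits the intended data."""
    return runtime_address(link_time_addr, slide) == link_time_addr


if __name__ == "__main__":
    # With no slide, the debugger reads the real format string.
    print(hex(runtime_address(LINK_TIME_FMT_ADDR, 0)))
    # With a non-zero slide, the string lives elsewhere, so the old address
    # may hold unrelated bytes, for example a leading NUL that reads as "".
    print(hex(runtime_address(LINK_TIME_FMT_ADDR, EXAMPLE_SLIDE)))
```

So an empty string at 0xffffffff82146d46 only says something about the
format string if the kernel was running without a slide at that point.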

