Crash in acpi_ns_validate_handle triggered by soundwire on Linux 5.10
Marcin Ślusarz
marcin.slusarz at gmail.com
Thu Feb 4 13:48:54 CET 2021
Thu, Feb 4, 2021 at 13:11 Marcin Ślusarz <marcin.slusarz at gmail.com> wrote:
>
> Mon, Feb 1, 2021 at 13:16 Marcin Ślusarz <marcin.slusarz at gmail.com> wrote:
> >
> > Mon, Feb 1, 2021 at 12:43 Rafael J. Wysocki <rafael at kernel.org> wrote:
> > >
> > > On Fri, Jan 29, 2021 at 9:03 PM Marcin Ślusarz <marcin.slusarz at gmail.com> wrote:
> > > >
> > > > Fri, Jan 29, 2021 at 19:59 Marcin Ślusarz <marcin.slusarz at gmail.com> wrote:
> > > > >
> > > > > Thu, Jan 28, 2021 at 15:32 Marcin Ślusarz <marcin.slusarz at gmail.com> wrote:
> > > > > >
> > > > > > Thu, Jan 28, 2021 at 13:39 Rafael J. Wysocki <rafael at kernel.org> wrote:
> > > > > > > The only explanation for that I can think about (and which does not
> > > > > > > involve supernatural intervention so to speak) is a stack corruption
> > > > > > > occurring between these two calls in sdw_intel_acpi_cb(). IOW,
> > > > > > > something scribbles on the handle in the meantime, but ATM I have no
> > > > > > > idea what that can be.
> > > > > >
> > > > > > I tried KASAN but it didn't find anything and the kernel actually booted
> > > > > > successfully.
> > > > >
> > > > > I investigated this and it looks like a compiler bug (or something nastier),
> > > > > but I can't find where exactly registers get corrupted because if I add printks
> > > > > the corruption seems to happen on the printk side, but if I don't add them it seems
> > > > > the value gets corrupted earlier.
> > > > (...)
> > > > > I'm using gcc 10.2.1 from Debian testing.
> > > >
> > > > Someone on IRC, after hearing only that "gcc miscompiles the kernel",
> > > > suggested disabling CONFIG_STACKPROTECTOR_STRONG.
> > > > It helped indeed and it matches my observations, so it's quite likely it
> > > > is the culprit.
> > > >
> > > > What do we do now?
> > >
> > > Figure out why the stack protection kicks in, I suppose.
> > >
> > > The target object is not on the stack, so if the pointer to it is
> > > valid (we need to verify somehow that it is indeed), dereferencing it
> > > shouldn't cause the stack protection to trigger.
> >
> > Well, the problem is not that the stack protector finds something, but
> > the feature itself corrupts some registers.
>
> I retract this statement.
>
> Originally I based it on this piece of code:
> 0xffffffff815781f0 <+35>: mov %r12,%rdx
> 0xffffffff815781f3 <+38>: mov $0xffffffff81eca4c0,%rsi
> 0xffffffff815781fa <+45>: mov $0xffffffff82146d46,%rdi
> 0xffffffff81578201 <+52>: call 0xffffffff818909f1 <printk>
> 0xffffffff81578206 <+57>: cmpb $0xf,0x8(%r12)
> where the crash is on the last line and I supposedly could see the message
> printed by printk with the correct value of %r12.
> However, after attaching kgdb+kgdboe (it's such a pain...) to the kernel
> I discovered that something corrupts memory so much that the format
> string becomes "", which means that I don't actually see the output of printk.
Oh crap, I can't reproduce it anymore. I might have tried this before
I disabled KASLR, which would explain why I saw "" as the format
string (0xffffffff82146d46 would not have been its real address).
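
For anyone following along, here is a rough sketch of why an address taken
from the disassembly can be wrong on a KASLR-enabled boot. It assumes the
usual x86_64 link-time _text of 0xffffffff81000000; the runtime _text value
and the helper name runtime_address() are made up purely for illustration
(on a real system you would read _text from /proc/kallsyms as root).

```python
# Illustration only: with KASLR the whole kernel image is shifted by a random
# offset chosen at boot, so link-time addresses are not valid as-is.
MASK64 = 0xffffffffffffffff

LINK_TIME_TEXT_BASE = 0xffffffff81000000  # assumed default x86_64 _text (no KASLR)
LINK_TIME_FMT_ADDR = 0xffffffff82146d46   # format string address from the disassembly


def runtime_address(link_addr, runtime_text_base):
    """Translate a link-time address into the one used by the running kernel."""
    slide = runtime_text_base - LINK_TIME_TEXT_BASE  # the KASLR offset
    return (link_addr + slide) & MASK64


# Hypothetical _text of a KASLR-enabled boot (made-up value).
example_runtime_text = 0xffffffff9a000000

print(hex(runtime_address(LINK_TIME_FMT_ADDR, example_runtime_text)))
```

Reading memory at the unshifted 0xffffffff82146d46 in the debugger would then
show whatever happens to live there, not the format string. Booting with
nokaslr makes the slide zero, so both addresses match.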