
On Thu, Feb 4, 2021 at 13:11 Marcin Ślusarz marcin.slusarz@gmail.com wrote:
On Mon, Feb 1, 2021 at 13:16 Marcin Ślusarz marcin.slusarz@gmail.com wrote:
On Mon, Feb 1, 2021 at 12:43 Rafael J. Wysocki rafael@kernel.org wrote:
On Fri, Jan 29, 2021 at 9:03 PM Marcin Ślusarz marcin.slusarz@gmail.com wrote:
On Fri, Jan 29, 2021 at 19:59 Marcin Ślusarz marcin.slusarz@gmail.com wrote:
On Thu, Jan 28, 2021 at 15:32 Marcin Ślusarz marcin.slusarz@gmail.com wrote:
On Thu, Jan 28, 2021 at 13:39 Rafael J. Wysocki rafael@kernel.org wrote:
The only explanation for that I can think of (and which does not involve supernatural intervention, so to speak) is a stack corruption occurring between these two calls in sdw_intel_acpi_cb(). IOW, something scribbles on the handle in the meantime, but ATM I have no idea what that can be.
I tried KASAN, but it didn't find anything, and the kernel actually booted successfully.
I investigated this, and it looks like a compiler bug (or something nastier). However, I can't pinpoint where exactly the registers get corrupted. If I add printks, the corruption seems to happen on the printk side, but if I don't add them, the value seems to get corrupted earlier.
(...)
I'm using gcc 10.2.1 from Debian testing.
Someone on IRC, after hearing only that "gcc miscompiles the kernel", suggested disabling CONFIG_STACKPROTECTOR_STRONG. It did help, and that matches my observations, so it's quite likely the culprit.
What do we do now?
Figure out why the stack protection kicks in, I suppose.
The target object is not on the stack, so if the pointer to it is valid (we need to verify somehow that it indeed is), dereferencing it shouldn't cause the stack protection to trigger.
Well, the problem is not that the stack protector detects something, but that the feature itself corrupts some registers.
I retract this statement.
Originally I based it on this piece of code:

    0xffffffff815781f0 <+35>: mov    %r12,%rdx
    0xffffffff815781f3 <+38>: mov    $0xffffffff81eca4c0,%rsi
    0xffffffff815781fa <+45>: mov    $0xffffffff82146d46,%rdi
    0xffffffff81578201 <+52>: call   0xffffffff818909f1 <printk>
    0xffffffff81578206 <+57>: cmpb   $0xf,0x8(%r12)

The crash is on the last line, and I supposedly could see the message printed by printk with the correct value of %r12. However, after attaching kgdb+kgdboe (it's so much pain...) to the kernel, I discovered that something corrupts memory so badly that the format string becomes "". That means I don't actually see the output of printk.
Oh crap, I can't reproduce it anymore. I might have tried this before I disabled KASLR, which would explain why I saw "" as the format string (because 0xffffffff82146d46 would then not be its real address).