On Mon, Feb 1, 2021 at 12:43 PM Rafael J. Wysocki rafael@kernel.org wrote:
On Fri, Jan 29, 2021 at 9:03 PM Marcin Ślusarz marcin.slusarz@gmail.com wrote:
On Fri, Jan 29, 2021 at 7:59 PM Marcin Ślusarz marcin.slusarz@gmail.com wrote:
On Thu, Jan 28, 2021 at 3:32 PM Marcin Ślusarz marcin.slusarz@gmail.com wrote:
On Thu, Jan 28, 2021 at 1:39 PM Rafael J. Wysocki rafael@kernel.org wrote:
The only explanation for that I can think of (and which does not involve supernatural intervention, so to speak) is a stack corruption occurring between these two calls in sdw_intel_acpi_cb(). IOW, something scribbles on the handle in the meantime, but ATM I have no idea what that can be.
I tried KASAN, but it didn't find anything and the kernel actually booted successfully.
I investigated this and it looks like a compiler bug (or something nastier), but I can't find where exactly the registers get corrupted: if I add printks, the corruption seems to happen on the printk side, but if I don't add them, the value seems to get corrupted earlier.
(...)
I'm using gcc 10.2.1 from Debian testing.
Someone on IRC, after hearing only that "gcc miscompiles the kernel", suggested disabling CONFIG_STACKPROTECTOR_STRONG. It helped indeed and it matches my observations, so it's quite likely it is the culprit.
What do we do now?
Figure out why the stack protection kicks in, I suppose.
The target object is not on the stack, so if the pointer to it is valid (we need to verify somehow that it is indeed), dereferencing it shouldn't cause the stack protection to trigger.
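To make the reasoning above concrete, here is a minimal user-space sketch of what a stack canary check amounts to. It is only an illustration under stated assumptions, not the kernel's or gcc's actual implementation: struct toy_frame, GUARD and the helper names are all hypothetical, and the real guard value is random rather than a fixed constant. It shows that a write through a valid pointer to a non-stack object leaves the canary alone, while an overrun of a local buffer does not.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed guard value; a real stack protector uses a random one. */
#define GUARD ((uintptr_t)0x5ca1ab1eu)

/* Toy stand-in for a protected frame: a local buffer followed by the canary. */
struct toy_frame {
	char buf[16];
	uintptr_t canary;
};

/* What the function epilogue conceptually does: compare canary and guard. */
static int canary_intact(const struct toy_frame *f)
{
	return f->canary == GUARD;
}

/*
 * Write through a valid pointer to a heap object.
 * Returns 1 if the canary survived, 0 if not, -1 if allocation failed.
 */
static int write_to_heap_object(void)
{
	struct toy_frame frame = { .canary = GUARD };
	int *obj = malloc(sizeof(*obj));
	int ok;

	if (!obj)
		return -1;

	*obj = 42;			/* the object lives outside the frame */
	ok = canary_intact(&frame);
	free(obj);
	return ok;
}

/*
 * Simulate an overrun of the local buffer.
 * Returns 1 if the canary survived, 0 if it was clobbered.
 */
static int overrun_local_buffer(void)
{
	struct toy_frame frame = { .canary = GUARD };

	/* Runs past buf[] and over the canary, but stays inside the struct. */
	memset(&frame, 'A', sizeof(frame));
	return canary_intact(&frame);
}
```

Called from a small main(), write_to_heap_object() reports the canary as intact and overrun_local_buffer() reports it as clobbered. That is the sense in which a dereference of a valid non-stack pointer should not, by itself, trip the check.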
Well, the problem is not that the stack protector finds something, but that the feature itself corrupts some registers.