On Thu, Feb 02, 2023 at 07:51:01AM -0800, Guenter Roeck wrote:
On Tue, Jan 31, 2023 at 12:08:53PM +0000, Mark Brown wrote:
On Mon, Jan 30, 2023 at 10:16:06PM -0800, Guenter Roeck wrote:
I'll see if I can implement a downstream fix.
If you implement something I don't see a reason not to post it upstream.
I had a look into the code, and concluded that it is too complex for anyone who doesn't know it to find a proper fix. For example, for an outsider it
It's definitely unclear, there's a datasheet at [1] which does appear to explicitly call for a 512ms delay though (see figure 20 on page 50). It does look like it should only be applied in the case where an inserted jack is detected (ie, when identifying an accessory or button press) and not when removal is detected though.
is not conceivable (or explained) why the ground switch is enabled only to be disabled immediately afterwards if a jack was removed.
It smells like there's a power benefit to leaving it disabled when unplugged (which seems plausible), and possibly like the detection is more stable with the ground switch enabled. The ground switch is not documented AFAICT (it's in register 0xfb which isn't named and doesn't appear to appear in the datsheet from a quick search). The code is leaving the switch enabled so long as an accessory is plugged.
This is now the top crash reason on affected Chromebooks (so far I identified Asus C424, HP SeaStar, and HP StingRay) with this patch applied. I am inclined to revert it from all ChromeOS kernel branches. At least for us the cure for the problem is much worse than the problem itself.
Are you saying this is actually crashing, or just that you're getting warnings about threads being blocked for too long (that was what was posted earlier in the thread)? The only things I can see that look like they have the potential to actually lock up are the cancel_work_sync() calls but they were unchanged and the backtrace you showed was showing the thread in the msleep(). My guess would be that you've got systems where there are very frequent jack detection events (potentiallly with broken accessories, or possibly due to the ground switch putting things into the wrong priority) and that the interrupt is firing again as soon as the thread unmasks the primary interrupt which means it never actually stops running.
It's possible that reordering things so that the delay is only applied if DA7219_JACK_INSERTION_STS_MASK is set would help, that'd need some motion of the interrupt acking as well. That's probably a good idea in general, it's what the datasheet seems to call for and would lead to prompter removal detection. However if the issue is systems with broken accessories constantly firing spurious button events they'd still be seeing the delay.
My other guess would be that moving the delay that's been added to a delayed work would avoid the warnings, though you might want to manually keep the physical interrupt disabled while that's running which is fun. Possibly also tuning down the delay given that as you say 500ms is rather a long potential delay even in the context of jack debounces, though if it is bad accessories then there's probably a bit of luck involved in the original code not triggering issues and any debounce is likely to cause fun, and like I say the datasheet does seem to say that this is the appropriate delay.
You'd end up with something along the lines of
disable_irq(); schedule_delayed_work(delay, current_irq_code);
in the IRQ handler then call enable_irq() on the way out of the new delayed_work. That would keep the same flow but not look like the task is running which should avoid setting off the hung task alarm.
[1] https://www.renesas.com/us/en/document/dst/da7219-datasheet?r=1563341