Re: [PATCH] ASoC: cs42l42: Implement system suspend

1 Dec 2021

      On Wed, Dec 01, 2021 at 06:04:00PM +0000, Richard Fitzgerald wrote:
...
On 01/12/2021 16:32, Mark Brown wrote:
...
On Wed, Dec 01, 2021 at 03:36:48PM +0000, Richard Fitzgerald wrote:
...
...
...

/*
* PWR_CTL2 must be volatile so it can be used as a canary bit to

* detect a reset during system suspend.

*/

case CS42L42_PWR_CTL2:

...
...
This feels a bit hackish
...
I can't think of a better way of detecting whether the chip reset. If
we don't have control of the reset GPIO we can't force a reset. So it
depends whether power turned off (also whether it dropped below the
Vmin voltage at which a reset is triggered). The regulator framework
can't tell us if it really turned off power and on ACPI systems the
power is all controlled magically.
Right, hence the second part of the question - the general thing for
drivers has just been to assume that the power might've gone out over
suspend and behave as though it did unless we were using the device as a
wake source.  We could if required extend the existing regulator API
notification stuff to generate notifications, though they'd not work
when it's compiled out.
...
...
- is the cost of doing the cache sync really so> expensive that it's

worth the effort of trying to skip it?
...
It's not cost, it's about getting the correct values in the registers.
If we call regcache_mark_dirty() it tells regmap that all the hardware
registers have reset to default. Then regcache_sync() will skip writing
cache values that are the register default value, assuming that the
register already has this value. If the chip didn't reset, the registers
could retain non-default values while the cache contains a dirty
default value, in that case the register needs updating to match the
cache.
(note BTW that at the point I queried the performance thing I'd not got
as far as the magic bypassed write sequences)
So this is clearly a being done at the wrong level - it is needlessly
complex, non-idiomatic and making fragile and non-obvious assumptions
about how the cache code operates.  The issue you've got is that the
cache code is presenting interfaces for the common case where you'd only
want to resync the cache in cases where the device has been reset but
you've added code which bypasses the cache and pulls the device out of
sync with the cache.  You need to teach regmap about that, not try to
hack around things in the driver code.  But...
...
I tried to persuade myself that a cache value couldn't possibly change
at any time from suspend() being called to resume disabling cache-only
so I could safely accept default values being skipped. But that
assumption made me very uncomfortable - I don't like leaving potential
bugs in when its simple enough to prevent them.
...that's based on the assumption that this is all about the magic write
sequences you're using for shutdown potentially conflicting with default
values you've got in the cache?  If it's not those then the assumption
is that either the device was reset in which case it has reset values or
the device was not reset and held it's previous value, in which case the
cache sync is redundant.  If we don't trust the device to reset cleanly
for some reason then it's probably a bad idea to tell regmap about
default values in the first place since even on initial boot we might
actually be doing a soft reboot and the device could be holding values
from whatever was running beforehand.
This clearly isn't simple, and other than those shutdown sequences or
potential issues with the device not resetting cleanly this should be
done by extending regmap so it explicitly knows what's going on.  If it
is those shutdown sequences then you're probably looking for something
like doing a _sync_region() on the registers the sequences touch.
Possibly a _sync_region() variant that writes everything requested, or
just make sync_region() not skip defaults.  Or just remove all the
defaults from the driver, the sync will be a bit more expensive but
that's hopefully fine.  If it's not those shutdown sequences it should
still be handled genericly but I'd really like to understand the
scenario you're concerned about here, especially the fact that any issue
will show up in this single register that the code is checking.
...
...
...

if (cs42l42->suspended) {
mutex_unlock(&cs42l42->irq_lock);

return IRQ_NONE;

}

...
...
Given that you're using disable_irq() to force the interrupt off (which
is a bit rude but often the best one can do) how might we be getting an
interrupt while suspended?  This seems to indicate an error condition.
...
I may have misunderstood here, but the documentation says that
enables/disables are nested. The interrupt starts out enabled right
after calling request_threaded_irq(), so I expected that all users of
the shared interrupt would have to call disable_irq() for it to actually
get disabled (I put the call in on that basis that it does no harm). If
disable_irq() forces the irq off even if it's shared then I'll remove it
because as you say that would be rude.
Hrm, that may have changed since I last looked - if that's the case then
I guess it's best not to warn which was what I was thinking here.
...
...
...

/*

 * Only call regcache_mark_dirty() if cs42l42 reset, so

 * regcache_sync() writes all non-default cached values.

 * If it didn't reset we don't want to filter out syncing

 * dirty cache entries that have default value.

 * DISCHARGE_FILT==1 after suspend. If the cs42l42 reset

 * it will now be 0.

 */

...
...
Something needs to tell regmap that the cache is dirty otherwise it
...
Regmap knows if it has dirty values that it hasn't written out to the
hardware.
Well, apparently it doesn't since otherwise we could just do a resync.
...
...
won't sync anything, including the non-default register values?  There's
...
That's fine. If the registers have not been reset they still have the
value of every clean cache entry. Every dirty cache entry will be
written out by regcache_sync().
What about the shutdown sequence - does that not touch cached registers?
...
...
currently an assumption coded in there that the cache is dirty because
the device was reset and everything has default values.
...
Regmap only assumes that if regcache_mark_dirty() is called.
Right, but the point here is that coming out of suspend the driver is
expected to be unconditionally marking the cache as dirty and doing a
resync since we probably lost power.  The code is trying to avoid that
for reasons that are not at all apparent but I *think* are due to the
bypassed writes in shutdown(), especially if you say it's not a
performance thing.  We shouldn't need to worry about there being non
default states in the hardware.
The whole thing is just really confusing as it stands.  Like I say I
think this needs to express what it's trying to do clearly in both the
regmap operations it's doing and the code in general, at the minute it's
not clear looking at things what the goal is.