Re: [alsa-devel] [RFC 00/17] clk: Add per-controller locks to fix deadlocks

12 Nov 2016


      On 11/04, Marek Szyprowski wrote:
...
Hi Stephen,
Krzysztof has left Samsung, but we would like to continue this task, because
the ABBA dead-locks related to global prepare lock becomes more and more
problematic for us to workaround.
Hmm. Ok. Thanks for the info.
...
On 2016-09-09 02:24, Stephen Boyd wrote:
...
So I'm not very fond of this design because the locking scheme is
pretty much out of the hands of the framework and can be easily
broken.
Well, switching from a single global lock to more granular locking
is always a good approach. Please note that the global lock sooner
or later became a serious bottleneck if one wants to make somehow
more aggressive power management and clock gating.
I'm not so sure switching from a global lock to a more granular
lock is _always_ a great idea. Sometimes simpler code is better,
even if it doesn't scale to a million clk nodes. The largest
systems I've seen only have clocks in the hundreds, and a
majority of those aren't rate changing in parallel, so it's not
like we're suffering from VFS type scalability problems here with
tens of thousands of inodes.
That isn't to say I don't agree there's a scalability problem
here, but I'd like to actually see numbers to prove that there's
some sort of scalability problem before making drastic changes.
...
...
I'm biased of course, because I'd prefer we go with my
wwmutex design of per-clk locks[1]. Taking locks in any order
works fine there, and we resolve quite a few long standing
locking problems that we have while improving scalability. The
problem there is that we don't get the recursive mutex design
(maybe that's a benefit!).
Do you have any plan to continue working on your approach? per-clk
wwmutex looks like an overkill on the first glance, but that's probably
the only working solution if you want to get rid of recursive locks.
I'm still not really convinced that we really need wwmutex here,
especially if it is possible to guarantee the same order of locking
operations inside the clock core. This requires a bit of cooperation
from clock providers (with proper documentation and a list of
DO/DON'T it shouldn't be that hard).
So far I haven't gotten around to resurrecting the wwmutex stuff.
If you have interest in doing it that's great. Having a locking
scheme with rules of DO/DON'T sounds brittle to me, unless it can
be automated to find problems. I know that I'm not going to spend
time policing that.
...
...
Once a clk_op reenters the framework
with consumer APIs and tries to grab the same lock we deadlock.
This is why I've been slowly splitting consumers from providers
so we can easily identify these cases. If we had something like
coordinated clk rate switching, we could get rid of clk_ops
reentering the framework and avoid this problem (and we really do
need to do that).
I'm not sure that this makes really sense split consumers and
providers. You will get recursive calls to clk core anyway with
consumers calls if you are implementing i2c clock, for which an i2c
bus driver does it's own clock gating (i2c controller uses
consumer clk api).
I suppose this is a different topic. Regardless of the recursive
call or not, we can easily see that a clk consumer is also a clk
provider and just knowing that is useful. Once we know that, we
can look to see if they're calling clk consumer APIs from their
provider callbacks which is not desired because it makes it
impossible to get rid of the recursive lock design. If the lock
is per-clock, then recursion doesn't happen when the provider is
also a consumer. If it does, that's bad and lockdep should tell
us.
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project