[alsa-devel] [PATCH] ASoC: core: Fix race in dapm_power_widgets
dapm_power_widgets can be called from different context. When two calls are happening at the same time both will try to change states/lists. This can lead to kernel crash.
A simple way to reproduce the problem:
while [ "1" = "1" ] ; do amixer sset -Dhw:0 -q 'Mixer for loopback' on amixer sset -Dhw:0 -q 'Mixer for loopback' off done &
while [ "1" = "1" ] ; do aplay -Dplughw:0 -fdat -d 3 /dev/urandom echo "Playback finished" sleep 6 done &
Add new card level mutex (dpw_mutex) to protect the dapm_power_widgets from race. The exisiting card->mutex can not be used for this purpose, since it has been taken in probe time in the snd_soc_instantiate_card function. Through probe calls from this function eventually dapm_power_widgets will be called, which will lead to dead lock.
Signed-off-by: Peter Ujfalusi peter.ujfalusi@nokia.com --- include/sound/soc.h | 1 + sound/soc/soc-core.c | 1 + sound/soc/soc-dapm.c | 2 ++ 3 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/include/sound/soc.h b/include/sound/soc.h index 5c3bce8..c0175ab 100644 --- a/include/sound/soc.h +++ b/include/sound/soc.h @@ -563,6 +563,7 @@ struct snd_soc_card {
struct list_head list; struct mutex mutex; + struct mutex dpw_mutex; /* To avoid dapm_power_widget race condition */
bool instantiated;
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c index 614a8b3..8017afc 100644 --- a/sound/soc/soc-core.c +++ b/sound/soc/soc-core.c @@ -2872,6 +2872,7 @@ static int snd_soc_register_card(struct snd_soc_card *card) INIT_LIST_HEAD(&card->list); card->instantiated = 0; mutex_init(&card->mutex); + mutex_init(&card->dpw_mutex);
mutex_lock(&client_mutex); list_add(&card->list, &card_list); diff --git a/sound/soc/soc-dapm.c b/sound/soc/soc-dapm.c index 7d85c64..89bfa37 100644 --- a/sound/soc/soc-dapm.c +++ b/sound/soc/soc-dapm.c @@ -902,6 +902,7 @@ static int dapm_power_widgets(struct snd_soc_codec *codec, int event) /* Check which widgets we need to power and store them in * lists indicating if they should be powered up or down. */ + mutex_lock(&card->dpw_mutex); list_for_each_entry(w, &codec->dapm_widgets, list) { switch (w->id) { case snd_soc_dapm_pre: @@ -1009,6 +1010,7 @@ static int dapm_power_widgets(struct snd_soc_codec *codec, int event) pop_dbg(codec->pop_time, "DAPM sequencing finished, waiting %dms\n", codec->pop_time); pop_wait(codec->pop_time); + mutex_unlock(&card->dpw_mutex);
return 0; }
On Thu, 2010-11-04 at 10:16 +0200, Peter Ujfalusi wrote:
dapm_power_widgets can be called from different context. When two calls are happening at the same time both will try to change states/lists. This can lead to kernel crash.
A simple way to reproduce the problem:
while [ "1" = "1" ] ; do amixer sset -Dhw:0 -q 'Mixer for loopback' on amixer sset -Dhw:0 -q 'Mixer for loopback' off done &
while [ "1" = "1" ] ; do aplay -Dplughw:0 -fdat -d 3 /dev/urandom echo "Playback finished" sleep 6 done &
Add new card level mutex (dpw_mutex) to protect the dapm_power_widgets from race. The exisiting card->mutex can not be used for this purpose, since it has been taken in probe time in the snd_soc_instantiate_card function. Through probe calls from this function eventually dapm_power_widgets will be called, which will lead to dead lock.
Signed-off-by: Peter Ujfalusi peter.ujfalusi@nokia.com
Acked-by: Liam Girdwood lrg@slimlogic.co.uk
On Thu, Nov 04, 2010 at 10:16:45AM +0200, Peter Ujfalusi wrote:
The exisiting card->mutex can not be used for this purpose, since it has been taken in probe time in the snd_soc_instantiate_card function. Through probe calls from this function eventually dapm_power_widgets will be called, which will lead to dead lock.
Hrm. You're right that the card mutex has issues here but just adding a lock here leaves us with a race - whenever we make a change to widget power we're doing a read/modify/write cycle and those aren't locked at all. I'm not sure if we need a lower level lock to fix that, or a higher level lock which would make this one redundant but we ought to think about it and make sure we've picked the right way.
On Thursday 04 November 2010 15:43:54 ext Mark Brown wrote:
On Thu, Nov 04, 2010 at 10:16:45AM +0200, Peter Ujfalusi wrote:
The exisiting card->mutex can not be used for this purpose, since it has been taken in probe time in the snd_soc_instantiate_card function. Through probe calls from this function eventually dapm_power_widgets will be called, which will lead to dead lock.
Hrm. You're right that the card mutex has issues here but just adding a lock here leaves us with a race - whenever we make a change to widget power we're doing a read/modify/write cycle and those aren't locked at all. I'm not sure if we need a lower level lock to fix that, or a higher level lock which would make this one redundant but we ought to think about it and make sure we've picked the right way.
It is really hard to see what is actually happening, but based on my observation the root is this: the widget's power_list going to be modified during the dapm_power_widgets call. If the previous call to dapm_power_widgets did not finished, and it is going through the widget's power_list, another call to dapm_power_widgets might cause the power_list to be changed. This leads to a crash. It is possible that the list which existed in the first dapm_power_widgets call got distroyed by the second call while the first instance was going through that. IMHO in most cases the new mutex is not going to add much delay, since it is really rare case, when we can hit this bug.
I did not find lower level point where I can safely deal with this problem.
On Thu, Nov 04, 2010 at 04:18:03PM +0200, Peter Ujfalusi wrote:
It is really hard to see what is actually happening, but based on my observation the root is this: the widget's power_list going to be modified during the dapm_power_widgets call. If the previous call to dapm_power_widgets did not finished, and it is going through the widget's power_list, another call to dapm_power_widgets might cause the power_list to be changed. This leads to a crash. It is possible that the list which existed in the first dapm_power_widgets call got distroyed by the second call while the first instance was going through that. IMHO in most cases the new mutex is not going to add much delay, since it is really rare case, when we can hit this bug.
Sure, it's fairly clear how this could fall over.
I did not find lower level point where I can safely deal with this problem.
Right, that's what I was thinking - I'm mostly concerned that we need to have the locking further up the stack rather than further down. This feels like a band aid which fixes the symptom (the list pointers getting corrupted) but it smells like we may have something further up the stack which can still create issues. For example, if both DAPM and something else manipulate the register map simultaneously we could get R/M/W issues.
On Thursday 04 November 2010 20:08:58 ext Mark Brown wrote:
Right, that's what I was thinking - I'm mostly concerned that we need to have the locking further up the stack rather than further down. This feels like a band aid which fixes the symptom (the list pointers getting corrupted) but it smells like we may have something further up the stack which can still create issues. For example, if both DAPM and something else manipulate the register map simultaneously we could get R/M/W issues.
Well, I think if we want to have one place to use a lock, this is the highest we can go. But I did figured out the reason for the crash. In my machine driver I had a kcontrol (ENUM_EXT) for muting the speaker. In the set callback I used snd_soc_dapm_enable_pin/snd_soc_dapm_disable_pin + snd_soc_dapm_sync. This is the place where it is crashing, since dapm_sync calls dapm_power_widgets directly without locking. All other places, when dapm_power_widgets called are protected by codec->mutex so those paths are safe. By replacing the custom kcontrol/callbacks with SOC_DAPM_PIN_SWITCH() the problem is gone. So it seams that my case can be solved easily with the SOC_DAPM_PIN_SWITCH(), but the race remains, and other setups might fail under some rare scenario
We can not add the codec->mutex to the snd_soc_dapm_sync, since it is called from soc_probe_dai_link, which is part of the initialization, and the codec-
mutex has been taken up in the chain.
It might be a good idea to go through the machine drivers, and check for snd_soc_dapm_sync usage.
On Fri, Nov 05, 2010 at 09:53:35AM +0200, Peter Ujfalusi wrote:
Well, I think if we want to have one place to use a lock, this is the highest we can go.
This assumes that we've figured out the correct thing to lock and where we need to hold it - if you want to lock the DAPM algorithm then that's one thing, but it's not clear to me that this is the only thing we need to lock or if we need to ensure that the users of DAPM are handled and therefore this level of stuff becomes redundant.
All other places, when dapm_power_widgets called are protected by codec->mutex so those paths are safe. By replacing the custom kcontrol/callbacks with SOC_DAPM_PIN_SWITCH() the problem is gone. So it seams that my case can be solved easily with the SOC_DAPM_PIN_SWITCH(), but the race remains, and other setups might fail under some rare scenario
This is all smelling like we should be doing something to ensure that all the controls take the CODEC lock. We're relying pretty heavily on that throughout the code so there's going to be other smoking guns there.
It might be a good idea to go through the machine drivers, and check for snd_soc_dapm_sync usage.
I think so; probably any DAPM function has this issue.
On Fri, 2010-11-05 at 10:38 -0400, Mark Brown wrote:
On Fri, Nov 05, 2010 at 09:53:35AM +0200, Peter Ujfalusi wrote:
Well, I think if we want to have one place to use a lock, this is the highest we can go.
This assumes that we've figured out the correct thing to lock and where we need to hold it - if you want to lock the DAPM algorithm then that's one thing, but it's not clear to me that this is the only thing we need to lock or if we need to ensure that the users of DAPM are handled and therefore this level of stuff becomes redundant.
All other places, when dapm_power_widgets called are protected by codec->mutex so those paths are safe. By replacing the custom kcontrol/callbacks with SOC_DAPM_PIN_SWITCH() the problem is gone. So it seams that my case can be solved easily with the SOC_DAPM_PIN_SWITCH(), but the race remains, and other setups might fail under some rare scenario
This is all smelling like we should be doing something to ensure that all the controls take the CODEC lock. We're relying pretty heavily on that throughout the code so there's going to be other smoking guns there.
I think we need to consider non CODEC components too that have DAPM controls (and have no CODEC mutex). i.e OMAP4 ABE.
I'll try and look into this later as part of the OMAP4 upstream. Ideally a card/DAPM mutex may be more desirable here.
Liam
On Sat, Nov 06, 2010 at 11:00:06AM +0000, Liam Girdwood wrote:
I think we need to consider non CODEC components too that have DAPM controls (and have no CODEC mutex). i.e OMAP4 ABE.
I'll try and look into this later as part of the OMAP4 upstream. Ideally a card/DAPM mutex may be more desirable here.
Another option is to just move the mutex up to the card level so we don't need mutexes on the individual objects. I think we're going to need to lock at the card level to do cross-object DAPM sequences, and if we do that we may not need to bother with anything more fine grained.
participants (3)
-
Liam Girdwood
-
Mark Brown
-
Peter Ujfalusi