[alsa-devel] [PATCH 0/3] ASoC: max98090: fix PLL unlocked workaround-related

This series is a follow up fix for the question: https://mailman.alsa-project.org/pipermail/alsa-devel/2019-October/157364.ht...
Tzung-Bi Shih (3): ASoC: max98090: remove msleep in PLL unlocked workaround ASoC: max98090: exit workaround earlier if PLL is locked ASoC: max98090: fix possible race conditions
sound/soc/codecs/max98090.c | 23 ++++++++++++++--------- sound/soc/codecs/max98090.h | 1 - 2 files changed, 14 insertions(+), 10 deletions(-)

It was observed Baytrail-based chromebooks could cause continuous PLL unlocked when using playback stream and capture stream simultaneously. Specifically, starting a capture stream after started a playback stream. As a result, the audio data could corrupt or turn completely silent.
As the datasheet suggested, the maximum PLL lock time should be 7 msec. The workaround resets the codec softly by toggling SHDN off and on if PLL failed to lock for 10 msec. Notably, there is no suggested hold time for SHDN off.
On Baytrail-based chromebooks, it would easily happen continuous PLL unlocked if there is a 10 msec delay between SHDN off and on. Removes the msleep().
Signed-off-by: Tzung-Bi Shih tzungbi@google.com --- sound/soc/codecs/max98090.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/sound/soc/codecs/max98090.c b/sound/soc/codecs/max98090.c index f6bf4cfbea23..8382a77586ee 100644 --- a/sound/soc/codecs/max98090.c +++ b/sound/soc/codecs/max98090.c @@ -2117,7 +2117,6 @@ static void max98090_pll_work(struct work_struct *work) /* Toggle shutdown OFF then ON */ snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN, M98090_SHDNN_MASK, 0); - msleep(10); snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN, M98090_SHDNN_MASK, M98090_SHDNN_MASK);

On 11/20/19 12:02 AM, Tzung-Bi Shih wrote:
It was observed Baytrail-based chromebooks could cause continuous PLL unlocked when using playback stream and capture stream simultaneously. Specifically, starting a capture stream after started a playback stream. As a result, the audio data could corrupt or turn completely silent.
As the datasheet suggested, the maximum PLL lock time should be 7 msec. The workaround resets the codec softly by toggling SHDN off and on if PLL failed to lock for 10 msec. Notably, there is no suggested hold time for SHDN off.
On Baytrail-based chromebooks, it would easily happen continuous PLL unlocked if there is a 10 msec delay between SHDN off and on. Removes the msleep().
Signed-off-by: Tzung-Bi Shih tzungbi@google.com
sound/soc/codecs/max98090.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/sound/soc/codecs/max98090.c b/sound/soc/codecs/max98090.c index f6bf4cfbea23..8382a77586ee 100644 --- a/sound/soc/codecs/max98090.c +++ b/sound/soc/codecs/max98090.c @@ -2117,7 +2117,6 @@ static void max98090_pll_work(struct work_struct *work) /* Toggle shutdown OFF then ON */ snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN, M98090_SHDNN_MASK, 0);
- msleep(10);
maybe add a comment here that the off/on sequence is done on purpose (as stated in the commit message)?
snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN, M98090_SHDNN_MASK, M98090_SHDNN_MASK);

On Thu, Nov 21, 2019 at 12:10 AM Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com wrote:
maybe add a comment here that the off/on sequence is done on purpose (as stated in the commit message)?
Will do, thanks.

According to the datasheet, PLL lock time typically takes 2 msec and at most takes 7 msec.
Check the lock status every 1 msec and exit the workaround if PLL is locked.
Signed-off-by: Tzung-Bi Shih tzungbi@google.com --- sound/soc/codecs/max98090.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/sound/soc/codecs/max98090.c b/sound/soc/codecs/max98090.c index 8382a77586ee..2ccdfb2383b7 100644 --- a/sound/soc/codecs/max98090.c +++ b/sound/soc/codecs/max98090.c @@ -2108,6 +2108,8 @@ static void max98090_pll_work(struct work_struct *work) struct max98090_priv *max98090 = container_of(work, struct max98090_priv, pll_work); struct snd_soc_component *component = max98090->component; + unsigned int pll; + int i;
if (!snd_soc_component_is_active(component)) return; @@ -2120,8 +2122,16 @@ static void max98090_pll_work(struct work_struct *work) snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN, M98090_SHDNN_MASK, M98090_SHDNN_MASK);
- /* Give PLL time to lock */ - msleep(10); + for (i = 0; i < 10; ++i) { + /* Check lock status */ + pll = snd_soc_component_read32( + component, M98090_REG_DEVICE_STATUS); + if (!(pll & M98090_ULK_MASK)) + break; + + /* Give PLL time to lock */ + usleep_range(1000, 1200); + } }
static void max98090_jack_work(struct work_struct *work)

max98090_interrupt() and max98090_pll_work() run in 2 different threads. There are 2 possible races:
Note: M98090_REG_DEVICE_STATUS = 0x01. Note: ULK == 0, PLL is locked; ULK == 1, PLL is unlocked.
max98090_interrupt max98090_pll_work ---------------------------------------------- schedule max98090_pll_work restart max98090 codec receive ULK INT assert ULK == 0 schedule max98090_pll_work (1).
In the case (1), the PLL is locked but max98090_interrupt unnecessarily schedules another max98090_pll_work.
max98090_interrupt max98090_pll_work max98090 codec ---------------------------------------------------------------------- ULK = 1 receive ULK INT read 0x01 ULK = 0 (clear on read) schedule max98090_pll_work restart max98090 codec ULK = 1 receive ULK INT read 0x01 ULK = 0 (clear on read) read 0x1 assert ULK == 0 (2).
In the case (2), both max98090_interrupt and max98090_pll_work read the same clear-on-read register. max98090_pll_work would falsely thought PLL is locked.
There are 2 possible options: A. turn off ULK interrupt before scheduling max98090_pll_work; and turn on again before exiting max98090_pll_work. B. remove the second thread of execution.
Adopts option B which is more straightforward.
Signed-off-by: Tzung-Bi Shih tzungbi@google.com --- sound/soc/codecs/max98090.c | 8 ++------ sound/soc/codecs/max98090.h | 1 - 2 files changed, 2 insertions(+), 7 deletions(-)
diff --git a/sound/soc/codecs/max98090.c b/sound/soc/codecs/max98090.c index 2ccdfb2383b7..75abe98dfc44 100644 --- a/sound/soc/codecs/max98090.c +++ b/sound/soc/codecs/max98090.c @@ -2103,10 +2103,8 @@ static void max98090_pll_det_disable_work(struct work_struct *work) M98090_IULK_MASK, 0); }
-static void max98090_pll_work(struct work_struct *work) +static void max98090_pll_work(struct max98090_priv *max98090) { - struct max98090_priv *max98090 = - container_of(work, struct max98090_priv, pll_work); struct snd_soc_component *component = max98090->component; unsigned int pll; int i; @@ -2268,7 +2266,7 @@ static irqreturn_t max98090_interrupt(int irq, void *data)
if (active & M98090_ULK_MASK) { dev_dbg(component->dev, "M98090_ULK_MASK\n"); - schedule_work(&max98090->pll_work); + max98090_pll_work(max98090); }
if (active & M98090_JDET_MASK) { @@ -2431,7 +2429,6 @@ static int max98090_probe(struct snd_soc_component *component) max98090_pll_det_enable_work); INIT_WORK(&max98090->pll_det_disable_work, max98090_pll_det_disable_work); - INIT_WORK(&max98090->pll_work, max98090_pll_work);
/* Enable jack detection */ snd_soc_component_write(component, M98090_REG_JACK_DETECT, @@ -2484,7 +2481,6 @@ static void max98090_remove(struct snd_soc_component *component) cancel_delayed_work_sync(&max98090->jack_work); cancel_delayed_work_sync(&max98090->pll_det_enable_work); cancel_work_sync(&max98090->pll_det_disable_work); - cancel_work_sync(&max98090->pll_work); max98090->component = NULL; }
diff --git a/sound/soc/codecs/max98090.h b/sound/soc/codecs/max98090.h index 57965cd678b4..a197114b0dad 100644 --- a/sound/soc/codecs/max98090.h +++ b/sound/soc/codecs/max98090.h @@ -1530,7 +1530,6 @@ struct max98090_priv { struct delayed_work jack_work; struct delayed_work pll_det_enable_work; struct work_struct pll_det_disable_work; - struct work_struct pll_work; struct snd_soc_jack *jack; unsigned int dai_fmt; int tdm_slots;

On 11/20/19 12:02 AM, Tzung-Bi Shih wrote:
max98090_interrupt() and max98090_pll_work() run in 2 different threads. There are 2 possible races:
Note: M98090_REG_DEVICE_STATUS = 0x01. Note: ULK == 0, PLL is locked; ULK == 1, PLL is unlocked.
max98090_interrupt max98090_pll_work
schedule max98090_pll_work restart max98090 codec receive ULK INT assert ULK == 0 schedule max98090_pll_work (1).
In the case (1), the PLL is locked but max98090_interrupt unnecessarily schedules another max98090_pll_work.
if you re-test that the PLL is already running, then you can exit the work function immediately without redoing the sequence?
maybe also play with the masks so that the PLL unlock is masked in the interrupt and unmasked after the PLL locks?
max98090_interrupt max98090_pll_work max98090 codec
ULK = 1
receive ULK INT read 0x01 ULK = 0 (clear on read) schedule max98090_pll_work restart max98090 codec ULK = 1 receive ULK INT read 0x01 ULK = 0 (clear on read) read 0x1 assert ULK == 0 (2).
what are those 0x01 and 0x1? is the second a typo possibly?
In the case (2), both max98090_interrupt and max98090_pll_work read the same clear-on-read register. max98090_pll_work would falsely thought PLL is locked.
There are 2 possible options: A. turn off ULK interrupt before scheduling max98090_pll_work; and turn on again before exiting max98090_pll_work. B. remove the second thread of execution.
Adopts option B which is more straightforward.
but has the side effect of possibly adding a 10ms delay in the interrupt thread?

On Thu, Nov 21, 2019 at 12:10 AM Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com wrote:
On 11/20/19 12:02 AM, Tzung-Bi Shih wrote:
max98090_interrupt() and max98090_pll_work() run in 2 different threads. There are 2 possible races:
Note: M98090_REG_DEVICE_STATUS = 0x01. Note: ULK == 0, PLL is locked; ULK == 1, PLL is unlocked.
max98090_interrupt max98090_pll_work
schedule max98090_pll_work restart max98090 codec receive ULK INT assert ULK == 0 schedule max98090_pll_work (1).
In the case (1), the PLL is locked but max98090_interrupt unnecessarily schedules another max98090_pll_work.
if you re-test that the PLL is already running, then you can exit the work function immediately without redoing the sequence?
ACK.
maybe also play with the masks so that the PLL unlock is masked in the interrupt and unmasked after the PLL locks?
ACK, this is our option A mentioned below.
max98090_interrupt max98090_pll_work max98090 codec
ULK = 1
receive ULK INT read 0x01 ULK = 0 (clear on read) schedule max98090_pll_work restart max98090 codec ULK = 1 receive ULK INT read 0x01 ULK = 0 (clear on read) read 0x1 assert ULK == 0 (2).
what are those 0x01 and 0x1? is the second a typo possibly?
ACK, a typo.
In the case (2), both max98090_interrupt and max98090_pll_work read the same clear-on-read register. max98090_pll_work would falsely thought PLL is locked.
There are 2 possible options: A. turn off ULK interrupt before scheduling max98090_pll_work; and turn on again before exiting max98090_pll_work. B. remove the second thread of execution.
Adopts option B which is more straightforward.
but has the side effect of possibly adding a 10ms delay in the interrupt thread?
(forgot to mention) Option A cannot fix the case (2) race condition: there would be 2 threads read the same clear-on-read register. In theory, the hardware should faster than CPUs' accesses via I2C. max98090 should returns ULK=1 any time if PLL is unlocked. Shall we ignore the case (2) and adopt option A?

max98090_interrupt max98090_pll_work max98090 codec
ULK = 1
receive ULK INT read 0x01 ULK = 0 (clear on read) schedule max98090_pll_work restart max98090 codec ULK = 1 receive ULK INT read 0x01 ULK = 0 (clear on read) read 0x1 assert ULK == 0 (2).
what are those 0x01 and 0x1? is the second a typo possibly?
ACK, a typo.
In the case (2), both max98090_interrupt and max98090_pll_work read the same clear-on-read register. max98090_pll_work would falsely thought PLL is locked.
There are 2 possible options: A. turn off ULK interrupt before scheduling max98090_pll_work; and turn on again before exiting max98090_pll_work. B. remove the second thread of execution.
Adopts option B which is more straightforward.
but has the side effect of possibly adding a 10ms delay in the interrupt thread?
(forgot to mention) Option A cannot fix the case (2) race condition: there would be 2 threads read the same clear-on-read register. In theory, the hardware should faster than CPUs' accesses via I2C. max98090 should returns ULK=1 any time if PLL is unlocked. Shall we ignore the case (2) and adopt option A?
I don't have any specific recommendation, just trying to follow your line of thought on a problem that bugged be in the past. If your tests shows that the extra delay is fine, then it's good progress
Still you may want to clarify your second point and the commit message. The first race condition was clear, the second one not so much.
what did you mean with 'assert ULK == 0' and what does this map to in pll_work
the code looks as this:
static void max98090_pll_work(struct work_struct *work) { if (!snd_soc_component_is_active(component)) return;
/* Toggle shutdown OFF then ON */ snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN, M98090_SHDNN_MASK, 0);
snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN, M98090_SHDNN_MASK, M98090_SHDNN_MASK);
/* Give PLL time to lock */ msleep(10); }
so in what does the race occur?
or was it race with the other work queue, which starts like this:
static void max98090_pll_det_enable_work(struct work_struct *work) { struct max98090_priv *max98090 = container_of(work, struct max98090_priv, pll_det_enable_work.work); struct snd_soc_component *component = max98090->component; unsigned int status, mask;
/* * Clear status register in order to clear possibly already occurred * PLL unlock. If PLL hasn't still locked, the status will be set * again and PLL unlock interrupt will occur. * Note this will clear all status bits */ regmap_read(max98090->regmap, M98090_REG_DEVICE_STATUS, &status);
So here indeed there is the same read, but what is the consequence of the read? did you mean that in the pll_det_enable_work the value of 'status' may be incorrect as it was cleared already?
But the interrupt does not schedule the pll_det_enable_work(), it happens on a trigger, so how would the race happen?
it'd be good if you provide more details on the flow so that all reviewers get the ideas, it's a good opportunity to fix this for good and move on.

On Thu, Nov 21, 2019 at 11:20 AM Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com wrote:
Still you may want to clarify your second point and the commit message. The first race condition was clear, the second one not so much.
what did you mean with 'assert ULK == 0' and what does this map to in pll_work
The case (2) race condition is based on the previous patch: https://mailman.alsa-project.org/pipermail/alsa-devel/2019-November/158852.h...
After applying the patch, the code would be something like: (in the case, two threads read 0x01.) static void max98090_pll_work(struct work_struct *work) { [snip] snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN, M98090_SHDNN_MASK, 0); snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN, M98090_SHDNN_MASK, M98090_SHDNN_MASK);
for (i = 0; i < 10; ++i) { /* Check lock status */ pll = snd_soc_component_read32( component, M98090_REG_DEVICE_STATUS); if (!(pll & M98090_ULK_MASK)) break;
/* Give PLL time to lock */ usleep_range(1000, 1200); } }
I guess we have 2 choices: i. stick to the original plan: option B, remove the second thread of execution. In the case, we would be punished by at most 10~12ms delay in the interrupt handler thread. I browsed the max98090_interrupt() and tend to think it should be fine if we delay jack detection for extra 10ms.
ii. adopt option A, play with the masks; and drop the previous patch to avoid multiple threads access 0x01 We would be forced to wait >10ms before re-enabling ULK interrupt. Notably, Documentation/timers/timers-howto.txt states "msleep(1~20) may not do what the caller intends, and will often sleep longer".Documentation/timers/timers-howto.txt".
Either way should be fine. Which one would you suggest?

On 11/21/19 12:17 AM, Tzung-Bi Shih wrote:
On Thu, Nov 21, 2019 at 11:20 AM Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com wrote:
Still you may want to clarify your second point and the commit message. The first race condition was clear, the second one not so much.
what did you mean with 'assert ULK == 0' and what does this map to in pll_work
The case (2) race condition is based on the previous patch: https://mailman.alsa-project.org/pipermail/alsa-devel/2019-November/158852.h...
ah, ok, so that's a race you introduced :-)
After applying the patch, the code would be something like: (in the case, two threads read 0x01.) static void max98090_pll_work(struct work_struct *work) { [snip] snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN, M98090_SHDNN_MASK, 0); snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN, M98090_SHDNN_MASK, M98090_SHDNN_MASK);
for (i = 0; i < 10; ++i) { /* Check lock status */ pll = snd_soc_component_read32( component, M98090_REG_DEVICE_STATUS); if (!(pll & M98090_ULK_MASK)) break; /* Give PLL time to lock */ usleep_range(1000, 1200); }
}
Sorry, I don't see at what point the race happens.
if your interrupt sees an ULK status, it will schedule the pll_work. You only test the status again *after* switching the device on/off, so what is the side effect?
I am also wondering if you shouldn't sleep first in the loop, then check the status?
And in case you do have a side effect due to the clear on read, maybe you could save the status in a context and retest in the workqueue, that way you'd have a trace of the ULK event even if the register was cleared.

On Thu, Nov 21, 2019 at 11:41 PM Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com wrote:
On 11/21/19 12:17 AM, Tzung-Bi Shih wrote:
On Thu, Nov 21, 2019 at 11:20 AM Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com wrote:
Still you may want to clarify your second point and the commit message. The first race condition was clear, the second one not so much.
what did you mean with 'assert ULK == 0' and what does this map to in pll_work
The case (2) race condition is based on the previous patch: https://mailman.alsa-project.org/pipermail/alsa-devel/2019-November/158852.h...
ah, ok, so that's a race you introduced :-)
After applying the patch, the code would be something like: (in the case, two threads read 0x01.) static void max98090_pll_work(struct work_struct *work) { [snip] snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN, M98090_SHDNN_MASK, 0); snd_soc_component_update_bits(component, M98090_REG_DEVICE_SHUTDOWN, M98090_SHDNN_MASK, M98090_SHDNN_MASK);
for (i = 0; i < 10; ++i) { /* Check lock status */ pll = snd_soc_component_read32( component, M98090_REG_DEVICE_STATUS); if (!(pll & M98090_ULK_MASK)) break; /* Give PLL time to lock */ usleep_range(1000, 1200); }
}
Sorry, I don't see at what point the race happens.
if your interrupt sees an ULK status, it will schedule the pll_work. You only test the status again *after* switching the device on/off, so what is the side effect?
Both interrupt thread and pll_work thread access the status register. The side effect is: interrupt thread could read the status register right before pll_work thread's read.
I am also wondering if you shouldn't sleep first in the loop, then check the status?
ACK, "sleep first and check the status" makes more sense.
And in case you do have a side effect due to the clear on read, maybe you could save the status in a context and retest in the workqueue, that way you'd have a trace of the ULK event even if the register was cleared.
I think cache the status couldn't be a good idea. We couldn't make sure the freshness of the cache.
Can we suppose CPUs always get ULK=1 if PLL is unlocked so that we can simply ignore the case (2)? Since we know the register is volatile and read the register via I2C should be much slower than hardware raise the bit.

if your interrupt sees an ULK status, it will schedule the pll_work. You only test the status again *after* switching the device on/off, so what is the side effect?
Both interrupt thread and pll_work thread access the status register. The side effect is: interrupt thread could read the status register right before pll_work thread's read.
I am also wondering if you shouldn't sleep first in the loop, then check the status?
ACK, "sleep first and check the status" makes more sense.
And in case you do have a side effect due to the clear on read, maybe you could save the status in a context and retest in the workqueue, that way you'd have a trace of the ULK event even if the register was cleared.
I think cache the status couldn't be a good idea. We couldn't make sure the freshness of the cache.
I was thinking more of treating it as a volatile, but it's ugly indeed.
Can we suppose CPUs always get ULK=1 if PLL is unlocked so that we can simply ignore the case (2)? Since we know the register is volatile and read the register via I2C should be much slower than hardware raise the bit.
Your option B with the small change to sleep then retest is probably the better solution overall.
BTW did you test this on both Baytrail and BSW chromebooks? In the past I saw baytrail devices exposed this error a lot more than BSW.

On Fri, Nov 22, 2019 at 3:20 AM Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com wrote:
Can we suppose CPUs always get ULK=1 if PLL is unlocked so that we can simply ignore the case (2)? Since we know the register is volatile and read the register via I2C should be much slower than hardware raise the bit.
Your option B with the small change to sleep then retest is probably the better solution overall.
Have sent v2, https://mailman.alsa-project.org/pipermail/alsa-devel/2019-November/159050.h...
BTW did you test this on both Baytrail and BSW chromebooks? In the past I saw baytrail devices exposed this error a lot more than BSW.
I tested on a Baytrail-based chromebook which is easy to generate lots of "PLL unlocked" on the console. After applying this series and the next series (https://mailman.alsa-project.org/pipermail/alsa-devel/2019-November/158853.h...), I seldom see "PLL unlocked" on the console. It still happens a few times, but with very very low probability.
participants (2)
-
Pierre-Louis Bossart
-
Tzung-Bi Shih