[PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music
The OMTP pin define headsets can be mis-detected as line out instead of OMTP, causing obvious issues with audio quality. This patch is to put increased resistances within the device at a suitable point.
To solve this issue better, the new mechanism setup ground switches with conditional delay control and these allow for more stabile detection process to operate as intended. This conditional delay control will not impact the hardware process but use extra system resource.
This commit improves control of ground switches in the AAD logic.
Signed-off-by: David Rau david.rau.zg@renesas.com --- sound/soc/codecs/da7219-aad.c | 42 ++++++++++++++++++++++++++++++----- sound/soc/codecs/da7219-aad.h | 1 + 2 files changed, 37 insertions(+), 6 deletions(-)
diff --git a/sound/soc/codecs/da7219-aad.c b/sound/soc/codecs/da7219-aad.c index bba73c44c219..08200ec259f9 100644 --- a/sound/soc/codecs/da7219-aad.c +++ b/sound/soc/codecs/da7219-aad.c @@ -352,9 +352,14 @@ static irqreturn_t da7219_aad_irq_thread(int irq, void *data) struct snd_soc_dapm_context *dapm = snd_soc_component_get_dapm(component); struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component); u8 events[DA7219_AAD_IRQ_REG_MAX]; - u8 statusa; + u8 statusa, srm_st; int i, report = 0, mask = 0;
+ srm_st = snd_soc_component_read(component, DA7219_PLL_SRM_STS) & DA7219_PLL_SRM_STS_MCLK; + msleep(da7219_aad->gnd_switch_delay * ((srm_st == 0x0) ? 2 : 1) - 4); + /* Enable ground switch */ + snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01); + /* Read current IRQ events */ regmap_bulk_read(da7219->regmap, DA7219_ACCDET_IRQ_EVENT_A, events, DA7219_AAD_IRQ_REG_MAX); @@ -454,8 +459,8 @@ static irqreturn_t da7219_aad_irq_thread(int irq, void *data) snd_soc_dapm_disable_pin(dapm, "Mic Bias"); snd_soc_dapm_sync(dapm);
- /* Enable ground switch */ - snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01); + /* Disable ground switch */ + snd_soc_component_update_bits(component, 0xFB, 0x01, 0x00); } }
@@ -831,6 +836,32 @@ static void da7219_aad_handle_pdata(struct snd_soc_component *component) } }
+static void da7219_aad_handle_gnd_switch_time(struct snd_soc_component *component) +{ + struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component); + struct da7219_aad_priv *da7219_aad = da7219->aad; + u8 jack_det; + + jack_det = snd_soc_component_read(component, DA7219_ACCDET_CONFIG_2) + & DA7219_JACK_DETECT_RATE_MASK; + switch (jack_det) { + case 0x00: + da7219_aad->gnd_switch_delay = 32; + break; + case 0x10: + da7219_aad->gnd_switch_delay = 64; + break; + case 0x20: + da7219_aad->gnd_switch_delay = 128; + break; + case 0x30: + da7219_aad->gnd_switch_delay = 256; + break; + default: + da7219_aad->gnd_switch_delay = 32; + break; + } +}
/* * Suspend/Resume @@ -908,9 +939,6 @@ int da7219_aad_init(struct snd_soc_component *component) snd_soc_component_update_bits(component, DA7219_ACCDET_CONFIG_1, DA7219_BUTTON_CONFIG_MASK, 0);
- /* Enable ground switch */ - snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01); - INIT_WORK(&da7219_aad->btn_det_work, da7219_aad_btn_det_work); INIT_WORK(&da7219_aad->hptest_work, da7219_aad_hptest_work);
@@ -928,6 +956,8 @@ int da7219_aad_init(struct snd_soc_component *component) regmap_bulk_write(da7219->regmap, DA7219_ACCDET_IRQ_MASK_A, &mask, DA7219_AAD_IRQ_REG_MAX);
+ da7219_aad_handle_gnd_switch_time(component); + return 0; } EXPORT_SYMBOL_GPL(da7219_aad_init); diff --git a/sound/soc/codecs/da7219-aad.h b/sound/soc/codecs/da7219-aad.h index f48a12012ef3..21fdf53095cc 100644 --- a/sound/soc/codecs/da7219-aad.h +++ b/sound/soc/codecs/da7219-aad.h @@ -187,6 +187,7 @@ enum da7219_aad_event_regs { struct da7219_aad_priv { struct snd_soc_component *component; int irq; + int gnd_switch_delay;
u8 micbias_pulse_lvl; u32 micbias_pulse_time;
On Mon, 21 Nov 2022 05:07:44 +0000, David Rau wrote:
The OMTP pin define headsets can be mis-detected as line out instead of OMTP, causing obvious issues with audio quality. This patch is to put increased resistances within the device at a suitable point.
To solve this issue better, the new mechanism setup ground switches with conditional delay control and these allow for more stabile detection process to operate as intended. This conditional delay control will not impact the hardware process but use extra system resource.
[...]
Applied to
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next
Thanks!
[1/1] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music commit: 969357ec94e670571d6593f2a93aba25e4577d4f
All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying to this mail.
Thanks, Mark
On Mon, Nov 21, 2022 at 05:07:44AM +0000, David Rau wrote:
The OMTP pin define headsets can be mis-detected as line out instead of OMTP, causing obvious issues with audio quality. This patch is to put increased resistances within the device at a suitable point.
To solve this issue better, the new mechanism setup ground switches with conditional delay control and these allow for more stabile detection process to operate as intended. This conditional delay control will not impact the hardware process but use extra system resource.
This commit improves control of ground switches in the AAD logic.
Signed-off-by: David Rau david.rau.zg@renesas.com
sound/soc/codecs/da7219-aad.c | 42 ++++++++++++++++++++++++++++++----- sound/soc/codecs/da7219-aad.h | 1 + 2 files changed, 37 insertions(+), 6 deletions(-)
diff --git a/sound/soc/codecs/da7219-aad.c b/sound/soc/codecs/da7219-aad.c index bba73c44c219..08200ec259f9 100644 --- a/sound/soc/codecs/da7219-aad.c +++ b/sound/soc/codecs/da7219-aad.c @@ -352,9 +352,14 @@ static irqreturn_t da7219_aad_irq_thread(int irq, void *data) struct snd_soc_dapm_context *dapm = snd_soc_component_get_dapm(component); struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component); u8 events[DA7219_AAD_IRQ_REG_MAX];
- u8 statusa;
u8 statusa, srm_st; int i, report = 0, mask = 0;
srm_st = snd_soc_component_read(component, DA7219_PLL_SRM_STS) & DA7219_PLL_SRM_STS_MCLK;
msleep(da7219_aad->gnd_switch_delay * ((srm_st == 0x0) ? 2 : 1) - 4);
Ever since this patch was applied to ChromeOS, we have observed hung task crashes in da7219_aad_irq_thread().
Is it really appropriate to sleep up to (256 * 2) - 4 = 508 ms in an interrupt handler ?
Thanks, Guenter
- /* Enable ground switch */
- snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01);
- /* Read current IRQ events */ regmap_bulk_read(da7219->regmap, DA7219_ACCDET_IRQ_EVENT_A, events, DA7219_AAD_IRQ_REG_MAX);
@@ -454,8 +459,8 @@ static irqreturn_t da7219_aad_irq_thread(int irq, void *data) snd_soc_dapm_disable_pin(dapm, "Mic Bias"); snd_soc_dapm_sync(dapm);
/* Enable ground switch */
snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01);
/* Disable ground switch */
} }snd_soc_component_update_bits(component, 0xFB, 0x01, 0x00);
@@ -831,6 +836,32 @@ static void da7219_aad_handle_pdata(struct snd_soc_component *component) } }
+static void da7219_aad_handle_gnd_switch_time(struct snd_soc_component *component) +{
- struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component);
- struct da7219_aad_priv *da7219_aad = da7219->aad;
- u8 jack_det;
- jack_det = snd_soc_component_read(component, DA7219_ACCDET_CONFIG_2)
& DA7219_JACK_DETECT_RATE_MASK;
- switch (jack_det) {
- case 0x00:
da7219_aad->gnd_switch_delay = 32;
break;
- case 0x10:
da7219_aad->gnd_switch_delay = 64;
break;
- case 0x20:
da7219_aad->gnd_switch_delay = 128;
break;
- case 0x30:
da7219_aad->gnd_switch_delay = 256;
break;
- default:
da7219_aad->gnd_switch_delay = 32;
break;
- }
+}
/*
- Suspend/Resume
@@ -908,9 +939,6 @@ int da7219_aad_init(struct snd_soc_component *component) snd_soc_component_update_bits(component, DA7219_ACCDET_CONFIG_1, DA7219_BUTTON_CONFIG_MASK, 0);
- /* Enable ground switch */
- snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01);
- INIT_WORK(&da7219_aad->btn_det_work, da7219_aad_btn_det_work); INIT_WORK(&da7219_aad->hptest_work, da7219_aad_hptest_work);
@@ -928,6 +956,8 @@ int da7219_aad_init(struct snd_soc_component *component) regmap_bulk_write(da7219->regmap, DA7219_ACCDET_IRQ_MASK_A, &mask, DA7219_AAD_IRQ_REG_MAX);
- da7219_aad_handle_gnd_switch_time(component);
- return 0;
} EXPORT_SYMBOL_GPL(da7219_aad_init); diff --git a/sound/soc/codecs/da7219-aad.h b/sound/soc/codecs/da7219-aad.h index f48a12012ef3..21fdf53095cc 100644 --- a/sound/soc/codecs/da7219-aad.h +++ b/sound/soc/codecs/da7219-aad.h @@ -187,6 +187,7 @@ enum da7219_aad_event_regs { struct da7219_aad_priv { struct snd_soc_component *component; int irq;
int gnd_switch_delay;
u8 micbias_pulse_lvl; u32 micbias_pulse_time;
-- 2.17.1
Would you please provide me the related error messages when hung task crashes in da7219_aad_irq_thread()? BTW, "gnd_switch_delay = 256" is an unusual use case of the longer jack detection latency.
-----Original Message----- From: Guenter Roeck groeck7@gmail.com On Behalf Of Guenter Roeck Sent: Wednesday, January 18, 2023 03:57 To: David Rau we730128@gmail.com Cc: perex@perex.cz; lgirdwood@gmail.com; broonie@kernel.org; tiwai@suse.com; support.opensource@diasemi.com; alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org; David Rau david.rau.zg@renesas.com Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music
On Mon, Nov 21, 2022 at 05:07:44AM +0000, David Rau wrote:
The OMTP pin define headsets can be mis-detected as line out instead of OMTP, causing obvious issues with audio quality. This patch is to put increased resistances within the device at a suitable point.
To solve this issue better, the new mechanism setup ground switches with conditional delay control and these allow for more stabile detection process to operate as intended. This conditional delay control will not impact the hardware process but use extra system resource.
This commit improves control of ground switches in the AAD logic.
Signed-off-by: David Rau david.rau.zg@renesas.com
sound/soc/codecs/da7219-aad.c | 42 ++++++++++++++++++++++++++++++----- sound/soc/codecs/da7219-aad.h | 1 + 2 files changed, 37 insertions(+), 6 deletions(-)
diff --git a/sound/soc/codecs/da7219-aad.c b/sound/soc/codecs/da7219-aad.c index bba73c44c219..08200ec259f9 100644 --- a/sound/soc/codecs/da7219-aad.c +++ b/sound/soc/codecs/da7219-aad.c @@ -352,9 +352,14 @@ static irqreturn_t da7219_aad_irq_thread(int irq, void *data) struct snd_soc_dapm_context *dapm = snd_soc_component_get_dapm(component); struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component); u8 events[DA7219_AAD_IRQ_REG_MAX];
- u8 statusa;
u8 statusa, srm_st; int i, report = 0, mask = 0;
srm_st = snd_soc_component_read(component, DA7219_PLL_SRM_STS) & DA7219_PLL_SRM_STS_MCLK;
msleep(da7219_aad->gnd_switch_delay * ((srm_st == 0x0) ? 2 : 1) -
+4);
Ever since this patch was applied to ChromeOS, we have observed hung task crashes in da7219_aad_irq_thread().
Is it really appropriate to sleep up to (256 * 2) - 4 = 508 ms in an interrupt handler ?
Thanks, Guenter
- /* Enable ground switch */
- snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01);
- /* Read current IRQ events */ regmap_bulk_read(da7219->regmap, DA7219_ACCDET_IRQ_EVENT_A, events, DA7219_AAD_IRQ_REG_MAX);
@@ -454,8 +459,8 @@ static irqreturn_t da7219_aad_irq_thread(int irq, void *data) snd_soc_dapm_disable_pin(dapm, "Mic Bias"); snd_soc_dapm_sync(dapm);
/* Enable ground switch */
snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01);
/* Disable ground switch */
} }snd_soc_component_update_bits(component, 0xFB, 0x01, 0x00);
@@ -831,6 +836,32 @@ static void da7219_aad_handle_pdata(struct snd_soc_component *component) } }
+static void da7219_aad_handle_gnd_switch_time(struct +snd_soc_component *component) {
- struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component);
- struct da7219_aad_priv *da7219_aad = da7219->aad;
- u8 jack_det;
- jack_det = snd_soc_component_read(component, DA7219_ACCDET_CONFIG_2)
& DA7219_JACK_DETECT_RATE_MASK;
- switch (jack_det) {
- case 0x00:
da7219_aad->gnd_switch_delay = 32;
break;
- case 0x10:
da7219_aad->gnd_switch_delay = 64;
break;
- case 0x20:
da7219_aad->gnd_switch_delay = 128;
break;
- case 0x30:
da7219_aad->gnd_switch_delay = 256;
break;
- default:
da7219_aad->gnd_switch_delay = 32;
break;
- }
+}
/*
- Suspend/Resume
@@ -908,9 +939,6 @@ int da7219_aad_init(struct snd_soc_component *component) snd_soc_component_update_bits(component, DA7219_ACCDET_CONFIG_1, DA7219_BUTTON_CONFIG_MASK, 0);
- /* Enable ground switch */
- snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01);
- INIT_WORK(&da7219_aad->btn_det_work, da7219_aad_btn_det_work); INIT_WORK(&da7219_aad->hptest_work, da7219_aad_hptest_work);
@@ -928,6 +956,8 @@ int da7219_aad_init(struct snd_soc_component *component) regmap_bulk_write(da7219->regmap, DA7219_ACCDET_IRQ_MASK_A, &mask, DA7219_AAD_IRQ_REG_MAX);
- da7219_aad_handle_gnd_switch_time(component);
- return 0;
} EXPORT_SYMBOL_GPL(da7219_aad_init); diff --git a/sound/soc/codecs/da7219-aad.h b/sound/soc/codecs/da7219-aad.h index f48a12012ef3..21fdf53095cc 100644 --- a/sound/soc/codecs/da7219-aad.h +++ b/sound/soc/codecs/da7219-aad.h @@ -187,6 +187,7 @@ enum da7219_aad_event_regs { struct da7219_aad_priv { struct snd_soc_component *component; int irq;
int gnd_switch_delay;
u8 micbias_pulse_lvl; u32 micbias_pulse_time;
-- 2.17.1
On Thu, Jan 19, 2023 at 11:02:25AM +0000, David Rau wrote:
Would you please provide me the related error messages when hung task crashes in da7219_aad_irq_thread()? BTW, "gnd_switch_delay = 256" is an unusual use case of the longer jack detection latency.
Here is a typical traceback.
<3>[ 246.919057] INFO: task irq/105-da7219-:2854 blocked for more than 122 seconds. <3>[ 246.919065] Not tainted 5.10.159-20927-g317f62e2494d #1 <3>[ 246.919068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. <6>[ $PHONE_NUMBER] task:irq/105-da7219- state:D stack: 0 pid: 2854 ppid: 2 flags:0x00004080 <6>[ 246.919075] Call Trace: <6>[ 246.919084] __schedule+0x3b0/0xdaf <6>[ 246.919090] schedule+0x44/0xa8 <6>[ 246.919093] schedule_timeout+0xb6/0x290 <6>[ 246.919098] ? run_local_timers+0x4e/0x4e <6>[ 246.919102] msleep+0x2c/0x38 <6>[ 246.919108] da7219_aad_irq_thread+0x66/0x2b0 [snd_soc_da7219 cd5a76eef6e777074216b9d61f7918f7561bf7ec] <6>[ 246.919113] ? irq_forced_thread_fn+0x5f/0x5f <6>[ 246.919116] irq_thread_fn+0x22/0x4d <6>[ 246.919120] irq_thread+0x120/0x19d <6>[ 246.919123] ? irq_thread_fn+0x4d/0x4d <6>[ 246.919128] kthread+0x142/0x153 <6>[ 246.919132] ? irq_forced_secondary_handler+0x21/0x21 <6>[ 246.919135] ? kthread_blkcg+0x31/0x31 <6>[ 246.919139] ret_from_fork+0x1f/0x30
The underlying question is if it really appropriate to have an msleep() of any kind in an interrupt handler. If this is about debouncing a signal, it should be handled with a delayed timer.
Guenter
-----Original Message----- From: Guenter Roeck groeck7@gmail.com On Behalf Of Guenter Roeck Sent: Wednesday, January 18, 2023 03:57 To: David Rau we730128@gmail.com Cc: perex@perex.cz; lgirdwood@gmail.com; broonie@kernel.org; tiwai@suse.com; support.opensource@diasemi.com; alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org; David Rau david.rau.zg@renesas.com Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music
On Mon, Nov 21, 2022 at 05:07:44AM +0000, David Rau wrote:
The OMTP pin define headsets can be mis-detected as line out instead of OMTP, causing obvious issues with audio quality. This patch is to put increased resistances within the device at a suitable point.
To solve this issue better, the new mechanism setup ground switches with conditional delay control and these allow for more stabile detection process to operate as intended. This conditional delay control will not impact the hardware process but use extra system resource.
This commit improves control of ground switches in the AAD logic.
Signed-off-by: David Rau david.rau.zg@renesas.com
sound/soc/codecs/da7219-aad.c | 42 ++++++++++++++++++++++++++++++----- sound/soc/codecs/da7219-aad.h | 1 + 2 files changed, 37 insertions(+), 6 deletions(-)
diff --git a/sound/soc/codecs/da7219-aad.c b/sound/soc/codecs/da7219-aad.c index bba73c44c219..08200ec259f9 100644 --- a/sound/soc/codecs/da7219-aad.c +++ b/sound/soc/codecs/da7219-aad.c @@ -352,9 +352,14 @@ static irqreturn_t da7219_aad_irq_thread(int irq, void *data) struct snd_soc_dapm_context *dapm = snd_soc_component_get_dapm(component); struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component); u8 events[DA7219_AAD_IRQ_REG_MAX];
- u8 statusa;
u8 statusa, srm_st; int i, report = 0, mask = 0;
srm_st = snd_soc_component_read(component, DA7219_PLL_SRM_STS) & DA7219_PLL_SRM_STS_MCLK;
msleep(da7219_aad->gnd_switch_delay * ((srm_st == 0x0) ? 2 : 1) -
+4);
Ever since this patch was applied to ChromeOS, we have observed hung task crashes in da7219_aad_irq_thread().
Is it really appropriate to sleep up to (256 * 2) - 4 = 508 ms in an interrupt handler ?
Thanks, Guenter
- /* Enable ground switch */
- snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01);
- /* Read current IRQ events */ regmap_bulk_read(da7219->regmap, DA7219_ACCDET_IRQ_EVENT_A, events, DA7219_AAD_IRQ_REG_MAX);
@@ -454,8 +459,8 @@ static irqreturn_t da7219_aad_irq_thread(int irq, void *data) snd_soc_dapm_disable_pin(dapm, "Mic Bias"); snd_soc_dapm_sync(dapm);
/* Enable ground switch */
snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01);
/* Disable ground switch */
} }snd_soc_component_update_bits(component, 0xFB, 0x01, 0x00);
@@ -831,6 +836,32 @@ static void da7219_aad_handle_pdata(struct snd_soc_component *component) } }
+static void da7219_aad_handle_gnd_switch_time(struct +snd_soc_component *component) {
- struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component);
- struct da7219_aad_priv *da7219_aad = da7219->aad;
- u8 jack_det;
- jack_det = snd_soc_component_read(component, DA7219_ACCDET_CONFIG_2)
& DA7219_JACK_DETECT_RATE_MASK;
- switch (jack_det) {
- case 0x00:
da7219_aad->gnd_switch_delay = 32;
break;
- case 0x10:
da7219_aad->gnd_switch_delay = 64;
break;
- case 0x20:
da7219_aad->gnd_switch_delay = 128;
break;
- case 0x30:
da7219_aad->gnd_switch_delay = 256;
break;
- default:
da7219_aad->gnd_switch_delay = 32;
break;
- }
+}
/*
- Suspend/Resume
@@ -908,9 +939,6 @@ int da7219_aad_init(struct snd_soc_component *component) snd_soc_component_update_bits(component, DA7219_ACCDET_CONFIG_1, DA7219_BUTTON_CONFIG_MASK, 0);
- /* Enable ground switch */
- snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01);
- INIT_WORK(&da7219_aad->btn_det_work, da7219_aad_btn_det_work); INIT_WORK(&da7219_aad->hptest_work, da7219_aad_hptest_work);
@@ -928,6 +956,8 @@ int da7219_aad_init(struct snd_soc_component *component) regmap_bulk_write(da7219->regmap, DA7219_ACCDET_IRQ_MASK_A, &mask, DA7219_AAD_IRQ_REG_MAX);
- da7219_aad_handle_gnd_switch_time(component);
- return 0;
} EXPORT_SYMBOL_GPL(da7219_aad_init); diff --git a/sound/soc/codecs/da7219-aad.h b/sound/soc/codecs/da7219-aad.h index f48a12012ef3..21fdf53095cc 100644 --- a/sound/soc/codecs/da7219-aad.h +++ b/sound/soc/codecs/da7219-aad.h @@ -187,6 +187,7 @@ enum da7219_aad_event_regs { struct da7219_aad_priv { struct snd_soc_component *component; int irq;
int gnd_switch_delay;
u8 micbias_pulse_lvl; u32 micbias_pulse_time;
-- 2.17.1
Thanks for the kind feedback. Would you please let me know what kinds of environment such error appears you ever meet? Ex: da7219_aad->gnd_switch_delay = ?
-----Original Message----- From: Guenter Roeck groeck7@gmail.com On Behalf Of Guenter Roeck Sent: Friday, January 20, 2023 00:12 To: David Rau david.rau.zg@renesas.com Cc: David Rau we730128@gmail.com; perex@perex.cz; lgirdwood@gmail.com; broonie@kernel.org; tiwai@suse.com; support.opensource@diasemi.com; alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music
On Thu, Jan 19, 2023 at 11:02:25AM +0000, David Rau wrote:
Would you please provide me the related error messages when hung task crashes in da7219_aad_irq_thread()? BTW, "gnd_switch_delay = 256" is an unusual use case of the longer jack detection latency.
Here is a typical traceback.
<3>[ 246.919057] INFO: task irq/105-da7219-:2854 blocked for more than 122 seconds. <3>[ 246.919065] Not tainted 5.10.159-20927-g317f62e2494d #1 <3>[ 246.919068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. <6>[ $PHONE_NUMBER] task:irq/105-da7219- state:D stack: 0 pid: 2854 ppid: 2 flags:0x00004080 <6>[ 246.919075] Call Trace: <6>[ 246.919084] __schedule+0x3b0/0xdaf <6>[ 246.919090] schedule+0x44/0xa8 <6>[ 246.919093] schedule_timeout+0xb6/0x290 <6>[ 246.919098] ? run_local_timers+0x4e/0x4e <6>[ 246.919102] msleep+0x2c/0x38 <6>[ 246.919108] da7219_aad_irq_thread+0x66/0x2b0 [snd_soc_da7219 cd5a76eef6e777074216b9d61f7918f7561bf7ec] <6>[ 246.919113] ? irq_forced_thread_fn+0x5f/0x5f <6>[ 246.919116] irq_thread_fn+0x22/0x4d <6>[ 246.919120] irq_thread+0x120/0x19d <6>[ 246.919123] ? irq_thread_fn+0x4d/0x4d <6>[ 246.919128] kthread+0x142/0x153 <6>[ 246.919132] ? irq_forced_secondary_handler+0x21/0x21 <6>[ 246.919135] ? kthread_blkcg+0x31/0x31 <6>[ 246.919139] ret_from_fork+0x1f/0x30
The underlying question is if it really appropriate to have an msleep() of any kind in an interrupt handler. If this is about debouncing a signal, it should be handled with a delayed timer.
Guenter
-----Original Message----- From: Guenter Roeck groeck7@gmail.com On Behalf Of Guenter Roeck Sent: Wednesday, January 18, 2023 03:57 To: David Rau we730128@gmail.com Cc: perex@perex.cz; lgirdwood@gmail.com; broonie@kernel.org; tiwai@suse.com; support.opensource@diasemi.com; alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org; David Rau david.rau.zg@renesas.com Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music
On Mon, Nov 21, 2022 at 05:07:44AM +0000, David Rau wrote:
The OMTP pin define headsets can be mis-detected as line out instead of OMTP, causing obvious issues with audio quality. This patch is to put increased resistances within the device at a suitable point.
To solve this issue better, the new mechanism setup ground switches with conditional delay control and these allow for more stabile detection process to operate as intended. This conditional delay control will not impact the hardware process but use extra system resource.
This commit improves control of ground switches in the AAD logic.
Signed-off-by: David Rau david.rau.zg@renesas.com
sound/soc/codecs/da7219-aad.c | 42 ++++++++++++++++++++++++++++++----- sound/soc/codecs/da7219-aad.h | 1 + 2 files changed, 37 insertions(+), 6 deletions(-)
diff --git a/sound/soc/codecs/da7219-aad.c b/sound/soc/codecs/da7219-aad.c index bba73c44c219..08200ec259f9 100644 --- a/sound/soc/codecs/da7219-aad.c +++ b/sound/soc/codecs/da7219-aad.c @@ -352,9 +352,14 @@ static irqreturn_t da7219_aad_irq_thread(int irq, void *data) struct snd_soc_dapm_context *dapm = snd_soc_component_get_dapm(component); struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component); u8 events[DA7219_AAD_IRQ_REG_MAX];
- u8 statusa;
u8 statusa, srm_st; int i, report = 0, mask = 0;
srm_st = snd_soc_component_read(component, DA7219_PLL_SRM_STS) & DA7219_PLL_SRM_STS_MCLK;
msleep(da7219_aad->gnd_switch_delay * ((srm_st == 0x0) ? 2 : 1) -
+4);
Ever since this patch was applied to ChromeOS, we have observed hung task crashes in da7219_aad_irq_thread().
Is it really appropriate to sleep up to (256 * 2) - 4 = 508 ms in an interrupt handler ?
Thanks, Guenter
- /* Enable ground switch */
- snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01);
- /* Read current IRQ events */ regmap_bulk_read(da7219->regmap, DA7219_ACCDET_IRQ_EVENT_A, events, DA7219_AAD_IRQ_REG_MAX); @@ -454,8 +459,8 @@ static
irqreturn_t da7219_aad_irq_thread(int irq, void *data) snd_soc_dapm_disable_pin(dapm, "Mic Bias"); snd_soc_dapm_sync(dapm);
/* Enable ground switch */
snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01);
/* Disable ground switch */
} }snd_soc_component_update_bits(component, 0xFB, 0x01, 0x00);
@@ -831,6 +836,32 @@ static void da7219_aad_handle_pdata(struct snd_soc_component *component) } }
+static void da7219_aad_handle_gnd_switch_time(struct +snd_soc_component *component) {
- struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component);
- struct da7219_aad_priv *da7219_aad = da7219->aad;
- u8 jack_det;
- jack_det = snd_soc_component_read(component, DA7219_ACCDET_CONFIG_2)
& DA7219_JACK_DETECT_RATE_MASK;
- switch (jack_det) {
- case 0x00:
da7219_aad->gnd_switch_delay = 32;
break;
- case 0x10:
da7219_aad->gnd_switch_delay = 64;
break;
- case 0x20:
da7219_aad->gnd_switch_delay = 128;
break;
- case 0x30:
da7219_aad->gnd_switch_delay = 256;
break;
- default:
da7219_aad->gnd_switch_delay = 32;
break;
- }
+}
/*
- Suspend/Resume
@@ -908,9 +939,6 @@ int da7219_aad_init(struct snd_soc_component *component) snd_soc_component_update_bits(component, DA7219_ACCDET_CONFIG_1, DA7219_BUTTON_CONFIG_MASK, 0);
- /* Enable ground switch */
- snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01);
- INIT_WORK(&da7219_aad->btn_det_work, da7219_aad_btn_det_work); INIT_WORK(&da7219_aad->hptest_work, da7219_aad_hptest_work);
@@ -928,6 +956,8 @@ int da7219_aad_init(struct snd_soc_component *component) regmap_bulk_write(da7219->regmap, DA7219_ACCDET_IRQ_MASK_A, &mask, DA7219_AAD_IRQ_REG_MAX);
- da7219_aad_handle_gnd_switch_time(component);
- return 0;
} EXPORT_SYMBOL_GPL(da7219_aad_init); diff --git a/sound/soc/codecs/da7219-aad.h b/sound/soc/codecs/da7219-aad.h index f48a12012ef3..21fdf53095cc 100644 --- a/sound/soc/codecs/da7219-aad.h +++ b/sound/soc/codecs/da7219-aad.h @@ -187,6 +187,7 @@ enum da7219_aad_event_regs { struct da7219_aad_priv { struct snd_soc_component *component; int irq;
int gnd_switch_delay;
u8 micbias_pulse_lvl; u32 micbias_pulse_time;
-- 2.17.1
On 1/30/23 19:58, David Rau wrote:
Thanks for the kind feedback. Would you please let me know what kinds of environment such error appears you ever meet? Ex: da7219_aad->gnd_switch_delay = ?
We are seeing the problem on various Chromebooks.
Never mind, though. I really don't have time to keep arguing about this. I would have assumed that it is obvious that a long msleep() in an interrupt handler is not appropriate, but obviously I was wrong. I'll see if I can implement a downstream fix.
Guenter
-----Original Message----- From: Guenter Roeck groeck7@gmail.com On Behalf Of Guenter Roeck Sent: Friday, January 20, 2023 00:12 To: David Rau david.rau.zg@renesas.com Cc: David Rau we730128@gmail.com; perex@perex.cz; lgirdwood@gmail.com; broonie@kernel.org; tiwai@suse.com; support.opensource@diasemi.com; alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music
On Thu, Jan 19, 2023 at 11:02:25AM +0000, David Rau wrote:
Would you please provide me the related error messages when hung task crashes in da7219_aad_irq_thread()? BTW, "gnd_switch_delay = 256" is an unusual use case of the longer jack detection latency.
Here is a typical traceback.
<3>[ 246.919057] INFO: task irq/105-da7219-:2854 blocked for more than 122 seconds. <3>[ 246.919065] Not tainted 5.10.159-20927-g317f62e2494d #1 <3>[ 246.919068] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. <6>[ $PHONE_NUMBER] task:irq/105-da7219- state:D stack: 0 pid: 2854 ppid: 2 flags:0x00004080 <6>[ 246.919075] Call Trace: <6>[ 246.919084] __schedule+0x3b0/0xdaf <6>[ 246.919090] schedule+0x44/0xa8 <6>[ 246.919093] schedule_timeout+0xb6/0x290 <6>[ 246.919098] ? run_local_timers+0x4e/0x4e <6>[ 246.919102] msleep+0x2c/0x38 <6>[ 246.919108] da7219_aad_irq_thread+0x66/0x2b0 [snd_soc_da7219 cd5a76eef6e777074216b9d61f7918f7561bf7ec] <6>[ 246.919113] ? irq_forced_thread_fn+0x5f/0x5f <6>[ 246.919116] irq_thread_fn+0x22/0x4d <6>[ 246.919120] irq_thread+0x120/0x19d <6>[ 246.919123] ? irq_thread_fn+0x4d/0x4d <6>[ 246.919128] kthread+0x142/0x153 <6>[ 246.919132] ? irq_forced_secondary_handler+0x21/0x21 <6>[ 246.919135] ? kthread_blkcg+0x31/0x31 <6>[ 246.919139] ret_from_fork+0x1f/0x30
The underlying question is if it really appropriate to have an msleep() of any kind in an interrupt handler. If this is about debouncing a signal, it should be handled with a delayed timer.
Guenter
-----Original Message----- From: Guenter Roeck groeck7@gmail.com On Behalf Of Guenter Roeck Sent: Wednesday, January 18, 2023 03:57 To: David Rau we730128@gmail.com Cc: perex@perex.cz; lgirdwood@gmail.com; broonie@kernel.org; tiwai@suse.com; support.opensource@diasemi.com; alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org; David Rau david.rau.zg@renesas.com Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music
On Mon, Nov 21, 2022 at 05:07:44AM +0000, David Rau wrote:
The OMTP pin define headsets can be mis-detected as line out instead of OMTP, causing obvious issues with audio quality. This patch is to put increased resistances within the device at a suitable point.
To solve this issue better, the new mechanism setup ground switches with conditional delay control and these allow for more stabile detection process to operate as intended. This conditional delay control will not impact the hardware process but use extra system resource.
This commit improves control of ground switches in the AAD logic.
Signed-off-by: David Rau david.rau.zg@renesas.com
sound/soc/codecs/da7219-aad.c | 42 ++++++++++++++++++++++++++++++----- sound/soc/codecs/da7219-aad.h | 1 + 2 files changed, 37 insertions(+), 6 deletions(-)
diff --git a/sound/soc/codecs/da7219-aad.c b/sound/soc/codecs/da7219-aad.c index bba73c44c219..08200ec259f9 100644 --- a/sound/soc/codecs/da7219-aad.c +++ b/sound/soc/codecs/da7219-aad.c @@ -352,9 +352,14 @@ static irqreturn_t da7219_aad_irq_thread(int irq, void *data) struct snd_soc_dapm_context *dapm = snd_soc_component_get_dapm(component); struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component); u8 events[DA7219_AAD_IRQ_REG_MAX];
- u8 statusa;
u8 statusa, srm_st; int i, report = 0, mask = 0;
srm_st = snd_soc_component_read(component, DA7219_PLL_SRM_STS) & DA7219_PLL_SRM_STS_MCLK;
msleep(da7219_aad->gnd_switch_delay * ((srm_st == 0x0) ? 2 : 1) -
+4);
Ever since this patch was applied to ChromeOS, we have observed hung task crashes in da7219_aad_irq_thread().
Is it really appropriate to sleep up to (256 * 2) - 4 = 508 ms in an interrupt handler ?
Thanks, Guenter
- /* Enable ground switch */
- snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01);
- /* Read current IRQ events */ regmap_bulk_read(da7219->regmap, DA7219_ACCDET_IRQ_EVENT_A, events, DA7219_AAD_IRQ_REG_MAX); @@ -454,8 +459,8 @@ static
irqreturn_t da7219_aad_irq_thread(int irq, void *data) snd_soc_dapm_disable_pin(dapm, "Mic Bias"); snd_soc_dapm_sync(dapm);
/* Enable ground switch */
snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01);
/* Disable ground switch */
} }snd_soc_component_update_bits(component, 0xFB, 0x01, 0x00);
@@ -831,6 +836,32 @@ static void da7219_aad_handle_pdata(struct snd_soc_component *component) } }
+static void da7219_aad_handle_gnd_switch_time(struct +snd_soc_component *component) {
- struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component);
- struct da7219_aad_priv *da7219_aad = da7219->aad;
- u8 jack_det;
- jack_det = snd_soc_component_read(component, DA7219_ACCDET_CONFIG_2)
& DA7219_JACK_DETECT_RATE_MASK;
- switch (jack_det) {
- case 0x00:
da7219_aad->gnd_switch_delay = 32;
break;
- case 0x10:
da7219_aad->gnd_switch_delay = 64;
break;
- case 0x20:
da7219_aad->gnd_switch_delay = 128;
break;
- case 0x30:
da7219_aad->gnd_switch_delay = 256;
break;
- default:
da7219_aad->gnd_switch_delay = 32;
break;
- }
+}
/*
- Suspend/Resume
@@ -908,9 +939,6 @@ int da7219_aad_init(struct snd_soc_component *component) snd_soc_component_update_bits(component, DA7219_ACCDET_CONFIG_1, DA7219_BUTTON_CONFIG_MASK, 0);
- /* Enable ground switch */
- snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01);
- INIT_WORK(&da7219_aad->btn_det_work, da7219_aad_btn_det_work); INIT_WORK(&da7219_aad->hptest_work, da7219_aad_hptest_work);
@@ -928,6 +956,8 @@ int da7219_aad_init(struct snd_soc_component *component) regmap_bulk_write(da7219->regmap, DA7219_ACCDET_IRQ_MASK_A, &mask, DA7219_AAD_IRQ_REG_MAX);
- da7219_aad_handle_gnd_switch_time(component);
- return 0; } EXPORT_SYMBOL_GPL(da7219_aad_init);
diff --git a/sound/soc/codecs/da7219-aad.h b/sound/soc/codecs/da7219-aad.h index f48a12012ef3..21fdf53095cc 100644 --- a/sound/soc/codecs/da7219-aad.h +++ b/sound/soc/codecs/da7219-aad.h @@ -187,6 +187,7 @@ enum da7219_aad_event_regs { struct da7219_aad_priv { struct snd_soc_component *component; int irq;
int gnd_switch_delay;
u8 micbias_pulse_lvl; u32 micbias_pulse_time;
-- 2.17.1
On Mon, Jan 30, 2023 at 10:16:06PM -0800, Guenter Roeck wrote:
On 1/30/23 19:58, David Rau wrote:
Thanks for the kind feedback. Would you please let me know what kinds of environment such error appears you ever meet? Ex: da7219_aad->gnd_switch_delay = ?
We are seeing the problem on various Chromebooks.
Never mind, though. I really don't have time to keep arguing about this. I would have assumed that it is obvious that a long msleep() in an interrupt handler is not appropriate, but obviously I was wrong.
This is a threaded interrupt handler so it's a bit less clear that it's meaningfully different to just disabling the interrupt for debounce or whatever. Not to say it's ideal.
I'll see if I can implement a downstream fix.
If you implement something I don't see a reason not to post it upstream.
On Tue, Jan 31, 2023 at 12:08:53PM +0000, Mark Brown wrote:
On Mon, Jan 30, 2023 at 10:16:06PM -0800, Guenter Roeck wrote:
On 1/30/23 19:58, David Rau wrote:
Thanks for the kind feedback. Would you please let me know what kinds of environment such error appears you ever meet? Ex: da7219_aad->gnd_switch_delay = ?
We are seeing the problem on various Chromebooks.
Never mind, though. I really don't have time to keep arguing about this. I would have assumed that it is obvious that a long msleep() in an interrupt handler is not appropriate, but obviously I was wrong.
This is a threaded interrupt handler so it's a bit less clear that it's meaningfully different to just disabling the interrupt for debounce or whatever. Not to say it's ideal.
I'll see if I can implement a downstream fix.
If you implement something I don't see a reason not to post it upstream.
I had a look into the code, and concluded that it is too complex for anyone who doesn't know it to find a proper fix. For example, for an outsider it is not conceivable (or explained) why the ground switch is enabled only to be disabled immediately afterwards if a jack was removed.
This is now the top crash reason on affected Chromebooks (so far I identified Asus C424, HP SeaStar, and HP StingRay) with this patch applied. I am inclined to revert it from all ChromeOS kernel branches. At least for us the cure for the problem is much worse than the problem itself.
Guenter
On Thu, Feb 02, 2023 at 07:51:01AM -0800, Guenter Roeck wrote:
On Tue, Jan 31, 2023 at 12:08:53PM +0000, Mark Brown wrote:
On Mon, Jan 30, 2023 at 10:16:06PM -0800, Guenter Roeck wrote:
I'll see if I can implement a downstream fix.
If you implement something I don't see a reason not to post it upstream.
I had a look into the code, and concluded that it is too complex for anyone who doesn't know it to find a proper fix. For example, for an outsider it
It's definitely unclear, there's a datasheet at [1] which does appear to explicitly call for a 512ms delay though (see figure 20 on page 50). It does look like it should only be applied in the case where an inserted jack is detected (ie, when identifying an accessory or button press) and not when removal is detected though.
is not conceivable (or explained) why the ground switch is enabled only to be disabled immediately afterwards if a jack was removed.
It smells like there's a power benefit to leaving it disabled when unplugged (which seems plausible), and possibly like the detection is more stable with the ground switch enabled. The ground switch is not documented AFAICT (it's in register 0xfb which isn't named and doesn't appear to appear in the datsheet from a quick search). The code is leaving the switch enabled so long as an accessory is plugged.
This is now the top crash reason on affected Chromebooks (so far I identified Asus C424, HP SeaStar, and HP StingRay) with this patch applied. I am inclined to revert it from all ChromeOS kernel branches. At least for us the cure for the problem is much worse than the problem itself.
Are you saying this is actually crashing, or just that you're getting warnings about threads being blocked for too long (that was what was posted earlier in the thread)? The only things I can see that look like they have the potential to actually lock up are the cancel_work_sync() calls but they were unchanged and the backtrace you showed was showing the thread in the msleep(). My guess would be that you've got systems where there are very frequent jack detection events (potentiallly with broken accessories, or possibly due to the ground switch putting things into the wrong priority) and that the interrupt is firing again as soon as the thread unmasks the primary interrupt which means it never actually stops running.
It's possible that reordering things so that the delay is only applied if DA7219_JACK_INSERTION_STS_MASK is set would help, that'd need some motion of the interrupt acking as well. That's probably a good idea in general, it's what the datasheet seems to call for and would lead to prompter removal detection. However if the issue is systems with broken accessories constantly firing spurious button events they'd still be seeing the delay.
My other guess would be that moving the delay that's been added to a delayed work would avoid the warnings, though you might want to manually keep the physical interrupt disabled while that's running which is fun. Possibly also tuning down the delay given that as you say 500ms is rather a long potential delay even in the context of jack debounces, though if it is bad accessories then there's probably a bit of luck involved in the original code not triggering issues and any debounce is likely to cause fun, and like I say the datasheet does seem to say that this is the appropriate delay.
You'd end up with something along the lines of
disable_irq(); schedule_delayed_work(delay, current_irq_code);
in the IRQ handler then call enable_irq() on the way out of the new delayed_work. That would keep the same flow but not look like the task is running which should avoid setting off the hung task alarm.
[1] https://www.renesas.com/us/en/document/dst/da7219-datasheet?r=1563341
On 2/2/23 09:04, Mark Brown wrote:
On Thu, Feb 02, 2023 at 07:51:01AM -0800, Guenter Roeck wrote:
On Tue, Jan 31, 2023 at 12:08:53PM +0000, Mark Brown wrote:
On Mon, Jan 30, 2023 at 10:16:06PM -0800, Guenter Roeck wrote:
I'll see if I can implement a downstream fix.
If you implement something I don't see a reason not to post it upstream.
I had a look into the code, and concluded that it is too complex for anyone who doesn't know it to find a proper fix. For example, for an outsider it
It's definitely unclear, there's a datasheet at [1] which does appear to explicitly call for a 512ms delay though (see figure 20 on page 50). It does look like it should only be applied in the case where an inserted jack is detected (ie, when identifying an accessory or button press) and not when removal is detected though.
The datasheet doesn't really suggest that a delay shall be applied using msleep (ie in the code). The chip presumably debounces internally (see jackdet_debounce and jackdet_rem_deb), and there is also jack_detect_rate to configure the detection rate. The table seems to suggest (to me) that there is an e_jack_insertion event, which would then be followed 64-512 ms later with an e_jack_detect_complete event.
Whatever is done in software is on top of that, or at least that is my understanding, and not explained by anything in the datasheet.
Given that the chip itself supports debouncing internally, it is not clear to me what the delay is actually supposed to accomplish. Soft debounce on top of chip debounce ? I don't see that explained anywhere, though of course I might be missing it.
is not conceivable (or explained) why the ground switch is enabled only to be disabled immediately afterwards if a jack was removed.
It smells like there's a power benefit to leaving it disabled when unplugged (which seems plausible), and possibly like the detection is more stable with the ground switch enabled. The ground switch is not documented AFAICT (it's in register 0xfb which isn't named and doesn't appear to appear in the datsheet from a quick search). The code is leaving the switch enabled so long as an accessory is plugged.
I understand. What I don't understand is that it is always enabled in the interrupt handler, no matter if a jack was inserted or not, only to be disabled immediately if the jack was disabled or after insertion detection work is complete.
Overall it is not clear what the impact of enabling ground switch actually is. What is really odd is that the original code only enabled ground switch once during initialization and disabled it either after a disconnect or after insertion detection was complete, but never re-enabled it. Now it is briefly enabled in the interrupt handler, but only after sleeping.
This is now the top crash reason on affected Chromebooks (so far I identified Asus C424, HP SeaStar, and HP StingRay) with this patch applied. I am inclined to revert it from all ChromeOS kernel branches. At least for us the cure for the problem is much worse than the problem itself.
Are you saying this is actually crashing, or just that you're getting warnings about threads being blocked for too long (that was what was posted earlier in the thread)? The only things I can see that look like
ChromeOS is configured to crash after stalled threads are detected (ie after 120 seconds), so this is actually causing crashes.
they have the potential to actually lock up are the cancel_work_sync() calls but they were unchanged and the backtrace you showed was showing the thread in the msleep(). My guess would be that you've got systems where there are very frequent jack detection events (potentiallly with broken accessories, or possibly due to the ground switch putting things into the wrong priority) and that the interrupt is firing again as soon as the thread unmasks the primary interrupt which means it never actually stops running.
That is what I strongly suspect is happening. I don't know why exactly the interrupt is firing continuously, but the hang is always in msleep(). One possibility might be that the event is actually a disconnect event, and that enabling and immediately disabling the ground switch causes another interrupt, which is then handled immediately, causing the hang.
It's possible that reordering things so that the delay is only applied if DA7219_JACK_INSERTION_STS_MASK is set would help, that'd need some motion of the interrupt acking as well. That's probably a good idea in general, it's what the datasheet seems to call for and would lead to prompter removal detection. However if the issue is systems with broken accessories constantly firing spurious button events they'd still be seeing the delay.
My other guess would be that moving the delay that's been added to a delayed work would avoid the warnings, though you might want to manually keep the physical interrupt disabled while that's running which is fun. Possibly also tuning down the delay given that as you say 500ms is rather a long potential delay even in the context of jack debounces, though if it is bad accessories then there's probably a bit of luck involved in the original code not triggering issues and any debounce is likely to cause fun, and like I say the datasheet does seem to say that this is the appropriate delay.
You'd end up with something along the lines of
disable_irq(); schedule_delayed_work(delay, current_irq_code);
I am not sure if that would fix anything. The current code sleeps, then enables the ground switch and does the rest of the detection. I'd somewhat understand the code if it would enable the ground switch after an "insertion detected" interrupt, then wait for some amount of time and handle the rest of the detection after waiting (even though that should really be handled by the "detection complete" interrupt). But that isn't what it does. If we were to implement the above, I suspect the result would be that the interrupt still happens all the time, and the only difference would be that it would be "silenced" while the delayed work is waiting to be scheduled. That doesn't really fix the problem, it only works around it. But, sure, it would be much better than the current situation.
My "wild shot" fix would be to enable the ground switch after an insertion event and to drop the software sleep entirely.
However, it is really impossible to know what the delay is for in the first place. Looking into the code further, the sleep time actually matches the configured jack detection rate. I have no idea why it would make sense to wait for a detection cycle after an event, then enable the ground switch and actually handle the event (which by then probably reports that jack detection is complete after an insertion). I really don't understand the logic behind that.
Guenter
in the IRQ handler then call enable_irq() on the way out of the new delayed_work. That would keep the same flow but not look like the task is running which should avoid setting off the hung task alarm.
[1] https://www.renesas.com/us/en/document/dst/da7219-datasheet?r=1563341
On Thu, Feb 02, 2023 at 10:39:51AM -0800, Guenter Roeck wrote:
On 2/2/23 09:04, Mark Brown wrote:
It's definitely unclear, there's a datasheet at [1] which does appear to explicitly call for a 512ms delay though (see figure 20 on page 50). It does look like it should only be applied in the case where an inserted jack is detected (ie, when identifying an accessory or button press) and not when removal is detected though.
The datasheet doesn't really suggest that a delay shall be applied using msleep (ie in the code). The chip presumably debounces internally (see
Obviously it doesn't call for an explicit implementation in the host.
jackdet_debounce and jackdet_rem_deb), and there is also jack_detect_rate to configure the detection rate. The table seems to suggest (to me) that there is an e_jack_insertion event, which would then be followed 64-512 ms later with an e_jack_detect_complete event.
Right, I think what I was looking at was that in combination of the fact that there's a *much* longer window before the host clears the interrupt shown on the first JACK_IN. It could be spurious and possibly just due to the host type check thing in the diagram but it smells real bad, like the hardware state machine has robustness issues or something. The diagram currently doesn't quite correspond to the code since we have the delay applied unconditionally, and there's that undocumented register for the ground switch being managed.
Whatever is done in software is on top of that, or at least that is my understanding, and not explained by anything in the datasheet.
Given that the chip itself supports debouncing internally, it is not clear to me what the delay is actually supposed to accomplish. Soft debounce on top of chip debounce ? I don't see that explained anywhere, though of course I might be missing it.
That's what it looks like it's trying to accomplish but as you say it's not exactly explicit. I *suspect* it's trying to debounce in more cases than is needed.
is not conceivable (or explained) why the ground switch is enabled only to be disabled immediately afterwards if a jack was removed.
It smells like there's a power benefit to leaving it disabled when unplugged (which seems plausible), and possibly like the detection is more stable with the ground switch enabled. The ground switch is not documented AFAICT (it's in register 0xfb which isn't named and doesn't appear to appear in the datsheet from a quick search). The code is leaving the switch enabled so long as an accessory is plugged.
I understand. What I don't understand is that it is always enabled in the interrupt handler, no matter if a jack was inserted or not, only to be disabled immediately if the jack was disabled or after insertion detection work is complete.
My guess was that it was making the detection more stable, it's surprising that it'd help with simple presence detection though.
Are you saying this is actually crashing, or just that you're getting warnings about threads being blocked for too long (that was what was posted earlier in the thread)? The only things I can see that look like
ChromeOS is configured to crash after stalled threads are detected (ie after 120 seconds), so this is actually causing crashes.
Ah, that's much more serious than I'd understood from the log you posted.
they have the potential to actually lock up are the cancel_work_sync() calls but they were unchanged and the backtrace you showed was showing the thread in the msleep(). My guess would be that you've got systems where there are very frequent jack detection events (potentiallly with broken accessories, or possibly due to the ground switch putting things into the wrong priority) and that the interrupt is firing again as soon as the thread unmasks the primary interrupt which means it never actually stops running.
That is what I strongly suspect is happening. I don't know why exactly the interrupt is firing continuously, but the hang is always in msleep(). One possibility might be that the event is actually a disconnect event, and that enabling and immediately disabling the ground switch causes another interrupt, which is then handled immediately, causing the hang.
Could be. I'd be willing to guess that it's not just one event but rather a stream of events of some kind. Possibly if it's due to the ground switch it's spuriously detecting a constant stream of button presses for the affected systems, which don't produce any UI visible result which would cause users to pull the accessory for whatever reason? Whatever's going on I bet it's broken accessories triggering it.
My other guess would be that moving the delay that's been added to a delayed work would avoid the warnings, though you might want to manually keep the physical interrupt disabled while that's running which is fun.
I am not sure if that would fix anything. The current code sleeps, then enables the ground switch and does the rest of the detection. I'd somewhat understand the code if it would enable the ground switch after an "insertion detected" interrupt, then wait for some amount of time and handle the rest of the detection after waiting (even though that should really be handled by the "detection complete" interrupt). But that isn't what it does. If we were to implement the above, I suspect the result would be that the interrupt still happens all the time, and the only difference would be that it would be "silenced" while the delayed work is waiting to be scheduled. That doesn't really fix the problem, it only works around it. But, sure, it would be much better than the current situation.
Yes, exactly - I was just looking at a refactoring in the code which would mitigate the immediate problem while keeping the current partially documented algorithm in place.
My "wild shot" fix would be to enable the ground switch after an insertion event and to drop the software sleep entirely.
That's entirely plausible to me, either together or possibly just one of those is actually needed. Do you want to send a patch?
However, it is really impossible to know what the delay is for in the first place. Looking into the code further, the sleep time actually matches the configured jack detection rate. I have no idea why it would make sense to wait for a detection cycle after an event, then enable the ground switch and actually handle the event (which by then probably reports that jack detection is complete after an insertion). I really don't understand the logic behind that.
This all smells like there's either a race condition in a state machine somewhere or the button detection needs a bit of help (though if it's the latter then it'd be conditional on a microphone having been detected).
Hopefully David will get back to us with some explanation and ideally fix.
On Thu, Feb 02, 2023 at 07:36:42PM +0000, Mark Brown wrote:
they have the potential to actually lock up are the cancel_work_sync() calls but they were unchanged and the backtrace you showed was showing the thread in the msleep(). My guess would be that you've got systems where there are very frequent jack detection events (potentiallly with broken accessories, or possibly due to the ground switch putting things into the wrong priority) and that the interrupt is firing again as soon as the thread unmasks the primary interrupt which means it never actually stops running.
That is what I strongly suspect is happening. I don't know why exactly the interrupt is firing continuously, but the hang is always in msleep(). One possibility might be that the event is actually a disconnect event, and that enabling and immediately disabling the ground switch causes another interrupt, which is then handled immediately, causing the hang.
Could be. I'd be willing to guess that it's not just one event but rather a stream of events of some kind. Possibly if it's due to the ground switch it's spuriously detecting a constant stream of button presses for the affected systems, which don't produce any UI visible result which would cause users to pull the accessory for whatever reason? Whatever's going on I bet it's broken accessories triggering it.
That seems to be unlikely. The average number of crashes per affected system is 1.92, which points to something the users are doing and less to a broken accessory. We do observe crashes due to broken accessories, but in those cases the number of crashes per system tends to be much higher.
Anyway, below is a patch with a possible fix. Of course, I still don't know what the patch originally tried to fix, so it might not do much if anything good. For example, it keeps button detection in the interrupt handler to avoid dropping button events, so if spurious button detection as you suspected is indeed (part of) the problem we might still see a large number of interrupts.
Guenter
---
From 81dbe47c94d8d97ce7919a5ec4d4269c55b56ae6 Mon Sep 17 00:00:00 2001
From: Guenter Roeck linux@roeck-us.net Date: Thu, 2 Feb 2023 16:09:14 -0800 Subject: [RFC] ASoC: da7219: Prevent hung task errors in interrupt handler
Commit 969357ec94e6 ("ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music") tried to improve pole orientation on certain headsets. Part of the change was to add a long sleep in the beginning of the interrupt handler, followed by enabling the ground switch
Unfortunately, this results in hung tasks in the threaded interrupt handler.
INFO: task irq/105-da7219-:2556 blocked for more than 122 seconds. Not tainted 5.10.159-20945-g4390861bfc33 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:irq/105-da7219- state:D stack: 0 pid: 2556 ppid: 2 flags:0x00004080 Call Trace: __schedule+0x3b0/0xddb schedule+0x44/0xa8 schedule_timeout+0xae/0x241 ? run_local_timers+0x4e/0x4e msleep+0x2c/0x38 da7219_aad_irq_thread+0x66/0x2b0 irq_thread_fn+0x22/0x4d irq_thread+0x131/0x1cb ? irq_forced_thread_fn+0x5f/0x5f ? irq_thread_fn+0x4d/0x4d kthread+0x142/0x153 ? synchronize_irq+0xe0/0xe0 ? kthread_blkcg+0x31/0x31 ret_from_fork+0x1f/0x30
Solve the problem by enabling the ground switch immediately and only after an insertion has been detected. Delay pole orientation detection until after the chip reports that detection is complete plus an additional time depending on the chip configuration. Do this by implementing ground switch detection in a delayed worker.
Fixes: 969357ec94e6 ("ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music") Cc: David Rau david.rau.zg@renesas.com Signed-off-by: Guenter Roeck linux@roeck-us.net --- sound/soc/codecs/da7219-aad.c | 156 ++++++++++++++++++++++------------ sound/soc/codecs/da7219-aad.h | 3 + 2 files changed, 105 insertions(+), 54 deletions(-)
diff --git a/sound/soc/codecs/da7219-aad.c b/sound/soc/codecs/da7219-aad.c index c55b033d89da..47685c996bda 100644 --- a/sound/soc/codecs/da7219-aad.c +++ b/sound/soc/codecs/da7219-aad.c @@ -8,6 +8,7 @@ */
#include <linux/module.h> +#include <linux/mutex.h> #include <linux/platform_device.h> #include <linux/clk.h> #include <linux/i2c.h> @@ -339,6 +340,82 @@ static void da7219_aad_hptest_work(struct work_struct *work) SND_JACK_HEADSET | SND_JACK_LINEOUT); }
+static void da7219_aad_handle_removal(struct da7219_aad_priv *da7219_aad) +{ + struct snd_soc_component *component = da7219_aad->component; + struct snd_soc_dapm_context *dapm = snd_soc_component_get_dapm(component); + struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component); + + da7219_aad->jack_inserted = false; + + /* Cancel any pending work */ + cancel_work_sync(&da7219_aad->btn_det_work); + cancel_work_sync(&da7219_aad->hptest_work); + + /* Un-drive headphones/lineout */ + snd_soc_component_update_bits(component, DA7219_HP_R_CTRL, + DA7219_HP_R_AMP_OE_MASK, 0); + snd_soc_component_update_bits(component, DA7219_HP_L_CTRL, + DA7219_HP_L_AMP_OE_MASK, 0); + + /* Ensure button detection disabled */ + snd_soc_component_update_bits(component, DA7219_ACCDET_CONFIG_1, + DA7219_BUTTON_CONFIG_MASK, 0); + + da7219->micbias_on_event = false; + + /* Disable mic bias */ + snd_soc_dapm_disable_pin(dapm, "Mic Bias"); + snd_soc_dapm_sync(dapm); + + /* Disable ground switch */ + snd_soc_component_update_bits(component, 0xFB, 0x01, 0x00); + + snd_soc_jack_report(da7219_aad->jack, 0, DA7219_AAD_REPORT_ALL_MASK); +} + +static void da7219_aad_insertion_work(struct work_struct *work) +{ + struct da7219_aad_priv *da7219_aad = + container_of(work, struct da7219_aad_priv, hptest_work); + struct snd_soc_component *component = da7219_aad->component; + u8 statusa; + + mutex_lock(&da7219_aad->insertion_mutex); + + if (!da7219_aad->jack_inserted) + goto unlock; + + /* Read status register for jack insertion & type status */ + statusa = snd_soc_component_read(component, DA7219_ACCDET_STATUS_A); + if (!(statusa & DA7219_JACK_INSERTION_STS_MASK)) { + da7219_aad_handle_removal(da7219_aad); + goto unlock; + } + + /* + * If 4-pole, then enable button detection, else perform + * HP impedance test to determine output type to report. + * + * We schedule work here as the tasks themselves can + * take time to complete, and in particular for hptest + * we want to be able to check if the jack was removed + * during the procedure as this will invalidate the + * result. By doing this as work, the IRQ thread can + * handle a removal, and we can check at the end of + * hptest if we have a valid result or not. + */ + if (statusa & DA7219_JACK_TYPE_STS_MASK) { + schedule_work(&da7219_aad->btn_det_work); + snd_soc_jack_report(da7219_aad->jack, SND_JACK_HEADSET, + SND_JACK_HEADSET | SND_JACK_LINEOUT); + } else { + schedule_work(&da7219_aad->hptest_work); + } + +unlock: + mutex_unlock(&da7219_aad->insertion_mutex); +}
/* * IRQ @@ -348,23 +425,21 @@ static irqreturn_t da7219_aad_irq_thread(int irq, void *data) { struct da7219_aad_priv *da7219_aad = data; struct snd_soc_component *component = da7219_aad->component; - struct snd_soc_dapm_context *dapm = snd_soc_component_get_dapm(component); struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component); u8 events[DA7219_AAD_IRQ_REG_MAX]; - u8 statusa, srm_st; + u8 statusa; int i, report = 0, mask = 0;
- srm_st = snd_soc_component_read(component, DA7219_PLL_SRM_STS) & DA7219_PLL_SRM_STS_MCLK; - msleep(da7219_aad->gnd_switch_delay * ((srm_st == 0x0) ? 2 : 1) - 4); - /* Enable ground switch */ - snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01); + mutex_lock(&da7219_aad->insertion_mutex);
/* Read current IRQ events */ regmap_bulk_read(da7219->regmap, DA7219_ACCDET_IRQ_EVENT_A, events, DA7219_AAD_IRQ_REG_MAX);
- if (!events[DA7219_AAD_IRQ_REG_A] && !events[DA7219_AAD_IRQ_REG_B]) + if (!events[DA7219_AAD_IRQ_REG_A] && !events[DA7219_AAD_IRQ_REG_B]) { + mutex_unlock(&da7219_aad->insertion_mutex); return IRQ_NONE; + }
/* Read status register for jack insertion & type status */ statusa = snd_soc_component_read(component, DA7219_ACCDET_STATUS_A); @@ -378,36 +453,29 @@ static irqreturn_t da7219_aad_irq_thread(int irq, void *data) statusa);
if (statusa & DA7219_JACK_INSERTION_STS_MASK) { + u8 srm_st; + + srm_st = snd_soc_component_read(component, DA7219_PLL_SRM_STS) & + DA7219_PLL_SRM_STS_MCLK; + /* Jack Insertion */ if (events[DA7219_AAD_IRQ_REG_A] & DA7219_E_JACK_INSERTED_MASK) { report |= SND_JACK_MECHANICAL; mask |= SND_JACK_MECHANICAL; da7219_aad->jack_inserted = true; + + /* Enable ground switch */ + snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01); }
/* Jack type detection */ if (events[DA7219_AAD_IRQ_REG_A] & DA7219_E_JACK_DETECT_COMPLETE_MASK) { - /* - * If 4-pole, then enable button detection, else perform - * HP impedance test to determine output type to report. - * - * We schedule work here as the tasks themselves can - * take time to complete, and in particular for hptest - * we want to be able to check if the jack was removed - * during the procedure as this will invalidate the - * result. By doing this as work, the IRQ thread can - * handle a removal, and we can check at the end of - * hptest if we have a valid result or not. - */ - if (statusa & DA7219_JACK_TYPE_STS_MASK) { - report |= SND_JACK_HEADSET; - mask |= SND_JACK_HEADSET | SND_JACK_LINEOUT; - schedule_work(&da7219_aad->btn_det_work); - } else { - schedule_work(&da7219_aad->hptest_work); - } + int delay = da7219_aad->gnd_switch_delay * + ((srm_st == 0x0) ? 2 : 1) - 4; + + schedule_delayed_work(&da7219_aad->insertion_work, delay); }
/* Button support for 4-pole jack */ @@ -431,40 +499,16 @@ static irqreturn_t da7219_aad_irq_thread(int irq, void *data) } } } + snd_soc_jack_report(da7219_aad->jack, report, mask); } else { /* Jack removal */ if (events[DA7219_AAD_IRQ_REG_A] & DA7219_E_JACK_REMOVED_MASK) { - report = 0; - mask |= DA7219_AAD_REPORT_ALL_MASK; - da7219_aad->jack_inserted = false; - - /* Cancel any pending work */ - cancel_work_sync(&da7219_aad->btn_det_work); - cancel_work_sync(&da7219_aad->hptest_work); - - /* Un-drive headphones/lineout */ - snd_soc_component_update_bits(component, DA7219_HP_R_CTRL, - DA7219_HP_R_AMP_OE_MASK, 0); - snd_soc_component_update_bits(component, DA7219_HP_L_CTRL, - DA7219_HP_L_AMP_OE_MASK, 0); - - /* Ensure button detection disabled */ - snd_soc_component_update_bits(component, DA7219_ACCDET_CONFIG_1, - DA7219_BUTTON_CONFIG_MASK, 0); - - da7219->micbias_on_event = false; - - /* Disable mic bias */ - snd_soc_dapm_disable_pin(dapm, "Mic Bias"); - snd_soc_dapm_sync(dapm); - - /* Disable ground switch */ - snd_soc_component_update_bits(component, 0xFB, 0x01, 0x00); + cancel_delayed_work(&da7219_aad->insertion_work); + da7219_aad_handle_removal(da7219_aad); } }
- snd_soc_jack_report(da7219_aad->jack, report, mask); - + mutex_unlock(&da7219_aad->insertion_mutex); return IRQ_HANDLED; }
@@ -938,6 +982,9 @@ int da7219_aad_init(struct snd_soc_component *component) snd_soc_component_update_bits(component, DA7219_ACCDET_CONFIG_1, DA7219_BUTTON_CONFIG_MASK, 0);
+ mutex_init(&da7219_aad->insertion_mutex); + + INIT_DELAYED_WORK(&da7219_aad->insertion_work, da7219_aad_insertion_work); INIT_WORK(&da7219_aad->btn_det_work, da7219_aad_btn_det_work); INIT_WORK(&da7219_aad->hptest_work, da7219_aad_hptest_work);
@@ -973,6 +1020,7 @@ void da7219_aad_exit(struct snd_soc_component *component)
free_irq(da7219_aad->irq, da7219_aad);
+ cancel_delayed_work_sync(&da7219_aad->insertion_work); cancel_work_sync(&da7219_aad->btn_det_work); cancel_work_sync(&da7219_aad->hptest_work); } diff --git a/sound/soc/codecs/da7219-aad.h b/sound/soc/codecs/da7219-aad.h index 21fdf53095cc..b1b7f8ba45bd 100644 --- a/sound/soc/codecs/da7219-aad.h +++ b/sound/soc/codecs/da7219-aad.h @@ -10,6 +10,7 @@ #ifndef __DA7219_AAD_H #define __DA7219_AAD_H
+#include <linux/mutex.h> #include <linux/timer.h> #include <sound/soc.h> #include <sound/jack.h> @@ -194,6 +195,8 @@ struct da7219_aad_priv {
u8 btn_cfg;
+ struct mutex insertion_mutex; + struct delayed_work insertion_work; struct work_struct btn_det_work; struct work_struct hptest_work;
-----Original Message----- From: Guenter Roeck groeck7@gmail.com On Behalf Of Guenter Roeck Sent: Saturday, February 4, 2023 23:42 To: Mark Brown broonie@kernel.org Cc: David Rau david.rau.zg@renesas.com; perex@perex.cz; lgirdwood@gmail.com; tiwai@suse.com; support.opensource@diasemi.com; alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music
On Thu, Feb 02, 2023 at 07:36:42PM +0000, Mark Brown wrote:
they have the potential to actually lock up are the cancel_work_sync() calls but they were unchanged and the backtrace you showed was showing the thread in the msleep(). My guess would be that you've got systems where there are very frequent jack detection events (potentiallly with broken accessories, or possibly due to the ground switch putting things into the wrong priority) and that the interrupt is firing again as soon as the thread unmasks the primary interrupt which means it never actually stops running.
That is what I strongly suspect is happening. I don't know why exactly the interrupt is firing continuously, but the hang is always in msleep(). One possibility might be that the event is actually a disconnect event, and that enabling and immediately disabling the ground switch causes another interrupt, which is then handled immediately, causing the hang.
Could be. I'd be willing to guess that it's not just one event but rather a stream of events of some kind. Possibly if it's due to the ground switch it's spuriously detecting a constant stream of button presses for the affected systems, which don't produce any UI visible result which would cause users to pull the accessory for whatever reason? Whatever's going on I bet it's broken accessories triggering it.
That seems to be unlikely. The average number of crashes per affected system is 1.92, which points to something the users are doing and less to a broken accessory. We do observe crashes due to broken accessories, but in those cases the number of crashes per system tends to be much > higher.
Anyway, below is a patch with a possible fix. Of course, I still don't know what the patch originally tried to fix, so it might not do much if anything good.
I added the software debouncing before insertion task to ensue the better compatibility of OMTP Jack.
For example, it keeps button detection in the interrupt handler to avoid dropping button events, so if spurious button detection as you suspected is indeed (part of) the problem we might still see a large number of interrupts.
Guenter
Thanks a lot for your big efforts to implement the temporary fix and verifications. Would you please let me know the average number of crashes per affected system if you rollback to the pervious fix? Ref: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/so... https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/so...
David ---
From 81dbe47c94d8d97ce7919a5ec4d4269c55b56ae6 Mon Sep 17 00:00:00 2001
From: Guenter Roeck linux@roeck-us.net Date: Thu, 2 Feb 2023 16:09:14 -0800 Subject: [RFC] ASoC: da7219: Prevent hung task errors in interrupt handler
Commit 969357ec94e6 ("ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music") tried to improve pole orientation on certain headsets. Part of the change was to add a long sleep in the beginning of the interrupt handler, followed by enabling the ground switch
Unfortunately, this results in hung tasks in the threaded interrupt handler.
INFO: task irq/105-da7219-:2556 blocked for more than 122 seconds. Not tainted 5.10.159-20945-g4390861bfc33 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:irq/105-da7219- state:D stack: 0 pid: 2556 ppid: 2 flags:0x00004080 Call Trace: __schedule+0x3b0/0xddb schedule+0x44/0xa8 schedule_timeout+0xae/0x241 ? run_local_timers+0x4e/0x4e msleep+0x2c/0x38 da7219_aad_irq_thread+0x66/0x2b0 irq_thread_fn+0x22/0x4d irq_thread+0x131/0x1cb ? irq_forced_thread_fn+0x5f/0x5f ? irq_thread_fn+0x4d/0x4d kthread+0x142/0x153 ? synchronize_irq+0xe0/0xe0 ? kthread_blkcg+0x31/0x31 ret_from_fork+0x1f/0x30
Solve the problem by enabling the ground switch immediately and only after an insertion has been detected. Delay pole orientation detection until after the chip reports that detection is complete plus an additional time depending on the chip configuration. Do this by implementing ground switch detection in a delayed worker.
Fixes: 969357ec94e6 ("ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music") Cc: David Rau david.rau.zg@renesas.com Signed-off-by: Guenter Roeck linux@roeck-us.net --- sound/soc/codecs/da7219-aad.c | 156 ++++++++++++++++++++++------------ sound/soc/codecs/da7219-aad.h | 3 + 2 files changed, 105 insertions(+), 54 deletions(-)
diff --git a/sound/soc/codecs/da7219-aad.c b/sound/soc/codecs/da7219-aad.c index c55b033d89da..47685c996bda 100644 --- a/sound/soc/codecs/da7219-aad.c +++ b/sound/soc/codecs/da7219-aad.c @@ -8,6 +8,7 @@ */
#include <linux/module.h> +#include <linux/mutex.h> #include <linux/platform_device.h> #include <linux/clk.h> #include <linux/i2c.h> @@ -339,6 +340,82 @@ static void da7219_aad_hptest_work(struct work_struct *work) SND_JACK_HEADSET | SND_JACK_LINEOUT); }
+static void da7219_aad_handle_removal(struct da7219_aad_priv +*da7219_aad) { + struct snd_soc_component *component = da7219_aad->component; + struct snd_soc_dapm_context *dapm = snd_soc_component_get_dapm(component); + struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component); + + da7219_aad->jack_inserted = false; + + /* Cancel any pending work */ + cancel_work_sync(&da7219_aad->btn_det_work); + cancel_work_sync(&da7219_aad->hptest_work); + + /* Un-drive headphones/lineout */ + snd_soc_component_update_bits(component, DA7219_HP_R_CTRL, + DA7219_HP_R_AMP_OE_MASK, 0); + snd_soc_component_update_bits(component, DA7219_HP_L_CTRL, + DA7219_HP_L_AMP_OE_MASK, 0); + + /* Ensure button detection disabled */ + snd_soc_component_update_bits(component, DA7219_ACCDET_CONFIG_1, + DA7219_BUTTON_CONFIG_MASK, 0); + + da7219->micbias_on_event = false; + + /* Disable mic bias */ + snd_soc_dapm_disable_pin(dapm, "Mic Bias"); + snd_soc_dapm_sync(dapm); + + /* Disable ground switch */ + snd_soc_component_update_bits(component, 0xFB, 0x01, 0x00); + + snd_soc_jack_report(da7219_aad->jack, 0, DA7219_AAD_REPORT_ALL_MASK); +} + +static void da7219_aad_insertion_work(struct work_struct *work) { + struct da7219_aad_priv *da7219_aad = + container_of(work, struct da7219_aad_priv, hptest_work); + struct snd_soc_component *component = da7219_aad->component; + u8 statusa; + + mutex_lock(&da7219_aad->insertion_mutex); + + if (!da7219_aad->jack_inserted) + goto unlock; + + /* Read status register for jack insertion & type status */ + statusa = snd_soc_component_read(component, DA7219_ACCDET_STATUS_A); + if (!(statusa & DA7219_JACK_INSERTION_STS_MASK)) { + da7219_aad_handle_removal(da7219_aad); + goto unlock; + } + + /* + * If 4-pole, then enable button detection, else perform + * HP impedance test to determine output type to report. + * + * We schedule work here as the tasks themselves can + * take time to complete, and in particular for hptest + * we want to be able to check if the jack was removed + * during the procedure as this will invalidate the + * result. By doing this as work, the IRQ thread can + * handle a removal, and we can check at the end of + * hptest if we have a valid result or not. + */ + if (statusa & DA7219_JACK_TYPE_STS_MASK) { + schedule_work(&da7219_aad->btn_det_work); + snd_soc_jack_report(da7219_aad->jack, SND_JACK_HEADSET, + SND_JACK_HEADSET | SND_JACK_LINEOUT); + } else { + schedule_work(&da7219_aad->hptest_work); + } + +unlock: + mutex_unlock(&da7219_aad->insertion_mutex); +}
/* * IRQ @@ -348,23 +425,21 @@ static irqreturn_t da7219_aad_irq_thread(int irq, void *data) { struct da7219_aad_priv *da7219_aad = data; struct snd_soc_component *component = da7219_aad->component; - struct snd_soc_dapm_context *dapm = snd_soc_component_get_dapm(component); struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component); u8 events[DA7219_AAD_IRQ_REG_MAX]; - u8 statusa, srm_st; + u8 statusa; int i, report = 0, mask = 0;
- srm_st = snd_soc_component_read(component, DA7219_PLL_SRM_STS) & DA7219_PLL_SRM_STS_MCLK; - msleep(da7219_aad->gnd_switch_delay * ((srm_st == 0x0) ? 2 : 1) - 4); - /* Enable ground switch */ - snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01); + mutex_lock(&da7219_aad->insertion_mutex);
/* Read current IRQ events */ regmap_bulk_read(da7219->regmap, DA7219_ACCDET_IRQ_EVENT_A, events, DA7219_AAD_IRQ_REG_MAX);
- if (!events[DA7219_AAD_IRQ_REG_A] && !events[DA7219_AAD_IRQ_REG_B]) + if (!events[DA7219_AAD_IRQ_REG_A] && !events[DA7219_AAD_IRQ_REG_B]) { + mutex_unlock(&da7219_aad->insertion_mutex); return IRQ_NONE; + }
/* Read status register for jack insertion & type status */ statusa = snd_soc_component_read(component, DA7219_ACCDET_STATUS_A); @@ -378,36 +453,29 @@ static irqreturn_t da7219_aad_irq_thread(int irq, void *data) statusa);
if (statusa & DA7219_JACK_INSERTION_STS_MASK) { + u8 srm_st; + + srm_st = snd_soc_component_read(component, DA7219_PLL_SRM_STS) & + DA7219_PLL_SRM_STS_MCLK; + /* Jack Insertion */ if (events[DA7219_AAD_IRQ_REG_A] & DA7219_E_JACK_INSERTED_MASK) { report |= SND_JACK_MECHANICAL; mask |= SND_JACK_MECHANICAL; da7219_aad->jack_inserted = true; + + /* Enable ground switch */ + snd_soc_component_update_bits(component, 0xFB, 0x01, 0x01); }
/* Jack type detection */ if (events[DA7219_AAD_IRQ_REG_A] & DA7219_E_JACK_DETECT_COMPLETE_MASK) { - /* - * If 4-pole, then enable button detection, else perform - * HP impedance test to determine output type to report. - * - * We schedule work here as the tasks themselves can - * take time to complete, and in particular for hptest - * we want to be able to check if the jack was removed - * during the procedure as this will invalidate the - * result. By doing this as work, the IRQ thread can - * handle a removal, and we can check at the end of - * hptest if we have a valid result or not. - */ - if (statusa & DA7219_JACK_TYPE_STS_MASK) { - report |= SND_JACK_HEADSET; - mask |= SND_JACK_HEADSET | SND_JACK_LINEOUT; - schedule_work(&da7219_aad->btn_det_work); - } else { - schedule_work(&da7219_aad->hptest_work); - } + int delay = da7219_aad->gnd_switch_delay * + ((srm_st == 0x0) ? 2 : 1) - 4; + + schedule_delayed_work(&da7219_aad->insertion_work, delay); }
/* Button support for 4-pole jack */ @@ -431,40 +499,16 @@ static irqreturn_t da7219_aad_irq_thread(int irq, void *data) } } } + snd_soc_jack_report(da7219_aad->jack, report, mask); } else { /* Jack removal */ if (events[DA7219_AAD_IRQ_REG_A] & DA7219_E_JACK_REMOVED_MASK) { - report = 0; - mask |= DA7219_AAD_REPORT_ALL_MASK; - da7219_aad->jack_inserted = false; - - /* Cancel any pending work */ - cancel_work_sync(&da7219_aad->btn_det_work); - cancel_work_sync(&da7219_aad->hptest_work); - - /* Un-drive headphones/lineout */ - snd_soc_component_update_bits(component, DA7219_HP_R_CTRL, - DA7219_HP_R_AMP_OE_MASK, 0); - snd_soc_component_update_bits(component, DA7219_HP_L_CTRL, - DA7219_HP_L_AMP_OE_MASK, 0); - - /* Ensure button detection disabled */ - snd_soc_component_update_bits(component, DA7219_ACCDET_CONFIG_1, - DA7219_BUTTON_CONFIG_MASK, 0); - - da7219->micbias_on_event = false; - - /* Disable mic bias */ - snd_soc_dapm_disable_pin(dapm, "Mic Bias"); - snd_soc_dapm_sync(dapm); - - /* Disable ground switch */ - snd_soc_component_update_bits(component, 0xFB, 0x01, 0x00); + cancel_delayed_work(&da7219_aad->insertion_work); + da7219_aad_handle_removal(da7219_aad); } }
- snd_soc_jack_report(da7219_aad->jack, report, mask); - + mutex_unlock(&da7219_aad->insertion_mutex); return IRQ_HANDLED; }
@@ -938,6 +982,9 @@ int da7219_aad_init(struct snd_soc_component *component) snd_soc_component_update_bits(component, DA7219_ACCDET_CONFIG_1, DA7219_BUTTON_CONFIG_MASK, 0);
+ mutex_init(&da7219_aad->insertion_mutex); + + INIT_DELAYED_WORK(&da7219_aad->insertion_work, +da7219_aad_insertion_work); INIT_WORK(&da7219_aad->btn_det_work, da7219_aad_btn_det_work); INIT_WORK(&da7219_aad->hptest_work, da7219_aad_hptest_work);
@@ -973,6 +1020,7 @@ void da7219_aad_exit(struct snd_soc_component *component)
free_irq(da7219_aad->irq, da7219_aad);
+ cancel_delayed_work_sync(&da7219_aad->insertion_work); cancel_work_sync(&da7219_aad->btn_det_work); cancel_work_sync(&da7219_aad->hptest_work); } diff --git a/sound/soc/codecs/da7219-aad.h b/sound/soc/codecs/da7219-aad.h index 21fdf53095cc..b1b7f8ba45bd 100644 --- a/sound/soc/codecs/da7219-aad.h +++ b/sound/soc/codecs/da7219-aad.h @@ -10,6 +10,7 @@ #ifndef __DA7219_AAD_H #define __DA7219_AAD_H
+#include <linux/mutex.h> #include <linux/timer.h> #include <sound/soc.h> #include <sound/jack.h> @@ -194,6 +195,8 @@ struct da7219_aad_priv {
u8 btn_cfg;
+ struct mutex insertion_mutex; + struct delayed_work insertion_work; struct work_struct btn_det_work; struct work_struct hptest_work;
-- 2.39.1
On 2/5/23 21:38, David Rau wrote:
-----Original Message----- From: Guenter Roeck groeck7@gmail.com On Behalf Of Guenter Roeck Sent: Saturday, February 4, 2023 23:42 To: Mark Brown broonie@kernel.org Cc: David Rau david.rau.zg@renesas.com; perex@perex.cz; lgirdwood@gmail.com; tiwai@suse.com; support.opensource@diasemi.com; alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music
On Thu, Feb 02, 2023 at 07:36:42PM +0000, Mark Brown wrote:
they have the potential to actually lock up are the cancel_work_sync() calls but they were unchanged and the backtrace you showed was showing the thread in the msleep(). My guess would be that you've got systems where there are very frequent jack detection events (potentiallly with broken accessories, or possibly due to the ground switch putting things into the wrong priority) and that the interrupt is firing again as soon as the thread unmasks the primary interrupt which means it never actually stops running.
That is what I strongly suspect is happening. I don't know why exactly the interrupt is firing continuously, but the hang is always in msleep(). One possibility might be that the event is actually a disconnect event, and that enabling and immediately disabling the ground switch causes another interrupt, which is then handled immediately, causing the hang.
Could be. I'd be willing to guess that it's not just one event but rather a stream of events of some kind. Possibly if it's due to the ground switch it's spuriously detecting a constant stream of button presses for the affected systems, which don't produce any UI visible result which would cause users to pull the accessory for whatever reason? Whatever's going on I bet it's broken accessories triggering it.
That seems to be unlikely. The average number of crashes per affected system is 1.92, which points to something the users are doing and less to a broken accessory. We do observe crashes due to broken accessories, but in those cases the number of crashes per system tends to be much > higher.
Anyway, below is a patch with a possible fix. Of course, I still don't know what the patch originally tried to fix, so it might not do much if anything good.
I added the software debouncing before insertion task to ensue the better compatibility of OMTP Jack.
For example, it keeps button detection in the interrupt handler to avoid dropping button events, so if spurious button detection as you suspected is indeed (part of) the problem we might still see a large number of interrupts.
Guenter
Thanks a lot for your big efforts to implement the temporary fix and verifications. Would you please let me know the average number of crashes per affected system if you rollback to the pervious fix? Ref: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/so... https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/so...
You mean just keep the above two patches and revert 969357ec94e6 ? Sure, I can do that, but feedback from the field would take some 2-3 months. Is that what you recommend to do for now ?
Thanks, Guenter
-----Original Message----- From: Guenter Roeck groeck7@gmail.com On Behalf Of Guenter Roeck Sent: Monday, February 6, 2023 22:05 To: David Rau david.rau.zg@renesas.com; Mark Brown broonie@kernel.org Cc: perex@perex.cz; lgirdwood@gmail.com; tiwai@suse.com; support.opensource@diasemi.com; alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music
On 2/5/23 21:38, David Rau wrote:
-----Original Message----- From: Guenter Roeck groeck7@gmail.com On Behalf Of Guenter Roeck Sent: Saturday, February 4, 2023 23:42 To: Mark Brown broonie@kernel.org Cc: David Rau david.rau.zg@renesas.com; perex@perex.cz; lgirdwood@gmail.com; tiwai@suse.com; support.opensource@diasemi.com; alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music
On Thu, Feb 02, 2023 at 07:36:42PM +0000, Mark Brown wrote:
they have the potential to actually lock up are the cancel_work_sync() calls but they were unchanged and the backtrace you showed was showing the thread in the msleep(). My guess would be that you've got systems where there are very frequent jack detection events (potentiallly with broken accessories, or possibly due to the ground switch putting things into the wrong priority) and that the interrupt is firing again as soon as the thread unmasks the primary interrupt which means it never actually stops running.
That is what I strongly suspect is happening. I don't know why exactly the interrupt is firing continuously, but the hang is always in msleep(). One possibility might be that the event is actually a disconnect event, and that enabling and immediately disabling the ground switch causes another interrupt, which is then handled immediately, causing the hang.
Could be. I'd be willing to guess that it's not just one event but rather a stream of events of some kind. Possibly if it's due to the ground switch it's spuriously detecting a constant stream of button presses for the affected systems, which don't produce any UI visible result which would cause users to pull the accessory for whatever reason? Whatever's going on I bet it's broken accessories triggering it.
That seems to be unlikely. The average number of crashes per affected system is 1.92, which points to something the users are doing and less to a broken accessory.
We do observe crashes due to broken accessories, but in those cases the number of crashes per system tends to be much > higher.
Anyway, below is a patch with a possible fix. Of course, I still don't know what the patch originally tried to fix, so it might not do much if anything good.
I added the software debouncing before insertion task to ensue the better compatibility of OMTP Jack.
For example, it keeps button detection in the interrupt handler to avoid dropping button events, so if spurious button detection as you suspected is indeed (part of) the problem we might still see a large number of interrupts.
Guenter
Thanks a lot for your big efforts to implement the temporary fix and verifications. Would you please let me know the average number of crashes per affected system if you rollback to the pervious fix? Ref: https://jpn01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit. kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2 Fcommit%2Fsound%2Fsoc%2Fcodecs%3Fid%3D2d969e8f35b1849a43156029a7a6e294 3b89d0c0&data=05%7C01%7Cdavid.rau.zg%40renesas.com%7Cae6910f8ff4e4e299 bc408db084b1a2a%7C53d82571da1947e49cb4625a166a4a2a%7C0%7C0%7C638112890 873388020%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIi LCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=8KgHP%2FOD%2BTDcr rUVSATFkDCDDmhiCu7d5%2FKhyOszThA%3D&reserved=0 https://jpn01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit. kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2 Fcommit%2Fsound%2Fsoc%2Fcodecs%3Fid%3D06f5882122e3faa183d76c4ec2c92f4c 38e2c7bb&data=05%7C01%7Cdavid.rau.zg%40renesas.com%7Cae6910f8ff4e4e299 bc408db084b1a2a%7C53d82571da1947e49cb4625a166a4a2a%7C0%7C0%7C638112890 873388020%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIi LCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=WosfvANk0YxeJD5PG %2FnAuAWVqt7m4U3mMaYXefLLdS4%3D&reserved=0
You mean just keep the above two patches and revert 969357ec94e6 ? Sure, I can do that, but feedback from the field would take some 2-3 months. Is that what you recommend to do for now ?
Thanks, Guenter
Thanks for the feedback. What I mean is just do rollback to remove the "sleep" patch I did in your repository.
After the rollback, could you please let me know the average number of crashes per affected system with the same test conditions? Will it still take some 2-3 months?
Thanks. David
On 2/6/23 18:42, David Rau wrote:
-----Original Message----- From: Guenter Roeck groeck7@gmail.com On Behalf Of Guenter Roeck Sent: Monday, February 6, 2023 22:05 To: David Rau david.rau.zg@renesas.com; Mark Brown broonie@kernel.org Cc: perex@perex.cz; lgirdwood@gmail.com; tiwai@suse.com; support.opensource@diasemi.com; alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music
On 2/5/23 21:38, David Rau wrote:
-----Original Message----- From: Guenter Roeck groeck7@gmail.com On Behalf Of Guenter Roeck Sent: Saturday, February 4, 2023 23:42 To: Mark Brown broonie@kernel.org Cc: David Rau david.rau.zg@renesas.com; perex@perex.cz; lgirdwood@gmail.com; tiwai@suse.com; support.opensource@diasemi.com; alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music
On Thu, Feb 02, 2023 at 07:36:42PM +0000, Mark Brown wrote:
they have the potential to actually lock up are the cancel_work_sync() calls but they were unchanged and the backtrace you showed was showing the thread in the msleep(). My guess would be that you've got systems where there are very frequent jack detection events (potentiallly with broken accessories, or possibly due to the ground switch putting things into the wrong priority) and that the interrupt is firing again as soon as the thread unmasks the primary interrupt which means it never actually stops running.
That is what I strongly suspect is happening. I don't know why exactly the interrupt is firing continuously, but the hang is always in msleep(). One possibility might be that the event is actually a disconnect event, and that enabling and immediately disabling the ground switch causes another interrupt, which is then handled immediately, causing the hang.
Could be. I'd be willing to guess that it's not just one event but rather a stream of events of some kind. Possibly if it's due to the ground switch it's spuriously detecting a constant stream of button presses for the affected systems, which don't produce any UI visible result which would cause users to pull the accessory for whatever reason? Whatever's going on I bet it's broken accessories triggering it.
That seems to be unlikely. The average number of crashes per affected system is 1.92, which points to something the users are doing and less to a broken accessory.
We do observe crashes due to broken accessories, but in those cases the number of crashes per system tends to be much > higher.
Anyway, below is a patch with a possible fix. Of course, I still don't know what the patch originally tried to fix, so it might not do much if anything good.
I added the software debouncing before insertion task to ensue the better compatibility of OMTP Jack.
For example, it keeps button detection in the interrupt handler to avoid dropping button events, so if spurious button detection as you suspected is indeed (part of) the problem we might still see a large number of interrupts.
Guenter
Thanks a lot for your big efforts to implement the temporary fix and verifications. Would you please let me know the average number of crashes per affected system if you rollback to the pervious fix? Ref: https://jpn01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit. kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2 Fcommit%2Fsound%2Fsoc%2Fcodecs%3Fid%3D2d969e8f35b1849a43156029a7a6e294 3b89d0c0&data=05%7C01%7Cdavid.rau.zg%40renesas.com%7Cae6910f8ff4e4e299 bc408db084b1a2a%7C53d82571da1947e49cb4625a166a4a2a%7C0%7C0%7C638112890 873388020%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIi LCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=8KgHP%2FOD%2BTDcr rUVSATFkDCDDmhiCu7d5%2FKhyOszThA%3D&reserved=0 https://jpn01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit. kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2 Fcommit%2Fsound%2Fsoc%2Fcodecs%3Fid%3D06f5882122e3faa183d76c4ec2c92f4c 38e2c7bb&data=05%7C01%7Cdavid.rau.zg%40renesas.com%7Cae6910f8ff4e4e299 bc408db084b1a2a%7C53d82571da1947e49cb4625a166a4a2a%7C0%7C0%7C638112890 873388020%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIi LCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=WosfvANk0YxeJD5PG %2FnAuAWVqt7m4U3mMaYXefLLdS4%3D&reserved=0
You mean just keep the above two patches and revert 969357ec94e6 ? Sure, I can do that, but feedback from the field would take some 2-3 months. Is that what you recommend to do for now ?
Thanks, Guenter
Thanks for the feedback. What I mean is just do rollback to remove the "sleep" patch I did in your repository.
After the rollback, could you please let me know the average number of crashes per affected system with the same test conditions? Will it still take some 2-3 months?
Yes, due to our rollout schedules. Those are crashes observed in the field, after all.
Guenter
On Tue, Feb 07, 2023 at 02:42:14AM +0000, David Rau wrote:
You mean just keep the above two patches and revert 969357ec94e6 ? Sure, I can do that, but feedback from the field would take some 2-3 months. Is that what you recommend to do for now ?
Thanks, Guenter
Thanks for the feedback. What I mean is just do rollback to remove the "sleep" patch I did in your repository.
After the rollback, could you please let me know the average number of crashes per affected system with the same test conditions? Will it still take some 2-3 months?
The msleep() patch has been reverted in R111 and dev releases of ChromeOS. I did not get permission to land the revert in R110, meaning we'll continue to see the crashes there. R111 is expected to go to Beta shortly, so we should get _some_ feedback in the next few weeks.
Still, it would be great to get a more permanent fix for the underlying problem. Also, the msleep() patch is still upstream, so a solution is still needed there.
I can try to get and play with one of the affected Chromebooks, but I don't think that would help much since we still don't know what the undocumented ground switch is supposed to do.
Thanks, Guenter
-----Original Message----- From: Guenter Roeck groeck7@gmail.com On Behalf Of Guenter Roeck Sent: Thursday, February 9, 2023 02:04 To: David Rau david.rau.zg@renesas.com Cc: Mark Brown broonie@kernel.org; perex@perex.cz; lgirdwood@gmail.com; tiwai@suse.com; support.opensource@diasemi.com; alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music
On Tue, Feb 07, 2023 at 02:42:14AM +0000, David Rau wrote:
You mean just keep the above two patches and revert 969357ec94e6 ? Sure, I can do that, but feedback from the field would take some 2-3 months. Is that what you recommend to do for now ?
Thanks, Guenter
Thanks for the feedback. What I mean is just do rollback to remove the "sleep" patch I did in your repository.
After the rollback, could you please let me know the average number of crashes per affected system with the same test conditions? Will it still take some 2-3 months?
The msleep() patch has been reverted in R111 and dev releases of ChromeOS. I did not get permission to land the revert in R110, meaning we'll continue to see the crashes there. R111 is expected to go to Beta shortly, so we should get _some_ feedback in the next few weeks. Still, it would be great to get a more permanent fix for the underlying problem. Also, the msleep() patch is still upstream, so a solution is still needed there. I can try to get and play with one of the affected Chromebooks, but I don't think that would help much since we still don't know what the undocumented ground switch is supposed to do.
Enable the GND switch earlier is needed to ensure the stable and smooth Jack detection.
Thanks, Guenter
Thanks for the kind explanation and feedback. I am verifying another method which do the msleep() in the individual schedule work to avoid such crash issue.
Would you please let me know how to reproduce this crash phenomenon? Then I can ensure the new solution is stronger and solve this problem as well.
Thanks. David
On 2/8/23 19:05, David Rau wrote:
-----Original Message----- From: Guenter Roeck groeck7@gmail.com On Behalf Of Guenter Roeck Sent: Thursday, February 9, 2023 02:04 To: David Rau david.rau.zg@renesas.com Cc: Mark Brown broonie@kernel.org; perex@perex.cz; lgirdwood@gmail.com; tiwai@suse.com; support.opensource@diasemi.com; alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music
On Tue, Feb 07, 2023 at 02:42:14AM +0000, David Rau wrote:
You mean just keep the above two patches and revert 969357ec94e6 ? Sure, I can do that, but feedback from the field would take some 2-3 months. Is that what you recommend to do for now ?
Thanks, Guenter
Thanks for the feedback. What I mean is just do rollback to remove the "sleep" patch I did in your repository.
After the rollback, could you please let me know the average number of crashes per affected system with the same test conditions? Will it still take some 2-3 months?
The msleep() patch has been reverted in R111 and dev releases of ChromeOS. I did not get permission to land the revert in R110, meaning we'll continue to see the crashes there. R111 is expected to go to Beta shortly, so we should get _some_ feedback in the next few weeks. Still, it would be great to get a more permanent fix for the underlying problem. Also, the msleep() patch is still upstream, so a solution is still needed there. I can try to get and play with one of the affected Chromebooks, but I don't think that would help much since we still don't know what the undocumented ground switch is supposed to do.
Enable the GND switch earlier is needed to ensure the stable and smooth Jack detection.
"earlier" doesn't explain the changes, nor what "earlier" actually means. The original code enabled the ground switch in the probe function, only it never re-enabled it after the first insertion/removal sequence. I don't know the potential impact on power consumption, but an easier and more straightforward fix might be to (re-)enable the ground switch on jack removal and to keep it enabled while nothing is inserted (assuming that helps with detection).
The new (current) code enables the ground switch with each interrupt, even if that interrupt is a button interrupt. That is really difficult to understand and doesn't seem to make sense.
A diagram such as the one in Figure 20, describing exactly when the ground switch is supposed to be enabled, would help a lot. It would be even better if that diagram provided information about any extra delays needed after e_jack_detect_complete is raised and before the detection code actually runs.
Thanks, Guenter
Thanks for the kind explanation and feedback. I am verifying another method which do the msleep() in the individual schedule work to avoid such crash issue.
Would you please let me know how to reproduce this crash phenomenon? Then I can ensure the new solution is stronger and solve this problem as well.
Sorry, I can't, because the crash reports are from customers and automated. We don't know what they are doing, only that whatever they are doing causes the hang and thus crash.
Thanks, Guenter
On Sat, Feb 04, 2023 at 07:42:22AM -0800, Guenter Roeck wrote:
Solve the problem by enabling the ground switch immediately and only after an insertion has been detected. Delay pole orientation detection until after the chip reports that detection is complete plus an additional time depending on the chip configuration. Do this by implementing ground switch detection in a delayed worker.
This looks sensible to me.
-----Original Message----- From: Mark Brown broonie@kernel.org Sent: Friday, February 3, 2023 03:37 To: Guenter Roeck linux@roeck-us.net Cc: David Rau david.rau.zg@renesas.com; perex@perex.cz; lgirdwood@gmail.com; tiwai@suse.com; support.opensource@diasemi.com; alsa-devel@alsa-project.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] ASoC: da7219: Fix pole orientation detection on OMTP headsets when playing music
On Thu, Feb 02, 2023 at 10:39:51AM -0800, Guenter Roeck wrote:
On 2/2/23 09:04, Mark Brown wrote:
It's definitely unclear, there's a datasheet at [1] which does appear to explicitly call for a 512ms delay though (see figure 20 on page 50). It does look like it should only be applied in the case where an inserted jack is detected (ie, when identifying an accessory or button press) and not when removal is detected though.
The datasheet doesn't really suggest that a delay shall be applied using msleep (ie in the code). The chip presumably debounces internally (see
Obviously it doesn't call for an explicit implementation in the host.
jackdet_debounce and jackdet_rem_deb), and there is also jack_detect_rate to configure the detection rate. The table seems to suggest (to me) that there is an e_jack_insertion event, which would then be followed 64-512 ms later with an e_jack_detect_complete event.
Right, I think what I was looking at was that in combination of the fact that there's a *much* longer window before the host clears the interrupt shown on the first JACK_IN. It could be spurious and possibly just due to the host type check thing in the diagram but it smells real bad, like the hardware state machine has robustness issues or something. The diagram currently doesn't quite correspond to the code since we have the delay applied unconditionally, and there's that undocumented register for the ground switch being managed.
Whatever is done in software is on top of that, or at least that is my understanding, and not explained by anything in the datasheet.
Given that the chip itself supports debouncing internally, it is not clear to me what the delay is actually supposed to accomplish. Soft debounce on top of chip debounce ? I don't see that explained anywhere, though of course I might be missing it.
That's what it looks like it's trying to accomplish but as you say it's not exactly explicit. I *suspect* it's trying to debounce in more cases than is needed.
is not conceivable (or explained) why the ground switch is enabled only to be disabled immediately afterwards if a jack was removed.
It smells like there's a power benefit to leaving it disabled when unplugged (which seems plausible), and possibly like the detection is more stable with the ground switch enabled. The ground switch is not documented AFAICT (it's in register 0xfb which isn't named and doesn't appear to appear in the datsheet from a quick search). The code is leaving the switch enabled so long as an accessory is plugged.
I understand. What I don't understand is that it is always enabled in the interrupt handler, no matter if a jack was inserted or not, only to be disabled immediately if the jack was disabled or after insertion detection work is complete.
My guess was that it was making the detection more stable, it's surprising that it'd help with simple presence detection though.
I added this software debouncing to make DA7219 more stable to do Jack detection.
Are you saying this is actually crashing, or just that you're getting warnings about threads being blocked for too long (that was what was posted earlier in the thread)? The only things I can see that look like
ChromeOS is configured to crash after stalled threads are detected (ie after 120 seconds), so this is actually causing crashes.
Ah, that's much more serious than I'd understood from the log you posted.
Sorry to hear about that. Now I am refactoring the mechanism that remove the pervious delay in IRQ thread to avoid such race condition problem.
they have the potential to actually lock up are the cancel_work_sync() calls but they were unchanged and the backtrace you showed was showing the thread in the msleep(). My guess would be that you've got systems where there are very frequent jack detection events (potentiallly with broken accessories, or possibly due to the ground switch putting things into the wrong priority) and that the interrupt is firing again as soon as the thread unmasks the primary interrupt which means it never actually stops running.
That is what I strongly suspect is happening. I don't know why exactly the interrupt is firing continuously, but the hang is always in msleep(). One possibility might be that the event is actually a disconnect event, and that enabling and immediately disabling the ground switch causes another interrupt, which is then handled immediately, causing the hang.
Could be. I'd be willing to guess that it's not just one event but rather a stream of events of some kind. Possibly if it's due to the ground switch it's spuriously detecting a constant stream of button presses for the affected systems, which don't produce any UI visible result which would cause users to pull the accessory for whatever reason? Whatever's going on I bet it's broken accessories triggering it.
My other guess would be that moving the delay that's been added to a delayed work would avoid the warnings, though you might want to manually keep the physical interrupt disabled while that's running which is fun.
I am not sure if that would fix anything. The current code sleeps, then enables the ground switch and does the rest of the detection. I'd somewhat understand the code if it would enable the ground switch after an "insertion detected" interrupt, then wait for some amount of time and handle the rest of the detection after waiting (even though that should really be handled by the "detection complete" interrupt). But that isn't what it does. If we were to implement the above, I suspect the result would be that the interrupt still happens all the time, and the only difference would be that it would be "silenced" while the delayed work is waiting to be scheduled. That doesn't really fix the problem, it only works around it. But, sure, it would be much better than the current situation.
Yes, exactly - I was just looking at a refactoring in the code which would mitigate the immediate problem while keeping the current partially documented algorithm in place.
My "wild shot" fix would be to enable the ground switch after an insertion event and to drop the software sleep entirely.
That's entirely plausible to me, either together or possibly just one of those is actually needed. Do you want to send a patch?
I will send a patch after the complete verification and waveform measurement.
However, it is really impossible to know what the delay is for in the first place. Looking into the code further, the sleep time actually matches the configured jack detection rate. I have no idea why it would make sense to wait for a detection cycle after an event, then enable the ground switch and actually handle the event (which by then probably reports that jack detection is complete after an insertion). I really don't understand the logic behind that.
This all smells like there's either a race condition in a state machine somewhere or the button detection needs a bit of help (though if it's the latter then it'd be conditional on a microphone having been detected).
Hopefully David will get back to us with some explanation and ideally fix.
participants (4)
-
David Rau
-
David Rau
-
Guenter Roeck
-
Mark Brown