[alsa-devel] [PATCH] ASoC: omap-mcbsp: Add PM QoS support for McBSP to prevent glitches
We can get audio errors if hitting deeper idle states on omaps:
[alsa.c:230] error: Fatal problem with alsa output, error -5. [audio.c:614] error: Error in writing audio (Input/output error?)!
This seems to happen with off mode idle enabled as power for the whole SoC may get cut off between filling the McBSP fifo using DMA. While active DMA blocks deeper idle states in hardware, McBSP activity does not seem to do so.
Let's fix the issue by adding PM QoS latency requirement for McBSP. Based on my experiments a working latency value is somewhere around 35 ms with 40 ms breaking. So let's use a safer value of 30 ms.
Note that this is different from snd_pcm_hw_constraint_step() as that can configure things based on the buffer and period size. However, that does not help to block idle between the fifo fills.
Note that in theory some omaps are capable of playing audio while hitting deeper idle states. If somebody needs that and gets it working, we can set the latency requirements accordingly.
Signed-off-by: Tony Lindgren tony@atomide.com --- sound/soc/omap/mcbsp.c | 10 ++++++++++ sound/soc/omap/mcbsp.h | 2 ++ 2 files changed, 12 insertions(+)
diff --git a/sound/soc/omap/mcbsp.c b/sound/soc/omap/mcbsp.c --- a/sound/soc/omap/mcbsp.c +++ b/sound/soc/omap/mcbsp.c @@ -25,6 +25,7 @@ #include <linux/io.h> #include <linux/slab.h> #include <linux/pm_runtime.h> +#include <linux/pm_qos.h>
#include <linux/platform_data/asoc-ti-mcbsp.h>
@@ -643,6 +644,10 @@ void omap_mcbsp_start(struct omap_mcbsp *mcbsp, int tx, int rx) int enable_srg = 0; u16 w;
+ /* Prevent omap hardware from hitting off between fifo fills */ + pm_qos_add_request(&mcbsp->pm_qos_req, PM_QOS_CPU_DMA_LATENCY, + 30 * 1000); + if (mcbsp->st_data) omap_st_start(mcbsp);
@@ -731,6 +736,8 @@ void omap_mcbsp_stop(struct omap_mcbsp *mcbsp, int tx, int rx)
if (mcbsp->st_data) omap_st_stop(mcbsp); + + pm_qos_remove_request(&mcbsp->pm_qos_req); }
int omap2_mcbsp_set_clks_src(struct omap_mcbsp *mcbsp, u8 fck_src_id) @@ -1098,6 +1105,9 @@ err_thres:
void omap_mcbsp_cleanup(struct omap_mcbsp *mcbsp) { + if (dev_pm_qos_request_active(&mcbsp->pm_qos_req)) + pm_qos_remove_request(&mcbsp->pm_qos_req); + if (mcbsp->pdata->buffer_size) sysfs_remove_group(&mcbsp->dev->kobj, &additional_attr_group);
diff --git a/sound/soc/omap/mcbsp.h b/sound/soc/omap/mcbsp.h index 61e93b1..e603f33 100644 --- a/sound/soc/omap/mcbsp.h +++ b/sound/soc/omap/mcbsp.h @@ -325,6 +325,8 @@ struct omap_mcbsp { unsigned int in_freq; int clk_div; int wlen; + + struct pm_qos_request pm_qos_req; };
void omap_mcbsp_config(struct omap_mcbsp *mcbsp,
On 08/29/16 21:27, Tony Lindgren wrote:
We can get audio errors if hitting deeper idle states on omaps:
[alsa.c:230] error: Fatal problem with alsa output, error -5. [audio.c:614] error: Error in writing audio (Input/output error?)!
This seems to happen with off mode idle enabled as power for the whole SoC may get cut off between filling the McBSP fifo using DMA. While active DMA blocks deeper idle states in hardware, McBSP activity does not seem to do so.
Let's fix the issue by adding PM QoS latency requirement for McBSP. Based on my experiments a working latency value is somewhere around 35 ms with 40 ms breaking. So let's use a safer value of 30 ms.
Note that this is different from snd_pcm_hw_constraint_step() as that can configure things based on the buffer and period size. However, that does not help to block idle between the fifo fills.
Note that in theory some omaps are capable of playing audio while hitting deeper idle states. If somebody needs that and gets it working, we can set the latency requirements accordingly.
Signed-off-by: Tony Lindgren tony@atomide.com
sound/soc/omap/mcbsp.c | 10 ++++++++++ sound/soc/omap/mcbsp.h | 2 ++ 2 files changed, 12 insertions(+)
diff --git a/sound/soc/omap/mcbsp.c b/sound/soc/omap/mcbsp.c --- a/sound/soc/omap/mcbsp.c +++ b/sound/soc/omap/mcbsp.c @@ -25,6 +25,7 @@ #include <linux/io.h> #include <linux/slab.h> #include <linux/pm_runtime.h> +#include <linux/pm_qos.h>
#include <linux/platform_data/asoc-ti-mcbsp.h>
@@ -643,6 +644,10 @@ void omap_mcbsp_start(struct omap_mcbsp *mcbsp, int tx, int rx) int enable_srg = 0; u16 w;
- /* Prevent omap hardware from hitting off between fifo fills */
- pm_qos_add_request(&mcbsp->pm_qos_req, PM_QOS_CPU_DMA_LATENCY,
30 * 1000);
The latency value depends on the McBSP FIFO size and the speed the DMA can transfer the data. OMAP3.McBSP2 have 1280 word as buffer. All other McBSP across SoCs have 128 words or no FIFO.
The DMA fill speed will depend on several factors, I guess it is faster on OMAP4/5 then on OMAP3/2/1.
If we have McBSP w/o FIFO there is no need for the qos AFAIK. I'm fine with the 30ms, if you have done the testing on OMAP3.McBSP2, probably setting 30ms for McBSP with 1280 FIFO, 3ms for 128 FIFO and no qos for McBSP w/o FIFO might be better. Or:
latency = mcbsp->pdata->buffer_size * 23; /* 29.44ms on omap3's mcbsp2 */
if (latency) pm_qos_add_request(&mcbsp->pm_qos_req, PM_QOS_CPU_DMA_LATENCY, latency);
- if (mcbsp->st_data) omap_st_start(mcbsp);
@@ -731,6 +736,8 @@ void omap_mcbsp_stop(struct omap_mcbsp *mcbsp, int tx, int rx)
if (mcbsp->st_data) omap_st_stop(mcbsp);
- pm_qos_remove_request(&mcbsp->pm_qos_req);
}
int omap2_mcbsp_set_clks_src(struct omap_mcbsp *mcbsp, u8 fck_src_id) @@ -1098,6 +1105,9 @@ err_thres:
void omap_mcbsp_cleanup(struct omap_mcbsp *mcbsp) {
- if (dev_pm_qos_request_active(&mcbsp->pm_qos_req))
pm_qos_remove_request(&mcbsp->pm_qos_req);
- if (mcbsp->pdata->buffer_size) sysfs_remove_group(&mcbsp->dev->kobj, &additional_attr_group);
diff --git a/sound/soc/omap/mcbsp.h b/sound/soc/omap/mcbsp.h index 61e93b1..e603f33 100644 --- a/sound/soc/omap/mcbsp.h +++ b/sound/soc/omap/mcbsp.h @@ -325,6 +325,8 @@ struct omap_mcbsp { unsigned int in_freq; int clk_div; int wlen;
- struct pm_qos_request pm_qos_req;
};
void omap_mcbsp_config(struct omap_mcbsp *mcbsp,
* Peter Ujfalusi peter.ujfalusi@ti.com [160831 04:38]:
- /* Prevent omap hardware from hitting off between fifo fills */
- pm_qos_add_request(&mcbsp->pm_qos_req, PM_QOS_CPU_DMA_LATENCY,
30 * 1000);
The latency value depends on the McBSP FIFO size and the speed the DMA can transfer the data. OMAP3.McBSP2 have 1280 word as buffer. All other McBSP across SoCs have 128 words or no FIFO.
Yeah this is with mcbsp2 using boards like the beaglebone variants. Probably the same issue on 900 as reported by Pavel earlier.
The DMA fill speed will depend on several factors, I guess it is faster on OMAP4/5 then on OMAP3/2/1.
Probably this is only issue with mcbsp2..
If we have McBSP w/o FIFO there is no need for the qos AFAIK. I'm fine with the 30ms, if you have done the testing on OMAP3.McBSP2, probably setting 30ms for McBSP with 1280 FIFO, 3ms for 128 FIFO and no qos for McBSP w/o FIFO might be better. Or:
latency = mcbsp->pdata->buffer_size * 23; /* 29.44ms on omap3's mcbsp2 */
if (latency) pm_qos_add_request(&mcbsp->pm_qos_req, PM_QOS_CPU_DMA_LATENCY, latency);
OK let me run some tests with that and mcbsp2 fifo set to 128. My guess is that things already work for that case with no patches with the existing fifo based snd_pcm_hw_constraint_step() setting limits low enough so we never enter deeper idle states, but let's see :)
Regards,
Tony
* Tony Lindgren tony@atomide.com [160831 07:14]:
- Peter Ujfalusi peter.ujfalusi@ti.com [160831 04:38]:
If we have McBSP w/o FIFO there is no need for the qos AFAIK. I'm fine with the 30ms, if you have done the testing on OMAP3.McBSP2, probably setting 30ms for McBSP with 1280 FIFO, 3ms for 128 FIFO and no qos for McBSP w/o FIFO might be better. Or:
latency = mcbsp->pdata->buffer_size * 23; /* 29.44ms on omap3's mcbsp2 */
if (latency) pm_qos_add_request(&mcbsp->pm_qos_req, PM_QOS_CPU_DMA_LATENCY, latency);
OK let me run some tests with that and mcbsp2 fifo set to 128. My guess is that things already work for that case with no patches with the existing fifo based snd_pcm_hw_constraint_step() setting limits low enough so we never enter deeper idle states, but let's see :)
Well not quite so :) The fixed limit of 30 ms also works with fifo set to 128, but we still have audio fail without any patches.
So basically anything blocking off idle for cpuidle is enough. Retention idle is OK to allow. With your calculation retention idle never happens when FIFO size is 128. It seems the latency value should eventually be SoC and cpuidle policy specific that we can probably do with dev_pm_qos once we have set_latency_tolerance implemented.
I've also confirmed that we're hitting retention idle while playing a wav or mp3 file on omap3 even with FIFO limited to 128 and updated the patch description accordingly.
Updated version of the patch below.
Regards,
Tony
8< ----------------- From: Tony Lindgren tony@atomide.com Date: Mon, 29 Aug 2016 11:27:46 -0700 Subject: [PATCHv3] ASoC: omap-mcbsp: Add PM QoS support for McBSP to prevent glitches
We can get audio errors if hitting deeper idle states on omaps:
[alsa.c:230] error: Fatal problem with alsa output, error -5. [audio.c:614] error: Error in writing audio (Input/output error?)!
This seems to happen with off mode idle enabled as power for the whole SoC may get cut off between filling the McBSP fifo using DMA. While active DMA blocks deeper idle states in hardware, McBSP activity does not seem to do so.
Let's fix the issue by adding PM QoS latency requirement for McBSP. Based on my experiments a working latency value is somewhere around 35 ms with 40 ms breaking. So let's use a safer value of 30 ms.
Note that this is different from snd_pcm_hw_constraint_step() as that can configure things based on the buffer and period size. However, that does not help to block idle between the fifo fills.
The value for the latency should eventually come from SoC specific configured value for dev_pm_qos, but we can use the measured value for now.
I've confirmed that we are hitting core retention on omap3 while playing a wav and mp3 file even with mcbsp2 FIFO set to 128 bytes. If somebody needs to hit off idle while playing and gets it somehow working, we can set the latency requirements accordingly.
Signed-off-by: Tony Lindgren tony@atomide.com ---
Changes since v1 & 2:
- Configure the latency during init in case we need to set SoC and cpuidle specific values later on
- Update description and comments for latency requirements
sound/soc/omap/mcbsp.c | 21 +++++++++++++++++++++ sound/soc/omap/mcbsp.h | 3 +++ 2 files changed, 24 insertions(+)
diff --git a/sound/soc/omap/mcbsp.c b/sound/soc/omap/mcbsp.c --- a/sound/soc/omap/mcbsp.c +++ b/sound/soc/omap/mcbsp.c @@ -25,6 +25,7 @@ #include <linux/io.h> #include <linux/slab.h> #include <linux/pm_runtime.h> +#include <linux/pm_qos.h>
#include <linux/platform_data/asoc-ti-mcbsp.h>
@@ -643,6 +644,11 @@ void omap_mcbsp_start(struct omap_mcbsp *mcbsp, int tx, int rx) int enable_srg = 0; u16 w;
+ /* Prevent omap hardware from hitting off between fifo fills */ + if (mcbsp->latency) + pm_qos_add_request(&mcbsp->pm_qos_req, PM_QOS_CPU_DMA_LATENCY, + mcbsp->latency); + if (mcbsp->st_data) omap_st_start(mcbsp);
@@ -731,6 +737,8 @@ void omap_mcbsp_stop(struct omap_mcbsp *mcbsp, int tx, int rx)
if (mcbsp->st_data) omap_st_stop(mcbsp); + + pm_qos_remove_request(&mcbsp->pm_qos_req); }
int omap2_mcbsp_set_clks_src(struct omap_mcbsp *mcbsp, u8 fck_src_id) @@ -1064,6 +1072,16 @@ int omap_mcbsp_init(struct platform_device *pdev) mcbsp->max_tx_thres = max_thres(mcbsp) - 0x10; mcbsp->max_rx_thres = max_thres(mcbsp) - 0x10;
+ /* + * Prevent device from hitting deeper idle states between + * DMA fifo loads. On omap3 mcbsp2 we have a larger FIFO of + * 1280 bytes instead of the usual 128 bytes, and the measured + * max latency we can tolerate is somewhere below 40 ms. To be + * safe, use a fixed value of 30 ms to block off idle on omap3 + * while still allowing retention idle. + */ + mcbsp->latency = 30 * 1000; + ret = sysfs_create_group(&mcbsp->dev->kobj, &additional_attr_group); if (ret) { @@ -1098,6 +1116,9 @@ err_thres:
void omap_mcbsp_cleanup(struct omap_mcbsp *mcbsp) { + if (pm_qos_request_active(&mcbsp->pm_qos_req)) + pm_qos_remove_request(&mcbsp->pm_qos_req); + if (mcbsp->pdata->buffer_size) sysfs_remove_group(&mcbsp->dev->kobj, &additional_attr_group);
diff --git a/sound/soc/omap/mcbsp.h b/sound/soc/omap/mcbsp.h --- a/sound/soc/omap/mcbsp.h +++ b/sound/soc/omap/mcbsp.h @@ -325,6 +325,9 @@ struct omap_mcbsp { unsigned int in_freq; int clk_div; int wlen; + + s32 latency; + struct pm_qos_request pm_qos_req; };
void omap_mcbsp_config(struct omap_mcbsp *mcbsp,
On 08/31/16 19:59, Tony Lindgren wrote:
- Tony Lindgren tony@atomide.com [160831 07:14]:
- Peter Ujfalusi peter.ujfalusi@ti.com [160831 04:38]:
If we have McBSP w/o FIFO there is no need for the qos AFAIK. I'm fine with the 30ms, if you have done the testing on OMAP3.McBSP2, probably setting 30ms for McBSP with 1280 FIFO, 3ms for 128 FIFO and no qos for McBSP w/o FIFO might be better. Or:
latency = mcbsp->pdata->buffer_size * 23; /* 29.44ms on omap3's mcbsp2 */
if (latency) pm_qos_add_request(&mcbsp->pm_qos_req, PM_QOS_CPU_DMA_LATENCY, latency);
OK let me run some tests with that and mcbsp2 fifo set to 128. My guess is that things already work for that case with no patches with the existing fifo based snd_pcm_hw_constraint_step() setting limits low enough so we never enter deeper idle states, but let's see :)
Well not quite so :) The fixed limit of 30 ms also works with fifo set to 128, but we still have audio fail without any patches.
does it fail at start time or later? At start time we will have ~10 DMA bursts of 128 words to fill the FIFO on McBSP2.
So basically anything blocking off idle for cpuidle is enough. Retention idle is OK to allow. With your calculation retention idle never happens when FIFO size is 128. It seems the latency value should eventually be SoC and cpuidle policy specific that we can probably do with dev_pm_qos once we have set_latency_tolerance implemented.
I've also confirmed that we're hitting retention idle while playing a wav or mp3 file on omap3 even with FIFO limited to 128 and updated the patch description accordingly.
Updated version of the patch below.
Regards,
Tony
8< ----------------- From: Tony Lindgren tony@atomide.com Date: Mon, 29 Aug 2016 11:27:46 -0700 Subject: [PATCHv3] ASoC: omap-mcbsp: Add PM QoS support for McBSP to prevent glitches
We can get audio errors if hitting deeper idle states on omaps:
[alsa.c:230] error: Fatal problem with alsa output, error -5. [audio.c:614] error: Error in writing audio (Input/output error?)!
This seems to happen with off mode idle enabled as power for the whole SoC may get cut off between filling the McBSP fifo using DMA. While active DMA blocks deeper idle states in hardware, McBSP activity does not seem to do so.
Let's fix the issue by adding PM QoS latency requirement for McBSP. Based on my experiments a working latency value is somewhere around 35 ms with 40 ms breaking. So let's use a safer value of 30 ms.
Note that this is different from snd_pcm_hw_constraint_step() as that can configure things based on the buffer and period size. However, that does not help to block idle between the fifo fills.
The value for the latency should eventually come from SoC specific configured value for dev_pm_qos, but we can use the measured value for now.
I've confirmed that we are hitting core retention on omap3 while playing a wav and mp3 file even with mcbsp2 FIFO set to 128 bytes. If somebody needs to hit off idle while playing and gets it somehow working, we can set the latency requirements accordingly.
Signed-off-by: Tony Lindgren tony@atomide.com
Changes since v1 & 2:
Configure the latency during init in case we need to set SoC and cpuidle specific values later on
Update description and comments for latency requirements
sound/soc/omap/mcbsp.c | 21 +++++++++++++++++++++ sound/soc/omap/mcbsp.h | 3 +++ 2 files changed, 24 insertions(+)
diff --git a/sound/soc/omap/mcbsp.c b/sound/soc/omap/mcbsp.c --- a/sound/soc/omap/mcbsp.c +++ b/sound/soc/omap/mcbsp.c @@ -25,6 +25,7 @@ #include <linux/io.h> #include <linux/slab.h> #include <linux/pm_runtime.h> +#include <linux/pm_qos.h>
#include <linux/platform_data/asoc-ti-mcbsp.h>
@@ -643,6 +644,11 @@ void omap_mcbsp_start(struct omap_mcbsp *mcbsp, int tx, int rx) int enable_srg = 0; u16 w;
- /* Prevent omap hardware from hitting off between fifo fills */
- if (mcbsp->latency)
pm_qos_add_request(&mcbsp->pm_qos_req, PM_QOS_CPU_DMA_LATENCY,
mcbsp->latency);
- if (mcbsp->st_data) omap_st_start(mcbsp);
@@ -731,6 +737,8 @@ void omap_mcbsp_stop(struct omap_mcbsp *mcbsp, int tx, int rx)
if (mcbsp->st_data) omap_st_stop(mcbsp);
- pm_qos_remove_request(&mcbsp->pm_qos_req);
}
int omap2_mcbsp_set_clks_src(struct omap_mcbsp *mcbsp, u8 fck_src_id) @@ -1064,6 +1072,16 @@ int omap_mcbsp_init(struct platform_device *pdev) mcbsp->max_tx_thres = max_thres(mcbsp) - 0x10; mcbsp->max_rx_thres = max_thres(mcbsp) - 0x10;
/*
* Prevent device from hitting deeper idle states between
* DMA fifo loads. On omap3 mcbsp2 we have a larger FIFO of
* 1280 bytes instead of the usual 128 bytes, and the measured
not bytes, but words. The word size does not matter, it can be 8, 16 or 32bits, but we have 1280 or 128 of these words.
* max latency we can tolerate is somewhere below 40 ms. To be
* safe, use a fixed value of 30 ms to block off idle on omap3
* while still allowing retention idle.
*/
mcbsp->latency = 30 * 1000;
were there issue with calculating the latency like mcbsp->latency = max_thres(mcbsp) * 23;
- ret = sysfs_create_group(&mcbsp->dev->kobj, &additional_attr_group); if (ret) {
@@ -1098,6 +1116,9 @@ err_thres:
void omap_mcbsp_cleanup(struct omap_mcbsp *mcbsp) {
- if (pm_qos_request_active(&mcbsp->pm_qos_req))
pm_qos_remove_request(&mcbsp->pm_qos_req);
- if (mcbsp->pdata->buffer_size) sysfs_remove_group(&mcbsp->dev->kobj, &additional_attr_group);
diff --git a/sound/soc/omap/mcbsp.h b/sound/soc/omap/mcbsp.h --- a/sound/soc/omap/mcbsp.h +++ b/sound/soc/omap/mcbsp.h @@ -325,6 +325,9 @@ struct omap_mcbsp { unsigned int in_freq; int clk_div; int wlen;
- s32 latency;
- struct pm_qos_request pm_qos_req;
};
void omap_mcbsp_config(struct omap_mcbsp *mcbsp,
* Peter Ujfalusi peter.ujfalusi@ti.com [160831 11:34]:
On 08/31/16 19:59, Tony Lindgren wrote:
- Tony Lindgren tony@atomide.com [160831 07:14]:
- Peter Ujfalusi peter.ujfalusi@ti.com [160831 04:38]:
If we have McBSP w/o FIFO there is no need for the qos AFAIK. I'm fine with the 30ms, if you have done the testing on OMAP3.McBSP2, probably setting 30ms for McBSP with 1280 FIFO, 3ms for 128 FIFO and no qos for McBSP w/o FIFO might be better. Or:
latency = mcbsp->pdata->buffer_size * 23; /* 29.44ms on omap3's mcbsp2 */
if (latency) pm_qos_add_request(&mcbsp->pm_qos_req, PM_QOS_CPU_DMA_LATENCY, latency);
OK let me run some tests with that and mcbsp2 fifo set to 128. My guess is that things already work for that case with no patches with the existing fifo based snd_pcm_hw_constraint_step() setting limits low enough so we never enter deeper idle states, but let's see :)
Well not quite so :) The fixed limit of 30 ms also works with fifo set to 128, but we still have audio fail without any patches.
does it fail at start time or later? At start time we will have ~10 DMA bursts of 128 words to fill the FIFO on McBSP2.
The problems start only later on, about 10 seconds into playing of course depending on things like console activity etc.
@@ -1064,6 +1072,16 @@ int omap_mcbsp_init(struct platform_device *pdev) mcbsp->max_tx_thres = max_thres(mcbsp) - 0x10; mcbsp->max_rx_thres = max_thres(mcbsp) - 0x10;
/*
* Prevent device from hitting deeper idle states between
* DMA fifo loads. On omap3 mcbsp2 we have a larger FIFO of
* 1280 bytes instead of the usual 128 bytes, and the measured
not bytes, but words. The word size does not matter, it can be 8, 16 or 32bits, but we have 1280 or 128 of these words.
Oh OK, will fix.
* max latency we can tolerate is somewhere below 40 ms. To be
* safe, use a fixed value of 30 ms to block off idle on omap3
* while still allowing retention idle.
*/
mcbsp->latency = 30 * 1000;
were there issue with calculating the latency like mcbsp->latency = max_thres(mcbsp) * 23;
Yes, we're not hitting retention idle at all during play with FIFO set to 128 words.
Regards,
Tony
On 08/31/16 22:41, Tony Lindgren wrote:
- Peter Ujfalusi peter.ujfalusi@ti.com [160831 11:34]:
On 08/31/16 19:59, Tony Lindgren wrote:
- Tony Lindgren tony@atomide.com [160831 07:14]:
- Peter Ujfalusi peter.ujfalusi@ti.com [160831 04:38]:
If we have McBSP w/o FIFO there is no need for the qos AFAIK. I'm fine with the 30ms, if you have done the testing on OMAP3.McBSP2, probably setting 30ms for McBSP with 1280 FIFO, 3ms for 128 FIFO and no qos for McBSP w/o FIFO might be better. Or:
latency = mcbsp->pdata->buffer_size * 23; /* 29.44ms on omap3's mcbsp2 */
if (latency) pm_qos_add_request(&mcbsp->pm_qos_req, PM_QOS_CPU_DMA_LATENCY, latency);
OK let me run some tests with that and mcbsp2 fifo set to 128. My guess is that things already work for that case with no patches with the existing fifo based snd_pcm_hw_constraint_step() setting limits low enough so we never enter deeper idle states, but let's see :)
Well not quite so :) The fixed limit of 30 ms also works with fifo set to 128, but we still have audio fail without any patches.
does it fail at start time or later? At start time we will have ~10 DMA bursts of 128 words to fill the FIFO on McBSP2.
OK, so it is not happening during DMA bursts.
The problems start only later on, about 10 seconds into playing of course depending on things like console activity etc.
@@ -1064,6 +1072,16 @@ int omap_mcbsp_init(struct platform_device *pdev) mcbsp->max_tx_thres = max_thres(mcbsp) - 0x10; mcbsp->max_rx_thres = max_thres(mcbsp) - 0x10;
/*
* Prevent device from hitting deeper idle states between
* DMA fifo loads. On omap3 mcbsp2 we have a larger FIFO of
* 1280 bytes instead of the usual 128 bytes, and the measured
not bytes, but words. The word size does not matter, it can be 8, 16 or 32bits, but we have 1280 or 128 of these words.
Oh OK, will fix.
* max latency we can tolerate is somewhere below 40 ms. To be
* safe, use a fixed value of 30 ms to block off idle on omap3
* while still allowing retention idle.
*/
mcbsp->latency = 30 * 1000;
were there issue with calculating the latency like mcbsp->latency = max_thres(mcbsp) * 23;
Yes, we're not hitting retention idle at all during play with FIFO set to 128 words.
What is the audio format you are using? sampling rate and how many channels? cat /proc/asound/card0/pcm0p/sub0/hw_params
What happens if you set the FIFO to 256? Do you still need the 30ms or it can be higher, like 60ms?
When the FIFO is set to 128, it means that after the initial FIFO fill we will have DMA request coming from McBSP to sDMA with a rate of:
(1000/sampling_rate) * (FIFO-threshold / channels) = DMA_req_distance_in_ms
So in case of 44.1KHz, stereo with 128 FIFO threshold DMA request will come at every 1.45ms. If I'm not mistaken. The whole FIFO (1280) holds 14.51ms of audio in this case.
I don't see this correlate with the 30ms at all.
Do you see anything in the kernel log? If McBSP is not powered and DMA tries to write data we should see l3 error at least. If the reason is that the system is not able to wake up fast enough to react to the DMA request from McBSP, then we should see FIFO underrun from McBSP (this patch will show them: http://mailman.alsa-project.org/pipermail/alsa-devel/2016-August/111671.html)
Have you tried with different sampling rates? That would have similar effect as changing the FIFO threshold for the DMA request periods.
* Peter Ujfalusi peter.ujfalusi@ti.com [160831 23:58]:
What is the audio format you are using? sampling rate and how many channels? cat /proc/asound/card0/pcm0p/sub0/hw_params
Just 44.1 stereo mp3 or wav files.
What happens if you set the FIFO to 256? Do you still need the 30ms or it can be higher, like 60ms?
No this is not related to the FIFO size. We just need to block off idle mode for cpuidle as the McBSP hardware is not blocking it.
When the FIFO is set to 128, it means that after the initial FIFO fill we will have DMA request coming from McBSP to sDMA with a rate of:
(1000/sampling_rate) * (FIFO-threshold / channels) = DMA_req_distance_in_ms
So in case of 44.1KHz, stereo with 128 FIFO threshold DMA request will come at every 1.45ms. If I'm not mistaken. The whole FIFO (1280) holds 14.51ms of audio in this case.
I don't see this correlate with the 30ms at all.
It seems we easily have a situation where DMA is done buffering to McBSP, and PMIC is playing audio, and we hit idle. At that point there are no immediate timers pending and cpuidle determines we can try to hit a deeper idle mode. As there are no hardware blockers with DMA off and McBSP not blocking, the hardware hits off mode. This cuts power to McBSP.
Ideally we'd configure McBSP activity to block deeper idle states in the hardware but I don't think we have such a configuration available.
So we just want to tell pm_qos that McBSP can't take any idle states shorter than 30 ms to prevent cutting off power fro the whole SoC.
Do you see anything in the kernel log? If McBSP is not powered and DMA tries to write data we should see l3 error at least. If the reason is that the system is not able to wake up fast enough to react to the DMA request from McBSP, then we should see FIFO underrun from McBSP (this patch will show them: http://mailman.alsa-project.org/pipermail/alsa-devel/2016-August/111671.html)
No, nothing as both McBSP and DMA are powered off :) Coming back from idle, from McBSP and DMA point of view things appear as if nothing happened. I think we're still missing the save and restore of the McBSP state, but that's a different story and does not help with the fact that we allow cutting off the power to McBSP during playback.
With Linux next with the patch above, I do see occasional interrupts for "omap-mcbsp 49022000.mcbsp: TX Buffer Overflow!" but those seem to also happen if I keep the system busy and I don't hear any dropped samples.
Have you tried with different sampling rates? That would have similar effect as changing the FIFO threshold for the DMA request periods.
Not very much, but this seems more like a SoC specific issue.
Regards,
Tony
On 09/01/16 17:50, Tony Lindgren wrote:
- Peter Ujfalusi peter.ujfalusi@ti.com [160831 23:58]:
What is the audio format you are using? sampling rate and how many channels? cat /proc/asound/card0/pcm0p/sub0/hw_params
Just 44.1 stereo mp3 or wav files.
What happens if you set the FIFO to 256? Do you still need the 30ms or it can be higher, like 60ms?
No this is not related to the FIFO size. We just need to block off idle mode for cpuidle as the McBSP hardware is not blocking it.
So this happens with any threshold value? Even in element mode, where we have the threshold set to 2?
When the FIFO is set to 128, it means that after the initial FIFO fill we will have DMA request coming from McBSP to sDMA with a rate of:
(1000/sampling_rate) * (FIFO-threshold / channels) = DMA_req_distance_in_ms
So in case of 44.1KHz, stereo with 128 FIFO threshold DMA request will come at every 1.45ms. If I'm not mistaken. The whole FIFO (1280) holds 14.51ms of audio in this case.
I don't see this correlate with the 30ms at all.
It seems we easily have a situation where DMA is done buffering to McBSP, and PMIC is playing audio, and we hit idle. At that point there are no immediate timers pending and cpuidle determines we can try to hit a deeper idle mode. As there are no hardware blockers with DMA off and McBSP not blocking, the hardware hits off mode. This cuts power to McBSP.
Ideally we'd configure McBSP activity to block deeper idle states in the hardware but I don't think we have such a configuration available.
I wonder why we have not seen this before? I can not recall anything like this with n900 (Jarkko might know that better) neither with n9/n950. On the n9/n950 I have even put the OMAP3 to OFF during audio playback with the DAC33 and still it worked just fine. On the twl4030 audio side we used 1024 or something as threshold during audio playback and we do not had QoS placed at all. The C state numbers must be different what we had and what we have in mainline now days, but still this is something I don't expect to happen.
Why McBSP is not working correctly in smart-idle mode in upstream, when it appeared to be working fine on n900/n9/n950?
So we just want to tell pm_qos that McBSP can't take any idle states shorter than 30 ms to prevent cutting off power fro the whole SoC.
based on the numbers in the arch/arm/mach-omap2/cpuidle34xx.c we are trying to block C7 since the exit_latency for it is (10000 + 30000)us, right? Or is it the traget_residency we need to look at when we set PM_QOS_CPU_DMA_LATENCY?
Still in case of 44.1KHz/Stereo, FIFO threshold at 128 we will have DMA requests coming at 1.45ms with the whole FIFO holding 14.51ms of data. In order to avoid draining the FIFO, the latency for omap3.mcbsp2 when the audio is 44.1KHz, stereo must be set to 14.51ms maximum, but I would set it to: (1000/sampling_rate) * ((FIFOsize - threshold) / channels) in ms.
Certainly not 30ms. What happens if someone changes the C7 state number and it is going to be less than 30ms? We will hit C7 and have the same issue? I suppose yes. This is my main problem with the 30ms. It is just an arbitrary number which matches some number in cpuidle34xx.c, but it has nothing to do with the real requirement of McBSP, which is that we should not hit a state where the wakeup is going to take more time than what we can compensate from the McBSP FIFO.
Furthermore: in case of other McBSPs where they have 128 word FIFO, which holds in your use case 1.45ms of audio, how it will survive the C6 wakeup latency of (3000 + 8500)us? It can tolerate C3 at maximum, C4 takes (1500 + 1800)us.
Sure, we will not going to be able to hit C6 with this based on the numbers we have in cpuidle34xx.c, but I have no view on how those numbers were calculated...
Do you see anything in the kernel log? If McBSP is not powered and DMA tries to write data we should see l3 error at least. If the reason is that the system is not able to wake up fast enough to react to the DMA request from McBSP, then we should see FIFO underrun from McBSP (this patch will show them: http://mailman.alsa-project.org/pipermail/alsa-devel/2016-August/111671.html)
No, nothing as both McBSP and DMA are powered off :) Coming back from idle, from McBSP and DMA point of view things appear as if nothing happened. I think we're still missing the save and restore of the McBSP state, but that's a different story and does not help with the fact that we allow cutting off the power to McBSP during playback.
With Linux next with the patch above, I do see occasional interrupts for "omap-mcbsp 49022000.mcbsp: TX Buffer Overflow!" but those seem to also happen if I keep the system busy and I don't hear any dropped samples.
Have you tried with different sampling rates? That would have similar effect as changing the FIFO threshold for the DMA request periods.
Not very much, but this seems more like a SoC specific issue.
* Peter Ujfalusi peter.ujfalusi@ti.com [160902 00:30]:
On 09/01/16 17:50, Tony Lindgren wrote:
- Peter Ujfalusi peter.ujfalusi@ti.com [160831 23:58]:
What is the audio format you are using? sampling rate and how many channels? cat /proc/asound/card0/pcm0p/sub0/hw_params
Just 44.1 stereo mp3 or wav files.
What happens if you set the FIFO to 256? Do you still need the 30ms or it can be higher, like 60ms?
No this is not related to the FIFO size. We just need to block off idle mode for cpuidle as the McBSP hardware is not blocking it.
So this happens with any threshold value? Even in element mode, where we have the threshold set to 2?
Probably. How can I test the above?
When the FIFO is set to 128, it means that after the initial FIFO fill we will have DMA request coming from McBSP to sDMA with a rate of:
(1000/sampling_rate) * (FIFO-threshold / channels) = DMA_req_distance_in_ms
So in case of 44.1KHz, stereo with 128 FIFO threshold DMA request will come at every 1.45ms. If I'm not mistaken. The whole FIFO (1280) holds 14.51ms of audio in this case.
I don't see this correlate with the 30ms at all.
It seems we easily have a situation where DMA is done buffering to McBSP, and PMIC is playing audio, and we hit idle. At that point there are no immediate timers pending and cpuidle determines we can try to hit a deeper idle mode. As there are no hardware blockers with DMA off and McBSP not blocking, the hardware hits off mode. This cuts power to McBSP.
Ideally we'd configure McBSP activity to block deeper idle states in the hardware but I don't think we have such a configuration available.
I wonder why we have not seen this before? I can not recall anything like this with n900 (Jarkko might know that better) neither with n9/n950. On the n9/n950 I have even put the OMAP3 to OFF during audio playback with the DAC33 and still it worked just fine. On the twl4030 audio side we used 1024 or something as threshold during audio playback and we do not had QoS placed at all. The C state numbers must be different what we had and what we have in mainline now days, but still this is something I don't expect to happen.
Why McBSP is not working correctly in smart-idle mode in upstream, when it appeared to be working fine on n900/n9/n950?
I think the difference is that the audio chip on n900/n9/n950 is a buffering one while the pmic is not. Plus I think we're also missing McBSP state save and restore in the mainline kernel for off mode. And we're missing the piece of code that in the Nokia kernel blocked off idle when McBSP was active. And we're missing some kind of wakeirq that tells to wake up McBSP, restore it's state and refill the fifo.
The wakeirq could be anything like audio codec GPIO in bank one or PMIC GPIO. Or it could be just a Linux timer that wakes up the system to refill the fifo.
So we just want to tell pm_qos that McBSP can't take any idle states shorter than 30 ms to prevent cutting off power fro the whole SoC.
based on the numbers in the arch/arm/mach-omap2/cpuidle34xx.c we are trying to block C7 since the exit_latency for it is (10000 + 30000)us, right? Or is it the traget_residency we need to look at when we set PM_QOS_CPU_DMA_LATENCY?
Yeah anything blocking C7 is enough because McBSP can't survive off mode.
Still in case of 44.1KHz/Stereo, FIFO threshold at 128 we will have DMA requests coming at 1.45ms with the whole FIFO holding 14.51ms of data. In order to avoid draining the FIFO, the latency for omap3.mcbsp2 when the audio is 44.1KHz, stereo must be set to 14.51ms maximum, but I would set it to: (1000/sampling_rate) * ((FIFOsize - threshold) / channels) in ms.
Yes if it was fifo related I'd agree :) But it's not, see below.
I think the reason why we don't need to set any pm_qos latency limit with retention idle is that the McBSP hardware will automatigally wake up the system to a fifo threshold interrupt or a timer which will trigger another dma transaction. As long as the wakeup latency from retention idle is less than the time needed to refill the fifo, everything works automagically in the hardware. So the calculation above is not needed at all and unnecessarily blocks core retention during idle.
Certainly not 30ms. What happens if someone changes the C7 state number and it is going to be less than 30ms? We will hit C7 and have the same issue? I suppose yes.
That value is a hardware limitation of the SoC for the only known working case of off idle in the mainline kernel. But yes it would have to be SoC specific if we ever get let's say omap4 off idle working in the mainline kernel. In that case I'd make SoC specific using someting like compatible = "ti,omap4-mcbsp".
This is my main problem with the 30ms. It is just an arbitrary number which matches some number in cpuidle34xx.c, but it has nothing to do with the real requirement of McBSP, which is that we should not hit a state where the wakeup is going to take more time than what we can compensate from the McBSP FIFO.
Yeah I agree this number is somewhat artificial as it is a value that's a limitation of the SoC and not the McBSP. I don't know a better way to block C7 in a Linux generic way though when McBSP is active.
Furthermore: in case of other McBSPs where they have 128 word FIFO, which holds in your use case 1.45ms of audio, how it will survive the C6 wakeup latency of (3000 + 8500)us? It can tolerate C3 at maximum, C4 takes (1500 + 1800)us.
That's because the hardware or a timer triggers the next dma automagically so we don't need to do anything.
Sure, we will not going to be able to hit C6 with this based on the numbers we have in cpuidle34xx.c, but I have no view on how those numbers were calculated...
Right, but we don't need to block C6 because of the hardware and/or Linux timers doing things for us :)
Regards,
Tony
On 09/02/2016 05:56 PM, Tony Lindgren wrote:
- Peter Ujfalusi peter.ujfalusi@ti.com [160902 00:30]:
On 09/01/16 17:50, Tony Lindgren wrote:
- Peter Ujfalusi peter.ujfalusi@ti.com [160831 23:58]:
What is the audio format you are using? sampling rate and how many channels? cat /proc/asound/card0/pcm0p/sub0/hw_params
Just 44.1 stereo mp3 or wav files.
What happens if you set the FIFO to 256? Do you still need the 30ms or it can be higher, like 60ms?
No this is not related to the FIFO size. We just need to block off idle mode for cpuidle as the McBSP hardware is not blocking it.
So this happens with any threshold value? Even in element mode, where we have the threshold set to 2?
Probably. How can I test the above?
set 'element' mode to the used McBSP's dma_op_mode file in sysfs. When in 'threshold' mode, you can change the FIFO threshold.
When the FIFO is set to 128, it means that after the initial FIFO fill we will have DMA request coming from McBSP to sDMA with a rate of:
(1000/sampling_rate) * (FIFO-threshold / channels) = DMA_req_distance_in_ms
So in case of 44.1KHz, stereo with 128 FIFO threshold DMA request will come at every 1.45ms. If I'm not mistaken. The whole FIFO (1280) holds 14.51ms of audio in this case.
I don't see this correlate with the 30ms at all.
It seems we easily have a situation where DMA is done buffering to McBSP, and PMIC is playing audio, and we hit idle. At that point there are no immediate timers pending and cpuidle determines we can try to hit a deeper idle mode. As there are no hardware blockers with DMA off and McBSP not blocking, the hardware hits off mode. This cuts power to McBSP.
Ideally we'd configure McBSP activity to block deeper idle states in the hardware but I don't think we have such a configuration available.
I wonder why we have not seen this before? I can not recall anything like this with n900 (Jarkko might know that better) neither with n9/n950. On the n9/n950 I have even put the OMAP3 to OFF during audio playback with the DAC33 and still it worked just fine. On the twl4030 audio side we used 1024 or something as threshold during audio playback and we do not had QoS placed at all. The C state numbers must be different what we had and what we have in mainline now days, but still this is something I don't expect to happen.
Why McBSP is not working correctly in smart-idle mode in upstream, when it appeared to be working fine on n900/n9/n950?
I think the difference is that the audio chip on n900/n9/n950 is a buffering one while the pmic is not.
n900 uses aic3x for audio with McBSP2. On n9/n950 we have dac33 with McBSP2 and twl5030 on McBSP3. The DAC33 can be set in bypass mode when it is not buffering.
Plus I think we're also missing McBSP state save and restore in the mainline kernel for off mode.
In Nokia kernel we did not had suspend/resume either in mcbsp driver.
And we're missing the piece of code that in the Nokia kernel blocked off idle when McBSP was active. And we're missing some kind of wakeirq that tells to wake up McBSP, restore it's state and refill the fifo.
Hrm, it is possible that the C state numbers in Nokia kernel were adjusted, but I don't see anything in the McBSP or audio codes to do this.
The wakeirq could be anything like audio codec GPIO in bank one or PMIC GPIO. Or it could be just a Linux timer that wakes up the system to refill the fifo.
Yeah, for the DAC3 mode7lp mode we had the gpio irq to wake the system up and there I have set the QoS to be able to wake up in time.
So we just want to tell pm_qos that McBSP can't take any idle states shorter than 30 ms to prevent cutting off power fro the whole SoC.
based on the numbers in the arch/arm/mach-omap2/cpuidle34xx.c we are trying to block C7 since the exit_latency for it is (10000 + 30000)us, right? Or is it the traget_residency we need to look at when we set PM_QOS_CPU_DMA_LATENCY?
Yeah anything blocking C7 is enough because McBSP can't survive off mode.
Still in case of 44.1KHz/Stereo, FIFO threshold at 128 we will have DMA requests coming at 1.45ms with the whole FIFO holding 14.51ms of data. In order to avoid draining the FIFO, the latency for omap3.mcbsp2 when the audio is 44.1KHz, stereo must be set to 14.51ms maximum, but I would set it to: (1000/sampling_rate) * ((FIFOsize - threshold) / channels) in ms.
Yes if it was fifo related I'd agree :) But it's not, see below.
I think the reason why we don't need to set any pm_qos latency limit with retention idle is that the McBSP hardware will automatigally wake up the system to a fifo threshold interrupt or a timer which will trigger another dma transaction. As long as the wakeup latency from retention idle is less than the time needed to refill the fifo, everything works automagically in the hardware. So the calculation above is not needed at all and unnecessarily blocks core retention during idle.
Certainly not 30ms. What happens if someone changes the C7 state number and it is going to be less than 30ms? We will hit C7 and have the same issue? I suppose yes.
That value is a hardware limitation of the SoC for the only known working case of off idle in the mainline kernel. But yes it would have to be SoC specific if we ever get let's say omap4 off idle working in the mainline kernel. In that case I'd make SoC specific using someting like compatible = "ti,omap4-mcbsp".
This is my main problem with the 30ms. It is just an arbitrary number which matches some number in cpuidle34xx.c, but it has nothing to do with the real requirement of McBSP, which is that we should not hit a state where the wakeup is going to take more time than what we can compensate from the McBSP FIFO.
Yeah I agree this number is somewhat artificial as it is a value that's a limitation of the SoC and not the McBSP. I don't know a better way to block C7 in a Linux generic way though when McBSP is active.
Furthermore: in case of other McBSPs where they have 128 word FIFO, which holds in your use case 1.45ms of audio, how it will survive the C6 wakeup latency of (3000 + 8500)us? It can tolerate C3 at maximum, C4 takes (1500 + 1800)us.
That's because the hardware or a timer triggers the next dma automagically so we don't need to do anything.
McBSP is triggering thee DMA request. In case of non OMAP3.McBSP2 the time between them is shorter than 1.45ms. as the maximum FIFO size is 128. But in your case you also have threshold of 128, which means that we have DMA requests coming in every 1.45ms. In theory no audio should be working on OMAP3 if we can hit C7 runtime.
Sure, we will not going to be able to hit C6 with this based on the numbers we have in cpuidle34xx.c, but I have no view on how those numbers were calculated...
Right, but we don't need to block C6 because of the hardware and/or Linux timers doing things for us :)
But the correct QoS latency for McBSP is the one which would ensure that the FIFO will be not drained. This is the whole point of PM_QOS. And that is: (1000/sampling_rate) * ((FIFOsize - threshold) / channels) In case of playback
On OMAP3.McBSP2 for example (44.1KHz, stereo): FIFO threshold 128 - DMA request will be triggered when 128 slots are free in the FIFO - at that point we have still 1152 words in the FIFO. - if the C wakeup latency is longer then what it takes to play out the samples from the FIFO (13.06ms), we will drain the FIFO and got underflow. - in this case the QOS should be set as 13.06ms
FIFO threshold 1024 - DMA request will be triggered when 1024 slots are free in the FIFO - at that point we have still 256 words in the FIFO. - if the C wakeup latency is longer then what it takes to play out the samples from the FIFO (2.9ms), we will drain the FIFO and got underflow. - in this case the QOS should be 2.9ms
On other McBSPs with 128 word FIFO the required latency is shorter to ensure we don't drain the FIFO.
IMHO if the driver sets the PM_QOS, it should set it in a way it is needing it and not to work around system issues.
I don't know the PM code that well, but is there a way to set attribute to a device to tell how deep C state it can tolerate on the given board or SoC?
* Peter Ujfalusi peter.ujfalusi@ti.com [160902 12:39]:
On 09/02/2016 05:56 PM, Tony Lindgren wrote:
That's because the hardware or a timer triggers the next dma automagically so we don't need to do anything.
McBSP is triggering thee DMA request. In case of non OMAP3.McBSP2 the time between them is shorter than 1.45ms. as the maximum FIFO size is 128. But in your case you also have threshold of 128, which means that we have DMA requests coming in every 1.45ms. In theory no audio should be working on OMAP3 if we can hit C7 runtime.
I agree C7 audio playback should only work if the external audio chip is buffering. And there needs to be a wakeup event from the external audio chip based on the external audio fifo threshold to wake up the SoC and queue up more data. Or a hrtimer in the external audio chip that wakes up the SoC and McBSP.
Sure, we will not going to be able to hit C6 with this based on the numbers we have in cpuidle34xx.c, but I have no view on how those numbers were calculated...
Right, but we don't need to block C6 because of the hardware and/or Linux timers doing things for us :)
But the correct QoS latency for McBSP is the one which would ensure that the FIFO will be not drained. This is the whole point of PM_QOS. And that is: (1000/sampling_rate) * ((FIFOsize - threshold) / channels) In case of playback
On OMAP3.McBSP2 for example (44.1KHz, stereo): FIFO threshold 128
- DMA request will be triggered when 128 slots are free in the FIFO
- at that point we have still 1152 words in the FIFO.
- if the C wakeup latency is longer then what it takes to play out the
samples from the FIFO (13.06ms), we will drain the FIFO and got underflow.
- in this case the QOS should be set as 13.06ms
FIFO threshold 1024
- DMA request will be triggered when 1024 slots are free in the FIFO
- at that point we have still 256 words in the FIFO.
- if the C wakeup latency is longer then what it takes to play out the
samples from the FIFO (2.9ms), we will drain the FIFO and got underflow.
- in this case the QOS should be 2.9ms
On other McBSPs with 128 word FIFO the required latency is shorter to ensure we don't drain the FIFO.
IMHO if the driver sets the PM_QOS, it should set it in a way it is needing it and not to work around system issues.
Hmm well actually with the fifo threshold your calculations above start making some sense. The missing piece is that we will never hit off mode until DMA is done. So the true worst case would be:
- 2.9 - 13.06 ms for the fifo threshold minus dma setup time - plus the time it takes to complete the fifo fill with dma as dma will keep the SoC active - plus the time it takes to idle the dma after the last fifo fill as the dma will keep SoC active
The dma setup, complete, and idle latencies here can be probably be measured with hrtimer :) That still does not explain we don't miss anything in retention idle currently.
I don't know the PM code that well, but is there a way to set attribute to a device to tell how deep C state it can tolerate on the given board or SoC?
I believe we can only set the pm_qos latency requirements and there is no direct limiting of C states. Then I think the idea of dev_pm_qos and set_latency_tolerance callback is that it allows the SoC specific code to select the allowed C states. So if we implemented set_latency_tolerance in the cpuidle driver, we could tinker directly with the C states for latencies.
Regards,
Tony
On 09/02/16 23:40, Tony Lindgren wrote:
- Peter Ujfalusi peter.ujfalusi@ti.com [160902 12:39]:
On 09/02/2016 05:56 PM, Tony Lindgren wrote:
That's because the hardware or a timer triggers the next dma automagically so we don't need to do anything.
McBSP is triggering thee DMA request. In case of non OMAP3.McBSP2 the time between them is shorter than 1.45ms. as the maximum FIFO size is 128. But in your case you also have threshold of 128, which means that we have DMA requests coming in every 1.45ms. In theory no audio should be working on OMAP3 if we can hit C7 runtime.
I agree C7 audio playback should only work if the external audio chip is buffering. And there needs to be a wakeup event from the external audio chip based on the external audio fifo threshold to wake up the SoC and queue up more data. Or a hrtimer in the external audio chip that wakes up the SoC and McBSP.
The only C state we can not take is where the McBSP would need context save/restore... In that case the McBSP FIFO will be lost and there is no way to recover that. If the external chip has FIFO we could try to somehow make sure that at the end of a codec FIFO fill the McBSP FIFO is empty. I don't think it is possible at all. Another way is to check the amount of data in the FIFO before off and with CPU try to write back the data from the ALSA buffer - from DMA point we step back and read the data... But if it is in period boundary, the previous period might be already updated -> corrupted audio. The sDMA also have internal FIFO which holds unknown amount of data -> audio corruption, etc.
If the codec does not have FIFO, we can not hit a C state where McBSP would be off at all.
Sure, we will not going to be able to hit C6 with this based on the numbers we have in cpuidle34xx.c, but I have no view on how those numbers were calculated...
Right, but we don't need to block C6 because of the hardware and/or Linux timers doing things for us :)
But the correct QoS latency for McBSP is the one which would ensure that the FIFO will be not drained. This is the whole point of PM_QOS. And that is: (1000/sampling_rate) * ((FIFOsize - threshold) / channels) In case of playback
On OMAP3.McBSP2 for example (44.1KHz, stereo): FIFO threshold 128
- DMA request will be triggered when 128 slots are free in the FIFO
- at that point we have still 1152 words in the FIFO.
- if the C wakeup latency is longer then what it takes to play out the
samples from the FIFO (13.06ms), we will drain the FIFO and got underflow.
- in this case the QOS should be set as 13.06ms
FIFO threshold 1024
- DMA request will be triggered when 1024 slots are free in the FIFO
- at that point we have still 256 words in the FIFO.
- if the C wakeup latency is longer then what it takes to play out the
samples from the FIFO (2.9ms), we will drain the FIFO and got underflow.
- in this case the QOS should be 2.9ms
On other McBSPs with 128 word FIFO the required latency is shorter to ensure we don't drain the FIFO.
IMHO if the driver sets the PM_QOS, it should set it in a way it is needing it and not to work around system issues.
Hmm well actually with the fifo threshold your calculations above start making some sense. The missing piece is that we will never hit off mode until DMA is done. So the true worst case would be:
- 2.9 - 13.06 ms for the fifo threshold minus dma setup time
- plus the time it takes to complete the fifo fill with dma as dma will keep the SoC active
- plus the time it takes to idle the dma after the last fifo fill as the dma will keep SoC active
Yeah, I have done this calculations for the DADC33 mode7LP mode :D
While the calculation is correct for the wake latency requirements, there is a flip side: FIFO threshold 128: - wake threshold is ~13.06ms to ensure that we don't drain the FIFO - DMA requests are coming at 1.45ms rate.
While we could take a C state which would take ~13.06ms to leave, in reality the system will be busy to respond to the DMA request coming in every 1.45ms.
FIFO threshold is 1024: - wake threshold is ~2.9ms to ensure that we don't drain the FIFO - DMA requests are coming at 11.6ms rate.
In this case we only going to allow C state from which we can leave under 2.9ms, but between DMA burst we have 11.6ms. We could be in deeper state, but we are going to be woken up by the DMA request and after the DMA request we have ~2.9ms before the FIFO is empty...
So either we allow deeper C state (threshold 128) which we might not able to take to the 'fast' DMA requests, or we only allow shallow C state (threshold 1024), and have more time between the DMA requests.
Is this makes sense?
The dma setup, complete, and idle latencies here can be probably be measured with hrtimer :) That still does not explain we don't miss anything in retention idle currently.
I don't know the PM code that well, but is there a way to set attribute to a device to tell how deep C state it can tolerate on the given board or SoC?
I believe we can only set the pm_qos latency requirements and there is no direct limiting of C states. Then I think the idea of dev_pm_qos and set_latency_tolerance callback is that it allows the SoC specific code to select the allowed C states. So if we implemented set_latency_tolerance in the cpuidle driver, we could tinker directly with the C states for latencies.
What we can agree on is that OFF need to be blocked when audio is in use. But I'm not sure what is the correct way.
* Peter Ujfalusi peter.ujfalusi@ti.com [160905 00:53]:
On 09/02/16 23:40, Tony Lindgren wrote:
Hmm well actually with the fifo threshold your calculations above start making some sense. The missing piece is that we will never hit off mode until DMA is done. So the true worst case would be:
- 2.9 - 13.06 ms for the fifo threshold minus dma setup time
- plus the time it takes to complete the fifo fill with dma as dma will keep the SoC active
- plus the time it takes to idle the dma after the last fifo fill as the dma will keep SoC active
Yeah, I have done this calculations for the DADC33 mode7LP mode :D
While the calculation is correct for the wake latency requirements, there is a flip side: FIFO threshold 128:
- wake threshold is ~13.06ms to ensure that we don't drain the FIFO
- DMA requests are coming at 1.45ms rate.
While we could take a C state which would take ~13.06ms to leave, in reality the system will be busy to respond to the DMA request coming in every 1.45ms.
FIFO threshold is 1024:
- wake threshold is ~2.9ms to ensure that we don't drain the FIFO
- DMA requests are coming at 11.6ms rate.
In this case we only going to allow C state from which we can leave under 2.9ms, but between DMA burst we have 11.6ms. We could be in deeper state, but we are going to be woken up by the DMA request and after the DMA request we have ~2.9ms before the FIFO is empty...
So either we allow deeper C state (threshold 128) which we might not able to take to the 'fast' DMA requests, or we only allow shallow C state (threshold 1024), and have more time between the DMA requests.
Is this makes sense?
Makes sense. Do you have a formula or updated patch I can test here? Then we can add comments about the possible unaccounted latencies that can be worked out as needed.
The dma setup, complete, and idle latencies here can be probably be measured with hrtimer :) That still does not explain we don't miss anything in retention idle currently.
I don't know the PM code that well, but is there a way to set attribute to a device to tell how deep C state it can tolerate on the given board or SoC?
I believe we can only set the pm_qos latency requirements and there is no direct limiting of C states. Then I think the idea of dev_pm_qos and set_latency_tolerance callback is that it allows the SoC specific code to select the allowed C states. So if we implemented set_latency_tolerance in the cpuidle driver, we could tinker directly with the C states for latencies.
What we can agree on is that OFF need to be blocked when audio is in use. But I'm not sure what is the correct way.
AFAIK pm_qos is the only Linux generic way to block certain C states.
Regards,
Tony
On 09/06/2016 11:16 PM, Tony Lindgren wrote:
While the calculation is correct for the wake latency requirements, there is a flip side: FIFO threshold 128:
- wake threshold is ~13.06ms to ensure that we don't drain the FIFO
- DMA requests are coming at 1.45ms rate.
While we could take a C state which would take ~13.06ms to leave, in reality the system will be busy to respond to the DMA request coming in every 1.45ms.
FIFO threshold is 1024:
- wake threshold is ~2.9ms to ensure that we don't drain the FIFO
- DMA requests are coming at 11.6ms rate.
In this case we only going to allow C state from which we can leave under 2.9ms, but between DMA burst we have 11.6ms. We could be in deeper state, but we are going to be woken up by the DMA request and after the DMA request we have ~2.9ms before the FIFO is empty...
So either we allow deeper C state (threshold 128) which we might not able to take to the 'fast' DMA requests, or we only allow shallow C state (threshold 1024), and have more time between the DMA requests.
Is this makes sense?
Makes sense. Do you have a formula or updated patch I can test here? Then we can add comments about the possible unaccounted latencies that can be worked out as needed.
Not yet, but I'll try to come up with something in the coming days.
btw: I have tried the idle off on beagbeboard-xm with audio, but even w/o the qos it is not triggering. w/o audio I hit off but if audio is running, not.
btw2: if you set the qos to 30ms and set the mcbsp2 threshold to 1024 (or leave it as default) do you have audio glitches? I think if we hit C6 we should not be able to wake up in time...
The dma setup, complete, and idle latencies here can be probably be measured with hrtimer :) That still does not explain we don't miss anything in retention idle currently.
I don't know the PM code that well, but is there a way to set attribute to a device to tell how deep C state it can tolerate on the given board or SoC?
I believe we can only set the pm_qos latency requirements and there is no direct limiting of C states. Then I think the idea of dev_pm_qos and set_latency_tolerance callback is that it allows the SoC specific code to select the allowed C states. So if we implemented set_latency_tolerance in the cpuidle driver, we could tinker directly with the C states for latencies.
What we can agree on is that OFF need to be blocked when audio is in use. But I'm not sure what is the correct way.
AFAIK pm_qos is the only Linux generic way to block certain C states.
Regards,
Tony
* Peter Ujfalusi peter.ujfalusi@ti.com [160907 07:32]:
On 09/06/2016 11:16 PM, Tony Lindgren wrote:
Makes sense. Do you have a formula or updated patch I can test here? Then we can add comments about the possible unaccounted latencies that can be worked out as needed.
Not yet, but I'll try to come up with something in the coming days.
OK
btw: I have tried the idle off on beagbeboard-xm with audio, but even w/o the qos it is not triggering. w/o audio I hit off but if audio is running, not.
OK yeah figured it out you need to comment out the watch -n5 command as that keeps the serial console busy :) I was doing it over ssh earlier on the torpedo board.
btw2: if you set the qos to 30ms and set the mcbsp2 threshold to 1024 (or leave it as default) do you have audio glitches? I think if we hit C6 we should not be able to wake up in time...
Not that I've noticed at least with the default fifo size. How can I force the threshold to 1024? I tried changing pkt_size for omap_mcbsp_set_threshold but not getting any audio.
Regards,
Tony
On 09/08/16 06:53, Tony Lindgren wrote:
- Peter Ujfalusi peter.ujfalusi@ti.com [160907 07:32]:
On 09/06/2016 11:16 PM, Tony Lindgren wrote:
Makes sense. Do you have a formula or updated patch I can test here? Then we can add comments about the possible unaccounted latencies that can be worked out as needed.
Not yet, but I'll try to come up with something in the coming days.
OK
I managed to get time for this and it looks like that we only need to block core OFF for audio. I don't know why C6 ("MPU OFF + CORE RET") with it's 3000 + 8500 exit latency not causing issues when we have less time from the DMA request to empty FIFO...
My findings so far: w/o the QoS patch we will hit core OFF even if we are in element mode in McBSP: # cat /sys/devices/platform/68000000.ocp/49022000.mcbsp/dma_op_mode [element] threshold
If I start the playback we will hit core OFF even if the we have DMA request coming at every sample (every 0.02ms).
With the QoS set to 30ms and: # cat /sys/devices/platform/68000000.ocp/49022000.mcbsp/dma_op_mode element [threshold] # cat /sys/devices/platform/68000000.ocp/49022000.mcbsp/max_tx_thres 1264
# cat /sys/kernel/debug/pm_debug/count; aplay -Dplughw:0,0 media/music/nin-echoplex.wav; cat /sys/kernel/debug/pm_debug/count
core_pwrdm (ON),OFF:1885,RET:6777,INA:86320,ON:94983,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0,RET-MEMBANK2-OFF:0 per_pwrdm (ON),OFF:2507,RET:1518,INA:0,ON:4026,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0 mpu_pwrdm (ON),OFF:80588,RET:7876,INA:97390,ON:185855,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0
Playing WAVE 'media/music/nin-echoplex.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo
core_pwrdm (ON),OFF:1885,RET:7860,INA:107434,ON:117180,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0,RET-MEMBANK2-OFF:0 per_pwrdm (ON),OFF:2507,RET:1518,INA:0,ON:4026,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0 mpu_pwrdm (ON),OFF:103223,RET:8185,INA:97461,ON:208870,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0
We hit C6, but when the DMA request comes we have 16 words in the FIFO (8 samples, 0.16ms of audio). C6 should take 11.5ms to exit...
btw: I have tried the idle off on beagbeboard-xm with audio, but even w/o the qos it is not triggering. w/o audio I hit off but if audio is running, not.
OK yeah figured it out you need to comment out the watch -n5 command as that keeps the serial console busy :) I was doing it over ssh earlier on the torpedo board.
btw2: if you set the qos to 30ms and set the mcbsp2 threshold to 1024 (or leave it as default) do you have audio glitches? I think if we hit C6 we should not be able to wake up in time...
Not that I've noticed at least with the default fifo size. How can I force the threshold to 1024? I tried changing pkt_size for omap_mcbsp_set_threshold but not getting any audio.
Based on my experiments the FIFO threshold does not matter as even if we should not be able to leave from a C state in time, we do leave from the state :o
The patch generates the following warning when the playback is stopped: [ 115.726531] [ 115.728149] ================================= [ 115.732727] [ INFO: inconsistent lock state ] [ 115.737304] 4.8.0-rc3-next-20160825-00064-g957cf3635c39-dirty #1450 Not tainted [ 115.744964] --------------------------------- [ 115.749542] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 115.755859] swapper/0/0 [HC0[0]:SC1[1]:HE0:SE0] takes: [ 115.761230] ((&(&req->work)->work)){+.?...}, at: [<c0152fe0>] flush_work+0x0/0x260 [ 115.769348] {SOFTIRQ-ON-W} state was registered at: [ 115.774475] [<c015301c>] flush_work+0x3c/0x260 [ 115.779357] [<c0155b74>] __cancel_work_timer+0x90/0x1c0 [ 115.785034] [<c01999c8>] pm_qos_update_request+0x28/0x54 [ 115.790802] [<c01549ec>] process_one_work+0x1fc/0x770 [ 115.796295] [<c0154f98>] worker_thread+0x38/0x550 [ 115.801452] [<c015b0b8>] kthread+0xd4/0xf0 [ 115.805969] [<c0107910>] ret_from_fork+0x14/0x24 [ 115.811004] irq event stamp: 191097 [ 115.814666] hardirqs last enabled at (191096): [<c07f8fb8>] _raw_spin_unlock_irq+0x24/0x2c [ 115.823455] hardirqs last disabled at (191097): [<c0663f30>] _snd_pcm_stream_lock_irqsave+0x2c/0x40 [ 115.832977] softirqs last enabled at (191090): [<c013e5dc>] irq_enter+0x68/0x84 [ 115.840759] softirqs last disabled at (191091): [<c013e6b8>] irq_exit+0xc0/0x134 [ 115.848541] [ 115.848541] other info that might help us debug this: [ 115.855407] Possible unsafe locking scenario: [ 115.855407] [ 115.861602] CPU0 [ 115.864166] ---- [ 115.866760] lock((&(&req->work)->work)); [ 115.871063] <Interrupt> [ 115.873809] lock((&(&req->work)->work)); [ 115.878326] [ 115.878326] *** DEADLOCK *** [ 115.878326] [ 115.884552] 2 locks held by swapper/0/0: [ 115.888671] #0: (snd_pcm_link_rwlock){...-..}, at: [<c0663e9c>] snd_pcm_stream_lock+0x20/0x50 [ 115.897857] #1: (&(&substream->self_group.lock)->rlock){..-...}, at: [<c0663f38>] _snd_pcm_stream_lock_irqsave+0x34/0x40 [ 115.909515] [ 115.909515] stack backtrace: [ 115.914123] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc3-next-20160825-00064-g957cf3635c39-dirty #1450 [ 115.924530] Hardware name: Generic OMAP36xx (Flattened Device Tree) [ 115.931121] [<c0110128>] (unwind_backtrace) from [<c010c2c0>] (show_stack+0x10/0x14) [ 115.939270] [<c010c2c0>] (show_stack) from [<c048c120>] (dump_stack+0xb0/0xe4) [ 115.946868] [<c048c120>] (dump_stack) from [<c0192708>] (print_usage_bug+0x1e0/0x2d4) [ 115.955108] [<c0192708>] (print_usage_bug) from [<c0192d7c>] (mark_lock+0x580/0x6c4) [ 115.963256] [<c0192d7c>] (mark_lock) from [<c0193a38>] (__lock_acquire+0x5a8/0x187c) [ 115.971374] [<c0193a38>] (__lock_acquire) from [<c0195128>] (lock_acquire+0xd0/0x238) [ 115.979614] [<c0195128>] (lock_acquire) from [<c015301c>] (flush_work+0x3c/0x260) [ 115.987487] [<c015301c>] (flush_work) from [<c0155b74>] (__cancel_work_timer+0x90/0x1c0) [ 115.996002] [<c0155b74>] (__cancel_work_timer) from [<c0199ad8>] (pm_qos_remove_request+0x28/0x1c8) [ 116.005523] [<c0199ad8>] (pm_qos_remove_request) from [<c0684634>] (omap_mcbsp_dai_trigger+0x70/0x8c) [ 116.015197] [<c0684634>] (omap_mcbsp_dai_trigger) from [<c067baa8>] (soc_pcm_trigger+0xd0/0x11c) [ 116.024414] [<c067baa8>] (soc_pcm_trigger) from [<c0663be0>] (snd_pcm_do_stop+0x50/0x54) [ 116.032928] [<c0663be0>] (snd_pcm_do_stop) from [<c06639c4>] (snd_pcm_action_single+0x38/0x80) [ 116.041961] [<c06639c4>] (snd_pcm_action_single) from [<c066aa4c>] (snd_pcm_update_state+0xd0/0x100) [ 116.051574] [<c066aa4c>] (snd_pcm_update_state) from [<c066ac20>] (snd_pcm_update_hw_ptr0+0x1a4/0x3ac) [ 116.061340] [<c066ac20>] (snd_pcm_update_hw_ptr0) from [<c066ae9c>] (snd_pcm_period_elapsed+0x74/0xb4) [ 116.071136] [<c066ae9c>] (snd_pcm_period_elapsed) from [<c04e63f8>] (vchan_complete+0x190/0x1a4) [ 116.080352] [<c04e63f8>] (vchan_complete) from [<c013ee68>] (tasklet_action+0x88/0x128) [ 116.088775] [<c013ee68>] (tasklet_action) from [<c013dee0>] (__do_softirq+0xc4/0x55c) [ 116.096984] [<c013dee0>] (__do_softirq) from [<c013e6b8>] (irq_exit+0xc0/0x134) [ 116.104675] [<c013e6b8>] (irq_exit) from [<c01a17a8>] (__handle_domain_irq+0x6c/0xe0) [ 116.112884] [<c01a17a8>] (__handle_domain_irq) from [<c07f96b0>] (__irq_svc+0x70/0x98) [ 116.121215] [<c07f96b0>] (__irq_svc) from [<c0613808>] (cpuidle_enter_state+0x15c/0x464) [ 116.129699] [<c0613808>] (cpuidle_enter_state) from [<c0187788>] (cpu_startup_entry+0x2b8/0x3e0) [ 116.138946] [<c0187788>] (cpu_startup_entry) from [<c0b00c1c>] (start_kernel+0x340/0x3b4)
There is one more issue: if we have playback and capture running at the same time and you stop one of them. It will remove the QoS and the remaining stream might break.
On 09/08/16 13:41, Peter Ujfalusi wrote:
On 09/08/16 06:53, Tony Lindgren wrote:
- Peter Ujfalusi peter.ujfalusi@ti.com [160907 07:32]:
On 09/06/2016 11:16 PM, Tony Lindgren wrote:
Makes sense. Do you have a formula or updated patch I can test here? Then we can add comments about the possible unaccounted latencies that can be worked out as needed.
Not yet, but I'll try to come up with something in the coming days.
OK
I managed to get time for this and it looks like that we only need to block core OFF for audio. I don't know why C6 ("MPU OFF + CORE RET") with it's 3000
- 8500 exit latency not causing issues when we have less time from the DMA
request to empty FIFO...
My findings so far: w/o the QoS patch we will hit core OFF even if we are in element mode in McBSP: # cat /sys/devices/platform/68000000.ocp/49022000.mcbsp/dma_op_mode [element] threshold
If I start the playback we will hit core OFF even if the we have DMA request coming at every sample (every 0.02ms).
With the QoS set to 30ms and: # cat /sys/devices/platform/68000000.ocp/49022000.mcbsp/dma_op_mode element [threshold] # cat /sys/devices/platform/68000000.ocp/49022000.mcbsp/max_tx_thres 1264
# cat /sys/kernel/debug/pm_debug/count; aplay -Dplughw:0,0 media/music/nin-echoplex.wav; cat /sys/kernel/debug/pm_debug/count
core_pwrdm (ON),OFF:1885,RET:6777,INA:86320,ON:94983,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0,RET-MEMBANK2-OFF:0 per_pwrdm (ON),OFF:2507,RET:1518,INA:0,ON:4026,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0 mpu_pwrdm (ON),OFF:80588,RET:7876,INA:97390,ON:185855,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0
Playing WAVE 'media/music/nin-echoplex.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Stereo
core_pwrdm (ON),OFF:1885,RET:7860,INA:107434,ON:117180,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0,RET-MEMBANK2-OFF:0 per_pwrdm (ON),OFF:2507,RET:1518,INA:0,ON:4026,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0 mpu_pwrdm (ON),OFF:103223,RET:8185,INA:97461,ON:208870,RET-LOGIC-OFF:0,RET-MEMBANK1-OFF:0
We hit C6, but when the DMA request comes we have 16 words in the FIFO (8 samples, 0.16ms of audio). C6 should take 11.5ms to exit...
btw: I have tried the idle off on beagbeboard-xm with audio, but even w/o the qos it is not triggering. w/o audio I hit off but if audio is running, not.
OK yeah figured it out you need to comment out the watch -n5 command as that keeps the serial console busy :) I was doing it over ssh earlier on the torpedo board.
btw2: if you set the qos to 30ms and set the mcbsp2 threshold to 1024 (or leave it as default) do you have audio glitches? I think if we hit C6 we should not be able to wake up in time...
Not that I've noticed at least with the default fifo size. How can I force the threshold to 1024? I tried changing pkt_size for omap_mcbsp_set_threshold but not getting any audio.
Based on my experiments the FIFO threshold does not matter as even if we should not be able to leave from a C state in time, we do leave from the state :o
The patch generates the following warning when the playback is stopped: [ 115.726531] [ 115.728149] ================================= [ 115.732727] [ INFO: inconsistent lock state ] [ 115.737304] 4.8.0-rc3-next-20160825-00064-g957cf3635c39-dirty #1450 Not tainted [ 115.744964] --------------------------------- [ 115.749542] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 115.755859] swapper/0/0 [HC0[0]:SC1[1]:HE0:SE0] takes: [ 115.761230] ((&(&req->work)->work)){+.?...}, at: [<c0152fe0>] flush_work+0x0/0x260 [ 115.769348] {SOFTIRQ-ON-W} state was registered at: [ 115.774475] [<c015301c>] flush_work+0x3c/0x260 [ 115.779357] [<c0155b74>] __cancel_work_timer+0x90/0x1c0 [ 115.785034] [<c01999c8>] pm_qos_update_request+0x28/0x54 [ 115.790802] [<c01549ec>] process_one_work+0x1fc/0x770 [ 115.796295] [<c0154f98>] worker_thread+0x38/0x550 [ 115.801452] [<c015b0b8>] kthread+0xd4/0xf0 [ 115.805969] [<c0107910>] ret_from_fork+0x14/0x24 [ 115.811004] irq event stamp: 191097 [ 115.814666] hardirqs last enabled at (191096): [<c07f8fb8>] _raw_spin_unlock_irq+0x24/0x2c [ 115.823455] hardirqs last disabled at (191097): [<c0663f30>] _snd_pcm_stream_lock_irqsave+0x2c/0x40 [ 115.832977] softirqs last enabled at (191090): [<c013e5dc>] irq_enter+0x68/0x84 [ 115.840759] softirqs last disabled at (191091): [<c013e6b8>] irq_exit+0xc0/0x134 [ 115.848541] [ 115.848541] other info that might help us debug this: [ 115.855407] Possible unsafe locking scenario: [ 115.855407] [ 115.861602] CPU0 [ 115.864166] ---- [ 115.866760] lock((&(&req->work)->work)); [ 115.871063] <Interrupt> [ 115.873809] lock((&(&req->work)->work)); [ 115.878326] [ 115.878326] *** DEADLOCK *** [ 115.878326] [ 115.884552] 2 locks held by swapper/0/0: [ 115.888671] #0: (snd_pcm_link_rwlock){...-..}, at: [<c0663e9c>] snd_pcm_stream_lock+0x20/0x50 [ 115.897857] #1: (&(&substream->self_group.lock)->rlock){..-...}, at: [<c0663f38>] _snd_pcm_stream_lock_irqsave+0x34/0x40 [ 115.909515] [ 115.909515] stack backtrace: [ 115.914123] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc3-next-20160825-00064-g957cf3635c39-dirty #1450 [ 115.924530] Hardware name: Generic OMAP36xx (Flattened Device Tree) [ 115.931121] [<c0110128>] (unwind_backtrace) from [<c010c2c0>] (show_stack+0x10/0x14) [ 115.939270] [<c010c2c0>] (show_stack) from [<c048c120>] (dump_stack+0xb0/0xe4) [ 115.946868] [<c048c120>] (dump_stack) from [<c0192708>] (print_usage_bug+0x1e0/0x2d4) [ 115.955108] [<c0192708>] (print_usage_bug) from [<c0192d7c>] (mark_lock+0x580/0x6c4) [ 115.963256] [<c0192d7c>] (mark_lock) from [<c0193a38>] (__lock_acquire+0x5a8/0x187c) [ 115.971374] [<c0193a38>] (__lock_acquire) from [<c0195128>] (lock_acquire+0xd0/0x238) [ 115.979614] [<c0195128>] (lock_acquire) from [<c015301c>] (flush_work+0x3c/0x260) [ 115.987487] [<c015301c>] (flush_work) from [<c0155b74>] (__cancel_work_timer+0x90/0x1c0) [ 115.996002] [<c0155b74>] (__cancel_work_timer) from [<c0199ad8>] (pm_qos_remove_request+0x28/0x1c8) [ 116.005523] [<c0199ad8>] (pm_qos_remove_request) from [<c0684634>] (omap_mcbsp_dai_trigger+0x70/0x8c) [ 116.015197] [<c0684634>] (omap_mcbsp_dai_trigger) from [<c067baa8>] (soc_pcm_trigger+0xd0/0x11c) [ 116.024414] [<c067baa8>] (soc_pcm_trigger) from [<c0663be0>] (snd_pcm_do_stop+0x50/0x54) [ 116.032928] [<c0663be0>] (snd_pcm_do_stop) from [<c06639c4>] (snd_pcm_action_single+0x38/0x80) [ 116.041961] [<c06639c4>] (snd_pcm_action_single) from [<c066aa4c>] (snd_pcm_update_state+0xd0/0x100) [ 116.051574] [<c066aa4c>] (snd_pcm_update_state) from [<c066ac20>] (snd_pcm_update_hw_ptr0+0x1a4/0x3ac) [ 116.061340] [<c066ac20>] (snd_pcm_update_hw_ptr0) from [<c066ae9c>] (snd_pcm_period_elapsed+0x74/0xb4) [ 116.071136] [<c066ae9c>] (snd_pcm_period_elapsed) from [<c04e63f8>] (vchan_complete+0x190/0x1a4) [ 116.080352] [<c04e63f8>] (vchan_complete) from [<c013ee68>] (tasklet_action+0x88/0x128) [ 116.088775] [<c013ee68>] (tasklet_action) from [<c013dee0>] (__do_softirq+0xc4/0x55c) [ 116.096984] [<c013dee0>] (__do_softirq) from [<c013e6b8>] (irq_exit+0xc0/0x134) [ 116.104675] [<c013e6b8>] (irq_exit) from [<c01a17a8>] (__handle_domain_irq+0x6c/0xe0) [ 116.112884] [<c01a17a8>] (__handle_domain_irq) from [<c07f96b0>] (__irq_svc+0x70/0x98) [ 116.121215] [<c07f96b0>] (__irq_svc) from [<c0613808>] (cpuidle_enter_state+0x15c/0x464) [ 116.129699] [<c0613808>] (cpuidle_enter_state) from [<c0187788>] (cpu_startup_entry+0x2b8/0x3e0) [ 116.138946] [<c0187788>] (cpu_startup_entry) from [<c0b00c1c>] (start_kernel+0x340/0x3b4)
There is one more issue: if we have playback and capture running at the same time and you stop one of them. It will remove the QoS and the remaining stream might break.
it is better to place the QoS in omap_mcbsp_request() and remove it in omap_mcbsp_free(). This way we retain the QoS for the time the McBSP is in active use.
* Peter Ujfalusi peter.ujfalusi@ti.com [160908 03:49]:
I managed to get time for this and it looks like that we only need to block core OFF for audio. I don't know why C6 ("MPU OFF + CORE RET") with it's 3000
- 8500 exit latency not causing issues when we have less time from the DMA
request to empty FIFO...
Right, the wakeup event should be the fifo threshold and the dma trigger line.. But there must some other thing we're not accounting for.
My findings so far: w/o the QoS patch we will hit core OFF even if we are in element mode in McBSP: # cat /sys/devices/platform/68000000.ocp/49022000.mcbsp/dma_op_mode [element] threshold
If I start the playback we will hit core OFF even if the we have DMA request coming at every sample (every 0.02ms).
I think that's because we still have windows without pm qos where dma is not active and we hit idle and there's nothing in hardware blocking off mode.
We hit C6, but when the DMA request comes we have 16 words in the FIFO (8 samples, 0.16ms of audio). C6 should take 11.5ms to exit...
So maybe we have the C6 latency for 36xx wrong, and it's much faster than for 34xx? I wonder what happens on the original beagleboard, not the xm variant.. I don't have one though.
Based on my experiments the FIFO threshold does not matter as even if we should not be able to leave from a C state in time, we do leave from the state :o
Yes that's a mystery :)
The patch generates the following warning when the playback is stopped:
...
[ 115.996002] [<c0155b74>] (__cancel_work_timer) from [<c0199ad8>] (pm_qos_remove_request+0x28/0x1c8) [ 116.005523] [<c0199ad8>] (pm_qos_remove_request) from [<c0684634>] (omap_mcbsp_dai_trigger+0x70/0x8c) [ 116.015197] [<c0684634>] (omap_mcbsp_dai_trigger) from [<c067baa8>] (soc_pcm_trigger+0xd0/0x11c) [ 116.024414] [<c067baa8>] (soc_pcm_trigger) from [<c0663be0>] (snd_pcm_do_stop+0x50/0x54) [ 116.032928] [<c0663be0>] (snd_pcm_do_stop) from [<c06639c4>] (snd_pcm_action_single+0x38/0x80)
Hmm OK I wonder why have not seen that one, I've been just hitting ctrl-c to stop the playback.
There is one more issue: if we have playback and capture running at the same time and you stop one of them. It will remove the QoS and the remaining stream might break.
it is better to place the QoS in omap_mcbsp_request() and remove it in omap_mcbsp_free(). This way we retain the QoS for the time the McBSP is in active use.
OK makes sense.
Regards,
Tony
On 09/08/16 17:48, Tony Lindgren wrote:
My findings so far: w/o the QoS patch we will hit core OFF even if we are in element mode in McBSP: # cat /sys/devices/platform/68000000.ocp/49022000.mcbsp/dma_op_mode [element] threshold
If I start the playback we will hit core OFF even if the we have DMA request coming at every sample (every 0.02ms).
I think that's because we still have windows without pm qos where dma is not active and we hit idle and there's nothing in hardware blocking off mode.
Hrm. Should we block idle when the FIFO threshold is 'low'. I don't think it is a good thing to try to enter to lower C state to immediately jump out of it. With 0.02ms there is not much time we can spend conserving power and the enter/leave takes a bit more compared to staying wherever we were. ON the other hand, if we have big ALSA buffer we can keep the MPU off for a long time as we only need the DMA and the McBSP to run between user space filling the ALSA buffer...
Oh, I have looked again to cpuidle34xx.c and we have two tables there: omap3_idle_driver and omap3430_idle_driver.
The omap3430_idle_driver numbers are based on measurements while the omap3_idle_driver probably contains safe numbers?
In any way, according to the numbers: C7 is (7505 + 15274) vs (10000 + 30000) C6 is (7580 + 4134) vs (3000 + 8500) C5 is (855 + 1146) vs (2500 + 7500) C4 is (121 + 3374) vs (1500 + 1800) C3 is (107 + 410) vs (50 + 50)
with the 30ms QoS we set we will still hit OFF on OMAP3430, it should minimum 11.715ms for omap3430, but that will block C6 on omap36xx.
If we could have the data for the struct cpuidle_state coming from DT and not wired in the cpuidle34xx/44xx files then the McBSP driver could look up the C7 number and block that...
We hit C6, but when the DMA request comes we have 16 words in the FIFO (8 samples, 0.16ms of audio). C6 should take 11.5ms to exit...
So maybe we have the C6 latency for 36xx wrong, and it's much faster than for 34xx? I wonder what happens on the original beagleboard, not the xm variant.. I don't have one though.
I have one working omap3 zoom2 board. AFAIK it has OMAP3430. I have 3.9.smthing kernel booting on it, but I'm not sure if mainline would even work. Zoom3 and Zoom2 are mostly identical, but the SoC is different for sure and probably memories, NAND, something else on the SOM? Can not find information on this.
Based on my experiments the FIFO threshold does not matter as even if we should not be able to leave from a C state in time, we do leave from the state :o
Yes that's a mystery :)
The patch generates the following warning when the playback is stopped:
...
[ 115.996002] [<c0155b74>] (__cancel_work_timer) from [<c0199ad8>] (pm_qos_remove_request+0x28/0x1c8) [ 116.005523] [<c0199ad8>] (pm_qos_remove_request) from [<c0684634>] (omap_mcbsp_dai_trigger+0x70/0x8c) [ 116.015197] [<c0684634>] (omap_mcbsp_dai_trigger) from [<c067baa8>] (soc_pcm_trigger+0xd0/0x11c) [ 116.024414] [<c067baa8>] (soc_pcm_trigger) from [<c0663be0>] (snd_pcm_do_stop+0x50/0x54) [ 116.032928] [<c0663be0>] (snd_pcm_do_stop) from [<c06639c4>] (snd_pcm_action_single+0x38/0x80)
Hmm OK I wonder why have not seen that one, I've been just hitting ctrl-c to stop the playback.
I also stopped the playback with ctrl-c :o
There is one more issue: if we have playback and capture running at the same time and you stop one of them. It will remove the QoS and the remaining stream might break.
it is better to place the QoS in omap_mcbsp_request() and remove it in omap_mcbsp_free(). This way we retain the QoS for the time the McBSP is in active use.
OK makes sense.
Regards,
Tony
* Peter Ujfalusi peter.ujfalusi@ti.com [160908 23:52]:
On 09/08/16 17:48, Tony Lindgren wrote:
My findings so far: w/o the QoS patch we will hit core OFF even if we are in element mode in McBSP: # cat /sys/devices/platform/68000000.ocp/49022000.mcbsp/dma_op_mode [element] threshold
If I start the playback we will hit core OFF even if the we have DMA request coming at every sample (every 0.02ms).
I think that's because we still have windows without pm qos where dma is not active and we hit idle and there's nothing in hardware blocking off mode.
Hrm. Should we block idle when the FIFO threshold is 'low'. I don't think it is a good thing to try to enter to lower C state to immediately jump out of it. With 0.02ms there is not much time we can spend conserving power and the enter/leave takes a bit more compared to staying wherever we were. ON the other hand, if we have big ALSA buffer we can keep the MPU off for a long time as we only need the DMA and the McBSP to run between user space filling the ALSA buffer...
Oh, I have looked again to cpuidle34xx.c and we have two tables there: omap3_idle_driver and omap3430_idle_driver.
The omap3430_idle_driver numbers are based on measurements while the omap3_idle_driver probably contains safe numbers?
In any way, according to the numbers: C7 is (7505 + 15274) vs (10000 + 30000) C6 is (7580 + 4134) vs (3000 + 8500) C5 is (855 + 1146) vs (2500 + 7500) C4 is (121 + 3374) vs (1500 + 1800) C3 is (107 + 410) vs (50 + 50)
with the 30ms QoS we set we will still hit OFF on OMAP3430, it should minimum 11.715ms for omap3430, but that will block C6 on omap36xx.
Yeah those don't seem to be correct. The Nokia N9(50) kernel seems to have some measured numbers for 36xx.
If we could have the data for the struct cpuidle_state coming from DT and not wired in the cpuidle34xx/44xx files then the McBSP driver could look up the C7 number and block that...
I'm now thinking that your fifo threshold calculation is what we should do, then fix the cpuidle latencies and playback should hit retention idle.
We hit C6, but when the DMA request comes we have 16 words in the FIFO (8 samples, 0.16ms of audio). C6 should take 11.5ms to exit...
So maybe we have the C6 latency for 36xx wrong, and it's much faster than for 34xx? I wonder what happens on the original beagleboard, not the xm variant.. I don't have one though.
I have one working omap3 zoom2 board. AFAIK it has OMAP3430. I have 3.9.smthing kernel booting on it, but I'm not sure if mainline would even work. Zoom3 and Zoom2 are mostly identical, but the SoC is different for sure and probably memories, NAND, something else on the SOM? Can not find information on this.
I would not bother with that one.. Chances are it has an older 3430 SoC that does not properly support off mode. We should be able to use n900 for that test :)
Hmm OK I wonder why have not seen that one, I've been just hitting ctrl-c to stop the playback.
I also stopped the playback with ctrl-c :o
Hmm I need to recheck.
Regards,
Tony
On 09/10/16 02:45, Tony Lindgren wrote:
In any way, according to the numbers: C7 is (7505 + 15274) vs (10000 + 30000) C6 is (7580 + 4134) vs (3000 + 8500) C5 is (855 + 1146) vs (2500 + 7500) C4 is (121 + 3374) vs (1500 + 1800) C3 is (107 + 410) vs (50 + 50)
with the 30ms QoS we set we will still hit OFF on OMAP3430, it should minimum 11.715ms for omap3430, but that will block C6 on omap36xx.
Yeah those don't seem to be correct. The Nokia N9(50) kernel seems to have some measured numbers for 36xx.
True, probably we should give those numbers a try? It looks like they have C1...C8 compared to mainline which have C1...C7.
If we could have the data for the struct cpuidle_state coming from DT and not wired in the cpuidle34xx/44xx files then the McBSP driver could look up the C7 number and block that...
I'm now thinking that your fifo threshold calculation is what we should do, then fix the cpuidle latencies and playback should hit retention idle.
Right. So the QoS should be set time(FIFOsize - FIFOthreshold) - margin? What margin we should use? The DMA does not need setup time as it will just continue where it stopped, so probably we can ignore the margin and use the number we got from the FIFO use? During hw_param we can calculate this, but we need to consider on more thing: we need to make sure that the QoS we place covers the capture and the playback as well, so we need to use the worst case number. the FIFO threshold can be different for capture and playback.
* Peter Ujfalusi peter.ujfalusi@ti.com [160913 04:45]:
On 09/10/16 02:45, Tony Lindgren wrote:
In any way, according to the numbers: C7 is (7505 + 15274) vs (10000 + 30000) C6 is (7580 + 4134) vs (3000 + 8500) C5 is (855 + 1146) vs (2500 + 7500) C4 is (121 + 3374) vs (1500 + 1800) C3 is (107 + 410) vs (50 + 50)
with the 30ms QoS we set we will still hit OFF on OMAP3430, it should minimum 11.715ms for omap3430, but that will block C6 on omap36xx.
Yeah those don't seem to be correct. The Nokia N9(50) kernel seems to have some measured numbers for 36xx.
True, probably we should give those numbers a try? It looks like they have C1...C8 compared to mainline which have C1...C7.
Well some of those numbers are based on OSWR (Open SWitch Retention), not sure if that works properly with mainline. Worth trying though.
If we could have the data for the struct cpuidle_state coming from DT and not wired in the cpuidle34xx/44xx files then the McBSP driver could look up the C7 number and block that...
I'm now thinking that your fifo threshold calculation is what we should do, then fix the cpuidle latencies and playback should hit retention idle.
Right. So the QoS should be set time(FIFOsize - FIFOthreshold) - margin? What margin we should use? The DMA does not need setup time as it will just continue where it stopped, so probably we can ignore the margin and use the number we got from the FIFO use?
Yeah seems safe assuming the mystery numbers are wrong C state latencies :)
During hw_param we can calculate this, but we need to consider on more thing: we need to make sure that the QoS we place covers the capture and the playback as well, so we need to use the worst case number. the FIFO threshold can be different for capture and playback.
OK makes sense to me.
Regards,
Tony
Hi!
When the FIFO is set to 128, it means that after the initial FIFO fill we will have DMA request coming from McBSP to sDMA with a rate of:
(1000/sampling_rate) * (FIFO-threshold / channels) = DMA_req_distance_in_ms
So in case of 44.1KHz, stereo with 128 FIFO threshold DMA request will come at every 1.45ms. If I'm not mistaken. The whole FIFO (1280) holds 14.51ms of audio in this case.
I don't see this correlate with the 30ms at all.
It seems we easily have a situation where DMA is done buffering to McBSP, and PMIC is playing audio, and we hit idle. At that point there are no immediate timers pending and cpuidle determines we can try to hit a deeper idle mode. As there are no hardware blockers with DMA off and McBSP not blocking, the hardware hits off mode. This cuts power to McBSP.
Ideally we'd configure McBSP activity to block deeper idle states in the hardware but I don't think we have such a configuration available.
I wonder why we have not seen this before? I can not recall anything like this with n900 (Jarkko might know that better) neither with n9/n950. On the n9/n950 I have even put the OMAP3 to OFF during audio playback with the
I was seeing something like that with 4.x kernel on Nokia N900.
Pavel
On 09/02/2016 06:54 PM, Pavel Machek wrote:
Hi!
When the FIFO is set to 128, it means that after the initial FIFO fill we will have DMA request coming from McBSP to sDMA with a rate of:
(1000/sampling_rate) * (FIFO-threshold / channels) = DMA_req_distance_in_ms
So in case of 44.1KHz, stereo with 128 FIFO threshold DMA request will come at every 1.45ms. If I'm not mistaken. The whole FIFO (1280) holds 14.51ms of audio in this case.
I don't see this correlate with the 30ms at all.
It seems we easily have a situation where DMA is done buffering to McBSP, and PMIC is playing audio, and we hit idle. At that point there are no immediate timers pending and cpuidle determines we can try to hit a deeper idle mode. As there are no hardware blockers with DMA off and McBSP not blocking, the hardware hits off mode. This cuts power to McBSP.
Ideally we'd configure McBSP activity to block deeper idle states in the hardware but I don't think we have such a configuration available.
I wonder why we have not seen this before? I can not recall anything like this with n900 (Jarkko might know that better) neither with n9/n950. On the n9/n950 I have even put the OMAP3 to OFF during audio playback with the
I was seeing something like that with 4.x kernel on Nokia N900.
So it must be possible to reproduce it on BeagleBoard-xm... I will try to get the PM working on it and let's see (hear).
* Peter Ujfalusi peter.ujfalusi@ti.com [160902 12:45]:
On 09/02/2016 06:54 PM, Pavel Machek wrote:
I wonder why we have not seen this before? I can not recall anything like this with n900 (Jarkko might know that better) neither with n9/n950. On the n9/n950 I have even put the OMAP3 to OFF during audio playback with the
I was seeing something like that with 4.x kernel on Nokia N900.
So it must be possible to reproduce it on BeagleBoard-xm... I will try to get the PM working on it and let's see (hear).
Yeah you can do that quite easily with current mainline or next with omap2plus_defconfig. Just use minimal modules and make sure *hci USB kernel modules are not loaded. So root on MMC, not NFS. The same goes for few other modules, at least hsi and isp used to block idle but I have not checked the recent status for those.
Then just configure uart timeouts and enable off idle, see the script below.
The playback test I use is just something like:
modprobe snd_soc_omap_mcbsp modprobe snd_soc_twl4030 modprobe snd_soc_omap_twl4030 alsactl restore watch -n5 grep core_pwrdm /sys/kernel/debug/pm_debug/count & #mpg123 test.mp3 aplay test.wav killall watch
Then you should see core_pwrdm ret counts increase during playback after maybe 10 or 20 seconds into playing.
Note that I've been mostly using my torpedo board as it has smsc911x on gpmc and allows off idle with NFSroot. Should be very similar configuration as on beagleboard xm.
Regards,
Tony
8< ------------------ #!/bin/bash
uarts=$(find /sys/class/tty/tty[SO]*/device/power/ -type d) for uart in $uarts; do echo 3000 > $uart/autosuspend_delay_ms 2>&1 done
uarts=$(find /sys/class/tty/tty[SO]*/power/ -type d 2>/dev/null) for uart in $uarts; do echo enabled > $uart/wakeup 2>&1 echo auto > $uart/control 2>&1 done
echo 1 > /sys/kernel/debug/pm_debug/enable_off_mode
Hi,
On Fri, Sep 02, 2016 at 01:09:23PM -0700, Tony Lindgren wrote:
So it must be possible to reproduce it on BeagleBoard-xm... I will try to get the PM working on it and let's see (hear).
Yeah you can do that quite easily with current mainline or next with omap2plus_defconfig. Just use minimal modules and make sure *hci USB kernel modules are not loaded. So root on MMC, not NFS. The same goes for few other modules, at least hsi and isp used to block idle but I have not checked the recent status for those.
hsi should be fixed since 9c99e5e51988, which is part of 4.8.
[...]
-- Sebastian
* Sebastian Reichel sre@kernel.org [160903 09:08]:
Hi,
On Fri, Sep 02, 2016 at 01:09:23PM -0700, Tony Lindgren wrote:
So it must be possible to reproduce it on BeagleBoard-xm... I will try to get the PM working on it and let's see (hear).
Yeah you can do that quite easily with current mainline or next with omap2plus_defconfig. Just use minimal modules and make sure *hci USB kernel modules are not loaded. So root on MMC, not NFS. The same goes for few other modules, at least hsi and isp used to block idle but I have not checked the recent status for those.
hsi should be fixed since 9c99e5e51988, which is part of 4.8.
OK thanks good to hear.
Regards,
Tony
participants (4)
-
Pavel Machek
-
Peter Ujfalusi
-
Sebastian Reichel
-
Tony Lindgren