[alsa-devel] [RFC PATCH 0/2] ASoC: Prevent the snow ASoC driver to spam due probe deferrals
Hello Mark,
Since v4.8 the Samsung snow ASoC driver needs more probe deferrals to wait for all the resources to be available before its probe succeeds.
But this makes the error messages printed in the deferral path to do a lot of spam in the kernel log. Slowing the boot process and even making some boot testing to fail due timeouts [0].
This is an RFC series to prevent this to happen by limiting to one-time messages in the case of deferrals. I know you mentioned in the past that the correct fix for this is to have a proper dependency handling and be able to probe in the correct order. But I'm sending as RFC in case you think this can be a workaround to avoid the log spamming, until we have support for that.
[0]: http://arm-soc.lixom.net/bootlogs/mainline/v4.8-14644-ge9f8f48/pi-arm-exynos...
Best regards, Javier
Javier Martinez Canillas (2): ASoC: core: Print one-time messages if missing resources when binding DAIs ASoC: samsung: Print a one-time message if the snow driver's probe defers
sound/soc/samsung/snow.c | 8 +++++++- sound/soc/soc-core.c | 13 +++++++------ 2 files changed, 14 insertions(+), 7 deletions(-)
If snd_soc_find_dai() doesn't find a DAI in soc_bind_dai_link(), a error message is printed and an -EPROBE_DEFER errno code returned to the caller.
But since many probe retries can happen before a resource is available, the printed messages can spam the kernel log buffer and slow the boot.
The information is useful to know that a dependency was not meet and a defer happened, but isn't necessary to print it on each probe deferral.
Signed-off-by: Javier Martinez Canillas javier@osg.samsung.com ---
sound/soc/soc-core.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c index c0bbcd903261..4679c07e0ff6 100644 --- a/sound/soc/soc-core.c +++ b/sound/soc/soc-core.c @@ -1013,8 +1013,8 @@ static int soc_bind_dai_link(struct snd_soc_card *card, cpu_dai_component.dai_name = dai_link->cpu_dai_name; rtd->cpu_dai = snd_soc_find_dai(&cpu_dai_component); if (!rtd->cpu_dai) { - dev_err(card->dev, "ASoC: CPU DAI %s not registered\n", - dai_link->cpu_dai_name); + dev_err_once(card->dev, "ASoC: CPU DAI %s not registered\n", + dai_link->cpu_dai_name); goto _err_defer; }
@@ -1025,8 +1025,9 @@ static int soc_bind_dai_link(struct snd_soc_card *card, for (i = 0; i < rtd->num_codecs; i++) { codec_dais[i] = snd_soc_find_dai(&codecs[i]); if (!codec_dais[i]) { - dev_err(card->dev, "ASoC: CODEC DAI %s not registered\n", - codecs[i].dai_name); + dev_err_once(card->dev, + "ASoC: CODEC DAI %s not registered\n", + codecs[i].dai_name); goto _err_defer; } } @@ -1054,8 +1055,8 @@ static int soc_bind_dai_link(struct snd_soc_card *card, rtd->platform = platform; } if (!rtd->platform) { - dev_err(card->dev, "ASoC: platform %s not registered\n", - dai_link->platform_name); + dev_err_once(card->dev, "ASoC: platform %s not registered\n", + dai_link->platform_name); goto _err_defer; }
On Wed, Oct 19, 2016 at 02:21:05PM -0300, Javier Martinez Canillas wrote:
If snd_soc_find_dai() doesn't find a DAI in soc_bind_dai_link(), a error message is printed and an -EPROBE_DEFER errno code returned to the caller.
But since many probe retries can happen before a resource is available, the printed messages can spam the kernel log buffer and slow the boot.
The information is useful to know that a dependency was not meet and a defer happened, but isn't necessary to print it on each probe deferral.
This then turns any subsequent failure into a silent one which isn't great for diagnostics. It'd be better to look into trying to avoid the retries in the first place, Raphael's dependency graph stuff looks very promising here.
Hello Mark,
On 10/24/2016 02:21 PM, Mark Brown wrote:
On Wed, Oct 19, 2016 at 02:21:05PM -0300, Javier Martinez Canillas wrote:
If snd_soc_find_dai() doesn't find a DAI in soc_bind_dai_link(), a error message is printed and an -EPROBE_DEFER errno code returned to the caller.
But since many probe retries can happen before a resource is available, the printed messages can spam the kernel log buffer and slow the boot.
The information is useful to know that a dependency was not meet and a defer happened, but isn't necessary to print it on each probe deferral.
This then turns any subsequent failure into a silent one which isn't great for diagnostics. It'd be better to look into trying to avoid the retries in the first place, Raphael's dependency graph stuff looks very promising here.
Ok, thanks for your feedback. I'll also investigate why the number of probes/retries increased that much from v4.8.
Best regards,
If the snd_soc_register_card() fails due a missing resource and the probe has to be deferred, the driver prints an error message.
But since many probe retries can happen before a resource is available, the printed messages can spam the kernel log buffer and slow the boot.
The information is useful to know that a dependency was not meet and a defer happened, but isn't necessary to print it on each probe deferral.
Signed-off-by: Javier Martinez Canillas javier@osg.samsung.com
---
sound/soc/samsung/snow.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/sound/soc/samsung/snow.c b/sound/soc/samsung/snow.c index d8ac907bbb0d..068bfb78a668 100644 --- a/sound/soc/samsung/snow.c +++ b/sound/soc/samsung/snow.c @@ -103,7 +103,13 @@ static int snow_probe(struct platform_device *pdev)
ret = devm_snd_soc_register_card(&pdev->dev, card); if (ret) { - dev_err(&pdev->dev, "snd_soc_register_card failed (%d)\n", ret); + if (ret == -EPROBE_DEFER) + dev_err_once(&pdev->dev, + "snd_soc_register_card deferred (%d)\n", + ret); + else + dev_err(&pdev->dev, + "snd_soc_register_card failed (%d)\n", ret); return ret; }
On Wed, Oct 19, 2016 at 02:21:06PM -0300, Javier Martinez Canillas wrote:
If the snd_soc_register_card() fails due a missing resource and the probe has to be deferred, the driver prints an error message.
But since many probe retries can happen before a resource is available, the printed messages can spam the kernel log buffer and slow the boot.
The information is useful to know that a dependency was not meet and a defer happened, but isn't necessary to print it on each probe deferral.
Signed-off-by: Javier Martinez Canillas javier@osg.samsung.com
sound/soc/samsung/snow.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/sound/soc/samsung/snow.c b/sound/soc/samsung/snow.c index d8ac907bbb0d..068bfb78a668 100644 --- a/sound/soc/samsung/snow.c +++ b/sound/soc/samsung/snow.c @@ -103,7 +103,13 @@ static int snow_probe(struct platform_device *pdev)
ret = devm_snd_soc_register_card(&pdev->dev, card); if (ret) {
dev_err(&pdev->dev, "snd_soc_register_card failed (%d)\n", ret);
if (ret == -EPROBE_DEFER)
dev_err_once(&pdev->dev,
"snd_soc_register_card deferred (%d)\n",
ret);
dev_warn_once? I understand you didn't want to change the logic behind this but this is not really an error condition. Probe deferral happens and one should not be worried seeing it once in 'dmesg -l err'.
Another point is now we would miss different error condition - infinite (or very long) probe deferral. I am not sure how useful it might be but theoretically seeing many deferrals is a sign of something to fix.
Best regards, Krzysztof
Hello Krzysztof,
Thanks a lot for your feedback.
On 10/19/2016 03:12 PM, Krzysztof Kozlowski wrote:
On Wed, Oct 19, 2016 at 02:21:06PM -0300, Javier Martinez Canillas wrote:
If the snd_soc_register_card() fails due a missing resource and the probe has to be deferred, the driver prints an error message.
But since many probe retries can happen before a resource is available, the printed messages can spam the kernel log buffer and slow the boot.
The information is useful to know that a dependency was not meet and a defer happened, but isn't necessary to print it on each probe deferral.
Signed-off-by: Javier Martinez Canillas javier@osg.samsung.com
sound/soc/samsung/snow.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/sound/soc/samsung/snow.c b/sound/soc/samsung/snow.c index d8ac907bbb0d..068bfb78a668 100644 --- a/sound/soc/samsung/snow.c +++ b/sound/soc/samsung/snow.c @@ -103,7 +103,13 @@ static int snow_probe(struct platform_device *pdev)
ret = devm_snd_soc_register_card(&pdev->dev, card); if (ret) {
dev_err(&pdev->dev, "snd_soc_register_card failed (%d)\n", ret);
if (ret == -EPROBE_DEFER)
dev_err_once(&pdev->dev,
"snd_soc_register_card deferred (%d)\n",
ret);
dev_warn_once? I understand you didn't want to change the logic behind this but this is not really an error condition. Probe deferral happens and one should not be worried seeing it once in 'dmesg -l err'.
Exactly, I even thought about doing that change (or even dev_dbg_once) but I didn't want to change the current logic.
Another point is now we would miss different error condition - infinite (or very long) probe deferral. I am not sure how useful it might be but theoretically seeing many deferrals is a sign of something to fix.
Yes, not sure how we can have both though. Maybe dev_{err,warn}_ratelimit is a good trade off?
Best regards, Krzysztof
Best regards,
On 10/19/2016 07:21 PM, Javier Martinez Canillas wrote:
If the snd_soc_register_card() fails due a missing resource and the probe has to be deferred, the driver prints an error message.
But since many probe retries can happen before a resource is available, the printed messages can spam the kernel log buffer and slow the boot.
Do you know why CPU DAI is not registered by the time snow machine driver gets probed? Did you try to debug the I2S driver? Does perhaps deferring happen due to some I2S clocks missing? DMA seems to be already initialized. The deferral loop looks suspicious, do you perhaps know what commits in v4.8 cause such changes in probing order? Did you start to use some drivers as loadable module?
As far as the error log is concerned, I would just not print anything in snow_probe() when register_card() returns EPROBE_DEFER.
Hello Sylwester,
Thanks a lot for your feedback.
On 10/20/2016 06:12 AM, Sylwester Nawrocki wrote:
On 10/19/2016 07:21 PM, Javier Martinez Canillas wrote:
If the snd_soc_register_card() fails due a missing resource and the probe has to be deferred, the driver prints an error message.
But since many probe retries can happen before a resource is available, the printed messages can spam the kernel log buffer and slow the boot.
Do you know why CPU DAI is not registered by the time snow machine driver gets probed? Did you try to debug the I2S driver? Does perhaps deferring happen due to some I2S clocks missing? DMA seems to be already initialized. The deferral loop looks suspicious, do you perhaps know what commits
It's suspicious indeed and I've been trying to figure out what changed the behavior in v4.8 but I haven't found it yet.
in v4.8 cause such changes in probing order? Did you start to use
I did a git bisect to find the first v4.8 commit that caused this but finished in a merge commit c9b95e5961c0 ("Merge tag 'sound-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound").
That branch alone works the same than v4.7 so it seems that is not a single commit but a combination from different branches that caused this on integration.
some drivers as loadable module?
I see no relevant changes in exynos_defconfig between v4.7..v4.8 and also no changes in drivers/Makefile that could cause things to be initialized on a different order.
But I thought the patches had merits on its own since probe deferral can make a driver probe many times and the error logs were noisy. I wasn't sure though and that's why are marked as RFC.
As far as the error log is concerned, I would just not print anything in snow_probe() when register_card() returns EPROBE_DEFER.
I believe it may be useful to know that a driver's probe is deferring due a missing dependency but have no strong opinion and can remove the message.
Best regards,
On 10/20/2016 12:41 PM, Javier Martinez Canillas wrote:
I see no relevant changes in exynos_defconfig between v4.7..v4.8 and also no changes in drivers/Makefile that could cause things to be initialized on a different order.
I remember this
commit 6eb1c9496b81680f2cd2e0eda06c531317e2e28d clk: probe common clock drivers earlier
going in recently, but it's rather dubious it could cause such trouble.
Anyway, I'd try to add some debug prints to samsung_i2s_probe() to see what's the issue with the CPU DAI registration.
But I thought the patches had merits on its own since probe deferral can make a driver probe many times and the error logs were noisy. I wasn't sure though and that's why are marked as RFC.
In general I wouldn't be disabling those err logs unless proper EPROBE_DEFER handling is added on related error paths and we can differentiate between probe deferral and real unrecoverable errors and can disable logging only for EPROBE_DEFER cases.
As far as the error log is concerned, I would just not print anything in snow_probe() when register_card() returns EPROBE_DEFER.
I believe it may be useful to know that a driver's probe is deferring due a missing dependency but have no strong opinion and can remove the message.
I'd rather rely on core code to inform about missing resources when registering components. Otherwise booting unnecessarily takes more time when there is more probe deferring logs printed on the console.
-- Thanks, Sylwester
Hello Sylwester,
On 10/20/2016 08:27 AM, Sylwester Nawrocki wrote:
On 10/20/2016 12:41 PM, Javier Martinez Canillas wrote:
I see no relevant changes in exynos_defconfig between v4.7..v4.8 and also no changes in drivers/Makefile that could cause things to be initialized on a different order.
I remember this
commit 6eb1c9496b81680f2cd2e0eda06c531317e2e28d clk: probe common clock drivers earlier
going in recently, but it's rather dubious it could cause such trouble.
Yes, I'm aware of this change (and in fact it broke MMC in the Peach Pi Chromebook) but that commit landed in v4.9-rc1, not v4.8.
Anyway, I'd try to add some debug prints to samsung_i2s_probe() to see what's the issue with the CPU DAI registration.
Sure, I'm busy with other stuff now but I'll dig again this next week.
But I thought the patches had merits on its own since probe deferral can make a driver probe many times and the error logs were noisy. I wasn't sure though and that's why are marked as RFC.
In general I wouldn't be disabling those err logs unless proper EPROBE_DEFER handling is added on related error paths and we can differentiate between probe deferral and real unrecoverable errors and can disable logging only for EPROBE_DEFER cases.
Yes, the sound core change (patch 1/2) is only for the EPROBE_DEFER path.
As far as the error log is concerned, I would just not print anything in snow_probe() when register_card() returns EPROBE_DEFER.
I believe it may be useful to know that a driver's probe is deferring due a missing dependency but have no strong opinion and can remove the message.
I'd rather rely on core code to inform about missing resources when registering components. Otherwise booting unnecessarily takes more time when there is more probe deferring logs printed on the console.
Fair enough.
-- Thanks, Sylwester
Best regards,
+Marek,
Hello,
On 10/20/2016 08:27 AM, Sylwester Nawrocki wrote:
On 10/20/2016 12:41 PM, Javier Martinez Canillas wrote:
I see no relevant changes in exynos_defconfig between v4.7..v4.8 and also no changes in drivers/Makefile that could cause things to be initialized on a different order.
I remember this
commit 6eb1c9496b81680f2cd2e0eda06c531317e2e28d clk: probe common clock drivers earlier
going in recently, but it's rather dubious it could cause such trouble.
Anyway, I'd try to add some debug prints to samsung_i2s_probe() to see what's the issue with the CPU DAI registration.
fyi, Marek already fixed this issue on patch "ASoC: samsung: get access to DMA engine early to defer probe properly":
https://patchwork.kernel.org/patch/9399613/
Best regards,
On 10/27/2016 05:38 PM, Javier Martinez Canillas wrote:
+Marek,
Hello,
On 10/20/2016 08:27 AM, Sylwester Nawrocki wrote:
On 10/20/2016 12:41 PM, Javier Martinez Canillas wrote:
I see no relevant changes in exynos_defconfig between v4.7..v4.8 and also no changes in drivers/Makefile that could cause things to be initialized on a different order.
I remember this
commit 6eb1c9496b81680f2cd2e0eda06c531317e2e28d clk: probe common clock drivers earlier
going in recently, but it's rather dubious it could cause such trouble.
Anyway, I'd try to add some debug prints to samsung_i2s_probe() to see what's the issue with the CPU DAI registration.
fyi, Marek already fixed this issue on patch "ASoC: samsung: get access to DMA engine early to defer probe properly":
Yeah, we have spent some time together today investigating this issue.
Hello Sylwester,
On 10/27/2016 12:48 PM, Sylwester Nawrocki wrote:
On 10/27/2016 05:38 PM, Javier Martinez Canillas wrote:
+Marek,
Hello,
On 10/20/2016 08:27 AM, Sylwester Nawrocki wrote:
On 10/20/2016 12:41 PM, Javier Martinez Canillas wrote:
I see no relevant changes in exynos_defconfig between v4.7..v4.8 and also no changes in drivers/Makefile that could cause things to be initialized on a different order.
I remember this
commit 6eb1c9496b81680f2cd2e0eda06c531317e2e28d clk: probe common clock drivers earlier
going in recently, but it's rather dubious it could cause such trouble.
Anyway, I'd try to add some debug prints to samsung_i2s_probe() to see what's the issue with the CPU DAI registration.
fyi, Marek already fixed this issue on patch "ASoC: samsung: get access to DMA engine early to defer probe properly":
Yeah, we have spent some time together today investigating this issue.
Thanks a lot. I thought you would be aware but wanted to mention it on this thread for completeness.
Best regards,
participants (4)
-
Javier Martinez Canillas
-
Krzysztof Kozlowski
-
Mark Brown
-
Sylwester Nawrocki