[alsa-devel] [PATCH v1] ASoC: tlv320aic32x4: handle regmap_read error gracefully
Fixes:
[ 5.169310] Division by zero in kernel. [ 5.200998] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.3.18-20191021-1+ #14 [ 5.203049] cdc_acm 2-1.6:1.0: ttyACM0: USB ACM device [ 5.208198] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 5.220084] Backtrace: [ 5.222628] [<8010f60c>] (dump_backtrace) from [<8010f9a8>] (show_stack+0x20/0x24) [ 5.230348] r7:810a1c6c r6:00000000 r5:60000013 r4:810a1c6c [ 5.236097] [<8010f988>] (show_stack) from [<809e06a0>] (dump_stack+0xac/0xd8) [ 5.243469] [<809e05f4>] (dump_stack) from [<8010f780>] (__div0+0x24/0x28) [ 5.250431] r9:8111adc8 r8:ae631180 r7:aebd27c0 r6:ae631e40 r5:00000000 r4:81006508 [ 5.258325] [<8010f75c>] (__div0) from [<809de7ac>] (Ldiv0+0x8/0x10) [ 5.264841] [<8085c7e0>] (clk_aic32x4_div_recalc_rate) from [<805ba70c>] (__clk_register+0x2f8/0x7e4) [ 5.274141] r5:80dd065c r4:ae6bd480 [ 5.277869] [<805ba414>] (__clk_register) from [<805bace0>] (devm_clk_register+0x58/0x8c) [ 5.286130] r10:81006508 r9:810946d4 r8:00000000 r7:ae8de1c0 r6:ae631ac0 r5:ae631e40 [ 5.294103] r4:ae8d8020 [ 5.296724] [<805bac88>] (devm_clk_register) from [<8085cea8>] (aic32x4_register_clocks+0x120/0x14c) [ 5.306004] r7:ae8de1c0 r6:ae8d8020 r5:ae631e40 r4:810946c0 [ 5.311818] [<8085cd88>] (aic32x4_register_clocks) from [<8085bf60>] (aic32x4_probe+0x94/0x468) [ 5.320602] r10:81094730 r9:00000000 r8:af361fc0 r7:bfd6d040 r6:00000000 r5:ae8d8020 [ 5.328574] r4:af361e40 [ 5.331195] [<8085becc>] (aic32x4_probe) from [<8085cf60>] (aic32x4_i2c_probe+0x6c/0x88) [ 5.339434] r8:00000000 r7:ae8d8000 r6:81094730 r5:ae8d8000 r4:81006508 [ 5.346288] [<8085cef4>] (aic32x4_i2c_probe) from [<807554b0>] (i2c_device_probe+0x2ac/0x2f0) [ 5.354894] r5:8085cef4 r4:ae8d8020 [ 5.358625] [<80755204>] (i2c_device_probe) from [<80678e34>] (really_probe+0x11c/0x428) [ 5.366802] r9:00000000 r8:810b3e78 r7:00000000 r6:8111e020 r5:ae8d8020 r4:8111e01c [ 5.374694] [<80678d18>] (really_probe) from [<80679388>] (driver_probe_device+0x88/0x1e0) [ 5.383106] r10:80f63860 r9:ffffe000 r8:ffffe000 r7:80679794 r6:81094730 r5:81094730 [ 5.391080] r4:ae8d8020 [ 5.393702] [<80679300>] (driver_probe_device) from [<8067978c>] (device_driver_attach+0x68/0x70) [ 5.402724] r9:ffffe000 r8:ffffe000 r7:80679794 r6:81094730 r5:00000000 r4:ae8d8020 [ 5.410555] [<80679724>] (device_driver_attach) from [<80679858>] (__driver_attach+0xc4/0x164) [ 5.419313] r7:80679794 r6:ae8d8020 r5:81094730 r4:00000000 [ 5.425123] [<80679794>] (__driver_attach) from [<80676a14>] (bus_for_each_dev+0x84/0xc4) [ 5.433384] r7:80679794 r6:81094730 r5:81006508 r4:ae8dc0c0 [ 5.439192] [<80676990>] (bus_for_each_dev) from [<80678668>] (driver_attach+0x2c/0x30) [ 5.447279] r7:00000000 r6:af361500 r5:8107fd94 r4:81094730 [ 5.453087] [<8067863c>] (driver_attach) from [<80677fc4>] (bus_add_driver+0x1d0/0x210) [ 5.461240] [<80677df4>] (bus_add_driver) from [<80679f34>] (driver_register+0x84/0x118) [ 5.469414] r7:00000000 r6:80f4ac9c r5:81006508 r4:81094730 [ 5.475224] [<80679eb0>] (driver_register) from [<80755dfc>] (i2c_register_driver+0x4c/0xb8) [ 5.483807] r5:81006508 r4:81094714 [ 5.487472] [<80755db0>] (i2c_register_driver) from [<80f4acc0>] (aic32x4_i2c_driver_init+0x24/0x28) [ 5.496750] r5:81006508 r4:810a7180 [ 5.500415] [<80f4ac9c>] (aic32x4_i2c_driver_init) from [<80103288>] (do_one_initcall+0x64/0x2d0) [ 5.509442] [<80103224>] (do_one_initcall) from [<80f014a8>] (kernel_init_freeable+0x300/0x390) [ 5.518287] r8:810c7300 r7:810c7300 r6:00000007 r5:80f920c4 r4:80f63840 [ 5.525079] [<80f011a8>] (kernel_init_freeable) from [<809f892c>] (kernel_init+0x18/0x124) [ 5.533490] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:809f8914 [ 5.541461] r4:00000000 [ 5.544084] [<809f8914>] (kernel_init) from [<801010b4>] (ret_from_fork+0x14/0x20) [ 5.551800] Exception stack(0xaf115fb0 to 0xaf115ff8) [ 5.556935] 5fa0: 00000000 00000000 00000000 00000000 [ 5.565262] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 5.573522] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 [ 5.580283] r5:809f8914 r4:00000000
Signed-off-by: Peter Seiderer ps.report@gmx.net --- sound/soc/codecs/tlv320aic32x4-clk.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/sound/soc/codecs/tlv320aic32x4-clk.c b/sound/soc/codecs/tlv320aic32x4-clk.c index 156c153c12ab..7a82e3448780 100644 --- a/sound/soc/codecs/tlv320aic32x4-clk.c +++ b/sound/soc/codecs/tlv320aic32x4-clk.c @@ -338,7 +338,8 @@ static unsigned long clk_aic32x4_div_recalc_rate(struct clk_hw *hw,
unsigned int val;
- regmap_read(div->regmap, div->reg, &val); + if (regmap_read(div->regmap, div->reg, &val)) + return 0;
return DIV_ROUND_UP(parent_rate, val & AIC32X4_DIV_MASK); }
On Fri, Dec 27, 2019 at 04:20:56PM +0100, Peter Seiderer wrote:
Fixes:
[ 5.169310] Division by zero in kernel. [ 5.200998] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.3.18-20191021-1+ #14 [ 5.203049] cdc_acm 2-1.6:1.0: ttyACM0: USB ACM device [ 5.208198] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 5.220084] Backtrace: [ 5.222628] [<8010f60c>] (dump_backtrace) from [<8010f9a8>] (show_stack+0x20/0x24)
Please think hard before including complete backtraces in upstream reports, they are very large and contain almost no useful information relative to their size so often obscure the relevant content in your message. If part of the backtrace is usefully illustrative then it's usually better to pull out the relevant sections.
Hello Mark,
On Fri, 27 Dec 2019 22:52:04 +0000, Mark Brown broonie@kernel.org wrote:
On Fri, Dec 27, 2019 at 04:20:56PM +0100, Peter Seiderer wrote:
Fixes:
[ 5.169310] Division by zero in kernel. [ 5.200998] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.3.18-20191021-1+ #14 [ 5.203049] cdc_acm 2-1.6:1.0: ttyACM0: USB ACM device [ 5.208198] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) [ 5.220084] Backtrace: [ 5.222628] [<8010f60c>] (dump_backtrace) from [<8010f9a8>] (show_stack+0x20/0x24)
Please think hard before including complete backtraces in upstream reports, they are very large and contain almost no useful information relative to their size so often obscure the relevant content in your message. If part of the backtrace is usefully illustrative then it's usually better to pull out the relevant sections.
Thanks for review..., but a little disagree here, do not know much which is more informative than a complete back trace for a division by zero (and which is the complete information/starting point for investigating the reason therefore) and it would be a pity to loose this valuable information?
Maybe I should have added more information about why and how the failing regmap_read() call leads to a division by zero?
Any hint for a better commit message is welcome ;-)
Regards, Peter
On Mon, Jan 06, 2020 at 09:45:34PM +0100, Peter Seiderer wrote:
On Fri, 27 Dec 2019 22:52:04 +0000, Mark Brown broonie@kernel.org wrote:
On Fri, Dec 27, 2019 at 04:20:56PM +0100, Peter Seiderer wrote:
Please think hard before including complete backtraces in upstream reports, they are very large and contain almost no useful information relative to their size so often obscure the relevant content in your message. If part of the backtrace is usefully illustrative then it's usually better to pull out the relevant sections.
Thanks for review..., but a little disagree here, do not know much which is more informative than a complete back trace for a division by zero (and which is the complete information/starting point for investigating the reason therefore) and it would be a pity to loose this valuable information?
Right, some backtrace is definitely often useful for understanding where things broke and helping people search for problems - it's just providing the *full* backtrace that can be an issue as a lot of it can end up being redundant. As a rule of thumb I'd tend to say that once you get out of the driver or subsystem you're starting to loose relevance. For example with a probe failure like this once you get out of the driver probe function it almost never matters exactly what the stack in the device model core is, it's not adding anything and it's usually more than a screenful that needs to be paged through. Cutting that out helps people focus on the bits that matter.
Maybe I should have added more information about why and how the failing regmap_read() call leads to a division by zero?
Any analysis you've done about why things got confused and broken is definitely good to include!
On Fri, Dec 27, 2019 at 04:20:56PM +0100, Peter Seiderer wrote:
@@ -338,7 +338,8 @@ static unsigned long clk_aic32x4_div_recalc_rate(struct clk_hw *hw,
unsigned int val;
- regmap_read(div->regmap, div->reg, &val);
- if (regmap_read(div->regmap, div->reg, &val))
return 0;
Is this the best fix - shouldn't we be returning an error here? We don't know what the value programmed into the device actually is so zero might be wrong, and we still have the risk that the value we read from the device may be zero if the device is misprogrammed.
Hello Mark,
On Thu, 9 Jan 2020 20:38:19 +0000, Mark Brown broonie@kernel.org wrote:
On Fri, Dec 27, 2019 at 04:20:56PM +0100, Peter Seiderer wrote:
@@ -338,7 +338,8 @@ static unsigned long clk_aic32x4_div_recalc_rate(struct clk_hw *hw,
unsigned int val;
- regmap_read(div->regmap, div->reg, &val);
- if (regmap_read(div->regmap, div->reg, &val))
Unrelated to your question, but I would change this line (on next patch iteration) to (as all other return value checked places for regmap_read in the same file):
if (regmap_read(div->regmap, div->reg, &val) < 0)
return 0;
Is this the best fix - shouldn't we be returning an error here? We don't know what the value programmed into the device actually is so zero might be wrong, and we still have the risk that the value we read from the device may be zero if the device is misprogrammed.
clk_aic32x4_div_recalc_rate() is used for clk_ops aic32x4_div_ops.recalc_rate, did not check/or see on first sight if there is a error handling on the usage of recalc_rate, but did take a look at some other places where the error handling seems to be to return zero, e.g. sound/soc/codecs/da7219.c sound/soc/intel/skylake/skl-ssp-clk.c, etc.
The error occurred with linux-5.3.18, with the earlier versions on regmap_read failure val was (by chance) near 2^31 and evaluated with (val & AIC32X4_DIV_MASK) to 96 (or similar)...but with 5.3.18 evaluated to 0 and the line
return DIV_ROUND_UP(parent_rate, val & AIC32X4_DIV_MASK);
produced the 'Division by zero'...
Regards, Peter
participants (2)
-
Mark Brown
-
Peter Seiderer