ALSA: intel8x0: div by zero in snd_intel8x0_update()
Hi,
I'm running (sometimes) into the following problem during resume
divide error: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:snd_intel8x0_interrupt+0x121/0x279 Code: 42 8b 44 35 34 41 0f af c5 42 03 44 35 38 42 89 44 35 38 48 8b 0c 24 80 b9 60 03 00 00 00 78 0f 49 8d 0c 2e 48 83 c1 38 31 d2 <f7> 71 f4 89 11 42 8b 7c 35 48 44 01 ef 83 e7 1f 42 89 7c 35 48 48 RSP: 0000:ffff9a0a80108eb0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000019 RCX: ffff90d8c5efc198 RDX: 0000000000000000 RSI: ffff9a0a80549016 RDI: ffff9a0a80549024 RBP: ffff90d8c5efc060 R08: 000000000000197a R09: 00000f604ed00191 R10: 00000000000001e0 R11: ffffffff9468e1d8 R12: 0000000000000020 R13: 0000000000000040 R14: 0000000000000100 R15: 0000000000000002 FS: 00007a75c397aff8(0000) GS:ffff90d912d80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007a77945d1000 CR3: 000000015bf46002 CR4: 0000000000360ea0 Call Trace: <IRQ> __handle_irq_event_percpu+0xa0/0x1c0 handle_irq_event_percpu+0x2d/0x70 handle_irq_event+0x2c/0x48 handle_fasteoi_irq+0xa1/0x161 do_IRQ+0x51/0xd6 common_interrupt+0xf/0xf </IRQ> RIP: 0033:0x7a7856462c59 Code: 89 ca 48 2b 57 20 48 83 c2 10 31 c0 48 3b 57 28 48 0f 46 c1 c3 cc cc cc cc cc cc cc cc cc cc cc cc 64 48 8b 0c 25 00 00 00 00 <b8> f8 02 00 00 48 03 41 08 c3 cc cc cc cc cc cc cc cc cc cc cc cc RSP: 002b:00007a75c39794e8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde RAX: 02fa413b24209c6c RBX: 0000017f19e1cf9e RCX: 00007a75c397aff8 RDX: 00007a7855792472 RSI: 00007a7855790aa0 RDI: 0000000000000005 RBP: 0000000000000005 R08: 0000000000000012 R09: 000000000000000d R10: 00000000009f86d2 R11: 000000000000197a R12: 0000017f19e40e7d R13: 000005ee937ae557 R14: 00007a7855790aa0 R15: 00007a7855792472 Modules linked in: ---[ end trace 2ef6d63d0e3d757c ]--- RIP: 0010:snd_intel8x0_interrupt+0x121/0x279 Code: 42 8b 44 35 34 41 0f af c5 42 03 44 35 38 42 89 44 35 38 48 8b 0c 24 80 b9 60 03 00 00 00 78 0f 49 8d 0c 2e 48 83 c1 38 31 d2 <f7> 71 f4 89 11 42 8b 7c 35 48 44 01 ef 83 e7 1f 42 89 7c 35 48 48 RSP: 0000:ffff9a0a80108eb0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000019 RCX: ffff90d8c5efc198 RDX: 0000000000000000 RSI: ffff9a0a80549016 RDI: ffff9a0a80549024 RBP: ffff90d8c5efc060 R08: 000000000000197a R09: 00000f604ed00191 R10: 00000000000001e0 R11: ffffffff9468e1d8 R12: 0000000000000020 R13: 0000000000000040 R14: 0000000000000100 R15: 0000000000000002 FS: 00007a75c397aff8(0000) GS:ffff90d912d80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007a77945d1000 CR3: 000000015bf46002 CR4: 0000000000360ea0
This corresponds to
ichdev->position %= ichdev->size;
in snd_intel8x0_update().
A print out of that ichdev looks as follows
snd_intel8x0 0000:00:18.0: lvi_frag = 0, frags = 0, size = 0, period_size = 0x0, period_size1 = 0x0
On Fri, 14 May 2021 10:17:10 +0200, Sergey Senozhatsky wrote:
Hi,
I'm running (sometimes) into the following problem during resume
divide error: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:snd_intel8x0_interrupt+0x121/0x279 Code: 42 8b 44 35 34 41 0f af c5 42 03 44 35 38 42 89 44 35 38 48 8b 0c 24 80 b9 60 03 00 00 00 78 0f 49 8d 0c 2e 48 83 c1 38 31 d2 <f7> 71 f4 89 11 42 8b 7c 35 48 44 01 ef 83 e7 1f 42 89 7c 35 48 48 RSP: 0000:ffff9a0a80108eb0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000019 RCX: ffff90d8c5efc198 RDX: 0000000000000000 RSI: ffff9a0a80549016 RDI: ffff9a0a80549024 RBP: ffff90d8c5efc060 R08: 000000000000197a R09: 00000f604ed00191 R10: 00000000000001e0 R11: ffffffff9468e1d8 R12: 0000000000000020 R13: 0000000000000040 R14: 0000000000000100 R15: 0000000000000002 FS: 00007a75c397aff8(0000) GS:ffff90d912d80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007a77945d1000 CR3: 000000015bf46002 CR4: 0000000000360ea0 Call Trace:
<IRQ> __handle_irq_event_percpu+0xa0/0x1c0 handle_irq_event_percpu+0x2d/0x70 handle_irq_event+0x2c/0x48 handle_fasteoi_irq+0xa1/0x161 do_IRQ+0x51/0xd6 common_interrupt+0xf/0xf </IRQ> RIP: 0033:0x7a7856462c59 Code: 89 ca 48 2b 57 20 48 83 c2 10 31 c0 48 3b 57 28 48 0f 46 c1 c3 cc cc cc cc cc cc cc cc cc cc cc cc 64 48 8b 0c 25 00 00 00 00 <b8> f8 02 00 00 48 03 41 08 c3 cc cc cc cc cc cc cc cc cc cc cc cc RSP: 002b:00007a75c39794e8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde RAX: 02fa413b24209c6c RBX: 0000017f19e1cf9e RCX: 00007a75c397aff8 RDX: 00007a7855792472 RSI: 00007a7855790aa0 RDI: 0000000000000005 RBP: 0000000000000005 R08: 0000000000000012 R09: 000000000000000d R10: 00000000009f86d2 R11: 000000000000197a R12: 0000017f19e40e7d R13: 000005ee937ae557 R14: 00007a7855790aa0 R15: 00007a7855792472 Modules linked in: ---[ end trace 2ef6d63d0e3d757c ]--- RIP: 0010:snd_intel8x0_interrupt+0x121/0x279 Code: 42 8b 44 35 34 41 0f af c5 42 03 44 35 38 42 89 44 35 38 48 8b 0c 24 80 b9 60 03 00 00 00 78 0f 49 8d 0c 2e 48 83 c1 38 31 d2 <f7> 71 f4 89 11 42 8b 7c 35 48 44 01 ef 83 e7 1f 42 89 7c 35 48 48 RSP: 0000:ffff9a0a80108eb0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000019 RCX: ffff90d8c5efc198 RDX: 0000000000000000 RSI: ffff9a0a80549016 RDI: ffff9a0a80549024 RBP: ffff90d8c5efc060 R08: 000000000000197a R09: 00000f604ed00191 R10: 00000000000001e0 R11: ffffffff9468e1d8 R12: 0000000000000020 R13: 0000000000000040 R14: 0000000000000100 R15: 0000000000000002 FS: 00007a75c397aff8(0000) GS:ffff90d912d80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007a77945d1000 CR3: 000000015bf46002 CR4: 0000000000360ea0
This corresponds to
ichdev->position %= ichdev->size;
in snd_intel8x0_update().
A print out of that ichdev looks as follows
snd_intel8x0 0000:00:18.0: lvi_frag = 0, frags = 0, size = 0, period_size = 0x0, period_size1 = 0x0
This sounds like some spurious IRQ that casually hits during the resume. It's strange that, even if it's a spurious IRQ, it contains the proper update bits for the stream. Is that on a real hardware or on a VM?
In anyway, the patch like below might cover enough.
Takashi
--- a/sound/pci/intel8x0.c +++ b/sound/pci/intel8x0.c @@ -691,6 +691,9 @@ static inline void snd_intel8x0_update(struct intel8x0 *chip, struct ichdev *ich int status, civ, i, step; int ack = 0;
+ if (!ichdev->substream || ichdev->suspended) + return; + spin_lock_irqsave(&chip->reg_lock, flags); status = igetbyte(chip, port + ichdev->roff_sr); civ = igetbyte(chip, port + ICH_REG_OFF_CIV);
On (21/05/14 13:05), Takashi Iwai wrote:
divide error: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:snd_intel8x0_interrupt+0x121/0x279 Code: 42 8b 44 35 34 41 0f af c5 42 03 44 35 38 42 89 44 35 38 48 8b 0c 24 80 b9 60 03 00 00 00 78 0f 49 8d 0c 2e 48 83 c1 38 31 d2 <f7> 71 f4 89 11 42 8b 7c 35 48 44 01 ef 83 e7 1f 42 89 7c 35 48 48 RSP: 0000:ffff9a0a80108eb0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000019 RCX: ffff90d8c5efc198 RDX: 0000000000000000 RSI: ffff9a0a80549016 RDI: ffff9a0a80549024 RBP: ffff90d8c5efc060 R08: 000000000000197a R09: 00000f604ed00191 R10: 00000000000001e0 R11: ffffffff9468e1d8 R12: 0000000000000020 R13: 0000000000000040 R14: 0000000000000100 R15: 0000000000000002 FS: 00007a75c397aff8(0000) GS:ffff90d912d80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007a77945d1000 CR3: 000000015bf46002 CR4: 0000000000360ea0 Call Trace:
<IRQ> __handle_irq_event_percpu+0xa0/0x1c0 handle_irq_event_percpu+0x2d/0x70 handle_irq_event+0x2c/0x48 handle_fasteoi_irq+0xa1/0x161 do_IRQ+0x51/0xd6 common_interrupt+0xf/0xf </IRQ> RIP: 0033:0x7a7856462c59 Code: 89 ca 48 2b 57 20 48 83 c2 10 31 c0 48 3b 57 28 48 0f 46 c1 c3 cc cc cc cc cc cc cc cc cc cc cc cc 64 48 8b 0c 25 00 00 00 00 <b8> f8 02 00 00 48 03 41 08 c3 cc cc cc cc cc cc cc cc cc cc cc cc RSP: 002b:00007a75c39794e8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde RAX: 02fa413b24209c6c RBX: 0000017f19e1cf9e RCX: 00007a75c397aff8 RDX: 00007a7855792472 RSI: 00007a7855790aa0 RDI: 0000000000000005 RBP: 0000000000000005 R08: 0000000000000012 R09: 000000000000000d R10: 00000000009f86d2 R11: 000000000000197a R12: 0000017f19e40e7d R13: 000005ee937ae557 R14: 00007a7855790aa0 R15: 00007a7855792472 Modules linked in: ---[ end trace 2ef6d63d0e3d757c ]--- RIP: 0010:snd_intel8x0_interrupt+0x121/0x279 Code: 42 8b 44 35 34 41 0f af c5 42 03 44 35 38 42 89 44 35 38 48 8b 0c 24 80 b9 60 03 00 00 00 78 0f 49 8d 0c 2e 48 83 c1 38 31 d2 <f7> 71 f4 89 11 42 8b 7c 35 48 44 01 ef 83 e7 1f 42 89 7c 35 48 48 RSP: 0000:ffff9a0a80108eb0 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000019 RCX: ffff90d8c5efc198 RDX: 0000000000000000 RSI: ffff9a0a80549016 RDI: ffff9a0a80549024 RBP: ffff90d8c5efc060 R08: 000000000000197a R09: 00000f604ed00191 R10: 00000000000001e0 R11: ffffffff9468e1d8 R12: 0000000000000020 R13: 0000000000000040 R14: 0000000000000100 R15: 0000000000000002 FS: 00007a75c397aff8(0000) GS:ffff90d912d80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007a77945d1000 CR3: 000000015bf46002 CR4: 0000000000360ea0
This corresponds to
ichdev->position %= ichdev->size;
in snd_intel8x0_update().
A print out of that ichdev looks as follows
snd_intel8x0 0000:00:18.0: lvi_frag = 0, frags = 0, size = 0, period_size = 0x0, period_size1 = 0x0
This sounds like some spurious IRQ that casually hits during the resume. It's strange that, even if it's a spurious IRQ, it contains the proper update bits for the stream.
Yes, I found this to be strange as well. That's why I started dumping ichdev fields and so on.
Is that on a real hardware or on a VM?
VM.
In anyway, the patch like below might cover enough.
I'll give it a try.
--- a/sound/pci/intel8x0.c +++ b/sound/pci/intel8x0.c @@ -691,6 +691,9 @@ static inline void snd_intel8x0_update(struct intel8x0 *chip, struct ichdev *ich int status, civ, i, step; int ack = 0;
- if (!ichdev->substream || ichdev->suspended)
return;
- spin_lock_irqsave(&chip->reg_lock, flags); status = igetbyte(chip, port + ichdev->roff_sr); civ = igetbyte(chip, port + ICH_REG_OFF_CIV);
On (21/05/14 20:16), Sergey Senozhatsky wrote:
--- a/sound/pci/intel8x0.c +++ b/sound/pci/intel8x0.c @@ -691,6 +691,9 @@ static inline void snd_intel8x0_update(struct intel8x0 *chip, struct ichdev *ich int status, civ, i, step; int ack = 0;
- if (!ichdev->substream || ichdev->suspended)
return;
- spin_lock_irqsave(&chip->reg_lock, flags); status = igetbyte(chip, port + ichdev->roff_sr); civ = igetbyte(chip, port + ICH_REG_OFF_CIV);
This does the problem for me.
On (21/05/16 17:30), Sergey Senozhatsky wrote:
On (21/05/14 20:16), Sergey Senozhatsky wrote:
--- a/sound/pci/intel8x0.c +++ b/sound/pci/intel8x0.c @@ -691,6 +691,9 @@ static inline void snd_intel8x0_update(struct intel8x0 *chip, struct ichdev *ich int status, civ, i, step; int ack = 0;
- if (!ichdev->substream || ichdev->suspended)
return;
- spin_lock_irqsave(&chip->reg_lock, flags); status = igetbyte(chip, port + ichdev->roff_sr); civ = igetbyte(chip, port + ICH_REG_OFF_CIV);
This does the problem for me.
^^^ does fix
On Sun, 16 May 2021 10:31:41 +0200, Sergey Senozhatsky wrote:
On (21/05/16 17:30), Sergey Senozhatsky wrote:
On (21/05/14 20:16), Sergey Senozhatsky wrote:
--- a/sound/pci/intel8x0.c +++ b/sound/pci/intel8x0.c @@ -691,6 +691,9 @@ static inline void snd_intel8x0_update(struct intel8x0 *chip, struct ichdev *ich int status, civ, i, step; int ack = 0;
- if (!ichdev->substream || ichdev->suspended)
return;
- spin_lock_irqsave(&chip->reg_lock, flags); status = igetbyte(chip, port + ichdev->roff_sr); civ = igetbyte(chip, port + ICH_REG_OFF_CIV);
This does the problem for me.
^^^ does fix
OK, thanks for confirmation. So this looks like some spurious interrupt with the unexpected hardware bits.
However, the suggested check doesn't seem covering enough, and it might still hit if the suspend/resume happens before the device is opened but not set up (and such a spurious irq is triggered).
Below is more comprehensive fix. Let me know if this works, too.
thanks,
Takashi
-- 8< -- Subject: [PATCH] ALSA: intel8x0: Don't update period unless prepared
The interrupt handler of intel8x0 calls snd_intel8x0_update() whenever the hardware sets the corresponding status bit for each stream. This works fine for most cases as long as the hardware behaves properly. But when the hardware gives a wrong bit set, this leads to a NULL dereference Oops, and reportedly, this seems what happened on a VM.
For fixing the crash, this patch adds a internal flag indicating that the stream is ready to be updated, and check it (as well as the flag being in suspended) to ignore such spurious update.
Cc: stable@vger.kernel.org Reported-by: Sergey Senozhatsky senozhatsky@chromium.org Signed-off-by: Takashi Iwai tiwai@suse.de --- sound/pci/intel8x0.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/sound/pci/intel8x0.c b/sound/pci/intel8x0.c index 35903d1a1cbd..5b124c4ad572 100644 --- a/sound/pci/intel8x0.c +++ b/sound/pci/intel8x0.c @@ -331,6 +331,7 @@ struct ichdev { unsigned int ali_slot; /* ALI DMA slot */ struct ac97_pcm *pcm; int pcm_open_flag; + unsigned int prepared:1; unsigned int suspended: 1; };
@@ -691,6 +692,9 @@ static inline void snd_intel8x0_update(struct intel8x0 *chip, struct ichdev *ich int status, civ, i, step; int ack = 0;
+ if (!ichdev->prepared || ichdev->suspended) + return; + spin_lock_irqsave(&chip->reg_lock, flags); status = igetbyte(chip, port + ichdev->roff_sr); civ = igetbyte(chip, port + ICH_REG_OFF_CIV); @@ -881,6 +885,7 @@ static int snd_intel8x0_hw_params(struct snd_pcm_substream *substream, if (ichdev->pcm_open_flag) { snd_ac97_pcm_close(ichdev->pcm); ichdev->pcm_open_flag = 0; + ichdev->prepared = 0; } err = snd_ac97_pcm_open(ichdev->pcm, params_rate(hw_params), params_channels(hw_params), @@ -902,6 +907,7 @@ static int snd_intel8x0_hw_free(struct snd_pcm_substream *substream) if (ichdev->pcm_open_flag) { snd_ac97_pcm_close(ichdev->pcm); ichdev->pcm_open_flag = 0; + ichdev->prepared = 0; } return 0; } @@ -976,6 +982,7 @@ static int snd_intel8x0_pcm_prepare(struct snd_pcm_substream *substream) ichdev->pos_shift = (runtime->sample_bits > 16) ? 2 : 1; } snd_intel8x0_setup_periods(chip, ichdev); + ichdev->prepared = 1; return 0; }
On (21/05/16 11:49), Takashi Iwai wrote:
Subject: [PATCH] ALSA: intel8x0: Don't update period unless prepared
The interrupt handler of intel8x0 calls snd_intel8x0_update() whenever the hardware sets the corresponding status bit for each stream. This works fine for most cases as long as the hardware behaves properly. But when the hardware gives a wrong bit set, this leads to a NULL dereference Oops, and reportedly, this seems what happened on a VM.
For fixing the crash, this patch adds a internal flag indicating that the stream is ready to be updated, and check it (as well as the flag being in suspended) to ignore such spurious update.
Cc: stable@vger.kernel.org Reported-by: Sergey Senozhatsky senozhatsky@chromium.org Signed-off-by: Takashi Iwai tiwai@suse.de
I kicked the tests. Will let you know.
On (21/05/16 11:49), Takashi Iwai wrote:
Subject: [PATCH] ALSA: intel8x0: Don't update period unless prepared
The interrupt handler of intel8x0 calls snd_intel8x0_update() whenever the hardware sets the corresponding status bit for each stream. This works fine for most cases as long as the hardware behaves properly. But when the hardware gives a wrong bit set, this leads to a NULL dereference Oops, and reportedly, this seems what happened on a VM.
VM, yes. I didn't see NULL derefs, my VMs crash because of div by zero in `% size`.
For fixing the crash, this patch adds a internal flag indicating that the stream is ready to be updated, and check it (as well as the flag being in suspended) to ignore such spurious update.
I reproduced the "spurious IRQ" case, and the patch handled it correctly (VM did not crash).
Cc: stable@vger.kernel.org Reported-by: Sergey Senozhatsky senozhatsky@chromium.org Signed-off-by: Takashi Iwai tiwai@suse.de
I'll keep running test, but seems that it works as intended
Tested-by: Sergey Senozhatsky senozhatsky@chromium.org
On Sun, 16 May 2021 13:23:21 +0200, Sergey Senozhatsky wrote:
On (21/05/16 11:49), Takashi Iwai wrote:
Subject: [PATCH] ALSA: intel8x0: Don't update period unless prepared
The interrupt handler of intel8x0 calls snd_intel8x0_update() whenever the hardware sets the corresponding status bit for each stream. This works fine for most cases as long as the hardware behaves properly. But when the hardware gives a wrong bit set, this leads to a NULL dereference Oops, and reportedly, this seems what happened on a VM.
VM, yes. I didn't see NULL derefs, my VMs crash because of div by zero in `% size`.
Ah, right, I'll fix the description.
For fixing the crash, this patch adds a internal flag indicating that the stream is ready to be updated, and check it (as well as the flag being in suspended) to ignore such spurious update.
I reproduced the "spurious IRQ" case, and the patch handled it correctly (VM did not crash).
Cc: stable@vger.kernel.org Reported-by: Sergey Senozhatsky senozhatsky@chromium.org Signed-off-by: Takashi Iwai tiwai@suse.de
I'll keep running test, but seems that it works as intended
Tested-by: Sergey Senozhatsky senozhatsky@chromium.org
OK, below is the revised patch I'm going to apply.
Thanks!
Takashi
-- 8< -- From: Takashi Iwai tiwai@suse.de Subject: [PATCH v2] ALSA: intel8x0: Don't update period unless prepared
The interrupt handler of intel8x0 calls snd_intel8x0_update() whenever the hardware sets the corresponding status bit for each stream. This works fine for most cases as long as the hardware behaves properly. But when the hardware gives a wrong bit set, this leads to a zero- division Oops, and reportedly, this seems what happened on a VM.
For fixing the crash, this patch adds a internal flag indicating that the stream is ready to be updated, and check it (as well as the flag being in suspended) to ignore such spurious update.
Cc: stable@vger.kernel.org Reported-and-tested-by: Sergey Senozhatsky senozhatsky@chromium.org Signed-off-by: Takashi Iwai tiwai@suse.de --- v1->v2: fixed description, updated tested-by tag
sound/pci/intel8x0.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/sound/pci/intel8x0.c b/sound/pci/intel8x0.c index 35903d1a1cbd..5b124c4ad572 100644 --- a/sound/pci/intel8x0.c +++ b/sound/pci/intel8x0.c @@ -331,6 +331,7 @@ struct ichdev { unsigned int ali_slot; /* ALI DMA slot */ struct ac97_pcm *pcm; int pcm_open_flag; + unsigned int prepared:1; unsigned int suspended: 1; };
@@ -691,6 +692,9 @@ static inline void snd_intel8x0_update(struct intel8x0 *chip, struct ichdev *ich int status, civ, i, step; int ack = 0;
+ if (!ichdev->prepared || ichdev->suspended) + return; + spin_lock_irqsave(&chip->reg_lock, flags); status = igetbyte(chip, port + ichdev->roff_sr); civ = igetbyte(chip, port + ICH_REG_OFF_CIV); @@ -881,6 +885,7 @@ static int snd_intel8x0_hw_params(struct snd_pcm_substream *substream, if (ichdev->pcm_open_flag) { snd_ac97_pcm_close(ichdev->pcm); ichdev->pcm_open_flag = 0; + ichdev->prepared = 0; } err = snd_ac97_pcm_open(ichdev->pcm, params_rate(hw_params), params_channels(hw_params), @@ -902,6 +907,7 @@ static int snd_intel8x0_hw_free(struct snd_pcm_substream *substream) if (ichdev->pcm_open_flag) { snd_ac97_pcm_close(ichdev->pcm); ichdev->pcm_open_flag = 0; + ichdev->prepared = 0; } return 0; } @@ -976,6 +982,7 @@ static int snd_intel8x0_pcm_prepare(struct snd_pcm_substream *substream) ichdev->pos_shift = (runtime->sample_bits > 16) ? 2 : 1; } snd_intel8x0_setup_periods(chip, ichdev); + ichdev->prepared = 1; return 0; }
On (21/05/16 14:07), Takashi Iwai wrote:
For fixing the crash, this patch adds a internal flag indicating that the stream is ready to be updated, and check it (as well as the flag being in suspended) to ignore such spurious update.
I reproduced the "spurious IRQ" case, and the patch handled it correctly (VM did not crash).
Cc: stable@vger.kernel.org Reported-by: Sergey Senozhatsky senozhatsky@chromium.org Signed-off-by: Takashi Iwai tiwai@suse.de
I'll keep running test, but seems that it works as intended
Tested-by: Sergey Senozhatsky senozhatsky@chromium.org
OK, below is the revised patch I'm going to apply.
Sounds good.
Thanks!
Thank you.
Hello,
On Sun, May 16, 2021 at 2:50 AM Takashi Iwai tiwai@suse.de wrote:
On Sun, 16 May 2021 10:31:41 +0200, Sergey Senozhatsky wrote:
On (21/05/16 17:30), Sergey Senozhatsky wrote:
On (21/05/14 20:16), Sergey Senozhatsky wrote:
--- a/sound/pci/intel8x0.c +++ b/sound/pci/intel8x0.c @@ -691,6 +691,9 @@ static inline void snd_intel8x0_update(struct intel8x0 *chip, struct ichdev *ich int status, civ, i, step; int ack = 0;
if (!ichdev->substream || ichdev->suspended)
return;
spin_lock_irqsave(&chip->reg_lock, flags); status = igetbyte(chip, port + ichdev->roff_sr); civ = igetbyte(chip, port + ICH_REG_OFF_CIV);
This does the problem for me.
^^^ does fix
OK, thanks for confirmation. So this looks like some spurious interrupt with the unexpected hardware bits.
However, the suggested check doesn't seem covering enough, and it might still hit if the suspend/resume happens before the device is opened but not set up (and such a spurious irq is triggered).
Below is more comprehensive fix. Let me know if this works, too.
thanks,
Takashi
-- 8< -- Subject: [PATCH] ALSA: intel8x0: Don't update period unless prepared
The interrupt handler of intel8x0 calls snd_intel8x0_update() whenever the hardware sets the corresponding status bit for each stream. This works fine for most cases as long as the hardware behaves properly. But when the hardware gives a wrong bit set, this leads to a NULL dereference Oops, and reportedly, this seems what happened on a VM.
For fixing the crash, this patch adds a internal flag indicating that the stream is ready to be updated, and check it (as well as the flag being in suspended) to ignore such spurious update.
Cc: stable@vger.kernel.org Reported-by: Sergey Senozhatsky senozhatsky@chromium.org Signed-off-by: Takashi Iwai tiwai@suse.de
sound/pci/intel8x0.c | 7 +++++++ 1 file changed, 7 insertions(+)
linux v5.13 booting on qemu-system-xtensa virt board gets stuck inside snd_intel8x0_probe -> intel8x0_measure_ac97_clock with this patch. Prior to it it boots successfully for me. I'm curious if this issue has been reported yet.
What I see is an IRQ flood, at some point snd_intel8x0_interrupt and timer ISR are called in loop and execution never returns to the interrupted function intel8x0_measure_ac97_clock.
Any idea what it could be?
-- Thanks. -- Max
On Tue, 06 Jul 2021 19:50:08 +0200, Max Filippov wrote:
Hello,
On Sun, May 16, 2021 at 2:50 AM Takashi Iwai tiwai@suse.de wrote:
On Sun, 16 May 2021 10:31:41 +0200, Sergey Senozhatsky wrote:
On (21/05/16 17:30), Sergey Senozhatsky wrote:
On (21/05/14 20:16), Sergey Senozhatsky wrote:
--- a/sound/pci/intel8x0.c +++ b/sound/pci/intel8x0.c @@ -691,6 +691,9 @@ static inline void snd_intel8x0_update(struct intel8x0 *chip, struct ichdev *ich int status, civ, i, step; int ack = 0;
if (!ichdev->substream || ichdev->suspended)
return;
spin_lock_irqsave(&chip->reg_lock, flags); status = igetbyte(chip, port + ichdev->roff_sr); civ = igetbyte(chip, port + ICH_REG_OFF_CIV);
This does the problem for me.
^^^ does fix
OK, thanks for confirmation. So this looks like some spurious interrupt with the unexpected hardware bits.
However, the suggested check doesn't seem covering enough, and it might still hit if the suspend/resume happens before the device is opened but not set up (and such a spurious irq is triggered).
Below is more comprehensive fix. Let me know if this works, too.
thanks,
Takashi
-- 8< -- Subject: [PATCH] ALSA: intel8x0: Don't update period unless prepared
The interrupt handler of intel8x0 calls snd_intel8x0_update() whenever the hardware sets the corresponding status bit for each stream. This works fine for most cases as long as the hardware behaves properly. But when the hardware gives a wrong bit set, this leads to a NULL dereference Oops, and reportedly, this seems what happened on a VM.
For fixing the crash, this patch adds a internal flag indicating that the stream is ready to be updated, and check it (as well as the flag being in suspended) to ignore such spurious update.
Cc: stable@vger.kernel.org Reported-by: Sergey Senozhatsky senozhatsky@chromium.org Signed-off-by: Takashi Iwai tiwai@suse.de
sound/pci/intel8x0.c | 7 +++++++ 1 file changed, 7 insertions(+)
linux v5.13 booting on qemu-system-xtensa virt board gets stuck inside snd_intel8x0_probe -> intel8x0_measure_ac97_clock with this patch. Prior to it it boots successfully for me. I'm curious if this issue has been reported yet.
What I see is an IRQ flood, at some point snd_intel8x0_interrupt and timer ISR are called in loop and execution never returns to the interrupted function intel8x0_measure_ac97_clock.
Any idea what it could be?
That's something odd with the VM. As the chip itself has never shown such a problem on real systems, maybe the best action would be to just skip the clock measurement on VM. The measurement itself is unreliable on VM, so it makes more sense.
That said, something like below would work?
thanks,
Takashi
--- diff --git a/sound/pci/intel8x0.c b/sound/pci/intel8x0.c index 2d1bfbcba933..b75f832d7777 100644 --- a/sound/pci/intel8x0.c +++ b/sound/pci/intel8x0.c @@ -2199,6 +2199,9 @@ static int snd_intel8x0_mixer(struct intel8x0 *chip, int ac97_clock, pbus->private_free = snd_intel8x0_mixer_free_ac97_bus; if (ac97_clock >= 8000 && ac97_clock <= 48000) pbus->clock = ac97_clock; + else if (chip->inside_vm) + pbus->clock = 48000; + /* FIXME: my test board doesn't work well with VRA... */ if (chip->device_type == DEVICE_ALI) pbus->no_vra = 1;
On Wed, Jul 7, 2021 at 12:02 AM Takashi Iwai tiwai@suse.de wrote:
On Tue, 06 Jul 2021 19:50:08 +0200, Max Filippov wrote:
linux v5.13 booting on qemu-system-xtensa virt board gets stuck inside snd_intel8x0_probe -> intel8x0_measure_ac97_clock with this patch. Prior to it it boots successfully for me. I'm curious if this issue has been reported yet.
What I see is an IRQ flood, at some point snd_intel8x0_interrupt and timer ISR are called in loop and execution never returns to the interrupted function intel8x0_measure_ac97_clock.
Any idea what it could be?
That's something odd with the VM. As the chip itself has never shown such a problem on real systems, maybe the best action would be to just skip the clock measurement on VM. The measurement itself is unreliable on VM, so it makes more sense.
That said, something like below would work?
It didn't change anything in my case. My further observation is that the snd_intel8x0_update is called before the ichdev->prepared is set to one and as a result IRQ is apparently never cleared. Perhaps because intel8x0_measure_ac97_clock is called from the snd_intel8x0_probe, well before the snd_intel8x0_pcm_prepare that sets ichdev->prepared is called.
thanks,
Takashi
diff --git a/sound/pci/intel8x0.c b/sound/pci/intel8x0.c index 2d1bfbcba933..b75f832d7777 100644 --- a/sound/pci/intel8x0.c +++ b/sound/pci/intel8x0.c @@ -2199,6 +2199,9 @@ static int snd_intel8x0_mixer(struct intel8x0 *chip, int ac97_clock, pbus->private_free = snd_intel8x0_mixer_free_ac97_bus; if (ac97_clock >= 8000 && ac97_clock <= 48000) pbus->clock = ac97_clock;
else if (chip->inside_vm)
pbus->clock = 48000;
/* FIXME: my test board doesn't work well with VRA... */ if (chip->device_type == DEVICE_ALI) pbus->no_vra = 1;
On Wed, 07 Jul 2021 19:50:07 +0200, Max Filippov wrote:
On Wed, Jul 7, 2021 at 12:02 AM Takashi Iwai tiwai@suse.de wrote:
On Tue, 06 Jul 2021 19:50:08 +0200, Max Filippov wrote:
linux v5.13 booting on qemu-system-xtensa virt board gets stuck inside snd_intel8x0_probe -> intel8x0_measure_ac97_clock with this patch. Prior to it it boots successfully for me. I'm curious if this issue has been reported yet.
What I see is an IRQ flood, at some point snd_intel8x0_interrupt and timer ISR are called in loop and execution never returns to the interrupted function intel8x0_measure_ac97_clock.
Any idea what it could be?
That's something odd with the VM. As the chip itself has never shown such a problem on real systems, maybe the best action would be to just skip the clock measurement on VM. The measurement itself is unreliable on VM, so it makes more sense.
That said, something like below would work?
It didn't change anything in my case. My further observation is that the snd_intel8x0_update is called before the ichdev->prepared is set to one and as a result IRQ is apparently never cleared.
So it's broken in anyway no matter whether intel8x0_measure_ac97_clock() is called or not, right? I'm afraid that something is wrong in VM, then. The driver has been working over decades on thousands of real different boards.
Skipping the clock measurement on VM would be still useful, independent from your problem, though.
Takashi
Perhaps because intel8x0_measure_ac97_clock is called from the snd_intel8x0_probe, well before the snd_intel8x0_pcm_prepare that sets ichdev->prepared is called.
thanks,
Takashi
diff --git a/sound/pci/intel8x0.c b/sound/pci/intel8x0.c index 2d1bfbcba933..b75f832d7777 100644 --- a/sound/pci/intel8x0.c +++ b/sound/pci/intel8x0.c @@ -2199,6 +2199,9 @@ static int snd_intel8x0_mixer(struct intel8x0 *chip, int ac97_clock, pbus->private_free = snd_intel8x0_mixer_free_ac97_bus; if (ac97_clock >= 8000 && ac97_clock <= 48000) pbus->clock = ac97_clock;
else if (chip->inside_vm)
pbus->clock = 48000;
/* FIXME: my test board doesn't work well with VRA... */ if (chip->device_type == DEVICE_ALI) pbus->no_vra = 1;
-- Thanks. -- Max
On Wed, Jul 7, 2021 at 11:14 AM Takashi Iwai tiwai@suse.de wrote:
On Wed, 07 Jul 2021 19:50:07 +0200, Max Filippov wrote:
It didn't change anything in my case. My further observation is that the snd_intel8x0_update is called before the ichdev->prepared is set to one and as a result IRQ is apparently never cleared.
So it's broken in anyway no matter whether intel8x0_measure_ac97_clock() is called or not, right?
The change that you suggested didn't eliminate the call to intel8x0_measure_ac97_clock, it's still called and an interrupt flood happens at the same place.
I've also tried the following change instead and it fixes my issue:
diff --git a/sound/pci/intel8x0.c b/sound/pci/intel8x0.c index 5b124c4ad572..13d1c9edea10 100644 --- a/sound/pci/intel8x0.c +++ b/sound/pci/intel8x0.c @@ -692,11 +692,14 @@ static inline void snd_intel8x0_update(struct intel8x0 *chip, struct ichdev *ich int status, civ, i, step; int ack = 0;
- if (!ichdev->prepared || ichdev->suspended) - return; - spin_lock_irqsave(&chip->reg_lock, flags); status = igetbyte(chip, port + ichdev->roff_sr); + if (!ichdev->prepared || ichdev->suspended) { + spin_unlock_irqrestore(&chip->reg_lock, flags); + iputbyte(chip, port + ichdev->roff_sr, + status & (ICH_FIFOE | ICH_BCIS | ICH_LVBCI)); + return; + } civ = igetbyte(chip, port + ICH_REG_OFF_CIV); if (!(status & ICH_BCIS)) { step = 0;
I'm afraid that something is wrong in VM, then. The driver has been working over decades on thousands of real different boards.
Skipping the clock measurement on VM would be still useful, independent from your problem, though.
On Wed, 07 Jul 2021 22:33:22 +0200, Max Filippov wrote:
On Wed, Jul 7, 2021 at 11:14 AM Takashi Iwai tiwai@suse.de wrote:
On Wed, 07 Jul 2021 19:50:07 +0200, Max Filippov wrote:
It didn't change anything in my case. My further observation is that the snd_intel8x0_update is called before the ichdev->prepared is set to one and as a result IRQ is apparently never cleared.
So it's broken in anyway no matter whether intel8x0_measure_ac97_clock() is called or not, right?
The change that you suggested didn't eliminate the call to intel8x0_measure_ac97_clock, it's still called and an interrupt flood happens at the same place.
Ah I see the point. Then the fix would be a oneliner like below.
Takashi
--- a/sound/pci/intel8x0.c +++ b/sound/pci/intel8x0.c @@ -694,7 +694,7 @@ static inline void snd_intel8x0_update(struct intel8x0 *chip, struct ichdev *ich int status, civ, i, step; int ack = 0;
- if (!ichdev->prepared || ichdev->suspended) + if (!(ichdev->prepared || ichdev->in_measurement) || ichdev->suspended) return;
spin_lock_irqsave(&chip->reg_lock, flags);
On Thu, Jul 8, 2021 at 12:13 AM Takashi Iwai tiwai@suse.de wrote:
On Wed, 07 Jul 2021 22:33:22 +0200, Max Filippov wrote:
On Wed, Jul 7, 2021 at 11:14 AM Takashi Iwai tiwai@suse.de wrote:
On Wed, 07 Jul 2021 19:50:07 +0200, Max Filippov wrote:
It didn't change anything in my case. My further observation is that the snd_intel8x0_update is called before the ichdev->prepared is set to one and as a result IRQ is apparently never cleared.
So it's broken in anyway no matter whether intel8x0_measure_ac97_clock() is called or not, right?
The change that you suggested didn't eliminate the call to intel8x0_measure_ac97_clock, it's still called and an interrupt flood happens at the same place.
Ah I see the point. Then the fix would be a oneliner like below.
Takashi
--- a/sound/pci/intel8x0.c +++ b/sound/pci/intel8x0.c @@ -694,7 +694,7 @@ static inline void snd_intel8x0_update(struct intel8x0 *chip, struct ichdev *ich int status, civ, i, step; int ack = 0;
if (!ichdev->prepared || ichdev->suspended)
if (!(ichdev->prepared || ichdev->in_measurement) || ichdev->suspended)
There's no ichdev::in_measurement, but if replaced with chip->in_measurement it indeed fixes my issue. So with this change: Tested-by: Max Filippov jcmvbkbc@gmail.com
On Thu, 08 Jul 2021 10:41:50 +0200, Max Filippov wrote:
On Thu, Jul 8, 2021 at 12:13 AM Takashi Iwai tiwai@suse.de wrote:
On Wed, 07 Jul 2021 22:33:22 +0200, Max Filippov wrote:
On Wed, Jul 7, 2021 at 11:14 AM Takashi Iwai tiwai@suse.de wrote:
On Wed, 07 Jul 2021 19:50:07 +0200, Max Filippov wrote:
It didn't change anything in my case. My further observation is that the snd_intel8x0_update is called before the ichdev->prepared is set to one and as a result IRQ is apparently never cleared.
So it's broken in anyway no matter whether intel8x0_measure_ac97_clock() is called or not, right?
The change that you suggested didn't eliminate the call to intel8x0_measure_ac97_clock, it's still called and an interrupt flood happens at the same place.
Ah I see the point. Then the fix would be a oneliner like below.
Takashi
--- a/sound/pci/intel8x0.c +++ b/sound/pci/intel8x0.c @@ -694,7 +694,7 @@ static inline void snd_intel8x0_update(struct intel8x0 *chip, struct ichdev *ich int status, civ, i, step; int ack = 0;
if (!ichdev->prepared || ichdev->suspended)
if (!(ichdev->prepared || ichdev->in_measurement) || ichdev->suspended)
There's no ichdev::in_measurement, but if replaced with chip->in_measurement it indeed fixes my issue.
One must compile the code before sending out :-<
So with this change: Tested-by: Max Filippov jcmvbkbc@gmail.com
Great, thanks for quick testing, I'll prepare the fix patch now.
Takashi
On (21/07/08 11:00), Takashi Iwai wrote:
--- a/sound/pci/intel8x0.c +++ b/sound/pci/intel8x0.c @@ -694,7 +694,7 @@ static inline void snd_intel8x0_update(struct intel8x0 *chip, struct ichdev *ich int status, civ, i, step; int ack = 0;
if (!ichdev->prepared || ichdev->suspended)
if (!(ichdev->prepared || ichdev->in_measurement) || ichdev->suspended)
There's no ichdev::in_measurement, but if replaced with chip->in_measurement it indeed fixes my issue.
One must compile the code before sending out :-<
So with this change: Tested-by: Max Filippov jcmvbkbc@gmail.com
Great, thanks for quick testing, I'll prepare the fix patch now.
Tested-by: Sergey Senozhatsky senozhatsky@chromium.org
participants (3)
-
Max Filippov
-
Sergey Senozhatsky
-
Takashi Iwai