[alsa-devel] RFC: workaround for 'azx_get_response timeout, switching to polling mode...'

Hi,
Although I thought that polling mode was harmless, it sometimes isn't. I found that after a few dozen hibernate cycles the timeout would happen again, knocking out first MSI and then the whole RIRB (single_cmd).
However, I also found that if I resend the command that caused the timeout, it completes normally. Even further, it is possible to poll for this command and then send the next one normally. It just works. So interrupts do work, but sometimes (very rarely) one is missed. It might even be a hardware bug.
The patch I am sending delays the switch to polling mode until we get 3 such timeouts in a row.
I tested this approach for about 130 hibernate cycles.
Best regards, Maxim Levitsky

My sound codec (ALC268) sometimes (very rarely) seems to miss interrupts. However, interrupt mode still works. Thus, if we get a timeout, poll the codec once.
If we get 3 such timeouts in a row, switch to polling mode.
This patch may be a band-aid, but it might be a workaround for a hardware bug.
Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
---
 sound/pci/hda/hda_intel.c |   19 +++++++++++++++++--
 1 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index ec9c348..11ce655 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -426,6 +426,7 @@ struct azx {
 
 	/* flags */
 	int position_fix;
+	int poll_count;
 	unsigned int running :1;
 	unsigned int initialized :1;
 	unsigned int single_cmd :1;
@@ -506,7 +507,7 @@ static char *driver_short_names[] __devinitdata = {
 #define get_azx_dev(substream) (substream->runtime->private_data)
 
 static int azx_acquire_irq(struct azx *chip, int do_disconnect);
-
+static int azx_send_cmd(struct hda_bus *bus, unsigned int val);
 /*
  * Interface for HD codec
  */
@@ -664,11 +665,12 @@ static unsigned int azx_rirb_get_response(struct hda_bus *bus,
 {
 	struct azx *chip = bus->private_data;
 	unsigned long timeout;
+	int do_poll = 0;
 
  again:
 	timeout = jiffies + msecs_to_jiffies(1000);
 	for (;;) {
-		if (chip->polling_mode) {
+		if (chip->polling_mode || do_poll) {
 			spin_lock_irq(&chip->reg_lock);
 			azx_update_rirb(chip);
 			spin_unlock_irq(&chip->reg_lock);
@@ -676,6 +678,9 @@ static unsigned int azx_rirb_get_response(struct hda_bus *bus,
 		if (!chip->rirb.cmds[addr]) {
 			smp_rmb();
 			bus->rirb_error = 0;
+
+			if (!do_poll)
+				chip->poll_count = 0;
 			return chip->rirb.res[addr]; /* the last value */
 		}
 		if (time_after(jiffies, timeout))
@@ -688,6 +693,16 @@ static unsigned int azx_rirb_get_response(struct hda_bus *bus,
 		}
 	}
 
+	if (!chip->polling_mode && chip->poll_count < 2) {
+		snd_printd(SFX "azx_get_response timeout, "
+			   "polling the codec once: last cmd=0x%08x\n",
+			   chip->last_cmd[addr]);
+		do_poll = 1;
+		chip->poll_count++;
+		goto again;
+	}
+
+
 	if (!chip->polling_mode) {
 		snd_printk(KERN_WARNING SFX "azx_get_response timeout, "
 			   "switching to polling mode: last cmd=0x%08x\n",
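
For readers who want the counter logic without the surrounding driver code, here is a minimal user-space sketch of the escalation in the patch above. It is only a model: struct chip_model, enum outcome and get_response() are hypothetical names, not part of hda_intel.c, and it assumes that a one-shot poll always retrieves the response (in the real driver the poll itself can time out as well).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two fields the patch relies on;
 * this is NOT the real struct azx. */
struct chip_model {
	int poll_count;     /* consecutive timeouts rescued by a one-shot poll */
	bool polling_mode;  /* permanent fallback, never cleared in this model */
};

/* How a single command ended up being answered (illustrative names). */
enum outcome {
	BY_IRQ,               /* response arrived via interrupt */
	BY_SINGLE_POLL,       /* interrupt missed, one-shot poll used */
	SWITCHED_TO_POLLING,  /* third timeout in a row, fallback enabled */
	BY_POLLING_MODE       /* already in permanent polling mode */
};

/* Model one command. irq_arrived says whether the interrupt delivered
 * the response before the timeout. Assumption: a poll always succeeds. */
static enum outcome get_response(struct chip_model *chip, bool irq_arrived)
{
	assert(chip);

	if (chip->polling_mode)
		return BY_POLLING_MODE;

	if (irq_arrived) {
		/* Only an interrupt-delivered response resets the counter,
		 * mirroring the "if (!do_poll)" check in the patch. */
		chip->poll_count = 0;
		return BY_IRQ;
	}

	if (chip->poll_count < 2) {
		/* First or second timeout in a row: poll once, stay in
		 * interrupt mode for the next command. */
		chip->poll_count++;
		return BY_SINGLE_POLL;
	}

	/* Third timeout in a row: give up on interrupts. */
	chip->polling_mode = true;
	return SWITCHED_TO_POLLING;
}
```

In this model a missed interrupt followed by a normal response leaves the chip in interrupt mode with the counter back at zero, and only three misses with no interrupt-delivered response in between flip polling_mode.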

At Thu, 04 Feb 2010 22:20:17 +0200, Maxim Levitsky wrote:
> Hi,
> Although I thought that polling mode was harmless, it sometimes isn't. I found that after a few dozen hibernate cycles the timeout would happen again, knocking out first MSI and then the whole RIRB (single_cmd).
> However, I also found that if I resend the command that caused the timeout, it completes normally. Even further, it is possible to poll for this command and then send the next one normally. It just works. So interrupts do work, but sometimes (very rarely) one is missed. It might even be a hardware bug.
> The patch I am sending delays the switch to polling mode until we get 3 such timeouts in a row.
> I tested this approach for about 130 hibernate cycles.
Thanks! Applied both patches now.
I changed snd_printd() to snd_printdd() since the former is often enabled by default on vendor kernels, and the message can worry innocent users (I know from experience :)
Takashi

On Fri, 2010-02-05 at 09:13 +0100, Takashi Iwai wrote:
> At Thu, 04 Feb 2010 22:20:17 +0200, Maxim Levitsky wrote:
>> Hi,
>> Although I thought that polling mode was harmless, it sometimes isn't. I found that after a few dozen hibernate cycles the timeout would happen again, knocking out first MSI and then the whole RIRB (single_cmd).
>> However, I also found that if I resend the command that caused the timeout, it completes normally. Even further, it is possible to poll for this command and then send the next one normally. It just works. So interrupts do work, but sometimes (very rarely) one is missed. It might even be a hardware bug.
>> The patch I am sending delays the switch to polling mode until we get 3 such timeouts in a row.
>> I tested this approach for about 130 hibernate cycles.
> Thanks! Applied both patches now.
> I changed snd_printd() to snd_printdd() since the former is often enabled by default on vendor kernels, and the message can worry innocent users (I know from experience :)
Thanks a lot!
Best regards, Maxim Levitsky