Re: [alsa-devel] snd-hda-intel - wallclk patch

11 May 2010

At Tue, 11 May 2010 12:23:19 +0200 (CEST),
Jaroslav Kysela wrote:
...
On Tue, 11 May 2010, Takashi Iwai wrote:
...
At Tue, 11 May 2010 11:07:05 +0200 (CEST),
Jaroslav Kysela wrote:
...
On Tue, 11 May 2010, Takashi Iwai wrote:
...
At Tue, 11 May 2010 10:44:17 +0200 (CEST),
Jaroslav Kysela wrote:
...
On Tue, 11 May 2010, Takashi Iwai wrote:
...
At Tue, 11 May 2010 10:28:50 +0200 (CEST),
Jaroslav Kysela wrote:
>
> On Mon, 10 May 2010, Jaroslav Kysela wrote:
>
>> Hi,
>>
>> 	I would like to ask HDA gurus to check this patch (I will include it
>> to my tree once acked). The patch uses WALLCLK from HDA chips (marked as
>> required in the HDA specification) to check for bogus - too early - interrups
>> which confuses the upper PCM layer (sound skipping issues).
>>
>> 	More details about patch testing on problematic hardware can be found
>> at:
>>
>> 	https://bugzilla.kernel.org/show_bug.cgi?id=15912
>
> No objections received, so I pushed this code to my master/devel branches.
Heh, that was quick, I had no time to review :)
The change looks good to me.  Thanks for your fix (and rebasing).
One thing is that you removed bdl_pos_adj=0 check.  This was there
explicitly to avoid the delayed irq handling no matter whether the
value is correct or not.  I guess it's safer to take it back, or
give any other way for user to avoid the delayed irq handling.
I see. But I need the pos % period_bytes check too. What about this
change?
With bdl_pos_adj==0, it should return 1 (non-error), but otherwise
this should be OK, I guess.
The problem with returning value 1 -> call snd_pcm_elapsed() - is that the
upper layer will be most probably confused on broken hw. From logs in
kernel bz#15912, the bad irq can be triggered at any time (I really don't
know why INTSTS register is set).
The first wallclk check uses to check for first 2/3 of period and the
pos % period_bytes check checks for second half of period. Writing this, I
think that the extra check can be added to the second condition to do
position related check for first next period only:
Isn't a similar check done in PCM core side?
Filtering or delaying ack is perhaps better done here, though...
Unfortunately, the check in PCM core expects that at least 'period_time' 
is elapsed between two snd_pcm_elapsed() calls:
     if (in_interrupt) {
             /* we know that one period was processed */
             /* delta = "expected next hw_ptr" for in_interrupt != 0 */
             delta = runtime->hw_ptr_interrupt + runtime->period_size;


In situation, when the driver calls the snd_pcm_elapsed() too early, 
the pointers will be updated wrongly (whole ring buffer size will be 
added to hw_ptr).
Yeah, that's bad.
...
...
...

diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 0a6c55b..d4d532a 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -1890,9 +1890,8 @@ static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
   unsigned int pos;
   int stream;

wallclk = azx_readl(chip, WALLCLK);
if ((wallclk - azx_dev->start_wallclk) <
		(azx_dev->period_wallclk * 2) / 3)




wallclk = azx_readl(chip, WALLCLK) - azx_dev->start_wallclk;

if (wallclk < (azx_dev->period_wallclk * 2) / 3)
return -1;	/* bogus (too early) interrupt */
stream = azx_dev->substream->stream;


@@ -1910,9 +1909,12 @@ static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
if (WARN_ONCE(!azx_dev->period_bytes,
   	      "hda-intel: zero azx_dev->period_bytes"))

return 0; /* this shouldn't happen! */


if (pos % azx_dev->period_bytes > azx_dev->period_bytes / 2)
return 0; /* NG - it's below the period boundary */




/* this shouldn't happen! */


return bdl_pos_adj[chip->dev_index] ? 0 : -1;


if (wallclk <= azx_dev->period_wallclk &&
   pos % azx_dev->period_bytes > azx_dev->period_bytes / 2)


/* NG - it's below the first next period boundary */


return bdl_pos_adj[chip->dev_index] ? 0 : -1;

azx_dev->start_wallclk = wallclk;
return 1; /* OK, it's fine */
}

I don't see any reason to call snd_pcm_elapsed() in bad time. It's better
to return -1 (ignore) and wait for next interrupt.
Hm, judging from the current situation, a missing IRQ is harmless than
the wrong position ack, indeed.
But delaying the period handling is a bit different from filtering
bogus IRQs.  I'm not entirely sure which is the cause in the bug
above.  If it's a bogus IRQ, could it be filtered out.  If it's an IRQ
at wrong time (too early), the delayed ack is needed.  If it's the
wrong position read, the delayed ack isn't needed but the position
correction is needed.
So, I'm a bit afraid of adding too much checks that might give a
reaction based on wrong information.  We'd need to take a detailed
look again at each cause...
I'm also not so much happy to have this code in the driver, but it seems 
that HDA hardware does not respect the standard behaviour in some cases.
Anyway, it's good to have the code robust.
...
BTW, now I looked back again, and think
WARN_ONCE(!azx_dev->period_bytes) should be also returning -1 for
normal cases, no?  This is just a sanity check, and if this happens,
the chance that you get the right IRQ at the next moment isn't high,
since something got screwed up.
I agree. I put all suggested changes to my tree into "[ALSA] 
snd-hda-intel: Improve azx_position_ok()" commit.
Thanks!  I pulled now this, too.
Takashi