[alsa-devel] snd-hda-intel - wallclk patch

Tue May 11 11:24:07 CEST 2010

At Tue, 11 May 2010 11:07:05 +0200 (CEST),
Jaroslav Kysela wrote:
> 
> On Tue, 11 May 2010, Takashi Iwai wrote:
> 
> > At Tue, 11 May 2010 10:44:17 +0200 (CEST),
> > Jaroslav Kysela wrote:
> >>
> >> On Tue, 11 May 2010, Takashi Iwai wrote:
> >>
> >>> At Tue, 11 May 2010 10:28:50 +0200 (CEST),
> >>> Jaroslav Kysela wrote:
> >>>>
> >>>> On Mon, 10 May 2010, Jaroslav Kysela wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> 	I would like to ask HDA gurus to check this patch (I will include it
> >>>>> to my tree once acked). The patch uses WALLCLK from HDA chips (marked as
> >>>>> required in the HDA specification) to check for bogus - too early - interrups
> >>>>> which confuses the upper PCM layer (sound skipping issues).
> >>>>>
> >>>>> 	More details about patch testing on problematic hardware can be found
> >>>>> at:
> >>>>>
> >>>>> 	https://bugzilla.kernel.org/show_bug.cgi?id=15912
> >>>>
> >>>> No objections received, so I pushed this code to my master/devel branches.
> >>>
> >>> Heh, that was quick, I had no time to review :)
> >>>
> >>> The change looks good to me.  Thanks for your fix (and rebasing).
> >>>
> >>> One thing is that you removed bdl_pos_adj=0 check.  This was there
> >>> explicitly to avoid the delayed irq handling no matter whether the
> >>> value is correct or not.  I guess it's safer to take it back, or
> >>> give any other way for user to avoid the delayed irq handling.
> >>
> >> I see. But I need the pos % period_bytes check too. What about this
> >> change?
> >
> > With bdl_pos_adj==0, it should return 1 (non-error), but otherwise
> > this should be OK, I guess.
> 
> The problem with returning value 1 -> call snd_pcm_elapsed() - is that the 
> upper layer will be most probably confused on broken hw. From logs in 
> kernel bz#15912, the bad irq can be triggered at any time (I really don't 
> know why INTSTS register is set).
> 
> The first wallclk check uses to check for first 2/3 of period and the
> pos % period_bytes check checks for second half of period. Writing this, I 
> think that the extra check can be added to the second condition to do 
> position related check for first next period only:

Isn't a similar check done in PCM core side?
Filtering or delaying ack is perhaps better done here, though...

> diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> index 0a6c55b..d4d532a 100644
> --- a/sound/pci/hda/hda_intel.c
> +++ b/sound/pci/hda/hda_intel.c
> @@ -1890,9 +1890,8 @@ static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
>   	unsigned int pos;
>   	int stream;
> 
> -	wallclk = azx_readl(chip, WALLCLK);
> -	if ((wallclk - azx_dev->start_wallclk) <
> -				(azx_dev->period_wallclk * 2) / 3)
> +	wallclk = azx_readl(chip, WALLCLK) - azx_dev->start_wallclk;
> +	if (wallclk < (azx_dev->period_wallclk * 2) / 3)
>   		return -1;	/* bogus (too early) interrupt */
> 
>   	stream = azx_dev->substream->stream;
> @@ -1910,9 +1909,12 @@ static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
> 
>   	if (WARN_ONCE(!azx_dev->period_bytes,
>   		      "hda-intel: zero azx_dev->period_bytes"))
> -		return 0; /* this shouldn't happen! */
> -	if (pos % azx_dev->period_bytes > azx_dev->period_bytes / 2)
> -		return 0; /* NG - it's below the period boundary */
> +		/* this shouldn't happen! */
> +		return bdl_pos_adj[chip->dev_index] ? 0 : -1;
> +	if (wallclk <= azx_dev->period_wallclk &&
> +	    pos % azx_dev->period_bytes > azx_dev->period_bytes / 2)
> +		/* NG - it's below the first next period boundary */
> +		return bdl_pos_adj[chip->dev_index] ? 0 : -1;
>   	azx_dev->start_wallclk = wallclk;
>   	return 1; /* OK, it's fine */
>   }
> 
> I don't see any reason to call snd_pcm_elapsed() in bad time. It's better 
> to return -1 (ignore) and wait for next interrupt.

Hm, judging from the current situation, a missing IRQ is harmless than
the wrong position ack, indeed.

But delaying the period handling is a bit different from filtering
bogus IRQs.  I'm not entirely sure which is the cause in the bug
above.  If it's a bogus IRQ, could it be filtered out.  If it's an IRQ
at wrong time (too early), the delayed ack is needed.  If it's the
wrong position read, the delayed ack isn't needed but the position
correction is needed.

So, I'm a bit afraid of adding too much checks that might give a
reaction based on wrong information.  We'd need to take a detailed
look again at each cause...

BTW, now I looked back again, and think
WARN_ONCE(!azx_dev->period_bytes) should be also returning -1 for
normal cases, no?  This is just a sanity check, and if this happens,
the chance that you get the right IRQ at the next moment isn't high,
since something got screwed up.

thanks,

Takashi