[alsa-devel] snd-hda-intel - wallclk patch

Tue May 11 12:51:12 CEST 2010

At Tue, 11 May 2010 12:23:19 +0200 (CEST),
Jaroslav Kysela wrote:
> 
> On Tue, 11 May 2010, Takashi Iwai wrote:
> 
> > At Tue, 11 May 2010 11:07:05 +0200 (CEST),
> > Jaroslav Kysela wrote:
> >>
> >> On Tue, 11 May 2010, Takashi Iwai wrote:
> >>
> >>> At Tue, 11 May 2010 10:44:17 +0200 (CEST),
> >>> Jaroslav Kysela wrote:
> >>>>
> >>>> On Tue, 11 May 2010, Takashi Iwai wrote:
> >>>>
> >>>>> At Tue, 11 May 2010 10:28:50 +0200 (CEST),
> >>>>> Jaroslav Kysela wrote:
> >>>>>>
> >>>>>> On Mon, 10 May 2010, Jaroslav Kysela wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> 	I would like to ask HDA gurus to check this patch (I will include it
> >>>>>>> to my tree once acked). The patch uses WALLCLK from HDA chips (marked as
> >>>>>>> required in the HDA specification) to check for bogus - too early - interrups
> >>>>>>> which confuses the upper PCM layer (sound skipping issues).
> >>>>>>>
> >>>>>>> 	More details about patch testing on problematic hardware can be found
> >>>>>>> at:
> >>>>>>>
> >>>>>>> 	https://bugzilla.kernel.org/show_bug.cgi?id=15912
> >>>>>>
> >>>>>> No objections received, so I pushed this code to my master/devel branches.
> >>>>>
> >>>>> Heh, that was quick, I had no time to review :)
> >>>>>
> >>>>> The change looks good to me.  Thanks for your fix (and rebasing).
> >>>>>
> >>>>> One thing is that you removed bdl_pos_adj=0 check.  This was there
> >>>>> explicitly to avoid the delayed irq handling no matter whether the
> >>>>> value is correct or not.  I guess it's safer to take it back, or
> >>>>> give any other way for user to avoid the delayed irq handling.
> >>>>
> >>>> I see. But I need the pos % period_bytes check too. What about this
> >>>> change?
> >>>
> >>> With bdl_pos_adj==0, it should return 1 (non-error), but otherwise
> >>> this should be OK, I guess.
> >>
> >> The problem with returning value 1 -> call snd_pcm_elapsed() - is that the
> >> upper layer will be most probably confused on broken hw. From logs in
> >> kernel bz#15912, the bad irq can be triggered at any time (I really don't
> >> know why INTSTS register is set).
> >>
> >> The first wallclk check uses to check for first 2/3 of period and the
> >> pos % period_bytes check checks for second half of period. Writing this, I
> >> think that the extra check can be added to the second condition to do
> >> position related check for first next period only:
> >
> > Isn't a similar check done in PCM core side?
> > Filtering or delaying ack is perhaps better done here, though...
> 
> Unfortunately, the check in PCM core expects that at least 'period_time' 
> is elapsed between two snd_pcm_elapsed() calls:
> 
>          if (in_interrupt) {
>                  /* we know that one period was processed */
>                  /* delta = "expected next hw_ptr" for in_interrupt != 0 */
>                  delta = runtime->hw_ptr_interrupt + runtime->period_size;
> 
> In situation, when the driver calls the snd_pcm_elapsed() too early, 
> the pointers will be updated wrongly (whole ring buffer size will be 
> added to hw_ptr).

Yeah, that's bad.

> >> diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> >> index 0a6c55b..d4d532a 100644
> >> --- a/sound/pci/hda/hda_intel.c
> >> +++ b/sound/pci/hda/hda_intel.c
> >> @@ -1890,9 +1890,8 @@ static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
> >>   	unsigned int pos;
> >>   	int stream;
> >>
> >> -	wallclk = azx_readl(chip, WALLCLK);
> >> -	if ((wallclk - azx_dev->start_wallclk) <
> >> -				(azx_dev->period_wallclk * 2) / 3)
> >> +	wallclk = azx_readl(chip, WALLCLK) - azx_dev->start_wallclk;
> >> +	if (wallclk < (azx_dev->period_wallclk * 2) / 3)
> >>   		return -1;	/* bogus (too early) interrupt */
> >>
> >>   	stream = azx_dev->substream->stream;
> >> @@ -1910,9 +1909,12 @@ static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
> >>
> >>   	if (WARN_ONCE(!azx_dev->period_bytes,
> >>   		      "hda-intel: zero azx_dev->period_bytes"))
> >> -		return 0; /* this shouldn't happen! */
> >> -	if (pos % azx_dev->period_bytes > azx_dev->period_bytes / 2)
> >> -		return 0; /* NG - it's below the period boundary */
> >> +		/* this shouldn't happen! */
> >> +		return bdl_pos_adj[chip->dev_index] ? 0 : -1;
> >> +	if (wallclk <= azx_dev->period_wallclk &&
> >> +	    pos % azx_dev->period_bytes > azx_dev->period_bytes / 2)
> >> +		/* NG - it's below the first next period boundary */
> >> +		return bdl_pos_adj[chip->dev_index] ? 0 : -1;
> >>   	azx_dev->start_wallclk = wallclk;
> >>   	return 1; /* OK, it's fine */
> >>   }
> >>
> >> I don't see any reason to call snd_pcm_elapsed() in bad time. It's better
> >> to return -1 (ignore) and wait for next interrupt.
> >
> > Hm, judging from the current situation, a missing IRQ is harmless than
> > the wrong position ack, indeed.
> >
> > But delaying the period handling is a bit different from filtering
> > bogus IRQs.  I'm not entirely sure which is the cause in the bug
> > above.  If it's a bogus IRQ, could it be filtered out.  If it's an IRQ
> > at wrong time (too early), the delayed ack is needed.  If it's the
> > wrong position read, the delayed ack isn't needed but the position
> > correction is needed.
> >
> > So, I'm a bit afraid of adding too much checks that might give a
> > reaction based on wrong information.  We'd need to take a detailed
> > look again at each cause...
> 
> I'm also not so much happy to have this code in the driver, but it seems 
> that HDA hardware does not respect the standard behaviour in some cases.
> Anyway, it's good to have the code robust.
> 
> > BTW, now I looked back again, and think
> > WARN_ONCE(!azx_dev->period_bytes) should be also returning -1 for
> > normal cases, no?  This is just a sanity check, and if this happens,
> > the chance that you get the right IRQ at the next moment isn't high,
> > since something got screwed up.
> 
> I agree. I put all suggested changes to my tree into "[ALSA] 
> snd-hda-intel: Improve azx_position_ok()" commit.

Thanks!  I pulled now this, too.

Takashi