[PATCH] ALSA: hda: Reduce udelay() at SKL+ position reporting

Takashi Iwai tiwai at suse.de
Tue Sep 28 11:01:49 CEST 2021


On Wed, 22 Sep 2021 00:18:14 +0200,
Pierre-Louis Bossart wrote:
> 
> Sorry Takashi for the delay, I missed your reply.
> 
> >>> The position reporting on Intel Skylake and later chips via
> >>> azx_get_pos_skl() contains a udelay(20) call for the capture streams.
> >>> A call for this alone doesn't sound too harmful.  However, as the
> >>> pointer PCM ops is one of the hottest path in the PCM operations --
> >>> especially for the timer-scheduled operations like PulseAudio -- such
> >>> a delay hogs CPU usage significantly in the total performance.
> >>>
> >>> The function there was taken from the original code in ASoC SST
> >>> Skylake driver blindly.  The udelay() is a workaround for the case
> >>> where the reported position is behind the period boundary at the
> >>> timing triggered from interrupts; applications often expect that the
> >>> full data is available for the whole period when returned (and also
> >>> that's the definition of the ALSA PCM period).
> >>
> >> that initial work-around from the Intel SST driver does not seem to be
> >> legit in the first place, when I checked with hardware folks recently no
> >> one understands why there are delays and special cases for capture. The
> >> only requirement wrt. DPIB is that the DDR update is valid across VC0
> >> and VC1, while the DPIB registers are only valid with VC0. For SOF we
> >> don't know of any VC1 use so will default to the DPIB vendor-specific
> >> registers.
> > 
> > What are those VC0 and VC1 registers?  I can't find the definitions in
> > the code, so I assume that none of ALSA/ASoC drivers use VC1.
> 
> These are PCI concepts/capabilities. VC stands for "Virtual Channel",
> which are mapped to Traffic Class (TC). These registers are typically
> set by the BIOS AFAIK. The point is that if VC1 is enabled only the DPIB
> updates are valid, the vendor-specific will report information can be
> misleading.
> 
> The recommendation from hardware folks is to use DPIB updates in DDR,
> which are known to work across both VC0 and VC1.
> 
> For SOF, we do know VC1 is never used so we'll use the vendor-specific
> registers.

OK, thanks for clarification.

> >> See https://github.com/thesofproject/linux/pull/3143 for my WIP fixes
> >> for SOF.
> >>
> >> I don't have the time to look at this specific patch today but wanted to
> >> make sure you are aware of my on-going fixes.
> >>
> >> Note that the use of DPIB works best if you don't use the IOC interrupt.
> >> when the interrupt is thrown, there is likely a delay for the DPIB
> >> information to be refreshed.
> > 
> > Thanks for the information!  The delay could be the reason of the
> > udelay(), and that's superfluous as mentioned in the commit.
> > 
> > So the remaining question seems to be which method is a better
> > approach for the capture stream: DPIB or posbuf.  I kept the posbuf
> > just for conservatism, but judging from your comment, we may use DPIB
> > for both directions safely?
> 
> sorry you lost me. Isn't DPIB updates in DDR precisely what posbuf reports?
> 
> I think what you need to use is use azx_get_pos_posbuf(chip, azx_dev);
> unconditionally, both for capture and playback, and remove the use of
> the skylake specific stuff and the delay.

When I measured, I saw a slight difference between values in DPIB and
posbuf, so I wonder which is actually more accurate.  The latter is
sometimes a bit behind the former.

If the posbuf is more correct in the sense for the PCM pointer, we can
simply move back to the posbuf like other platforms.


thanks,

Takashi


> > In anyway, the additional mechanism to check the delayed position
> > report in this patch can be kept no matter which way (DPIB or posbuf)
> > is used.
> 
> Agree!
> 
> > 
> > 
> > Takashi
> > 
> >>
> >>>
> >>> OTOH, HD-audio (legacy) driver has already some workarounds for the
> >>> delayed position reporting due to its relatively large FIFO, such as
> >>> the BDL position adjustment and the delayed period-elapsed call in the
> >>> work.  That said, the udelay() is almost superfluous for HD-audio
> >>> driver unlike SST, and we can drop the udelay().
> >>>
> >>> Though, the current code doesn't guarantee the full period readiness
> >>> as mentioned in the above, but rather it checks the wallclock and
> >>> detects the unexpected jump.  That's one missing piece, and the drop
> >>> of udelay() needs a bit more sanity checks for the delayed handling.
> >>>
> >>> This patch implements those: the drop of udelay() call in
> >>> azx_get_pos_skl() and the more proper check of hwptr in
> >>> azx_position_ok().  The latter change is applied only for the case
> >>> where the stream is running in the normal mode without
> >>> no_period_wakeup flag.  When no_period_wakeup is set, it essentially
> >>> ignores the period handling and rather concentrates only on the
> >>> current position; which implies that we don't need to care about the
> >>> period boundary at all.
> >>>
> >>> Fixes: f87e7f25893d ("ALSA: hda - Improved position reporting on SKL+")
> >>> Reported-by: Jens Axboe <axboe at kernel.dk>
> >>> Signed-off-by: Takashi Iwai <tiwai at suse.de>
> >>> ---
> >>>  sound/pci/hda/hda_intel.c | 28 +++++++++++++++++++++++-----
> >>>  1 file changed, 23 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> >>> index 3aa432d814a2..faeeeb923d5e 100644
> >>> --- a/sound/pci/hda/hda_intel.c
> >>> +++ b/sound/pci/hda/hda_intel.c
> >>> @@ -637,13 +637,17 @@ static int azx_position_check(struct azx *chip, struct azx_dev *azx_dev)
> >>>   * the update-IRQ timing.  The IRQ is issued before actually the
> >>>   * data is processed.  So, we need to process it afterwords in a
> >>>   * workqueue.
> >>> + *
> >>> + * Returns 1 if OK to proceed, 0 for delay handling, -1 for skipping update
> >>>   */
> >>>  static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
> >>>  {
> >>>  	struct snd_pcm_substream *substream = azx_dev->core.substream;
> >>> +	struct snd_pcm_runtime *runtime = substream->runtime;
> >>>  	int stream = substream->stream;
> >>>  	u32 wallclk;
> >>>  	unsigned int pos;
> >>> +	snd_pcm_uframes_t hwptr, target;
> >>>  
> >>>  	wallclk = azx_readl(chip, WALLCLK) - azx_dev->core.start_wallclk;
> >>>  	if (wallclk < (azx_dev->core.period_wallclk * 2) / 3)
> >>> @@ -680,6 +684,24 @@ static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
> >>>  		/* NG - it's below the first next period boundary */
> >>>  		return chip->bdl_pos_adj ? 0 : -1;
> >>>  	azx_dev->core.start_wallclk += wallclk;
> >>> +
> >>> +	if (azx_dev->core.no_period_wakeup)
> >>> +		return 1; /* OK, no need to check period boundary */
> >>> +
> >>> +	if (runtime->hw_ptr_base != runtime->hw_ptr_interrupt)
> >>> +		return 1; /* OK, already in hwptr updating process */
> >>> +
> >>> +	/* check whether the period gets really elapsed */
> >>> +	pos = bytes_to_frames(runtime, pos);
> >>> +	hwptr = runtime->hw_ptr_base + pos;
> >>> +	if (hwptr < runtime->status->hw_ptr)
> >>> +		hwptr += runtime->buffer_size;
> >>> +	target = runtime->hw_ptr_interrupt + runtime->period_size;
> >>> +	if (hwptr < target) {
> >>> +		/* too early wakeup, process it later */
> >>> +		return chip->bdl_pos_adj ? 0 : -1;
> >>> +	}
> >>> +
> >>>  	return 1; /* OK, it's fine */
> >>>  }
> >>>  
> >>> @@ -874,11 +896,7 @@ static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev)
> >>>  	if (azx_dev->core.substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
> >>>  		return azx_skl_get_dpib_pos(chip, azx_dev);
> >>>  
> >>> -	/* For capture, we need to read posbuf, but it requires a delay
> >>> -	 * for the possible boundary overlap; the read of DPIB fetches the
> >>> -	 * actual posbuf
> >>> -	 */
> >>> -	udelay(20);
> >>> +	/* read of DPIB fetches the actual posbuf */
> >>>  	azx_skl_get_dpib_pos(chip, azx_dev);
> >>>  	return azx_get_pos_posbuf(chip, azx_dev);
> >>>  }
> >>>
> >>
> 


More information about the Alsa-devel mailing list