[PATCH] ALSA: hda: Reduce udelay() at SKL+ position reporting
Takashi Iwai
tiwai at suse.de
Tue Sep 28 11:01:49 CEST 2021
On Wed, 22 Sep 2021 00:18:14 +0200,
Pierre-Louis Bossart wrote:
>
> Sorry Takashi for the delay, I missed your reply.
>
> >>> The position reporting on Intel Skylake and later chips via
> >>> azx_get_pos_skl() contains a udelay(20) call for the capture streams.
> >>> A call for this alone doesn't sound too harmful. However, as the
> >>> pointer PCM ops is one of the hottest path in the PCM operations --
> >>> especially for the timer-scheduled operations like PulseAudio -- such
> >>> a delay hogs CPU usage significantly in the total performance.
> >>>
> >>> The function there was taken from the original code in ASoC SST
> >>> Skylake driver blindly. The udelay() is a workaround for the case
> >>> where the reported position is behind the period boundary at the
> >>> timing triggered from interrupts; applications often expect that the
> >>> full data is available for the whole period when returned (and also
> >>> that's the definition of the ALSA PCM period).
> >>
> >> that initial work-around from the Intel SST driver does not seem to be
> >> legit in the first place, when I checked with hardware folks recently no
> >> one understands why there are delays and special cases for capture. The
> >> only requirement wrt. DPIB is that the DDR update is valid across VC0
> >> and VC1, while the DPIB registers are only valid with VC0. For SOF we
> >> don't know of any VC1 use so will default to the DPIB vendor-specific
> >> registers.
> >
> > What are those VC0 and VC1 registers? I can't find the definitions in
> > the code, so I assume that none of ALSA/ASoC drivers use VC1.
>
> These are PCI concepts/capabilities. VC stands for "Virtual Channel",
> which are mapped to Traffic Class (TC). These registers are typically
> set by the BIOS AFAIK. The point is that if VC1 is enabled only the DPIB
> updates are valid, the vendor-specific will report information can be
> misleading.
>
> The recommendation from hardware folks is to use DPIB updates in DDR,
> which are known to work across both VC0 and VC1.
>
> For SOF, we do know VC1 is never used so we'll use the vendor-specific
> registers.
OK, thanks for clarification.
> >> See https://github.com/thesofproject/linux/pull/3143 for my WIP fixes
> >> for SOF.
> >>
> >> I don't have the time to look at this specific patch today but wanted to
> >> make sure you are aware of my on-going fixes.
> >>
> >> Note that the use of DPIB works best if you don't use the IOC interrupt.
> >> when the interrupt is thrown, there is likely a delay for the DPIB
> >> information to be refreshed.
> >
> > Thanks for the information! The delay could be the reason of the
> > udelay(), and that's superfluous as mentioned in the commit.
> >
> > So the remaining question seems to be which method is a better
> > approach for the capture stream: DPIB or posbuf. I kept the posbuf
> > just for conservatism, but judging from your comment, we may use DPIB
> > for both directions safely?
>
> sorry you lost me. Isn't DPIB updates in DDR precisely what posbuf reports?
>
> I think what you need to use is use azx_get_pos_posbuf(chip, azx_dev);
> unconditionally, both for capture and playback, and remove the use of
> the skylake specific stuff and the delay.
When I measured, I saw a slight difference between values in DPIB and
posbuf, so I wonder which is actually more accurate. The latter is
sometimes a bit behind the former.
If the posbuf is more correct in the sense for the PCM pointer, we can
simply move back to the posbuf like other platforms.
thanks,
Takashi
> > In anyway, the additional mechanism to check the delayed position
> > report in this patch can be kept no matter which way (DPIB or posbuf)
> > is used.
>
> Agree!
>
> >
> >
> > Takashi
> >
> >>
> >>>
> >>> OTOH, HD-audio (legacy) driver has already some workarounds for the
> >>> delayed position reporting due to its relatively large FIFO, such as
> >>> the BDL position adjustment and the delayed period-elapsed call in the
> >>> work. That said, the udelay() is almost superfluous for HD-audio
> >>> driver unlike SST, and we can drop the udelay().
> >>>
> >>> Though, the current code doesn't guarantee the full period readiness
> >>> as mentioned in the above, but rather it checks the wallclock and
> >>> detects the unexpected jump. That's one missing piece, and the drop
> >>> of udelay() needs a bit more sanity checks for the delayed handling.
> >>>
> >>> This patch implements those: the drop of udelay() call in
> >>> azx_get_pos_skl() and the more proper check of hwptr in
> >>> azx_position_ok(). The latter change is applied only for the case
> >>> where the stream is running in the normal mode without
> >>> no_period_wakeup flag. When no_period_wakeup is set, it essentially
> >>> ignores the period handling and rather concentrates only on the
> >>> current position; which implies that we don't need to care about the
> >>> period boundary at all.
> >>>
> >>> Fixes: f87e7f25893d ("ALSA: hda - Improved position reporting on SKL+")
> >>> Reported-by: Jens Axboe <axboe at kernel.dk>
> >>> Signed-off-by: Takashi Iwai <tiwai at suse.de>
> >>> ---
> >>> sound/pci/hda/hda_intel.c | 28 +++++++++++++++++++++++-----
> >>> 1 file changed, 23 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> >>> index 3aa432d814a2..faeeeb923d5e 100644
> >>> --- a/sound/pci/hda/hda_intel.c
> >>> +++ b/sound/pci/hda/hda_intel.c
> >>> @@ -637,13 +637,17 @@ static int azx_position_check(struct azx *chip, struct azx_dev *azx_dev)
> >>> * the update-IRQ timing. The IRQ is issued before actually the
> >>> * data is processed. So, we need to process it afterwords in a
> >>> * workqueue.
> >>> + *
> >>> + * Returns 1 if OK to proceed, 0 for delay handling, -1 for skipping update
> >>> */
> >>> static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
> >>> {
> >>> struct snd_pcm_substream *substream = azx_dev->core.substream;
> >>> + struct snd_pcm_runtime *runtime = substream->runtime;
> >>> int stream = substream->stream;
> >>> u32 wallclk;
> >>> unsigned int pos;
> >>> + snd_pcm_uframes_t hwptr, target;
> >>>
> >>> wallclk = azx_readl(chip, WALLCLK) - azx_dev->core.start_wallclk;
> >>> if (wallclk < (azx_dev->core.period_wallclk * 2) / 3)
> >>> @@ -680,6 +684,24 @@ static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
> >>> /* NG - it's below the first next period boundary */
> >>> return chip->bdl_pos_adj ? 0 : -1;
> >>> azx_dev->core.start_wallclk += wallclk;
> >>> +
> >>> + if (azx_dev->core.no_period_wakeup)
> >>> + return 1; /* OK, no need to check period boundary */
> >>> +
> >>> + if (runtime->hw_ptr_base != runtime->hw_ptr_interrupt)
> >>> + return 1; /* OK, already in hwptr updating process */
> >>> +
> >>> + /* check whether the period gets really elapsed */
> >>> + pos = bytes_to_frames(runtime, pos);
> >>> + hwptr = runtime->hw_ptr_base + pos;
> >>> + if (hwptr < runtime->status->hw_ptr)
> >>> + hwptr += runtime->buffer_size;
> >>> + target = runtime->hw_ptr_interrupt + runtime->period_size;
> >>> + if (hwptr < target) {
> >>> + /* too early wakeup, process it later */
> >>> + return chip->bdl_pos_adj ? 0 : -1;
> >>> + }
> >>> +
> >>> return 1; /* OK, it's fine */
> >>> }
> >>>
> >>> @@ -874,11 +896,7 @@ static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev)
> >>> if (azx_dev->core.substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
> >>> return azx_skl_get_dpib_pos(chip, azx_dev);
> >>>
> >>> - /* For capture, we need to read posbuf, but it requires a delay
> >>> - * for the possible boundary overlap; the read of DPIB fetches the
> >>> - * actual posbuf
> >>> - */
> >>> - udelay(20);
> >>> + /* read of DPIB fetches the actual posbuf */
> >>> azx_skl_get_dpib_pos(chip, azx_dev);
> >>> return azx_get_pos_posbuf(chip, azx_dev);
> >>> }
> >>>
> >>
>
More information about the Alsa-devel
mailing list