[PATCH v2 0/2] ALSA: hda: Reduce CPU hog with SKL+ position reporting
Hi,
this is a v2 patch set for reducing the CPU hog with the HD-audio PCM position reporting. The first patch is almost same, while the second patch is added to take back to the position buffer as suggested by Pierre.
Takashi
v1: https://lore.kernel.org/r/20210910141002.32749-1-tiwai@suse.de
===
Takashi Iwai (2): ALSA: hda: Reduce udelay() at SKL+ position reporting ALSA: hda: Use position buffer for SKL+ again
sound/pci/hda/hda_intel.c | 49 ++++++++++++++++++--------------------- 1 file changed, 23 insertions(+), 26 deletions(-)
The position reporting on Intel Skylake and later chips via azx_get_pos_skl() contains a udelay(20) call for the capture streams. A call for this alone doesn't sound too harmful. However, as the pointer PCM ops is one of the hottest path in the PCM operations -- especially for the timer-scheduled operations like PulseAudio -- such a delay hogs CPU usage significantly in the total performance.
The code there was taken from the original code in ASoC SST Skylake driver blindly. The udelay() is a workaround for the case where the reported position is behind the period boundary at the timing triggered from interrupts; applications often expect that the full data is available for the whole period when returned (and also that's the definition of the ALSA PCM period).
OTOH, HD-audio (legacy) driver has already some workarounds for the delayed position reporting due to its relatively large FIFO, such as the BDL position adjustment and the delayed period-elapsed call in the work. That said, the udelay() is almost superfluous for HD-audio driver unlike SST, and we can drop the udelay().
Though, the current code doesn't guarantee the full period readiness as mentioned in the above, but rather it checks the wallclock and detects the unexpected jump. That's one missing piece, and the drop of udelay() needs a bit more sanity checks for the delayed handling.
This patch implements those: the drop of udelay() call in azx_get_pos_skl() and the more proper check of hwptr in azx_position_ok(). The latter change is applied only for the case where the stream is running in the normal mode without no_period_wakeup flag. When no_period_wakeup is set, it essentially ignores the period handling and rather concentrates only on the current position; which implies that we don't need to care about the period boundary at all.
Fixes: f87e7f25893d ("ALSA: hda - Improved position reporting on SKL+") Reported-by: Jens Axboe axboe@kernel.dk Signed-off-by: Takashi Iwai tiwai@suse.de --- sound/pci/hda/hda_intel.c | 28 +++++++++++++++++++++++----- 1 file changed, 23 insertions(+), 5 deletions(-)
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index 47777439961c..9989ec4dc324 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -637,13 +637,17 @@ static int azx_position_check(struct azx *chip, struct azx_dev *azx_dev) * the update-IRQ timing. The IRQ is issued before actually the * data is processed. So, we need to process it afterwords in a * workqueue. + * + * Returns 1 if OK to proceed, 0 for delay handling, -1 for skipping update */ static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev) { struct snd_pcm_substream *substream = azx_dev->core.substream; + struct snd_pcm_runtime *runtime = substream->runtime; int stream = substream->stream; u32 wallclk; unsigned int pos; + snd_pcm_uframes_t hwptr, target;
wallclk = azx_readl(chip, WALLCLK) - azx_dev->core.start_wallclk; if (wallclk < (azx_dev->core.period_wallclk * 2) / 3) @@ -680,6 +684,24 @@ static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev) /* NG - it's below the first next period boundary */ return chip->bdl_pos_adj ? 0 : -1; azx_dev->core.start_wallclk += wallclk; + + if (azx_dev->core.no_period_wakeup) + return 1; /* OK, no need to check period boundary */ + + if (runtime->hw_ptr_base != runtime->hw_ptr_interrupt) + return 1; /* OK, already in hwptr updating process */ + + /* check whether the period gets really elapsed */ + pos = bytes_to_frames(runtime, pos); + hwptr = runtime->hw_ptr_base + pos; + if (hwptr < runtime->status->hw_ptr) + hwptr += runtime->buffer_size; + target = runtime->hw_ptr_interrupt + runtime->period_size; + if (hwptr < target) { + /* too early wakeup, process it later */ + return chip->bdl_pos_adj ? 0 : -1; + } + return 1; /* OK, it's fine */ }
@@ -874,11 +896,7 @@ static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev) if (azx_dev->core.substream->stream == SNDRV_PCM_STREAM_PLAYBACK) return azx_skl_get_dpib_pos(chip, azx_dev);
- /* For capture, we need to read posbuf, but it requires a delay - * for the possible boundary overlap; the read of DPIB fetches the - * actual posbuf - */ - udelay(20); + /* read of DPIB fetches the actual posbuf */ azx_skl_get_dpib_pos(chip, azx_dev); return azx_get_pos_posbuf(chip, azx_dev); }
@@ -874,11 +896,7 @@ static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev) if (azx_dev->core.substream->stream == SNDRV_PCM_STREAM_PLAYBACK) return azx_skl_get_dpib_pos(chip, azx_dev);
- /* For capture, we need to read posbuf, but it requires a delay
* for the possible boundary overlap; the read of DPIB fetches the
* actual posbuf
*/
- udelay(20);
- /* read of DPIB fetches the actual posbuf */ azx_skl_get_dpib_pos(chip, azx_dev);
I don't think extra read has any effect, it could be removed....
return azx_get_pos_posbuf(chip, azx_dev); }
The suggestion was to further simplify with
static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev) { return azx_get_pos_posbuf(chip, azx_dev); }
i.e. same behavior on playback and capture. that's the recommendation from Intel hw folks.
On Wed, 29 Sep 2021 16:15:44 +0200, Pierre-Louis Bossart wrote:
@@ -874,11 +896,7 @@ static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev) if (azx_dev->core.substream->stream == SNDRV_PCM_STREAM_PLAYBACK) return azx_skl_get_dpib_pos(chip, azx_dev);
- /* For capture, we need to read posbuf, but it requires a delay
* for the possible boundary overlap; the read of DPIB fetches the
* actual posbuf
*/
- udelay(20);
- /* read of DPIB fetches the actual posbuf */ azx_skl_get_dpib_pos(chip, azx_dev);
I don't think extra read has any effect, it could be removed....
return azx_get_pos_posbuf(chip, azx_dev); }
The suggestion was to further simplify with
static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev) { return azx_get_pos_posbuf(chip, azx_dev); }
i.e. same behavior on playback and capture. that's the recommendation from Intel hw folks.
It's achieved in the second patch.
Takashi
On 9/29/21 9:39 AM, Takashi Iwai wrote:
On Wed, 29 Sep 2021 16:15:44 +0200, Pierre-Louis Bossart wrote:
@@ -874,11 +896,7 @@ static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev) if (azx_dev->core.substream->stream == SNDRV_PCM_STREAM_PLAYBACK) return azx_skl_get_dpib_pos(chip, azx_dev);
- /* For capture, we need to read posbuf, but it requires a delay
* for the possible boundary overlap; the read of DPIB fetches the
* actual posbuf
*/
- udelay(20);
- /* read of DPIB fetches the actual posbuf */ azx_skl_get_dpib_pos(chip, azx_dev);
I don't think extra read has any effect, it could be removed....
return azx_get_pos_posbuf(chip, azx_dev); }
The suggestion was to further simplify with
static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev) { return azx_get_pos_posbuf(chip, azx_dev); }
i.e. same behavior on playback and capture. that's the recommendation from Intel hw folks.
It's achieved in the second patch.
Ah ok. the additional comment made me think it was there to stay.
The commit f87e7f25893d ("ALSA: hda - Improved position reporting on SKL+") changed the PCM position report for SKL+ chips to use DPIB, but according to Pierre, DPIB is no best choice for the accurate position reports and it often reports too early. The recommended method is rather the classical position buffer.
This patch makes the PCM position reporting on SKL+ back to the position buffer again.
Fixes: f87e7f25893d ("ALSA: hda - Improved position reporting on SKL+") Suggested-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Signed-off-by: Takashi Iwai tiwai@suse.de --- sound/pci/hda/hda_intel.c | 23 +---------------------- 1 file changed, 1 insertion(+), 22 deletions(-)
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index 9989ec4dc324..14298f015fba 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -880,27 +880,6 @@ static int azx_get_delay_from_fifo(struct azx *chip, struct azx_dev *azx_dev, return substream->runtime->delay; }
-static unsigned int azx_skl_get_dpib_pos(struct azx *chip, - struct azx_dev *azx_dev) -{ - return _snd_hdac_chip_readl(azx_bus(chip), - AZX_REG_VS_SDXDPIB_XBASE + - (AZX_REG_VS_SDXDPIB_XINTERVAL * - azx_dev->core.index)); -} - -/* get the current DMA position with correction on SKL+ chips */ -static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev) -{ - /* DPIB register gives a more accurate position for playback */ - if (azx_dev->core.substream->stream == SNDRV_PCM_STREAM_PLAYBACK) - return azx_skl_get_dpib_pos(chip, azx_dev); - - /* read of DPIB fetches the actual posbuf */ - azx_skl_get_dpib_pos(chip, azx_dev); - return azx_get_pos_posbuf(chip, azx_dev); -} - static void __azx_shutdown_chip(struct azx *chip, bool skip_link_reset) { azx_stop_chip(chip); @@ -1590,7 +1569,7 @@ static void assign_position_fix(struct azx *chip, int fix) [POS_FIX_POSBUF] = azx_get_pos_posbuf, [POS_FIX_VIACOMBO] = azx_via_get_position, [POS_FIX_COMBO] = azx_get_pos_lpib, - [POS_FIX_SKL] = azx_get_pos_skl, + [POS_FIX_SKL] = azx_get_pos_posbuf, [POS_FIX_FIFO] = azx_get_pos_fifo, };
On 9/29/21 2:29 AM, Takashi Iwai wrote:
Hi,
this is a v2 patch set for reducing the CPU hog with the HD-audio PCM position reporting. The first patch is almost same, while the second patch is added to take back to the position buffer as suggested by Pierre.
Takashi
v1: https://lore.kernel.org/r/20210910141002.32749-1-tiwai@suse.de
===
Takashi Iwai (2): ALSA: hda: Reduce udelay() at SKL+ position reporting ALSA: hda: Use position buffer for SKL+ again
Reviewed-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
On Wed, 29 Sep 2021 09:29:32 +0200, Takashi Iwai wrote:
Hi,
this is a v2 patch set for reducing the CPU hog with the HD-audio PCM position reporting. The first patch is almost same, while the second patch is added to take back to the position buffer as suggested by Pierre.
Takashi
v1: https://lore.kernel.org/r/20210910141002.32749-1-tiwai@suse.de
===
Takashi Iwai (2): ALSA: hda: Reduce udelay() at SKL+ position reporting ALSA: hda: Use position buffer for SKL+ again
Now applied both patches to for-next branch.
Takashi
participants (2)
-
Pierre-Louis Bossart
-
Takashi Iwai