[PATCH] ALSA: pcm: fix incorrect hw_base increase
The hw_base will be increased by runtime->buffer_size frames unconditionally if the runtime->status->hw_ptr is not updated for over half of buffer time. As the hw_base increases, so does the runtime->status->hw_ptr which could lead to invalid return value when user space program calls snd_pcm_avail() function.
By updating runtime->hw_ptr_jiffies each time the HWSYNC is called, the hw_base will keep the same when buffer stall happens, so does the hw_ptr.
Signed-off-by: Brent Lu brent.lu@intel.com --- sound/core/pcm_lib.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/sound/core/pcm_lib.c b/sound/core/pcm_lib.c index 872a852..d531e1b 100644 --- a/sound/core/pcm_lib.c +++ b/sound/core/pcm_lib.c @@ -433,6 +433,7 @@ static int snd_pcm_update_hw_ptr0(struct snd_pcm_substream *substream,
no_delta_check: if (runtime->status->hw_ptr == new_hw_ptr) { + runtime->hw_ptr_jiffies = curr_jiffies; update_audio_tstamp(substream, &curr_tstamp, &audio_tstamp); return 0; }
On Fri, May 15, 2020 at 12:09:39PM +0800, Brent Lu wrote:
The hw_base will be increased by runtime->buffer_size frames unconditionally if the runtime->status->hw_ptr is not updated for over half of buffer time. As the hw_base increases, so does the runtime->status->hw_ptr which could lead to invalid return value when user space program calls snd_pcm_avail() function.
By updating runtime->hw_ptr_jiffies each time the HWSYNC is called, the hw_base will keep the same when buffer stall happens, so does the hw_ptr.
Signed-off-by: Brent Lu brent.lu@intel.com
sound/core/pcm_lib.c | 1 + 1 file changed, 1 insertion(+)
Is this a bugfix needed for older kernels as well? When did this issue show up?
thanks,
greg k-h
Is this a bugfix needed for older kernels as well? When did this issue show up?
thanks,
greg k-h
It happens when DMA stop moving data from host to DSP/DAI for a long time (> half of buffer time). I know host driver should do something about it. But if not, the HWSYNC will keep increasing the hw_base and hw_ptr and confuses user space program.
Regards, Brent
Dne 15. 05. 20 v 11:04 Lu, Brent napsal(a):
Is this a bugfix needed for older kernels as well? When did this issue show up?
thanks,
greg k-h
It happens when DMA stop moving data from host to DSP/DAI for a long time (> half of buffer time). I know host driver should do something about it. But if not, the HWSYNC will keep increasing the hw_base and hw_ptr and confuses user space program.
I'm afraid, but with this code, you turn off the hw_ptr jiffies code. It would be better to fix the driver in this case (return the updated / estimated DMA pointer, increase DMA buffer size etc.). This "lag" is unacceptable.
Jaroslav
Regards, Brent
On Fri, 15 May 2020 11:30:54 +0200, Jaroslav Kysela wrote:
Dne 15. 05. 20 v 11:04 Lu, Brent napsal(a):
Is this a bugfix needed for older kernels as well? When did this issue show up?
thanks,
greg k-h
It happens when DMA stop moving data from host to DSP/DAI for a long time (> half of buffer time). I know host driver should do something about it. But if not, the HWSYNC will keep increasing the hw_base and hw_ptr and confuses user space program.
I'm afraid, but with this code, you turn off the hw_ptr jiffies code. It would be better to fix the driver in this case (return the updated / estimated DMA pointer, increase DMA buffer size etc.). This "lag" is unacceptable.
The problem is obviously in the driver's side and it's best to be addressed there. But, I think it's still worth to apply this change.
The hw_ptr jiffies check is performed basically in two places: one is snd_pcm_period_elapsed() call from ISR, and another is with the no_period_wakeup flag. In both cases, it calculates the diff of jiffies from the previous update, and corrects the hw_ptr_base if that exceeds the threshold.
And the bug here is that the "previous" jiffies is kept as long as the hwptr itself is updated. What we need is the correction of the base when it really has processed the period size; i.e. hwptr got the same value (with no_period_wakeup) and yet the jiffies diff is big. For this check, it's correct to update hw_ptr_jiffies at each call no matter whether hwptr moved or not; we need to evaluate from the previous update, after all.
But I might overlook something. Jaroslav, could you check it again? The jiffies check code is your black magic :)
thanks,
Takashi
Dne 15. 05. 20 v 12:39 Takashi Iwai napsal(a):
On Fri, 15 May 2020 11:30:54 +0200, Jaroslav Kysela wrote:
Dne 15. 05. 20 v 11:04 Lu, Brent napsal(a):
Is this a bugfix needed for older kernels as well? When did this issue show up?
thanks,
greg k-h
It happens when DMA stop moving data from host to DSP/DAI for a long time (> half of buffer time). I know host driver should do something about it. But if not, the HWSYNC will keep increasing the hw_base and hw_ptr and confuses user space program.
I'm afraid, but with this code, you turn off the hw_ptr jiffies code. It would be better to fix the driver in this case (return the updated / estimated DMA pointer, increase DMA buffer size etc.). This "lag" is unacceptable.
The problem is obviously in the driver's side and it's best to be addressed there. But, I think it's still worth to apply this change.
The hw_ptr jiffies check is performed basically in two places: one is snd_pcm_period_elapsed() call from ISR, and another is with the no_period_wakeup flag. In both cases, it calculates the diff of jiffies from the previous update, and corrects the hw_ptr_base if that exceeds the threshold.
And the bug here is that the "previous" jiffies is kept as long as the hwptr itself is updated. What we need is the correction of the base when it really has processed the period size; i.e. hwptr got the same value (with no_period_wakeup) and yet the jiffies diff is big. For this check, it's correct to update hw_ptr_jiffies at each call no matter whether hwptr moved or not; we need to evaluate from the previous update, after all.
But I might overlook something. Jaroslav, could you check it again? The jiffies check code is your black magic :)
I tried to imagine a negative impact for this hw_ptr_jiffies update when the DMA position is not updated from the driver and I haven't found any so far.
Let's apply this and we'll see in future :-)
And yes, the patch description should be improved (DMA ptr is not updated / streaming is inactive).
Reviewed-by: Jaroslav Kysela perex@perex.cz
thanks,
Takashi
I tried to imagine a negative impact for this hw_ptr_jiffies update when the DMA position is not updated from the driver and I haven't found any so far.
Let's apply this and we'll see in future :-)
And yes, the patch description should be improved (DMA ptr is not updated / streaming is inactive).
Reviewed-by: Jaroslav Kysela perex@perex.cz
Hi Jaroslav and Takashi,
Thank you for the review. Patch V2 has been uploaded.
Regards, Brent
thanks,
Takashi
-- Jaroslav Kysela perex@perex.cz Linux Sound Maintainer; ALSA Project; Red Hat, Inc.
On Fri, 15 May 2020 06:09:39 +0200, Brent Lu wrote:
The hw_base will be increased by runtime->buffer_size frames unconditionally if the runtime->status->hw_ptr is not updated for over half of buffer time. As the hw_base increases, so does the runtime->status->hw_ptr which could lead to invalid return value when user space program calls snd_pcm_avail() function.
By updating runtime->hw_ptr_jiffies each time the HWSYNC is called, the hw_base will keep the same when buffer stall happens, so does the hw_ptr.
Updating hw_ptr_jiffies at that code path looks correct, but it still leaves the question why this condition happens. It means that the actual hwptr isn't changed and yet only jiffies increase significantly; it means that the hardware can't report proper pointer, and it should have set SNDRV_PCM_INFO_BATCH flag, then the jiffies check is skipped.
With which hardware and under which situation did it happen (and the patch fixed)?
thanks,
Takashi
Signed-off-by: Brent Lu brent.lu@intel.com
sound/core/pcm_lib.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/sound/core/pcm_lib.c b/sound/core/pcm_lib.c index 872a852..d531e1b 100644 --- a/sound/core/pcm_lib.c +++ b/sound/core/pcm_lib.c @@ -433,6 +433,7 @@ static int snd_pcm_update_hw_ptr0(struct snd_pcm_substream *substream,
no_delta_check: if (runtime->status->hw_ptr == new_hw_ptr) {
update_audio_tstamp(substream, &curr_tstamp, &audio_tstamp); return 0; }runtime->hw_ptr_jiffies = curr_jiffies;
-- 2.7.4
Updating hw_ptr_jiffies at that code path looks correct, but it still leaves the question why this condition happens. It means that the actual hwptr isn't changed and yet only jiffies increase significantly; it means that the hardware can't report proper pointer, and it should have set SNDRV_PCM_INFO_BATCH flag, then the jiffies check is skipped.
With which hardware and under which situation did it happen (and the patch fixed)?
thanks,
Takashi
From time to time we got questions from google about why sometimes the
snd_pcm_avail() returns a value larger than buffer size. Recently we finally found reliable reproduce steps: it's on Intel GLK Chromebook Fleex with SOF firmware. There is a 1/20 chance the audio playback to HDMI fails.
From the FW side we observe a buffer runderrun, the FW is not able to
recover it for some reason and stops the pipe.
From the Linux side we see the pos does not increase because the FW stops
receiving data but suddenly the hw_prt is increased by buffer_size (16368). It could make snd_pcm_avail() reporting a false underrun to the caller like following log:
2020-05-09T16:08:32.712042+08:00 DEBUG kernel: [ 418.510086] sound pcmC0D5p: pos 96 hw_ptr 96 appl_ptr 4096 avail 12368 2020-05-09T16:08:32.712043+08:00 DEBUG kernel: [ 418.510149] sound pcmC0D5p: pos 96 hw_ptr 96 appl_ptr 6910 avail 9554 ... 2020-05-09T16:08:32.883095+08:00 DEBUG kernel: [ 418.680868] sound pcmC0D5p: pos 96 hw_ptr 96 appl_ptr 15102 avail 1362 2020-05-09T16:08:32.883104+08:00 DEBUG kernel: [ 418.681052] sound pcmC0D5p: pos 96 hw_ptr 96 appl_ptr 15102 avail 1362 2020-05-09T16:08:32.883109+08:00 DEBUG kernel: [ 418.681130] sound pcmC0D5p: pos 96 hw_ptr 96 appl_ptr 16464 avail 0 2020-05-09T16:08:32.929330+08:00 DEBUG kernel: [ 418.726515] sound pcmC0D5p: pos 96 hw_ptr 16464 appl_ptr 16464 avail 16368 2020-05-09T16:08:32.929512+08:00 DEBUG kernel: [ 418.727041] sound pcmC0D5p: pos 96 hw_ptr 16464 appl_ptr 16464 avail 16368
Or it could make snd_pcm_avail() returns an invalid number and confuses the Caller like following log:
2020-05-09T16:08:33.054039+08:00 DEBUG kernel: [ 418.851755] sound pcmC0D5p: pos 96 hw_ptr 16464 appl_ptr 27390 avail 5442 2020-05-09T16:08:33.129552+08:00 DEBUG kernel: [ 418.926491] sound pcmC0D5p: pos 96 hw_ptr 32832 appl_ptr 27390 avail 21810 2020-05-09T16:08:33.131746+08:00 ERR cras_server[1907]: pcm_avail returned frames larger than buf_size: sof-glkda7219max: :0,5: 21810 > 16368
I've submitted a patch to fix the issue in SOF side. However, I think it's also good to fix the incorrect hw_base increasement in Linux side.
https://github.com/thesofproject/sof/pull/2926
Regards, Brent
On Fri, 15 May 2020 11:36:19 +0200, Lu, Brent wrote:
Updating hw_ptr_jiffies at that code path looks correct, but it still leaves the question why this condition happens. It means that the actual hwptr isn't changed and yet only jiffies increase significantly; it means that the hardware can't report proper pointer, and it should have set SNDRV_PCM_INFO_BATCH flag, then the jiffies check is skipped.
With which hardware and under which situation did it happen (and the patch fixed)?
thanks,
Takashi
From time to time we got questions from google about why sometimes the
snd_pcm_avail() returns a value larger than buffer size. Recently we finally found reliable reproduce steps: it's on Intel GLK Chromebook Fleex with SOF firmware. There is a 1/20 chance the audio playback to HDMI fails.
From the FW side we observe a buffer runderrun, the FW is not able to
recover it for some reason and stops the pipe.
From the Linux side we see the pos does not increase because the FW stops
receiving data but suddenly the hw_prt is increased by buffer_size (16368). It could make snd_pcm_avail() reporting a false underrun to the caller like following log:
2020-05-09T16:08:32.712042+08:00 DEBUG kernel: [ 418.510086] sound pcmC0D5p: pos 96 hw_ptr 96 appl_ptr 4096 avail 12368 2020-05-09T16:08:32.712043+08:00 DEBUG kernel: [ 418.510149] sound pcmC0D5p: pos 96 hw_ptr 96 appl_ptr 6910 avail 9554 ... 2020-05-09T16:08:32.883095+08:00 DEBUG kernel: [ 418.680868] sound pcmC0D5p: pos 96 hw_ptr 96 appl_ptr 15102 avail 1362 2020-05-09T16:08:32.883104+08:00 DEBUG kernel: [ 418.681052] sound pcmC0D5p: pos 96 hw_ptr 96 appl_ptr 15102 avail 1362 2020-05-09T16:08:32.883109+08:00 DEBUG kernel: [ 418.681130] sound pcmC0D5p: pos 96 hw_ptr 96 appl_ptr 16464 avail 0 2020-05-09T16:08:32.929330+08:00 DEBUG kernel: [ 418.726515] sound pcmC0D5p: pos 96 hw_ptr 16464 appl_ptr 16464 avail 16368 2020-05-09T16:08:32.929512+08:00 DEBUG kernel: [ 418.727041] sound pcmC0D5p: pos 96 hw_ptr 16464 appl_ptr 16464 avail 16368
Or it could make snd_pcm_avail() returns an invalid number and confuses the Caller like following log:
2020-05-09T16:08:33.054039+08:00 DEBUG kernel: [ 418.851755] sound pcmC0D5p: pos 96 hw_ptr 16464 appl_ptr 27390 avail 5442 2020-05-09T16:08:33.129552+08:00 DEBUG kernel: [ 418.926491] sound pcmC0D5p: pos 96 hw_ptr 32832 appl_ptr 27390 avail 21810 2020-05-09T16:08:33.131746+08:00 ERR cras_server[1907]: pcm_avail returned frames larger than buf_size: sof-glkda7219max: :0,5: 21810 > 16368
I've submitted a patch to fix the issue in SOF side. However, I think it's also good to fix the incorrect hw_base increasement in Linux side.
Oh this whole information (at least some digested version) should have been included in the patch description. Otherwise we have no idea why it's needed and what actually means.
Let's wait for Jaroslav's review, and if it's OK, could you resubmit with more description?
thanks,
Takashi
participants (5)
-
Brent Lu
-
Greg Kroah-Hartman
-
Jaroslav Kysela
-
Lu, Brent
-
Takashi Iwai