[alsa-devel] [PATCH - dmix v3 0/1] pcm: dmix: Align slave_hw_ptr to slave period boundary
On 10/30/18 14:21, Takashi Iwai wrote:
Yes, and is this correct? Suppose you start a stream at the position one sample before the next period (psize * N + (psize-1)). Then slave_hw_ptr is psize * N. At the next moment, the dpcm->hw.ptr reaches to psize * (N+1), and snd_pcm_dmix_sync_ptr() gets called. Then slave_hw_ptr will be updated to psize * (N+1) although only one sample has been processed?
I have created a spreadsheet to simulate the behavior of dmix (see attachment). I used this sheet to test different corner cases with different implementations. Column B-D describes 3 corner cases with the original implementation. Column E-H describes the corner cases with patch v2 applied and column I-L describes the corner cases with patch v3. With patch v3 I was not able to find a corner case which would fail. All our internal audio test are also passing fine.
The corner cases are described in the form N*period+period-1 (N+1)period+1 (N+2)period+1
The first line describes the value of dmix->spcm->hw.ptr when snd_pcm_dmix_start() will be called. The second line describes the hw_ptr when poll() returns for the first time and the third line describes the hw_ptr when poll() returns for the second time.
In column B is the issue case described when the patch is not applied. snd_pcm_avail() returns 2 frames only (B27) but the buffer only contains 3 frames (B18). Therefore up to 5 frames are available (a buffer size of 8 frames is choosen).
The issue case of patch v2 is shown in column H. snd_pcm_avail() returns 7 frames (H46) but the buffer contains still 3 frames (H37). Therefore there are only 5 frames free to overwrite.
Do you see any corner cases which I missed or any other drawbacks?
Best regards and thanks for your time
Timo
From: Laxmi Devi Laxmi.Devi@in.bosch.com
These changes are required due to the kernel commit 07b7acb51d283d8469696c906b91f1882696a4d4 ("ASoC: rsnd: update pointer more accurate")
Issue is that snd_pcm_wait() goes back to waiting because the hw_ptr is not period aligned. Therefore snd_pcm_wait() will block for a longer time as required.
With these rcar driver changes the exact position of the dma is returned. During snd_pcm_start they read hw_ptr as reference, and this hw_ptr is now not period aligned, and is a little ahead over the period while it is read. Therefore when the avail is calculated during snd_pcm_wait(), it is missing the avail_min by a few frames. Consider the below example:
Considering the application is trying to write 0x120 frames and the period_size = 0x60, avail_min = 0x120 and buffersize = 0x360 :
rsnd_pointer=0x12c -> dma pointer at the end of second write during snd_pcm_dmix_start(). Since another 0x120 buffer is available, application writes 0x120 and goes to snd_pcm_wait(). It is woken up after 3 snd_pcm_period_elapsed() to see rsnd_pointer=0x248. So hw_ptr = new_slave_hw_ptr - reference_slave_hw_ptr = 0x248 - 0x12c = 0x11c. It needs 4 more frames to be able to write. And so it goes back to waiting.
But since 3 snd_pcm_period_elapsed(), 3 periods should be available and it should have been able to write. If rsnd_pointer during the start was 0x120 which is 3 periods then 0x248 - 0x120 = 128 it could go on with write.
Signed-off-by: Laxmi Devi Laxmi.Devi@in.bosch.com Signed-off-by: Timo Wischer twischer@de.adit-jv.com ---
With this patch the slave_hw_ptr will always contain a period align value.
src/pcm/pcm_dmix.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/src/pcm/pcm_dmix.c b/src/pcm/pcm_dmix.c index 881154e..6c9e052 100644 --- a/src/pcm/pcm_dmix.c +++ b/src/pcm/pcm_dmix.c @@ -398,6 +398,8 @@ static int snd_pcm_dmix_sync_ptr0(snd_pcm_t *pcm, snd_pcm_uframes_t slave_hw_ptr snd_pcm_sframes_t diff; old_slave_hw_ptr = dmix->slave_hw_ptr; + slave_hw_ptr = ((slave_hw_ptr / dmix->slave_period_size) + * dmix->slave_period_size); dmix->slave_hw_ptr = slave_hw_ptr; diff = slave_hw_ptr - old_slave_hw_ptr; if (diff == 0) /* fast path */ @@ -560,6 +562,8 @@ static int snd_pcm_dmix_hwsync(snd_pcm_t *pcm) static void reset_slave_ptr(snd_pcm_t *pcm, snd_pcm_direct_t *dmix) { dmix->slave_appl_ptr = dmix->slave_hw_ptr = *dmix->spcm->hw.ptr; + dmix->slave_hw_ptr = ((dmix->slave_hw_ptr / dmix->slave_period_size) + * dmix->slave_period_size); if (pcm->buffer_size > pcm->period_size * 2) return; /* If we have too litte periods, better to align the start position
On Tue, 06 Nov 2018 13:52:53 +0100, twischer@de.adit-jv.com wrote:
From: Laxmi Devi Laxmi.Devi@in.bosch.com
These changes are required due to the kernel commit 07b7acb51d283d8469696c906b91f1882696a4d4 ("ASoC: rsnd: update pointer more accurate")
Issue is that snd_pcm_wait() goes back to waiting because the hw_ptr is not period aligned. Therefore snd_pcm_wait() will block for a longer time as required.
With these rcar driver changes the exact position of the dma is returned. During snd_pcm_start they read hw_ptr as reference, and this hw_ptr is now not period aligned, and is a little ahead over the period while it is read. Therefore when the avail is calculated during snd_pcm_wait(), it is missing the avail_min by a few frames. Consider the below example:
Considering the application is trying to write 0x120 frames and the period_size = 0x60, avail_min = 0x120 and buffersize = 0x360 :
rsnd_pointer=0x12c -> dma pointer at the end of second write during snd_pcm_dmix_start(). Since another 0x120 buffer is available, application writes 0x120 and goes to snd_pcm_wait(). It is woken up after 3 snd_pcm_period_elapsed() to see rsnd_pointer=0x248. So hw_ptr = new_slave_hw_ptr - reference_slave_hw_ptr = 0x248 - 0x12c = 0x11c. It needs 4 more frames to be able to write. And so it goes back to waiting.
But since 3 snd_pcm_period_elapsed(), 3 periods should be available and it should have been able to write. If rsnd_pointer during the start was 0x120 which is 3 periods then 0x248 - 0x120 = 128 it could go on with write.
Signed-off-by: Laxmi Devi Laxmi.Devi@in.bosch.com Signed-off-by: Timo Wischer twischer@de.adit-jv.com
With this patch the slave_hw_ptr will always contain a period align value.
By enforcing that, application won't be able to get the fine-grained hwptr update any longer, hence it is a clear regression.
That is, we can't apply this as is. If any, such adjustment has to be applied only conditionally for certain hardware setup.
thanks,
Takashi
On Tue, 06 Nov 2018 13:48:08 +0100, Timo Wischer wrote:
On 10/30/18 14:21, Takashi Iwai wrote:
Yes, and is this correct? Suppose you start a stream at the position one sample before the next period (psize * N + (psize-1)). Then slave_hw_ptr is psize * N. At the next moment, the dpcm->hw.ptr reaches to psize * (N+1), and snd_pcm_dmix_sync_ptr() gets called. Then slave_hw_ptr will be updated to psize * (N+1) although only one sample has been processed?
I have created a spreadsheet to simulate the behavior of dmix (see attachment). I used this sheet to test different corner cases with different implementations. Column B-D describes 3 corner cases with the original implementation. Column E-H describes the corner cases with patch v2 applied and column I-L describes the corner cases with patch v3. With patch v3 I was not able to find a corner case which would fail. All our internal audio test are also passing fine.
The corner cases are described in the form N*period+period-1 (N+1)period+1 (N+2)period+1
The first line describes the value of dmix->spcm->hw.ptr when snd_pcm_dmix_start() will be called. The second line describes the hw_ptr when poll() returns for the first time and the third line describes the hw_ptr when poll() returns for the second time.
In column B is the issue case described when the patch is not applied. snd_pcm_avail() returns 2 frames only (B27) but the buffer only contains 3 frames (B18). Therefore up to 5 frames are available (a buffer size of 8 frames is choosen).
The issue case of patch v2 is shown in column H. snd_pcm_avail() returns 7 frames (H46) but the buffer contains still 3 frames (H37). Therefore there are only 5 frames free to overwrite.
Hrm, sorry, it's still not clear to me what you've tested and what these items represent.
Do you see any corner cases which I missed or any other drawbacks?
I guess that the biggest issue is the understanding of PCM period wakeup; let me cite the description of a part of your patch:
"But since 3 snd_pcm_period_elapsed(), 3 periods should be available and it should have been able to write. If rsnd_pointer during the start was 0x120 which is 3 periods then 0x248 - 0x120 = 128 it could go on with write."
This assumption can't be applied.
However, the current implementation of dmix is designed to achieve the lowest latency by putting the data at the exact position being played back now. This, of course, doesn't guarantee the period wakeup = one period free to fill like the above. So it's the design choice.
The current code has a workaround for the case of nperiods <= 2, by setting the initial slave_appl_ptr to the next period start. This guarantees that the period wakeup = one period to be filled. But its cost is the start latency; the playback doesn't start immediately but wait until the next period start.
Takashi
On 11/6/18 15:27, Takashi Iwai wrote:
I guess that the biggest issue is the understanding of PCM period wakeup; let me cite the description of a part of your patch:
"But since 3 snd_pcm_period_elapsed(), 3 periods should be available and it should have been able to write. If rsnd_pointer during the start was 0x120 which is 3 periods then 0x248 - 0x120 = 128 it could go on with write."
This assumption can't be applied.
However, the current implementation of dmix is designed to achieve the lowest latency by putting the data at the exact position being played back now. This, of course, doesn't guarantee the period wakeup = one period free to fill like the above. So it's the design choice.
The current code has a workaround for the case of nperiods <= 2, by setting the initial slave_appl_ptr to the next period start. This guarantees that the period wakeup = one period to be filled. But its cost is the start latency; the playback doesn't start immediately but wait until the next period start.
Takashi
Thanks for this explanation. That helps me to understand.
Now, I am thinking about a solution to align the slave_hw_ptr and slave_appl_ptr to the slave_period (round down) only on start up. So we keep this low latency. (At start up it is the same behavior as known from old sound drivers)
As known from the past in worst case it will drop up to slave_period-1 frames. If the application uses big periods and the drop is not acceptable a small slave_period using the var_period feature could be configured. But I think it is also not acceptable for an application which is using really big periods to wait sometimes one period longer.
There is always a higher probability of an under run whenever a poll() wakeup will not result in a write. For example on systems with high load this could be an issue.
If you are anyway not happy with this solution may be we can think about an option to disable this feature like "min_drop_on_start true" But then we also should describe the drawbacks of this option like * Delay of snd_pcm_wait() up to 2*periods * Higher probability of Xruns
Best regards
Timo
On Tue, 06 Nov 2018 17:15:31 +0100, Timo Wischer wrote:
On 11/6/18 15:27, Takashi Iwai wrote:
I guess that the biggest issue is the understanding of PCM period wakeup; let me cite the description of a part of your patch:
"But since 3 snd_pcm_period_elapsed(), 3 periods should be available and it should have been able to write. If rsnd_pointer during the start was 0x120 which is 3 periods then 0x248 - 0x120 = 128 it could go on with write."
This assumption can't be applied.
However, the current implementation of dmix is designed to achieve the lowest latency by putting the data at the exact position being played back now. This, of course, doesn't guarantee the period wakeup = one period free to fill like the above. So it's the design choice.
The current code has a workaround for the case of nperiods <= 2, by setting the initial slave_appl_ptr to the next period start. This guarantees that the period wakeup = one period to be filled. But its cost is the start latency; the playback doesn't start immediately but wait until the next period start.
Takashi
Thanks for this explanation. That helps me to understand.
Now, I am thinking about a solution to align the slave_hw_ptr and slave_appl_ptr to the slave_period (round down) only on start up. So we keep this low latency. (At start up it is the same behavior as known from old sound drivers)
As known from the past in worst case it will drop up to slave_period-1 frames. If the application uses big periods and the drop is not acceptable a small slave_period using the var_period feature could be configured. But I think it is also not acceptable for an application which is using really big periods to wait sometimes one period longer.
Well, which case is acceptable and which isn't -- we can't define it in our side so easily. This really depends on how the application is written and how the hardware setup is.
For example, imagine to play a very short beep sound with dmix. Dropping samples (at most period-1 size) may result in silence, and that's not acceptable.
There is always a higher probability of an under run whenever a poll() wakeup will not result in a write. For example on systems with high load this could be an issue.
If you are anyway not happy with this solution may be we can think about an option to disable this feature like "min_drop_on_start true" But then we also should describe the drawbacks of this option like
- Delay of snd_pcm_wait() up to 2*periods
- Higher probability of Xruns
This probably depends on how many periods are present on the buffer. For the standard PC onboard sounds, 16 periods are used as default, and that's enough for avoiding xruns for normal applications.
I guess we may provide an option for choosing one of the following modes:
- current default mode; no latency, but wakeup may be delayed, and may require 2*periods
- appl_ptr round up to the next period; the current workaround for periods <= 2. This imposes the start latency.
- round down to the current period start; No latency, but will drop a few samples at start.
... and ideally, provide a mode "auto" for choosing the appropriate one depending on the situation by a smart guess.
thanks,
Takashi
participants (3)
-
Takashi Iwai
-
Timo Wischer
-
twischer@de.adit-jv.com