[alsa-devel] [RFC 0/4] FIFO caused playback delay (latency) handling in soc
Hello,
There has been discussion in alsa-devel in this subject several times, but no actual patches has been sent.
Based on the available information the latency caused by the HW buffer on some systems can be handled by updating the runtime->delay.
It has been discussed, that the runtime->delay can also be updated dynamically to show more accurate delay.
To further complicate things, in ASoC we could have more buffer in the chain. To handle this we need soc level support.
This RFC series tries to do that in soc by: - introducing a pcm_pointer wrapper - in this wrapper we call the original pcm_pointer functions to get the DMA pointer - introducing a new interface in dai_ops to ask the delay from the dais - adding the cpu_dai and codec_dai returned delay to form the actual delay - update the runtime->delay with this value.
With this approach none of the existing drivers need change, but they can add support for specifying the FIFO caused delay.
In this series on top of the core changes the omap(3) code is updated to take this delay reporting into use. I have not added the support to the tlv320dac33 codec driver, since it needs a bit more work, but along the same line it can be done, and if the tlv320dac33 is hooked to omap McBSP than applications can know the whole delay/latency on that path.
Since this is for RFC, I have based it on sound-2.6 topc/asoc branch, which does not have the McBSP regcache nor the sidetone patches, but I would like to get feedback on this method first than create a proper series.
I have left out linux-omap from this RFC round, since the problem is ALSA related.
Thank you, Péter Ujfalusi
--- Peter Ujfalusi (4): ASoC: core: soc level wrapper for pcm_pointer callback ASoC: core: Add delay operation to snd_soc_dai_ops OMAP3: McBSP: Add interface for transmit FIFO state query ASoC: OMAP3: Report delay on playback caused by the internal FIFO
arch/arm/plat-omap/include/plat/mcbsp.h | 4 +++ arch/arm/plat-omap/mcbsp.c | 27 ++++++++++++++++++++++++ include/sound/soc-dai.h | 6 +++++ sound/soc/omap/omap-mcbsp.c | 26 +++++++++++++++++++++++ sound/soc/soc-core.c | 34 ++++++++++++++++++++++++++++++- 5 files changed, 96 insertions(+), 1 deletions(-)
Create a soc level wrapper for pcm_pointer callbacks. This will facilitate the soc level handling of different HW buffers in the audio path.
Signed-off-by: Peter Ujfalusi peter.ujfalusi@nokia.com --- sound/soc/soc-core.c | 19 ++++++++++++++++++- 1 files changed, 18 insertions(+), 1 deletions(-)
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c index a03bac9..6e35741 100644 --- a/sound/soc/soc-core.c +++ b/sound/soc/soc-core.c @@ -797,6 +797,23 @@ static int soc_pcm_trigger(struct snd_pcm_substream *substream, int cmd) return 0; }
+/* + * soc level wrapper for pointer and pcm_delay handling + */ +static snd_pcm_uframes_t soc_pcm_pointer(struct snd_pcm_substream *substream) +{ + struct snd_soc_pcm_runtime *rtd = substream->private_data; + struct snd_soc_device *socdev = rtd->socdev; + struct snd_soc_card *card = socdev->card; + struct snd_soc_platform *platform = card->platform; + snd_pcm_uframes_t offset = 0; + + if (platform->pcm_ops->pointer) + offset = platform->pcm_ops->pointer(substream); + + return offset; +} + /* ASoC PCM operations */ static struct snd_pcm_ops soc_pcm_ops = { .open = soc_pcm_open, @@ -805,6 +822,7 @@ static struct snd_pcm_ops soc_pcm_ops = { .hw_free = soc_pcm_hw_free, .prepare = soc_pcm_prepare, .trigger = soc_pcm_trigger, + .pointer = soc_pcm_pointer, };
#ifdef CONFIG_PM @@ -1331,7 +1349,6 @@ static int soc_new_pcm(struct snd_soc_device *socdev, dai_link->pcm = pcm; pcm->private_data = rtd; soc_pcm_ops.mmap = platform->pcm_ops->mmap; - soc_pcm_ops.pointer = platform->pcm_ops->pointer; soc_pcm_ops.ioctl = platform->pcm_ops->ioctl; soc_pcm_ops.copy = platform->pcm_ops->copy; soc_pcm_ops.silence = platform->pcm_ops->silence;
The delay calback can be used by the core to query the delay on the dais caused by FIFO. In case if both CPU and CODEC dai has FIFO the delay reported by each will be added to form the full delay on the chain. If none of the dai has FIFO, than the delay will be kept as zero.
Signed-off-by: Peter Ujfalusi peter.ujfalusi@nokia.com --- include/sound/soc-dai.h | 6 ++++++ sound/soc/soc-core.c | 15 +++++++++++++++ 2 files changed, 21 insertions(+), 0 deletions(-)
diff --git a/include/sound/soc-dai.h b/include/sound/soc-dai.h index 061f16d..be9cd47 100644 --- a/include/sound/soc-dai.h +++ b/include/sound/soc-dai.h @@ -182,6 +182,12 @@ struct snd_soc_dai_ops { struct snd_soc_dai *); int (*trigger)(struct snd_pcm_substream *, int, struct snd_soc_dai *); + /* + * For hardware based FIFO caused delay reporting. + * Optional. + */ + snd_pcm_sframes_t (*delay)(struct snd_pcm_substream *, + struct snd_soc_dai *); };
/* diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c index 6e35741..feaf03b 100644 --- a/sound/soc/soc-core.c +++ b/sound/soc/soc-core.c @@ -806,11 +806,26 @@ static snd_pcm_uframes_t soc_pcm_pointer(struct snd_pcm_substream *substream) struct snd_soc_device *socdev = rtd->socdev; struct snd_soc_card *card = socdev->card; struct snd_soc_platform *platform = card->platform; + struct snd_soc_dai_link *machine = rtd->dai; + struct snd_soc_dai *cpu_dai = machine->cpu_dai; + struct snd_soc_dai *codec_dai = machine->codec_dai; + struct snd_pcm_runtime *runtime = substream->runtime; snd_pcm_uframes_t offset = 0; + snd_pcm_sframes_t delay = 0;
if (platform->pcm_ops->pointer) offset = platform->pcm_ops->pointer(substream);
+ /* Query the delay only for playback stream */ + if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) { + if (cpu_dai->ops->delay) + delay += cpu_dai->ops->delay(substream, cpu_dai); + + if (codec_dai->ops->delay) + delay += codec_dai->ops->delay(substream, codec_dai); + + runtime->delay = delay; + } return offset; }
On Tue, Mar 02, 2010 at 03:39:49PM +0200, Peter Ujfalusi wrote:
- /* Query the delay only for playback stream */
- if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) {
if (cpu_dai->ops->delay)
delay += cpu_dai->ops->delay(substream, cpu_dai);
if (codec_dai->ops->delay)
delay += codec_dai->ops->delay(substream, codec_dai);
runtime->delay = delay;
- }
I'd be inclined to leave this up to the drivers to avoid surprises if it becomes useful for the capture case in the future. Not sure that that's likely, but I'd rather avoid potential future transitions.
On Tuesday 02 March 2010 15:45:08 ext Mark Brown wrote:
On Tue, Mar 02, 2010 at 03:39:49PM +0200, Peter Ujfalusi wrote:
- /* Query the delay only for playback stream */
- if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) {
if (cpu_dai->ops->delay)
delay += cpu_dai->ops->delay(substream, cpu_dai);
if (codec_dai->ops->delay)
delay += codec_dai->ops->delay(substream, codec_dai);
runtime->delay = delay;
- }
I'd be inclined to leave this up to the drivers to avoid surprises if it becomes useful for the capture case in the future. Not sure that that's likely, but I'd rather avoid potential future transitions.
Sure, actually I have added this check later, since I had some trouble to make sensible reporting on the capture path. But sure, I can remove this check and leave it to the dais to deal with.
Thanks Peter
On Tue, Mar 02, 2010 at 03:49:33PM +0200, Peter Ujfalusi wrote:
Sure, actually I have added this check later, since I had some trouble to make sensible reporting on the capture path. But sure, I can remove this check and leave it to the dais to deal with.
Yeah, I'm not convinced it's going to get used for capture but it seems a little easier to allow it rather than have to transition later. It's also consistent with all the other ops.
New function for reading the XBUFFSTAT register, which holds the fill state of the transmit buffer on McBSP.
Signed-off-by: Peter Ujfalusi peter.ujfalusi@nokia.com --- arch/arm/plat-omap/include/plat/mcbsp.h | 4 ++++ arch/arm/plat-omap/mcbsp.c | 27 +++++++++++++++++++++++++++ 2 files changed, 31 insertions(+), 0 deletions(-)
diff --git a/arch/arm/plat-omap/include/plat/mcbsp.h b/arch/arm/plat-omap/include/plat/mcbsp.h index 4f22e5b..98de3d4 100644 --- a/arch/arm/plat-omap/include/plat/mcbsp.h +++ b/arch/arm/plat-omap/include/plat/mcbsp.h @@ -147,6 +147,8 @@ #define OMAP_MCBSP_REG_WAKEUPEN 0xA8 #define OMAP_MCBSP_REG_XCCR 0xAC #define OMAP_MCBSP_REG_RCCR 0xB0 +#define OMAP_MCBSP_REG_XBUFFSTAT 0xB4 +#define OMAP_MCBSP_REG_RBUFFSTAT 0xB8
#define AUDIO_MCBSP_DATAWRITE (OMAP24XX_MCBSP2_BASE + OMAP_MCBSP_REG_DXR1) #define AUDIO_MCBSP_DATAREAD (OMAP24XX_MCBSP2_BASE + OMAP_MCBSP_REG_DRR1) @@ -428,6 +430,7 @@ void omap_mcbsp_set_tx_threshold(unsigned int id, u16 threshold); void omap_mcbsp_set_rx_threshold(unsigned int id, u16 threshold); u16 omap_mcbsp_get_max_tx_threshold(unsigned int id); u16 omap_mcbsp_get_max_rx_threshold(unsigned int id); +u16 omap_mcbsp_get_tx_buffstat(unsigned int id); int omap_mcbsp_get_dma_op_mode(unsigned int id); #else static inline void omap_mcbsp_set_tx_threshold(unsigned int id, u16 threshold) @@ -436,6 +439,7 @@ static inline void omap_mcbsp_set_rx_threshold(unsigned int id, u16 threshold) { } static inline u16 omap_mcbsp_get_max_tx_threshold(unsigned int id) { return 0; } static inline u16 omap_mcbsp_get_max_rx_threshold(unsigned int id) { return 0; } +static inline u16 omap_mcbsp_get_tx_buffstat(unsigned int id) { return 0; } static inline int omap_mcbsp_get_dma_op_mode(unsigned int id) { return 0; } #endif int omap_mcbsp_request(unsigned int id); diff --git a/arch/arm/plat-omap/mcbsp.c b/arch/arm/plat-omap/mcbsp.c index f757672..e49af86 100644 --- a/arch/arm/plat-omap/mcbsp.c +++ b/arch/arm/plat-omap/mcbsp.c @@ -284,6 +284,33 @@ u16 omap_mcbsp_get_max_rx_threshold(unsigned int id) EXPORT_SYMBOL(omap_mcbsp_get_max_rx_threshold);
/* + * omap_mcbsp_get_tx_buffstat returns the number of used slots in the McBSP FIFO + */ +u16 omap_mcbsp_get_tx_buffstat(unsigned int id) +{ + struct omap_mcbsp *mcbsp; + u16 buffstat; + + if (!omap_mcbsp_check_valid_id(id)) { + printk(KERN_ERR "%s: Invalid id (%d)\n", __func__, id + 1); + return -ENODEV; + } + mcbsp = id_to_mcbsp_ptr(id); + + /* Returns the number of free locations in the buffer */ + buffstat = OMAP_MCBSP_READ(mcbsp->io_base, XBUFFSTAT); + + /* Number of free slots on McBSP ports */ + if (mcbsp->id == 2) + buffstat = 0x500 - buffstat; + else + buffstat = 0x80 - buffstat; + + return buffstat; +} +EXPORT_SYMBOL(omap_mcbsp_get_tx_buffstat); + +/* * omap_mcbsp_get_dma_op_mode just return the current configured * operating mode for the mcbsp channel */
On Tue, Mar 02, 2010 at 03:39:50PM +0200, Peter Ujfalusi wrote:
New function for reading the XBUFFSTAT register, which holds the fill state of the transmit buffer on McBSP.
Signed-off-by: Peter Ujfalusi peter.ujfalusi@nokia.com
It'd be really nice if this could go into 2.6.34 to avoid cross-tree issues but for that you'd need to submit via OMAP. It'll need their ack anyway.
On Tue, 2010-03-02 at 14:39 +0100, Ujfalusi Peter (Nokia-D/Tampere) wrote:
New function for reading the XBUFFSTAT register, which holds the fill state of the transmit buffer on McBSP.
Signed-off-by: Peter Ujfalusi peter.ujfalusi@nokia.com
Reading the XBUFFSTAT register is by no means accurate. IIRC, it reports the buffer status incorrectly about 1/50 times on average (@ 48000khz); with simple math, it may be read during the DMA burst. Or is it guaranteed not being read during DMA transfer / have you otherwise verified the behavior?
- Eero
arch/arm/plat-omap/include/plat/mcbsp.h | 4 ++++ arch/arm/plat-omap/mcbsp.c | 27 +++++++++++++++++++++++++++ 2 files changed, 31 insertions(+), 0 deletions(-)
diff --git a/arch/arm/plat-omap/include/plat/mcbsp.h b/arch/arm/plat-omap/include/plat/mcbsp.h index 4f22e5b..98de3d4 100644 --- a/arch/arm/plat-omap/include/plat/mcbsp.h +++ b/arch/arm/plat-omap/include/plat/mcbsp.h @@ -147,6 +147,8 @@ #define OMAP_MCBSP_REG_WAKEUPEN 0xA8 #define OMAP_MCBSP_REG_XCCR 0xAC #define OMAP_MCBSP_REG_RCCR 0xB0 +#define OMAP_MCBSP_REG_XBUFFSTAT 0xB4 +#define OMAP_MCBSP_REG_RBUFFSTAT 0xB8
#define AUDIO_MCBSP_DATAWRITE (OMAP24XX_MCBSP2_BASE + OMAP_MCBSP_REG_DXR1) #define AUDIO_MCBSP_DATAREAD (OMAP24XX_MCBSP2_BASE + OMAP_MCBSP_REG_DRR1) @@ -428,6 +430,7 @@ void omap_mcbsp_set_tx_threshold(unsigned int id, u16 threshold); void omap_mcbsp_set_rx_threshold(unsigned int id, u16 threshold); u16 omap_mcbsp_get_max_tx_threshold(unsigned int id); u16 omap_mcbsp_get_max_rx_threshold(unsigned int id); +u16 omap_mcbsp_get_tx_buffstat(unsigned int id); int omap_mcbsp_get_dma_op_mode(unsigned int id); #else static inline void omap_mcbsp_set_tx_threshold(unsigned int id, u16 threshold) @@ -436,6 +439,7 @@ static inline void omap_mcbsp_set_rx_threshold(unsigned int id, u16 threshold) { } static inline u16 omap_mcbsp_get_max_tx_threshold(unsigned int id) { return 0; } static inline u16 omap_mcbsp_get_max_rx_threshold(unsigned int id) { return 0; } +static inline u16 omap_mcbsp_get_tx_buffstat(unsigned int id) { return 0; } static inline int omap_mcbsp_get_dma_op_mode(unsigned int id) { return 0; } #endif int omap_mcbsp_request(unsigned int id); diff --git a/arch/arm/plat-omap/mcbsp.c b/arch/arm/plat-omap/mcbsp.c index f757672..e49af86 100644 --- a/arch/arm/plat-omap/mcbsp.c +++ b/arch/arm/plat-omap/mcbsp.c @@ -284,6 +284,33 @@ u16 omap_mcbsp_get_max_rx_threshold(unsigned int id) EXPORT_SYMBOL(omap_mcbsp_get_max_rx_threshold);
/*
- omap_mcbsp_get_tx_buffstat returns the number of used slots in the McBSP FIFO
- */
+u16 omap_mcbsp_get_tx_buffstat(unsigned int id) +{
- struct omap_mcbsp *mcbsp;
- u16 buffstat;
- if (!omap_mcbsp_check_valid_id(id)) {
printk(KERN_ERR "%s: Invalid id (%d)\n", __func__, id + 1);
return -ENODEV;
- }
- mcbsp = id_to_mcbsp_ptr(id);
- /* Returns the number of free locations in the buffer */
- buffstat = OMAP_MCBSP_READ(mcbsp->io_base, XBUFFSTAT);
- /* Number of free slots on McBSP ports */
- if (mcbsp->id == 2)
buffstat = 0x500 - buffstat;
- else
buffstat = 0x80 - buffstat;
- return buffstat;
+} +EXPORT_SYMBOL(omap_mcbsp_get_tx_buffstat);
+/*
- omap_mcbsp_get_dma_op_mode just return the current configured
- operating mode for the mcbsp channel
*/
On Tue, 2010-03-02 at 15:52 +0200, Eero Nurkkala wrote:
On Tue, 2010-03-02 at 14:39 +0100, Ujfalusi Peter (Nokia-D/Tampere) wrote:
New function for reading the XBUFFSTAT register, which holds the fill state of the transmit buffer on McBSP.
Signed-off-by: Peter Ujfalusi peter.ujfalusi@nokia.com
Reading the XBUFFSTAT register is by no means accurate. IIRC, it reports the buffer status incorrectly about 1/50 times on average (@ 48000khz); with simple math, it may be read during the DMA burst. Or is it guaranteed not being read during DMA transfer / have you otherwise verified the behavior?
I agree, I've also seen XBUFFSTAT giving complete garbage with mcbsp3.
Liam
On Tuesday 02 March 2010 15:52:00 Nurkkala Eero.An (EXT-Offcode/Oulu) wrote:
On Tue, 2010-03-02 at 14:39 +0100, Ujfalusi Peter (Nokia-D/Tampere)
wrote:
New function for reading the XBUFFSTAT register, which holds the fill state of the transmit buffer on McBSP.
Signed-off-by: Peter Ujfalusi peter.ujfalusi@nokia.com
Reading the XBUFFSTAT register is by no means accurate. IIRC, it reports the buffer status incorrectly about 1/50 times on average (@ 48000khz); with simple math, it may be read during the DMA burst. Or is it guaranteed not being read during DMA transfer / have you otherwise verified the behavior?
That is one side, the other is that since the interface is running on the L4 domain, the actual buffer fill level might be different inside... But it is not guarantied that the DMA is not running. The pointer callback is usually called after the driver calls the elapsed, which in our case is when the DMA finished it's burst. So, yes it is not going to be precise, but it is still going to be better than it is now. I did run some tests, and the delay numbers reported looked reasonable. In element mode on McBSP2 the delay was 640 samples in some cases dropped to 639. In threshold mode, hmm I lost the log. But it was fluctuating, but that is expected. I'll do some tests on this.
- Eero
On Tue, 02 Mar 2010 15:52:00 +0200 Eero Nurkkala ext-eero.nurkkala@nokia.com wrote:
Reading the XBUFFSTAT register is by no means accurate. IIRC, it reports the buffer status incorrectly about 1/50 times on average (@ 48000khz); with simple math, it may be read during the DMA burst. Or is it guaranteed not being read during DMA transfer / have you otherwise verified the behavior?
Do you remember how big was this variation? Was it near the burst size or some arbitrary difference?
On Tue, 2010-03-02 at 15:06 +0100, ext Jarkko Nikula wrote:
On Tue, 02 Mar 2010 15:52:00 +0200 Eero Nurkkala ext-eero.nurkkala@nokia.com wrote:
Reading the XBUFFSTAT register is by no means accurate. IIRC, it reports the buffer status incorrectly about 1/50 times on average (@ 48000khz); with simple math, it may be read during the DMA burst. Or is it guaranteed not being read during DMA transfer / have you otherwise verified the behavior?
Do you remember how big was this variation? Was it near the burst size or some arbitrary difference?
Mostly the samples were good (49/50, @48khz, 2 channels, value polled from userpace), but the remaining 1/50 were garbage - and actually matched the probability of a simultaneous DMA burst. So the problem is - if you read at time t0, and the DMA burst is taking place at the same time - you get a value that looks rather random, as when (if) you poll the position @ t0 + n uS, the value is (after some usecs) a lot different. Or maybe the XBUFFSTAT just isn't reliable at all (regardless of DMA bursts etc).
It's easy to check - at DMA IRQ, read out the XBUFFSTAT... if the range is not valid, print a message...(to rule out the DMA bursts).
- Eero
On Wed, 03 Mar 2010 08:07:52 +0200 Eero Nurkkala ext-eero.nurkkala@nokia.com wrote:
Mostly the samples were good (49/50, @48khz, 2 channels, value polled from userpace), but the remaining 1/50 were garbage - and actually matched the probability of a simultaneous DMA burst. So the problem is - if you read at time t0, and the DMA burst is taking place at the same time - you get a value that looks rather random, as when (if) you poll the position @ t0 + n uS, the value is (after some usecs) a lot different. Or maybe the XBUFFSTAT just isn't reliable at all (regardless of DMA bursts etc).
Yeah, that's interesting to find out are the XBUFFSTAT reading differences inside the DMA burst size or distributed completely randomly. I think the XBUFFSTAT should increase at the sample rate and decrease or jump down fast when the DMA burst takes place.
On Wednesday 03 March 2010 09:03:24 ext Jarkko Nikula wrote:
On Wed, 03 Mar 2010 08:07:52 +0200
Eero Nurkkala ext-eero.nurkkala@nokia.com wrote:
Mostly the samples were good (49/50, @48khz, 2 channels, value polled from userpace), but the remaining 1/50 were garbage - and actually matched the probability of a simultaneous DMA burst.
Yeah, but you can place the same argument for the DMA pointer as well: If you read the pointer in a right time (very close to the start of the burst), than you have the same problem, since after few usec the DMA pointer will be at the end of the period, right?
So the problem is - if you read at time t0, and the DMA burst is taking place at the same time - you get a value that looks rather random, as when (if) you poll the position @ t0 + n uS, the value is (after some usecs) a lot different. Or maybe the XBUFFSTAT just isn't reliable at all (regardless of DMA bursts etc).
Yes, the XBUFFSTAT is not that reliable, it shows the buffer state at the last L4 clock phase. If the diff between the L4 speed and the audio is big enough, than you will have old data in the register, which does not reflect the reality. As far as I understand.
Yeah, that's interesting to find out are the XBUFFSTAT reading differences inside the DMA burst size or distributed completely randomly. I think the XBUFFSTAT should increase at the sample rate and decrease or jump down fast when the DMA burst takes place.
Anyhow, I have done few tests (McBSP2): printing XBUFFSTAT before trigger:start printing XBUFFSTAT right after trigger:start printing XBUFFSTAT at the pointer call time print at DMA interrupt printing XBUFFSTAT before trigger:stop printing XBUFFSTAT right after trigger:stop
The command for test: aplay -fdat -d 1 /dev/urandom
In element mode: [ 96.211364] Start 01 [ 96.213592] XBUFFSTAT: 1280 # before omap_mcbsp_start [ 96.216918] XBUFFSTAT: 0 # after omap_mcbsp_start [ 96.219451] Start 02 [ 96.244415] XBUFFSTAT: 0 # at pointer. [ 96.246978] XBUFFSTAT: 0 [ 96.249542] XBUFFSTAT: 0 [ 96.253265] XBUFFSTAT: 0 [ 96.255859] XBUFFSTAT: 0 [ 96.259033] XBUFFSTAT: 0 [ 96.261596] XBUFFSTAT: 0 [ 96.264404] XBUFFSTAT: 0 [ 96.267791] XBUFFSTAT: 0 [ 96.270324] XBUFFSTAT: 0 [ 96.272888] XBUFFSTAT: 0 [ 96.276458] XBUFFSTAT: 0 [ 96.279022] XBUFFSTAT: 0 [ 96.282165] XBUFFSTAT: 0 [ 96.284729] XBUFFSTAT: 0 [ 96.287536] XBUFFSTAT: 0 [ 96.290924] XBUFFSTAT: 0 [ 96.293487] XBUFFSTAT: 0 [ 96.296020] XBUFFSTAT: 0 [ 96.299774] XBUFFSTAT: 1 [ 96.302459] XBUFFSTAT: 0 [ 96.305847] XBUFFSTAT: 0 [ 96.308471] XBUFFSTAT: 0 [ 96.311035] XBUFFSTAT: 0 [ 96.314666] XBUFFSTAT: 0 [ 96.317230] XBUFFSTAT: 0 [ 96.319793] XBUFFSTAT: 0 [ 96.323150] XBUFFSTAT: 0 [ 96.325744] XBUFFSTAT: 0 [ 96.328399] DMA # DMA interrupt [ 96.330169] XBUFFSTAT: 0 # at pointer. ... [ 97.201904] DMA [ 97.203704] XBUFFSTAT: 0 [ 97.206237] Stop 01 [ 97.208343] XBUFFSTAT: 203 # before omap_mcbsp_stop [ 97.211090] XBUFFSTAT: 1280 # after omap_mcbsp_stop [ 97.213897] Stop 02
So the buffer is kept full in element mode, as it is expected.
Now the interesting thin is in threshold mode (threshold = 1023): [ 288.093475] Start 01 [ 288.095703] XBUFFSTAT: 1280 # before omap_mcbsp_start [ 288.099029] XBUFFSTAT: 257 # after omap_mcbsp_start (257+1023=1280), OK [ 288.101745] Start 02 [ 288.104003] DMA [ 288.105773] XBUFFSTAT: 904 [ 288.108520] DMA [ 288.110260] XBUFFSTAT: 311 [ 288.117706] XBUFFSTAT: 1024 [ 288.120513] DMA [ 288.122283] XBUFFSTAT: 442 [ 288.127929] XBUFFSTAT: 982 [ 288.130645] DMA [ 288.132415] XBUFFSTAT: 391 [ 288.137969] XBUFFSTAT: 923 [ 288.140686] DMA [ 288.142456] XBUFFSTAT: 331 [ 288.147796] XBUFFSTAT: 844 [ 288.150573] DMA [ 288.152343] XBUFFSTAT: 257 [ 288.157867] XBUFFSTAT: 787 [ 288.160614] DMA [ 288.162384] XBUFFSTAT: 196 [ 288.168395] XBUFFSTAT: 774 [ 288.171142] DMA [ 288.172912] XBUFFSTAT: 183 [ 288.178466] XBUFFSTAT: 717 ... [ 288.565612] DMA [ 288.567382] XBUFFSTAT: 182 [ 288.572509] XBUFFSTAT: 673 [ 288.576263] DMA [ 288.578033] XBUFFSTAT: 181 [ 288.583007] XBUFFSTAT: 659 [ 288.586944] DMA [ 288.588714] XBUFFSTAT: 182 [ 288.597595] DMA [ 288.599365] XBUFFSTAT: 182 [ 288.608245] DMA [ 288.610015] XBUFFSTAT: 182 [ 288.618896] DMA [ 288.620697] XBUFFSTAT: 182 [ 288.629577] DMA [ 288.631347] XBUFFSTAT: 182 [ 288.640228] DMA ... [ 289.077331] DMA [ 289.079101] XBUFFSTAT: 182 [ 289.087982] DMA [ 289.089752] XBUFFSTAT: 181 [ 289.092498] Stop 01 [ 289.094604] XBUFFSTAT: 646 # before omap_mcbsp_stop [ 289.097320] XBUFFSTAT: 1280 # after omap_mcbsp_stop [ 289.100128] Stop 02
These numbers were in a same range (variation was around 10) among several rounds. Looking at the numbers, I think they are kind of sane. After DMA interrupt the buffer has more data than a bit later, when the pointer is called (McBSP played out samples). One thing that I don't really understand is in the second block: suddenly we have only one pointer call in response to snd_pcm_period_elapsed(substream); while earlier we had one extra pointer call...
let's take one part out: [ 288.160614] DMA [ 288.162384] XBUFFSTAT: 196 [ 288.168395] XBUFFSTAT: 774
If application would write a sample at 288.162384, than it will be played out after 542 stereo samples [(1280 - 196 = 1084) / 2]. If application would write a sample at 288.168395, than it will be played out after 253 stereo samples [(1280 - 774 = 506) / 2]. The difference between these in samples: 542 - 253 = 289 The time difference is: 288.168395 - 288.162384 = 0.006011 In 0.006011 seconds at 48KHz the number of samples played out is 288.528
So according to XBUFSTAT we have played out 289 samples. based on the time we actually played 288.528 samples.
I would say this is really close to what we would expect to have?
Note: I have used printk for these, which pretty much alters the behavior a bit in timing wise...
On Wednesday 03 March 2010 12:02:48 Ujfalusi Peter (Nokia-D/Tampere) wrote:
Anyhow, I have done few tests (McBSP2): printing XBUFFSTAT before trigger:start printing XBUFFSTAT right after trigger:start printing XBUFFSTAT at the pointer call time print at DMA interrupt
In DMA interrupt I only printed out "DMA" and have not printed the XBUFFSTAT. XBUFFSTAT is only printed at the next pointer/delay call. Probably it would be also good to print at DMA interrupt, if there is a demand...
printing XBUFFSTAT before trigger:stop printing XBUFFSTAT right after trigger:stop
On Wed, 3 Mar 2010 12:02:48 +0200 Peter Ujfalusi peter.ujfalusi@nokia.com wrote:
The command for test: aplay -fdat -d 1 /dev/urandom
I recommend to use /dev/zero instead since urandom takes CPU time.
[ 96.290924] XBUFFSTAT: 0 [ 96.293487] XBUFFSTAT: 0 [ 96.296020] XBUFFSTAT: 0 [ 96.299774] XBUFFSTAT: 1
So the buffer is kept full in element mode, as it is expected.
I was expecting to see more variations. IRCC, the DMA in OMAP is doing transfers over the memory bus using some small bursts but I'm not completely sure. I'm thinking if that could explain the Eero's and Liam's observations?
So according to XBUFSTAT we have played out 289 samples. based on the time we actually played 288.528 samples.
I would say this is really close to what we would expect to have?
Sounds accurate enough.
Note: I have used printk for these, which pretty much alters the behavior a bit in timing wise...
BTW: There is OMAP3 port in LTTng.
On Wednesday 03 March 2010 16:18:42 ext Jarkko Nikula wrote:
On Wed, 3 Mar 2010 12:02:48 +0200
Peter Ujfalusi peter.ujfalusi@nokia.com wrote:
The command for test: aplay -fdat -d 1 /dev/urandom
I recommend to use /dev/zero instead since urandom takes CPU time.
[ 96.290924] XBUFFSTAT: 0 [ 96.293487] XBUFFSTAT: 0 [ 96.296020] XBUFFSTAT: 0 [ 96.299774] XBUFFSTAT: 1
So the buffer is kept full in element mode, as it is expected.
I was expecting to see more variations. IRCC, the DMA in OMAP is doing transfers over the memory bus using some small bursts but I'm not completely sure. I'm thinking if that could explain the Eero's and Liam's observations?
I kind of expecting similar result (this). In element mode the threshold value is 0 (1 word). So if one word is free in McBSP buffer, than it will assert the DMA request line, and the DMA will write one word to McBSP. So in theory the buffer must be kept full in this mode, with quite minimal variation. But I might be wrong on this...
So according to XBUFSTAT we have played out 289 samples. based on the time we actually played 288.528 samples.
I would say this is really close to what we would expect to have?
Sounds accurate enough.
Note: I have used printk for these, which pretty much alters the behavior a bit in timing wise...
BTW: There is OMAP3 port in LTTng.
Thanks, I'll look at that, since this printk is altering the behavior too much.
Hello,
On Wednesday 03 March 2010 17:01:39 Ujfalusi Peter (Nokia-D/Tampere) wrote:
On Wednesday 03 March 2010 16:18:42 ext Jarkko Nikula wrote:
On Wed, 3 Mar 2010 12:02:48 +0200
Peter Ujfalusi peter.ujfalusi@nokia.com wrote:
The command for test: aplay -fdat -d 1 /dev/urandom
I recommend to use /dev/zero instead since urandom takes CPU time.
[ 96.290924] XBUFFSTAT: 0 [ 96.293487] XBUFFSTAT: 0 [ 96.296020] XBUFFSTAT: 0 [ 96.299774] XBUFFSTAT: 1
So the buffer is kept full in element mode, as it is expected.
I was expecting to see more variations. IRCC, the DMA in OMAP is doing transfers over the memory bus using some small bursts but I'm not completely sure. I'm thinking if that could explain the Eero's and Liam's observations?
I kind of expecting similar result (this). In element mode the threshold value is 0 (1 word). So if one word is free in McBSP buffer, than it will assert the DMA request line, and the DMA will write one word to McBSP. So in theory the buffer must be kept full in this mode, with quite minimal variation. But I might be wrong on this...
Well, to be precise, I expected that the XBUFFSTAT would have been 1 most of the time, and not 0. But in stereo we are still within one sample accuracy, so I think this is OK.
From: Ujfalusi Peter
So according to XBUFSTAT we have played out 289 samples. based on the time we actually played 288.528 samples.
I would say this is really close to what we would expect to have?
Note: I have used printk for these, which pretty much alters the behavior a bit in timing wise...
-- Péter
There was something like this, reading XBUFFSTAT and RBUFFSTAT (adjacently), and comparing the diff, it should be 145 +-1 in this case, but that's not true:
(usespace app reading the values through a sysfs node):
1: tx: 247, rx: 101, (tx-rx): 146, (480+(tx-rx))%480: 146 2: tx: 377, rx: 232, (tx-rx): 145, (480+(tx-rx))%480: 145 3: tx: 136, rx: 471, (tx-rx): -335, (480+(tx-rx))%480: 145 4: tx: 40, rx: 375, (tx-rx): -335, (480+(tx-rx))%480: 145 5: tx: 63, rx: 398, (tx-rx): -335, (480+(tx-rx))%480: 145 6: tx: 444, rx: 299, (tx-rx): 145, (480+(tx-rx))%480: 145 7: tx: 330, rx: 185, (tx-rx): 145, (480+(tx-rx))%480: 145 8: tx: 198, rx: 53, (tx-rx): 145, (480+(tx-rx))%480: 145 9: tx: 453, rx: 308, (tx-rx): 145, (480+(tx-rx))%480: 145 10: tx: 459, rx: 313, (tx-rx): 146, (480+(tx-rx))%480: 146 11: tx: 353, rx: 208, (tx-rx): 145, (480+(tx-rx))%480: 145 12: tx: 435, rx: 290, (tx-rx): 145, (480+(tx-rx))%480: 145 13: tx: 421, rx: 276, (tx-rx): 145, (480+(tx-rx))%480: 145 14: tx: 397, rx: 251, (tx-rx): 146, (480+(tx-rx))%480: 146 15: tx: 297, rx: 151, (tx-rx): 146, (480+(tx-rx))%480: 146 16: tx: 341, rx: 196, (tx-rx): 145, (480+(tx-rx))%480: 145 17: tx: 347, rx: 202, (tx-rx): 145, (480+(tx-rx))%480: 145 18: tx: 298, rx: 153, (tx-rx): 145, (480+(tx-rx))%480: 145 19: tx: 341, rx: 196, (tx-rx): 145, (480+(tx-rx))%480: 145 20: tx: 121, rx: 456, (tx-rx): -335, (480+(tx-rx))%480: 145 21: tx: 109, rx: 444, (tx-rx): -335, (480+(tx-rx))%480: 145 22: tx: 465, rx: 320, (tx-rx): 145, (480+(tx-rx))%480: 145 23: tx: 300, rx: 155, (tx-rx): 145, (480+(tx-rx))%480: 145 24: tx: 217, rx: 72, (tx-rx): 145, (480+(tx-rx))%480: 145 25: tx: 361, rx: 216, (tx-rx): 145, (480+(tx-rx))%480: 145 26: tx: 443, rx: 298, (tx-rx): 145, (480+(tx-rx))%480: 145 27: tx: 325, rx: 180, (tx-rx): 145, (480+(tx-rx))%480: 145 28: tx: 398, rx: 252, (tx-rx): 146, (480+(tx-rx))%480: 146 29: tx: 346, rx: 201, (tx-rx): 145, (480+(tx-rx))%480: 145 30: tx: 474, rx: 329, (tx-rx): 145, (480+(tx-rx))%480: 145 31: tx: 323, rx: 178, (tx-rx): 145, (480+(tx-rx))%480: 145 32: tx: 349, rx: 204, (tx-rx): 145, (480+(tx-rx))%480: 145 33: tx: 202, rx: 57, (tx-rx): 145, (480+(tx-rx))%480: 145 34: tx: 462, rx: 317, (tx-rx): 145, (480+(tx-rx))%480: 145 35: tx: 87, rx: 422, (tx-rx): -335, (480+(tx-rx))%480: 145 36: tx: 407, rx: 262, (tx-rx): 145, (480+(tx-rx))%480: 145 37: tx: 354, rx: 209, (tx-rx): 145, (480+(tx-rx))%480: 145 38: tx: 377, rx: 232, (tx-rx): 145, (480+(tx-rx))%480: 145 39: tx: 435, rx: 290, (tx-rx): 145, (480+(tx-rx))%480: 145 40: tx: 456, rx: 311, (tx-rx): 145, (480+(tx-rx))%480: 145 41: tx: 466, rx: 321, (tx-rx): 145, (480+(tx-rx))%480: 145 42: tx: 349, rx: 204, (tx-rx): 145, (480+(tx-rx))%480: 145 43: tx: 328, rx: 183, (tx-rx): 145, (480+(tx-rx))%480: 145 44: tx: 385, rx: 240, (tx-rx): 145, (480+(tx-rx))%480: 145 45: tx: 115, rx: 450, (tx-rx): -335, (480+(tx-rx))%480: 145 46: tx: 297, rx: 152, (tx-rx): 145, (480+(tx-rx))%480: 145 47: tx: 332, rx: 187, (tx-rx): 145, (480+(tx-rx))%480: 145 48: tx: 298, rx: 153, (tx-rx): 145, (480+(tx-rx))%480: 145 49: tx: 432, rx: 286, (tx-rx): 146, (480+(tx-rx))%480: 146 50: tx: 99, rx: 434, (tx-rx): -335, (480+(tx-rx))%480: 145 51: tx: 96, rx: 431, (tx-rx): -335, (480+(tx-rx))%480: 145 52: tx: 380, rx: 235, (tx-rx): 145, (480+(tx-rx))%480: 145 53: tx: 417, rx: 272, (tx-rx): 145, (480+(tx-rx))%480: 145 54: tx: 272, rx: 337, (tx-rx): -65, (480+(tx-rx))%480: 415 < NOK 55: tx: 391, rx: 246, (tx-rx): 145, (480+(tx-rx))%480: 145 56: tx: 305, rx: 336, (tx-rx): -31, (480+(tx-rx))%480: 449 < NOK
what went wrong? (the pcm pointer w/dma pos never missed that greatly) Very unlikely that an IRQ just happened to come in between the register reads..with that frequency (> 1/50)
so subjectively XBUFFSTAT / RBUFFSTAT are not useful for any "real" use...
- Eero
what went wrong? (the pcm pointer w/dma pos never missed that greatly) Very unlikely that an IRQ just happened to come in between the register reads..with that frequency (> 1/50)
Just to make sure:
+void omap_mcbsp_buffstat(unsigned int id, unsigned int *xbuffstat, + unsigned int *rbuffstat) +{ + struct omap_mcbsp *mcbsp; + + if (!(cpu_is_omap2430() || cpu_is_omap34xx())) + return; + + if (!omap_mcbsp_check_valid_id(id)) { + printk(KERN_ERR "%s: Invalid id (%d)\n", __func__, id + 1); + return; + } + + mcbsp = id_to_mcbsp_ptr(id); + *xbuffstat = OMAP_MCBSP_READ(mcbsp->io_base, XBUFFSTAT); + *rbuffstat = OMAP_MCBSP_READ(mcbsp->io_base, RBUFFSTAT); +}
see, the above is not trusted. However, it's 100% trusted to read off the timestamps recorded at DMA IRQ callback. (and at that time the buffer has just been filled up.).
- Eero
On Wednesday 03 March 2010 21:07:21 Nurkkala Eero.An (EXT-Offcode/Oulu) wrote:
what went wrong? (the pcm pointer w/dma pos never missed that greatly) Very unlikely that an IRQ just happened to come in between the register reads..with that frequency (> 1/50)
Just to make sure:
+void omap_mcbsp_buffstat(unsigned int id, unsigned int *xbuffstat,
unsigned int *rbuffstat)
+{
- struct omap_mcbsp *mcbsp;
- if (!(cpu_is_omap2430() || cpu_is_omap34xx()))
return;
- if (!omap_mcbsp_check_valid_id(id)) {
printk(KERN_ERR "%s: Invalid id (%d)\n", __func__, id + 1);
return;
- }
- mcbsp = id_to_mcbsp_ptr(id);
- *xbuffstat = OMAP_MCBSP_READ(mcbsp->io_base, XBUFFSTAT);
- *rbuffstat = OMAP_MCBSP_READ(mcbsp->io_base, RBUFFSTAT);
+}
Yes, and your sysfs read function was actually calling the omap_mcbsp_buffstat, and than it prints out both of the buffstat in one line.
see, the above is not trusted. However, it's 100% trusted to read off the timestamps recorded at DMA IRQ callback. (and at that time the buffer has just been filled up.).
Well, in element mode it would be kind of safe to report the FIFO size as delay in all cases. But in threshold mode it is a bit different. It all depends how the threshold value has been configured: For example if the threshold is 400, than at startup you will have three consequent bursts filling the buffer to 1200, and leaving 80 locations still free in the buffer. But as the playback progresses and we have 400 free locations, than the DMA will be filling the FIFO to close to full.
if the threshold is 1024, than you will have one burst at startup, the buffer will have 1024 location filled and 256 free. But as the playback progresses and we have 1024 free locations, than the DMA will be filling the FIFO to close to full.
So yes, over time you will get pretty close estimation about the delay.
But I still think, that my proposal also gives pretty good estimation about the delay as well.
- Eero
Hello,
On Wednesday 03 March 2010 21:00:58 Nurkkala Eero.An (EXT-Offcode/Oulu) wrote:
From: Ujfalusi Peter
So according to XBUFSTAT we have played out 289 samples. based on the time we actually played 288.528 samples.
I would say this is really close to what we would expect to have?
Note: I have used printk for these, which pretty much alters the behavior a bit in timing wise...
-- Péter
There was something like this, reading XBUFFSTAT and RBUFFSTAT (adjacently), and comparing the diff, it should be 145 +-1 in this case, but that's not true:
(usespace app reading the values through a sysfs node):
1: tx: 247, rx: 101, (tx-rx): 146, (480+(tx-rx))%480: 146 2: tx: 377, rx: 232, (tx-rx): 145, (480+(tx-rx))%480: 145 3: tx: 136, rx: 471, (tx-rx): -335, (480+(tx-rx))%480: 145 4: tx: 40, rx: 375, (tx-rx): -335, (480+(tx-rx))%480: 145 5: tx: 63, rx: 398, (tx-rx): -335, (480+(tx-rx))%480: 145 6: tx: 444, rx: 299, (tx-rx): 145, (480+(tx-rx))%480: 145 7: tx: 330, rx: 185, (tx-rx): 145, (480+(tx-rx))%480: 145 8: tx: 198, rx: 53, (tx-rx): 145, (480+(tx-rx))%480: 145 9: tx: 453, rx: 308, (tx-rx): 145, (480+(tx-rx))%480: 145 10: tx: 459, rx: 313, (tx-rx): 146, (480+(tx-rx))%480: 146 11: tx: 353, rx: 208, (tx-rx): 145, (480+(tx-rx))%480: 145 12: tx: 435, rx: 290, (tx-rx): 145, (480+(tx-rx))%480: 145 13: tx: 421, rx: 276, (tx-rx): 145, (480+(tx-rx))%480: 145 14: tx: 397, rx: 251, (tx-rx): 146, (480+(tx-rx))%480: 146 15: tx: 297, rx: 151, (tx-rx): 146, (480+(tx-rx))%480: 146 16: tx: 341, rx: 196, (tx-rx): 145, (480+(tx-rx))%480: 145 17: tx: 347, rx: 202, (tx-rx): 145, (480+(tx-rx))%480: 145 18: tx: 298, rx: 153, (tx-rx): 145, (480+(tx-rx))%480: 145 19: tx: 341, rx: 196, (tx-rx): 145, (480+(tx-rx))%480: 145 20: tx: 121, rx: 456, (tx-rx): -335, (480+(tx-rx))%480: 145 21: tx: 109, rx: 444, (tx-rx): -335, (480+(tx-rx))%480: 145 22: tx: 465, rx: 320, (tx-rx): 145, (480+(tx-rx))%480: 145 23: tx: 300, rx: 155, (tx-rx): 145, (480+(tx-rx))%480: 145 24: tx: 217, rx: 72, (tx-rx): 145, (480+(tx-rx))%480: 145 25: tx: 361, rx: 216, (tx-rx): 145, (480+(tx-rx))%480: 145 26: tx: 443, rx: 298, (tx-rx): 145, (480+(tx-rx))%480: 145 27: tx: 325, rx: 180, (tx-rx): 145, (480+(tx-rx))%480: 145 28: tx: 398, rx: 252, (tx-rx): 146, (480+(tx-rx))%480: 146 29: tx: 346, rx: 201, (tx-rx): 145, (480+(tx-rx))%480: 145 30: tx: 474, rx: 329, (tx-rx): 145, (480+(tx-rx))%480: 145 31: tx: 323, rx: 178, (tx-rx): 145, (480+(tx-rx))%480: 145 32: tx: 349, rx: 204, (tx-rx): 145, (480+(tx-rx))%480: 145 33: tx: 202, rx: 57, (tx-rx): 145, (480+(tx-rx))%480: 145 34: tx: 462, rx: 317, (tx-rx): 145, (480+(tx-rx))%480: 145 35: tx: 87, rx: 422, (tx-rx): -335, (480+(tx-rx))%480: 145 36: tx: 407, rx: 262, (tx-rx): 145, (480+(tx-rx))%480: 145 37: tx: 354, rx: 209, (tx-rx): 145, (480+(tx-rx))%480: 145 38: tx: 377, rx: 232, (tx-rx): 145, (480+(tx-rx))%480: 145 39: tx: 435, rx: 290, (tx-rx): 145, (480+(tx-rx))%480: 145 40: tx: 456, rx: 311, (tx-rx): 145, (480+(tx-rx))%480: 145 41: tx: 466, rx: 321, (tx-rx): 145, (480+(tx-rx))%480: 145 42: tx: 349, rx: 204, (tx-rx): 145, (480+(tx-rx))%480: 145 43: tx: 328, rx: 183, (tx-rx): 145, (480+(tx-rx))%480: 145 44: tx: 385, rx: 240, (tx-rx): 145, (480+(tx-rx))%480: 145 45: tx: 115, rx: 450, (tx-rx): -335, (480+(tx-rx))%480: 145 46: tx: 297, rx: 152, (tx-rx): 145, (480+(tx-rx))%480: 145 47: tx: 332, rx: 187, (tx-rx): 145, (480+(tx-rx))%480: 145 48: tx: 298, rx: 153, (tx-rx): 145, (480+(tx-rx))%480: 145 49: tx: 432, rx: 286, (tx-rx): 146, (480+(tx-rx))%480: 146 50: tx: 99, rx: 434, (tx-rx): -335, (480+(tx-rx))%480: 145 51: tx: 96, rx: 431, (tx-rx): -335, (480+(tx-rx))%480: 145 52: tx: 380, rx: 235, (tx-rx): 145, (480+(tx-rx))%480: 145 53: tx: 417, rx: 272, (tx-rx): 145, (480+(tx-rx))%480: 145 54: tx: 272, rx: 337, (tx-rx): -65, (480+(tx-rx))%480: 415 < NOK 55: tx: 391, rx: 246, (tx-rx): 145, (480+(tx-rx))%480: 145 56: tx: 305, rx: 336, (tx-rx): -31, (480+(tx-rx))%480: 449 < NOK
what went wrong? (the pcm pointer w/dma pos never missed that greatly)
I see your point.
Very unlikely that an IRQ just happened to come in between the register reads..with that frequency (> 1/50)
I don't think that this has anything to do with IRQ, it looks to me more like an 'accident' in the timing of the reading the buffstat registers.
so subjectively XBUFFSTAT / RBUFFSTAT are not useful for any "real" use...
Well, yes and no. we can get good estimation with this I think still.
- Eero
How I see this: In your code, you have a sysfs file, which can be read from user space. User space semi randomly reading that file to get the buffstat values. These registers are reflecting the buffer state on the L4 clock domain, which means that the actual value might differ. So I think what happens is, that the user space happens to read the buffstat registers most of the time, when neither the TX or RX is buffer has burst DMA access, but time to time you can hit a point 'by accident' or just because of the state of the moon, when either of the TX or/and RX is actually under DMA burst. Since this burst is fast, L4 domain register can not be updated as frequently, so it will reflect a bit later state of the buffer. If this happens you will have inconsistent relation between TX and RX.
But I think, if you have similar sysfs file for comparing the DMA pointers you will also hit the same spot. If you happened to read the pointers when one of the stream is in DMA burst, than you will definitely have the same problem. So the DMA pointer comparison is 'correct' when neither of the stream are in DMA burst. Same goes for the buffstat registers: if neither of the streams are in burst they will provide pretty good numbers.
Since the pointer callback is naturally called after a period has been sent, the DMA is not in burst anymore, so the reading of buffstat on the given stream will give close enough estimation on the delay caused by the FIFO.
But yes, as you mentioned in your mail, this estimation can be done using timestamp in DMA completion handler, and than calculate the delay based on the timestamp we got, the threshold, and the size of the FIFO on the given McBSP port.
On Thu, 2010-03-04 at 09:09 +0100, Ujfalusi Peter (Nokia-D/Tampere) wrote:
Hello,
On Wednesday 03 March 2010 21:00:58 Nurkkala Eero.An (EXT-Offcode/Oulu) wrote:
From: Ujfalusi Peter
So according to XBUFSTAT we have played out 289 samples. based on the time we actually played 288.528 samples.
I would say this is really close to what we would expect to have?
Note: I have used printk for these, which pretty much alters the behavior a bit in timing wise...
-- Péter
There was something like this, reading XBUFFSTAT and RBUFFSTAT (adjacently), and comparing the diff, it should be 145 +-1 in this case, but that's not true:
(usespace app reading the values through a sysfs node):
1: tx: 247, rx: 101, (tx-rx): 146, (480+(tx-rx))%480: 146 2: tx: 377, rx: 232, (tx-rx): 145, (480+(tx-rx))%480: 145 3: tx: 136, rx: 471, (tx-rx): -335, (480+(tx-rx))%480: 145 4: tx: 40, rx: 375, (tx-rx): -335, (480+(tx-rx))%480: 145 5: tx: 63, rx: 398, (tx-rx): -335, (480+(tx-rx))%480: 145 6: tx: 444, rx: 299, (tx-rx): 145, (480+(tx-rx))%480: 145 7: tx: 330, rx: 185, (tx-rx): 145, (480+(tx-rx))%480: 145 8: tx: 198, rx: 53, (tx-rx): 145, (480+(tx-rx))%480: 145 9: tx: 453, rx: 308, (tx-rx): 145, (480+(tx-rx))%480: 145 10: tx: 459, rx: 313, (tx-rx): 146, (480+(tx-rx))%480: 146 11: tx: 353, rx: 208, (tx-rx): 145, (480+(tx-rx))%480: 145 12: tx: 435, rx: 290, (tx-rx): 145, (480+(tx-rx))%480: 145 13: tx: 421, rx: 276, (tx-rx): 145, (480+(tx-rx))%480: 145 14: tx: 397, rx: 251, (tx-rx): 146, (480+(tx-rx))%480: 146 15: tx: 297, rx: 151, (tx-rx): 146, (480+(tx-rx))%480: 146 16: tx: 341, rx: 196, (tx-rx): 145, (480+(tx-rx))%480: 145 17: tx: 347, rx: 202, (tx-rx): 145, (480+(tx-rx))%480: 145 18: tx: 298, rx: 153, (tx-rx): 145, (480+(tx-rx))%480: 145 19: tx: 341, rx: 196, (tx-rx): 145, (480+(tx-rx))%480: 145 20: tx: 121, rx: 456, (tx-rx): -335, (480+(tx-rx))%480: 145 21: tx: 109, rx: 444, (tx-rx): -335, (480+(tx-rx))%480: 145 22: tx: 465, rx: 320, (tx-rx): 145, (480+(tx-rx))%480: 145 23: tx: 300, rx: 155, (tx-rx): 145, (480+(tx-rx))%480: 145 24: tx: 217, rx: 72, (tx-rx): 145, (480+(tx-rx))%480: 145 25: tx: 361, rx: 216, (tx-rx): 145, (480+(tx-rx))%480: 145 26: tx: 443, rx: 298, (tx-rx): 145, (480+(tx-rx))%480: 145 27: tx: 325, rx: 180, (tx-rx): 145, (480+(tx-rx))%480: 145 28: tx: 398, rx: 252, (tx-rx): 146, (480+(tx-rx))%480: 146 29: tx: 346, rx: 201, (tx-rx): 145, (480+(tx-rx))%480: 145 30: tx: 474, rx: 329, (tx-rx): 145, (480+(tx-rx))%480: 145 31: tx: 323, rx: 178, (tx-rx): 145, (480+(tx-rx))%480: 145 32: tx: 349, rx: 204, (tx-rx): 145, (480+(tx-rx))%480: 145 33: tx: 202, rx: 57, (tx-rx): 145, (480+(tx-rx))%480: 145 34: tx: 462, rx: 317, (tx-rx): 145, (480+(tx-rx))%480: 145 35: tx: 87, rx: 422, (tx-rx): -335, (480+(tx-rx))%480: 145 36: tx: 407, rx: 262, (tx-rx): 145, (480+(tx-rx))%480: 145 37: tx: 354, rx: 209, (tx-rx): 145, (480+(tx-rx))%480: 145 38: tx: 377, rx: 232, (tx-rx): 145, (480+(tx-rx))%480: 145 39: tx: 435, rx: 290, (tx-rx): 145, (480+(tx-rx))%480: 145 40: tx: 456, rx: 311, (tx-rx): 145, (480+(tx-rx))%480: 145 41: tx: 466, rx: 321, (tx-rx): 145, (480+(tx-rx))%480: 145 42: tx: 349, rx: 204, (tx-rx): 145, (480+(tx-rx))%480: 145 43: tx: 328, rx: 183, (tx-rx): 145, (480+(tx-rx))%480: 145 44: tx: 385, rx: 240, (tx-rx): 145, (480+(tx-rx))%480: 145 45: tx: 115, rx: 450, (tx-rx): -335, (480+(tx-rx))%480: 145 46: tx: 297, rx: 152, (tx-rx): 145, (480+(tx-rx))%480: 145 47: tx: 332, rx: 187, (tx-rx): 145, (480+(tx-rx))%480: 145 48: tx: 298, rx: 153, (tx-rx): 145, (480+(tx-rx))%480: 145 49: tx: 432, rx: 286, (tx-rx): 146, (480+(tx-rx))%480: 146 50: tx: 99, rx: 434, (tx-rx): -335, (480+(tx-rx))%480: 145 51: tx: 96, rx: 431, (tx-rx): -335, (480+(tx-rx))%480: 145 52: tx: 380, rx: 235, (tx-rx): 145, (480+(tx-rx))%480: 145 53: tx: 417, rx: 272, (tx-rx): 145, (480+(tx-rx))%480: 145 54: tx: 272, rx: 337, (tx-rx): -65, (480+(tx-rx))%480: 415 < NOK 55: tx: 391, rx: 246, (tx-rx): 145, (480+(tx-rx))%480: 145 56: tx: 305, rx: 336, (tx-rx): -31, (480+(tx-rx))%480: 449 < NOK
what went wrong? (the pcm pointer w/dma pos never missed that greatly)
I see your point.
Very unlikely that an IRQ just happened to come in between the register reads..with that frequency (> 1/50)
I don't think that this has anything to do with IRQ, it looks to me more like an 'accident' in the timing of the reading the buffstat registers.
so subjectively XBUFFSTAT / RBUFFSTAT are not useful for any "real" use...
Well, yes and no. we can get good estimation with this I think still.
- Eero
How I see this: In your code, you have a sysfs file, which can be read from user space. User space semi randomly reading that file to get the buffstat values. These registers are reflecting the buffer state on the L4 clock domain, which means that the actual value might differ. So I think what happens is, that the user space happens to read the buffstat registers most of the time, when neither the TX or RX is buffer has burst DMA access, but time to time you can hit a point 'by accident' or just because of the state of the moon, when either of the TX or/and RX is actually under DMA burst. Since this burst is fast, L4 domain register can not be updated as frequently, so it will reflect a bit later state of the buffer. If this happens you will have inconsistent relation between TX and RX.
But I think, if you have similar sysfs file for comparing the DMA pointers you will also hit the same spot.
Don't recall so. Maybe the DMA pointer is not updated during a burst at all.
If you happened to read the pointers when one of the stream is in DMA burst, than you will definitely have the same problem. So the DMA pointer comparison is 'correct' when neither of the stream are in DMA burst. Same goes for the buffstat registers: if neither of the streams are in burst they will provide pretty good numbers.
I guess that depends on the use case. "laissez-faire" style applications could just swallow that - but applications requiring perfect numbers are in deep trouble.
Since the pointer callback is naturally called after a period has been sent, the DMA is not in burst anymore, so the reading of buffstat on the given stream will give close enough estimation on the delay caused by the FIFO.
It's always called at that point, but the user may be interested of that at any given time.
But yes, as you mentioned in your mail, this estimation can be done using timestamp in DMA completion handler, and than calculate the delay based on the timestamp we got, the threshold, and the size of the FIFO on the given McBSP port.
Use the new delay calback function to report the delay through ALSA for application caused by the internal FIFO.
Signed-off-by: Peter Ujfalusi peter.ujfalusi@nokia.com --- sound/soc/omap/omap-mcbsp.c | 26 ++++++++++++++++++++++++++ 1 files changed, 26 insertions(+), 0 deletions(-)
diff --git a/sound/soc/omap/omap-mcbsp.c b/sound/soc/omap/omap-mcbsp.c index d297256..72036b5 100644 --- a/sound/soc/omap/omap-mcbsp.c +++ b/sound/soc/omap/omap-mcbsp.c @@ -248,6 +248,31 @@ static int omap_mcbsp_dai_trigger(struct snd_pcm_substream *substream, int cmd, return err; }
+static snd_pcm_sframes_t omap_mcbsp_dai_delay( + struct snd_pcm_substream *substream, + struct snd_soc_dai *dai) +{ + struct snd_soc_pcm_runtime *rtd = substream->private_data; + struct snd_soc_dai *cpu_dai = rtd->dai->cpu_dai; + struct omap_mcbsp_data *mcbsp_data = to_mcbsp(cpu_dai->private_data); + u16 fifo_use; + snd_pcm_sframes_t delay; + + if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) + fifo_use = omap_mcbsp_get_tx_buffstat(mcbsp_data->bus_id); + else + return 0; + + /* + * Devide the used locations with the channel count to get the + * FIFO usage in samples (don't care about partial samples in the + * buffer). + */ + delay = fifo_use / substream->runtime->channels; + + return delay; +} + static int omap_mcbsp_dai_hw_params(struct snd_pcm_substream *substream, struct snd_pcm_hw_params *params, struct snd_soc_dai *dai) @@ -599,6 +624,7 @@ static struct snd_soc_dai_ops omap_mcbsp_dai_ops = { .startup = omap_mcbsp_dai_startup, .shutdown = omap_mcbsp_dai_shutdown, .trigger = omap_mcbsp_dai_trigger, + .delay = omap_mcbsp_dai_delay, .hw_params = omap_mcbsp_dai_hw_params, .set_fmt = omap_mcbsp_dai_set_dai_fmt, .set_clkdiv = omap_mcbsp_dai_set_clkdiv,
On Tue, Mar 02, 2010 at 03:39:51PM +0200, Peter Ujfalusi wrote:
- if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
fifo_use = omap_mcbsp_get_tx_buffstat(mcbsp_data->bus_id);
- else
return 0;
You're already handling the playback vs. capture thing here! :)
- /*
* Devide the used locations with the channel count to get the
Divide.
On Tuesday 02 March 2010 15:47:07 ext Mark Brown wrote:
On Tue, Mar 02, 2010 at 03:39:51PM +0200, Peter Ujfalusi wrote:
- if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
fifo_use = omap_mcbsp_get_tx_buffstat(mcbsp_data->bus_id);
- else
return 0;
You're already handling the playback vs. capture thing here! :)
Oh yes, I do. I have removed the capture (rx related) reporting, since as I mentioned, I was not sure how to, and what to report. But is it OK to have initially support for playback delay handling, and if there is a need, add the capture later?
- /*
* Devide the used locations with the channel count to get the
Divide.
Yes ;)
On Tue, Mar 02, 2010 at 03:52:36PM +0200, Peter Ujfalusi wrote:
On Tuesday 02 March 2010 15:47:07 ext Mark Brown wrote:
You're already handling the playback vs. capture thing here! :)
Oh yes, I do. I have removed the capture (rx related) reporting, since as I mentioned, I was not sure how to, and what to report. But is it OK to have initially support for playback delay handling, and if there is a need, add the capture later?
The code is fine, this was more a comment on the earlier patch.
On Tue, Mar 02, 2010 at 03:39:47PM +0200, Peter Ujfalusi wrote:
There has been discussion in alsa-devel in this subject several times, but no actual patches has been sent.
Well, it's never really been discussed in an ASoC context - most of the interest has been on the PC audio side. This is the first interest in it from the embedded side.
This is pretty much what I would have implemented, aside from the playback/capture thing I mentioned the only other thing that I'd suggest is also adding a callback for the platform driver. Many CPUs actually introduce the latency in the DMA stream rather than in the interface, for example by filling up a SRAM buffer as DaVinci currently does, and it's more natural for them to keep track of the latency introduced in the code that's managing the DMA. For devices where the buffer is on the interface the DAI makes sense.
2010/3/2 Peter Ujfalusi peter.ujfalusi@nokia.com
Hello,
There has been discussion in alsa-devel in this subject several times, but no actual patches has been sent.
Based on the available information the latency caused by the HW buffer on some systems can be handled by updating the runtime->delay.
It has been discussed, that the runtime->delay can also be updated dynamically to show more accurate delay.
To further complicate things, in ASoC we could have more buffer in the chain. To handle this we need soc level support.
This RFC series tries to do that in soc by:
- introducing a pcm_pointer wrapper
- in this wrapper we call the original pcm_pointer functions to get the
DMA pointer
- introducing a new interface in dai_ops to ask the delay from the dais
- adding the cpu_dai and codec_dai returned delay to form the actual
delay
- update the runtime->delay with this value.
With this approach none of the existing drivers need change, but they can add support for specifying the FIFO caused delay.
In this series on top of the core changes the omap(3) code is updated to take this delay reporting into use. I have not added the support to the tlv320dac33 codec driver, since it needs a bit more work, but along the same line it can be done, and if the tlv320dac33 is hooked to omap McBSP than applications can know the whole delay/latency on that path.
Since this is for RFC, I have based it on sound-2.6 topc/asoc branch, which does not have the McBSP regcache nor the sidetone patches, but I would like to get feedback on this method first than create a proper series.
I have left out linux-omap from this RFC round, since the problem is ALSA related.
Thank you, Péter Ujfalusi
Peter Ujfalusi (4): ASoC: core: soc level wrapper for pcm_pointer callback ASoC: core: Add delay operation to snd_soc_dai_ops OMAP3: McBSP: Add interface for transmit FIFO state query ASoC: OMAP3: Report delay on playback caused by the internal FIFO
arch/arm/plat-omap/include/plat/mcbsp.h | 4 +++ arch/arm/plat-omap/mcbsp.c | 27 ++++++++++++++++++++++++ include/sound/soc-dai.h | 6 +++++ sound/soc/omap/omap-mcbsp.c | 26 +++++++++++++++++++++++ sound/soc/soc-core.c | 34 ++++++++++++++++++++++++++++++- 5 files changed, 96 insertions(+), 1 deletions(-)
Do soc support snd_pcm_rewind() and snd_pcm_forward ?
On Wed, Mar 03, 2010 at 07:29:23AM +0800, Raymond Yau wrote:
Do soc support snd_pcm_rewind() and snd_pcm_forward ?
No driver demand yet, though it'd be trivially implemented if anyone wanted to do it.
participants (7)
-
Eero Nurkkala
-
ext-Eero.Nurkkala@nokia.com
-
Jarkko Nikula
-
Liam Girdwood
-
Mark Brown
-
Peter Ujfalusi
-
Raymond Yau