[alsa-devel] What does snd_pcm_delay() actually return?
Takashi, Jaroslav,
could you please explain what exactly snd_pcm_delay() returns?
Some applications (such as WINE) assume it is the time that would pass until we reach an underrun if we stopped writing any further data to the PCM device.
Other applications (such as most media players) use it for time synchronisation, i.e. they assume it is the time that a sample written to the PCM device now takes to be played.
Why does this make a difference? For all drivers/ioplug backends where audio is not directly written to the hardware buffer of the audio device, the underrun might happen much earlier than the last sample written is heard. This seems to be the case for USB audio, for example. If I fill up the hw buffer completely, I will be notified about a buffer underrun much earlier than the buffer size would suggest.
Another example is the PulseAudio backend: for each client stream we maintain a private playback buffer. After that buffer there is the hardware buffer. If the client buffer runs empty we'd like to signal an underrun. However, in that situation it still will take some time until the underrun is really hearable, because the audio first has to pass the second buffer in line, the hardware buffer.
Or in other words: in some backends (PCI/DMA) after a sample is read from the ALSA playback buffer and the read index is increased it is immediately heard. In others it will still take substantial time until it is heard. The current API doesn't expose any information about how long that additional time can be, and it is not clear if snd_pcm_delay() includes or does not include this extra time in its result.
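The distinction can be sketched with a toy model (purely illustrative, not ALSA internals -- the field names merely mirror the ALSA terms): the classic value is the ring-buffer fill appl_ptr - hw_ptr, while the write-to-hear delay adds whatever latency sits after the buffer:

```c
#include <assert.h> /* used by the checks below */

/* Illustrative model only -- not the actual ALSA implementation. */
typedef struct {
    unsigned long appl_ptr; /* frames the application has written so far */
    unsigned long hw_ptr;   /* frames the backend has consumed so far    */
    unsigned long extra;    /* frames of post-buffer latency (USB FIFO,
                               network, a second buffer in line, ...)    */
} pcm_model;

/* frames still queued in the playback buffer -- determines when an
 * underrun would occur if the application stopped writing (WINE's view) */
unsigned long buffer_fill(const pcm_model *m) {
    return m->appl_ptr - m->hw_ptr;
}

/* frames until a sample written right now is heard (media-player view) */
unsigned long write_to_hear(const pcm_model *m) {
    return buffer_fill(m) + m->extra;
}
```

For a plain PCI/DMA device, extra is (close to) zero and the two values coincide; for USB or a networked backend they diverge, which is exactly the ambiguity discussed here.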
The ALSA documentation claims that snd_pcm_delay() returns the "distance between current application frame position and sound frame position". That would suggest that WINE's interpretation is correct. (i.e. ignore the extra time)
However, the USB audio driver actually seems to return the total time delay as it is useful for synchronization, so follows what media players expect. (i.e. take the extra time into account)
I personally believe that the USB audio driver does it right because the most important use for snd_pcm_delay() is to achieve synchronization between audio and video. However, the other value is very important too. Not just for WINE, but also for my own work, PulseAudio.
Most of the time the small difference between those two values doesn't really matter, since the difference is very small and for classic PCI/DMA devices zero. However, with USB I am now encountering this problem, because I cannot reliably estimate from the data ALSA supplies me with when the next underrun would happen. Also, the WINE people are pretty vocal that I am not right with my interpretation of the situation.
Takashi, Jaroslav, could you please elaborate on this, and explain which interpretation is the correct one?
Also, regardless of which one is the correct one, could we please add a second API function that allows clients to query the other value?
I have already raised this issue in different words twice before [1]. I never got a comment from you on this. So, please please with cream on top, respond!
Thank you very much,
Lennart
[1] http://mailman.alsa-project.org/pipermail/alsa-devel/2008-April/007354.html
Just to add my 2c: wine 1.0-rc4 kind of "works" with pulseaudio 0.9.10, however in a funny kind of way. First of all, volume seems quite a bit lower than it should be and volume control doesn't seem to work (or at least it's barely noticeable). Then there's a silly effect: 'wine winmm_test wave' sometimes completes with 0-2 failures, sometimes hangs (in a random spot). I'm using the standard pulseaudio asound.conf (from the webpage). The strange thing is that wine opens hw:0 for default, while I think that asound.conf should have changed that (though I may simply fail to understand that file).
Lennart Poettering wrote:
Takashi, Jaroslav,
could you please explain what exactly snd_pcm_delay() returns?
Some applications (such as WINE) assume it is the time that would pass until we reach an underrun if we stopped writing any further data to the PCM device.
<WRONG>
Other applications (such as most media players) use it for time synchronisation, i.e. they assume it is the time that a sample written to the PCM device now takes to be played.
<RIGHT>
At Mon, 9 Jun 2008 21:02:25 +0200, Lennart Poettering wrote:
Takashi, Jaroslav,
could you please explain what exactly snd_pcm_delay() returns?
Some applications (such as WINE) assume it is the time that would pass until we reach an underrun if we stopped writing any further data to the PCM device.
Other applications (such as most media players) use it for time synchronisation, i.e. they assume it is the time that a sample written to the PCM device now takes to be played.
As James already pointed out, the correct answer is the latter. At the driver implementation level, snd_pcm_delay() simply returns the difference between appl_ptr and hw_ptr, i.e. how many samples in the buffer are ahead of the point currently being played.
However, if you stop feeding samples now, snd_pcm_delay() also gives the minimum time until an XRUN occurs. So the first understanding isn't 100% wrong.
Why does this make a difference? For all drivers/ioplug backends where audio is not directly written to the hardware buffer of the audio device, the underrun might happen much earlier than the last sample written is heard. This seems to be the case for USB audio, for example. If I fill up the hw buffer completely, I will be notified about a buffer underrun much earlier than the buffer size would suggest.
Another example is the PulseAudio backend: for each client stream we maintain a private playback buffer. After that buffer there is the hardware buffer. If the client buffer runs empty we'd like to signal an underrun. However, in that situation it still will take some time until the underrun is really hearable, because the audio first has to pass the second buffer in line, the hardware buffer.
Or in other words: in some backends (PCI/DMA) after a sample is read from the ALSA playback buffer and the read index is increased it is immediately heard. In others it will still take substantial time until it is heard. The current API doesn't expose any information about how long that additional time can be, and it is not clear if snd_pcm_delay() includes or does not include this extra time in its result.
The ALSA documentation claims that snd_pcm_delay() returns the "distance between current application frame position and sound frame position". That would suggest that WINE's interpretation is correct. (i.e. ignore the extra time)
However, the USB audio driver actually seems to return the total time delay as it is useful for synchronization, so follows what media players expect. (i.e. take the extra time into account)
The implementation of snd_pcm_delay() (at the driver level, at least) depends purely on the accuracy of each driver's PCM pointer callback. So, if the driver returns a more accurate hw_ptr via the pointer callback, you'll get a more accurate value from snd_pcm_delay(). In the worst case, it may be up to one period size larger than the real delay.
Takashi
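The worst case mentioned above can be illustrated with a hypothetical driver whose pointer callback only advances hw_ptr at period interrupts, so it reports the position rounded down to a period boundary (a sketch, not any real driver's code):

```c
#include <assert.h> /* used by the checks below */

/* Hypothetical sketch: hw_ptr as reported by a driver that only updates
 * it on period interrupts -- rounded down to a period boundary. */
unsigned long period_quantized(unsigned long true_hw_ptr, unsigned long period) {
    return (true_hw_ptr / period) * period;
}

/* The delay derived from the quantized hw_ptr can exceed the real delay
 * (appl_ptr - true_hw_ptr) by up to one period. */
long reported_delay(unsigned long appl_ptr, unsigned long true_hw_ptr,
                    unsigned long period) {
    return (long)(appl_ptr - period_quantized(true_hw_ptr, period));
}
```

With appl_ptr = 1000, a true hw_ptr of 700 and a 256-frame period, the reported delay is 488 frames against a real delay of 300 -- an overshoot of 188 frames, within the one-period bound.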
I personally believe that the USB audio driver does it right because the most important use for snd_pcm_delay() is to achieve synchronization between audio and video. However, the other value is very important too. Not just for WINE, but also for my own work, PulseAudio.
Most of the time the small difference between those two values doesn't really matter, since the difference is very small and for classic PCI/DMA devices zero. However, with USB I am now encountering this problem, because I cannot reliably estimate from the data ALSA supplies me with when the next underrun would happen. Also, the WINE people are pretty vocal that I am not right with my interpretation of the situation.
Takashi, Jaroslav, could you please elaborate on this, and explain which interpretation is the correct one?
Also, regardless of which one is the correct one, could we please add a second API function that allows clients to query the other value?
I have already raised this issue in different words twice before [1]. I never got a comment from you on this. So, please please with cream on top, respond!
Thank you very much,
Lennart
[1] http://mailman.alsa-project.org/pipermail/alsa-devel/2008-April/007354.html
-- Lennart Poettering Red Hat, Inc. lennart [at] poettering [dot] net ICQ# 11060553 http://0pointer.net/lennart/ GnuPG 0x1A015CC4
Takashi Iwai wrote:
At Mon, 9 Jun 2008 21:02:25 +0200, Lennart Poettering wrote:
Takashi, Jaroslav,
could you please explain what exactly snd_pcm_delay() returns?
Some applications (such as WINE) assume it is the time that would pass until we reach an underrun if we stopped writing any further data to the PCM device.
Other applications (such as most media players) use it for time synchronisation, i.e. they assume it is the time that a sample written to the PCM device now takes to be played.
As James already pointed out, the correct answer is the latter. At the driver implementation level, snd_pcm_delay() simply returns the difference between appl_ptr and hw_ptr, i.e. how many samples in the buffer are ahead of the point currently being played.
However, if you stop feeding samples now, snd_pcm_delay() also gives the minimum time until an XRUN occurs. So the first understanding isn't 100% wrong.
snip
The implementation of snd_pcm_delay() (at the driver level, at least) depends purely on the accuracy of each driver's PCM pointer callback. So, if the driver returns a more accurate hw_ptr via the pointer callback, you'll get a more accurate value from snd_pcm_delay(). In the worst case, it may be up to one period size larger than the real delay.
I could be wrong here as I'm only going on discussions I've had with wine folks rather than poking at the code myself (I did look a while back but I've forgotten it all now!).
AFAIK, the way Wine uses snd_pcm_delay() is to check when a sample is fully played, e.g. they wait for the function to return 0. I think this was done because the docs specifically say that it is the "difference between appl_ptr and hw_ptr", so it makes sense to assume this will return 0 when there is nothing waiting to be played. I would strongly recommend that you remove the implementation detail from the (supposedly high level) docs.
Given this clarification can this bug please be closed as invalid? https://bugtrack.alsa-project.org/alsa-bug/view.php?id=3943
In the meantime, can you suggest how the wine code can check whether there is any data waiting to be played (e.g. things are idle), so that they can refactor their use of snd_pcm_delay()?
Col
Disclaimer: Any wine person, please feel free to correct me if I'd misunderstood things!
Colin Guthrie wrote:
AFAIK, the way Wine uses snd_pcm_delay() is to check when a sample is fully played, e.g. they wait for the function to return 0. I think this was done because the docs specifically say that it is the "difference between appl_ptr and hw_ptr", so it makes sense to assume this will return 0 when there is nothing waiting to be played. I would strongly recommend that you remove the implementation detail from the (supposedly high level) docs.
Wine needs to know when a particular sample has been played by the hardware. And it should be reasonably accurate as some applications may depend on that timing. Currently Wine accounts for how many bytes it has written into the device, and then uses 'bytes_written - frames_to_bytes(snd_pcm_delay())' to find out which samples have been played. AFAICS alsa has no dedicated function to find out which samples have been played, so Wine is forced to emulate it using snd_pcm_delay(). If there is any function at all that would be more appropriate in that situation, please tell the Wine developers.
tom
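The bookkeeping described above can be sketched as follows (illustrative only; frames_written and delay stand in for Wine's actual byte accounting and the snd_pcm_delay() result):

```c
#include <assert.h> /* used by the checks below */

/* Sketch of the emulation described above: estimate the absolute play
 * position as everything written minus what is still pending according
 * to snd_pcm_delay(). Clamped so a large startup delay can't yield a
 * negative position. */
unsigned long estimated_played(unsigned long frames_written, long delay) {
    long pos = (long)frames_written - delay;
    return pos > 0 ? (unsigned long)pos : 0;
}
```

Note that the accuracy of this estimate hinges entirely on whether the delay value includes post-buffer latency -- which is exactly the ambiguity this thread is about.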
At Wed, 11 Jun 2008 21:24:20 +0100, Colin Guthrie wrote:
Takashi Iwai wrote:
At Mon, 9 Jun 2008 21:02:25 +0200, Lennart Poettering wrote:
Takashi, Jaroslav,
could you please explain what exactly snd_pcm_delay() returns?
Some applications (such as WINE) assume it is the time that would pass until we reach an underrun if we stopped writing any further data to the PCM device.
Other applications (such as most media players) use it for time synchronisation, i.e. they assume it is the time that a sample written to the PCM device now takes to be played.
As James already pointed out, the correct answer is the latter. At the driver implementation level, snd_pcm_delay() simply returns the difference between appl_ptr and hw_ptr, i.e. how many samples in the buffer are ahead of the point currently being played.
However, if you stop feeding samples now, snd_pcm_delay() also gives the minimum time until an XRUN occurs. So the first understanding isn't 100% wrong.
snip
The implementation of snd_pcm_delay() (at the driver level, at least) depends purely on the accuracy of each driver's PCM pointer callback. So, if the driver returns a more accurate hw_ptr via the pointer callback, you'll get a more accurate value from snd_pcm_delay(). In the worst case, it may be up to one period size larger than the real delay.
I could be wrong here as I'm only going on discussions I've had with wine folks rather than poking at the code myself (I did look a while back but I've forgotten it all now!).
AFAIK, the way Wine uses snd_pcm_delay() is to check when a sample is fully played. e.g. they wait for the function to return 0.
It returns 0 or a negative value, or returns the error EPIPE which means buffer underrun, depending on the setup. XRUN detection can be disabled.
Anyway, using snd_pcm_delay() to determine the hwptr is correct. But, of course, this doesn't guarantee the accuracy of hwptr because its accuracy depends on the driver.
I think this was done because the docs specifically say that it is the "difference between appl_ptr and hw_ptr", so it makes sense to assume this will return 0 when there is nothing waiting to be played. I would strongly recommend that you remove the implementation detail from the (supposedly high level) docs.
Why? I don't see your logic...
Takashi
Takashi Iwai wrote:
I think this was done because the docs specifically say that it is the "difference between appl_ptr and hw_ptr", so it makes sense to assume this will return 0 when there is nothing waiting to be played. I would strongly recommend that you remove the implementation detail from the (supposedly high level) docs.
Why? I don't see your logic...
Well in the pulseaudio driver, snd_pcm_delay() will typically *not* return 0 because it will include e.g. network latency.
Therefore the wine code which *expects* it to return 0 (e.g. waits for it to return 0) does not get the relevant trigger it needs.
The fact that the implementation detail is included in the docs means that the belief that this function *will* return 0 in all cases when no samples are left to be played is totally understandable. As you and James have confirmed, the snd_pcm_delay() function should return details about the delay expected. The fact that this is the "difference between appl_ptr and hw_ptr" does not seem to be correct in all cases/implementations.
I'm no alsa expert so forgive me if I'm just "not getting it". :p
Col
At Thu, 12 Jun 2008 12:51:01 +0100, Colin Guthrie wrote:
Takashi Iwai wrote:
I think this was done because the docs specifically say that it is the "difference between appl_ptr and hw_ptr", so it makes sense to assume this will return 0 when there is nothing waiting to be played. I would strongly recommend that you remove the implementation detail from the (supposedly high level) docs.
Why? I don't see your logic...
Well in the pulseaudio driver, snd_pcm_delay() will typically *not* return 0 because it will include e.g. network latency.
Therefore the wine code which *expects* it to return 0 (e.g. waits for it to return 0) does not get the relevant trigger it needs.
The fact that the implementation detail is included in the docs means that the belief that this function *will* return 0 in all cases when no samples are left to be played is totally understandable. As you and James have confirmed, the snd_pcm_delay() function should return details about the delay expected. The fact that this is the "difference between appl_ptr and hw_ptr" does not seem to be correct in all cases/implementations.
I'm no alsa expert so forgive me if I'm just "not getting it". :p
AFAIK, the problem here is that the handling of hwptr isn't consistent in the pulse plugin. The definition of hwptr is the point being played (or at least, the point where it was already processed). So, it's fine that you take the network latency into account for the calculation of hwptr, like the pulse delay callback actually does.
But, then, the pointer callback must also contain the same latency. If the hwptr with network latency doesn't work well, then the delay callback shouldn't have the latency either.
Takashi
On Thu, 12.06.08 14:08, Takashi Iwai (tiwai@suse.de) wrote:
AFAIK, the problem here is that the handling of hwptr isn't consistent in the pulse plugin. The definition of hwptr is the point being played (or at least, the point where it was already processed). So, it's fine that you take the network latency into account for the calculation of hwptr, like the pulse delay callback actually does.
But, then, the pointer callback must also contain the same latency. If the hwptr with network latency doesn't work well, then the delay callback shouldn't have the latency either.
But we need the network latency in there, because it is necessary for doing a/v synchronization. The network latency can be quite substantial.
The "hw_ptr/appl_ptr" is just too simple to cover the networked cases.
Lennart
At Thu, 12 Jun 2008 23:00:06 +0200, Lennart Poettering wrote:
On Thu, 12.06.08 14:08, Takashi Iwai (tiwai@suse.de) wrote:
AFAIK, the problem here is that the handling of hwptr isn't consistent in the pulse plugin. The definition of hwptr is the point being played (or at least, the point where it was already processed). So, it's fine that you take the network latency into account for the calculation of hwptr, like the pulse delay callback actually does.
But, then, the pointer callback must also contain the same latency. If the hwptr with network latency doesn't work well, then the delay callback shouldn't have the latency either.
But we need the network latency in there, because it is necessary for doing a/v synchronization. The network latency can be quite substantial.
That's why I suggest to fix hwptr to consider the network latency in the first place.
The "hw_ptr/appl_ptr" is just too simple to cover the networked cases.
Likely. The hwptr/applptr model was designed for PCI-DMA h/w, and doesn't suit the queue-style implementation, indeed. IOW, it's optimized for mmap-style access.
Takashi
On Wed, 11.06.08 18:56, Takashi Iwai (tiwai@suse.de) wrote:
could you please explain what exactly snd_pcm_delay() returns?
Some applications (such as WINE) assume it is the time that would pass until we reach an underrun if we stopped writing any further data to the PCM device.
Other applications (such as most media players) use it for time synchronisation, i.e. they assume it is the time that a sample written to the PCM device now takes to be played.
As James already pointed out, the correct answer is the latter. At the driver implementation level, snd_pcm_delay() simply returns the difference between appl_ptr and hw_ptr, i.e. how many samples in the buffer are ahead of the point currently being played.
However, if you stop feeding samples now, snd_pcm_delay() also gives the minimum time until an XRUN occurs. So the first understanding isn't 100% wrong.
uh? I think we have a misunderstanding here. What you are explaining here would suggest that the first answer is the right one, but you actually claim it is the second one? This doesn't make sense to me.
As far as I understood, the "hw_ptr" is the index where the PCM data is *read* from the playback buffer, while the "appl_ptr" is where the data is *written* to the playback buffer. Right?
In the USB audio case, the playback buffer is in normal memory, right? Every now and then a bit of it is dma'd to the usb controller and sent over the USB wire, then on the receiving side it is buffered again and then passed to the DAC. Correct? The hw_ptr in this case is the pointer into the system memory buffer where the data will be read from next and sent to the usb wire. However, since the data is not dma'd sample-by-sample but a block at a time, the sample that is currently hearable is still quite a bit before this index. Right? And hence snd_pcm_delay(), when always defined as "appl_ptr - hw_ptr", would not really be suitable for synchronization, because the hw_ptr is always a bit ahead of what is actually being played -- and WINE's interpretation would be right and the media players' (and mine) wrong. In contrast to what James and you just said.
Or again, in other words: is the delay that is caused by the fact that after the data is read from the DMA buffer it still has quite a bit time to travel to the speakers included in snd_pcm_delay(), or is it not?
In the USB case this extra time might be small. However, using PA as a backend for libasound is surprisingly similar to the USB case: we maintain a buffer in memory, and PA reads from that, mixes it, does some other things with it. The time the data still has to travel after it was read from the playback buffer is much longer than in the USB case. The WINE people and I now disagree on whether the extra delay should be included in snd_pcm_delay() or not. I say yes, absolutely: snd_pcm_delay() is for synchronization, not for calculating when the next xrun is going to happen. The WINE people say no, it's the other way round. You now claim the first but explain it as if the latter were true.
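The two-stage situation just described can be modelled like this (a toy sketch, not PulseAudio code): the client-side underrun fires when the first buffer empties, while sound keeps playing until both stages have drained:

```c
#include <assert.h> /* used by the checks below */

/* Toy model of a chained pipeline: client buffer -> server/hw buffer. */
typedef struct {
    unsigned long client_fill; /* frames left in the client-side buffer */
    unsigned long server_fill; /* frames left further down the chain    */
} chained_pipeline;

/* the client-visible underrun happens as soon as stage one runs dry */
unsigned long frames_until_underrun(const chained_pipeline *p) {
    return p->client_fill;
}

/* silence is actually heard only after every stage has drained */
unsigned long frames_until_silence(const chained_pipeline *p) {
    return p->client_fill + p->server_fill;
}
```

The gap between the two return values is exactly the extra delay the thread is arguing about: one number is what you need for underrun prediction, the other for a/v sync.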
However, the USB audio driver actually seems to return the total time delay as it is useful for synchronization, so follows what media players expect. (i.e. take the extra time into account)
The implementation of snd_pcm_delay() (at the driver level, at least) depends purely on the accuracy of each driver's PCM pointer callback. So, if the driver returns a more accurate hw_ptr via the pointer callback, you'll get a more accurate value from snd_pcm_delay(). In the worst case, it may be up to one period size larger than the real delay.
Yes, but the question is whether hw_ptr is actually that useful for synchronization at all, if there is still some latency *after* the data was read from the playback buffer.
Is it clear now, what I actually try to explain here? This sure is tricky stuff, I am not sure how I should explain better what I mean. Please ask for clarifications if I am not clear enough with what I try to point out!
You didn't respond to my suggestion to add a second function, so that we'd have two: one that includes the extra delay after hw_ptr, and the other that does not. The former would then be useful for audio/video sync, the latter for estimating when an underrun will happen. Would it be possible to add this?
Lennart
At Thu, 12 Jun 2008 22:52:25 +0200, Lennart Poettering wrote:
On Wed, 11.06.08 18:56, Takashi Iwai (tiwai@suse.de) wrote:
could you please explain what exactly snd_pcm_delay() returns?
Some applications (such as WINE) assume it is the time that would pass until we reach an underrun if we stopped writing any further data to the PCM device.
Other applications (such as most media players) use it for time synchronisation, i.e. they assume it is the time that a sample written to the PCM device now takes to be played.
As James already pointed out, the correct answer is the latter. At the driver implementation level, snd_pcm_delay() simply returns the difference between appl_ptr and hw_ptr, i.e. how many samples in the buffer are ahead of the point currently being played.
However, if you stop feeding samples now, snd_pcm_delay() also gives the minimum time until an XRUN occurs. So the first understanding isn't 100% wrong.
uh? I think we have a misunderstanding here. What you are explaining here would suggest that the first answer is the right one, but you actually claim it is the second one? This doesn't make sense to me.
As far as I understood, the "hw_ptr" is the index where the PCM data is *read* from the playback buffer, while the "appl_ptr" is where the data is *written* to the playback buffer. Right?
No, the hwptr is the point where the data is being played... well, ideally. The hwptr can be behind the actual point, but it shouldn't be ahead of the actual point.
In the USB audio case, the playback buffer is in normal memory, right? Every now and then a bit of it is dma'd to the usb controller and sent over the USB wire, then on the receiving side it is buffered again and then passed to the DAC. Correct? The hw_ptr in this case is the pointer into the system memory buffer where the data will be read from next and sent to the usb wire.
It shouldn't be so in a restricted manner.
However, since the data is not dma'd sample-by-sample but a block at a time, the sample that is currently hearable is still quite a bit before this index. Right? And hence snd_pcm_delay(), when always defined as "appl_ptr - hw_ptr", would not really be suitable for synchronization, because the hw_ptr is always a bit ahead of what is actually being played -- and WINE's interpretation would be right and the media players' (and mine) wrong. In contrast to what James and you just said.
It's a problem of usb-driver.
Or again, in other words: is the delay that is caused by the fact that after the data is read from the DMA buffer it still has quite a bit time to travel to the speakers included in snd_pcm_delay(), or is it not?
No. It's appl_ptr - hw_ptr by definition.
In the ALSA framework, it's so simplified. And yes, this is suboptimal in some implementation cases.
Takashi
On Fri, 13.06.08 08:13, Takashi Iwai (tiwai@suse.de) wrote:
uh? I think we have a misunderstanding here. What you are explaining here would suggest that the first answer is the right one, but you actually claim it is the second one? This doesn't make sense to me.
As far as I understood, the "hw_ptr" is the index where the PCM data is *read* from the playback buffer, while the "appl_ptr" is where the data is *written* to the playback buffer. Right?
No, the hwptr is the point where the data is being played... well, ideally. The hwptr can be behind the actual point, but it shouldn't be ahead of the actual point.
Ah, that's good to know. Thanks for the clarification!
However, since the data is not dma'd sample-by-sample but a block at a time, the sample that is currently hearable is still quite a bit before this index. Right? And hence snd_pcm_delay(), when always defined as "appl_ptr - hw_ptr", would not really be suitable for synchronization, because the hw_ptr is always a bit ahead of what is actually being played -- and WINE's interpretation would be right and the media players' (and mine) wrong. In contrast to what James and you just said.
It's a problem of usb-driver.
OK, so to summarize the situation:
a) snd_pcm_delay() should actually return the write-to-hear latency. It is supposed to be called *before* the next write, and it will tell you when that next write will reach the speakers, in the sound card device time domain. For networked outputs it will most likely never return 0.
b) for the usb driver this is however broken and it returns the fill level of the playback buffer
c) snd_pcm_hw_params_get_buffer_size() - snd_pcm_update_avail() returns the fill level, but only works on mmap.
d) The snd_pcm_hw_params_get_fifo_size() was supposed to return the 'difference' between a) and c), but was never actually used for that.
(Since this matches Lennart's original interpretation of the situation he's now happy... ;-))
Lennart
On Fri, 13 Jun 2008, Lennart Poettering wrote:
c) snd_pcm_hw_params_get_buffer_size() - snd_pcm_update_avail() returns the fill level, but only works on mmap.
snd_pcm_update_avail() works (or should work) also for standard r/w ops.
d) The snd_pcm_hw_params_get_fifo_size() was supposed to return the 'difference' between a) and c), but was never actually used for that.
Nope. It's the maximum latency added by an extra fifo/queue (regardless of whether it's a hw or sw fifo). For a variable fifo size, we need to extend the API.
Jaroslav
----- Jaroslav Kysela perex@perex.cz Linux Kernel Sound Maintainer ALSA Project, Red Hat, Inc.
On Fri, 13.06.08 15:55, Jaroslav Kysela (perex@perex.cz) wrote:
On Fri, 13 Jun 2008, Lennart Poettering wrote:
c) snd_pcm_hw_params_get_buffer_size() - snd_pcm_update_avail() returns the fill level, but only works on mmap.
snd_pcm_update_avail() works (or should work) also for standard r/w ops.
Ok, so it's just the documentation that needs fixing? Quoting: "Using of this function is useless for the standard read/write operations. Use it only for mmap access."
d) The snd_pcm_hw_params_get_fifo_size() was supposed to return the 'difference' between a) and c), but was never actually used for that.
Nope. It's the maximum latency added by an extra fifo/queue (regardless of whether it's a hw or sw fifo). For a variable fifo size, we need to extend the API.
Ok, so it's an upper ceiling for the difference between a) and c), right?
Lennart
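Putting items b), c) and d) of the summary together, a client could at least bracket the total delay (a sketch, assuming fifo_size really is an upper bound on the extra post-buffer latency, as Jaroslav describes):

```c
#include <assert.h> /* used by the checks below */

/* fill level of the playback buffer, per item c) of the summary:
 * buffer size minus the frames currently available for writing */
unsigned long fill_level(unsigned long buffer_size, unsigned long avail) {
    return buffer_size - avail;
}

/* upper bound on the write-to-hear delay, treating fifo_size as the
 * ceiling on the extra latency after the buffer, per item d) */
unsigned long delay_upper_bound(unsigned long buffer_size, unsigned long avail,
                                unsigned long fifo_size) {
    return fill_level(buffer_size, avail) + fifo_size;
}
```

This gives a bracket rather than an exact value: the fill level bounds the time until underrun from below, and adding fifo_size bounds the total delay from above.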
At Thu, 12 Jun 2008 22:52:25 +0200, Lennart Poettering wrote:
You didn't respond to my suggestion to add a second function,
Hm, which function? I seem to have missed that.
so that we'd have two: one that includes the extra delay after hw_ptr, and the other that does not. The former would then be useful for audio/video sync, the latter for estimating when an underrun will happen. Would it be possible to add this?
Indeed, there are two problems in the background:
- the period irq model
- no latency assumption
The period irq model is what you called "traditional playback model". No latency assumption in hwptr comes from the PCI-style hardware.
About the latency, my proposal is like below:
- Renew the definition of hwptr -- make it what you imagined, the pointer where the hardware is reading or has read.
This won't change anything for PCI drivers, so the impact is minimal.
- Add some new API functions:
* To give the accuracy of the position inquiry (optional)
This requires some new kernel <-> user-space stuff.
* To query the known latency
Ditto, or we may reuse snd_pcm_hw_params_fifo_size()?
* To query the position being played (optionally)
This can be simply hwptr + latency.
- Don't change snd_pcm_delay(). Keep as is now.
About the period model -- it's a somewhat tougher problem. I once already made a patch to allow user apps to use a system timer for more flexible period settings. But it was for a very old version of ALSA. Better to write it from scratch again...
Takashi
On Fri, 13 Jun 2008, Takashi Iwai wrote:
- Add some new API functions,
I would prefer to extend the current API rather than change the meaning of hw_ptr to handle extra latencies.
To give the accuracy of the position inquiry (optional)
This requires some new kernel <-> user-space stuff.
To query the known latency
Ditto, or we may reuse snd_pcm_hw_params_fifo_size()?
Yes, fifo_size was designed to announce possible extra latency to applications.
To query the position being played (optionally)
This can be simply hwptr + latency.
- Don't change snd_pcm_delay(). Keep as is now.
About the period model -- it's a somewhat tougher problem. I once already made a patch to allow user apps to use a system timer for more flexible period settings. But it was for a very old version of ALSA. Better to write it from scratch again...
I think that the current PCM API concept is tightly period-based. You cannot change it easily. It would probably be better to move to a "byte-stream" model in the next revision of the PCM API.
Jaroslav
----- Jaroslav Kysela perex@perex.cz Linux Kernel Sound Maintainer ALSA Project, Red Hat, Inc.
Jaroslav Kysela wrote:
On Fri, 13 Jun 2008, Takashi Iwai wrote:
- Add some new API functions,
I would prefer to extend the current API rather than change the meaning of hw_ptr to handle extra latencies.
I would prefer the definition of snd_pcm_delay() to be: before the next sample is written to the buffer, snd_pcm_delay() returns the expected time delay, in frames, until that next sample will reach the speakers. This is what snd_pcm_delay() was introduced into the ALSA API for. I remember, because it was added as a result of me requesting it.
Then implement a totally different API call to help with underrun timings.
James
James Courtier-Dutton wrote:
Jaroslav Kysela wrote:
On Fri, 13 Jun 2008, Takashi Iwai wrote:
- Add some new API functions,
I would prefer to extend the current API rather than change the meaning of hw_ptr to handle extra latencies.
I would prefer the definition of snd_pcm_delay() to be: before the next sample is written to the buffer, snd_pcm_delay() returns the expected time delay, in frames, until that next sample will reach the speakers. This is what snd_pcm_delay() was introduced into the ALSA API for. I remember, because it was added as a result of me requesting it.
Then implement a totally different API call to help with underrun timings.
Big +1
You also have to think of the use cases currently out there.
* World + Dog uses it as James (and Lennart) state.
* Wine uses it sans latency.
So lots of apps having to change to cope versus one (known) app changing. I think even the Wine folk would agree this is more sensible!
Also, by changing the definition, would you not have to change how the usb-audio driver works? In general I'd say it's a whole lot less work to leave snd_pcm_delay() as it is, update the docs, and add a new API function for the underrun stuff.
Col
On Fri, 13 Jun 2008, Colin Guthrie wrote:
James Courtier-Dutton wrote:
Jaroslav Kysela wrote:
On Fri, 13 Jun 2008, Takashi Iwai wrote:
- Add some new API functions,
I would prefer to extend the current API rather than change the meaning of hw_ptr to handle extra latencies.
I would prefer the definition of snd_pcm_delay() to be: before the next sample is written to the buffer, snd_pcm_delay() returns the expected time delay, in frames, until that next sample will reach the speakers. This is what snd_pcm_delay() was introduced into the ALSA API for. I remember, because it was added as a result of me requesting it.
Then implement a totally different API call to help with underrun timings.
Big +1
You also have to think of the use cases currently out there.
- World + Dog uses it as James (and Lennart) state.
- Wine uses it sans latency.
The comment for snd_pcm_delay() in alsa-lib is clear, and it's what James wrote. If only one app interprets this wrongly, then I agree it would be better not to change the original meaning.
Then we have snd_pcm_avail_update(). Although its comment states that this function is useless for non-mmap mode, it quite clearly returns the number of available frames to be handled (r/w) by the application.
The easiest method would be just to remove "useless for non-mmap" from comments for snd_pcm_avail_update() and suggest to application developers:
1) overall latency is returned by snd_pcm_delay()
2) ring buffer filling is controlled by snd_pcm_avail_update(); for non-mmap access, usage of this function is optional
As mentioned, this will require fixing some drivers that mix a software output FIFO with the ring buffer (USB, PCMCIA)...
Opinions?
Jaroslav
----- Jaroslav Kysela perex@perex.cz Linux Kernel Sound Maintainer ALSA Project, Red Hat, Inc.
On Fri, 13.06.08 15:06, Jaroslav Kysela (perex@perex.cz) wrote:
The comment for snd_pcm_delay() in alsa-lib is clear, and it's what James wrote. If only one app interprets this wrongly, then I agree it would be better not to change the original meaning.
Then we have snd_pcm_avail_update(). Although its comment states that this function is useless for non-mmap mode, it quite clearly returns the number of available frames to be handled (r/w) by the application.
The easiest method would be just to remove "useless for non-mmap" from comments for snd_pcm_avail_update() and suggest to application developers:
- overall latency is returned by snd_pcm_delay()
- ring buffer filling is controlled by snd_pcm_avail_update(); for non-mmap access, usage of this function is optional
As mentioned, this will require fixing some drivers that mix a software output FIFO with the ring buffer (USB, PCMCIA)...
Opinions?
Sounds good to me, on first sight.
Hmm, however, there is one thing I'd still need for PulseAudio:
I'd like to know when (in time units) the playback buffer would underrun from now on if I didn't write anything anymore. For the USB driver at least this happens much earlier than calculating (buffer_size - snd_pcm_avail_update()) and transforming that into time units would suggest, because the USB driver seems to remove a block at a time from the playback buffer, and hence it will signal the XRUN much earlier than the aforementioned value. To fix this I'd need to know what this granularity is. If I knew that, I could fix my sleep time accordingly.
In short: I need some kind of granularity information about snd_pcm_avail_update() but I must admit that right now I am not actually sure which parameter would be the best one to know about.
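To make the granularity point concrete, here is a sketch of the arithmetic (illustrative only; 'block' stands for the unknown driver granularity Lennart asks about): if the driver consumes whole blocks at a time, the XRUN fires once fewer than a block's worth of frames remain, so only the whole blocks currently buffered count toward the safe sleep time:

```c
#include <assert.h>

/* Illustrative arithmetic, not ALSA API.  If the driver removes
 * 'block' frames at a time from the buffer, it flags an XRUN as
 * soon as fewer than 'block' frames are left -- i.e. after
 * consuming only the whole blocks currently buffered, not the
 * full fill level. */
static unsigned long frames_until_xrun(unsigned long fill,
                                       unsigned long block)
{
    return (fill / block) * block;   /* whole blocks only */
}

/* Convert that to a safe sleep time in microseconds. */
static unsigned long usec_until_xrun(unsigned long fill,
                                     unsigned long block,
                                     unsigned long rate)
{
    return frames_until_xrun(fill, block) * 1000000UL / rate;
}
```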
Lennart
On Fri, 13 Jun 2008, Lennart Poettering wrote:
On Fri, 13.06.08 15:06, Jaroslav Kysela (perex@perex.cz) wrote:
The comment for snd_pcm_delay() in alsa-lib is clear, and it's what James wrote. If only one app interprets this wrongly, then I agree it would be better not to change the original meaning.
Then we have snd_pcm_avail_update(). Although its comment states that this function is useless for non-mmap mode, it quite clearly returns the number of available frames to be handled (r/w) by the application.
The easiest method would be just to remove "useless for non-mmap" from comments for snd_pcm_avail_update() and suggest to application developers:
- overall latency is returned by snd_pcm_delay()
- ring buffer filling is controlled by snd_pcm_avail_update(); for non-mmap access, usage of this function is optional
As mentioned, this will require fixing some drivers that mix a software output FIFO with the ring buffer (USB, PCMCIA)...
Opinions?
Sounds good to me, on first sight.
Hmm, however, there is one thing I'd still need for PulseAudio:
I'd like to know when (in time units) the playback buffer would underrun from now on if I didn't write anything anymore. For the USB driver at least this happens much earlier than calculating (buffer_size - snd_pcm_avail_update()) and transforming that into time units would suggest, because the USB driver seems to remove a block at a time from the playback buffer, and hence it will signal the XRUN much earlier than the aforementioned value. To fix this I'd need to know what this granularity is. If I knew that, I could fix my sleep time accordingly.
In short: I need some kind of granularity information about snd_pcm_avail_update() but I must admit that right now I am not actually sure which parameter would be the best one to know about.
I think that the USB drivers should be fixed using a software FIFO. It means that the ring buffer will be fully maintainable (and underruns will then occur correctly).
It means adding extra URBs into which data will be copied. They will work as an "extra" FIFO counted in snd_pcm_delay() but not counted in snd_pcm_avail_update(). If I read the USB code correctly, it's just a matter of changing the hw_ptr management. Data are copied to URBs anyway.
Jaroslav
----- Jaroslav Kysela perex@perex.cz Linux Kernel Sound Maintainer ALSA Project, Red Hat, Inc.
At Fri, 13 Jun 2008 17:02:47 +0200 (CEST), Jaroslav Kysela wrote:
On Fri, 13 Jun 2008, Lennart Poettering wrote:
On Fri, 13.06.08 15:06, Jaroslav Kysela (perex@perex.cz) wrote:
The comment for snd_pcm_delay() in alsa-lib is clear, and it's what James wrote. If only one app interprets this wrongly, then I agree it would be better not to change the original meaning.
Then we have snd_pcm_avail_update(). Although its comment states that this function is useless for non-mmap mode, it quite clearly returns the number of available frames to be handled (r/w) by the application.
The easiest method would be just to remove "useless for non-mmap" from comments for snd_pcm_avail_update() and suggest to application developers:
- overall latency is returned by snd_pcm_delay()
- ring buffer filling is controlled by snd_pcm_avail_update(); for non-mmap access, usage of this function is optional
As mentioned, this will require fixing some drivers that mix a software output FIFO with the ring buffer (USB, PCMCIA)...
Opinions?
Sounds good to me, on first sight.
Hmm, however, there is one thing I'd still need for PulseAudio:
I'd like to know when (in time units) the playback buffer would underrun from now on if I didn't write anything anymore. For the USB driver at least this happens much earlier than calculating (buffer_size - snd_pcm_avail_update()) and transforming that into time units would suggest, because the USB driver seems to remove a block at a time from the playback buffer, and hence it will signal the XRUN much earlier than the aforementioned value. To fix this I'd need to know what this granularity is. If I knew that, I could fix my sleep time accordingly.
In short: I need some kind of granularity information about snd_pcm_avail_update() but I must admit that right now I am not actually sure which parameter would be the best one to know about.
I think that the USB drivers should be fixed using a software FIFO. It means that the ring buffer will be fully maintainable (and underruns will then occur correctly).
It means adding extra URBs into which data will be copied. They will work as an "extra" FIFO counted in snd_pcm_delay() but not counted in snd_pcm_avail_update(). If I read the USB code correctly, it's just a matter of changing the hw_ptr management. Data are copied to URBs anyway.
Hmm... I don't buy this.
First off, it's not only about USB but about hardware that requires its own h/w queue. There is a bunch of such hardware, and we provide really poor support for it. The above sounds like band-aiding over a band-aid on a band-aid.
Well, now I continue to discuss in another post, following Lennart's reply.
Takashi
Takashi Iwai wrote:
First off, it's not only about USB but about hardware that requires its own h/w queue. There is a bunch of such hardware, and we provide really poor support for it. The above sounds like band-aiding over a band-aid on a band-aid.
Hmm. This thread will take some digesting... meanwhile, here are my (not particularly coherent) thoughts from the POV of a driver developer for cards that have large on-card FIFOs (asihpi).
The period-based refresh makes it hard to use the FIFO effectively. If the card FIFO is allowed to 'suck' all the data from the ring buffer, it looks like an underrun. It also makes time appear to run fast until the FIFO has filled up.
The 'fast time' creates problems for ALSA on playback start, because ALSA assumes that it will take a whole period of time for a period of data to be consumed, while the card is capable of consuming multiple periods almost instantly. In my driver I have to throttle the rate at which data is transferred to the card FIFOs.
Just as in the network case, where it is desirable to get data across the network as early as possible to allow secure playback, we want to fill the oncard fifo as early and as much as possible.
Only if both the ring buffer and the FIFO are empty (playback) or full (record) is a true xrun occurring.
Our underlying API returns the following info:
* size of the host ring buffer
* amount of unread data in the host ring buffer
* amount of unread data in the card FIFO
* frames output at the DAC / input at the ADC since stream start
plus info about the granularity and latency of the above.
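A hypothetical model of this info, with the "true xrun" condition from the previous paragraph (the names are illustrative, not the asihpi API):

```c
#include <assert.h>

/* Hypothetical model of the per-stream info Eliot lists; the
 * field names are illustrative, not the real asihpi API. */
struct stream_info {
    unsigned long ring_size;      /* size of host ring buffer        */
    unsigned long ring_unread;    /* unread data in host ring buffer */
    unsigned long fifo_unread;    /* unread data in card FIFO        */
    unsigned long frames_at_dac;  /* frames output since start       */
};

/* "Only if both ringbuffer and fifo are empty (playback) is a
 * true xrun occurring." */
static int true_playback_xrun(const struct stream_info *s)
{
    return s->ring_unread == 0 && s->fifo_unread == 0;
}
```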
I suspect that a byte-stream approach would be a better match to our cards.
regards
Eliot Blennerhassett
On Fri, 13.06.08 11:14, James Courtier-Dutton (James@superbug.co.uk) wrote:
I would prefer to extend the current API rather than change the meaning of hw_ptr to handle extra latencies.
I would prefer the definition of snd_pcm_delay() to be: before the next sample is written to the buffer, snd_pcm_delay() returns the expected time delay, in frames, until that next sample will reach the speakers.
This is how I always understood the API.
OTOH I actually can understand the WINE folks: they want an API that can be used to query how much of what has already been written is still unplayed. Why? Because snd_pcm_delay() under your definition would return the time in samples *in the sound card time domain*. Emulating what they need from this value is quite hard, because you'd need to deal with deviating system/audio clocks. It's a pain.
(And that's why I'd love to see that much more generic snd_pcm_get_timing() call implemented, which supports all of these timing parameters)
Lennart
At Fri, 13 Jun 2008 10:14:33 +0200 (CEST), Jaroslav Kysela wrote:
On Fri, 13 Jun 2008, Takashi Iwai wrote:
- Add some new API functions,
I would prefer to extend the current API rather than change the meaning of hw_ptr to handle extra latencies.
To give the accuracy of the position inquiry (optional)
This requires some new kernel <-> user-space stuff.
To query the known latency
Ditto, or we may reuse snd_pcm_hw_params_fifo_size()?
Yes, fifo_size was designed to announce possible extra latency to applications.
On second thought, it's better *not* to expose this value via hw_params. The latency may be variable, and the word "FIFO" isn't appropriate in every case.
And, above all, reviving an old API is bad...
To query the position being played (optionally)
This can be simply hwptr + latency.
- Don't change snd_pcm_delay(). Keep as is now.
About the period model -- that's a tougher problem. I once made a patch to allow user apps to use a system timer for more flexible period settings, but it was for a very old version of ALSA. Better to write it from scratch again...
I think that the current PCM API concept is tightly period-based. You cannot change it easily. It would probably be better to move to a "byte-stream" model in the next revision of the PCM API.
Not that difficult, I guess, from the API POV. The major work is in the PCM core part and some alsa-lib plugins. But it's not API work.
What I once worked on is an extra timing queue. Suppose that we provide an API to access a timing queue that holds the wake-up schedule, either in time or in sample units. The poll / read / write syscalls are woken up at the times in this schedule. In the case of the period model, the queue is simply auto-filled with a constant period. If an app wants to do its own scheduling, it can fill this queue manually. (Of course, this means that the timing queue must be filled before starting the stream.)
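As a toy model (not ALSA code) of this idea: the queue holds wake-up points in frames, and the period model becomes the special case of auto-filling it with a constant stride:

```c
#include <assert.h>

/* Toy model of the proposed timing queue: the queue holds
 * wake-up points, in frames; poll/read/write would be woken at
 * each point.  Names and sizes are illustrative only. */
#define QLEN 16

struct timing_queue {
    unsigned long wakeup[QLEN];
    int n;
};

/* Period model: auto-fill the queue with constant-period wake-ups. */
static void fill_constant_period(struct timing_queue *q,
                                 unsigned long period, int count)
{
    q->n = 0;
    for (int i = 0; i < count && i < QLEN; i++)
        q->wakeup[q->n++] = (unsigned long)(i + 1) * period;
}

/* App-driven model: push an arbitrary wake-up point. */
static int push_wakeup(struct timing_queue *q, unsigned long pos)
{
    if (q->n >= QLEN)
        return -1;
    q->wakeup[q->n++] = pos;
    return 0;
}
```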
Takashi
On Fri, 13 Jun 2008, Takashi Iwai wrote:
Ditto, or we may reuse snd_pcm_hw_params_fifo_size()?
Yes, fifo_size was designed to announce possible extra latency to applications.
On second thought, it's better *not* to expose this value via hw_params. The latency may be variable, and the word "FIFO" isn't appropriate in every case.
And, above all, reviving an old API is bad...
This value should define the maximum latency - not the actual latency. snd_pcm_delay() should give apps the actual overall latency.
I think that the current PCM API concept is tightly period-based. You cannot change it easily. It would probably be better to move to a "byte-stream" model in the next revision of the PCM API.
Not that difficult, I guess, from the API POV. The major work is in the PCM core part and some alsa-lib plugins. But it's not API work.
What I once worked on is an extra timing queue. Suppose that we provide an API to access a timing queue that holds the wake-up schedule, either in time or in sample units. The poll / read / write syscalls are woken up at the times in this schedule. In the case of the period model, the queue is simply auto-filled with a constant period. If an app wants to do its own scheduling, it can fill this queue manually. (Of course, this means that the timing queue must be filled before starting the stream.)
It looks like a more complicated version of the sleep_min implementation we already had:
http://git.alsa-project.org/?p=alsa-kernel.git;a=commit;h=31e8960b35975ed235...
Jaroslav
----- Jaroslav Kysela perex@perex.cz Linux Kernel Sound Maintainer ALSA Project, Red Hat, Inc.
At Fri, 13 Jun 2008 17:44:24 +0200 (CEST), Jaroslav Kysela wrote:
On Fri, 13 Jun 2008, Takashi Iwai wrote:
Ditto, or we may reuse snd_pcm_hw_params_fifo_size()?
Yes, fifo_size was designed to announce possible extra latency to applications.
On second thought, it's better *not* to expose this value via hw_params. The latency may be variable, and the word "FIFO" isn't appropriate in every case.
And, above all, reviving an old API is bad...
This value should define the maximum latency - not the actual latency. snd_pcm_delay() should give apps the actual overall latency.
I think that the current PCM API concept is tightly period-based. You cannot change it easily. It would probably be better to move to a "byte-stream" model in the next revision of the PCM API.
Not that difficult, I guess, from the API POV. The major work is in the PCM core part and some alsa-lib plugins. But it's not API work.
What I once worked on is an extra timing queue. Suppose that we provide an API to access a timing queue that holds the wake-up schedule, either in time or in sample units. The poll / read / write syscalls are woken up at the times in this schedule. In the case of the period model, the queue is simply auto-filled with a constant period. If an app wants to do its own scheduling, it can fill this queue manually. (Of course, this means that the timing queue must be filled before starting the stream.)
It looks like a more complicated version of the sleep_min implementation we already had:
http://git.alsa-project.org/?p=alsa-kernel.git;a=commit;h=31e8960b35975ed235...
Yes and no. Yes, it updates with a system timer. No, it's not bound to the period like the current framework.
The point is that the wake-up timing isn't defined as constant but via a timing queue (or a request queue). This is more suitable for pull-style apps like JACK. Which irq source is used doesn't matter.
Takashi
On Fri, 13 Jun 2008, Takashi Iwai wrote:
The point is that the wake-up timing isn't defined as constant but via a timing queue (or a request queue). This is more suitable for pull-style apps like JACK. Which irq source is used doesn't matter.
But applications can use timers or any other wakeup source directly (for example the video card interrupt) - thus I'm not sure if it's good to complicate our API again. If we remove period-tied I/O operations and assumptions from alsa-lib (so that we do only byte-stream transfers), then everything will be fine and possible.
The other option is to create such timing queues separately, as a completely new API without integration into the PCM API. Complex apps call poll() anyway; supporting this complex timing in read()/write() for simple apps does not make much sense, I think.
Jaroslav
----- Jaroslav Kysela perex@perex.cz Linux Kernel Sound Maintainer ALSA Project, Red Hat, Inc.
At Fri, 13 Jun 2008 18:20:43 +0200 (CEST), Jaroslav Kysela wrote:
On Fri, 13 Jun 2008, Takashi Iwai wrote:
The point is that the wake-up timing isn't defined as constant but via a timing queue (or a request queue). This is more suitable for pull-style apps like JACK. Which irq source is used doesn't matter.
But applications can use timers or any other wakeup source directly (for example the video card interrupt) - thus I'm not sure if it's good to complicate our API again. If we remove period-tied I/O operations and assumptions from alsa-lib (so that we do only byte-stream transfers), then everything will be fine and possible.
The other option is to create such timing queues separately, as a completely new API without integration into the PCM API. Complex apps call poll() anyway; supporting this complex timing in read()/write() for simple apps does not make much sense, I think.
Well, you miss the point. Let me clarify some issues:
- The timing queue is optional. As default, we assume the period model as is now.
- The period model is internally driven as the automatic fill of constant periods to the timing queue.
- The stream is handled as a byte-stream.
- If apps want another wakeup source, that's fine. This has nothing to do with the wakeup of the sound driver, no? It's up to apps how they handle streams; we just provide the timing from our system.
- What the timing queue affects is only the wake-up timing of poll/read/write. So, basically, no other API change is required (although a simpler API with combination of queue+read/write would be more helpful).
- In my original implementation, periods become the sync points between the hardware position and the system timer. But, this can be optional.
Takashi
At Fri, 13 Jun 2008 18:38:53 +0200, I wrote:
At Fri, 13 Jun 2008 18:20:43 +0200 (CEST), Jaroslav Kysela wrote:
On Fri, 13 Jun 2008, Takashi Iwai wrote:
The point is that the wake-up timing isn't defined as constant but via a timing queue (or a request queue). This is more suitable for pull-style apps like JACK. Which irq source is used doesn't matter.
But applications can use timers or any other wakeup source directly (for example the video card interrupt) - thus I'm not sure if it's good to complicate our API again. If we remove period-tied I/O operations and assumptions from alsa-lib (so that we do only byte-stream transfers), then everything will be fine and possible.
The other option is to create such timing queues separately, as a completely new API without integration into the PCM API. Complex apps call poll() anyway; supporting this complex timing in read()/write() for simple apps does not make much sense, I think.
Well, you miss the point. Let me clarify some issues:
- The timing queue is optional. As default, we assume the period model as is now.
- The period model is internally driven as the automatic fill of constant periods to the timing queue.
- The stream is handled as a byte-stream.
- If apps want another wakeup source, that's fine. This has nothing to do with the wakeup of the sound driver, no? It's up to apps how they handle streams; we just provide the timing from our system.
- What the timing queue affects is only the wake-up timing of poll/read/write. So, basically, no other API change is required (although a simpler API with a combination of queue+read/write would be more helpful).
- In my original implementation, periods become the sync points between the hardware position and the system timer. But this can be optional.
And one thing forgotten.
- If this were so great, I'd have already pushed it upstream. But I didn't. So you can guess that it's not a perfect solution.
Takashi
Hi all,
I tried to describe expressions for all the values Lennart proposed using the current snd_pcm_avail_update() / snd_pcm_delay() functions, and to add some more comments on things noted in this discussion. I do not think that we should expose the internal ring buffer pointers to applications.
Lennart's proposal (marked with *, my reply marked with -)
* ABSOLUTE_WRITE: the current absolute write counter in samples, since the device was opened. - the application can calculate this value itself without too much pain
* ABSOLUTE_READ: the current absolute read counter in samples, since the device was opened. - if this means 'how many samples the consumer (driver) has read': ABSOLUTE_WRITE - (buffer_size - snd_pcm_avail_update())
* ABSOLUTE_HEAR: the current absolute hear counter in samples, since the device was opened - ABSOLUTE_WRITE - snd_pcm_delay()
* WRITE_TO_READ: the current fill level of the playback buffer - buffer_size - snd_pcm_avail_update()
* WRITE_TO_HEAR: if I write a sample immediately after this call, how long it takes to be played. - snd_pcm_delay()
* READ_TO_HEAR: the 'fifo' latency, i.e. the time that passes between a sample being read from the playback buffer and it actually being played. - snd_pcm_delay() - (buffer_size - snd_pcm_avail_update())
* CAN_WRITE: how much can be written into the buffer right now (buffer_size - WRITE_TO_READ) - snd_pcm_avail_update()
* LEFT_TO_PLAY: similar to WRITE_TO_HEAR but callable *after* we wrote something; it will return how much of what we wrote has not been heard yet. In contrast to WRITE_TO_HEAR, this will return 0 eventually. - this accounting can easily be done in the app using snd_pcm_delay() - applications know how many samples were written
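The mappings above reduce to simple arithmetic. A sketch with the avail/delay inputs passed in as plain numbers (in a real app they would come from snd_pcm_avail_update() and snd_pcm_delay(); the helper names are illustrative):

```c
#include <assert.h>

/* Jaroslav's expressions as plain arithmetic.  Inputs stand in
 * for the values a real application would obtain from
 * snd_pcm_avail_update() and snd_pcm_delay(). */
static long write_to_read(long buffer_size, long avail)
{   /* fill level of the ring buffer */
    return buffer_size - avail;
}

static long write_to_hear(long delay)
{   /* overall latency: what snd_pcm_delay() reports */
    return delay;
}

static long read_to_hear(long buffer_size, long avail, long delay)
{   /* the extra "fifo" latency beyond the ring buffer */
    return delay - (buffer_size - avail);
}

static long can_write(long avail)
{   /* writable space: what snd_pcm_avail_update() reports */
    return avail;
}
```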
Atomicity of 'avail / delay' values:
It's not actually guaranteed. But I'm not sure that these values are required to be "in exact sync". The two values have different meanings: while the avail value should be used for ring buffer filling / draining, the purpose of the delay value is synchronization with another timing source. At the time of the call, both functions (snd_pcm_avail_update() and snd_pcm_delay()) return correct values.
Only the READ_TO_HEAR expression uses both the delay and avail values. I'm not sure what an application would use this value for; it seems purely informational to me, and we should offer this "internal queue size" value through another API.
But I am open to implementing an atomic call in alsa-lib:
Original functions:
snd_pcm_avail_update() /* light version - no forced ptr sync */
snd_pcm_delay() /* already includes hwsync() */
snd_pcm_hwsync() /* might be obsoleted by snd_pcm_avail() ? */
New:
snd_pcm_sframes_t snd_pcm_avail(snd_pcm_t *); /* adds hwsync() */
int snd_pcm_avail_and_delay(snd_pcm_t *, snd_pcm_sframes_t *avail, snd_pcm_sframes_t *delay); /* hwsync() + avail_update() + delay() - atomic operation */
This extension will allow us to optimize the case where the application requires both the avail and delay values, and we can replace the "problematic" snd_pcm_hwsync() function with the snd_pcm_avail() function.
Exposing appl_ptr / hw_ptr / curr_ptr to applications
This is a more complicated buffer-state description. Applications would have to do the pointer wrap calculations themselves, including underrun / overrun detection. It's far more complicated than the current avail / delay scheme.
Jaroslav
----- Jaroslav Kysela perex@perex.cz Linux Kernel Sound Maintainer ALSA Project, Red Hat, Inc.
On Mon, 16.06.08 14:07, Jaroslav Kysela (perex@perex.cz) wrote:
Hi all,
I tried to describe expressions for all the values Lennart proposed using the current snd_pcm_avail_update() / snd_pcm_delay() functions, and to add some more comments on things noted in this discussion. I do not think that we should expose the internal ring buffer pointers to applications.
BTW, I changed my mind on this one. I now agree that exporting the pointers is a bad idea, because they are not sufficient to detect buffer underruns that last longer than one buffer size, whereas snd_pcm_avail_update() reports those properly. Hence I think exporting the pointers just like that is a bad idea.
- LEFT_TO_PLAY: similar to WRITE_TO_HEAR but callable *after* we wrote something, and it will return how much of what we wrote has not been heard yet. In contrast to WRITE_TO_HEAR this will return 0 eventually.
- this accounting can easily be done in the app using snd_pcm_delay() - applications know how many samples were written
This one is actually much more complex. If you want to do it properly you need to save snd_pcm_delay() before you write something and then subtract the time that has passed since that point. The problem is that the elapsed time is only available in the system time domain, while snd_pcm_delay() is in the sound card time domain, so doing this properly is difficult. If you don't care about correctness you can of course ignore the different timing sources, and then the calculation isn't that difficult anymore.
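A sketch of the "don't care about correctness" variant, i.e. ignoring the clock-domain drift described above and converting elapsed system time at the nominal sample rate (an illustrative helper, not ALSA API):

```c
#include <assert.h>

/* LEFT_TO_PLAY accounting, deliberately ignoring system-clock
 * vs. sound-card-clock drift: remember snd_pcm_delay() at write
 * time, then subtract the frames that have elapsed since,
 * converted at the nominal sample rate.  Clamps to zero once
 * everything has been heard. */
static long left_to_play(long delay_at_write_frames,
                         long elapsed_usec, long rate)
{
    long elapsed_frames =
        (long)((long long)elapsed_usec * rate / 1000000);
    long left = delay_at_write_frames - elapsed_frames;
    return left > 0 ? left : 0;
}
```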
int snd_pcm_avail_and_delay(snd_pcm_t *, snd_pcm_sframes_t *avail, snd_pcm_sframes_t *delay); /* hwsync() + avail_update() + delay() - atomic operation */
This sounds good to me.
Lennart
Jaroslav Kysela wrote:
If we remove period-tied I/O operations and assumptions from alsa-lib (so that we do only byte-stream transfers), then everything will be fine and possible.
When will this "byte-stream" architecture be implemented? I think it would be a good improvement to ALSA. The current period-based architecture has become difficult for users to use properly.
James
On Fri, 13.06.08 08:59, Takashi Iwai (tiwai@suse.de) wrote:
About the latency, my proposal is like below:
Renew the definition of hwptr -- make it what you imagined: the pointer where the hardware is reading or has read.
This won't change anything for PCI drivers, so the impact is minimal.
And it would suddenly make the USB drivers do the right thing ;-)
OTOH I am not a big fan of this solution, since the delay in "snd_pcm_delay()" kind of suggests that it is useful for time synchronization.
Add some new API functions,
To give the accuracy of the position inquiry (optional)
This requires some new kernel <-> user-space stuff.
I'd really like to have this. Only when I know this can I tell how near to the current hw_ptr I can still make changes in the playback buffer.
To query the known latency
Ditto, or we may reuse snd_pcm_hw_params_fifo_size()?
I am not a fan of snd_pcm_hw_params_fifo_size(), because the 'fifo' latency is dynamic in the network case; it changes all the time with the level of network congestion. However, if something is part of hw_params it is supposed to stay fixed, right?
Hence I'd rather prefer if we let snd_pcm_hw_params_fifo_size() rest in peace.
My personal favourite solution would actually be to have a new call that allows you to query *all* timing-related values *atomically*. Why? Different programs need different timing information from ALSA. Also, timing information happens to change all the time. If we just export two basic values, then people might end up doing arithmetic on them, at the risk that the two values are not consistent with each other, since some time has already passed between querying them. So, what I would suggest is this:
typedef enum snd_pcm_timing {
    SND_PCM_TIMING_ABSOLUTE_WRITE,
    SND_PCM_TIMING_ABSOLUTE_READ,
    SND_PCM_TIMING_ABSOLUTE_HEAR,
    SND_PCM_TIMING_WRITE_TO_READ,
    SND_PCM_TIMING_WRITE_TO_HEAR,
    SND_PCM_TIMING_READ_TO_HEAR,
    SND_PCM_TIMING_CAN_WRITE,
    SND_PCM_TIMING_LEFT_TO_PLAY
} snd_pcm_timing_t;
snd_pcm_sframes_t snd_pcm_get_timing(snd_pcm_t *pcm, snd_pcm_timing_t timing);
The user would just pass which of the timing values he needs. The meaning would be:
ABSOLUTE_WRITE: the current absolute write counter in samples, since the device was opened.
ABSOLUTE_READ: the current absolute read counter in samples, since the device was opened.
ABSOLUTE_HEAR: the current absolute hear counter in samples, since the device was opened.
WRITE_TO_READ: the current fill level of the playback buffer.
WRITE_TO_HEAR: if I write a sample immediately after this call, how long it takes to be played.
READ_TO_HEAR: the 'fifo' latency, i.e. the time that passes between a sample being read from the playback buffer and it actually being played.
CAN_WRITE: how much can be written into the buffer right now (buffer_size - WRITE_TO_READ).
LEFT_TO_PLAY: similar to WRITE_TO_HEAR but callable *after* we wrote something; it will return how much of what we wrote has not been heard yet. In contrast to WRITE_TO_HEAR, this will return 0 eventually.
snd_pcm_delay() would be identical to snd_pcm_get_timing(SND_PCM_TIMING_WRITE_TO_HEAR)
snd_pcm_avail_update() would be identical to snd_pcm_get_timing(SND_PCM_TIMING_CAN_WRITE).
Media players would use WRITE_TO_HEAR for doing their sync stuff. WINE would use LEFT_TO_PLAY. PulseAudio would use WRITE_TO_READ to estimate when the next XRUN might happen.
Of course, the fragment granularity would need to be added to this in some way.
If we had an API like this we could also make sure that people using it won't do invalid calculations. I mean, we have the situation now that WINE does invalid calculations. If ALSA does all the calculations for them and they just have to pick the one value that suits their needs, we'd have a much better chance that people wouldn't misuse the ALSA API.
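To make the atomicity argument concrete, here is a toy model (nothing here is real ALSA API): take one snapshot of the absolute counters, then derive every timing value from that single snapshot, so no arithmetic can mix inconsistent readings:

```c
#include <assert.h>

/* Toy model of the snd_pcm_get_timing() proposal, covering a
 * subset of the proposed values.  All names are illustrative. */
typedef enum { TIMING_WRITE_TO_READ, TIMING_WRITE_TO_HEAR,
               TIMING_READ_TO_HEAR, TIMING_CAN_WRITE } timing_t;

struct snapshot {        /* captured atomically in the real design */
    long abs_write;      /* frames written since open  */
    long abs_read;       /* frames read by the driver  */
    long abs_hear;       /* frames actually played     */
    long buffer_size;
};

static long get_timing(const struct snapshot *s, timing_t t)
{
    switch (t) {
    case TIMING_WRITE_TO_READ: return s->abs_write - s->abs_read;
    case TIMING_WRITE_TO_HEAR: return s->abs_write - s->abs_hear;
    case TIMING_READ_TO_HEAR:  return s->abs_read  - s->abs_hear;
    case TIMING_CAN_WRITE:
        return s->buffer_size - (s->abs_write - s->abs_read);
    }
    return -1;
}
```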
Lennart
At Fri, 13 Jun 2008 16:25:26 +0200, Lennart Poettering wrote:
On Fri, 13.06.08 08:59, Takashi Iwai (tiwai@suse.de) wrote:
About the latency, my proposal is like below:
Renew the definition of hwptr -- make it what you imagined: the pointer where the hardware is reading or has read.
This won't change anything for PCI drivers, so the impact is minimal.
And it would suddenly make the USB drivers do the right thing ;-)
This alone wouldn't :)
OTOH I am not a big fan of this solution, since the delay in "snd_pcm_delay()" kind of suggests that it is useful for time synchronization.
Yes, snd_pcm_delay() is the API for synchronization; even now it is explicitly defined as the delay from the position currently being played.
Add some new API functions,
To give the accuracy of the position inquiry (optional)
This requires some new kernel <-> user-space stuff.
I'd really like to have this. Only when I know this can I know how close to the current hw_ptr I can still make changes in the playback buffer.
One question is what quantity it should be. In most cases, it's simply a flag -- that is, the hwptr is updated only at the period boundary. But this may not be appropriate, e.g. if we introduce stream control via (hr)timers.
To query the known latency
Ditto, or we may reuse snd_pcm_hw_params_fifo_size()?
I am not a fan of snd_pcm_hw_params_fifo_size(), because the 'fifo' latency is dynamic in the network case; it changes all the time with the level of network congestion. However, if something is part of hw_params it is supposed to stay fixed, right?
Hence I'd rather prefer if we let snd_pcm_hw_params_fifo_size() rest in peace.
Agreed. A new API sounds saner to me.
My personal favourite solution would actually be to have a new call that allows you to query *all* timing-related values *atomically*. Why? Different programs need different timing information from ALSA. Also, timing information happens to change all the time. If we just export two basic values, then people might end up doing arithmetic on them, at the risk that the two values are not consistent with each other, since some time has already passed between querying them. So, what I would suggest is this:
typedef enum snd_pcm_timing {
    SND_PCM_TIMING_ABSOLUTE_WRITE,
    SND_PCM_TIMING_ABSOLUTE_READ,
    SND_PCM_TIMING_ABSOLUTE_HEAR,
    SND_PCM_TIMING_WRITE_TO_READ,
    SND_PCM_TIMING_WRITE_TO_HEAR,
    SND_PCM_TIMING_READ_TO_HEAR,
    SND_PCM_TIMING_CAN_WRITE,
    SND_PCM_TIMING_LEFT_TO_PLAY
} snd_pcm_timing_t;
snd_pcm_sframes_t snd_pcm_get_timing(snd_pcm_t *pcm, snd_pcm_timing_t timing);
The user would just pass which of the timing values he needs. The meaning would be:
ABSOLUTE_WRITE: the current absolute write counter in samples, since the device was opened.
ABSOLUTE_READ: the current absolute read counter in samples, since the device was opened.
ABSOLUTE_HEAR: the current absolute hear counter in samples, since the device was opened.
WRITE_TO_READ: the current fill level of the playback buffer.
WRITE_TO_HEAR: if I write a sample immediately after this call, how long it takes to be played.
READ_TO_HEAR: the 'fifo' latency, i.e. the time that passes between a sample being read from the playback buffer and it actually being played.
CAN_WRITE: how much can be written into the buffer right now (buffer_size - WRITE_TO_READ).
LEFT_TO_PLAY: similar to WRITE_TO_HEAR, but callable *after* we wrote something; it returns how much of what we wrote has not been heard yet. In contrast to WRITE_TO_HEAR, this will eventually return 0.
Hmm... at first glance, this already looks complicated.
What about just providing three pointers: curr_ptr, hw_ptr and appl_ptr? curr_ptr corresponds to the point being played, and hw_ptr is the point where the data was already sent to h/w, and appl_ptr is the point where the data is filled by user. The above definitions are all combinations of these pointers.
I really don't understand why we need to hide hw_ptr and appl_ptr in the current API. To me, exposing these pointers is much more straightforward.
In addition, there will be an API to provide the position granularity as mentioned in the above. But, this can be a different thing from the pointer APIs.
Takashi
On Fri, 13 Jun 2008, Takashi Iwai wrote:
What about just providing three pointers: curr_ptr, hw_ptr and appl_ptr? curr_ptr corresponds to the point being played, and hw_ptr is the point where the data was already sent to h/w, and appl_ptr is the point where the data is filled by user. The above definitions are all combinations of these pointers.
But I think that curr_ptr can be managed in drivers, thus invisible to user space (except for snd_pcm_delay() propagation). If a driver requires extra handling of samples, it can allocate and manage extra buffers itself. I don't see the point of having "locked" samples already processed by hardware in the main ring buffer described by appl_ptr / hw_ptr. The application can use this space for new samples.
The only advantage of your implementation might be zero-copy, but USB and PCMCIA cards have or create their own buffers, so I don't think this advantage can be used in actual drivers, and I cannot even imagine hardware that works in a way that allows zero-copy in this situation.
I really don't understand why we need to hide hw_ptr and appl_ptr in the current API. To me, exposing these points is much more straightforward.
In addition, there will be an API to provide the position granularity as mentioned in the above. But, this can be a different thing from the pointer APIs.
I agree.
Jaroslav
----- Jaroslav Kysela perex@perex.cz Linux Kernel Sound Maintainer ALSA Project, Red Hat, Inc.
At Fri, 13 Jun 2008 18:11:12 +0200 (CEST), Jaroslav Kysela wrote:
On Fri, 13 Jun 2008, Takashi Iwai wrote:
What about just providing three pointers: curr_ptr, hw_ptr and appl_ptr? curr_ptr corresponds to the point being played, and hw_ptr is the point where the data was already sent to h/w, and appl_ptr is the point where the data is filled by user. The above definitions are all combinations of these pointers.
But I think that curr_ptr can be managed in drivers, thus invisible to user space (except for snd_pcm_delay() propagation).
Ditto for hw_ptr. Why is it hidden at all?
If driver requires extra handling of samples, it can allocate and manage extra buffers itself. I don't see the point to have "locked" samples already processed by hardware in the main ring buffer described by appl_ptr / hw_ptr. Application can use this space for new samples.
The only advantage with your implementation might be zero-copy, but USB and PCMCIA cards have or create own buffers, so I don't think that this advantage can be used in actual drivers and I cannot even imagine hardware which work in way to use zero-copy in this situation.
Wait, wait. Please don't mix things up. The above doesn't imply anything about the further implementation of the usb-audio driver. What I suggested is: instead of hiding two pointers (hw_ptr and curr_ptr) and creating a complex API, simply expose them.
Now, regarding the usb-driver. Honestly, I don't understand what you want to do with an extra URB.
As of now, the usb-audio driver treats curr_ptr == hw_ptr. But in reality, curr_ptr = hw_ptr - samples_in_urbs. So, in the case of USB audio, hw_ptr is ahead of curr_ptr. (And the granularity is samples_in_urbs.)
Takashi
On Fri, 13 Jun 2008, Takashi Iwai wrote:
At Fri, 13 Jun 2008 18:11:12 +0200 (CEST), Jaroslav Kysela wrote:
On Fri, 13 Jun 2008, Takashi Iwai wrote:
What about just providing three pointers: curr_ptr, hw_ptr and appl_ptr? curr_ptr corresponds to the point being played, and hw_ptr is the point where the data was already sent to h/w, and appl_ptr is the point where the data is filled by user. The above definitions are all combinations of these pointers.
But I think that curr_ptr can be managed in drivers, thus invisible to user space (except for snd_pcm_delay() propagation).
Ditto for hw_ptr. Why is it hidden at all?
Does it improve something to show this pointer to apps? I don't see any reason to show it outside alsa-lib.
If driver requires extra handling of samples, it can allocate and manage extra buffers itself. I don't see the point to have "locked" samples already processed by hardware in the main ring buffer described by appl_ptr / hw_ptr. Application can use this space for new samples.
The only advantage with your implementation might be zero-copy, but USB and PCMCIA cards have or create own buffers, so I don't think that this advantage can be used in actual drivers and I cannot even imagine hardware which work in way to use zero-copy in this situation.
Wait, wait. Please don't mix up. The above doesn't imply anything about the further implementation of usb-audio driver. What I suggested is, instead of hiding two pointers (hw_ptr and curr_ptr) and creating a complex API, simply expose them.
I don't see a reason to make the current API more complex. We already have two functions: one showing the overall latency and a second showing how many samples can be processed by the application. That's enough. We only need to improve things internally in the alsa-lib <-> kernel interface (provide correct information for snd_pcm_delay()).
Now, regarding the usb-driver. Honestly, I don't understand what you want to do with an extra URB.
Note that we don't need extra URBs; we just need to change the hw_ptr handling in the USB driver.
As now, usb-audio driver handles as curr_ptr == hw_ptr. But, in reality, curr_ptr = hw_ptr - samples_in_urbs. So, in the case of USB-audio, hw_ptr is ahead of curr_ptr. (And the granularity is samples_in_urbs).
As Lennart mentioned, in this case you can reach an underrun at a different position than expected (when a URB cannot be filled). In my scheme, you'll reach an underrun exactly at the point when the whole ring buffer is drained. So the application can better estimate queueing, and it also makes things more logical.
USB: application -> ring buffer -> driver's buffer -> DMA -> hardware FIFO -> DAC
standard PCI card: application -> ring buffer -> DMA -> hardware FIFO -> DAC
When we add the additional latency of the extra buffers to snd_pcm_delay() and fifo_size() (or any other similar function providing the maximum extra added latency), the application can change the ring buffer parameters to match its own requirements.
Jaroslav
----- Jaroslav Kysela perex@perex.cz Linux Kernel Sound Maintainer ALSA Project, Red Hat, Inc.
At Fri, 13 Jun 2008 18:47:48 +0200 (CEST), Jaroslav Kysela wrote:
On Fri, 13 Jun 2008, Takashi Iwai wrote:
At Fri, 13 Jun 2008 18:11:12 +0200 (CEST), Jaroslav Kysela wrote:
On Fri, 13 Jun 2008, Takashi Iwai wrote:
What about just providing three pointers: curr_ptr, hw_ptr and appl_ptr? curr_ptr corresponds to the point being played, and hw_ptr is the point where the data was already sent to h/w, and appl_ptr is the point where the data is filled by user. The above definitions are all combinations of these pointers.
But I think that curr_ptr can be managed in drivers, thus invisible to user space (except for snd_pcm_delay() propagation).
Ditto for hw_ptr. Why is it hidden at all?
Does it improve something to show this pointer to apps? I don't see any reason to show it outside alsa-lib.
Then it'll be more clear.
If driver requires extra handling of samples, it can allocate and manage extra buffers itself. I don't see the point to have "locked" samples already processed by hardware in the main ring buffer described by appl_ptr / hw_ptr. Application can use this space for new samples.
The only advantage with your implementation might be zero-copy, but USB and PCMCIA cards have or create own buffers, so I don't think that this advantage can be used in actual drivers and I cannot even imagine hardware which work in way to use zero-copy in this situation.
Wait, wait. Please don't mix up. The above doesn't imply anything about the further implementation of usb-audio driver. What I suggested is, instead of hiding two pointers (hw_ptr and curr_ptr) and creating a complex API, simply expose them.
I don't see a reason to make current API more complex.
Because the current API is complex and hard to understand.
We have already two functions, One showing overall latency and second one how much samples can be processed by application. It's enough. We need only improve things internaly in alsa-lib <-> kernel (provide correct information for snd_pcm_delay()).
Now, regarding the usb-driver. Honestly, I don't understand what you want to do with an extra URB.
Note that we don't need to have extra URBs, just change hw_ptr handling in USB driver.
OK, then it's different from your previous explanation...
As now, usb-audio driver handles as curr_ptr == hw_ptr. But, in reality, curr_ptr = hw_ptr - samples_in_urbs. So, in the case of USB-audio, hw_ptr is ahead of curr_ptr. (And the granularity is samples_in_urbs).
As Lennart mentioned, in this case you can reach underrun at different position than expected (when URB cannot be filled). In my case, you'll reach underrun exactly at point when whole ring buffer is drained. So application can better estimate queueing and also it makes things more logical.
Hm, could you elaborate how to do this more exactly? That wasn't clear from your previous post at all.
Takashi
On Fri, 13 Jun 2008, Takashi Iwai wrote:
At Fri, 13 Jun 2008 18:47:48 +0200 (CEST), Jaroslav Kysela wrote:
On Fri, 13 Jun 2008, Takashi Iwai wrote:
At Fri, 13 Jun 2008 18:11:12 +0200 (CEST), Jaroslav Kysela wrote:
On Fri, 13 Jun 2008, Takashi Iwai wrote:
What about just providing three pointers: curr_ptr, hw_ptr and appl_ptr? curr_ptr corresponds to the point being played, and hw_ptr is the point where the data was already sent to h/w, and appl_ptr is the point where the data is filled by user. The above definitions are all combinations of these pointers.
But I think that curr_ptr can be managed in drivers, thus invisible to user space (except for snd_pcm_delay() propagation).
Ditto for hw_ptr. Why is it hidden at all?
Does it improve something to show this pointer to apps? I don't see any reason to show it outside alsa-lib.
Then it'll be more clear.
Maybe for us, but not for application developers. They only need to know how many samples are available for an I/O operation.
If driver requires extra handling of samples, it can allocate and manage extra buffers itself. I don't see the point to have "locked" samples already processed by hardware in the main ring buffer described by appl_ptr / hw_ptr. Application can use this space for new samples.
The only advantage with your implementation might be zero-copy, but USB and PCMCIA cards have or create own buffers, so I don't think that this advantage can be used in actual drivers and I cannot even imagine hardware which work in way to use zero-copy in this situation.
Wait, wait. Please don't mix up. The above doesn't imply anything about the further implementation of usb-audio driver. What I suggested is, instead of hiding two pointers (hw_ptr and curr_ptr) and creating a complex API, simply expose them.
I don't see a reason to make current API more complex.
Because the current API is complex and hard to understand.
But this concrete part of the API is quite simple, isn't it?
We have already two functions, One showing overall latency and second one how much samples can be processed by application. It's enough. We need only improve things internaly in alsa-lib <-> kernel (provide correct information for snd_pcm_delay()).
Now, regarding the usb-driver. Honestly, I don't understand what you want to do with an extra URB.
Note that we don't need to have extra URBs, just change hw_ptr handling in USB driver.
OK, then it's different from your previous explanation...
Yes, sorry for the imperfect explanation. That is what I meant.
As now, usb-audio driver handles as curr_ptr == hw_ptr. But, in reality, curr_ptr = hw_ptr - samples_in_urbs. So, in the case of USB-audio, hw_ptr is ahead of curr_ptr. (And the granularity is samples_in_urbs).
As Lennart mentioned, in this case you can reach underrun at different position than expected (when URB cannot be filled). In my case, you'll reach underrun exactly at point when whole ring buffer is drained. So application can better estimate queueing and also it makes things more logical.
Hm, could you elaborate how to do this more exactly? That wasn't clear from your previous post at all.
Looking at the USB driver, snd_period_elapsed() is called directly after a URB is filled (of course, only when it crosses a period boundary). The hwptr_done variable is also updated at this time.
It means that the PCM midlevel code thinks that the samples in URBs have been played (an underrun can be detected), while they are in fact still queued in URBs.
OK, my fault. It's exactly the behaviour I proposed (URBs are extra buffers), but we need to take the right snd_pcm_delay() output into account. Lennart probably meant that samples are consumed too quickly at stream start, and that the extra buffering mechanism is impossible to detect with the current code.
I would propose adding an extra callback to snd_pcm_ops to determine the number of samples queued by the driver and/or in hardware, and extending the snd_pcm_status and snd_pcm_mmap_status structures to propagate this value to user space.
Jaroslav
----- Jaroslav Kysela perex@perex.cz Linux Kernel Sound Maintainer ALSA Project, Red Hat, Inc.
At Fri, 13 Jun 2008 19:37:48 +0200 (CEST), Jaroslav Kysela wrote:
On Fri, 13 Jun 2008, Takashi Iwai wrote:
At Fri, 13 Jun 2008 18:47:48 +0200 (CEST), Jaroslav Kysela wrote:
On Fri, 13 Jun 2008, Takashi Iwai wrote:
At Fri, 13 Jun 2008 18:11:12 +0200 (CEST), Jaroslav Kysela wrote:
On Fri, 13 Jun 2008, Takashi Iwai wrote:
What about just providing three pointers: curr_ptr, hw_ptr and appl_ptr? curr_ptr corresponds to the point being played, and hw_ptr is the point where the data was already sent to h/w, and appl_ptr is the point where the data is filled by user. The above definitions are all combinations of these pointers.
But I think that curr_ptr can be managed in drivers, thus invisible to user space (except for snd_pcm_delay() propagation).
Ditto for hw_ptr. Why is it hidden at all?
Does it improve something to show this pointer to apps? I don't see any reason to show it outside alsa-lib.
Then it'll be more clear.
Maybe for us, but not for application developers. They need to know only how much samples are available for I/O operation.
I don't fully agree with this. It really depends on the type of application, and I'm afraid our analysis of this isn't sufficient. If it were only about the available samples, this issue wouldn't have been raised. But apps obviously want more than that. We should look more closely at real use cases.
If driver requires extra handling of samples, it can allocate and manage extra buffers itself. I don't see the point to have "locked" samples already processed by hardware in the main ring buffer described by appl_ptr / hw_ptr. Application can use this space for new samples.
The only advantage with your implementation might be zero-copy, but USB and PCMCIA cards have or create own buffers, so I don't think that this advantage can be used in actual drivers and I cannot even imagine hardware which work in way to use zero-copy in this situation.
Wait, wait. Please don't mix up. The above doesn't imply anything about the further implementation of usb-audio driver. What I suggested is, instead of hiding two pointers (hw_ptr and curr_ptr) and creating a complex API, simply expose them.
I don't see a reason to make current API more complex.
Because the current API is complex and hard to understand.
But this concrete part of API is quite simple, isn't?
Well, it's a question of how to view the current API. I don't mind changing the semantics of the current API slightly to resolve this particular problem, i.e. making snd_pcm_avail_update() return the hwptr while snd_pcm_delay() returns a different size containing curr_ptr. So, let it be. My suggestion is no mandatory requirement, just an answer to Lennart's proposal.
But my suggestion should be kept in mind when designing newer APIs. Many problems come from hiding obvious things, although many parts still stick with the old Unix concepts.
We have already two functions, One showing overall latency and second one how much samples can be processed by application. It's enough. We need only improve things internaly in alsa-lib <-> kernel (provide correct information for snd_pcm_delay()).
Now, regarding the usb-driver. Honestly, I don't understand what you want to do with an extra URB.
Note that we don't need to have extra URBs, just change hw_ptr handling in USB driver.
OK, then it's different from your previous explanation...
Yes, sorry for not perfect explanation. I meant this.
As now, usb-audio driver handles as curr_ptr == hw_ptr. But, in reality, curr_ptr = hw_ptr - samples_in_urbs. So, in the case of USB-audio, hw_ptr is ahead of curr_ptr. (And the granularity is samples_in_urbs).
As Lennart mentioned, in this case you can reach underrun at different position than expected (when URB cannot be filled). In my case, you'll reach underrun exactly at point when whole ring buffer is drained. So application can better estimate queueing and also it makes things more logical.
Hm, could you elaborate how to do this more exactly? That wasn't clear from your previous post at all.
Looking to USB driver, snd_period_elapsed() is called directly after URB is filled (of course when it crosses the period boundary). Also hwptr_done variable is updated in this time.
It means that the PCM midlevel code thinks that samples in URBs are played (underrun can be detected), but they are queued in URBs.
OK, my fault. It's exactly behaviour I proposed (URBs are extra buffers), but we need to take in account the right snd_pcm_delay() output. Lennart probably meant that samples are consumed too much quickly at the stream start and impossibility to detect the extra buffering mechanism with the current code.
I would propose to add an extra callback to snd_pcm_ops to determine queued samples by driver and/or in hardware and extend snd_pcm_status and snd_pcm_mmap_status structures to propagate this value to user space.
Exposing this to user space can be done via snd_pcm_delay(), I think. For example, add a new field, runtime->fifo_size, which defaults to 0. snd_pcm_delay() computes the delay together with runtime->fifo_size, and the driver changes runtime->fifo_size appropriately, even dynamically. In this way, we wouldn't need any change in the kernel API, only in driver internals.
Takashi
On Fri, 13.06.08 19:37, Jaroslav Kysela (perex@perex.cz) wrote:
It means that the PCM midlevel code thinks that samples in URBs are played (underrun can be detected), but they are queued in URBs.
OK, my fault. It's exactly behaviour I proposed (URBs are extra buffers), but we need to take in account the right snd_pcm_delay() output. Lennart probably meant that samples are consumed too much quickly at the stream start and impossibility to detect the extra buffering mechanism with the current code.
Yes, this is exactly what I am experiencing. At stream start my estimations (based on update_avail) are way off. Afterwards everything is fine. As a dirty workaround I always halve the initial sleep time, so that I can make sure I don't sleep too long and get an xrun. But that's really ugly, because halving is just a wild guess, and it isn't even necessary on PCI hardware.
Lennart
On Fri, 13.06.08 18:47, Jaroslav Kysela (perex@perex.cz) wrote:
What about just providing three pointers: curr_ptr, hw_ptr and appl_ptr? curr_ptr corresponds to the point being played, and hw_ptr is the point where the data was already sent to h/w, and appl_ptr is the point where the data is filled by user. The above definitions are all combinations of these pointers.
But I think that curr_ptr can be managed in drivers, thus invisible to user space (except for snd_pcm_delay() propagation).
Ditto for hw_ptr. Why is it hidden at all?
Does it improve something to show this pointer to apps? I don't see any reason to show it outside alsa-lib.
As I already pointed out, different applications need different kinds of timing information. Some want to estimate when the next xrun will happen, others want the delay the next write will take to be played, yet others want to know how much of the already-written data has not yet been played. Still others want absolute times.
And I follow Takashi insofar as, if you just chose to export the three pointers to userspace so that they can be queried atomically, everybody could relatively easily calculate from these 'raw' values whatever they need.
The problem I see with the 'cooked' values that are currently exported (i.e. delay and update_avail) is that they each serve only a single purpose, and there are so many different ways to use and especially misuse them, as we have seen in the WINE case. Certainly, the ptrs can be misused too, but at least all the information the kernel uses to calculate those 'cooked' values would be available to userspace too, so without too much ugly code userspace can always calculate what it needs. If you only have a subset of the cooked values and want to calculate the other cooked values from it, you're in pain.
I think there are two extremes:
a) export the 'raw' info to userspace: the three pointers
b) export the 'cooked' info to userspace in all its variations: similar to the API I suggested that would take an enum for choosing the right timing parameter the app wants.
Everything in between these extremes would be incomplete.
I could live with either of these two. The question is: do you trust userspace programmers to use the three pointers correctly, or do you want to make sure they get things right with the enum API?
Lennart
On Fri, 13.06.08 18:26, Takashi Iwai (tiwai@suse.de) wrote:
Wait, wait. Please don't mix up. The above doesn't imply anything about the further implementation of usb-audio driver. What I suggested is, instead of hiding two pointers (hw_ptr and curr_ptr) and creating a complex API, simply expose them.
Now, regarding the usb-driver. Honestly, I don't understand what you want to do with an extra URB.
As now, usb-audio driver handles as curr_ptr == hw_ptr. But, in reality, curr_ptr = hw_ptr - samples_in_urbs. So, in the case of USB-audio, hw_ptr is ahead of curr_ptr. (And the granularity is samples_in_urbs).
BTW: what's the relation between periods and URBs on usb-audio right now? I mean, the URBs should be exposed as periods to userspace, right? But they are not right now, are they? I mean, I can set all kinds of strange period settings for my USB device and I am pretty sure that this is not reflected in the URB size, or am I wrong?
Lennart
At Fri, 13 Jun 2008 20:29:53 +0200, Lennart Poettering wrote:
On Fri, 13.06.08 18:26, Takashi Iwai (tiwai@suse.de) wrote:
Wait, wait. Please don't mix up. The above doesn't imply anything about the further implementation of usb-audio driver. What I suggested is, instead of hiding two pointers (hw_ptr and curr_ptr) and creating a complex API, simply expose them.
Now, regarding the usb-driver. Honestly, I don't understand what you want to do with an extra URB.
As now, usb-audio driver handles as curr_ptr == hw_ptr. But, in reality, curr_ptr = hw_ptr - samples_in_urbs. So, in the case of USB-audio, hw_ptr is ahead of curr_ptr. (And the granularity is samples_in_urbs).
BTW: what's the relation between periods and URBs on usb-audio right now? I mean, the URBs should be exposed as periods to userspace, right? But they are not right now, are they? I mean, I can set all kinds of strange period settings for my USB device and I am pretty sure that this is not reflected in the URB size, or am I wrong?
It's a bit complicated due to the restriction on URB size for isochronous transfers. The driver tries to adjust the packet size and number of URBs as much as possible to fit the requested hw parameters. But the requested configuration doesn't always fit the requirements, and then the driver's wakeup time drifts a bit.
Whether the configuration fits perfectly isn't exposed to user space, unfortunately. I first tried to make a strict hw_params constraint, but then no apps worked at all. So the current code is a compromise: it accepts as much as possible, but doesn't work accurately if the configuration doesn't fit.
Takashi
On Fri, 13.06.08 20:46, Takashi Iwai (tiwai@suse.de) wrote:
As now, usb-audio driver handles as curr_ptr == hw_ptr. But, in reality, curr_ptr = hw_ptr - samples_in_urbs. So, in the case of USB-audio, hw_ptr is ahead of curr_ptr. (And the granularity is samples_in_urbs).
BTW: what's the relation between periods and URBs on usb-audio right now? I mean, the URBs should be exposed as periods to userspace, right? But they are not right now, are they? I mean, I can set all kinds of strange period settings for my USB device and I am pretty sure that this is not reflected in the URB size, or am I wrong?
It's a bit complicated due to the restriction of URB size for isochronous transfer. The driver tries to adjust the packet size and number of URBs as much as possible to fit with the requested hw parameters. But, the requested condition doesn't always fit with the requirement, then the driver's wakeup time drifts a bit.
Whether the condition perfectly fits isn't exposed to the user-space, unfortunately. I first tried to make a strict hw_params constraint, but then no apps ever worked. So, the current code is a compromise: it accepts as much as possible, but doesn't work accurately if the condition doesn't fit.
Grmbl. I'd strongly prefer if the kernel would give me the data I am asking for instead of emulating stuff just to get broken programs to work...
Any chance to change this behaviour? Maybe during runtime by setting some "I want things raw" flag?
Lennart
At Fri, 13 Jun 2008 21:01:15 +0200, Lennart Poettering wrote:
On Fri, 13.06.08 20:46, Takashi Iwai (tiwai@suse.de) wrote:
As now, usb-audio driver handles as curr_ptr == hw_ptr. But, in reality, curr_ptr = hw_ptr - samples_in_urbs. So, in the case of USB-audio, hw_ptr is ahead of curr_ptr. (And the granularity is samples_in_urbs).
BTW: what's the relation between periods and URBs on usb-audio right now? I mean, the URBs should be exposed as periods to userspace, right? But they are not right now, are they? I mean, I can set all kinds of strange period settings for my USB device and I am pretty sure that this is not reflected in the URB size, or am I wrong?
It's a bit complicated due to the restriction of URB size for isochronous transfer. The driver tries to adjust the packet size and number of URBs as much as possible to fit with the requested hw parameters. But, the requested condition doesn't always fit with the requirement, then the driver's wakeup time drifts a bit.
Whether the condition perfectly fits isn't exposed to the user-space, unfortunately. I first tried to make a strict hw_params constraint, but then no apps ever worked. So, the current code is a compromise: it accepts as much as possible, but doesn't work accurately if the condition doesn't fit.
Grmbl. I'd strongly prefer if the kernel would give me the data I am asking for instead of emulating stuff just to get broken programs to work...
Any chance to change this behaviour? Maybe during runtime by setting some "I want things raw" flag?
It was pretty hard in the current scheme, IIRC. Almost all hw_params setups would become "self-inconsistent".
Takashi
On Fri, 13.06.08 17:55, Takashi Iwai (tiwai@suse.de) wrote:
What about just providing three pointers: curr_ptr, hw_ptr and appl_ptr? curr_ptr corresponds to the point being played, and hw_ptr is the point where the data was already sent to h/w, and appl_ptr is the point where the data is filled by user. The above definitions are all combinations of these pointers.
I could agree to that. However, to be useful it must be possible to query those three pointers atomically, i.e. in a single call:
typedef struct snd_ptr_info {
    snd_uframes_t curr_ptr;
    snd_uframes_t hw_ptr;
    snd_uframes_t appl_ptr;
} snd_ptr_info_t;
int snd_pcm_get_ptr_info(snd_pcm_t *pcm, snd_ptr_info_t *i);
I really don't understand why we need to hide hw_ptr and appl_ptr in the current API. To me, exposing these points is much more straightforward.
I think I could subscribe to that.
Lennart
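To make the proposal concrete, here is a hedged sketch of how the two competing interpretations of snd_pcm_delay() fall out as simple differences of the three proposed pointers (the struct mirrors the hypothetical snd_ptr_info_t above, which does not exist in the ALSA API; wrap-around is ignored here for clarity):

```c
#include <assert.h>

/* Monotonically increasing frame counters, per the proposal above. */
struct ptr_info {
    unsigned long curr_ptr; /* frames actually played (audible) */
    unsigned long hw_ptr;   /* frames already handed to the hardware */
    unsigned long appl_ptr; /* frames written by the application */
};

/* Media-player interpretation: time until a sample written now is heard. */
static unsigned long total_delay(const struct ptr_info *i)
{
    return i->appl_ptr - i->curr_ptr;
}

/* WINE interpretation: frames left before the driver signals an underrun. */
static unsigned long frames_until_underrun(const struct ptr_info *i)
{
    return i->appl_ptr - i->hw_ptr;
}

/* The extra latency hidden behind the ring buffer (URBs, server buffer, ...). */
static unsigned long extra_delay(const struct ptr_info *i)
{
    return i->hw_ptr - i->curr_ptr;
}
```

Both camps then get what they want from a single atomic query, and the "extra time" the current API hides is simply hw_ptr - curr_ptr.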
At Fri, 13 Jun 2008 20:22:34 +0200, Lennart Poettering wrote:
On Fri, 13.06.08 17:55, Takashi Iwai (tiwai@suse.de) wrote:
What about just providing three pointers: curr_ptr, hw_ptr and appl_ptr? curr_ptr corresponds to the point being played, and hw_ptr is the point where the data was already sent to h/w, and appl_ptr is the point where the data is filled by user. The above definitions are all combinations of these pointers.
I could agree to that. However, to be useful it must be possible to query those three pointers atomically. i.e. in a single call:
typedef struct snd_ptr_info { snd_uframes_t curr_ptr; snd_uframes_t hw_ptr; snd_uframes_t appl_ptr; } snd_ptr_info_t;
int snd_pcm_get_ptr_info(snd_pcm_t *pcm, snd_ptr_info_t *i);
Yes, I thought of that, too.
OTOH, I agree that new additions to the current API should be avoided as much as possible. So I'm not 100% happy with, but almost fine with, the way Jaroslav suggested. In the end, you can retrieve the above three values from the return values of two API calls... (hmm, the abstracted values are not given, though).
Takashi
Lennart Poettering wrote:
I could agree to that. However, to be useful it must be possible to query those three pointers atomically. i.e. in a single call:
typedef struct snd_ptr_info { snd_uframes_t curr_ptr; snd_uframes_t hw_ptr; snd_uframes_t appl_ptr; } snd_ptr_info_t;
int snd_pcm_get_ptr_info(snd_pcm_t *pcm, snd_ptr_info_t *i);
We would have to be careful. The current snd_pcm_delay() sorts out all the wrap-around detection etc. for us. Having the raw pointers would require the application to do the wrap-around detection itself, whereas the driver has extra information that lets it detect wrap-around reliably.
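For illustration, the wrap-around handling an application would need might look like this (a sketch, assuming the counters wrap at the boundary value as alsa-lib's hw_ptr/appl_ptr bookkeeping does, and that the two samples are taken less than one full boundary apart; ptr_diff is a made-up name):

```c
#include <assert.h>

/* Difference between two frame counters that wrap at 'boundary',
 * assuming 'newer' is at most one wrap ahead of 'older'. */
static unsigned long ptr_diff(unsigned long newer, unsigned long older,
                              unsigned long boundary)
{
    if (newer >= older)
        return newer - older;
    /* 'newer' has wrapped past the boundary while 'older' has not. */
    return boundary - older + newer;
}
```

The catch James points out is that this assumption (at most one wrap between samples) is exactly the extra information the driver has and the application does not.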
participants (8)
- Colin Guthrie
- Eliot Blennerhassett
- James Courtier-Dutton
- Jaroslav Kysela
- Lennart Poettering
- Rafał Mużyło
- Takashi Iwai
- Tomas Carnecky