[alsa-devel] How to pair Wine with ALSA? part2: buffer&period size / blocking / mmap (long)
Dear ALSA developers,
[Please refer to part 1: intro & underruns for the introductory text.] http://mailman.alsa-project.org/pipermail/alsa-devel/2011-August/042746.html
Topic T2 period and buffer size and time
Wine apps using the old winmm API cannot tell at waveOutOpen time "give me a large buffer" or "give me fast reaction to explosions".
R6 A good compromise is needed for when Wine opens ALSA on behalf of winmm.
Years later, the mmdevapi introduced with Vista provides "period" and duration parameters. It also implements sort of a dmix device: it appears to mix at a fixed rate (48000 or 44100 samples/sec, user settable) and to mix data in packets of 10ms. Incidentally, that's why MS claims 10ms latency. For compatibility, Wine should match that rate -- at least when accessing the "default" device.
Does that really translate to set_period_time? I doubt it. I wonder why Wine has so far insisted on particular ALSA buffer and period sizes. I tend to think that's none of its business.
It might make sense for Wine to use a periodic timer that calls snd_pcm_write (see T3 below). *That* timer should be set to mmdevapi's 10ms. Doesn't that translate to set_period_time_*max*(10ms)? However if ALSA wants/requires 50ms, why not?
Furthermore, if the app wants to use large buffers, why insist on a tiny period on the ALSA side? (MS appears to stick to 10ms packets regardless of the app's requested buffer size).
I expect that setting Wine's timer period to at least ALSA's allows it to actually find room for new data each turn.
So here's what makes sense to me:

  if (shared_mode && device == "default")
      set_period_time_near(10ms);
  else if (exclusive_mode) /* mmdevapi:Initialize receives period from the app */
      set_period_time_near(clamp(3ms, app's wish in mmdevapi:Initialize, 100ms)); /* 100ms is arbitrary, why not 1s? */
  snd_pcm_hw_params()
  wine_set_timer_rate(clamp(3ms, snd_pcm_get_period_time(), 0.5s))
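For illustration, the policy above written as pure C helpers. This is only a sketch: `requested_period_us` and `timer_period_us` are hypothetical names, not Wine or ALSA functions. In a real driver, the first result would be passed to snd_pcm_hw_params_set_period_time_near() before snd_pcm_hw_params(), and the second would be fed the value read back with snd_pcm_hw_params_get_period_time() afterwards. A result of 0 here means "leave the period to ALSA" (shared mode on a non-default device, which the pseudocode does not constrain).

```c
#include <assert.h>
#include <stdbool.h>

/* Durations in microseconds, like ALSA's *_time_* hw_params. */
#define MS(x) ((unsigned)(x) * 1000u)

/* Clamp v into [lo, hi]. */
unsigned clamp_us(unsigned lo, unsigned v, unsigned hi)
{
    assert(lo <= hi);
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}

/* Hypothetical: the period to request before snd_pcm_hw_params().
 * Returns 0 for "don't constrain the period, let ALSA choose". */
unsigned requested_period_us(bool exclusive, bool default_device,
                             unsigned app_period_us)
{
    if (!exclusive && default_device)
        return MS(10);                                  /* mmdevapi's 10ms packets */
    if (exclusive)                                      /* period from Initialize */
        return clamp_us(MS(3), app_period_us, MS(100)); /* 100ms is arbitrary */
    return 0;
}

/* Hypothetical: Wine's feeder-timer period, derived from the period ALSA
 * actually chose (read back after snd_pcm_hw_params()). */
unsigned timer_period_us(unsigned alsa_period_us)
{
    return clamp_us(MS(3), alsa_period_us, MS(500));
}
```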
Topic T2.b Duration / buffer size
mmdevapi's Initialize method receives a duration parameter as a hint towards either small latency or large buffering. One would think that it makes perfect sense to forward that to snd_pcm_set_buffer_time.
*However*, mmdevapi also requires Wine to hand out a pointer to a buffer that large (GetBuffer). Thus Wine must maintain a buffer that large (possibly even two of them, for reasons not relevant here). Now should ALSA really keep yet another buffer that large? Isn't that precisely why people have gripes with PA's 2s buffer?
I'm wondering whether Wine should solely rely on its periodic timer to regularly submit data and e.g. ask ALSA to use a buffer 3 times the period? 30ms (3x10ms) seems to play a role in MS systems. Dmix seems to prefer 4 to 6 times period size.
Unfortunately, snd_pcm_get_period_time may be known only after invoking snd_pcm_hw_params(), so I can't express: set_period_time_near(10ms); set_buffer_time_near(3 x actual_period);
Should I prefer set_buffer_time_near(3 x the period requested above), or simply not call set_buffer_time at all?
Finally, there's that snd_pcm_sw_params_set_avail_min whose purpose I cannot figure out. Should Wine call snd_pcm_sw_params_set_avail_min(1); or snd_pcm_sw_params_set_avail_min(0); or not at all?
Topic T3 blocking or not
Wine has traditionally used ALSA in non-blocking mode, which ALSA people recommended against (still?). Now suppose every write is preceded by avail_update, for the reasons I gave in part 1 (underruns).
Remember my example from part 1, slightly refined:

  if (snd_pcm_avail_update(&avail) > buffer_size)
      snd_pcm_reset() /* skip over late samples */
                      /* should be equivalent to snd_pcm_forward(avail) */
  written = snd_pcm_write(min(avail, frames));
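A toy model of that refined sequence, assuming `avail` holds what snd_pcm_avail_update() returned and that, for playback, snd_pcm_reset() leaves the whole buffer free. `plan_write` and `write_plan` are hypothetical names. One detail the model makes explicit: after skipping late samples, `avail` should be clamped to buffer_size before taking min(avail, frames); otherwise a large `frames` could still exceed the free space and a blocking write would wait.

```c
#include <assert.h>

/* Hypothetical result of one feeder-timer tick. */
typedef struct {
    int  skip_late;  /* 1: call snd_pcm_reset() first to drop late samples */
    long to_write;   /* frames to pass to snd_pcm_writei() */
} write_plan;

/* avail: value returned by snd_pcm_avail_update(); it can exceed
 * buffer_size after an underrun. Error returns (avail < 0) must be
 * handled before calling this. frames: data Wine has queued. */
write_plan plan_write(long avail, long buffer_size, long frames)
{
    write_plan p = { 0, 0 };

    assert(buffer_size > 0 && frames >= 0);
    if (avail > buffer_size) {
        p.skip_late = 1;
        avail = buffer_size;  /* assumed: after the reset, the whole buffer is free */
    }
    p.to_write = avail < frames ? avail : frames;
    return p;
}
```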
My understanding of the semantics is that write(<avail) will not block, so Wine could just as well do without SND_PCM_NONBLOCK.
In the NONBLOCK case, this could be simplified to

  written = snd_pcm_write(frames);

since I discovered that ALSA can write a little more than what avail returns, and NONBLOCK implies that it will not wait in an attempt to write a 2s data buffer, but will return written < frames instead.
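To make the non-blocking behaviour concrete, here is a toy buffer (not ALSA) that mimics what a non-blocking snd_pcm_writei() does when asked for more than fits: it accepts what it can and returns that count, or -EAGAIN when there is no room at all; a blocking write would wait instead. `toy_pcm` and `toy_writei_nonblock` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for a playback buffer (not ALSA). */
typedef struct {
    long buffer_size;  /* frames */
    long queued;       /* frames written but not yet played */
} toy_pcm;

/* Mimics non-blocking snd_pcm_writei(): accept as many frames as fit and
 * return that count, or -EAGAIN if there is no room at all. */
long toy_writei_nonblock(toy_pcm *pcm, long frames)
{
    long room = pcm->buffer_size - pcm->queued;
    long n;

    assert(frames >= 0);
    if (room <= 0)
        return -EAGAIN;
    n = frames < room ? frames : room;
    pcm->queued += n;
    return n;
}
```

The caller keeps the unwritten remainder (frames minus the return value) and retries it on the next timer tick.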
Actually, I've a slightly more elaborate sequence in mind. I've read that snd_pcm_open() may be delayed because of networking issues, which I want to avoid in the audio thread. Therefore I'm considering using:
  snd_pcm_open(SND_PCM_NONBLOCK);
  ... setup hw & sw_params
  snd_pcm_prepare()
  snd_pcm_nonblock(0); /* or should it be called before prepare? */
I.e. allow blocking while playing, which does not actually happen thanks to avail_update() and a push model based on periodic timers to feed data with pcm_write().
Note that so far, I've not questioned whether a timer-based push model actually makes sense for Wine with ALSA...
Topic T4 mmap
After I re-read the mmdevapi documentation, I believe that mmdevapi's GetBuffer/ReleaseBuffer rendering protocol may be compatible with ALSA's snd_pcm_mmap_begin+commit after all. It all depends on whether ALSA grants requests for sizes as large as buffer_size, with buffer_size up to 2 seconds * samples/sec.
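Part of the answer is that snd_pcm_mmap_begin() hands out a contiguous area: the granted frame count is limited both by the free space and by the distance to the end of the ring buffer. The toy function below models that rule; `contiguous_frames` is a hypothetical name. It suggests that a GetBuffer request for the full buffer_size can only be satisfied in one piece when the buffer is empty and the application pointer sits at offset 0.

```c
#include <assert.h>

/* Toy model of the frame count snd_pcm_mmap_begin() can grant:
 * min(requested, free frames, frames left before the ring wraps). */
long contiguous_frames(long appl_offset, long buffer_size, long avail,
                       long requested)
{
    long n = requested;

    assert(appl_offset >= 0 && appl_offset < buffer_size);
    if (n > avail)
        n = avail;                       /* not more than is free */
    if (n > buffer_size - appl_offset)
        n = buffer_size - appl_offset;   /* not past the end of the ring */
    return n;
}
```

For example, with a 2s buffer at 44100 Hz (88200 frames), an empty buffer and the pointer at offset 60000, only 28200 frames are granted in one piece.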
Hence, as an optimization, one could imagine a driver using mmap, like Winealsa used to have prior to the 2011 rewrite. Yet it always had the fallback without mmap, so let's concentrate on the non-mmap case initially and get that right.
Thank you for your help and for reading up to this point, Jörg Höhle
Joerg-Cyril.Hoehle@t-systems.com wrote:
Topic T2 period and buffer size and time
Wine apps using the old winmm API cannot tell at waveOutOpen time "give me a large buffer" or "give me fast reaction to explosions".
AFAIK the WinMM API allows an application to vary the number and size of submitted buffers arbitrarily, even while the stream is running. This was designed for hardware that is reprogrammed after each buffer anyway (ISA DMA) or that allows dynamic buffers (e.g. ICH AC'97, which was designed for WinMM).
R6 A good compromise is needed for when Wine opens ALSA on behalf of winmm.
Let's assume that ALSA is configured for a certain buffer size. If the application does not submit as much data, the ALSA buffer is never full. (This is not a problem, except that the time until an xrun happens is, of course, shorter. PulseAudio does exactly this if it wants to decrease latency dynamically.) OTOH, if the application submits more data than fits into the buffer, Wine must write the remaining data when some space has become available.
This suggests using a buffer as big as possible (for the hardware).
Years later, the mmdevapi introduced with Vista provides "period" and duration parameters. It also implements sort of a dmix device: it appears to mix at a fixed rate (48000 or 44100 samples/sec, user settable) and to mix data in packets of 10ms. Incidentally, that's why MS claims 10ms latency. For compatibility, Wine should match that rate -- at least when accessing the "default" device.
Does that really translate to set_period_time? I doubt it. [...] I expect that setting Wine's timer period to at least ALSA's allows it to actually find room for new data each turn.
The meaning of ALSA's periods is as follows:
1) The hardware is configured to generate an interrupt every period_size samples. (Please note that the buffer size is not necessarily an integer multiple of that.)
2) When ALSA is blocked (in snd_pcm_write* or in poll), it checks whether to wake up the application only when an interrupt arrives.
[...] Finally, there's that snd_pcm_sw_params_set_avail_min whose purpose I cannot figure out. Should Wine call snd_pcm_sw_params_set_avail_min(1); or snd_pcm_sw_params_set_avail_min(0); or not at all?
This is an additional restriction on when to wake up.
The device is considered 'ready' (i.e., the application is to be woken up so that new data can be written) if the number of available (free) samples in the buffer is at least avail_min. avail_min=0 does not make sense.
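A toy model of these rules for playback, counted in frames: interrupts arrive at every multiple of period_size, and only then is avail compared against avail_min. `wakeup_position` is a hypothetical helper, and it assumes avail grows by exactly one per frame played. It shows why avail_min=1 simply means "wake me at the next interrupt", while a larger avail_min can postpone the wake-up by whole periods.

```c
#include <assert.h>

/* Toy model (playback, frames): an interrupt occurs whenever hw_pos reaches a
 * multiple of period_size, and only then is avail compared with avail_min.
 * Returns the hw_pos at which a writer blocked in snd_pcm_writei()/poll()
 * would be woken. */
long wakeup_position(long hw_pos, long avail, long avail_min, long period_size)
{
    long target;

    assert(period_size > 0 && avail_min > 0 && hw_pos >= 0);
    if (avail >= avail_min)
        return hw_pos;                      /* already ready: no wait */
    target = hw_pos + (avail_min - avail);  /* assumes avail += 1 per frame played */
    /* round up to the next interrupt, i.e. the next period boundary */
    return (target + period_size - 1) / period_size * period_size;
}
```

With period_size=480 and hw_pos=100, avail_min=1 wakes the writer at 480, but avail_min=480 skips that interrupt (avail is only 380 there) and wakes it at 960.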
Topic T2.b Duration / buffer size
mmdevapi's Initialize method receives a duration parameter as a hint towards either small latency or large buffering. One would think that it makes perfect sense to forward that to snd_pcm_set_buffer_time.
*However*, mmdevapi also requires Wine to hand out a pointer to a buffer that large (GetBuffer). Thus Wine must maintain a buffer that large (possibly even two of them, for reasons not relevant here). Now should ALSA really keep yet another buffer that large?
Can't you hand out a pointer to ALSA's buffer?
I'm wondering whether Wine should solely rely on its periodic timer to regularly submit data and e.g. ask ALSA to use a buffer 3 times the period? 30ms (3x10ms) seems to play a role in MS systems. Dmix seems to prefer 4 to 6 times period size.
Now you are trying to do what PulseAudio does. Why not simply use PA instead of ALSA?
Unfortunately, snd_pcm_get_period_time may be known only after invoking snd_pcm_hw_params(),
Indeed.
so I can't express: set_period_time_near(10ms); set_buffer_time_near(3 x actual_period);
Yes you can:

  set_period_time_near(10ms);
  set_periods_near(3);
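Why set_periods_near(3) is the better way to say "3 x actual period": the period ALSA picks may differ from 10ms, and a buffer time computed from the requested period then need not be a whole number of actual periods. The sketch below uses an invented constraint (period sizes must be multiples of 256 frames) purely for illustration; real constraints come from the driver. In ALSA code the two requests would be snd_pcm_hw_params_set_period_time_near() and snd_pcm_hw_params_set_periods_near(), both called before snd_pcm_hw_params().

```c
#include <assert.h>

/* Frames corresponding to a duration in microseconds. */
unsigned long frames_from_us(unsigned rate, unsigned long us)
{
    return (unsigned long)((unsigned long long)rate * us / 1000000ull);
}

/* Invented constraint for illustration: period sizes must be multiples of
 * 'granularity' frames; a "near" request picks the closest such size. */
unsigned long nearest_multiple(unsigned long v, unsigned long granularity)
{
    unsigned long lo, hi;

    assert(granularity > 0);
    lo = v / granularity * granularity;
    hi = lo + granularity;
    if (lo == 0)
        return granularity;            /* no zero-sized periods */
    return (v - lo <= hi - v) ? lo : hi;
}
```

At 48000 Hz, 10ms is 480 frames, which the hypothetical hardware rounds to 512. A 30ms buffer (1440 frames) is then about 2.8 periods, whereas asking for 3 periods yields 1536 frames.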
Topic T3 blocking or not
Wine has traditionally used ALSA in non-blocking mode, which ALSA people recommended against (still?).
Non-blocking mode is perfectly fine if you're using poll() to wait for other events at the same time.
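For instance, an audio thread can wait for "room in the PCM buffer" and "stop request from the app" in a single poll() call. The sketch below uses plain POSIX poll() with a pipe standing in for the PCM; with ALSA, the descriptors would come from snd_pcm_poll_descriptors() and revents would be decoded with snd_pcm_poll_descriptors_revents(). `wait_for_events`, `demo_stop_request` and the flag names are hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

enum { AUDIO_READY = 1, CONTROL_READY = 2 };

/* Wait (up to timeout_ms) until the audio fd is writable and/or the control
 * fd is readable. Returns a mask of the flags above, 0 on timeout, -1 on error. */
int wait_for_events(int audio_fd, int control_fd, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = audio_fd,   .events = POLLOUT },
        { .fd = control_fd, .events = POLLIN  },
    };
    int n = poll(fds, 2, timeout_ms), ready = 0;

    if (n <= 0)
        return n;
    if (fds[0].revents & POLLOUT)
        ready |= AUDIO_READY;
    if (fds[1].revents & POLLIN)
        ready |= CONTROL_READY;
    return ready;
}

/* Usage: an empty pipe's write end stands in for the PCM, a second pipe
 * carries "stop" requests. Returns the events seen after a stop request. */
int demo_stop_request(void)
{
    int audio[2], control[2], ready;

    if (pipe(audio) != 0)
        return -1;
    if (pipe(control) != 0) {
        close(audio[0]);
        close(audio[1]);
        return -1;
    }
    if (write(control[1], "q", 1) != 1)        /* app asks the thread to stop */
        ready = -1;
    else
        ready = wait_for_events(audio[1], control[0], 0);
    close(audio[0]);
    close(audio[1]);
    close(control[0]);
    close(control[1]);
    return ready;
}
```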
I've read that snd_pcm_open() may be delayed because of networking issues, which I want to avoid in the audio thread.
This has nothing to do with networking; snd_pcm_open without NONBLOCK just waits for the device to be closed. This behaviour is there for historical reasons; in practice, you always want NONBLOCK.
Therefore I'm considering using:
  snd_pcm_open(SND_PCM_NONBLOCK);
  ... setup hw & sw_params
  snd_pcm_prepare()
  snd_pcm_nonblock(0); /* or should it be called before prepare? */
You can call nonblock(0) immediately after open. But if your code never actually blocks, why bother to set it?
Topic T4 mmap [...] Hence, as an optimization, one could imagine a driver using mmap,
What exactly gets optimized with mmap? Please note that snd_pcm_write* copies the data from the supplied buffer into ALSA's buffer; if your code does the same, it is not the slightest bit faster.
Regards, Clemens
Hi,
[I've reordered some paragraphs]
Clemens Ladisch wrote:
AFAIK the WinMM API allows an application to vary the number and size of submitted buffers arbitrarily, even while the stream is running.
The API is simply like pcm_write: "here are N frames at address X". The app allocates and owns the buffer memory at address X.
This was designed for hardware that is reprogrammed after each buffer anyway (ISA DMA) or that allows dynamic buffers (e.g. ICH AC'97, which was designed for WinMM).
You mean that HW can be told: "play N1 frames at address X1, to be followed without glitch with N2 frames at address X2"? And afterwards, "after you'll be done with X2, play N3 at address X3"?
This is very interesting. All I knew about were circular buffers. Hence in my mind, every API where the app supplies the data pointer would require copying from the app buffers into the circular HW one.
Can't you hand out a pointer to ALSA's buffer?
I am not aware of any way other than snd_pcm_mmap_begin to obtain a pointer from ALSA. Perhaps that's the root of my misunderstanding?
What exactly gets optimized with mmap? Please note that snd_pcm_write* copies the data from the supplied buffer into ALSA's buffer; if your code does the same, it is not the slightest bit faster.
I'm not sure I understand what you mean.
The mmdevapi is unlike WinMM: GetBuffer yields a buffer pointer from the OS and has some timing restrictions (unclear to me; MSDN talks about "buffer-processing periods"). That is not unlike snd_pcm_mmap_begin. This pointer could be the audio HW's write_ptr.
Hence, in theory, as the app asks mmdevapi for a buffer and fills it, the data can be played by the OS/HW (following ReleaseBuffer, a sort of snd_pcm_mmap_commit) without additional copying. Thus mmdevapi does not stand in the way of such optimizations, with either HW ring buffers or the dynamic buffers you mention, whereas WinMM cannot avoid copying with ring buffers.
What's expected to happen in Wine without mmap is:
1. app copies/writes data into Wine-managed GetBuffer pool
1.b because of ring-buffer management, there's even a little more copying in case of wrap-around.
2. ALSA pcm_write* copies from Wine's pool into HW buffer.
The optimization is:
0. app gets pointer from Wine's GetBuffer, which in turn gets it from ALSA's snd_pcm_mmap_begin.
1. app copies/writes data into ALSA's HW buffer.
The current state is even worse w.r.t. WinMM:
1. app copies/writes data into app buffers for use by WinMM.
2. Wine's WinMM copies data into Wine's mmdevapi buffer via GetBuffer.
3. pcm_write* copies data into ALSA's HW buffer.
Well, that's a lot of text about a situation that's becoming less and less likely. Desktop reality is: all apps (incl. video) use a mixer (dmix/PA/upmix) and data still needs to be mixed: no app writes into the audio HW ringbuffer. I'm not sure PA or dmix support mmap.
[Re: set_periods_near(3) / avail_min / periods explanations]
Thank you very much. The explanation is very welcome, as the ALSA documentation did not make this clear to me.
This suggests using a buffer as big as possible (for the hardware).
I'm a little reluctant about that, since I have myself seen some apps in Wine exhibit extreme loss of audio/video sync when PA was in the chain instead of dmix. I don't know the reason, but I will certainly test them again as I make progress.
I have no idea what one of those apps used for synchronisation; mmdevapi did not exist at the time the app was written.
- Did it use dsound? I know next to nothing about that other API.
- Did it use WinMM:waveOutGetPosition?
- How is waveOutGetPosition defined in the presence of huge latencies, i.e. is it implicitly based on a ~0 latency assumption?
- When are buffers submitted to waveOutWrite returned to the app?
  a) After the front-end processed them (sent them to the next stage)?
  b) After the back-end (speaker) played the last sample in them?
  The difference only matters once non-zero latencies are introduced into the audio chain, e.g. with networking, USB or simply PA's 2s buffers.
That's about WinMM. mmdevapi defines its API much more precisely -- lesson learned.
Note that I don't think b) can be retrofitted into a system with huge buffering (e.g. network, USB or PA buffers), because numerous apps simply work like this: allocate 3 buffers of 1/3 second worth of samples each and play them in turn. With 5s of latency, that breaks: all three buffers are in flight long before the first one would be returned, so the app starves. Hence Wine may need to implement a combination of both:
c) After the front end processed them *and* they would have been played by a zero-latency system, e.g. HW ring-buffers.
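Policy c) boils down to a simple predicate. A sketch, assuming the front end tracks a "done" flag per buffer and that the time elapsed since the stream started is measured with a monotonic clock; `may_return_buffer` is a hypothetical name, and positions are counted in frames from stream start.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policy c): a WinMM buffer may be returned to the app once
 *  (1) the front end has passed all its frames on to the next stage, and
 *  (2) a zero-latency device would have played its last frame, i.e. the
 *      wall-clock position has reached buffer_end_frame. */
bool may_return_buffer(bool frontend_done, unsigned long buffer_end_frame,
                       unsigned long elapsed_us, unsigned rate)
{
    unsigned long long ideal_pos =
        (unsigned long long)elapsed_us * rate / 1000000ull;

    return frontend_done && ideal_pos >= buffer_end_frame;
}
```

With 44100 Hz and three 1/3-second buffers (14700 frames each), the first buffer becomes returnable about 333ms after start, regardless of how much latency PA, USB or the network add downstream.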
You can call nonblock(0) immediately after open. But if your code never actually blocks, why bother to set it?
Somebody reported, and I've verified, that snd_pcm_drain always fails (with -11, i.e. -EAGAIN, IIRC) in non-blocking mode. In hindsight that's understandable, since the API says "wait for ...", which contradicts non-blocking mode. Only snd_pcm_drop works in non-blocking mode.
Now you are trying to do what PulseAudio does.
No surprise. Every audio framework does something similar.
Why not simply use PA instead of ALSA?
It is my understanding that the Wine project will eventually maintain at least 4 drivers, including PA. For the time being, however, it starts with 3: ALSA, OSS and MacOS CoreAudio. AFAIK, only when these work well enough (and the dynamic constraints on the mmdevapi are understood well enough) will another one be added. We are not there yet.
Thank you very much for your help and explanations, Jörg Höhle
Joerg-Cyril.Hoehle@t-systems.com wrote:
Clemens Ladisch wrote:
This was designed for hardware that is reprogrammed after each buffer anyway (ISA DMA) or that allows dynamic buffers (e.g. ICH AC'97, which was designed for WinMM).
You mean that HW can be told: "play N1 frames at address X1, to be followed without glitch with N2 frames at address X2"? And afterwards, "after you'll be done with X2, play N3 at address X3"?
Yes.
Can't you hand out a pointer to ALSA's buffer?
I am not aware of any way outside snd_pcm_mmap_begin to obtain a pointer from ALSA.
Well, you could hand out that pointer.
What exactly gets optimized with mmap? Please note that snd_pcm_write* copies the data from the supplied buffer into ALSA's buffer; if your code does the same, it is not the slightest bit faster.
I'm not sure I understand what you mean.
In practice, almost every program ends up with a function like this:
my_pcm_write(buffer, count)
{
    snd_pcm_mmap_begin();
    memcpy(mmap_buffer, buffer, count); /* and handle wraparound */
    snd_pcm_mmap_commit();
}
This would be no optimization over snd_pcm_writei().
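The "and handle wraparound" part spelled out, for a plain ring buffer of 16-bit mono samples: at most two memcpy() calls, one up to the end of the ring and one from its start. That is essentially the copy snd_pcm_writei() already performs for you, hence no gain. `ring_write` is a hypothetical helper; it does not check for free space, which in ALSA is what avail is for.

```c
#include <assert.h>
#include <string.h>

/* Copy count frames from src into a ring of ring_frames frames starting at
 * *pos, wrapping around at the end; advances *pos. */
size_t ring_write(short *ring, size_t ring_frames, size_t *pos,
                  const short *src, size_t count)
{
    size_t first;

    assert(*pos < ring_frames && count <= ring_frames);
    first = ring_frames - *pos;                  /* room before the end */
    if (first > count)
        first = count;
    memcpy(ring + *pos, src, first * sizeof *ring);
    memcpy(ring, src + first, (count - first) * sizeof *ring); /* wrapped part */
    *pos = (*pos + count) % ring_frames;
    return count;
}
```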
What's expected to happen in Wine without mmap is:
1. app copies/writes data into Wine-managed GetBuffer pool
1.b because of ring-buffer management, there's even a little more copying in case of wrap-around.
2. ALSA pcm_write* copies from Wine's pool into HW buffer.
The optimization is:
0. app gets pointer from Wine's GetBuffer, which in turn gets it from ALSA's snd_pcm_mmap_begin.
1. app copies/writes data into ALSA's HW buffer.
That would indeed be an optimization.
I'm not sure PA or dmix support mmap.
There are devices that don't have 'real' memory. However, the default device then uses the "plug" plugin that supports mmap emulation (with a timer that then writes data with the normal write call).
- How is waveOutGetPosition defined in the presence of huge latencies, i.e. is it implicitly based on a ~0 latency assumption?
- When are buffers submitted to waveOutWrite returned to the app?
  a) After the front-end processed them (sent them to the next stage)?
  b) After the back-end (speaker) played the last sample in it?
  The difference matters only as non-zero latencies are introduced into the audio chain, i.e. with networking, USB or simply PA's 2s buffers.
In the bad old times, there was no software processing, buffers were returned immediately after the hardware was finished with them, and there was no significant latency.
You can call nonblock(0) immediately after open. But if your code never actually blocks, why bother to set it?
Somebody reported and I've verified that snd_pcm_drain would always fail (with -11 IIRC) in non-blocking mode.
Well, if you do want to block, you indeed need blocking mode. :)
Regards, Clemens
Participants (2): Clemens Ladisch, Joerg-Cyril.Hoehle@t-systems.com