[alsa-devel] hw_params function and OSS emulation
I'm working on an ASoC driver, and I noticed that with OSS emulation enabled, my snd_pcm_ops.hw_params and snd_pcm_ops.hw_free are called multiple times when an OSS driver uses the OSS emulation. In my case, .hw_params is called *four* times, each time with a different DMA buffer size and number of periods.
The problem is that my driver allocates a DMA buffer in my .hw_params function. For now, I have it deallocate the buffers at the top of the function and then allocate new ones based on the new hw_params values.
This is really annoying. So I have a few questions:
1) Is there any way this can be fixed? Can't the OSS emulation code figure out what it needs and wait until it's done before it calls .hw_params?
As a solution to the DMA buffer deallocate/reallocate hack I'm using, would it be okay to move the actual allocations to snd_pcm_ops.prepare()? My .hw_params function will collect the relevant data and keep them in some private structure. Then when .prepare() is called, I do the actual buffer allocation.
2) Can I assume that .prepare() is called only once?
3) Can I assume that .hw_params() is never called after .prepare() is called?
On Tue, 21 Aug 2007, Timur Tabi wrote:
I'm working on an ASoC driver, and I noticed that with OSS emulation enabled, my snd_pcm_ops.hw_params and snd_pcm_ops.hw_free are called multiple times when an OSS driver uses the OSS emulation. In my case, .hw_params is called *four* times, each time with a different DMA buffer size and number of periods.
The problem is that my driver allocates a DMA buffer in my .hw_params function. For now, I have it deallocate the buffers at the top of the function and then allocate new ones based on the new hw_params values.
This is really annoying. So I have a few questions:
- Is there any way this can be fixed? Can't the OSS emulation code figure
out what it needs and wait until it's done before it calls .hw_params?
As a solution to the DMA buffer deallocate/reallocate hack I'm using, would it be okay to move the actual allocations to snd_pcm_ops.prepare()? My .hw_params function will collect the relevant data and keep them in some private structure. Then when .prepare() is called, I do the actual buffer allocation.
- Can I assume that .prepare() is called only once?
The drivers docs say: The difference from hw_params is that the prepare callback will be called at each time snd_pcm_prepare() is called, i.e. when recovered after underruns, etc.
Be careful that this callback will be called many times at each set up, too.
- Can I assume that .hw_params() is never called after .prepare() is called?
Good question! When prepare is called multiple times, will the hw_params values be the same each time? i.e., could one allocate memory and setup IOMMU resources for the dma buffer the first time it is called, then avoid doing so on subsequent calls?
At Tue, 21 Aug 2007 13:15:16 -0500, Timur Tabi wrote:
I'm working on an ASoC driver, and I noticed that with OSS emulation enabled, my snd_pcm_ops.hw_params and snd_pcm_ops.hw_free are called multiple times when an OSS driver uses the OSS emulation. In my case, .hw_params is called *four* times, each time with a different DMA buffer size and number of periods.
The problem is that my driver allocates a DMA buffer in my .hw_params function. For now, I have it deallocate the buffers at the top of the function and then allocate new ones based on the new hw_params values.
This is really annoying. So I have a few questions:
- Is there any way this can be fixed? Can't the OSS emulation code figure
out what it needs and wait until it's done before it calls .hw_params?
It's hard to fix. There is no distinction between hw_params setup and prepare setup in the OSS API. In the OSS API, the device has to be ready for use at any time. Also, the OSS API doesn't define the default parameters (there are tacit understandings, though). Because of these requirements, the emulation needs to set up hw_params and prepare each time you open the device, and each time any parameter-changing (OSS) ioctl is called.
As a solution to the DMA buffer deallocate/reallocate hack I'm using, would it be okay to move the actual allocations to snd_pcm_ops.prepare()? My .hw_params function will collect the relevant data and keep them in some private structure. Then when .prepare() is called, I do the actual buffer allocation.
That's not a good idea, because the prepare callback _is_ the one that is called more frequently than hw_params in the ALSA API.
- Can I assume that .prepare() is called only once?
No. It's called to restart the stream after xrun, suspend/resume, etc. hw_params isn't called for restarting the stream.
- Can I assume that .hw_params() is never called after .prepare() is called?
The call is allowed.
Takashi
Takashi Iwai wrote:
- Can I assume that .hw_params() is never called after .prepare() is called?
The call is allowed.
Ugh, so in other words, .hw_params() and .prepare() can be called any number of times in any order? That makes it impossible to optimize the creation of the DMA buffer. Currently, I have this code at the top of my .hw_params() function:
    if (substream->dma_buffer.addr) {
        dma_free_coherent(substream->pcm->dev, runtime_data->ld_buf_size,
                          runtime_data->link, runtime_data->ld_buf_phys);
        snd_dma_free_pages(&substream->dma_buffer);
    }
When I look at the AT91 ASoC driver as an example, I see it allocates a DMA buffer of the maximum allowed size (currently hard-coded to 32KB) in the .new function. To me, this is cheating, but it appears to be the only way to avoid doing what I'm currently doing. Is this the recommended approach?
At Wed, 22 Aug 2007 09:28:31 -0500, Timur Tabi wrote:
Takashi Iwai wrote:
- Can I assume that .hw_params() is never called after .prepare() is called?
The call is allowed.
Ugh, so in other words, .hw_params() and .prepare() can be called any number of times in any order?
Well, prepare is always after hw_params. But the number of times is unlimited.
That makes it impossible to optimize the creation of the DMA buffer. Currently, I have this code at the top of my .hw_params() function:
    if (substream->dma_buffer.addr) {
        dma_free_coherent(substream->pcm->dev, runtime_data->ld_buf_size,
                          runtime_data->link, runtime_data->ld_buf_phys);
        snd_dma_free_pages(&substream->dma_buffer);
    }
When I look at the AT91 ASoC driver as an example, I see it allocates a DMA buffer of the maximum allowed size (currently hard-coded to 32KB) in the .new function. To me, this is cheating, but it appears to be the only way to avoid doing what I'm currently doing. Is this the recommended approach?
Well, the buffer size is determined dynamically, and apps may sometimes want to change the buffer size during the run. So, pre-allocating a fixed-size buffer is a workaround.
Alternatively, you can remember the last allocated buffer-size and re-use the same buffer if the requested size is less than it. Since the buffer size change won't happen _so often_ (even via OSS emulation), this would work well in practice, too.
Takashi
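[Editor's sketch] The "remember the last allocated size" idea above might look roughly like the following. This is a user-space illustration only, assuming malloc() as a stand-in for the real DMA allocator; struct buf_cache and both functions are hypothetical names, not part of ALSA, and "less than" is read here as "less than or equal".

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-stream state; not an ALSA structure. */
struct buf_cache {
    void *area;       /* current buffer, or NULL if none yet */
    size_t size;      /* bytes allocated at 'area' */
    unsigned allocs;  /* count of real allocations, for illustration */
};

/*
 * Called from a hw_params-like path. Reuses the existing buffer when it
 * is already large enough; otherwise allocates a bigger one and frees
 * the old one. Returns 0 on success, -1 on allocation failure.
 */
int buf_cache_request(struct buf_cache *c, size_t bytes)
{
    void *p;

    if (c->area && bytes <= c->size)
        return 0;             /* big enough: reuse, no allocation */

    p = malloc(bytes);        /* stand-in for the DMA allocator */
    if (!p)
        return -1;            /* old buffer is left untouched */

    free(c->area);
    c->area = p;
    c->size = bytes;
    c->allocs++;
    return 0;
}

/* Called once when the stream is closed, not on every hw_free. */
void buf_cache_release(struct buf_cache *c)
{
    free(c->area);
    c->area = NULL;
    c->size = 0;
}
```

With this shape, four successive requests of shrinking size cost one allocation instead of four; only a request larger than anything seen so far triggers a new one.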
Takashi Iwai wrote:
Alternatively, you can remember the last allocated buffer-size and re-use the same buffer if the requested size is less than it. Since the buffer size change won't happen _so often_ (even via OSS emulation), this would work well in practice, too.
Hmm... Neither of these options is really that great. I think I like the idea of pre-allocating the buffer in .new. What is a reasonable maximum size for the DMA buffer? 32KB seems small to me. When I was testing OSS emulation, the first call to .hw_params passed a DMA buffer size of over 1MB!
At Wed, 22 Aug 2007 10:02:11 -0500, Timur Tabi wrote:
Takashi Iwai wrote:
Alternatively, you can remember the last allocated buffer-size and re-use the same buffer if the requested size is less than it. Since the buffer size change won't happen _so often_ (even via OSS emulation), this would work well in practice, too.
Hmm... Neither of these options is really that great.
... but no other way :) The buffer size is dynamically configurable, per design. That's all.
I think I like the idea of pre-allocating the buffer in .new. What is a reasonable maximum size for the DMA buffer? 32KB seems small to me.
AFAIK, 64kB seems sufficient in most cases.
When I was testing OSS emulation, the first call to .hw_params passed a DMA buffer size of over 1MB!
The driver should deny a too large size if it's unrealistic. But 1MB is realistic for many systems. So, the number depends on the system.
Takashi
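[Editor's sketch] The pre-allocation approach, combined with denying oversized requests, could be modelled as below. This is a user-space illustration under stated assumptions: malloc() stands in for the DMA allocator, the 64kB limit is taken from the figure mentioned above, and every name (struct pre_stream, pre_new, pre_hw_params, pre_free) is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed limit, following the 64kB figure from the discussion. */
#define PRE_MAX_BUFFER_BYTES (64 * 1024)

/* Hypothetical per-stream state; not an ALSA structure. */
struct pre_stream {
    void *area;        /* allocated once, at creation time */
    size_t max_bytes;  /* size of the pre-allocation */
    size_t cur_bytes;  /* size selected by the last hw_params call */
};

/* Plays the role of the .new callback: allocate the maximum once. */
int pre_new(struct pre_stream *s)
{
    s->area = malloc(PRE_MAX_BUFFER_BYTES);
    if (!s->area)
        return -ENOMEM;
    s->max_bytes = PRE_MAX_BUFFER_BYTES;
    s->cur_bytes = 0;
    return 0;
}

/*
 * Plays the role of .hw_params: no allocation, only bookkeeping.
 * A request larger than the pre-allocation is denied.
 */
int pre_hw_params(struct pre_stream *s, size_t bytes)
{
    if (bytes == 0 || bytes > s->max_bytes)
        return -EINVAL;
    s->cur_bytes = bytes;
    return 0;
}

/* Plays the role of the matching free callback. */
void pre_free(struct pre_stream *s)
{
    free(s->area);
    s->area = NULL;
    s->max_bytes = 0;
    s->cur_bytes = 0;
}
```

In a real driver the limit would normally also be advertised through the buffer size fields of snd_pcm_hardware, so that an over-large request is constrained before it reaches the callback; the explicit check here only makes the policy visible.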
Takashi Iwai wrote:
The driver should deny a too large size if it's unrealistic. But 1MB is realistic for many systems. So, the number depends on the system.
Are there any popular applications that really want a DMA buffer larger than 64KB?
On a side note, I notice that ALSA has some scatter-gather support. If an application uses S/G, I presume the concept of a DMA buffer size doesn't even apply?
At Wed, 22 Aug 2007 10:48:44 -0500, Timur Tabi wrote:
Takashi Iwai wrote:
The driver should deny a too large size if it's unrealistic. But 1MB is realistic for many systems. So, the number depends on the system.
Are there any popular applications that really want a DMA buffer larger than 64KB?
It's something like an urban legend: the bigger, the better. A big buffer seems to be preferred by applications that require robust operation. Such a requirement makes a little sense, although multi-threading and adjusting the RT priority would give you far better results.
On a side note, I notice that ALSA has some scatter-gather support. If an application uses S/G, I presume the concept of a DMA buffer size doesn't even apply?
The SG-buffer isn't for applications but for drivers. From the application viewpoint, the buffer looks linear.
Takashi
Takashi Iwai wrote:
On a side note, I notice that ALSA has some scatter-gather support. If an application uses S/G, I presume the concept of a DMA buffer size doesn't even apply?
The SG-buffer isn't for applications but for drivers. From the application viewpoint, the buffer looks linear.
Yes, but in SG, the application allocates the DMA buffer, and a list of physical addresses is passed to the driver. Normally, the driver allocates the DMA buffer and passes a virtual address to the app. Therefore, the DMA buffer processing for SG is completely different than for driver-allocated buffers, including the concept of a DMA buffer length.
For normal DMA buffers, I would specify a limit of 64KB just because of the problems with multiple calls to .hw_params() and .prepare(). I could support much larger DMA buffers, but not with the way ALSA calls the driver.
For SG, I haven't looked at the API, but I assume that the ALSA gives a list of physical addresses to the driver only *once*. In this case, my DMA buffer limitations are much larger, since I just give the list of addresses to my hardware and it does the rest. So I would hope that the application *doesn't* use the DMA buffer size limit in my snd_pcm_hardware structure.
At Thu, 23 Aug 2007 12:36:43 -0500, Timur Tabi wrote:
Takashi Iwai wrote:
On a side note, I notice that ALSA has some scatter-gather support. If an application uses S/G, I presume the concept of a DMA buffer size doesn't even apply?
The SG-buffer isn't for applications but for drivers. From the application viewpoint, the buffer looks linear.
Yes, but in SG, the application allocates the DMA buffer, and a list of physical addresses is passed to the driver.
Err, usually, application = user-space application. Do you mean really this?
Takashi
Takashi Iwai wrote:
Err, usually, application = user-space application. Do you mean really this?
Yes. The application allocates a buffer, locks it down, and passes it to the kernel. Since the buffer was allocated in user space, it is virtually contiguous but not physically contiguous, hence the need for a list of physical addresses. To the hardware, the buffer appears as a collection of scattered buffers. Hence, scatter/gather.
Like I said, I haven't looked at the ALSA API for S/G, so I may be talking about an implementation of S/G that does not apply to ALSA.
At Thu, 23 Aug 2007 13:43:40 -0500, Timur Tabi wrote:
Takashi Iwai wrote:
Err, usually, application = user-space application. Do you mean really this?
Yes. The application allocates a buffer, locks it down, and passes it to the kernel. Since the buffer was allocated in user space, it is virtually contiguous but not physically contiguous, hence the need for a list of physical addresses. To the hardware, the buffer appears as a collection of scattered buffers. Hence, scatter/gather.
Hm, then it must be a really special application. I've never seen linux audio apps doing such things...
The problem in your scenario is that the buffers allocated from user-space are not always DMA-able for devices, thus not always usable for the hardware.
Like I said, I haven't looked at the ALSA API for S/G, so I may be talking about an implementation of S/G that does not apply to ALSA.
The ALSA SG-buffer helper is simply to allocate individual pages and gather as a virtually linear buffer. From the user-space, it's nothing but a normal linear buffer.
The helper is specific to devices that can handle SG-buffers with kernel PAGE grains, and for architectures with vmap support.
Anyway, the hw_params implementation is basically free for drivers. The driver can use its own SG-buffer handler if the given SG-buffer helper doesn't match with the requirement.
Takashi
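[Editor's sketch] The idea of "individual pages gathered into a linear buffer" can be illustrated in user space as follows. This is not the ALSA SG-buffer helper: the kernel helper maps the pages into one virtually linear range, whereas this sketch only translates a linear offset into a (page, offset) pair. The 4096-byte page size and all names (struct sg_buf, sg_buf_alloc, sg_buf_at, sg_buf_free) are assumptions for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

#define SG_PAGE_SIZE 4096u  /* assumed page size */

/* Hypothetical table of separately allocated pages. */
struct sg_buf {
    void **pages;   /* one pointer per page; pages need not be adjacent */
    size_t npages;
};

void sg_buf_free(struct sg_buf *b)
{
    size_t i;

    for (i = 0; i < b->npages; i++)
        free(b->pages[i]);
    free(b->pages);
    b->pages = NULL;
    b->npages = 0;
}

/* Allocate enough single pages to cover 'bytes'. Returns 0 or -1. */
int sg_buf_alloc(struct sg_buf *b, size_t bytes)
{
    size_t i, n = (bytes + SG_PAGE_SIZE - 1) / SG_PAGE_SIZE;

    b->npages = 0;
    b->pages = calloc(n, sizeof(*b->pages));
    if (!b->pages)
        return -1;
    for (i = 0; i < n; i++) {
        b->pages[i] = malloc(SG_PAGE_SIZE);
        if (!b->pages[i]) {
            sg_buf_free(b);   /* frees the pages allocated so far */
            return -1;
        }
        b->npages = i + 1;
    }
    return 0;
}

/* Translate a linear byte offset into an address inside the right page. */
unsigned char *sg_buf_at(const struct sg_buf *b, size_t offset)
{
    size_t page = offset / SG_PAGE_SIZE;

    if (page >= b->npages)
        return NULL;
    return (unsigned char *)b->pages[page] + offset % SG_PAGE_SIZE;
}
```

The point of the design is that no single allocation is larger than one page, so it does not depend on finding a large physically contiguous region, while the caller still addresses the buffer by a plain linear offset.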
Takashi Iwai wrote:
At Thu, 23 Aug 2007 13:43:40 -0500, Timur Tabi wrote:
Takashi Iwai wrote:
Err, usually, application = user-space application. Do you mean really this?
Yes. The application allocates a buffer, locks it down, and passes it to the kernel. Since the buffer was allocated in user space, it is virtually contiguous but not physically contiguous, hence the need for a list of physical addresses. To the hardware, the buffer appears as a collection of scattered buffers. Hence, scatter/gather.
Hm, then it must be a really special application. I've never seen linux audio apps doing such things...
That could be. I'm using a generic definition of S/G. Perhaps I should read the ALSA API before posting further. :-)
The problem in your scenario is that the buffers allocated from user-space are not always DMA-able for devices, thus not always usable for the hardware.
It's possible for an application to allocate DMA-able memory. I think you need to allocate it normally then use mlock() on it, but it's been a while so I'm not sure.
The ALSA SG-buffer helper is simply to allocate individual pages and gather as a virtually linear buffer. From the user-space, it's nothing but a normal linear buffer.
So what is the value of having the driver allocate a physically discontiguous buffer when it can easily allocate a contiguous one? Are there systems where allocating a 32KB physically-contiguous buffer is too hard?
At Thu, 23 Aug 2007 14:13:08 -0500, Timur Tabi wrote:
Takashi Iwai wrote:
At Thu, 23 Aug 2007 13:43:40 -0500, Timur Tabi wrote:
Takashi Iwai wrote:
Err, usually, application = user-space application. Do you mean really this?
Yes. The application allocates a buffer, locks it down, and passes it to the kernel. Since the buffer was allocated in user space, it is virtually contiguous but not physically contiguous, hence the need for a list of physical addresses. To the hardware, the buffer appears as a collection of scattered buffers. Hence, scatter/gather.
Hm, then it must be a really special application. I've never seen linux audio apps doing such things...
That could be. I'm using a generic definition of S/G. Perhaps I should read the ALSA API before posting further. :-)
Well, a generic definition of SG in the "kernel" driver area is what the kernel does, not what user-space does :)
The problem in your scenario is that the buffers allocated from user-space are not always DMA-able for devices, thus not always usable for the hardware.
It's possible for an application to allocate DMA-able memory. I think you need to allocate it normally then use mlock() on it, but it's been a while so I'm not sure.
The DMA-able area is sometimes limited (especially with old hardware). For example, ISA requires memory below 16MB, which is not available via normal syscalls from user-space.
The ALSA SG-buffer helper is simply to allocate individual pages and gather as a virtually linear buffer. From the user-space, it's nothing but a normal linear buffer.
So what is the value of having the driver allocate a physically discontiguous buffer when it can easily allocate a contiguous one? Are there systems where allocating a 32KB physically-contiguous buffer is too hard?
Sometimes even 32kB allocation fails when the whole memory is fragmented.
And, as you noticed, apps sometimes want a really large area, over 1MB. Contiguous pages of such a size are hard to allocate at a later stage.
Takashi
On Thu, 23 Aug 2007, Takashi Iwai wrote:
At Thu, 23 Aug 2007 13:43:40 -0500, Timur Tabi wrote:
Takashi Iwai wrote:
Err, usually, application = user-space application. Do you mean really this?
Yes. The application allocates a buffer, locks it down, and passes it to the kernel. Since the buffer was allocated in user space, it is virtually contiguous but not physically contiguous, hence the need for a list of physical addresses. To the hardware, the buffer appears as a collection of scattered buffers. Hence, scatter/gather.
Hm, then it must be a really special application. I've never seen linux audio apps doing such things...
v4l2 supports both kinds of memory-mapped DMA. Normally, the DMA buffer is allocated by the kernel and the user-space application is given an address to memory map. This way DMA-able memory can be allocated. For some devices, e.g. zr36060, it might even need to be physically contiguous.
The other kind is called "user pointer", and in this case user space allocates the memory and passes a pointer to the kernel. It's defined in the API, but I think few (any?) drivers actually support it. One could use this to place the DMA buffer in a shared memory segment, or for some other reason the application wants the buffer at a certain address.
On Wed, 22 Aug 2007, Takashi Iwai wrote:
At Wed, 22 Aug 2007 09:28:31 -0500, Timur Tabi wrote:
Takashi Iwai wrote:
- Can I assume that .hw_params() is never called after .prepare() is called?
The call is allowed.
Ugh, so in other words, .hw_params() and .prepare() can be called any number of times in any order?
Well, prepare is always after hw_params. But the number of times is unlimited.
Does that mean that once prepare is called, the hw_params can no longer change? Or do you mean that hw_params must always be called before prepare, but it's possible to call hw_params again after prepare, provided one then calls prepare again before trigger?
If that is the case, wouldn't it work to allocate the DMA buffer the first time prepare is called?
At Wed, 22 Aug 2007 13:25:04 -0700 (PDT), Trent Piepho wrote:
On Wed, 22 Aug 2007, Takashi Iwai wrote:
At Wed, 22 Aug 2007 09:28:31 -0500, Timur Tabi wrote:
Takashi Iwai wrote:
- Can I assume that .hw_params() is never called after .prepare() is called?
The call is allowed.
Ugh, so in other words, .hw_params() and .prepare() can be called any number of times in any order?
Well, prepare is always after hw_params. But the number of times is unlimited.
Does that mean that once prepare is called, the hw_params can no longer change? Or do you mean that hw_params must always be called before prepare, but it's possible to call hw_params again after prepare, provided one then calls prepare again before trigger?
The latter case.
If that is the case, wouldn't it work to allocate the DMA buffer the first time prepare is called?
Allocating the buffer in the prepare callback would actually work. I never said that it doesn't. It's just not recommended, simply because prepare is called more often than hw_params, even without changing the parameters.
Takashi
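[Editor's sketch] The ordering rule confirmed above (hw_params before prepare; hw_params allowed again after prepare, provided prepare is called again before trigger) can be written down as a small state machine. This is a simplified model for illustration, not the ALSA core's actual state handling; the state names and seq_* functions are hypothetical.

```c
/* Simplified stream states; the real PCM state set is larger. */
enum seq_state { SEQ_OPEN, SEQ_SETUP, SEQ_PREPARED, SEQ_RUNNING };

struct seq_pcm {
    enum seq_state state;
    unsigned hw_params_calls;
    unsigned prepare_calls;
};

/* hw_params: any number of times, but not while running. */
int seq_hw_params(struct seq_pcm *p)
{
    if (p->state == SEQ_RUNNING)
        return -1;
    p->hw_params_calls++;
    p->state = SEQ_SETUP;     /* a new setup invalidates any earlier prepare */
    return 0;
}

/* prepare: any number of times, but only after hw_params. */
int seq_prepare(struct seq_pcm *p)
{
    if (p->state == SEQ_OPEN || p->state == SEQ_RUNNING)
        return -1;
    p->prepare_calls++;
    p->state = SEQ_PREPARED;
    return 0;
}

/* trigger start: only from the prepared state. */
int seq_trigger_start(struct seq_pcm *p)
{
    if (p->state != SEQ_PREPARED)
        return -1;
    p->state = SEQ_RUNNING;
    return 0;
}

/* stop (or xrun): the stream needs prepare again before restarting. */
int seq_stop(struct seq_pcm *p)
{
    if (p->state != SEQ_RUNNING)
        return -1;
    p->state = SEQ_SETUP;
    return 0;
}
```

In this model a driver that allocates in prepare would have to compare the current parameters against those used for the existing buffer on every call, since prepare may follow either a fresh hw_params or a plain stop.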
Takashi Iwai wrote:
Allocating the buffer in the prepare callback would actually work. I never said that it doesn't. It's just not recommended, simply because prepare is called more often than hw_params, even without changing the parameters.
What is the driver supposed to do on the second call to .prepare()? In other words, what is the point of calling it multiple times in a row? Once my driver is prepared to start, how could it become more prepared?
At Wed, 22 Aug 2007 16:08:49 -0500, Timur Tabi wrote:
Takashi Iwai wrote:
Allocating the buffer in the prepare callback would actually work. I never said that it doesn't. It's just not recommended, simply because prepare is called more often than hw_params, even without changing the parameters.
What is the driver supposed to do on the second call to .prepare()? In other words, what is the point of calling it multiple times in a row? Once my driver is prepared to start, how could it become more prepared?
The purpose of the prepare callback is to make the PCM stream ready to start. (The trigger callbacks are supposed to just trigger, not prepare the stream.) Usually, once the PCM has been triggered, the registers are no longer in the same state as at the beginning. If you want to stop and restart the stream, you'll very likely need to reset the registers again. That's why prepare is called again. (Or is your driver always go-or-die? :)
Takashi
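[Editor's sketch] The division of labour described above (prepare resets the hardware to a known start state, trigger only starts or stops it) can be modelled with a fake register block. The registers, the FAKE_CTRL_RUN bit and all function names are invented for illustration and do not correspond to any real device.

```c
#include <stdint.h>

#define FAKE_CTRL_RUN 0x1u  /* hypothetical "DMA running" bit */

/* Hypothetical device registers. */
struct fake_regs {
    uint32_t ctrl;  /* control register */
    uint32_t pos;   /* current position, advanced by the hardware */
};

/* prepare: put the registers back into their start-of-stream state. */
void fake_prepare(struct fake_regs *r)
{
    r->ctrl = 0;
    r->pos = 0;
}

/* trigger: only flips the run bit, does no other setup. */
void fake_trigger_start(struct fake_regs *r)
{
    r->ctrl |= FAKE_CTRL_RUN;
}

void fake_trigger_stop(struct fake_regs *r)
{
    r->ctrl &= ~FAKE_CTRL_RUN;
}

/* Simulates the hardware consuming 'frames' frames while running. */
void fake_hw_tick(struct fake_regs *r, uint32_t frames)
{
    if (r->ctrl & FAKE_CTRL_RUN)
        r->pos += frames;
}
```

After a start, some ticks and a stop, the position register no longer holds its initial value; calling fake_prepare() again is what makes a clean restart possible, and calling it twice in a row is harmless.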
Takashi Iwai wrote:
The purpose of the prepare callback is to make the PCM stream ready to start. (The trigger callbacks are supposed to just trigger, not prepare the stream.) Usually, once the PCM has been triggered, the registers are no longer in the same state as at the beginning. If you want to stop and restart the stream, you'll very likely need to reset the registers again. That's why prepare is called again. (Or is your driver always go-or-die? :)
I thought you meant that the .prepare() function can be called multiple times *in a row*, like .hw_params() is. But I guess that's not what you meant.
participants (3): Takashi Iwai, Timur Tabi, Trent Piepho