[alsa-devel] ALSA vs. non coherent DMA
Hi Takashi !
I'm bringing up an old thread as I'm just discovering that the problem still hasn't been fixed.
There seem to be a few issues with ALSA's current usage of mmap vs. non cache coherent architectures, such as embedded PowerPC's.
I can see at least two with a quick look at pcm_native.c, one I don't understand and one I think I do:
- The control/status mapping. Can you elaborate a bit on what this is actually doing and why it shouldn't be done on "non coherent" architectures ? Currently this -is- done on all powerpc's, whether they are coherent or not and I want to understand what the underlying issue is.
- The mmap of DMA pages. Here, the problem appears two fold:
* Use of virt_to_page() on virtual addresses returned by dma_alloc_coherent().
* Not using the appropriate page protection for a DMA coherent mapping to userspace.
It seems like you have solved that in part with implementing a generic dma_mmap_coherent() in the past that for some reason you never merged upstream (I can track that to about 2 years ago). Is there a reason ?
I think we need to at least apply a band-aid today as it's becoming a nasty issue for several non-coherent powerpc platforms. It could be in the form of implementing dma_mmap_coherent() and changing Alsa to use it with the appropriate ifdef, or just adding an ifdef CONFIG_PPC with the right code in there for now until a better solution is found.
It should be trivial though. Getting the PFN from the DMA address is easy if we have the dma handle and the virtual address, though that -is- definitely platform specific. I can implement a function for that if you need. As for the pgprot, we can come up with something like pgprot_mmap_dma(). Either that or I can fold it all in a powerpc wide implementation of a dma_mmap_coherent() like we envisioned initially.
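As a rough illustration of the "PFN from the DMA address" step, here is a minimal user-space sketch. It assumes a hypothetical platform where the bus address is simply the physical address plus a constant offset, and 4 KiB pages; `sketch_platform`, `sketch_dma_to_pfn` and `SKETCH_PAGE_SHIFT` are made-up names for this example, not kernel APIs. Platforms with an IOMMU or a non-linear translation would need something else entirely, which is why the lookup is platform specific.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u   /* assumption: 4 KiB pages */

/* Hypothetical platform description (assumption, not a kernel structure):
 * bus address = physical address + bus_offset. */
struct sketch_platform {
    uint64_t bus_offset;
};

/* Translate a DMA (bus) address into a page frame number on such a
 * linear-offset platform. */
static uint64_t sketch_dma_to_pfn(const struct sketch_platform *plat,
                                  uint64_t dma_handle)
{
    /* A handle below the offset cannot belong to this platform model. */
    assert(dma_handle >= plat->bus_offset);
    return (dma_handle - plat->bus_offset) >> SKETCH_PAGE_SHIFT;
}
```

With the PFN in hand, an mmap implementation no longer needs virt_to_page() on the address returned by dma_alloc_coherent().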
Let me know what approach is preferred here and I'll come up with patches ASAP. As far as I'm concerned, this is a bug and thus must be fixed now for .26 (and possibly backported to stable, if we can come up with a non invasive solution). I'm annoyed because it represents a trivial amount of code; this problem should have been fixed a long time ago.
Cheers, Ben.
Hi Ben,
thanks for signaling this long-standing issue again.
At Tue, 06 May 2008 10:08:28 +1000, Benjamin Herrenschmidt wrote:
Hi Takashi !
I'm bringing up an old thread as I'm just discovering that the problem still hasn't been fixed.
There seem to be a few issues with ALSA's current usage of mmap vs. non cache coherent architectures, such as embedded PowerPC's.
Yep. And on MIPS, obviously.
I can see at least two with a quick look at pcm_native.c, one I don't understand and one I think I do:
- The control/status mapping. Can you elaborate a bit on what this is
actually doing and why it shouldn't be done on "non coherent" architectures ?
This is a mmap of the data record to be shared in realtime with apps. The app updates its data pointer (appl_ptr) on the mmapped buffer while the driver updates the data (e.g. DMA position, called hwptr) on the fly on the mmapped record. Due to its real-time nature, it has to be coherent -- at least, it was a problem on ARM.
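To make the shared record more concrete, here is a minimal user-space sketch of the idea. The two structs are simplified, hypothetical stand-ins for the mmapped status and control records (the real ALSA structures carry more fields), and `playback_avail` is an illustrative helper, assuming positions are kept in the range [0, boundary) with boundary a multiple of the buffer size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the mmapped records. */
struct pcm_status_rec {            /* updated by the driver */
    volatile uint32_t hw_ptr;      /* DMA position, in frames */
};

struct pcm_control_rec {           /* updated by the application */
    volatile uint32_t appl_ptr;    /* application position, in frames */
};

/* Frames the application may still write for playback.
 * Both pointers wrap at `boundary`, so the difference is corrected
 * back into [0, boundary). */
static uint32_t playback_avail(const struct pcm_status_rec *st,
                               const struct pcm_control_rec *ct,
                               uint32_t buffer_size, uint32_t boundary)
{
    assert(buffer_size > 0 && boundary % buffer_size == 0);

    int64_t avail = (int64_t)st->hw_ptr + buffer_size - ct->appl_ptr;
    if (avail < 0)
        avail += boundary;
    else if (avail >= boundary)
        avail -= boundary;
    return (uint32_t)avail;
}
```

Because each side reads a value the other side writes, a stale view of either pointer directly yields a wrong fill level, which is why the two mappings must observe each other's stores.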
Currently this -is- done on all powerpc's, whether they are coherent or not and I want to understand what the underlying issue is.
It's actually buggy. Should check more precisely.
The mmap of DMA pages. Here, the problem appears two fold:
- Use of virt_to_page() on virtual addresses returned by
dma_alloc_coherent().
- Not using the appropriate page protection for a DMA coherent mapping
to userspace.
It seems like you have solved that in part with implementing a generic dma_mmap_coherent() in the past that for some reason you never merged upstream (I can track that to about 2 years ago). Is there a reason ?
IIRC, dma_mmap_coherent() cannot be implemented properly on some architectures. This is no big problem for ALSA as long as it returns an error or is compiled out via ifdef. But the fact that this API cannot be done for all archs discouraged arch maintainers, and the idea faded out again.
I think we need to at least apply a band-aid today as it's becoming a nasty issue for several non-coherent powerpc platforms. It could be in the form of implementing dma_mmap_coherent() and changing Alsa to use it with the appropriate ifdef, or just adding an ifdef CONFIG_PPC with the right code in there for now until a better solution is found.
Agreed.
It should be trivial though. Getting the PFN from the DMA address is easy if we have the dma handle and the virtual address, though that -is- definitely platform specific. I can implement a function for that if you need.
That'll be great. dma_mmap_coherent() and friends would be then really helpful to solve this issue.
As for the pgprot, we can come up with something like pgprot_mmap_dma(). Either that or I can fold it all in a powerpc wide implementation of a dma_mmap_coherent() like we envisioned initially.
In principle, pgprot_*() isn't actually needed on the driver side at all. We use pgprot_noncached() in one part, and it's for a hacky way to mmap the ioremapped pages. It's not available on all architectures, and I'm not sure whether it works on all PPC models although it's enabled right now: in include/sound/pcm.h,
/* mmap for io-memory area */
#if defined(CONFIG_X86) || defined(CONFIG_PPC) || defined(CONFIG_ALPHA)
#define SNDRV_PCM_INFO_MMAP_IOMEM SNDRV_PCM_INFO_MMAP
int snd_pcm_lib_mmap_iomem(struct snd_pcm_substream *substream, struct vm_area_struct *area);
#else
#define SNDRV_PCM_INFO_MMAP_IOMEM 0
#define snd_pcm_lib_mmap_iomem NULL
#endif
Highly likely we need to fix this, too. In the easiest way, disable this except for X86...
Let me know what approach is preferred here and I'll come up with patches ASAP. As far as I'm concerned, this is a bug and thus must be fixed now for .26 (and possibly backported to stable, if we can come up with a non invasive solution). I'm annoyed because it represents a trivial amount of code; this problem should have been fixed a long time ago.
As a pragmatic solution, as you mentioned above, we can disable or change the problematic code with ifdefs. At best, use dma_mmap_coherent() if it's available. If not, and if the arch is known to have not-simply-mappable DMA pages (like MIPS), we can simply disable the mmap feature.
Once we have dma_mmap_*() generally available, we can clean up the code.
thanks,
Takashi
Takashi Iwai wrote:
This is a mmap of the data record to be shared in realtime with apps. The app updates its data pointer (appl_ptr) on the mmapped buffer while the driver updates the data (e.g. DMA position, called hwptr) on the fly on the mmapped record. Due to its real-time nature, it has to be coherent -- at least, it was a problem on ARM.
This doesn't sound like a coherency problem to me, at least not one you'd find on PowerPC. Both the driver and the application run on the host CPU, so there shouldn't be any coherency problem. My understanding is that a "non coherent" platform is one where the host CPU isn't aware when a *hardware device* writes directly to memory, e.g. via DMA.
On Wed, May 7, 2008 at 8:22 AM, Timur Tabi <timur@freescale.com> wrote:
Takashi Iwai wrote:
This is a mmap of the data record to be shared in realtime with apps. The app updates its data pointer (appl_ptr) on the mmapped buffer while the driver updates the data (e.g. DMA position, called hwptr) on the fly on the mmapped record. Due to its real-time nature, it has to be coherent -- at least, it was a problem on ARM.
This doesn't sound like a coherency problem to me, at least not one you'd find on PowerPC. Both the driver and the application run on the host CPU, so there shouldn't be any coherency problem. My understanding is that a "non coherent" platform is one where the host CPU isn't aware when a *hardware device* writes directly to memory, e.g. via DMA.
IIRC, some ARMs have a different situation because the dcache is virtually instead of physically tagged. Therefore, the kernel mapping may not see data that has not been flushed out of the user space mappings. (Someone please correct me if I'm wrong).
Cheers, g.
On Wed, 2008-05-07 at 09:22 -0500, Timur Tabi wrote:
Takashi Iwai wrote:
This is a mmap of the data record to be shared in realtime with apps. The app updates its data pointer (appl_ptr) on the mmapped buffer while the driver updates the data (e.g. DMA position, called hwptr) on the fly on the mmapped record. Due to its real-time nature, it has to be coherent -- at least, it was a problem on ARM.
This doesn't sound like a coherency problem to me, at least not one you'd find on PowerPC. Both the driver and the application run on the host CPU, so there shouldn't be any coherency problem. My understanding is that a "non coherent" platform is one where the host CPU isn't aware when a *hardware device* writes directly to memory, e.g. via DMA.
Yes, precisely. I was about to make a reply here. There is some confusion at least in terminology, in Alsa. This is not DMA coherency, though it is a problem with virtually tagged data caches that some archs such as ARM have.
So this is ok for all PowerPC since they all have a physically tagged data cache.
The real problem -is- still the DMA coherency issue and as I see it, is two fold:
- mmap'ing of the result of dma_alloc_coherent() doesn't work. There are two issues at play here, one is the pgprot that -must- be set to uncached for such a mapping on non coherent architectures (and non coherent architectures only), and the other is our virt_to_page() that will puke on virtual addresses coming from dma_alloc_coherent().
- mmap'ing of SG lists for non coherent DMA. There the problem is a mixture of how Alsa allocates the SG buffers and the previous problem.
I think it's never valid to create an SG list with the output of dma_alloc_coherent though. We would need a dma_alloc_sg() for that...
sglists are made of pages, thus allocated with GFP, and later DMA mapped with dma_map_*. However, this brings a whole other set of issues/constraints, such as bounce buffering on some MMU less platforms if the memory happens to come out of the wrong place. Also, such mapped buffers are -not- coherent as they must not be modified via their virtual address while mapped, -unless- they are also mapped in kernel and/or user space (vmap & mmap) using some kind of "coherent" attributes such as pgprot_noncached (and provided that is possible at all in kernel space for archs like MIPS).
I don't have an easy answer there, it seems the bogosity roots deep in alsa, at least for the SG bits. For the non-SG bits, we can probably work around with an accessor to get the right pgprot and maybe some variant of virt_to_page() (dma_virt_to_page() ?) that would walk the kernel page tables to obtain the pfn.
Ben.
At Thu, 08 May 2008 07:53:11 +1000, Benjamin Herrenschmidt wrote:
On Wed, 2008-05-07 at 09:22 -0500, Timur Tabi wrote:
Takashi Iwai wrote:
This is a mmap of the data record to be shared in realtime with apps. The app updates its data pointer (appl_ptr) on the mmapped buffer while the driver updates the data (e.g. DMA position, called hwptr) on the fly on the mmapped record. Due to its real-time nature, it has to be coherent -- at least, it was a problem on ARM.
This doesn't sound like a coherency problem to me, at least not one you'd find on PowerPC. Both the driver and the application run on the host CPU, so there shouldn't be any coherency problem. My understanding is that a "non coherent" platform is one where the host CPU isn't aware when a *hardware device* writes directly to memory, e.g. via DMA.
Yes, precisely. I was about to make a reply here. There is some confusion at least in terminology, in Alsa. This is not DMA coherency, though it is a problem with virtually tagged data caches that some archs such as ARM have.
Right. The words should be corrected. Since the only way to get a certain non-cached map was the (ab-)use of dma_mmap_coherent(), such a confusing wording was chosen.
So this is ok for all PowerPC since they all have a physically tagged data cache.
OK, so that part should work as is for PPC.
The real problem -is- still the DMA coherency issue and as I see it, is two fold:
- mmap'ing of the result of dma_alloc_coherent() doesn't work. There
are two issues at play here, one is the pgprot that -must- be set to uncached for such a mapping on non coherent architectures (and non coherent architectures only), and the other is our virt_to_page() that will puke on virtual addresses coming from dma_alloc_coherent().
And dma_mmap_coherent() would be a solution for it, I suppose.
- mmap'ing of SG lists for non coherent DMA. There the problem is a
mixture of how Alsa allocates the SG buffers and the previous problem.
Yes.
I think it's never valid to create an SG list with the output of dma_alloc_coherent though. We would need a dma_alloc_sg() for that...
sglists are made of pages, thus allocated with GFP, and later DMA mapped with dma_map_*. However, this brings a whole other set of issues/constraints, such as bounce buffering on some MMU less platforms if the memory happens to come out of the wrong place. Also, such mapped buffers are -not- coherent as they must not be modified via their virtual address while mapped, -unless- they are also mapped in kernel and/or user space (vmap & mmap) using some kind of "coherent" attributes such as pgprot_noncached (and provided that is possible at all in kernel space for archs like MIPS).
I don't have an easy answer there, it seems the bogosity roots deep in alsa, at least for the SG bits. For the non-SG bits, we can probably work around with an accessor to get the right pgprot and maybe some variant of virt_to_page() (dma_virt_to_page() ?) that would walk the kernel page tables to obtain the pfn.
The vmap() in sound/core/sgbuf.c can be omitted by adding proper PCM callbacks (copy and silence) to handle SG-buffers. These are the only ones that access the linear buffer runtime->area.
Then we'll just need a proper mmap PCM callback that calls dma_mmap_coherent() for each SG page. Also, the default PCM mmap should be fixed to use dma_mmap_coherent() appropriately. That's all.
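The "one mapping call per SG page" idea can be modeled in user-space C roughly as follows. This is only a sketch under stated assumptions: every `sketch_*` name is hypothetical, the callback type stands in for a dma_mmap_coherent()-like call rather than reproducing its real signature, and pages are assumed to be 4 KiB with each page allocated separately.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Hypothetical per-page record of an SG buffer. */
struct sketch_sg_page {
    void    *cpu_addr;     /* kernel virtual address of the page */
    uint64_t dma_handle;   /* bus address of the page */
};

/* Hypothetical stand-in for a per-page mapping call: maps `pg` at byte
 * offset `user_off` of the user mapping. Returns 0 or a negative error. */
typedef int (*sketch_map_page_fn)(void *ctx, size_t user_off,
                                  const struct sketch_sg_page *pg);

/* Map the pages covering [offset, offset + len), one call per page. */
static int sketch_sg_mmap(const struct sketch_sg_page *table, size_t npages,
                          size_t offset, size_t len,
                          sketch_map_page_fn map_page, void *ctx)
{
    assert(offset % SKETCH_PAGE_SIZE == 0 && len % SKETCH_PAGE_SIZE == 0);

    size_t first = offset / SKETCH_PAGE_SIZE;
    size_t count = len / SKETCH_PAGE_SIZE;

    if (first > npages || count > npages - first)
        return -EINVAL;    /* requested range lies outside the buffer */

    for (size_t i = 0; i < count; i++) {
        int err = map_page(ctx, i * SKETCH_PAGE_SIZE, &table[first + i]);
        if (err)
            return err;    /* stop at the first page that fails to map */
    }
    return 0;
}

/* Demonstration callback: only counts how many pages would be mapped. */
static int sketch_count_page(void *ctx, size_t user_off,
                             const struct sketch_sg_page *pg)
{
    (void)user_off;
    (void)pg;
    ++*(size_t *)ctx;
    return 0;
}
```

Since each page is mapped through its own record, no linear vmap() of the whole buffer is needed for the user mapping in this model.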
So, what we really need is dma_mmap_coherent() implementations...
thanks,
Takashi
participants (4): Benjamin Herrenschmidt, Grant Likely, Takashi Iwai, Timur Tabi