[alsa-devel] Audio buffer mmap-ing on sh arch vs. cache aliasing problem
Hi Guys,
Recently I was investigating an issue with capturing audio using USB Audio Class device on a sh4-based board. "Bad voice quality" was reported...
Finally I have traced the problem to something which is (unfortunately) well known to sh developers as a D-cache aliasing (or synonym) problem.
Briefly speaking: due to some MMU design decisions, one can have two different virtual address pointing to the same physical location, which is fine, but going via different cache slots! So if there was a value of "0" in the memory and user "A" will write "1" there, user "B" will still read "0"...
The solution is to ensure all TLB entries (so virtual memory areas) are beginning from a 16kB-aligned virtual address. Otherwise it is necessary to flush the cache between accesses from "A" and "B" sides.
And now. The USB Audio Class driver (sound/usb/usbaudio.c) is allocating the sound buffer like this...
static int snd_pcm_alloc_vmalloc_buffer(struct snd_pcm_substream *subs, size_t size) { [...] runtime->dma_area = vmalloc(size); [...] }
... and vmalloc will return a page(4k)-aligned pointer, possibly not 16k-aligned one. This is the source of all evil ;-)
When using RW transfers everything is fine, as data is memcopied between two independent buffers.
In a case of MMAP mode we will end up with such a situation:
1. The driver will memcpy data from an URB to the buffer. It will populate several cache lines. Let's call it a "kernel cache". 2. As the library has mapped the non-16k-aligned address via different cache line ("user cache"), it will read some rubbish from the physical memory, populating the "user cache" by the way. 3. Some time later the "kernel cache" is flushed, writing the data to the memory. 4. Some new data from URB is entering "kernel cache". 5. The library will access mmap-ed area again, going via the "user cache", which may or may not reflect the correct data (depending on the fact was the "user cache" flushed or not) etc.
Of course this cycle is completely not deterministic, so some of the "kernel cache" lines will be flushed before being accessed from user space, other not... The final effect is... hmm... bizarre :-) First, of course, you will get a (probably loud) glitch (rubbish from buffer's underlying memory, before the first valid data is written back), and then something that could be described as an "echo" ;-) I mean - you are capturing "1 2 3 4 5 6 7..." and the result is (G stands for Glitch ;-) "G 1 23 3 4 4 6 67..."
As a quick-and-dirty work-around I have modified snd_pcm_alloc_vmalloc_buffer() to allocate always 12kB more and then use 16k-aligned pointer:
static int snd_pcm_alloc_vmalloc_buffer(struct snd_pcm_substream *subs, size_t size) { struct snd_pcm_runtime *runtime = subs->runtime; if (runtime->dma_addr) { if (runtime->dma_bytes >= size) return 0; /* already large enough */ vfree((void *)runtime->dma_addr); } runtime->dma_addr = (unsigned long)vmalloc(size + 12 * 1024); if (!runtime->dma_addr) return -ENOMEM; runtime->dma_area = (void *)ALIGN(runtime->dma_addr, 16 * 1024); runtime->dma_bytes = size; return 0; }
Of course it cannot be regarded as anything more as a hack. And definitely not as a solution... So my question is:
Any idea how the "proper solution" should look like? I see no obvious method to get an arch-independent "mmap-compliant" buffer. This problem was found on a sh arch, using an USB Audio Class device, however out there some other architectures are suffering from this MMU "feature" as well (ie. see http://docs.hp.com/en/B3906-90006/ch07s09.html) and possibly other drivers could behave "wrongly" (quoted as the driver is actually innocent...)
To be honest I just have absolutely no idea what to do with this all! :-O
I hope I was clear enough in the description... Any feedback, advise, idea etc. will be more than appreciated.
Cheers
Paweł
At Thu, 04 Sep 2008 12:23:54 +0100, Pawel MOLL wrote:
Hi Guys,
Recently I was investigating an issue with capturing audio using USB Audio Class device on a sh4-based board. "Bad voice quality" was reported...
Finally I have traced the problem to something which is (unfortunately) well known to sh developers as a D-cache aliasing (or synonym) problem.
Briefly speaking: due to some MMU design decisions, one can have two different virtual address pointing to the same physical location, which is fine, but going via different cache slots! So if there was a value of "0" in the memory and user "A" will write "1" there, user "B" will still read "0"...
The solution is to ensure all TLB entries (so virtual memory areas) are beginning from a 16kB-aligned virtual address. Otherwise it is necessary to flush the cache between accesses from "A" and "B" sides.
And now. The USB Audio Class driver (sound/usb/usbaudio.c) is allocating the sound buffer like this...
static int snd_pcm_alloc_vmalloc_buffer(struct snd_pcm_substream *subs, size_t size) { [...] runtime->dma_area = vmalloc(size); [...] }
... and vmalloc will return a page(4k)-aligned pointer, possibly not 16k-aligned one. This is the source of all evil ;-)
Well, the main problem is the cache coherency with mmap in general, and not about the page size. The 4k and 16k things are just a coincidence, I'd say.
When using RW transfers everything is fine, as data is memcopied between two independent buffers.
In a case of MMAP mode we will end up with such a situation:
- The driver will memcpy data from an URB to the buffer. It will
populate several cache lines. Let's call it a "kernel cache". 2. As the library has mapped the non-16k-aligned address via different cache line ("user cache"), it will read some rubbish from the physical memory, populating the "user cache" by the way. 3. Some time later the "kernel cache" is flushed, writing the data to the memory. 4. Some new data from URB is entering "kernel cache". 5. The library will access mmap-ed area again, going via the "user cache", which may or may not reflect the correct data (depending on the fact was the "user cache" flushed or not) etc.
Of course this cycle is completely not deterministic, so some of the "kernel cache" lines will be flushed before being accessed from user space, other not... The final effect is... hmm... bizarre :-) First, of course, you will get a (probably loud) glitch (rubbish from buffer's underlying memory, before the first valid data is written back), and then something that could be described as an "echo" ;-) I mean - you are capturing "1 2 3 4 5 6 7..." and the result is (G stands for Glitch ;-) "G 1 23 3 4 4 6 67..."
As a quick-and-dirty work-around I have modified snd_pcm_alloc_vmalloc_buffer() to allocate always 12kB more and then use 16k-aligned pointer:
static int snd_pcm_alloc_vmalloc_buffer(struct snd_pcm_substream *subs, size_t size) { struct snd_pcm_runtime *runtime = subs->runtime; if (runtime->dma_addr) { if (runtime->dma_bytes >= size) return 0; /* already large enough */ vfree((void *)runtime->dma_addr); } runtime->dma_addr = (unsigned long)vmalloc(size + 12 * 1024); if (!runtime->dma_addr) return -ENOMEM; runtime->dma_area = (void *)ALIGN(runtime->dma_addr, 16 * 1024); runtime->dma_bytes = size; return 0; }
Of course it cannot be regarded as anything more as a hack. And definitely not as a solution... So my question is:
Any idea how the "proper solution" should look like? I see no obvious method to get an arch-independent "mmap-compliant" buffer. This problem was found on a sh arch, using an USB Audio Class device, however out there some other architectures are suffering from this MMU "feature" as well (ie. see http://docs.hp.com/en/B3906-90006/ch07s09.html) and possibly other drivers could behave "wrongly" (quoted as the driver is actually innocent...)
Currently, it's PITA on Linux to get a non-cached mmap properly in a generic way. Especially together with coherent memory allocations, there is no API for that. But, in the case of usb-audio, the problem is the vmaloc'ed intermediate buffer, and this could be a bit easier.
One thing we can try is a patch like below. But, I'm not sure whether this is correct over all architectures, too. At best, a generic API would be helpful for such a thing...
thanks,
Takashi
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c index 9cf81bc..cecf4b4 100644 --- a/sound/core/pcm_native.c +++ b/sound/core/pcm_native.c @@ -3136,6 +3136,7 @@ static int snd_pcm_default_mmap(struct snd_pcm_substream *substream, struct vm_area_struct *area) { area->vm_ops = &snd_pcm_vm_ops_data; + area->vm_page_prot = pgprot_noncached(area->vm_page_prot); area->vm_private_data = substream; area->vm_flags |= VM_RESERVED; atomic_inc(&substream->mmap_count);
On Tue, Sep 16, 2008 at 08:18:06PM +0200, Takashi Iwai wrote:
At Thu, 04 Sep 2008 12:23:54 +0100, Pawel MOLL wrote:
As a quick-and-dirty work-around I have modified snd_pcm_alloc_vmalloc_buffer() to allocate always 12kB more and then use 16k-aligned pointer:
static int snd_pcm_alloc_vmalloc_buffer(struct snd_pcm_substream *subs, size_t size) { struct snd_pcm_runtime *runtime = subs->runtime; if (runtime->dma_addr) { if (runtime->dma_bytes >= size) return 0; /* already large enough */ vfree((void *)runtime->dma_addr); } runtime->dma_addr = (unsigned long)vmalloc(size + 12 * 1024); if (!runtime->dma_addr) return -ENOMEM; runtime->dma_area = (void *)ALIGN(runtime->dma_addr, 16 * 1024); runtime->dma_bytes = size; return 0; }
Of course it cannot be regarded as anything more as a hack. And definitely not as a solution... So my question is:
Any idea how the "proper solution" should look like? I see no obvious method to get an arch-independent "mmap-compliant" buffer. This problem was found on a sh arch, using an USB Audio Class device, however out there some other architectures are suffering from this MMU "feature" as well (ie. see http://docs.hp.com/en/B3906-90006/ch07s09.html) and possibly other drivers could behave "wrongly" (quoted as the driver is actually innocent...)
Currently, it's PITA on Linux to get a non-cached mmap properly in a generic way. Especially together with coherent memory allocations, there is no API for that. But, in the case of usb-audio, the problem is the vmaloc'ed intermediate buffer, and this could be a bit easier.
One thing we can try is a patch like below. But, I'm not sure whether this is correct over all architectures, too. At best, a generic API would be helpful for such a thing...
pgprot_noncached() is the closest you will get to a generic API, but it's true that not every architecture implements it. Given that, you will have to toss up an #ifdef pgprot_noncached() around it, as seems to already be the case for snd_pcm_lib_mmap_iomem(). Any sane platform that has problems in this regard will have to define pgprot_noncached(), and that's already true for things like /dev/mem today.
Hi, Taksashi,
One thing we can try is a patch like below. But, I'm not sure whether this is correct over all architectures, too. At best, a generic API would be helpful for such a thing...
- area->vm_page_prot = pgprot_noncached(area->vm_page_prot);
Well, it's not enough, because the kernel mapping of buffer is still cached... A hack below does the job, but it is not nice as well...
Cheers
Paweł
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c index 59b29cd..304f9e5 100644 --- a/sound/core/pcm_native.c +++ b/sound/core/pcm_native.c @@ -3159,6 +3159,9 @@ static struct vm_operations_struct snd_pcm_vm_ops_data = static int snd_pcm_default_mmap(struct snd_pcm_substream *substream, struct vm_area_struct *area) { +#ifdef pgprot_noncached + area->vm_page_prot = pgprot_noncached(area->vm_page_prot); +#endif area->vm_ops = &snd_pcm_vm_ops_data; area->vm_private_data = substream; area->vm_flags |= VM_RESERVED; diff --git a/sound/usb/usbaudio.c b/sound/usb/usbaudio.c index 7bd5852..c98a4ec 100644 --- a/sound/usb/usbaudio.c +++ b/sound/usb/usbaudio.c @@ -714,7 +714,8 @@ static int snd_pcm_alloc_vmalloc_buffer(struct snd_pcm_substream *subs, size_t s return 0; /* already large enough */ vfree(runtime->dma_area); } - runtime->dma_area = vmalloc(size); + runtime->dma_area = __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM, + PAGE_KERNEL_NOCACHE); if (! runtime->dma_area) return -ENOMEM; runtime->dma_bytes = size;
At Thu, 25 Sep 2008 15:55:53 +0100, Pawel MOLL wrote:
One thing we can try is a patch like below. But, I'm not sure whether this is correct over all architectures, too. At best, a generic API would be helpful for such a thing...
- area->vm_page_prot = pgprot_noncached(area->vm_page_prot);
Well, it's not enough, because the kernel mapping of buffer is still cached... A hack below does the job, but it is not nice as well...
Yeah, that's not sexy, but maybe the only working case right now (better with arch-specific ifdefs).
IIRC, a similar buffer handling (via vmalloc) is used in video drivers. I suppose they don't work as well, right? DRM driver uses __vmalloc() with PAGE_KERNEL_NOCACHE, but it's only for PPC32 non-coherent.
Takashi
Cheers
Paweł
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c index 59b29cd..304f9e5 100644 --- a/sound/core/pcm_native.c +++ b/sound/core/pcm_native.c @@ -3159,6 +3159,9 @@ static struct vm_operations_struct snd_pcm_vm_ops_data = static int snd_pcm_default_mmap(struct snd_pcm_substream *substream, struct vm_area_struct *area) { +#ifdef pgprot_noncached
area->vm_page_prot = pgprot_noncached(area->vm_page_prot);
+#endif area->vm_ops = &snd_pcm_vm_ops_data; area->vm_private_data = substream; area->vm_flags |= VM_RESERVED; diff --git a/sound/usb/usbaudio.c b/sound/usb/usbaudio.c index 7bd5852..c98a4ec 100644 --- a/sound/usb/usbaudio.c +++ b/sound/usb/usbaudio.c @@ -714,7 +714,8 @@ static int snd_pcm_alloc_vmalloc_buffer(struct snd_pcm_substream *subs, size_t s return 0; /* already large enough */ vfree(runtime->dma_area); }
runtime->dma_area = vmalloc(size);
runtime->dma_area = __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM,
PAGE_KERNEL_NOCACHE); if (! runtime->dma_area) return -ENOMEM; runtime->dma_bytes = size;
On Fri, Sep 26, 2008 at 12:04:23PM +0200, Takashi Iwai wrote:
At Thu, 25 Sep 2008 15:55:53 +0100, Pawel MOLL wrote:
One thing we can try is a patch like below. But, I'm not sure whether this is correct over all architectures, too. At best, a generic API would be helpful for such a thing...
- area->vm_page_prot = pgprot_noncached(area->vm_page_prot);
Well, it's not enough, because the kernel mapping of buffer is still cached... A hack below does the job, but it is not nice as well...
Yeah, that's not sexy, but maybe the only working case right now (better with arch-specific ifdefs).
IIRC, a similar buffer handling (via vmalloc) is used in video drivers. I suppose they don't work as well, right? DRM driver uses __vmalloc() with PAGE_KERNEL_NOCACHE, but it's only for PPC32 non-coherent.
x86 does it also, via its PAGE_AGP definition.
There are not that many platforms that define PAGE_KERNEL_NOCACHE though, so this gets a bit messy..
At Fri, 26 Sep 2008 19:30:45 +0900, Paul Mundt wrote:
On Fri, Sep 26, 2008 at 12:04:23PM +0200, Takashi Iwai wrote:
At Thu, 25 Sep 2008 15:55:53 +0100, Pawel MOLL wrote:
One thing we can try is a patch like below. But, I'm not sure whether this is correct over all architectures, too. At best, a generic API would be helpful for such a thing...
- area->vm_page_prot = pgprot_noncached(area->vm_page_prot);
Well, it's not enough, because the kernel mapping of buffer is still cached... A hack below does the job, but it is not nice as well...
Yeah, that's not sexy, but maybe the only working case right now (better with arch-specific ifdefs).
IIRC, a similar buffer handling (via vmalloc) is used in video drivers. I suppose they don't work as well, right? DRM driver uses __vmalloc() with PAGE_KERNEL_NOCACHE, but it's only for PPC32 non-coherent.
x86 does it also, via its PAGE_AGP definition.
Right, but it doesn't seem for pages allocated via vmalloc() as in drm_scatter.c.
There are not that many platforms that define PAGE_KERNEL_NOCACHE though, so this gets a bit messy..
Indeed. We really need to have a generic API for this kind of things...
thanks,
Takashi
IIRC, a similar buffer handling (via vmalloc) is used in video drivers. I suppose they don't work as well, right?
Well, our FB driver is aware of the problem. I can't say much about other ones...
DRM driver uses __vmalloc() with PAGE_KERNEL_NOCACHE, but it's only for PPC32 non-coherent.
Hm. PPC with non-coherent cache?
/* * Only on coherent architectures, we can mmap the status and the control records * for effcient data transfer. On others, we have to use HWSYNC ioctl... */ #if defined(CONFIG_X86) || defined(CONFIG_PPC) || defined(CONFIG_ALPHA) ^^^^^^^^^^ /* * mmap status record */ static struct page * snd_pcm_mmap_status_nopage(struct vm_area_struct *area, unsigned long address, int *type)
x86 does it also, via its PAGE_AGP definition.
There are not that many platforms that define PAGE_KERNEL_NOCACHE though, so this gets a bit messy..
Oh, bloody hell, I haven't realised that!
Yeah, that's not sexy, but maybe the only working case right now
(better with arch-specific ifdefs).
The question is what ifdefs to use ;-) Because the cached/uncached decision must be consistent between snd_pcm_default_mmap() and buffer allocation...
What a mess! ;-)
Paweł
At Fri, 26 Sep 2008 12:25:17 +0100, Pawel MOLL wrote:
IIRC, a similar buffer handling (via vmalloc) is used in video drivers. I suppose they don't work as well, right?
Well, our FB driver is aware of the problem. I can't say much about other ones...
FB driver has some generic API for that, defined in asm/fb.h.
DRM driver uses __vmalloc() with PAGE_KERNEL_NOCACHE, but it's only for PPC32 non-coherent.
Hm. PPC with non-coherent cache?
/*
- Only on coherent architectures, we can mmap the status and the control records
- for effcient data transfer. On others, we have to use HWSYNC ioctl...
*/ #if defined(CONFIG_X86) || defined(CONFIG_PPC) || defined(CONFIG_ALPHA) ^^^^^^^^^^ /*
- mmap status record
*/ static struct page * snd_pcm_mmap_status_nopage(struct vm_area_struct *area, unsigned long address, int *type)
Yeah, it's a known problem, and I already have patches. But, it's still under discussion how to solve it better. I'm going to post the theme to linux-arch, which James Bottomley and others suggested in the previous discussions.
x86 does it also, via its PAGE_AGP definition.
There are not that many platforms that define PAGE_KERNEL_NOCACHE though, so this gets a bit messy..
Oh, bloody hell, I haven't realised that!
Yeah, that's not sexy, but maybe the only working case right now
(better with arch-specific ifdefs).
The question is what ifdefs to use ;-) Because the cached/uncached decision must be consistent between snd_pcm_default_mmap() and buffer allocation...
Right. Both must have matching ifdefs. The vmalloc buffer handling can be in sound/core common part, so that it won't be scattered in different sub directories.
Or, let's define a common generic API (oh let me dream on)
What a mess! ;-)
Indeed ;)
Takashi
On Fri, Sep 26, 2008 at 12:04:23PM +0200, Takashi Iwai wrote:
At Thu, 25 Sep 2008 15:55:53 +0100, Pawel MOLL wrote:
One thing we can try is a patch like below. But, I'm not sure whether this is correct over all architectures, too. At best, a generic API would be helpful for such a thing...
- area->vm_page_prot = pgprot_noncached(area->vm_page_prot);
Well, it's not enough, because the kernel mapping of buffer is still cached... A hack below does the job, but it is not nice as well...
Yeah, that's not sexy, but maybe the only working case right now (better with arch-specific ifdefs).
IIRC, a similar buffer handling (via vmalloc) is used in video drivers. I suppose they don't work as well, right? DRM driver uses __vmalloc() with PAGE_KERNEL_NOCACHE, but it's only for PPC32 non-coherent.
To come back to this issue, we don't appear to have made any headway on the vmalloc front, but pgprot_noncached() is now a standard interface provided by all architectures, so the ifdef is no longer needed in the snd_pcm_default_mmap() path.
At Fri, 15 Jan 2010 09:04:17 +0900, Paul Mundt wrote:
On Fri, Sep 26, 2008 at 12:04:23PM +0200, Takashi Iwai wrote:
At Thu, 25 Sep 2008 15:55:53 +0100, Pawel MOLL wrote:
One thing we can try is a patch like below. But, I'm not sure whether this is correct over all architectures, too. At best, a generic API would be helpful for such a thing...
- area->vm_page_prot = pgprot_noncached(area->vm_page_prot);
Well, it's not enough, because the kernel mapping of buffer is still cached... A hack below does the job, but it is not nice as well...
Yeah, that's not sexy, but maybe the only working case right now (better with arch-specific ifdefs).
IIRC, a similar buffer handling (via vmalloc) is used in video drivers. I suppose they don't work as well, right? DRM driver uses __vmalloc() with PAGE_KERNEL_NOCACHE, but it's only for PPC32 non-coherent.
To come back to this issue, we don't appear to have made any headway on the vmalloc front, but pgprot_noncached() is now a standard interface provided by all architectures, so the ifdef is no longer needed in the snd_pcm_default_mmap() path.
Thanks for heading up about this.
It seems, however, that pgprot_noncached() is still dangerous to apply unconditionally, according to MIPS guys. Some MIPS platforms may freeze when this is set to invalid pages.
So, this should be applied still in an arch-dependent manner... Oh well.
thanks,
Takashi
On Sat, Jan 16, 2010 at 11:05:12AM +0100, Takashi Iwai wrote:
At Fri, 15 Jan 2010 09:04:17 +0900, Paul Mundt wrote:
On Fri, Sep 26, 2008 at 12:04:23PM +0200, Takashi Iwai wrote:
At Thu, 25 Sep 2008 15:55:53 +0100, Pawel MOLL wrote:
One thing we can try is a patch like below. But, I'm not sure whether this is correct over all architectures, too. At best, a generic API would be helpful for such a thing...
- area->vm_page_prot = pgprot_noncached(area->vm_page_prot);
Well, it's not enough, because the kernel mapping of buffer is still cached... A hack below does the job, but it is not nice as well...
Yeah, that's not sexy, but maybe the only working case right now (better with arch-specific ifdefs).
IIRC, a similar buffer handling (via vmalloc) is used in video drivers. I suppose they don't work as well, right? DRM driver uses __vmalloc() with PAGE_KERNEL_NOCACHE, but it's only for PPC32 non-coherent.
To come back to this issue, we don't appear to have made any headway on the vmalloc front, but pgprot_noncached() is now a standard interface provided by all architectures, so the ifdef is no longer needed in the snd_pcm_default_mmap() path.
Thanks for heading up about this.
It seems, however, that pgprot_noncached() is still dangerous to apply unconditionally, according to MIPS guys. Some MIPS platforms may freeze when this is set to invalid pages.
So, this should be applied still in an arch-dependent manner... Oh well.
Then it is something they will have to fix on their own. It's used by the generic VM code for VMA writenotify (see mm/mmap.c).
participants (3)
-
Paul Mundt
-
Pawel MOLL
-
Takashi Iwai