[alsa-devel] Question on arm64 unaligned faults during playback
Hi,
I am working on a 64-bit ARM SoC (Samsung's Exynos7) and have observed unaligned faults while testing certain sound streams with aplay. Playing a 44.1k sampling rate stream gives the following crash while 48k streams work fine.
[root@(none) ~]# aplay test2_16bit_44k.wav Playing WAVE 'test2_16bit_44k.wav' : Signed 1[ 24.500724] sound: 2ch, 44100Hz, 88208bytes 6 bit Little Endian, Rate 44100 Hz, Stereo [ 24.520738] snd_pcm_lib_write_transfer:sound/core/pcm_lib.c hwbuf is ffffff8000080000, runtime->dma_area is ffffff8000080000, hwoff is 0, frames_to_bytes is 0, frames is 5513 [ 24.535661] snd_pcm_lib_write_transfer:sound/core/pcm_lib.c hwbuf is ffffff8000085624, runtime->dma_area is ffffff8000080000, hwoff is 5513, frames_to_bytes is 22052, frames is 5513 [ 24.551244] Unhandled fault: alignment fault (0x96000061) at 0xffffff8000085624 [ 24.558448] Internal error: : 96000061 [#1] PREEMPT SMP [ 24.563449] Modules linked in: [ 24.566489] CPU: 0 PID: 928 Comm: aplay Not tainted 3.16.0+ #232 [ 24.572473] task: ffffffc0a934c380 ti: ffffffc0038d4000 task.ti: ffffffc0038d4000 [ 24.579944] PC is at __copy_from_user+0x14/0x60 [ 24.584450] LR is at snd_pcm_lib_write_transfer+0xe4/0x104 [ 24.589912] pc : [<ffffffc0002d4784>] lr : [<ffffffc0004b58f0>] pstate: 20000145 [ 24.597285] sp : ffffffc0038d7c90 [ 24.600583] x29: ffffffc0038d7c90 x28: ffffffc0038c1c00 [ 24.605876] x27: ffffffc0a93eae00 x26: 0000000000001589 [ 24.611168] x25: 0000000000000000 x24: 0000000000000000 [ 24.616461] x23: 0000000000000008 x22: ffffff8000085624 [ 24.621753] x21: 0000000000005624 x20: ffffffc0038c1c00 [ 24.627046] x19: 0000000011d69620 x18: 0000007fd2e8b1c0 [ 24.632339] x17: 0000007fa8057370 x16: ffffffc000196108 [ 24.637632] x15: 232b226b00000000 x14: 617266202c333135 [ 24.642925] x13: 352073692066666f x12: 7768202c30303030 [ 24.648218] x11: 3830303030386666 x10: 6666666620736920 [ 24.653510] x9 : 616572615f616d64 x8 : 000000000000026f [ 24.658803] x7 : ffffffc000998790 x6 : 0000000000000001 [ 24.664095] x5 : 0000000000000000 x4 : 0000000011d6ec44 [ 24.669388] x3 : f459f780f2b3f6d4 x2 : 0000000000005614 [ 24.674681] x1 : 0000000011d69628 x0 : ffffff8000085624 [ 24.679973] [ 24.681451] Process aplay (pid: 928, stack limit = 0xffffffc0038d4058) [ 24.687959] Stack: (0xffffffc0038d7c90 to 0xffffffc0038d8000) [ 24.693688] 7c80: 038d7cd0 ffffffc0 004b7578 ffffffc0 [ 24.701846] 7ca0: 00001589 00000000 a93eaf50 ffffffc0 009ef000 ffffffc0 11d69620 00000000 [ 24.710001] 7cc0: 004b580c ffffffc0 004b755c ffffffc0 038d7d70 ffffffc0 004b7758 ffffffc0 [ 24.718158] 7ce0: a93eae00 ffffffc0 d2e8b438 0000007f a91eb858 ffffffc0 d2e8b438 0000007f [ 24.726313] 7d00: 40184150 00000000 d2e8b438 0000007f 00000115 00000000 0000001d 00000000 [ 24.734470] 7d20: 00960000 ffffffc0 038d4000 ffffffc0 0000000a 00000000 038d7d68 ffffffc0 [ 24.742626] 7d40: 00001589 00000000 00001589 00000000 00000000 00000000 009ef898 ffffffc0 [ 24.750782] 7d60: a8b6d800 ffffffc0 0000409b 00000000 038d7da0 ffffffc0 004b2dd0 ffffffc0 [ 24.758938] 7d80: a93eae00 ffffffc0 11d69620 00000000 11d69620 00000000 00001589 00000000 [ 24.767094] 7da0: 038d7df0 ffffffc0 004b2e10 ffffffc0 a8b6d900 ffffffc0 d2e8b438 0000007f [ 24.775249] 7dc0: a91eb858 ffffffc0 00000004 00000000 00000001 00000000 00000000 00000000 [ 24.783406] 7de0: 11d69620 00000000 00001589 00000000 038d7e00 ffffffc0 00195ecc ffffffc0 [ 24.791563] 7e00: 038d7e90 ffffffc0 0019618c ffffffc0 00000000 00000000 a8b6d900 ffffffc0 [ 24.799718] 7e20: a8b6d900 ffffffc0 00000004 00000000 40184150 00000000 00185888 ffffffc0 [ 24.807875] 7e40: a8b6d800 ffffffc0 a8b6d800 ffffffc0 ffffffff ffffffff a80f0588 0000007f [ 24.816030] 7e60: 80000000 00000000 00000015 00000000 038d7e80 ffffffc0 001a0340 ffffffc0 [ 24.824187] 7e80: 038d7e90 ffffffc0 00196148 ffffffc0 d2e8b410 0000007f 0008425c ffffffc0 [ 24.832343] 7ea0: 00000000 00000000 11d500e0 00000000 ffffffff ffffffff a805737c 0000007f [ 24.840499] 7ec0: 60000000 00000000 00000015 00000000 00000004 00000000 40184150 00000000 [ 24.848655] 7ee0: d2e8b438 0000007f a8223fbc 0000007f 00000003 00000000 00000000 00000000 [ 24.856811] 7f00: a82d46f0 0000007f 00000000 00000000 0000001d 00000000 00000120 00000000 [ 24.864967] 7f20: 01010101 01010101 00000016 00000000 00000003 00000000 11d510d0 00000000 [ 24.873123] 7f40: 00000000 00000000 00000000 232b226b a8299b80 0000007f a8057370 0000007f [ 24.881279] 7f60: d2e8b1c0 0000007f 11d691f0 00000000 11d500e0 00000000 00001589 00000000 [ 24.889435] 7f80: 11d69620 00000000 00000000 00000000 0041f7d8 00000000 00000003 00000000 [ 24.897591] 7fa0: 00000000 00000000 00000004 00000000 00000003 00000000 d2e8b410 0000007f [ 24.905748] 7fc0: a8223ff0 0000007f d2e8b410 0000007f a805737c 0000007f 60000000 00000000 [ 24.913904] 7fe0: 00000004 00000000 0000001d 00000000 d10d0579 f13ea086 786e62ff 1f2437e3 [ 24.922054] Call trace: [ 24.924488] [<ffffffc0002d4784>] __copy_from_user+0x14/0x60 [ 24.930040] [<ffffffc0004b7574>] snd_pcm_lib_write1+0x1fc/0x384 [ 24.935940] [<ffffffc0004b7754>] snd_pcm_lib_write+0x58/0x80 [ 24.941580] [<ffffffc0004b2dcc>] snd_pcm_playback_ioctl1+0x2e8/0x300 [ 24.947913] [<ffffffc0004b2e0c>] snd_pcm_playback_ioctl+0x28/0x44 [ 24.953990] [<ffffffc000195ec8>] do_vfs_ioctl+0x370/0x5b0 [ 24.959367] [<ffffffc000196188>] SyS_ioctl+0x80/0x98 [ 24.964314] Code: f1002042 540000a4 f8408423 f1002042 (f8008403) [ 24.976808] ---[ end trace 35415135d4341fc9 ]---
We are using the internal sram available for sound, for DMA buffer allocation, using the generic SRAM driver. From the above log, the buffer address offset is not 8-byte aligned and as we are using the SRAM driver which maps the memory as device memory we are getting an unaligned fault. Is it incorrect to use the generic sram driver for arm64 or am I missing something ?
Regards, Abhilash
Abhilash Kesavan wrote:
I am working on a 64-bit ARM SoC (Samsung's Exynos7) and have observed unaligned faults while testing certain sound streams with aplay.
[ 24.535661] snd_pcm_lib_write_transfer:sound/core/pcm_lib.c hwbuf is ffffff8000085624, runtime->dma_area is ffffff8000080000, hwoff is 5513, frames_to_bytes is 22052, frames is 5513 [ 24.551244] Unhandled fault: alignment fault (0x96000061) at 0xffffff8000085624 [ 24.579944] PC is at __copy_from_user+0x14/0x60 [ 24.584450] LR is at snd_pcm_lib_write_transfer+0xe4/0x104 [ 24.922054] Call trace: [ 24.924488] [<ffffffc0002d4784>] __copy_from_user+0x14/0x60 [ 24.930040] [<ffffffc0004b7574>] snd_pcm_lib_write1+0x1fc/0x384
We are using the internal sram available for sound, for DMA buffer allocation, using the generic SRAM driver. From the above log, the buffer address offset is not 8-byte aligned and as we are using the SRAM driver which maps the memory as device memory we are getting an unaligned fault. Is it incorrect to use the generic sram driver for arm64 or am I missing something ?
When you give the ALSA framework a memory buffer, it is assumed that it is general-purpose memory, i.e., can be accessed with any alignment.
If your memory does not allow such accesses, you have to do all the accesses in your driver, i.e., do everything in the .copy/.silence callbacks. (You also have to add constraints so that periods and buffer are correctly aligned.)
Regards, Clemens
At Mon, 08 Dec 2014 09:49:37 +0100, Clemens Ladisch wrote:
Abhilash Kesavan wrote:
I am working on a 64-bit ARM SoC (Samsung's Exynos7) and have observed unaligned faults while testing certain sound streams with aplay.
[ 24.535661] snd_pcm_lib_write_transfer:sound/core/pcm_lib.c hwbuf is ffffff8000085624, runtime->dma_area is ffffff8000080000, hwoff is 5513, frames_to_bytes is 22052, frames is 5513 [ 24.551244] Unhandled fault: alignment fault (0x96000061) at 0xffffff8000085624 [ 24.579944] PC is at __copy_from_user+0x14/0x60 [ 24.584450] LR is at snd_pcm_lib_write_transfer+0xe4/0x104 [ 24.922054] Call trace: [ 24.924488] [<ffffffc0002d4784>] __copy_from_user+0x14/0x60 [ 24.930040] [<ffffffc0004b7574>] snd_pcm_lib_write1+0x1fc/0x384
We are using the internal sram available for sound, for DMA buffer allocation, using the generic SRAM driver. From the above log, the buffer address offset is not 8-byte aligned and as we are using the SRAM driver which maps the memory as device memory we are getting an unaligned fault. Is it incorrect to use the generic sram driver for arm64 or am I missing something ?
When you give the ALSA framework a memory buffer, it is assumed that it is general-purpose memory, i.e., can be accessed with any alignment.
If your memory does not allow such accesses, you have to do all the accesses in your driver, i.e., do everything in the .copy/.silence callbacks. (You also have to add constraints so that periods and buffer are correctly aligned.)
You can set runtime->min_align in the own hw_params callback so that the PCM core handle it well, at least, for non-mmap mode.
But this feature has been rarely used (currently no driver sets it explicitly), so there might be still some bugs there, too :)
Takashi
Added ARM64 folks to Cc.
On 12/08/2014 11:41 AM, Takashi Iwai wrote:
At Mon, 08 Dec 2014 09:49:37 +0100, Clemens Ladisch wrote:
Abhilash Kesavan wrote:
I am working on a 64-bit ARM SoC (Samsung's Exynos7) and have observed unaligned faults while testing certain sound streams with aplay.
[ 24.535661] snd_pcm_lib_write_transfer:sound/core/pcm_lib.c hwbuf is ffffff8000085624, runtime->dma_area is ffffff8000080000, hwoff is 5513, frames_to_bytes is 22052, frames is 5513 [ 24.551244] Unhandled fault: alignment fault (0x96000061) at 0xffffff8000085624 [ 24.579944] PC is at __copy_from_user+0x14/0x60 [ 24.584450] LR is at snd_pcm_lib_write_transfer+0xe4/0x104 [ 24.922054] Call trace: [ 24.924488] [<ffffffc0002d4784>] __copy_from_user+0x14/0x60 [ 24.930040] [<ffffffc0004b7574>] snd_pcm_lib_write1+0x1fc/0x384
We are using the internal sram available for sound, for DMA buffer allocation, using the generic SRAM driver. From the above log, the buffer address offset is not 8-byte aligned and as we are using the SRAM driver which maps the memory as device memory we are getting an unaligned fault. Is it incorrect to use the generic sram driver for arm64 or am I missing something ?
When you give the ALSA framework a memory buffer, it is assumed that it is general-purpose memory, i.e., can be accessed with any alignment.
If your memory does not allow such accesses, you have to do all the accesses in your driver, i.e., do everything in the .copy/.silence callbacks. (You also have to add constraints so that periods and buffer are correctly aligned.)
You can set runtime->min_align in the own hw_params callback so that the PCM core handle it well, at least, for non-mmap mode.
But this feature has been rarely used (currently no driver sets it explicitly), so there might be still some bugs there, too :)
As far as I can see min_align only aligns the hardware pointer, but not the application pointer. The later causes the issue here.
E.g. if a application writes a unaligned amount of bytes the next write will start at an unaligned buffer address.
How safe is it in general to use IO mapped memory as general purpose memory on ARM64? Are there maybe even more constraints than just that access has to be aligned? Maybe the driver should rather use copy_from_user_toio() instead of copy_from_user() to make sure that things work correctly.
- Lars
On Mon, Dec 08, 2014 at 11:15:52AM +0000, Lars-Peter Clausen wrote:
Added ARM64 folks to Cc.
On 12/08/2014 11:41 AM, Takashi Iwai wrote:
At Mon, 08 Dec 2014 09:49:37 +0100, Clemens Ladisch wrote:
Abhilash Kesavan wrote:
I am working on a 64-bit ARM SoC (Samsung's Exynos7) and have observed unaligned faults while testing certain sound streams with aplay.
[ 24.535661] snd_pcm_lib_write_transfer:sound/core/pcm_lib.c hwbuf is ffffff8000085624, runtime->dma_area is ffffff8000080000, hwoff is 5513, frames_to_bytes is 22052, frames is 5513 [ 24.551244] Unhandled fault: alignment fault (0x96000061) at 0xffffff8000085624 [ 24.579944] PC is at __copy_from_user+0x14/0x60 [ 24.584450] LR is at snd_pcm_lib_write_transfer+0xe4/0x104 [ 24.922054] Call trace: [ 24.924488] [<ffffffc0002d4784>] __copy_from_user+0x14/0x60 [ 24.930040] [<ffffffc0004b7574>] snd_pcm_lib_write1+0x1fc/0x384
We are using the internal sram available for sound, for DMA buffer allocation, using the generic SRAM driver. From the above log, the buffer address offset is not 8-byte aligned and as we are using the SRAM driver which maps the memory as device memory we are getting an unaligned fault. Is it incorrect to use the generic sram driver for arm64 or am I missing something ?
When you give the ALSA framework a memory buffer, it is assumed that it is general-purpose memory, i.e., can be accessed with any alignment.
If your memory does not allow such accesses, you have to do all the accesses in your driver, i.e., do everything in the .copy/.silence callbacks. (You also have to add constraints so that periods and buffer are correctly aligned.)
You can set runtime->min_align in the own hw_params callback so that the PCM core handle it well, at least, for non-mmap mode.
But this feature has been rarely used (currently no driver sets it explicitly), so there might be still some bugs there, too :)
As far as I can see min_align only aligns the hardware pointer, but not the application pointer. The later causes the issue here.
E.g. if a application writes a unaligned amount of bytes the next write will start at an unaligned buffer address.
How safe is it in general to use IO mapped memory as general purpose memory on ARM64? Are there maybe even more constraints than just that access has to be aligned? Maybe the driver should rather use copy_from_user_toio() instead of copy_from_user() to make sure that things work correctly.
If you're simply mapping RAM as device memory, then you need to ensure:
- Naturally aligned access - No exclusive instructions (e.g atomics, locks, semaphores ...)
Out of interest, why are you using device mappings for memory? By design, they prohibit speculation (and, depending on the subtype, re-ordering, gathering and early write response) and will be significantly slower than normal memory accesses.
Could you use something like ioremap_wc, which will give you normal, non-cacheable memory instead?
Will
Hi Will,
On Mon, Dec 8, 2014 at 5:01 PM, Will Deacon will.deacon@arm.com wrote:
On Mon, Dec 08, 2014 at 11:15:52AM +0000, Lars-Peter Clausen wrote:
Added ARM64 folks to Cc.
On 12/08/2014 11:41 AM, Takashi Iwai wrote:
At Mon, 08 Dec 2014 09:49:37 +0100, Clemens Ladisch wrote:
Abhilash Kesavan wrote:
I am working on a 64-bit ARM SoC (Samsung's Exynos7) and have observed unaligned faults while testing certain sound streams with aplay.
[ 24.535661] snd_pcm_lib_write_transfer:sound/core/pcm_lib.c hwbuf is ffffff8000085624, runtime->dma_area is ffffff8000080000, hwoff is 5513, frames_to_bytes is 22052, frames is 5513 [ 24.551244] Unhandled fault: alignment fault (0x96000061) at 0xffffff8000085624 [ 24.579944] PC is at __copy_from_user+0x14/0x60 [ 24.584450] LR is at snd_pcm_lib_write_transfer+0xe4/0x104 [ 24.922054] Call trace: [ 24.924488] [<ffffffc0002d4784>] __copy_from_user+0x14/0x60 [ 24.930040] [<ffffffc0004b7574>] snd_pcm_lib_write1+0x1fc/0x384
We are using the internal sram available for sound, for DMA buffer allocation, using the generic SRAM driver. From the above log, the buffer address offset is not 8-byte aligned and as we are using the SRAM driver which maps the memory as device memory we are getting an unaligned fault. Is it incorrect to use the generic sram driver for arm64 or am I missing something ?
When you give the ALSA framework a memory buffer, it is assumed that it is general-purpose memory, i.e., can be accessed with any alignment.
If your memory does not allow such accesses, you have to do all the accesses in your driver, i.e., do everything in the .copy/.silence callbacks. (You also have to add constraints so that periods and buffer are correctly aligned.)
You can set runtime->min_align in the own hw_params callback so that the PCM core handle it well, at least, for non-mmap mode.
But this feature has been rarely used (currently no driver sets it explicitly), so there might be still some bugs there, too :)
As far as I can see min_align only aligns the hardware pointer, but not the application pointer. The later causes the issue here.
E.g. if a application writes a unaligned amount of bytes the next write will start at an unaligned buffer address.
How safe is it in general to use IO mapped memory as general purpose memory on ARM64? Are there maybe even more constraints than just that access has to be aligned? Maybe the driver should rather use copy_from_user_toio() instead of copy_from_user() to make sure that things work correctly.
If you're simply mapping RAM as device memory, then you need to ensure:
- Naturally aligned access
- No exclusive instructions (e.g atomics, locks, semaphores ...)
Out of interest, why are you using device mappings for memory? By design, they prohibit speculation (and, depending on the subtype, re-ordering, gathering and early write response) and will be significantly slower than normal memory accesses.
Thanks for the comments.
We are using the generic SRAM allocation driver (drivers/misc/sram.c) to use the internal audio SRAM for buffer allocation. The generic driver ioremaps the sram and exposes the obtained pool via the genalloc API to other drivers such as audio. There already exists support for internal sram in the audio framework. The specific use case is in an Exynos7 low power state where the DRAM cannot be used.
Could you use something like ioremap_wc, which will give you normal, non-cacheable memory instead?
Modifying ioremap to ioremap_wc gets rid of the crash. Do you think the sram driver ioremap usage should be modified to use ioremap_wc ?
Regards, Abhilash
Will
At Mon, 08 Dec 2014 12:15:52 +0100, Lars-Peter Clausen wrote:
Added ARM64 folks to Cc.
On 12/08/2014 11:41 AM, Takashi Iwai wrote:
At Mon, 08 Dec 2014 09:49:37 +0100, Clemens Ladisch wrote:
Abhilash Kesavan wrote:
I am working on a 64-bit ARM SoC (Samsung's Exynos7) and have observed unaligned faults while testing certain sound streams with aplay.
[ 24.535661] snd_pcm_lib_write_transfer:sound/core/pcm_lib.c hwbuf is ffffff8000085624, runtime->dma_area is ffffff8000080000, hwoff is 5513, frames_to_bytes is 22052, frames is 5513 [ 24.551244] Unhandled fault: alignment fault (0x96000061) at 0xffffff8000085624 [ 24.579944] PC is at __copy_from_user+0x14/0x60 [ 24.584450] LR is at snd_pcm_lib_write_transfer+0xe4/0x104 [ 24.922054] Call trace: [ 24.924488] [<ffffffc0002d4784>] __copy_from_user+0x14/0x60 [ 24.930040] [<ffffffc0004b7574>] snd_pcm_lib_write1+0x1fc/0x384
We are using the internal sram available for sound, for DMA buffer allocation, using the generic SRAM driver. From the above log, the buffer address offset is not 8-byte aligned and as we are using the SRAM driver which maps the memory as device memory we are getting an unaligned fault. Is it incorrect to use the generic sram driver for arm64 or am I missing something ?
When you give the ALSA framework a memory buffer, it is assumed that it is general-purpose memory, i.e., can be accessed with any alignment.
If your memory does not allow such accesses, you have to do all the accesses in your driver, i.e., do everything in the .copy/.silence callbacks. (You also have to add constraints so that periods and buffer are correctly aligned.)
You can set runtime->min_align in the own hw_params callback so that the PCM core handle it well, at least, for non-mmap mode.
But this feature has been rarely used (currently no driver sets it explicitly), so there might be still some bugs there, too :)
As far as I can see min_align only aligns the hardware pointer, but not the application pointer. The later causes the issue here.
E.g. if a application writes a unaligned amount of bytes the next write will start at an unaligned buffer address.
Oh right, and I think we should fix the alignment of applptr, too. We can set min_align zero as default and allow the unaligned applptr update while aligning to the size if given explicitly.
Takashi
On Mon, Dec 08, 2014 at 11:15:52AM +0000, Lars-Peter Clausen wrote:
Added ARM64 folks to Cc.
Thanks.
On 12/08/2014 11:41 AM, Takashi Iwai wrote:
At Mon, 08 Dec 2014 09:49:37 +0100, Clemens Ladisch wrote:
Abhilash Kesavan wrote:
I am working on a 64-bit ARM SoC (Samsung's Exynos7) and have observed unaligned faults while testing certain sound streams with aplay.
[ 24.535661] snd_pcm_lib_write_transfer:sound/core/pcm_lib.c hwbuf is ffffff8000085624, runtime->dma_area is ffffff8000080000, hwoff is 5513, frames_to_bytes is 22052, frames is 5513 [ 24.551244] Unhandled fault: alignment fault (0x96000061) at 0xffffff8000085624 [ 24.579944] PC is at __copy_from_user+0x14/0x60 [ 24.584450] LR is at snd_pcm_lib_write_transfer+0xe4/0x104 [ 24.922054] Call trace: [ 24.924488] [<ffffffc0002d4784>] __copy_from_user+0x14/0x60 [ 24.930040] [<ffffffc0004b7574>] snd_pcm_lib_write1+0x1fc/0x384
I can see copy_from_user() here, I assume the source is a user space buffer and the destination is the ioremap'ed SRAM (which causes the unaligned fault). Such function is not supposed copy into device memory (its prototype does not specify __iomem for the destination pointer).
If you really want Device memory for the SRAM, I think we need some other API. memcpy_toio() could copy into such buffer but the source must be kernel. Maybe something like copy_from_user_toio()?
Otherwise, if you can remap the buffer as Normal (non-cacheable) memory, you could use something like vmap() with pgprot_writecombine for the mapping. Even though ioremap_wc() returns (currently) Normal non-cacheable memory, such pointer (__iomem) should only be used with specific IO accessors (though I can see this assumption broken in a few places in the kernel).
Hi Catalin,
On Mon, Dec 8, 2014 at 5:55 PM, Catalin Marinas catalin.marinas@arm.com wrote:
On Mon, Dec 08, 2014 at 11:15:52AM +0000, Lars-Peter Clausen wrote:
Added ARM64 folks to Cc.
Thanks.
On 12/08/2014 11:41 AM, Takashi Iwai wrote:
At Mon, 08 Dec 2014 09:49:37 +0100, Clemens Ladisch wrote:
Abhilash Kesavan wrote:
I am working on a 64-bit ARM SoC (Samsung's Exynos7) and have observed unaligned faults while testing certain sound streams with aplay.
[ 24.535661] snd_pcm_lib_write_transfer:sound/core/pcm_lib.c hwbuf is ffffff8000085624, runtime->dma_area is ffffff8000080000, hwoff is 5513, frames_to_bytes is 22052, frames is 5513 [ 24.551244] Unhandled fault: alignment fault (0x96000061) at 0xffffff8000085624 [ 24.579944] PC is at __copy_from_user+0x14/0x60 [ 24.584450] LR is at snd_pcm_lib_write_transfer+0xe4/0x104 [ 24.922054] Call trace: [ 24.924488] [<ffffffc0002d4784>] __copy_from_user+0x14/0x60 [ 24.930040] [<ffffffc0004b7574>] snd_pcm_lib_write1+0x1fc/0x384
I can see copy_from_user() here, I assume the source is a user space buffer and the destination is the ioremap'ed SRAM (which causes the unaligned fault). Such function is not supposed copy into device memory (its prototype does not specify __iomem for the destination pointer).
If you really want Device memory for the SRAM, I think we need some other API. memcpy_toio() could copy into such buffer but the source must be kernel. Maybe something like copy_from_user_toio()?
Thanks for the response.
Changing copy_from_user() to copy_from_user_toio() in snd_pcm_lib_write_transfer() (sound/core/pcm_lib.c) works. I am not sure about other effects of making such a change in a core sound file, can someone comment ?
Otherwise, if you can remap the buffer as Normal (non-cacheable) memory, you could use something like vmap() with pgprot_writecombine for the mapping. Even though ioremap_wc() returns (currently) Normal non-cacheable memory, such pointer (__iomem) should only be used with specific IO accessors (though I can see this assumption broken in a few places in the kernel).
Currently, if I continue to use the SRAM allocation driver, two options are working: 1) ioremap to ioremap_wc in drivers/misc/sram.c. 2) copy_from_user to copy_from_user_toio in sound/core/pcm_lib.c
Which would be the preferred option ?
Regards, Abhilash
-- Catalin
Abhilash Kesavan wrote:
Currently, if I continue to use the SRAM allocation driver, two options are working:
- ioremap to ioremap_wc in drivers/misc/sram.c.
- copy_from_user to copy_from_user_toio in sound/core/pcm_lib.c
Which would be the preferred option ?
I don't know how to tell the SRAM allocator to hand out WC memory, but copy_from_user_toio() is what you would put into your .copy callback.
Regards, Clemens
On Tue, Dec 09, 2014 at 02:56:59AM +0000, Abhilash Kesavan wrote:
On Mon, Dec 8, 2014 at 5:55 PM, Catalin Marinas catalin.marinas@arm.com wrote:
On Mon, Dec 08, 2014 at 11:15:52AM +0000, Lars-Peter Clausen wrote:
On 12/08/2014 11:41 AM, Takashi Iwai wrote:
At Mon, 08 Dec 2014 09:49:37 +0100, Clemens Ladisch wrote:
Abhilash Kesavan wrote:
I am working on a 64-bit ARM SoC (Samsung's Exynos7) and have observed unaligned faults while testing certain sound streams with aplay.
[ 24.535661] snd_pcm_lib_write_transfer:sound/core/pcm_lib.c hwbuf is ffffff8000085624, runtime->dma_area is ffffff8000080000, hwoff is 5513, frames_to_bytes is 22052, frames is 5513 [ 24.551244] Unhandled fault: alignment fault (0x96000061) at 0xffffff8000085624 [ 24.579944] PC is at __copy_from_user+0x14/0x60 [ 24.584450] LR is at snd_pcm_lib_write_transfer+0xe4/0x104 [ 24.922054] Call trace: [ 24.924488] [<ffffffc0002d4784>] __copy_from_user+0x14/0x60 [ 24.930040] [<ffffffc0004b7574>] snd_pcm_lib_write1+0x1fc/0x384
I can see copy_from_user() here, I assume the source is a user space buffer and the destination is the ioremap'ed SRAM (which causes the unaligned fault). Such function is not supposed copy into device memory (its prototype does not specify __iomem for the destination pointer).
If you really want Device memory for the SRAM, I think we need some other API. memcpy_toio() could copy into such buffer but the source must be kernel. Maybe something like copy_from_user_toio()?
Thanks for the response.
Changing copy_from_user() to copy_from_user_toio() in snd_pcm_lib_write_transfer() (sound/core/pcm_lib.c) works. I am not sure about other effects of making such a change in a core sound file, can someone comment ?
The default implementation uses copy_from_user() to an intermediate buffer and another memcpy_toio() to the SRAM destination. So, it will be slower.
In general it may be better to copy this function to some more generic place and let the arch code override it with a more optimal implementation.
Otherwise, if you can remap the buffer as Normal (non-cacheable) memory, you could use something like vmap() with pgprot_writecombine for the mapping. Even though ioremap_wc() returns (currently) Normal non-cacheable memory, such pointer (__iomem) should only be used with specific IO accessors (though I can see this assumption broken in a few places in the kernel).
Currently, if I continue to use the SRAM allocation driver, two options are working:
- ioremap to ioremap_wc in drivers/misc/sram.c.
I think the SRAM driver does the correct devm_ioremap_resource() call but the implementation in lib/devres.c assumes that if IORESOURCE_CACHEABLE it should use devm_ioremap(). AFAIK, ioremap() always returns Device memory (even on x86, the default ioremap behaviour is nocache).
- copy_from_user to copy_from_user_toio in sound/core/pcm_lib.c
Which would be the preferred option ?
I would prefer the sram changed. It provides an allocation pool from SRAM which could as well be mapped as write-combine. If no-one complains about changing the SRAM mapping to WC, you could introduce a devm_ioremap_wc() in lib/devres.c and call that directly from sram_probe(). But I can't tell for sure what the memory type expectations for SRAM are.
Hi Catalin,
On Tue, Dec 9, 2014 at 5:13 PM, Catalin Marinas catalin.marinas@arm.com wrote:
On Tue, Dec 09, 2014 at 02:56:59AM +0000, Abhilash Kesavan wrote:
On Mon, Dec 8, 2014 at 5:55 PM, Catalin Marinas catalin.marinas@arm.com wrote:
On Mon, Dec 08, 2014 at 11:15:52AM +0000, Lars-Peter Clausen wrote:
On 12/08/2014 11:41 AM, Takashi Iwai wrote:
At Mon, 08 Dec 2014 09:49:37 +0100, Clemens Ladisch wrote:
Abhilash Kesavan wrote: > I am working on a 64-bit ARM SoC (Samsung's Exynos7) and have observed > unaligned faults while testing certain sound streams with aplay. > > [ 24.535661] snd_pcm_lib_write_transfer:sound/core/pcm_lib.c hwbuf is ffffff8000085624, runtime->dma_area is ffffff8000080000, hwoff is 5513, frames_to_bytes is 22052, frames is 5513 > [ 24.551244] Unhandled fault: alignment fault (0x96000061) at 0xffffff8000085624 > [ 24.579944] PC is at __copy_from_user+0x14/0x60 > [ 24.584450] LR is at snd_pcm_lib_write_transfer+0xe4/0x104 > [ 24.922054] Call trace: > [ 24.924488] [<ffffffc0002d4784>] __copy_from_user+0x14/0x60 > [ 24.930040] [<ffffffc0004b7574>] snd_pcm_lib_write1+0x1fc/0x384
I can see copy_from_user() here, I assume the source is a user space buffer and the destination is the ioremap'ed SRAM (which causes the unaligned fault). Such function is not supposed copy into device memory (its prototype does not specify __iomem for the destination pointer).
If you really want Device memory for the SRAM, I think we need some other API. memcpy_toio() could copy into such buffer but the source must be kernel. Maybe something like copy_from_user_toio()?
Thanks for the response.
Changing copy_from_user() to copy_from_user_toio() in snd_pcm_lib_write_transfer() (sound/core/pcm_lib.c) works. I am not sure about other effects of making such a change in a core sound file, can someone comment ?
The default implementation uses copy_from_user() to an intermediate buffer and another memcpy_toio() to the SRAM destination. So, it will be slower.
In general it may be better to copy this function to some more generic place and let the arch code override it with a more optimal implementation.
Otherwise, if you can remap the buffer as Normal (non-cacheable) memory, you could use something like vmap() with pgprot_writecombine for the mapping. Even though ioremap_wc() returns (currently) Normal non-cacheable memory, such pointer (__iomem) should only be used with specific IO accessors (though I can see this assumption broken in a few places in the kernel).
Currently, if I continue to use the SRAM allocation driver, two options are working:
- ioremap to ioremap_wc in drivers/misc/sram.c.
I think the SRAM driver does the correct devm_ioremap_resource() call but the implementation in lib/devres.c assumes that if IORESOURCE_CACHEABLE it should use devm_ioremap(). AFAIK, ioremap() always returns Device memory (even on x86, the default ioremap behaviour is nocache).
- copy_from_user to copy_from_user_toio in sound/core/pcm_lib.c
Which would be the preferred option ?
I would prefer the sram changed. It provides an allocation pool from SRAM which could as well be mapped as write-combine. If no-one complains about changing the SRAM mapping to WC, you could introduce a devm_ioremap_wc() in lib/devres.c and call that directly from sram_probe(). But I can't tell for sure what the memory type expectations for SRAM are.
Thanks for the response. I will post patches to do this. Also, adding guys who have been working on the generic sram driver and internal sram support for sound in case they have any comments regarding the change to ioremap_wc.
Regards, Abhilash
Hi Clemens,
On Mon, Dec 8, 2014 at 2:19 PM, Clemens Ladisch clemens@ladisch.de wrote:
Abhilash Kesavan wrote:
I am working on a 64-bit ARM SoC (Samsung's Exynos7) and have observed unaligned faults while testing certain sound streams with aplay.
[ 24.535661] snd_pcm_lib_write_transfer:sound/core/pcm_lib.c hwbuf is ffffff8000085624, runtime->dma_area is ffffff8000080000, hwoff is 5513, frames_to_bytes is 22052, frames is 5513 [ 24.551244] Unhandled fault: alignment fault (0x96000061) at 0xffffff8000085624 [ 24.579944] PC is at __copy_from_user+0x14/0x60 [ 24.584450] LR is at snd_pcm_lib_write_transfer+0xe4/0x104 [ 24.922054] Call trace: [ 24.924488] [<ffffffc0002d4784>] __copy_from_user+0x14/0x60 [ 24.930040] [<ffffffc0004b7574>] snd_pcm_lib_write1+0x1fc/0x384
We are using the internal sram available for sound, for DMA buffer allocation, using the generic SRAM driver. From the above log, the buffer address offset is not 8-byte aligned and as we are using the SRAM driver which maps the memory as device memory we are getting an unaligned fault. Is it incorrect to use the generic sram driver for arm64 or am I missing something ?
When you give the ALSA framework a memory buffer, it is assumed that it is general-purpose memory, i.e., can be accessed with any alignment.
If your memory does not allow such accesses, you have to do all the accesses in your driver, i.e., do everything in the .copy/.silence callbacks. (You also have to add constraints so that periods and buffer are correctly aligned.)
Thanks for the reply. Exynos7 is using generic dmaengine and the default dmaengine_pcm_ops does not have the copy/silence call-backs populated. Is the expectation that I should add the copy call-back for my platform even though it uses the generic dmaengine framework ?
Regards, Abhilash
Regards, Clemens
participants (6)
-
Abhilash Kesavan
-
Catalin Marinas
-
Clemens Ladisch
-
Lars-Peter Clausen
-
Takashi Iwai
-
Will Deacon