a saner API for allocating DMA addressable pages v3
Hi all,
this series replaced the DMA_ATTR_NON_CONSISTENT flag to dma_alloc_attrs with a separate new dma_alloc_pages API, which is available on all platforms. In addition to cleaning up the convoluted code path, this ensures that other drivers that have asked for better support for non-coherent DMA to pages with incurring bounce buffering over can finally be properly supported.
As a follow up I plan to move the implementation of the DMA_ATTR_NO_KERNEL_MAPPING flag over to this framework as well, given that is also is a fundamentally non coherent allocation. The replacement for that flag would then return a struct page, as it is allowed to actually return pages without a kernel mapping as the name suggested (although most of the time they will actually have a kernel mapping..)
In addition to the conversions of the existing non-coherent DMA users, I've also added a patch to convert the firewire ohci driver to use the new dma_alloc_pages API.
The first patch is queued up for 5.9 in the media tree, but included here for completeness.
A git tree is available here:
git://git.infradead.org/users/hch/misc.git dma_alloc_pages
Gitweb:
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma_alloc_pa...
Changes since v2: - fix up the patch reshuffle which wasn't quite correct - fix up a few commit messages
Changes since v1: - rebased on the latests dma-mapping tree, which merged many of the cleanups - fix an argument passing typo in 53c700, caught by sparse - rename a few macro arguments in 53c700 - pass the right device to the DMA API in the lib82596 drivers - fix memory ownershiptransfers in sgiseeq - better document what a page in the direct kernel mapping means - split into dma_alloc_pages that returns a struct page and is in the direct mapping vs dma_alloc_noncoherent that can be vmapped - conver the firewire ohci driver to dma_alloc_pages
Diffstat:
From: Sergey Senozhatsky sergey.senozhatsky@gmail.com
The patch partially reverts some of the UAPI bits of the buffer cache management hints. Namely, the queue consistency (memory coherency) user-space hint because, as it turned out, the kernel implementation of this feature was misusing DMA_ATTR_NON_CONSISTENT.
The patch revers both kernel and user space parts: removes the DMA consistency attr functions, rollbacks changes to v4l2_requestbuffers, v4l2_create_buffers structures and corresponding UAPI functions (plus compat32 layer) and cleanups the documentation.
Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sergey Senozhatsky sergey.senozhatsky@gmail.com Signed-off-by: Christoph Hellwig hch@lst.de --- .../userspace-api/media/v4l/buffer.rst | 17 ------- .../media/v4l/vidioc-create-bufs.rst | 6 +-- .../media/v4l/vidioc-reqbufs.rst | 12 +---- .../media/common/videobuf2/videobuf2-core.c | 46 +++---------------- .../common/videobuf2/videobuf2-dma-contig.c | 19 -------- .../media/common/videobuf2/videobuf2-dma-sg.c | 3 +- .../media/common/videobuf2/videobuf2-v4l2.c | 18 +------- drivers/media/v4l2-core/v4l2-compat-ioctl32.c | 10 +--- drivers/media/v4l2-core/v4l2-ioctl.c | 5 +- include/media/videobuf2-core.h | 7 +-- include/uapi/linux/videodev2.h | 13 +----- 11 files changed, 22 insertions(+), 134 deletions(-)
diff --git a/Documentation/userspace-api/media/v4l/buffer.rst b/Documentation/userspace-api/media/v4l/buffer.rst index 57e752aaf414a7..2044ed13cd9d7d 100644 --- a/Documentation/userspace-api/media/v4l/buffer.rst +++ b/Documentation/userspace-api/media/v4l/buffer.rst @@ -701,23 +701,6 @@ Memory Consistency Flags :stub-columns: 0 :widths: 3 1 4
- * .. _`V4L2-FLAG-MEMORY-NON-CONSISTENT`: - - - ``V4L2_FLAG_MEMORY_NON_CONSISTENT`` - - 0x00000001 - - A buffer is allocated either in consistent (it will be automatically - coherent between the CPU and the bus) or non-consistent memory. The - latter can provide performance gains, for instance the CPU cache - sync/flush operations can be avoided if the buffer is accessed by the - corresponding device only and the CPU does not read/write to/from that - buffer. However, this requires extra care from the driver -- it must - guarantee memory consistency by issuing a cache flush/sync when - consistency is needed. If this flag is set V4L2 will attempt to - allocate the buffer in non-consistent memory. The flag takes effect - only if the buffer is used for :ref:`memory mapping <mmap>` I/O and the - queue reports the :ref:`V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS - <V4L2-BUF-CAP-SUPPORTS-MMAP-CACHE-HINTS>` capability. - .. c:type:: v4l2_memory
enum v4l2_memory diff --git a/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst b/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst index f2a702870fadc1..12cf6b44f414f7 100644 --- a/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst +++ b/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst @@ -120,13 +120,9 @@ than the number requested. If you want to just query the capabilities without making any other changes, then set ``count`` to 0, ``memory`` to ``V4L2_MEMORY_MMAP`` and ``format.type`` to the buffer type. - * - __u32 - - ``flags`` - - Specifies additional buffer management attributes. - See :ref:`memory-flags`.
* - __u32 - - ``reserved``\ [6] + - ``reserved``\ [7] - A place holder for future extensions. Drivers and applications must set the array to zero.
diff --git a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst index 75d894d9c36c42..0e3e2fde65e850 100644 --- a/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst +++ b/Documentation/userspace-api/media/v4l/vidioc-reqbufs.rst @@ -112,17 +112,10 @@ aborting or finishing any DMA in progress, an implicit ``V4L2_MEMORY_MMAP`` and ``type`` set to the buffer type. This will free any previously allocated buffers, so this is typically something that will be done at the start of the application. - * - union { - - (anonymous) - * - __u32 - - ``flags`` - - Specifies additional buffer management attributes. - See :ref:`memory-flags`. * - __u32 - ``reserved``\ [1] - - Kept for backwards compatibility. Use ``flags`` instead. - * - } - - + - A place holder for future extensions. Drivers and applications + must set the array to zero.
.. tabularcolumns:: |p{6.1cm}|p{2.2cm}|p{8.7cm}|
@@ -169,7 +162,6 @@ aborting or finishing any DMA in progress, an implicit - This capability is set by the driver to indicate that the queue supports cache and memory management hints. However, it's only valid when the queue is used for :ref:`memory mapping <mmap>` streaming I/O. See - :ref:`V4L2_FLAG_MEMORY_NON_CONSISTENT <V4L2-FLAG-MEMORY-NON-CONSISTENT>`, :ref:`V4L2_BUF_FLAG_NO_CACHE_INVALIDATE <V4L2-BUF-FLAG-NO-CACHE-INVALIDATE>` and :ref:`V4L2_BUF_FLAG_NO_CACHE_CLEAN <V4L2-BUF-FLAG-NO-CACHE-CLEAN>`.
diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c index f544d3393e9d6b..4eab6d81cce170 100644 --- a/drivers/media/common/videobuf2/videobuf2-core.c +++ b/drivers/media/common/videobuf2/videobuf2-core.c @@ -721,39 +721,14 @@ int vb2_verify_memory_type(struct vb2_queue *q, } EXPORT_SYMBOL(vb2_verify_memory_type);
-static void set_queue_consistency(struct vb2_queue *q, bool consistent_mem) -{ - q->dma_attrs &= ~DMA_ATTR_NON_CONSISTENT; - - if (!vb2_queue_allows_cache_hints(q)) - return; - if (!consistent_mem) - q->dma_attrs |= DMA_ATTR_NON_CONSISTENT; -} - -static bool verify_consistency_attr(struct vb2_queue *q, bool consistent_mem) -{ - bool queue_is_consistent = !(q->dma_attrs & DMA_ATTR_NON_CONSISTENT); - - if (consistent_mem != queue_is_consistent) { - dprintk(q, 1, "memory consistency model mismatch\n"); - return false; - } - return true; -} - int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, - unsigned int flags, unsigned int *count) + unsigned int *count) { unsigned int num_buffers, allocated_buffers, num_planes = 0; unsigned plane_sizes[VB2_MAX_PLANES] = { }; - bool consistent_mem = true; unsigned int i; int ret;
- if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT) - consistent_mem = false; - if (q->streaming) { dprintk(q, 1, "streaming active\n"); return -EBUSY; @@ -765,8 +740,7 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, }
if (*count == 0 || q->num_buffers != 0 || - (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory) || - !verify_consistency_attr(q, consistent_mem)) { + (q->memory != VB2_MEMORY_UNKNOWN && q->memory != memory)) { /* * We already have buffers allocated, so first check if they * are not in use and can be freed. @@ -803,7 +777,6 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, num_buffers = min_t(unsigned int, num_buffers, VB2_MAX_FRAME); memset(q->alloc_devs, 0, sizeof(q->alloc_devs)); q->memory = memory; - set_queue_consistency(q, consistent_mem);
/* * Ask the driver how many buffers and planes per buffer it requires. @@ -888,18 +861,14 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, EXPORT_SYMBOL_GPL(vb2_core_reqbufs);
int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory, - unsigned int flags, unsigned int *count, + unsigned int *count, unsigned int requested_planes, const unsigned int requested_sizes[]) { unsigned int num_planes = 0, num_buffers, allocated_buffers; unsigned plane_sizes[VB2_MAX_PLANES] = { }; - bool consistent_mem = true; int ret;
- if (flags & V4L2_FLAG_MEMORY_NON_CONSISTENT) - consistent_mem = false; - if (q->num_buffers == VB2_MAX_FRAME) { dprintk(q, 1, "maximum number of buffers already allocated\n"); return -ENOBUFS; @@ -912,15 +881,12 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory, } memset(q->alloc_devs, 0, sizeof(q->alloc_devs)); q->memory = memory; - set_queue_consistency(q, consistent_mem); q->waiting_for_buffers = !q->is_output; } else { if (q->memory != memory) { dprintk(q, 1, "memory model mismatch\n"); return -EINVAL; } - if (!verify_consistency_attr(q, consistent_mem)) - return -EINVAL; }
num_buffers = min(*count, VB2_MAX_FRAME - q->num_buffers); @@ -2581,7 +2547,7 @@ static int __vb2_init_fileio(struct vb2_queue *q, int read) fileio->memory = VB2_MEMORY_MMAP; fileio->type = q->type; q->fileio = fileio; - ret = vb2_core_reqbufs(q, fileio->memory, 0, &fileio->count); + ret = vb2_core_reqbufs(q, fileio->memory, &fileio->count); if (ret) goto err_kfree;
@@ -2638,7 +2604,7 @@ static int __vb2_init_fileio(struct vb2_queue *q, int read)
err_reqbufs: fileio->count = 0; - vb2_core_reqbufs(q, fileio->memory, 0, &fileio->count); + vb2_core_reqbufs(q, fileio->memory, &fileio->count);
err_kfree: q->fileio = NULL; @@ -2658,7 +2624,7 @@ static int __vb2_cleanup_fileio(struct vb2_queue *q) vb2_core_streamoff(q, q->type); q->fileio = NULL; fileio->count = 0; - vb2_core_reqbufs(q, fileio->memory, 0, &fileio->count); + vb2_core_reqbufs(q, fileio->memory, &fileio->count); kfree(fileio); dprintk(q, 3, "file io emulator closed\n"); } diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c index ec3446cc45b8da..7b1b86ec942d7d 100644 --- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c +++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c @@ -42,11 +42,6 @@ struct vb2_dc_buf { struct dma_buf_attachment *db_attach; };
-static inline bool vb2_dc_buffer_consistent(unsigned long attr) -{ - return !(attr & DMA_ATTR_NON_CONSISTENT); -} - /*********************************************/ /* scatterlist table functions */ /*********************************************/ @@ -341,13 +336,6 @@ static int vb2_dc_dmabuf_ops_begin_cpu_access(struct dma_buf *dbuf, enum dma_data_direction direction) { - struct vb2_dc_buf *buf = dbuf->priv; - struct sg_table *sgt = buf->dma_sgt; - - if (vb2_dc_buffer_consistent(buf->attrs)) - return 0; - - dma_sync_sg_for_cpu(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir); return 0; }
@@ -355,13 +343,6 @@ static int vb2_dc_dmabuf_ops_end_cpu_access(struct dma_buf *dbuf, enum dma_data_direction direction) { - struct vb2_dc_buf *buf = dbuf->priv; - struct sg_table *sgt = buf->dma_sgt; - - if (vb2_dc_buffer_consistent(buf->attrs)) - return 0; - - dma_sync_sg_for_device(buf->dev, sgt->sgl, sgt->nents, buf->dma_dir); return 0; }
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c b/drivers/media/common/videobuf2/videobuf2-dma-sg.c index 0a40e00f0d7e5c..a86fce5d8ea8bf 100644 --- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c @@ -123,8 +123,7 @@ static void *vb2_dma_sg_alloc(struct device *dev, unsigned long dma_attrs, /* * NOTE: dma-sg allocates memory using the page allocator directly, so * there is no memory consistency guarantee, hence dma-sg ignores DMA - * attributes passed from the upper layer. That means that - * V4L2_FLAG_MEMORY_NON_CONSISTENT has no effect on dma-sg buffers. + * attributes passed from the upper layer. */ buf->pages = kvmalloc_array(buf->num_pages, sizeof(struct page *), GFP_KERNEL | __GFP_ZERO); diff --git a/drivers/media/common/videobuf2/videobuf2-v4l2.c b/drivers/media/common/videobuf2/videobuf2-v4l2.c index 30caad27281e1a..cfe197df970df2 100644 --- a/drivers/media/common/videobuf2/videobuf2-v4l2.c +++ b/drivers/media/common/videobuf2/videobuf2-v4l2.c @@ -722,22 +722,12 @@ static void fill_buf_caps(struct vb2_queue *q, u32 *caps) #endif }
-static void clear_consistency_attr(struct vb2_queue *q, - int memory, - unsigned int *flags) -{ - if (!q->allow_cache_hints || memory != V4L2_MEMORY_MMAP) - *flags &= ~V4L2_FLAG_MEMORY_NON_CONSISTENT; -} - int vb2_reqbufs(struct vb2_queue *q, struct v4l2_requestbuffers *req) { int ret = vb2_verify_memory_type(q, req->memory, req->type);
fill_buf_caps(q, &req->capabilities); - clear_consistency_attr(q, req->memory, &req->flags); - return ret ? ret : vb2_core_reqbufs(q, req->memory, - req->flags, &req->count); + return ret ? ret : vb2_core_reqbufs(q, req->memory, &req->count); } EXPORT_SYMBOL_GPL(vb2_reqbufs);
@@ -769,7 +759,6 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create) unsigned i;
fill_buf_caps(q, &create->capabilities); - clear_consistency_attr(q, create->memory, &create->flags); create->index = q->num_buffers; if (create->count == 0) return ret != -EBUSY ? ret : 0; @@ -813,7 +802,6 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create) if (requested_sizes[i] == 0) return -EINVAL; return ret ? ret : vb2_core_create_bufs(q, create->memory, - create->flags, &create->count, requested_planes, requested_sizes); @@ -998,12 +986,11 @@ int vb2_ioctl_reqbufs(struct file *file, void *priv, int res = vb2_verify_memory_type(vdev->queue, p->memory, p->type);
fill_buf_caps(vdev->queue, &p->capabilities); - clear_consistency_attr(vdev->queue, p->memory, &p->flags); if (res) return res; if (vb2_queue_is_busy(vdev, file)) return -EBUSY; - res = vb2_core_reqbufs(vdev->queue, p->memory, p->flags, &p->count); + res = vb2_core_reqbufs(vdev->queue, p->memory, &p->count); /* If count == 0, then the owner has released all buffers and he is no longer owner of the queue. Otherwise we have a new owner. */ if (res == 0) @@ -1021,7 +1008,6 @@ int vb2_ioctl_create_bufs(struct file *file, void *priv,
p->index = vdev->queue->num_buffers; fill_buf_caps(vdev->queue, &p->capabilities); - clear_consistency_attr(vdev->queue, p->memory, &p->flags); /* * If count == 0, then just check if memory and type are valid. * Any -EBUSY result from vb2_verify_memory_type can be mapped to 0. diff --git a/drivers/media/v4l2-core/v4l2-compat-ioctl32.c b/drivers/media/v4l2-core/v4l2-compat-ioctl32.c index 593bcf6c373502..a99e82ec9ab60d 100644 --- a/drivers/media/v4l2-core/v4l2-compat-ioctl32.c +++ b/drivers/media/v4l2-core/v4l2-compat-ioctl32.c @@ -246,9 +246,6 @@ struct v4l2_format32 { * @memory: buffer memory type * @format: frame format, for which buffers are requested * @capabilities: capabilities of this buffer type. - * @flags: additional buffer management attributes (ignored unless the - * queue has V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS capability and - * configured for MMAP streaming I/O). * @reserved: future extensions */ struct v4l2_create_buffers32 { @@ -257,8 +254,7 @@ struct v4l2_create_buffers32 { __u32 memory; /* enum v4l2_memory */ struct v4l2_format32 format; __u32 capabilities; - __u32 flags; - __u32 reserved[6]; + __u32 reserved[7]; };
static int __bufsize_v4l2_format(struct v4l2_format32 __user *p32, u32 *size) @@ -359,8 +355,7 @@ static int get_v4l2_create32(struct v4l2_create_buffers __user *p64, { if (!access_ok(p32, sizeof(*p32)) || copy_in_user(p64, p32, - offsetof(struct v4l2_create_buffers32, format)) || - assign_in_user(&p64->flags, &p32->flags)) + offsetof(struct v4l2_create_buffers32, format))) return -EFAULT; return __get_v4l2_format32(&p64->format, &p32->format, aux_buf, aux_space); @@ -422,7 +417,6 @@ static int put_v4l2_create32(struct v4l2_create_buffers __user *p64, copy_in_user(p32, p64, offsetof(struct v4l2_create_buffers32, format)) || assign_in_user(&p32->capabilities, &p64->capabilities) || - assign_in_user(&p32->flags, &p64->flags) || copy_in_user(p32->reserved, p64->reserved, sizeof(p64->reserved))) return -EFAULT; return __put_v4l2_format32(&p64->format, &p32->format); diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c index 2a22e13a630346..e0520c85a3b725 100644 --- a/drivers/media/v4l2-core/v4l2-ioctl.c +++ b/drivers/media/v4l2-core/v4l2-ioctl.c @@ -2042,6 +2042,9 @@ static int v4l_reqbufs(const struct v4l2_ioctl_ops *ops,
if (ret) return ret; + + CLEAR_AFTER_FIELD(p, capabilities); + return ops->vidioc_reqbufs(file, fh, p); }
@@ -2081,7 +2084,7 @@ static int v4l_create_bufs(const struct v4l2_ioctl_ops *ops, if (ret) return ret;
- CLEAR_AFTER_FIELD(create, flags); + CLEAR_AFTER_FIELD(create, capabilities);
v4l_sanitize_format(&create->format);
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h index 52ef92049073e3..bbb3f26fbde978 100644 --- a/include/media/videobuf2-core.h +++ b/include/media/videobuf2-core.h @@ -744,8 +744,6 @@ void vb2_core_querybuf(struct vb2_queue *q, unsigned int index, void *pb); * vb2_core_reqbufs() - Initiate streaming. * @q: pointer to &struct vb2_queue with videobuf2 queue. * @memory: memory type, as defined by &enum vb2_memory. - * @flags: auxiliary queue/buffer management flags. Currently, the only - * used flag is %V4L2_FLAG_MEMORY_NON_CONSISTENT. * @count: requested buffer count. * * Videobuf2 core helper to implement VIDIOC_REQBUF() operation. It is called @@ -770,13 +768,12 @@ void vb2_core_querybuf(struct vb2_queue *q, unsigned int index, void *pb); * Return: returns zero on success; an error code otherwise. */ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, - unsigned int flags, unsigned int *count); + unsigned int *count);
/** * vb2_core_create_bufs() - Allocate buffers and any required auxiliary structs * @q: pointer to &struct vb2_queue with videobuf2 queue. * @memory: memory type, as defined by &enum vb2_memory. - * @flags: auxiliary queue/buffer management flags. * @count: requested buffer count. * @requested_planes: number of planes requested. * @requested_sizes: array with the size of the planes. @@ -794,7 +791,7 @@ int vb2_core_reqbufs(struct vb2_queue *q, enum vb2_memory memory, * Return: returns zero on success; an error code otherwise. */ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory, - unsigned int flags, unsigned int *count, + unsigned int *count, unsigned int requested_planes, const unsigned int requested_sizes[]);
diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h index c7b70ff53bc1dd..235db7754606d6 100644 --- a/include/uapi/linux/videodev2.h +++ b/include/uapi/linux/videodev2.h @@ -191,8 +191,6 @@ enum v4l2_memory { V4L2_MEMORY_DMABUF = 4, };
-#define V4L2_FLAG_MEMORY_NON_CONSISTENT (1 << 0) - /* see also http://vektor.theorem.ca/graphics/ycbcr/ */ enum v4l2_colorspace { /* @@ -949,10 +947,7 @@ struct v4l2_requestbuffers { __u32 type; /* enum v4l2_buf_type */ __u32 memory; /* enum v4l2_memory */ __u32 capabilities; - union { - __u32 flags; - __u32 reserved[1]; - }; + __u32 reserved[1]; };
/* capabilities for struct v4l2_requestbuffers and v4l2_create_buffers */ @@ -2456,9 +2451,6 @@ struct v4l2_dbg_chip_info { * @memory: enum v4l2_memory; buffer memory type * @format: frame format, for which buffers are requested * @capabilities: capabilities of this buffer type. - * @flags: additional buffer management attributes (ignored unless the - * queue has V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS capability - * and configured for MMAP streaming I/O). * @reserved: future extensions */ struct v4l2_create_buffers { @@ -2467,8 +2459,7 @@ struct v4l2_create_buffers { __u32 memory; struct v4l2_format format; __u32 capabilities; - __u32 flags; - __u32 reserved[6]; + __u32 reserved[7]; };
/*
Hi Christoph,
On Tue, Sep 15, 2020 at 05:51:05PM +0200, Christoph Hellwig wrote:
From: Sergey Senozhatsky sergey.senozhatsky@gmail.com
The patch partially reverts some of the UAPI bits of the buffer cache management hints. Namely, the queue consistency (memory coherency) user-space hint because, as it turned out, the kernel implementation of this feature was misusing DMA_ATTR_NON_CONSISTENT.
The patch revers both kernel and user space parts: removes the DMA consistency attr functions, rollbacks changes to v4l2_requestbuffers, v4l2_create_buffers structures and corresponding UAPI functions (plus compat32 layer) and cleanups the documentation.
Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sergey Senozhatsky sergey.senozhatsky@gmail.com Signed-off-by: Christoph Hellwig hch@lst.de
.../userspace-api/media/v4l/buffer.rst | 17 ------- .../media/v4l/vidioc-create-bufs.rst | 6 +-- .../media/v4l/vidioc-reqbufs.rst | 12 +---- .../media/common/videobuf2/videobuf2-core.c | 46 +++---------------- .../common/videobuf2/videobuf2-dma-contig.c | 19 -------- .../media/common/videobuf2/videobuf2-dma-sg.c | 3 +- .../media/common/videobuf2/videobuf2-v4l2.c | 18 +------- drivers/media/v4l2-core/v4l2-compat-ioctl32.c | 10 +--- drivers/media/v4l2-core/v4l2-ioctl.c | 5 +- include/media/videobuf2-core.h | 7 +-- include/uapi/linux/videodev2.h | 13 +----- 11 files changed, 22 insertions(+), 134 deletions(-)
Acked-by: Tomasz Figa tfiga@chromium.org
Best regards, Tomasz
To prevent a compiler error when a method call alloc_pages is added (which I plan to for the dma_map_ops).
Signed-off-by: Christoph Hellwig hch@lst.de --- include/linux/gfp.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 67a0774e080b98..dd2577c5407112 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -550,8 +550,10 @@ extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, #define alloc_hugepage_vma(gfp_mask, vma, addr, order) \ alloc_pages_vma(gfp_mask, order, vma, addr, numa_node_id(), true) #else -#define alloc_pages(gfp_mask, order) \ - alloc_pages_node(numa_node_id(), gfp_mask, order) +static inline struct page *alloc_pages(gfp_t gfp_mask, unsigned int order) +{ + return alloc_pages_node(numa_node_id(), gfp_mask, order); +} #define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\ alloc_pages(gfp_mask, order) #define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
DMA_ATTR_NON_CONSISTENT is a no-op except on PA-RISC and a few MIPS configs, so don't set it in this ARM specific driver.
Signed-off-by: Christoph Hellwig hch@lst.de --- drivers/gpu/drm/exynos/exynos_drm_gem.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c index efa476858db54b..07073222b8f691 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_gem.c +++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c @@ -42,8 +42,6 @@ static int exynos_drm_alloc_buf(struct exynos_drm_gem *exynos_gem, bool kvmap) if (exynos_gem->flags & EXYNOS_BO_WC || !(exynos_gem->flags & EXYNOS_BO_CACHABLE)) attr |= DMA_ATTR_WRITE_COMBINE; - else - attr |= DMA_ATTR_NON_CONSISTENT;
/* FBDev emulation requires kernel mapping */ if (!kvmap)
DMA_ATTR_NON_CONSISTENT is a no-op except on PA-RISC and a few MIPS configs, so don't set it in this ARM specific driver part.
Signed-off-by: Christoph Hellwig hch@lst.de --- drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c index 985f2990ab0dda..13d4d7ac0697b4 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c @@ -594,8 +594,7 @@ gk20a_instmem_new(struct nvkm_device *device, int index,
nvkm_info(&imem->base.subdev, "using IOMMU\n"); } else { - imem->attrs = DMA_ATTR_NON_CONSISTENT | - DMA_ATTR_WEAK_ORDERING | + imem->attrs = DMA_ATTR_WEAK_ORDERING | DMA_ATTR_WRITE_COMBINE;
nvkm_info(&imem->base.subdev, "using DMA API\n");
The au1000-eth driver contains none of the manual cache synchronization required for using DMA_ATTR_NON_CONSISTENT. From what I can tell it can be used on both dma coherent and non-coherent DMA platforms, but I suspect it has been buggy on the non-coherent platforms all along.
Signed-off-by: Christoph Hellwig hch@lst.de --- drivers/net/ethernet/amd/au1000_eth.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/amd/au1000_eth.c b/drivers/net/ethernet/amd/au1000_eth.c index 75dbd221dc594b..19e195420e2434 100644 --- a/drivers/net/ethernet/amd/au1000_eth.c +++ b/drivers/net/ethernet/amd/au1000_eth.c @@ -1131,10 +1131,9 @@ static int au1000_probe(struct platform_device *pdev) /* Allocate the data buffers * Snooping works fine with eth on all au1xxx */ - aup->vaddr = (u32)dma_alloc_attrs(&pdev->dev, MAX_BUF_SIZE * + aup->vaddr = (u32)dma_alloc_coherent(&pdev->dev, MAX_BUF_SIZE * (NUM_TX_BUFFS + NUM_RX_BUFFS), - &aup->dma_addr, 0, - DMA_ATTR_NON_CONSISTENT); + &aup->dma_addr, 0); if (!aup->vaddr) { dev_err(&pdev->dev, "failed to allocate data buffers\n"); err = -ENOMEM; @@ -1310,9 +1309,8 @@ static int au1000_probe(struct platform_device *pdev) err_remap2: iounmap(aup->mac); err_remap1: - dma_free_attrs(&pdev->dev, MAX_BUF_SIZE * (NUM_TX_BUFFS + NUM_RX_BUFFS), - (void *)aup->vaddr, aup->dma_addr, - DMA_ATTR_NON_CONSISTENT); + dma_free_coherent(&pdev->dev, MAX_BUF_SIZE * (NUM_TX_BUFFS + NUM_RX_BUFFS), + (void *)aup->vaddr, aup->dma_addr); err_vaddr: free_netdev(dev); err_alloc: @@ -1344,9 +1342,8 @@ static int au1000_remove(struct platform_device *pdev) if (aup->tx_db_inuse[i]) au1000_ReleaseDB(aup, aup->tx_db_inuse[i]);
- dma_free_attrs(&pdev->dev, MAX_BUF_SIZE * (NUM_TX_BUFFS + NUM_RX_BUFFS), - (void *)aup->vaddr, aup->dma_addr, - DMA_ATTR_NON_CONSISTENT); + dma_free_coherent(&pdev->dev, MAX_BUF_SIZE * (NUM_TX_BUFFS + NUM_RX_BUFFS), + (void *)aup->vaddr, aup->dma_addr);
iounmap(aup->macdma); iounmap(aup->mac);
This allows us to get rid of the LIB82596_DMA_ATTR defined and prepare for untangling the coherent vs non-coherent DMA allocation API.
Signed-off-by: Christoph Hellwig hch@lst.de --- drivers/net/ethernet/i825xx/lasi_82596.c | 24 ++++++++++------ drivers/net/ethernet/i825xx/lib82596.c | 36 ++++++++---------------- drivers/net/ethernet/i825xx/sni_82596.c | 19 +++++++++---- 3 files changed, 40 insertions(+), 39 deletions(-)
diff --git a/drivers/net/ethernet/i825xx/lasi_82596.c b/drivers/net/ethernet/i825xx/lasi_82596.c index aec7e98bcc853a..a12218e940a2fa 100644 --- a/drivers/net/ethernet/i825xx/lasi_82596.c +++ b/drivers/net/ethernet/i825xx/lasi_82596.c @@ -96,8 +96,6 @@
#define OPT_SWAP_PORT 0x0001 /* Need to wordswp on the MPU port */
-#define LIB82596_DMA_ATTR DMA_ATTR_NON_CONSISTENT - #define DMA_WBACK(ndev, addr, len) \ do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, DMA_TO_DEVICE); } while (0)
@@ -155,7 +153,7 @@ lan_init_chip(struct parisc_device *dev) { struct net_device *netdevice; struct i596_private *lp; - int retval; + int retval = -ENOMEM; int i;
if (!dev->irq) { @@ -186,12 +184,22 @@ lan_init_chip(struct parisc_device *dev)
lp = netdev_priv(netdevice); lp->options = dev->id.sversion == 0x72 ? OPT_SWAP_PORT : 0; + lp->dma = dma_alloc_attrs(&dev->dev, sizeof(struct i596_dma), + &lp->dma_addr, GFP_KERNEL, + DMA_ATTR_NON_CONSISTENT); + if (!lp->dma) + goto out_free_netdev;
retval = i82596_probe(netdevice); - if (retval) { - free_netdev(netdevice); - return -ENODEV; - } + if (retval) + goto out_free_dma; + return 0; + +out_free_dma: + dma_free_attrs(&dev->dev, sizeof(struct i596_dma), lp->dma, + lp->dma_addr, DMA_ATTR_NON_CONSISTENT); +out_free_netdev: + free_netdev(netdevice); return retval; }
@@ -202,7 +210,7 @@ static int __exit lan_remove_chip(struct parisc_device *pdev)
unregister_netdev (dev); dma_free_attrs(&pdev->dev, sizeof(struct i596_private), lp->dma, - lp->dma_addr, LIB82596_DMA_ATTR); + lp->dma_addr, DMA_ATTR_NON_CONSISTENT); free_netdev (dev); return 0; } diff --git a/drivers/net/ethernet/i825xx/lib82596.c b/drivers/net/ethernet/i825xx/lib82596.c index b03757e169e475..b4e4b3eb5758b5 100644 --- a/drivers/net/ethernet/i825xx/lib82596.c +++ b/drivers/net/ethernet/i825xx/lib82596.c @@ -1047,9 +1047,8 @@ static const struct net_device_ops i596_netdev_ops = {
static int i82596_probe(struct net_device *dev) { - int i; struct i596_private *lp = netdev_priv(dev); - struct i596_dma *dma; + int ret;
/* This lot is ensure things have been cache line aligned. */ BUILD_BUG_ON(sizeof(struct i596_rfd) != 32); @@ -1063,41 +1062,28 @@ static int i82596_probe(struct net_device *dev) if (!dev->base_addr || !dev->irq) return -ENODEV;
- dma = dma_alloc_attrs(dev->dev.parent, sizeof(struct i596_dma), - &lp->dma_addr, GFP_KERNEL, - LIB82596_DMA_ATTR); - if (!dma) { - printk(KERN_ERR "%s: Couldn't get shared memory\n", __FILE__); - return -ENOMEM; - } - dev->netdev_ops = &i596_netdev_ops; dev->watchdog_timeo = TX_TIMEOUT;
- memset(dma, 0, sizeof(struct i596_dma)); - lp->dma = dma; - - dma->scb.command = 0; - dma->scb.cmd = I596_NULL; - dma->scb.rfd = I596_NULL; + memset(lp->dma, 0, sizeof(struct i596_dma)); + lp->dma->scb.command = 0; + lp->dma->scb.cmd = I596_NULL; + lp->dma->scb.rfd = I596_NULL; spin_lock_init(&lp->lock);
- DMA_WBACK_INV(dev, dma, sizeof(struct i596_dma)); + DMA_WBACK_INV(dev, lp->dma, sizeof(struct i596_dma));
- i = register_netdev(dev); - if (i) { - dma_free_attrs(dev->dev.parent, sizeof(struct i596_dma), - dma, lp->dma_addr, LIB82596_DMA_ATTR); - return i; - } + ret = register_netdev(dev); + if (ret) + return ret;
DEB(DEB_PROBE, printk(KERN_INFO "%s: 82596 at %#3lx, %pM IRQ %d.\n", dev->name, dev->base_addr, dev->dev_addr, dev->irq)); DEB(DEB_INIT, printk(KERN_INFO "%s: dma at 0x%p (%d bytes), lp->scb at 0x%p\n", - dev->name, dma, (int)sizeof(struct i596_dma), - &dma->scb)); + dev->name, lp->dma, (int)sizeof(struct i596_dma), + &lp->dma->scb));
return 0; } diff --git a/drivers/net/ethernet/i825xx/sni_82596.c b/drivers/net/ethernet/i825xx/sni_82596.c index 22f5887578b2bd..4b9ac0c6557731 100644 --- a/drivers/net/ethernet/i825xx/sni_82596.c +++ b/drivers/net/ethernet/i825xx/sni_82596.c @@ -24,8 +24,6 @@
static const char sni_82596_string[] = "snirm_82596";
-#define LIB82596_DMA_ATTR 0 - #define DMA_WBACK(priv, addr, len) do { } while (0) #define DMA_INV(priv, addr, len) do { } while (0) #define DMA_WBACK_INV(priv, addr, len) do { } while (0) @@ -134,10 +132,19 @@ static int sni_82596_probe(struct platform_device *dev) lp->ca = ca_addr; lp->mpu_port = mpu_addr;
+ lp->dma = dma_alloc_coherent(&dev->dev, sizeof(struct i596_dma), + &lp->dma_addr, GFP_KERNEL); + if (!lp->dma) + goto probe_failed; + retval = i82596_probe(netdevice); - if (retval == 0) - return 0; + if (retval) + goto probe_failed_free_dma; + return 0;
+probe_failed_free_dma: + dma_free_coherent(&dev->dev, sizeof(struct i596_dma), lp->dma, + lp->dma_addr); probe_failed: free_netdev(netdevice); probe_failed_free_ca: @@ -153,8 +160,8 @@ static int sni_82596_driver_remove(struct platform_device *pdev) struct i596_private *lp = netdev_priv(dev);
unregister_netdev(dev); - dma_free_attrs(dev->dev.parent, sizeof(struct i596_private), lp->dma, - lp->dma_addr, LIB82596_DMA_ATTR); + dma_free_coherent(&pdev->dev, sizeof(struct i596_private), lp->dma, + lp->dma_addr); iounmap(lp->ca); iounmap(lp->mpu_port); free_netdev (dev);
On Tue, Sep 15, 2020 at 05:51:10PM +0200, Christoph Hellwig wrote:
This allows us to get rid of the LIB82596_DMA_ATTR defined and prepare for untangling the coherent vs non-coherent DMA allocation API.
Signed-off-by: Christoph Hellwig hch@lst.de
drivers/net/ethernet/i825xx/lasi_82596.c | 24 ++++++++++------ drivers/net/ethernet/i825xx/lib82596.c | 36 ++++++++---------------- drivers/net/ethernet/i825xx/sni_82596.c | 19 +++++++++---- 3 files changed, 40 insertions(+), 39 deletions(-)
Tested-by: Thomas Bogendoerfer tsbogend@alpha.franken.de (SNI part)
Switch the 53c700 driver to only use non-coherent descriptor memory if it really has to because dma_alloc_coherent fails. This doesn't matter for any of the platforms it runs on currently, but that will change soon.
To help with this two new helpers to transfer ownership to and from the device are added that abstract the syncing of the non-coherent memory. The two current bidirectional cases are mapped to transfers to the device, as that appears to what they are used for. Note that for parisc, which is the only architecture this driver needs to use non-coherent memory on, the direction argument of dma_cache_sync is ignored, so this will not change behavior in any way.
Signed-off-by: Christoph Hellwig hch@lst.de --- drivers/scsi/53c700.c | 113 +++++++++++++++++++++++------------------- drivers/scsi/53c700.h | 17 ++++--- 2 files changed, 72 insertions(+), 58 deletions(-)
diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c index 84b57a8f86bfa9..c59226d7e2f6b5 100644 --- a/drivers/scsi/53c700.c +++ b/drivers/scsi/53c700.c @@ -269,6 +269,20 @@ NCR_700_get_SXFER(struct scsi_device *SDp) spi_period(SDp->sdev_target)); }
+static inline void dma_sync_to_dev(struct NCR_700_Host_Parameters *h, + void *addr, size_t size) +{ + if (h->noncoherent) + dma_cache_sync(h->dev, addr, size, DMA_TO_DEVICE); +} + +static inline void dma_sync_from_dev(struct NCR_700_Host_Parameters *h, + void *addr, size_t size) +{ + if (h->noncoherent) + dma_cache_sync(h->dev, addr, size, DMA_FROM_DEVICE); +} + struct Scsi_Host * NCR_700_detect(struct scsi_host_template *tpnt, struct NCR_700_Host_Parameters *hostdata, struct device *dev) @@ -283,9 +297,13 @@ NCR_700_detect(struct scsi_host_template *tpnt, if(tpnt->sdev_attrs == NULL) tpnt->sdev_attrs = NCR_700_dev_attrs;
- memory = dma_alloc_attrs(dev, TOTAL_MEM_SIZE, &pScript, - GFP_KERNEL, DMA_ATTR_NON_CONSISTENT); - if(memory == NULL) { + memory = dma_alloc_coherent(dev, TOTAL_MEM_SIZE, &pScript, GFP_KERNEL); + if (!memory) { + hostdata->noncoherent = 1; + memory = dma_alloc_attrs(dev, TOTAL_MEM_SIZE, &pScript, + GFP_KERNEL, DMA_ATTR_NON_CONSISTENT); + } + if (!memory) { printk(KERN_ERR "53c700: Failed to allocate memory for driver, detaching\n"); return NULL; } @@ -339,11 +357,11 @@ NCR_700_detect(struct scsi_host_template *tpnt, for (j = 0; j < PATCHES; j++) script[LABELPATCHES[j]] = bS_to_host(pScript + SCRIPT[LABELPATCHES[j]]); /* now patch up fixed addresses. */ - script_patch_32(hostdata->dev, script, MessageLocation, + script_patch_32(hostdata, script, MessageLocation, pScript + MSGOUT_OFFSET); - script_patch_32(hostdata->dev, script, StatusAddress, + script_patch_32(hostdata, script, StatusAddress, pScript + STATUS_OFFSET); - script_patch_32(hostdata->dev, script, ReceiveMsgAddress, + script_patch_32(hostdata, script, ReceiveMsgAddress, pScript + MSGIN_OFFSET);
hostdata->script = script; @@ -395,8 +413,12 @@ NCR_700_release(struct Scsi_Host *host) struct NCR_700_Host_Parameters *hostdata = (struct NCR_700_Host_Parameters *)host->hostdata[0];
- dma_free_attrs(hostdata->dev, TOTAL_MEM_SIZE, hostdata->script, - hostdata->pScript, DMA_ATTR_NON_CONSISTENT); + if (hostdata->noncoherent) + dma_free_attrs(hostdata->dev, TOTAL_MEM_SIZE, hostdata->script, + hostdata->pScript, DMA_ATTR_NON_CONSISTENT); + else + dma_free_coherent(hostdata->dev, TOTAL_MEM_SIZE, + hostdata->script, hostdata->pScript); return 1; }
@@ -804,8 +826,8 @@ process_extended_message(struct Scsi_Host *host, shost_printk(KERN_WARNING, host, "Unexpected SDTR msg\n"); hostdata->msgout[0] = A_REJECT_MSG; - dma_cache_sync(hostdata->dev, hostdata->msgout, 1, DMA_TO_DEVICE); - script_patch_16(hostdata->dev, hostdata->script, + dma_sync_to_dev(hostdata, hostdata->msgout, 1); + script_patch_16(hostdata, hostdata->script, MessageCount, 1); /* SendMsgOut returns, so set up the return * address */ @@ -817,9 +839,8 @@ process_extended_message(struct Scsi_Host *host, printk(KERN_INFO "scsi%d: (%d:%d), Unsolicited WDTR after CMD, Rejecting\n", host->host_no, pun, lun); hostdata->msgout[0] = A_REJECT_MSG; - dma_cache_sync(hostdata->dev, hostdata->msgout, 1, DMA_TO_DEVICE); - script_patch_16(hostdata->dev, hostdata->script, MessageCount, - 1); + dma_sync_to_dev(hostdata, hostdata->msgout, 1); + script_patch_16(hostdata, hostdata->script, MessageCount, 1); resume_offset = hostdata->pScript + Ent_SendMessageWithATN;
break; @@ -832,9 +853,8 @@ process_extended_message(struct Scsi_Host *host, printk("\n"); /* just reject it */ hostdata->msgout[0] = A_REJECT_MSG; - dma_cache_sync(hostdata->dev, hostdata->msgout, 1, DMA_TO_DEVICE); - script_patch_16(hostdata->dev, hostdata->script, MessageCount, - 1); + dma_sync_to_dev(hostdata, hostdata->msgout, 1); + script_patch_16(hostdata, hostdata->script, MessageCount, 1); /* SendMsgOut returns, so set up the return * address */ resume_offset = hostdata->pScript + Ent_SendMessageWithATN; @@ -917,9 +937,8 @@ process_message(struct Scsi_Host *host, struct NCR_700_Host_Parameters *hostdata printk("\n"); /* just reject it */ hostdata->msgout[0] = A_REJECT_MSG; - dma_cache_sync(hostdata->dev, hostdata->msgout, 1, DMA_TO_DEVICE); - script_patch_16(hostdata->dev, hostdata->script, MessageCount, - 1); + dma_sync_to_dev(hostdata, hostdata->msgout, 1); + script_patch_16(hostdata, hostdata->script, MessageCount, 1); /* SendMsgOut returns, so set up the return * address */ resume_offset = hostdata->pScript + Ent_SendMessageWithATN; @@ -928,7 +947,7 @@ process_message(struct Scsi_Host *host, struct NCR_700_Host_Parameters *hostdata } NCR_700_writel(temp, host, TEMP_REG); /* set us up to receive another message */ - dma_cache_sync(hostdata->dev, hostdata->msgin, MSG_ARRAY_SIZE, DMA_FROM_DEVICE); + dma_sync_from_dev(hostdata, hostdata->msgin, MSG_ARRAY_SIZE); return resume_offset; }
@@ -1008,8 +1027,8 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct scsi_cmnd *SCp, slot->SG[1].ins = bS_to_host(SCRIPT_RETURN); slot->SG[1].pAddr = 0; slot->resume_offset = hostdata->pScript; - dma_cache_sync(hostdata->dev, slot->SG, sizeof(slot->SG[0])*2, DMA_TO_DEVICE); - dma_cache_sync(hostdata->dev, SCp->sense_buffer, SCSI_SENSE_BUFFERSIZE, DMA_FROM_DEVICE); + dma_sync_to_dev(hostdata, slot->SG, sizeof(slot->SG[0])*2); + dma_sync_from_dev(hostdata, SCp->sense_buffer, SCSI_SENSE_BUFFERSIZE);
/* queue the command for reissue */ slot->state = NCR_700_SLOT_QUEUED; @@ -1129,11 +1148,11 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct scsi_cmnd *SCp, hostdata->cmd = slot->cmnd;
/* re-patch for this command */ - script_patch_32_abs(hostdata->dev, hostdata->script, + script_patch_32_abs(hostdata, hostdata->script, CommandAddress, slot->pCmd); - script_patch_16(hostdata->dev, hostdata->script, + script_patch_16(hostdata, hostdata->script, CommandCount, slot->cmnd->cmd_len); - script_patch_32_abs(hostdata->dev, hostdata->script, + script_patch_32_abs(hostdata, hostdata->script, SGScriptStartAddress, to32bit(&slot->pSG[0].ins));
@@ -1144,14 +1163,14 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct scsi_cmnd *SCp, * should therefore always clear ACK */ NCR_700_writeb(NCR_700_get_SXFER(hostdata->cmd->device), host, SXFER_REG); - dma_cache_sync(hostdata->dev, hostdata->msgin, - MSG_ARRAY_SIZE, DMA_FROM_DEVICE); - dma_cache_sync(hostdata->dev, hostdata->msgout, - MSG_ARRAY_SIZE, DMA_TO_DEVICE); + dma_sync_from_dev(hostdata, hostdata->msgin, + MSG_ARRAY_SIZE); + dma_sync_to_dev(hostdata, hostdata->msgout, + MSG_ARRAY_SIZE); /* I'm just being paranoid here, the command should * already have been flushed from the cache */ - dma_cache_sync(hostdata->dev, slot->cmnd->cmnd, - slot->cmnd->cmd_len, DMA_TO_DEVICE); + dma_sync_to_dev(hostdata, slot->cmnd->cmnd, + slot->cmnd->cmd_len);
@@ -1214,8 +1233,7 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct scsi_cmnd *SCp, hostdata->reselection_id = reselection_id; /* just in case we have a stale simple tag message, clear it */ hostdata->msgin[1] = 0; - dma_cache_sync(hostdata->dev, hostdata->msgin, - MSG_ARRAY_SIZE, DMA_BIDIRECTIONAL); + dma_sync_to_dev(hostdata, hostdata->msgin, MSG_ARRAY_SIZE); if(hostdata->tag_negotiated & (1<<reselection_id)) { resume_offset = hostdata->pScript + Ent_GetReselectionWithTag; } else { @@ -1329,8 +1347,7 @@ process_selection(struct Scsi_Host *host, __u32 dsp) hostdata->cmd = NULL; /* clear any stale simple tag message */ hostdata->msgin[1] = 0; - dma_cache_sync(hostdata->dev, hostdata->msgin, MSG_ARRAY_SIZE, - DMA_BIDIRECTIONAL); + dma_sync_to_dev(hostdata, hostdata->msgin, MSG_ARRAY_SIZE);
if(id == 0xff) { /* Selected as target, Ignore */ @@ -1427,30 +1444,26 @@ NCR_700_start_command(struct scsi_cmnd *SCp) NCR_700_set_flag(SCp->device, NCR_700_DEV_BEGIN_SYNC_NEGOTIATION); }
- script_patch_16(hostdata->dev, hostdata->script, MessageCount, count); - + script_patch_16(hostdata, hostdata->script, MessageCount, count);
- script_patch_ID(hostdata->dev, hostdata->script, - Device_ID, 1<<scmd_id(SCp)); + script_patch_ID(hostdata, hostdata->script, Device_ID, 1<<scmd_id(SCp));
- script_patch_32_abs(hostdata->dev, hostdata->script, CommandAddress, + script_patch_32_abs(hostdata, hostdata->script, CommandAddress, slot->pCmd); - script_patch_16(hostdata->dev, hostdata->script, CommandCount, - SCp->cmd_len); + script_patch_16(hostdata, hostdata->script, CommandCount, SCp->cmd_len); /* finally plumb the beginning of the SG list into the script * */ - script_patch_32_abs(hostdata->dev, hostdata->script, + script_patch_32_abs(hostdata, hostdata->script, SGScriptStartAddress, to32bit(&slot->pSG[0].ins)); NCR_700_clear_fifo(SCp->device->host);
if(slot->resume_offset == 0) slot->resume_offset = hostdata->pScript; /* now perform all the writebacks and invalidates */ - dma_cache_sync(hostdata->dev, hostdata->msgout, count, DMA_TO_DEVICE); - dma_cache_sync(hostdata->dev, hostdata->msgin, MSG_ARRAY_SIZE, - DMA_FROM_DEVICE); - dma_cache_sync(hostdata->dev, SCp->cmnd, SCp->cmd_len, DMA_TO_DEVICE); - dma_cache_sync(hostdata->dev, hostdata->status, 1, DMA_FROM_DEVICE); + dma_sync_to_dev(hostdata, hostdata->msgout, count); + dma_sync_from_dev(hostdata, hostdata->msgin, MSG_ARRAY_SIZE); + dma_sync_to_dev(hostdata, SCp->cmnd, SCp->cmd_len); + dma_sync_from_dev(hostdata, hostdata->status, 1);
/* set the synchronous period/offset */ NCR_700_writeb(NCR_700_get_SXFER(SCp->device), @@ -1626,7 +1639,7 @@ NCR_700_intr(int irq, void *dev_id) slot->SG[i].ins = bS_to_host(SCRIPT_NOP); slot->SG[i].pAddr = 0; } - dma_cache_sync(hostdata->dev, slot->SG, sizeof(slot->SG), DMA_TO_DEVICE); + dma_sync_to_dev(hostdata, slot->SG, sizeof(slot->SG)); /* and pretend we disconnected after * the command phase */ resume_offset = hostdata->pScript + Ent_MsgInDuringData; @@ -1878,7 +1891,7 @@ NCR_700_queuecommand_lck(struct scsi_cmnd *SCp, void (*done)(struct scsi_cmnd *) } slot->SG[i].ins = bS_to_host(SCRIPT_RETURN); slot->SG[i].pAddr = 0; - dma_cache_sync(hostdata->dev, slot->SG, sizeof(slot->SG), DMA_TO_DEVICE); + dma_sync_to_dev(hostdata, slot->SG, sizeof(slot->SG)); DEBUG((" SETTING %p to %x\n", (&slot->pSG[i].ins), slot->SG[i].ins)); diff --git a/drivers/scsi/53c700.h b/drivers/scsi/53c700.h index 05fe439b66afe5..c9f8c497babb3d 100644 --- a/drivers/scsi/53c700.h +++ b/drivers/scsi/53c700.h @@ -209,6 +209,7 @@ struct NCR_700_Host_Parameters { #endif __u32 chip710:1; /* set if really a 710 not 700 */ __u32 burst_length:4; /* set to 0 to disable 710 bursting */ + __u32 noncoherent:1; /* needs to use non-coherent DMA */
/* NOTHING BELOW HERE NEEDS ALTERING */ __u32 fast:1; /* if we can alter the SCSI bus clock @@ -422,33 +423,33 @@ struct NCR_700_Host_Parameters { #define NCR_710_MIN_XFERP 0 #define NCR_700_MIN_PERIOD 25 /* for SDTR message, 100ns */
-#define script_patch_32(dev, script, symbol, value) \ +#define script_patch_32(h, script, symbol, value) \ { \ int i; \ dma_addr_t da = value; \ for(i=0; i< (sizeof(A_##symbol##_used) / sizeof(__u32)); i++) { \ __u32 val = bS_to_cpu((script)[A_##symbol##_used[i]]) + da; \ (script)[A_##symbol##_used[i]] = bS_to_host(val); \ - dma_cache_sync((dev), &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE); \ + dma_sync_to_dev((h), &(script)[A_##symbol##_used[i]], 4); \ DEBUG((" script, patching %s at %d to %pad\n", \ #symbol, A_##symbol##_used[i], &da)); \ } \ }
-#define script_patch_32_abs(dev, script, symbol, value) \ +#define script_patch_32_abs(h, script, symbol, value) \ { \ int i; \ dma_addr_t da = value; \ for(i=0; i< (sizeof(A_##symbol##_used) / sizeof(__u32)); i++) { \ (script)[A_##symbol##_used[i]] = bS_to_host(da); \ - dma_cache_sync((dev), &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE); \ + dma_sync_to_dev((h), &(script)[A_##symbol##_used[i]], 4); \ DEBUG((" script, patching %s at %d to %pad\n", \ #symbol, A_##symbol##_used[i], &da)); \ } \ }
/* Used for patching the SCSI ID in the SELECT instruction */ -#define script_patch_ID(dev, script, symbol, value) \ +#define script_patch_ID(h, script, symbol, value) \ { \ int i; \ for(i=0; i< (sizeof(A_##symbol##_used) / sizeof(__u32)); i++) { \ @@ -456,13 +457,13 @@ struct NCR_700_Host_Parameters { val &= 0xff00ffff; \ val |= ((value) & 0xff) << 16; \ (script)[A_##symbol##_used[i]] = bS_to_host(val); \ - dma_cache_sync((dev), &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE); \ + dma_sync_to_dev((h), &(script)[A_##symbol##_used[i]], 4); \ DEBUG((" script, patching ID field %s at %d to 0x%x\n", \ #symbol, A_##symbol##_used[i], val)); \ } \ }
-#define script_patch_16(dev, script, symbol, value) \ +#define script_patch_16(h, script, symbol, value) \ { \ int i; \ for(i=0; i< (sizeof(A_##symbol##_used) / sizeof(__u32)); i++) { \ @@ -470,7 +471,7 @@ struct NCR_700_Host_Parameters { val &= 0xffff0000; \ val |= ((value) & 0xffff); \ (script)[A_##symbol##_used[i]] = bS_to_host(val); \ - dma_cache_sync((dev), &(script)[A_##symbol##_used[i]], 4, DMA_TO_DEVICE); \ + dma_sync_to_dev((h), &(script)[A_##symbol##_used[i]], 4); \ DEBUG((" script, patching short field %s at %d to 0x%x\n", \ #symbol, A_##symbol##_used[i], val)); \ } \
On Tue, Sep 15, 2020 at 05:51:11PM +0200, Christoph Hellwig wrote:
Switch the 53c700 driver to only use non-coherent descriptor memory if it really has to because dma_alloc_coherent fails. This doesn't matter for any of the platforms it runs on currently, but that will change soon.
To help with this two new helpers to transfer ownership to and from the device are added that abstract the syncing of the non-coherent memory. The two current bidirectional cases are mapped to transfers to the device, as that appears to what they are used for. Note that for parisc, which is the only architecture this driver needs to use non-coherent memory on, the direction argument of dma_cache_sync is ignored, so this will not change behavior in any way.
Signed-off-by: Christoph Hellwig hch@lst.de
drivers/scsi/53c700.c | 113 +++++++++++++++++++++++------------------- drivers/scsi/53c700.h | 17 ++++--- 2 files changed, 72 insertions(+), 58 deletions(-)
Tested-by: Thomas Bogendoerfer tsbogend@alpha.franken.de
Add a new API to allocate and free memory that is guaranteed to be addressable by a device, but which potentially is not cache coherent for DMA.
To transfer ownership to and from the device, the existing streaming DMA API calls dma_sync_single_for_device and dma_sync_single_for_cpu must be used.
For now the new calls are implemented on top of dma_alloc_attrs just like the old-noncoherent API, but once all drivers are switched to the new API it will be replaced with a better working implementation that is available on all architectures.
Signed-off-by: Christoph Hellwig hch@lst.de --- Documentation/core-api/dma-api.rst | 75 ++++++++++++++---------------- include/linux/dma-mapping.h | 12 +++++ 2 files changed, 48 insertions(+), 39 deletions(-)
diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst index 90239348b30f6f..ea0413276ddb70 100644 --- a/Documentation/core-api/dma-api.rst +++ b/Documentation/core-api/dma-api.rst @@ -516,48 +516,56 @@ routines, e.g.::: }
-Part II - Advanced dma usage ----------------------------- +Part II - Non-coherent DMA allocations +--------------------------------------
-Warning: These pieces of the DMA API should not be used in the -majority of cases, since they cater for unlikely corner cases that -don't belong in usual drivers. +These APIs allow to allocate pages in the kernel direct mapping that are +guaranteed to be DMA addressable. This means that unlike dma_alloc_coherent, +virt_to_page can be called on the resulting address, and the resulting +struct page can be used for everything a struct page is suitable for.
-If you don't understand how cache line coherency works between a -processor and an I/O device, you should not be using this part of the -API at all. +If you don't understand how cache line coherency works between a processor and +an I/O device, you should not be using this part of the API.
::
void * - dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle, - gfp_t flag, unsigned long attrs) + dma_alloc_noncoherent(struct device *dev, size_t size, + dma_addr_t *dma_handle, enum dma_data_direction dir, + gfp_t gfp)
-Identical to dma_alloc_coherent() except that when the -DMA_ATTR_NON_CONSISTENT flags is passed in the attrs argument, the -platform will choose to return either consistent or non-consistent memory -as it sees fit. By using this API, you are guaranteeing to the platform -that you have all the correct and necessary sync points for this memory -in the driver should it choose to return non-consistent memory. +This routine allocates a region of <size> bytes of consistent memory. It +returns a pointer to the allocated region (in the processor's virtual address +space) or NULL if the allocation failed. The returned memory may or may not +be in the kernels direct mapping. Drivers must not call virt_to_page on +the returned memory region.
-Note: where the platform can return consistent memory, it will -guarantee that the sync points become nops. +It also returns a <dma_handle> which may be cast to an unsigned integer the +same width as the bus and given to the device as the DMA address base of +the region.
-Warning: Handling non-consistent memory is a real pain. You should -only use this API if you positively know your driver will be -required to work on one of the rare (usually non-PCI) architectures -that simply cannot make consistent memory. +The dir parameter specified if data is read and/or written by the device, +see dma_map_single() for details. + +The gfp parameter allows the caller to specify the ``GFP_`` flags (see +kmalloc()) for the allocation, but rejects flags used to specify a memory +zone such as GFP_DMA or GFP_HIGHMEM. + +Before giving the memory to the device, dma_sync_single_for_device() needs +to be called, and before reading memory written by the device, +dma_sync_single_for_cpu(), just like for streaming DMA mappings that are +reused.
::
void - dma_free_attrs(struct device *dev, size_t size, void *cpu_addr, - dma_addr_t dma_handle, unsigned long attrs) + dma_free_noncoherent(struct device *dev, size_t size, void *cpu_addr, + dma_addr_t dma_handle, enum dma_data_direction dir)
-Free memory allocated by the dma_alloc_attrs(). All common -parameters must be identical to those otherwise passed to dma_free_coherent, -and the attrs argument must be identical to the attrs passed to -dma_alloc_attrs(). +Free a region of memory previously allocated using dma_alloc_noncoherent(). +dev, size and dma_handle and dir must all be the same as those passed into +dma_alloc_noncoherent(). cpu_addr must be the virtual address returned by +the dma_alloc_noncoherent().
::
@@ -575,17 +583,6 @@ memory or doing partial flushes. into the width returned by this call. It will also always be a power of two for easy alignment.
-:: - - void - dma_cache_sync(struct device *dev, void *vaddr, size_t size, - enum dma_data_direction direction) - -Do a partial sync of memory that was allocated by dma_alloc_attrs() with -the DMA_ATTR_NON_CONSISTENT flag starting at virtual address vaddr and -continuing on for size. Again, you *must* observe the cache line -boundaries when doing this. -
Part III - Debug drivers use of the DMA-API ------------------------------------------- diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index df0bff2ea750e0..4e1de194b45cbf 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -389,6 +389,18 @@ static inline unsigned long dma_get_merge_boundary(struct device *dev) } #endif /* CONFIG_HAS_DMA */
+static inline void *dma_alloc_noncoherent(struct device *dev, size_t size, + dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp) +{ + return dma_alloc_attrs(dev, size, dma_handle, gfp, + DMA_ATTR_NON_CONSISTENT); +} +static inline void dma_free_noncoherent(struct device *dev, size_t size, + void *vaddr, dma_addr_t dma_handle, enum dma_data_direction dir) +{ + dma_free_attrs(dev, size, vaddr, dma_handle, DMA_ATTR_NON_CONSISTENT); +} + static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr, size_t size, enum dma_data_direction dir, unsigned long attrs) {
On 2020-09-15 16:51, Christoph Hellwig wrote: [...]
+These APIs allow to allocate pages in the kernel direct mapping that are +guaranteed to be DMA addressable. This means that unlike dma_alloc_coherent, +virt_to_page can be called on the resulting address, and the resulting
Nit: if we explicitly describe this as if it's a guarantee that can be relied upon...
+struct page can be used for everything a struct page is suitable for.
[...]
+This routine allocates a region of <size> bytes of consistent memory. It +returns a pointer to the allocated region (in the processor's virtual address +space) or NULL if the allocation failed. The returned memory may or may not +be in the kernels direct mapping. Drivers must not call virt_to_page on +the returned memory region.
...then forbid this document's target audience from relying on it, something seems off. At the very least it's unhelpfully unclear :/
Given patch #17, I suspect that the first paragraph is the one that's no longer true.
Robin.
On Fri, Sep 25, 2020 at 12:15:37PM +0100, Robin Murphy wrote:
On 2020-09-15 16:51, Christoph Hellwig wrote: [...]
+These APIs allow to allocate pages in the kernel direct mapping that are +guaranteed to be DMA addressable. This means that unlike dma_alloc_coherent, +virt_to_page can be called on the resulting address, and the resulting
Nit: if we explicitly describe this as if it's a guarantee that can be relied upon...
+struct page can be used for everything a struct page is suitable for.
[...]
+This routine allocates a region of <size> bytes of consistent memory. It +returns a pointer to the allocated region (in the processor's virtual address +space) or NULL if the allocation failed. The returned memory may or may not +be in the kernels direct mapping. Drivers must not call virt_to_page on +the returned memory region.
...then forbid this document's target audience from relying on it, something seems off. At the very least it's unhelpfully unclear :/
Given patch #17, I suspect that the first paragraph is the one that's no longer true.
Yes. dma_alloc_pages is the replacement for allocations that need the direct mapping. I'll send a patch to document dma_alloc_pages and fixes this up
Use the new non-coherent DMA API including proper ownership transfers. This also means we can allocate the memory as DMA_TO_DEVICE instead of bidirectional.
Signed-off-by: Christoph Hellwig hch@lst.de --- drivers/scsi/sgiwd93.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/scsi/sgiwd93.c b/drivers/scsi/sgiwd93.c index 3bdf0deb8f1529..cf1030c9dda17f 100644 --- a/drivers/scsi/sgiwd93.c +++ b/drivers/scsi/sgiwd93.c @@ -95,7 +95,7 @@ void fill_hpc_entries(struct ip22_hostdata *hd, struct scsi_cmnd *cmd, int din) */ hcp->desc.pbuf = 0; hcp->desc.cntinfo = HPCDMA_EOX; - dma_cache_sync(hd->dev, hd->cpu, + dma_sync_single_for_device(hd->dev, hd->dma, (unsigned long)(hcp + 1) - (unsigned long)hd->cpu, DMA_TO_DEVICE); } @@ -234,8 +234,8 @@ static int sgiwd93_probe(struct platform_device *pdev)
hdata = host_to_hostdata(host); hdata->dev = &pdev->dev; - hdata->cpu = dma_alloc_attrs(&pdev->dev, HPC_DMA_SIZE, &hdata->dma, - GFP_KERNEL, DMA_ATTR_NON_CONSISTENT); + hdata->cpu = dma_alloc_noncoherent(&pdev->dev, HPC_DMA_SIZE, + &hdata->dma, DMA_TO_DEVICE, GFP_KERNEL); if (!hdata->cpu) { printk(KERN_WARNING "sgiwd93: Could not allocate memory for " "host %d buffer.\n", unit); @@ -274,8 +274,8 @@ static int sgiwd93_probe(struct platform_device *pdev) out_irq: free_irq(irq, host); out_free: - dma_free_attrs(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma, - DMA_ATTR_NON_CONSISTENT); + dma_free_noncoherent(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma, + DMA_TO_DEVICE); out_put: scsi_host_put(host); out: @@ -291,8 +291,8 @@ static int sgiwd93_remove(struct platform_device *pdev)
scsi_remove_host(host); free_irq(pd->irq, host); - dma_free_attrs(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma, - DMA_ATTR_NON_CONSISTENT); + dma_free_noncoherent(&pdev->dev, HPC_DMA_SIZE, hdata->cpu, hdata->dma, + DMA_TO_DEVICE); scsi_host_put(host); return 0; }
On Tue, Sep 15, 2020 at 05:51:13PM +0200, Christoph Hellwig wrote:
Use the new non-coherent DMA API including proper ownership transfers. This also means we can allocate the memory as DMA_TO_DEVICE instead of bidirectional.
Signed-off-by: Christoph Hellwig hch@lst.de
drivers/scsi/sgiwd93.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
Tested-by: Thomas Bogendoerfer tsbogend@alpha.franken.de
Use the new non-coherent DMA API including proper ownership transfers. This also means we can allocate the buffer memory with the proper direction instead of bidirectional.
Signed-off-by: Christoph Hellwig hch@lst.de --- sound/mips/hal2.c | 58 ++++++++++++++++++++++------------------------- 1 file changed, 27 insertions(+), 31 deletions(-)
diff --git a/sound/mips/hal2.c b/sound/mips/hal2.c index ec84bc4c3a6e77..9ac9b58d7c8cdd 100644 --- a/sound/mips/hal2.c +++ b/sound/mips/hal2.c @@ -441,7 +441,8 @@ static inline void hal2_stop_adc(struct snd_hal2 *hal2) hal2->adc.pbus.pbus->pbdma_ctrl = HPC3_PDMACTRL_LD; }
-static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec) +static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec, + enum dma_data_direction buffer_dir) { struct device *dev = hal2->card->dev; struct hal2_desc *desc; @@ -449,15 +450,15 @@ static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec) int count = H2_BUF_SIZE / H2_BLOCK_SIZE; int i;
- codec->buffer = dma_alloc_attrs(dev, H2_BUF_SIZE, &buffer_dma, - GFP_KERNEL, DMA_ATTR_NON_CONSISTENT); + codec->buffer = dma_alloc_noncoherent(dev, H2_BUF_SIZE, &buffer_dma, + buffer_dir, GFP_KERNEL); if (!codec->buffer) return -ENOMEM; - desc = dma_alloc_attrs(dev, count * sizeof(struct hal2_desc), - &desc_dma, GFP_KERNEL, DMA_ATTR_NON_CONSISTENT); + desc = dma_alloc_noncoherent(dev, count * sizeof(struct hal2_desc), + &desc_dma, DMA_BIDIRECTIONAL, GFP_KERNEL); if (!desc) { - dma_free_attrs(dev, H2_BUF_SIZE, codec->buffer, buffer_dma, - DMA_ATTR_NON_CONSISTENT); + dma_free_noncoherent(dev, H2_BUF_SIZE, codec->buffer, buffer_dma, + buffer_dir); return -ENOMEM; } codec->buffer_dma = buffer_dma; @@ -470,20 +471,22 @@ static int hal2_alloc_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec) desc_dma : desc_dma + (i + 1) * sizeof(struct hal2_desc); desc++; } - dma_cache_sync(dev, codec->desc, count * sizeof(struct hal2_desc), - DMA_TO_DEVICE); + dma_sync_single_for_device(dev, codec->desc_dma, + count * sizeof(struct hal2_desc), + DMA_BIDIRECTIONAL); codec->desc_count = count; return 0; }
-static void hal2_free_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec) +static void hal2_free_dmabuf(struct snd_hal2 *hal2, struct hal2_codec *codec, + enum dma_data_direction buffer_dir) { struct device *dev = hal2->card->dev;
- dma_free_attrs(dev, codec->desc_count * sizeof(struct hal2_desc), - codec->desc, codec->desc_dma, DMA_ATTR_NON_CONSISTENT); - dma_free_attrs(dev, H2_BUF_SIZE, codec->buffer, codec->buffer_dma, - DMA_ATTR_NON_CONSISTENT); + dma_free_noncoherent(dev, codec->desc_count * sizeof(struct hal2_desc), + codec->desc, codec->desc_dma, DMA_BIDIRECTIONAL); + dma_free_noncoherent(dev, H2_BUF_SIZE, codec->buffer, codec->buffer_dma, + buffer_dir); }
static const struct snd_pcm_hardware hal2_pcm_hw = { @@ -509,21 +512,16 @@ static int hal2_playback_open(struct snd_pcm_substream *substream) { struct snd_pcm_runtime *runtime = substream->runtime; struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream); - int err;
runtime->hw = hal2_pcm_hw; - - err = hal2_alloc_dmabuf(hal2, &hal2->dac); - if (err) - return err; - return 0; + return hal2_alloc_dmabuf(hal2, &hal2->dac, DMA_TO_DEVICE); }
static int hal2_playback_close(struct snd_pcm_substream *substream) { struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
- hal2_free_dmabuf(hal2, &hal2->dac); + hal2_free_dmabuf(hal2, &hal2->dac, DMA_TO_DEVICE); return 0; }
@@ -579,7 +577,9 @@ static void hal2_playback_transfer(struct snd_pcm_substream *substream, unsigned char *buf = hal2->dac.buffer + rec->hw_data;
memcpy(buf, substream->runtime->dma_area + rec->sw_data, bytes); - dma_cache_sync(hal2->card->dev, buf, bytes, DMA_TO_DEVICE); + dma_sync_single_for_device(hal2->card->dev, + hal2->dac.buffer_dma + rec->hw_data, bytes, + DMA_TO_DEVICE);
}
@@ -597,22 +597,16 @@ static int hal2_capture_open(struct snd_pcm_substream *substream) { struct snd_pcm_runtime *runtime = substream->runtime; struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream); - struct hal2_codec *adc = &hal2->adc; - int err;
runtime->hw = hal2_pcm_hw; - - err = hal2_alloc_dmabuf(hal2, adc); - if (err) - return err; - return 0; + return hal2_alloc_dmabuf(hal2, &hal2->adc, DMA_FROM_DEVICE); }
static int hal2_capture_close(struct snd_pcm_substream *substream) { struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream);
- hal2_free_dmabuf(hal2, &hal2->adc); + hal2_free_dmabuf(hal2, &hal2->adc, DMA_FROM_DEVICE); return 0; }
@@ -667,7 +661,9 @@ static void hal2_capture_transfer(struct snd_pcm_substream *substream, struct snd_hal2 *hal2 = snd_pcm_substream_chip(substream); unsigned char *buf = hal2->adc.buffer + rec->hw_data;
- dma_cache_sync(hal2->card->dev, buf, bytes, DMA_FROM_DEVICE); + dma_sync_single_for_cpu(hal2->card->dev, + hal2->adc.buffer_dma + rec->hw_data, bytes, + DMA_FROM_DEVICE); memcpy(substream->runtime->dma_area + rec->sw_data, buf, bytes); }
On Tue, Sep 15, 2020 at 05:51:14PM +0200, Christoph Hellwig wrote:
Use the new non-coherent DMA API including proper ownership transfers. This also means we can allocate the buffer memory with the proper direction instead of bidirectional.
Signed-off-by: Christoph Hellwig hch@lst.de
sound/mips/hal2.c | 58 ++++++++++++++++++++++------------------------- 1 file changed, 27 insertions(+), 31 deletions(-)
Tested-by: Thomas Bogendoerfer tsbogend@alpha.franken.de
Use the new non-coherent DMA API including proper ownership transfers. This includes moving the DMA helpers to lib82596 based of an ifdef to avoid include order problems.
Signed-off-by: Christoph Hellwig hch@lst.de --- drivers/net/ethernet/i825xx/lasi_82596.c | 25 ++--- drivers/net/ethernet/i825xx/lib82596.c | 114 ++++++++++++++--------- drivers/net/ethernet/i825xx/sni_82596.c | 4 - 3 files changed, 80 insertions(+), 63 deletions(-)
diff --git a/drivers/net/ethernet/i825xx/lasi_82596.c b/drivers/net/ethernet/i825xx/lasi_82596.c index a12218e940a2fa..96c6f4f36904ed 100644 --- a/drivers/net/ethernet/i825xx/lasi_82596.c +++ b/drivers/net/ethernet/i825xx/lasi_82596.c @@ -96,21 +96,14 @@
#define OPT_SWAP_PORT 0x0001 /* Need to wordswp on the MPU port */
-#define DMA_WBACK(ndev, addr, len) \ - do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, DMA_TO_DEVICE); } while (0) - -#define DMA_INV(ndev, addr, len) \ - do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, DMA_FROM_DEVICE); } while (0) - -#define DMA_WBACK_INV(ndev, addr, len) \ - do { dma_cache_sync((ndev)->dev.parent, (void *)addr, len, DMA_BIDIRECTIONAL); } while (0) - #define SYSBUS 0x0000006c
/* big endian CPU, 82596 "big" endian mode */ #define SWAP32(x) (((u32)(x)<<16) | ((((u32)(x)))>>16)) #define SWAP16(x) (x)
+#define NONCOHERENT_DMA 1 + #include "lib82596.c"
MODULE_AUTHOR("Richard Hirst"); @@ -184,9 +177,9 @@ lan_init_chip(struct parisc_device *dev)
lp = netdev_priv(netdevice); lp->options = dev->id.sversion == 0x72 ? OPT_SWAP_PORT : 0; - lp->dma = dma_alloc_attrs(&dev->dev, sizeof(struct i596_dma), - &lp->dma_addr, GFP_KERNEL, - DMA_ATTR_NON_CONSISTENT); + lp->dma = dma_alloc_noncoherent(&dev->dev, + sizeof(struct i596_dma), &lp->dma_addr, + DMA_BIDIRECTIONAL, GFP_KERNEL); if (!lp->dma) goto out_free_netdev;
@@ -196,8 +189,8 @@ lan_init_chip(struct parisc_device *dev) return 0;
out_free_dma: - dma_free_attrs(&dev->dev, sizeof(struct i596_dma), lp->dma, - lp->dma_addr, DMA_ATTR_NON_CONSISTENT); + dma_free_noncoherent(&dev->dev, sizeof(struct i596_dma), + lp->dma, lp->dma_addr, DMA_BIDIRECTIONAL); out_free_netdev: free_netdev(netdevice); return retval; @@ -209,8 +202,8 @@ static int __exit lan_remove_chip(struct parisc_device *pdev) struct i596_private *lp = netdev_priv(dev);
unregister_netdev (dev); - dma_free_attrs(&pdev->dev, sizeof(struct i596_private), lp->dma, - lp->dma_addr, DMA_ATTR_NON_CONSISTENT); + dma_free_noncoherent(&pdev->dev, sizeof(struct i596_private), lp->dma, + lp->dma_addr, DMA_BIDIRECTIONAL); free_netdev (dev); return 0; } diff --git a/drivers/net/ethernet/i825xx/lib82596.c b/drivers/net/ethernet/i825xx/lib82596.c index b4e4b3eb5758b5..ca2fb303fcc6f6 100644 --- a/drivers/net/ethernet/i825xx/lib82596.c +++ b/drivers/net/ethernet/i825xx/lib82596.c @@ -365,13 +365,44 @@ static int max_cmd_backlog = TX_RING_SIZE-1; static void i596_poll_controller(struct net_device *dev); #endif
+static inline dma_addr_t virt_to_dma(struct i596_private *lp, volatile void *v) +{ + return lp->dma_addr + ((unsigned long)v - (unsigned long)lp->dma); +} + +#ifdef NONCOHERENT_DMA +static inline void dma_sync_dev(struct net_device *ndev, volatile void *addr, + size_t len) +{ + dma_sync_single_for_device(ndev->dev.parent, + virt_to_dma(netdev_priv(ndev), addr), len, + DMA_BIDIRECTIONAL); +} + +static inline void dma_sync_cpu(struct net_device *ndev, volatile void *addr, + size_t len) +{ + dma_sync_single_for_cpu(ndev->dev.parent, + virt_to_dma(netdev_priv(ndev), addr), len, + DMA_BIDIRECTIONAL); +} +#else +static inline void dma_sync_dev(struct net_device *ndev, volatile void *addr, + size_t len) +{ +} +static inline void dma_sync_cpu(struct net_device *ndev, volatile void *addr, + size_t len) +{ +} +#endif /* NONCOHERENT_DMA */
static inline int wait_istat(struct net_device *dev, struct i596_dma *dma, int delcnt, char *str) { - DMA_INV(dev, &(dma->iscp), sizeof(struct i596_iscp)); + dma_sync_cpu(dev, &(dma->iscp), sizeof(struct i596_iscp)); while (--delcnt && dma->iscp.stat) { udelay(10); - DMA_INV(dev, &(dma->iscp), sizeof(struct i596_iscp)); + dma_sync_cpu(dev, &(dma->iscp), sizeof(struct i596_iscp)); } if (!delcnt) { printk(KERN_ERR "%s: %s, iscp.stat %04x, didn't clear\n", @@ -384,10 +415,10 @@ static inline int wait_istat(struct net_device *dev, struct i596_dma *dma, int d
static inline int wait_cmd(struct net_device *dev, struct i596_dma *dma, int delcnt, char *str) { - DMA_INV(dev, &(dma->scb), sizeof(struct i596_scb)); + dma_sync_cpu(dev, &(dma->scb), sizeof(struct i596_scb)); while (--delcnt && dma->scb.command) { udelay(10); - DMA_INV(dev, &(dma->scb), sizeof(struct i596_scb)); + dma_sync_cpu(dev, &(dma->scb), sizeof(struct i596_scb)); } if (!delcnt) { printk(KERN_ERR "%s: %s, status %4.4x, cmd %4.4x.\n", @@ -451,12 +482,9 @@ static void i596_display_data(struct net_device *dev) SWAP32(rbd->b_data), SWAP16(rbd->size)); rbd = rbd->v_next; } while (rbd != lp->rbd_head); - DMA_INV(dev, dma, sizeof(struct i596_dma)); + dma_sync_cpu(dev, dma, sizeof(struct i596_dma)); }
- -#define virt_to_dma(lp, v) ((lp)->dma_addr + (dma_addr_t)((unsigned long)(v)-(unsigned long)((lp)->dma))) - static inline int init_rx_bufs(struct net_device *dev) { struct i596_private *lp = netdev_priv(dev); @@ -508,7 +536,7 @@ static inline int init_rx_bufs(struct net_device *dev) rfd->b_next = SWAP32(virt_to_dma(lp, dma->rfds)); rfd->cmd = SWAP16(CMD_EOL|CMD_FLEX);
- DMA_WBACK_INV(dev, dma, sizeof(struct i596_dma)); + dma_sync_dev(dev, dma, sizeof(struct i596_dma)); return 0; }
@@ -547,7 +575,7 @@ static void rebuild_rx_bufs(struct net_device *dev) lp->rbd_head = dma->rbds; dma->rfds[0].rbd = SWAP32(virt_to_dma(lp, dma->rbds));
- DMA_WBACK_INV(dev, dma, sizeof(struct i596_dma)); + dma_sync_dev(dev, dma, sizeof(struct i596_dma)); }
@@ -575,9 +603,9 @@ static int init_i596_mem(struct net_device *dev)
DEB(DEB_INIT, printk(KERN_DEBUG "%s: starting i82596.\n", dev->name));
- DMA_WBACK(dev, &(dma->scp), sizeof(struct i596_scp)); - DMA_WBACK(dev, &(dma->iscp), sizeof(struct i596_iscp)); - DMA_WBACK(dev, &(dma->scb), sizeof(struct i596_scb)); + dma_sync_dev(dev, &(dma->scp), sizeof(struct i596_scp)); + dma_sync_dev(dev, &(dma->iscp), sizeof(struct i596_iscp)); + dma_sync_dev(dev, &(dma->scb), sizeof(struct i596_scb));
mpu_port(dev, PORT_ALTSCP, virt_to_dma(lp, &dma->scp)); ca(dev); @@ -596,24 +624,24 @@ static int init_i596_mem(struct net_device *dev) rebuild_rx_bufs(dev);
dma->scb.command = 0; - DMA_WBACK(dev, &(dma->scb), sizeof(struct i596_scb)); + dma_sync_dev(dev, &(dma->scb), sizeof(struct i596_scb));
DEB(DEB_INIT, printk(KERN_DEBUG "%s: queuing CmdConfigure\n", dev->name)); memcpy(dma->cf_cmd.i596_config, init_setup, 14); dma->cf_cmd.cmd.command = SWAP16(CmdConfigure); - DMA_WBACK(dev, &(dma->cf_cmd), sizeof(struct cf_cmd)); + dma_sync_dev(dev, &(dma->cf_cmd), sizeof(struct cf_cmd)); i596_add_cmd(dev, &dma->cf_cmd.cmd);
DEB(DEB_INIT, printk(KERN_DEBUG "%s: queuing CmdSASetup\n", dev->name)); memcpy(dma->sa_cmd.eth_addr, dev->dev_addr, ETH_ALEN); dma->sa_cmd.cmd.command = SWAP16(CmdSASetup); - DMA_WBACK(dev, &(dma->sa_cmd), sizeof(struct sa_cmd)); + dma_sync_dev(dev, &(dma->sa_cmd), sizeof(struct sa_cmd)); i596_add_cmd(dev, &dma->sa_cmd.cmd);
DEB(DEB_INIT, printk(KERN_DEBUG "%s: queuing CmdTDR\n", dev->name)); dma->tdr_cmd.cmd.command = SWAP16(CmdTDR); - DMA_WBACK(dev, &(dma->tdr_cmd), sizeof(struct tdr_cmd)); + dma_sync_dev(dev, &(dma->tdr_cmd), sizeof(struct tdr_cmd)); i596_add_cmd(dev, &dma->tdr_cmd.cmd);
spin_lock_irqsave (&lp->lock, flags); @@ -625,7 +653,7 @@ static int init_i596_mem(struct net_device *dev) DEB(DEB_INIT, printk(KERN_DEBUG "%s: Issuing RX_START\n", dev->name)); dma->scb.command = SWAP16(RX_START); dma->scb.rfd = SWAP32(virt_to_dma(lp, dma->rfds)); - DMA_WBACK(dev, &(dma->scb), sizeof(struct i596_scb)); + dma_sync_dev(dev, &(dma->scb), sizeof(struct i596_scb));
ca(dev);
@@ -659,13 +687,13 @@ static inline int i596_rx(struct net_device *dev)
rfd = lp->rfd_head; /* Ref next frame to check */
- DMA_INV(dev, rfd, sizeof(struct i596_rfd)); + dma_sync_cpu(dev, rfd, sizeof(struct i596_rfd)); while (rfd->stat & SWAP16(STAT_C)) { /* Loop while complete frames */ if (rfd->rbd == I596_NULL) rbd = NULL; else if (rfd->rbd == lp->rbd_head->b_addr) { rbd = lp->rbd_head; - DMA_INV(dev, rbd, sizeof(struct i596_rbd)); + dma_sync_cpu(dev, rbd, sizeof(struct i596_rbd)); } else { printk(KERN_ERR "%s: rbd chain broken!\n", dev->name); /* XXX Now what? */ @@ -713,7 +741,7 @@ static inline int i596_rx(struct net_device *dev) DMA_FROM_DEVICE); rbd->v_data = newskb->data; rbd->b_data = SWAP32(dma_addr); - DMA_WBACK_INV(dev, rbd, sizeof(struct i596_rbd)); + dma_sync_dev(dev, rbd, sizeof(struct i596_rbd)); } else { skb = netdev_alloc_skb_ip_align(dev, pkt_len); } @@ -765,7 +793,7 @@ static inline int i596_rx(struct net_device *dev) if (rbd != NULL && (rbd->count & SWAP16(0x4000))) { rbd->count = 0; lp->rbd_head = rbd->v_next; - DMA_WBACK_INV(dev, rbd, sizeof(struct i596_rbd)); + dma_sync_dev(dev, rbd, sizeof(struct i596_rbd)); }
/* Tidy the frame descriptor, marking it as end of list */ @@ -779,14 +807,14 @@ static inline int i596_rx(struct net_device *dev)
lp->dma->scb.rfd = rfd->b_next; lp->rfd_head = rfd->v_next; - DMA_WBACK_INV(dev, rfd, sizeof(struct i596_rfd)); + dma_sync_dev(dev, rfd, sizeof(struct i596_rfd));
/* Remove end-of-list from old end descriptor */
rfd->v_prev->cmd = SWAP16(CMD_FLEX); - DMA_WBACK_INV(dev, rfd->v_prev, sizeof(struct i596_rfd)); + dma_sync_dev(dev, rfd->v_prev, sizeof(struct i596_rfd)); rfd = lp->rfd_head; - DMA_INV(dev, rfd, sizeof(struct i596_rfd)); + dma_sync_cpu(dev, rfd, sizeof(struct i596_rfd)); }
DEB(DEB_RXFRAME, printk(KERN_DEBUG "frames %d\n", frames)); @@ -827,12 +855,12 @@ static inline void i596_cleanup_cmd(struct net_device *dev, struct i596_private ptr->v_next = NULL; ptr->b_next = I596_NULL; } - DMA_WBACK_INV(dev, ptr, sizeof(struct i596_cmd)); + dma_sync_dev(dev, ptr, sizeof(struct i596_cmd)); }
wait_cmd(dev, lp->dma, 100, "i596_cleanup_cmd timed out"); lp->dma->scb.cmd = I596_NULL; - DMA_WBACK(dev, &(lp->dma->scb), sizeof(struct i596_scb)); + dma_sync_dev(dev, &(lp->dma->scb), sizeof(struct i596_scb)); }
@@ -850,7 +878,7 @@ static inline void i596_reset(struct net_device *dev, struct i596_private *lp)
/* FIXME: this command might cause an lpmc */ lp->dma->scb.command = SWAP16(CUC_ABORT | RX_ABORT); - DMA_WBACK(dev, &(lp->dma->scb), sizeof(struct i596_scb)); + dma_sync_dev(dev, &(lp->dma->scb), sizeof(struct i596_scb)); ca(dev);
/* wait for shutdown */ @@ -878,20 +906,20 @@ static void i596_add_cmd(struct net_device *dev, struct i596_cmd *cmd) cmd->command |= SWAP16(CMD_EOL | CMD_INTR); cmd->v_next = NULL; cmd->b_next = I596_NULL; - DMA_WBACK(dev, cmd, sizeof(struct i596_cmd)); + dma_sync_dev(dev, cmd, sizeof(struct i596_cmd));
spin_lock_irqsave (&lp->lock, flags);
if (lp->cmd_head != NULL) { lp->cmd_tail->v_next = cmd; lp->cmd_tail->b_next = SWAP32(virt_to_dma(lp, &cmd->status)); - DMA_WBACK(dev, lp->cmd_tail, sizeof(struct i596_cmd)); + dma_sync_dev(dev, lp->cmd_tail, sizeof(struct i596_cmd)); } else { lp->cmd_head = cmd; wait_cmd(dev, dma, 100, "i596_add_cmd timed out"); dma->scb.cmd = SWAP32(virt_to_dma(lp, &cmd->status)); dma->scb.command = SWAP16(CUC_START); - DMA_WBACK(dev, &(dma->scb), sizeof(struct i596_scb)); + dma_sync_dev(dev, &(dma->scb), sizeof(struct i596_scb)); ca(dev); } lp->cmd_tail = cmd; @@ -956,7 +984,7 @@ static void i596_tx_timeout (struct net_device *dev, unsigned int txqueue) /* Issue a channel attention signal */ DEB(DEB_ERRORS, printk(KERN_DEBUG "Kicking board.\n")); lp->dma->scb.command = SWAP16(CUC_START | RX_START); - DMA_WBACK_INV(dev, &(lp->dma->scb), sizeof(struct i596_scb)); + dma_sync_dev(dev, &(lp->dma->scb), sizeof(struct i596_scb)); ca (dev); lp->last_restart = dev->stats.tx_packets; } @@ -1014,8 +1042,8 @@ static netdev_tx_t i596_start_xmit(struct sk_buff *skb, struct net_device *dev) tbd->data = SWAP32(tx_cmd->dma_addr);
DEB(DEB_TXADDR, print_eth(skb->data, "tx-queued")); - DMA_WBACK_INV(dev, tx_cmd, sizeof(struct tx_cmd)); - DMA_WBACK_INV(dev, tbd, sizeof(struct i596_tbd)); + dma_sync_dev(dev, tx_cmd, sizeof(struct tx_cmd)); + dma_sync_dev(dev, tbd, sizeof(struct i596_tbd)); i596_add_cmd(dev, &tx_cmd->cmd);
dev->stats.tx_packets++; @@ -1071,7 +1099,7 @@ static int i82596_probe(struct net_device *dev) lp->dma->scb.rfd = I596_NULL; spin_lock_init(&lp->lock);
- DMA_WBACK_INV(dev, lp->dma, sizeof(struct i596_dma)); + dma_sync_dev(dev, lp->dma, sizeof(struct i596_dma));
ret = register_netdev(dev); if (ret) @@ -1141,7 +1169,7 @@ static irqreturn_t i596_interrupt(int irq, void *dev_id) dev->name, status & 0x0700));
while (lp->cmd_head != NULL) { - DMA_INV(dev, lp->cmd_head, sizeof(struct i596_cmd)); + dma_sync_cpu(dev, lp->cmd_head, sizeof(struct i596_cmd)); if (!(lp->cmd_head->status & SWAP16(STAT_C))) break;
@@ -1223,7 +1251,7 @@ static irqreturn_t i596_interrupt(int irq, void *dev_id) } ptr->v_next = NULL; ptr->b_next = I596_NULL; - DMA_WBACK(dev, ptr, sizeof(struct i596_cmd)); + dma_sync_dev(dev, ptr, sizeof(struct i596_cmd)); lp->last_cmd = jiffies; }
@@ -1237,13 +1265,13 @@ static irqreturn_t i596_interrupt(int irq, void *dev_id)
ptr->command &= SWAP16(0x1fff); ptr = ptr->v_next; - DMA_WBACK_INV(dev, prev, sizeof(struct i596_cmd)); + dma_sync_dev(dev, prev, sizeof(struct i596_cmd)); }
if (lp->cmd_head != NULL) ack_cmd |= CUC_START; dma->scb.cmd = SWAP32(virt_to_dma(lp, &lp->cmd_head->status)); - DMA_WBACK_INV(dev, &dma->scb, sizeof(struct i596_scb)); + dma_sync_dev(dev, &dma->scb, sizeof(struct i596_scb)); } if ((status & 0x1000) || (status & 0x4000)) { if ((status & 0x4000)) @@ -1268,7 +1296,7 @@ static irqreturn_t i596_interrupt(int irq, void *dev_id) } wait_cmd(dev, dma, 100, "i596 interrupt, timeout"); dma->scb.command = SWAP16(ack_cmd); - DMA_WBACK(dev, &dma->scb, sizeof(struct i596_scb)); + dma_sync_dev(dev, &dma->scb, sizeof(struct i596_scb));
/* DANGER: I suspect that some kind of interrupt acknowledgement aside from acking the 82596 might be needed @@ -1299,7 +1327,7 @@ static int i596_close(struct net_device *dev)
wait_cmd(dev, lp->dma, 100, "close1 timed out"); lp->dma->scb.command = SWAP16(CUC_ABORT | RX_ABORT); - DMA_WBACK(dev, &lp->dma->scb, sizeof(struct i596_scb)); + dma_sync_dev(dev, &lp->dma->scb, sizeof(struct i596_scb));
ca(dev);
@@ -1358,7 +1386,7 @@ static void set_multicast_list(struct net_device *dev) dev->name); else { dma->cf_cmd.cmd.command = SWAP16(CmdConfigure); - DMA_WBACK_INV(dev, &dma->cf_cmd, sizeof(struct cf_cmd)); + dma_sync_dev(dev, &dma->cf_cmd, sizeof(struct cf_cmd)); i596_add_cmd(dev, &dma->cf_cmd.cmd); } } @@ -1390,7 +1418,7 @@ static void set_multicast_list(struct net_device *dev) dev->name, cp)); cp += ETH_ALEN; } - DMA_WBACK_INV(dev, &dma->mc_cmd, sizeof(struct mc_cmd)); + dma_sync_dev(dev, &dma->mc_cmd, sizeof(struct mc_cmd)); i596_add_cmd(dev, &cmd->cmd); } } diff --git a/drivers/net/ethernet/i825xx/sni_82596.c b/drivers/net/ethernet/i825xx/sni_82596.c index 4b9ac0c6557731..27937c5d795673 100644 --- a/drivers/net/ethernet/i825xx/sni_82596.c +++ b/drivers/net/ethernet/i825xx/sni_82596.c @@ -24,10 +24,6 @@
static const char sni_82596_string[] = "snirm_82596";
-#define DMA_WBACK(priv, addr, len) do { } while (0) -#define DMA_INV(priv, addr, len) do { } while (0) -#define DMA_WBACK_INV(priv, addr, len) do { } while (0) - #define SYSBUS 0x00004400
/* big endian CPU, 82596 little endian */
On Tue, Sep 15, 2020 at 05:51:15PM +0200, Christoph Hellwig wrote:
Use the new non-coherent DMA API including proper ownership transfers. This includes moving the DMA helpers to lib82596 based of an ifdef to avoid include order problems.
Signed-off-by: Christoph Hellwig hch@lst.de
drivers/net/ethernet/i825xx/lasi_82596.c | 25 ++--- drivers/net/ethernet/i825xx/lib82596.c | 114 ++++++++++++++--------- drivers/net/ethernet/i825xx/sni_82596.c | 4 - 3 files changed, 80 insertions(+), 63 deletions(-)
Tested-by: Thomas Bogendoerfer tsbogend@alpha.franken.de (SNI part)
Use the new non-coherent DMA API including proper ownership transfers. This includes adding additional calls to dma_sync_desc_dev as the old syncing was rather ad-hoc.
Thanks to Thomas Bogendoerfer for debugging the ownership transfer issues.
Signed-off-by: Christoph Hellwig hch@lst.de --- drivers/net/ethernet/seeq/sgiseeq.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/seeq/sgiseeq.c b/drivers/net/ethernet/seeq/sgiseeq.c index 8507ff2420143a..37ff25a84030eb 100644 --- a/drivers/net/ethernet/seeq/sgiseeq.c +++ b/drivers/net/ethernet/seeq/sgiseeq.c @@ -112,14 +112,18 @@ struct sgiseeq_private {
static inline void dma_sync_desc_cpu(struct net_device *dev, void *addr) { - dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc), - DMA_FROM_DEVICE); + struct sgiseeq_private *sp = netdev_priv(dev); + + dma_sync_single_for_cpu(dev->dev.parent, VIRT_TO_DMA(sp, addr), + sizeof(struct sgiseeq_rx_desc), DMA_BIDIRECTIONAL); }
static inline void dma_sync_desc_dev(struct net_device *dev, void *addr) { - dma_cache_sync(dev->dev.parent, addr, sizeof(struct sgiseeq_rx_desc), - DMA_TO_DEVICE); + struct sgiseeq_private *sp = netdev_priv(dev); + + dma_sync_single_for_device(dev->dev.parent, VIRT_TO_DMA(sp, addr), + sizeof(struct sgiseeq_rx_desc), DMA_BIDIRECTIONAL); }
static inline void hpc3_eth_reset(struct hpc3_ethregs *hregs) @@ -403,6 +407,8 @@ static inline void sgiseeq_rx(struct net_device *dev, struct sgiseeq_private *sp rd = &sp->rx_desc[sp->rx_new]; dma_sync_desc_cpu(dev, rd); } + dma_sync_desc_dev(dev, rd); + dma_sync_desc_cpu(dev, &sp->rx_desc[orig_end]); sp->rx_desc[orig_end].rdma.cntinfo &= ~(HPCDMA_EOR); dma_sync_desc_dev(dev, &sp->rx_desc[orig_end]); @@ -443,6 +449,7 @@ static inline void kick_tx(struct net_device *dev, dma_sync_desc_cpu(dev, td); } if (td->tdma.cntinfo & HPCDMA_XIU) { + dma_sync_desc_dev(dev, td); hregs->tx_ndptr = VIRT_TO_DMA(sp, td); hregs->tx_ctrl = HPC3_ETXCTRL_ACTIVE; } @@ -476,6 +483,7 @@ static inline void sgiseeq_tx(struct net_device *dev, struct sgiseeq_private *sp if (!(td->tdma.cntinfo & (HPCDMA_XIU))) break; if (!(td->tdma.cntinfo & (HPCDMA_ETXD))) { + dma_sync_desc_dev(dev, td); if (!(status & HPC3_ETXCTRL_ACTIVE)) { hregs->tx_ndptr = VIRT_TO_DMA(sp, td); hregs->tx_ctrl = HPC3_ETXCTRL_ACTIVE; @@ -740,8 +748,8 @@ static int sgiseeq_probe(struct platform_device *pdev) sp = netdev_priv(dev);
/* Make private data page aligned */ - sr = dma_alloc_attrs(&pdev->dev, sizeof(*sp->srings), &sp->srings_dma, - GFP_KERNEL, DMA_ATTR_NON_CONSISTENT); + sr = dma_alloc_noncoherent(&pdev->dev, sizeof(*sp->srings), + &sp->srings_dma, DMA_BIDIRECTIONAL, GFP_KERNEL); if (!sr) { printk(KERN_ERR "Sgiseeq: Page alloc failed, aborting.\n"); err = -ENOMEM; @@ -802,8 +810,8 @@ static int sgiseeq_probe(struct platform_device *pdev) return 0;
err_out_free_attrs: - dma_free_attrs(&pdev->dev, sizeof(*sp->srings), sp->srings, - sp->srings_dma, DMA_ATTR_NON_CONSISTENT); + dma_free_noncoherent(&pdev->dev, sizeof(*sp->srings), sp->srings, + sp->srings_dma, DMA_BIDIRECTIONAL); err_out_free_dev: free_netdev(dev);
@@ -817,8 +825,8 @@ static int sgiseeq_remove(struct platform_device *pdev) struct sgiseeq_private *sp = netdev_priv(dev);
unregister_netdev(dev); - dma_free_attrs(&pdev->dev, sizeof(*sp->srings), sp->srings, - sp->srings_dma, DMA_ATTR_NON_CONSISTENT); + dma_free_noncoherent(&pdev->dev, sizeof(*sp->srings), sp->srings, + sp->srings_dma, DMA_BIDIRECTIONAL); free_netdev(dev);
return 0;
On Tue, Sep 15, 2020 at 05:51:16PM +0200, Christoph Hellwig wrote:
Use the new non-coherent DMA API including proper ownership transfers. This includes adding additional calls to dma_sync_desc_dev as the old syncing was rather ad-hoc.
Thanks to Thomas Bogendoerfer for debugging the ownership transfer issues.
Signed-off-by: Christoph Hellwig hch@lst.de
drivers/net/ethernet/seeq/sgiseeq.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-)
Tested-by: Thomas Bogendoerfer tsbogend@alpha.franken.de
Use the new non-coherent DMA API including proper ownership transfers.
Signed-off-by: Christoph Hellwig hch@lst.de --- drivers/scsi/53c700.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c index c59226d7e2f6b5..5117d90ccd9edf 100644 --- a/drivers/scsi/53c700.c +++ b/drivers/scsi/53c700.c @@ -269,18 +269,25 @@ NCR_700_get_SXFER(struct scsi_device *SDp) spi_period(SDp->sdev_target)); }
+static inline dma_addr_t virt_to_dma(struct NCR_700_Host_Parameters *h, void *p) +{ + return h->pScript + ((uintptr_t)p - (uintptr_t)h->script); +} + static inline void dma_sync_to_dev(struct NCR_700_Host_Parameters *h, void *addr, size_t size) { if (h->noncoherent) - dma_cache_sync(h->dev, addr, size, DMA_TO_DEVICE); + dma_sync_single_for_device(h->dev, virt_to_dma(h, addr), + size, DMA_BIDIRECTIONAL); }
static inline void dma_sync_from_dev(struct NCR_700_Host_Parameters *h, void *addr, size_t size) { if (h->noncoherent) - dma_cache_sync(h->dev, addr, size, DMA_FROM_DEVICE); + dma_sync_single_for_device(h->dev, virt_to_dma(h, addr), size, + DMA_BIDIRECTIONAL); }
struct Scsi_Host * @@ -300,8 +307,8 @@ NCR_700_detect(struct scsi_host_template *tpnt, memory = dma_alloc_coherent(dev, TOTAL_MEM_SIZE, &pScript, GFP_KERNEL); if (!memory) { hostdata->noncoherent = 1; - memory = dma_alloc_attrs(dev, TOTAL_MEM_SIZE, &pScript, - GFP_KERNEL, DMA_ATTR_NON_CONSISTENT); + memory = dma_alloc_noncoherent(dev, TOTAL_MEM_SIZE, &pScript, + DMA_BIDIRECTIONAL, GFP_KERNEL); } if (!memory) { printk(KERN_ERR "53c700: Failed to allocate memory for driver, detaching\n"); @@ -414,8 +421,9 @@ NCR_700_release(struct Scsi_Host *host) (struct NCR_700_Host_Parameters *)host->hostdata[0];
if (hostdata->noncoherent) - dma_free_attrs(hostdata->dev, TOTAL_MEM_SIZE, hostdata->script, - hostdata->pScript, DMA_ATTR_NON_CONSISTENT); + dma_free_noncoherent(hostdata->dev, TOTAL_MEM_SIZE, + hostdata->script, hostdata->pScript, + DMA_BIDIRECTIONAL); else dma_free_coherent(hostdata->dev, TOTAL_MEM_SIZE, hostdata->script, hostdata->pScript);
On Tue, Sep 15, 2020 at 05:51:17PM +0200, Christoph Hellwig wrote:
Use the new non-coherent DMA API including proper ownership transfers.
Signed-off-by: Christoph Hellwig hch@lst.de
drivers/scsi/53c700.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-)
Tested-by: Thomas Bogendoerfer tsbogend@alpha.franken.de
All users are gone now, remove the API.
Signed-off-by: Christoph Hellwig hch@lst.de --- arch/mips/Kconfig | 1 - arch/mips/jazz/jazzdma.c | 1 - arch/mips/mm/dma-noncoherent.c | 6 ------ arch/parisc/Kconfig | 1 - arch/parisc/kernel/pci-dma.c | 6 ------ include/linux/dma-mapping.h | 8 -------- include/linux/dma-noncoherent.h | 10 ---------- kernel/dma/Kconfig | 3 --- kernel/dma/mapping.c | 14 -------------- 9 files changed, 50 deletions(-)
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index c95fa3a2484cf0..1be91c5d666e61 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -1134,7 +1134,6 @@ config DMA_NONCOHERENT select ARCH_HAS_SYNC_DMA_FOR_DEVICE select ARCH_HAS_DMA_SET_UNCACHED select DMA_NONCOHERENT_MMAP - select DMA_NONCOHERENT_CACHE_SYNC select NEED_DMA_MAP_STATE
config SYS_HAS_EARLY_PRINTK diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c index dab4d058cea9b1..2bf849caf507b1 100644 --- a/arch/mips/jazz/jazzdma.c +++ b/arch/mips/jazz/jazzdma.c @@ -620,7 +620,6 @@ const struct dma_map_ops jazz_dma_ops = { .sync_single_for_device = jazz_dma_sync_single_for_device, .sync_sg_for_cpu = jazz_dma_sync_sg_for_cpu, .sync_sg_for_device = jazz_dma_sync_sg_for_device, - .cache_sync = arch_dma_cache_sync, .mmap = dma_common_mmap, .get_sgtable = dma_common_get_sgtable, }; diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c index 97a14adbafc99c..f34ad1f09799f1 100644 --- a/arch/mips/mm/dma-noncoherent.c +++ b/arch/mips/mm/dma-noncoherent.c @@ -137,12 +137,6 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, } #endif
-void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size, - enum dma_data_direction direction) -{ - dma_sync_virt_for_device(vaddr, size, direction); -} - #ifdef CONFIG_DMA_PERDEV_COHERENT void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, const struct iommu_ops *iommu, bool coherent) diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig index 3b0f53dd70bc9b..ed15da1da174e0 100644 --- a/arch/parisc/Kconfig +++ b/arch/parisc/Kconfig @@ -195,7 +195,6 @@ config PA11 depends on PA7000 || PA7100LC || PA7200 || PA7300LC select ARCH_HAS_SYNC_DMA_FOR_CPU select ARCH_HAS_SYNC_DMA_FOR_DEVICE - select DMA_NONCOHERENT_CACHE_SYNC
config PREFETCH def_bool y diff --git a/arch/parisc/kernel/pci-dma.c b/arch/parisc/kernel/pci-dma.c index 38c68e131bbe2a..ce38c0b9158125 100644 --- a/arch/parisc/kernel/pci-dma.c +++ b/arch/parisc/kernel/pci-dma.c @@ -454,9 +454,3 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, { flush_kernel_dcache_range((unsigned long)phys_to_virt(paddr), size); } - -void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size, - enum dma_data_direction direction) -{ - flush_kernel_dcache_range((unsigned long)vaddr, size); -} diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 4e1de194b45cbf..5b4e97b0846fd3 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -123,8 +123,6 @@ struct dma_map_ops { void (*sync_sg_for_device)(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir); - void (*cache_sync)(struct device *dev, void *vaddr, size_t size, - enum dma_data_direction direction); int (*dma_supported)(struct device *dev, u64 mask); u64 (*get_required_mask)(struct device *dev); size_t (*max_mapping_size)(struct device *dev); @@ -254,8 +252,6 @@ void *dmam_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs); void dmam_free_coherent(struct device *dev, size_t size, void *vaddr, dma_addr_t dma_handle); -void dma_cache_sync(struct device *dev, void *vaddr, size_t size, - enum dma_data_direction dir); int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt, void *cpu_addr, dma_addr_t dma_addr, size_t size, unsigned long attrs); @@ -339,10 +335,6 @@ static inline void dmam_free_coherent(struct device *dev, size_t size, void *vaddr, dma_addr_t dma_handle) { } -static inline void dma_cache_sync(struct device *dev, void *vaddr, size_t size, - enum dma_data_direction dir) -{ -} static inline int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt, void *cpu_addr, dma_addr_t dma_addr, size_t size, unsigned long attrs) diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h index b9bc6c557ea46f..0888656369a45b 100644 --- a/include/linux/dma-noncoherent.h +++ b/include/linux/dma-noncoherent.h @@ -62,16 +62,6 @@ static inline pgprot_t dma_pgprot(struct device *dev, pgprot_t prot, } #endif /* CONFIG_MMU */
-#ifdef CONFIG_DMA_NONCOHERENT_CACHE_SYNC -void arch_dma_cache_sync(struct device *dev, void *vaddr, size_t size, - enum dma_data_direction direction); -#else -static inline void arch_dma_cache_sync(struct device *dev, void *vaddr, - size_t size, enum dma_data_direction direction) -{ -} -#endif /* CONFIG_DMA_NONCOHERENT_CACHE_SYNC */ - #ifdef CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir); diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig index 281785feb874db..c5f717021f5654 100644 --- a/kernel/dma/Kconfig +++ b/kernel/dma/Kconfig @@ -75,9 +75,6 @@ config ARCH_HAS_DMA_PREP_COHERENT config ARCH_HAS_FORCE_DMA_UNENCRYPTED bool
-config DMA_NONCOHERENT_CACHE_SYNC - bool - config DMA_VIRT_OPS bool depends on HAS_DMA diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index 848c95c27d79ff..e71abcec8d3913 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -518,20 +518,6 @@ int dma_set_coherent_mask(struct device *dev, u64 mask) EXPORT_SYMBOL(dma_set_coherent_mask); #endif
-void dma_cache_sync(struct device *dev, void *vaddr, size_t size, - enum dma_data_direction dir) -{ - const struct dma_map_ops *ops = get_dma_ops(dev); - - BUG_ON(!valid_dma_direction(dir)); - - if (dma_alloc_direct(dev, ops)) - arch_dma_cache_sync(dev, vaddr, size, dir); - else if (ops->cache_sync) - ops->cache_sync(dev, vaddr, size, dir); -} -EXPORT_SYMBOL(dma_cache_sync); - size_t dma_max_mapping_size(struct device *dev) { const struct dma_map_ops *ops = get_dma_ops(dev);
On Tue, Sep 15, 2020 at 05:51:18PM +0200, Christoph Hellwig wrote:
All users are gone now, remove the API.
Signed-off-by: Christoph Hellwig hch@lst.de
arch/mips/Kconfig | 1 - arch/mips/jazz/jazzdma.c | 1 - arch/mips/mm/dma-noncoherent.c | 6 ------ arch/parisc/Kconfig | 1 - arch/parisc/kernel/pci-dma.c | 6 ------ include/linux/dma-mapping.h | 8 -------- include/linux/dma-noncoherent.h | 10 ---------- kernel/dma/Kconfig | 3 --- kernel/dma/mapping.c | 14 -------------- 9 files changed, 50 deletions(-)
Acked-by: Thomas Bogendoerfer tsbogend@alpha.franken.de (MIPS part)
This API is the equivalent of alloc_pages, except that the returned memory is guaranteed to be DMA addressable by the passed in device. The implementation will also be used to provide a more sensible replacement for DMA_ATTR_NON_CONSISTENT flag.
Additionally dma_alloc_noncoherent is switched over to use dma_alloc_pages as its backend.
Signed-off-by: Christoph Hellwig hch@lst.de --- Documentation/core-api/dma-attributes.rst | 8 --- arch/alpha/kernel/pci_iommu.c | 2 + arch/arm/mm/dma-mapping-nommu.c | 2 + arch/arm/mm/dma-mapping.c | 4 ++ arch/ia64/hp/common/sba_iommu.c | 2 + arch/mips/jazz/jazzdma.c | 7 +-- arch/powerpc/kernel/dma-iommu.c | 2 + arch/powerpc/platforms/ps3/system-bus.c | 4 ++ arch/powerpc/platforms/pseries/vio.c | 2 + arch/s390/pci/pci_dma.c | 2 + arch/x86/kernel/amd_gart_64.c | 2 + drivers/iommu/dma-iommu.c | 2 + drivers/iommu/intel/iommu.c | 4 ++ drivers/parisc/ccio-dma.c | 2 + drivers/parisc/sba_iommu.c | 2 + drivers/xen/swiotlb-xen.c | 2 + include/linux/dma-direct.h | 5 ++ include/linux/dma-mapping.h | 34 ++++++------ include/linux/dma-noncoherent.h | 3 -- kernel/dma/direct.c | 52 ++++++++++++++++++- kernel/dma/mapping.c | 63 +++++++++++++++++++++-- kernel/dma/ops_helpers.c | 35 +++++++++++++ kernel/dma/virt.c | 2 + 23 files changed, 206 insertions(+), 37 deletions(-)
diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst index 29dcbe8826e85e..1887d92e8e9269 100644 --- a/Documentation/core-api/dma-attributes.rst +++ b/Documentation/core-api/dma-attributes.rst @@ -25,14 +25,6 @@ Since it is optional for platforms to implement DMA_ATTR_WRITE_COMBINE, those that do not will simply ignore the attribute and exhibit default behavior.
-DMA_ATTR_NON_CONSISTENT ------------------------ - -DMA_ATTR_NON_CONSISTENT lets the platform to choose to return either -consistent or non-consistent memory as it sees fit. By using this API, -you are guaranteeing to the platform that you have all the correct and -necessary sync points for this memory in the driver. - DMA_ATTR_NO_KERNEL_MAPPING --------------------------
diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c index 6f7de4f4e191e7..447e0fd0ed3895 100644 --- a/arch/alpha/kernel/pci_iommu.c +++ b/arch/alpha/kernel/pci_iommu.c @@ -952,5 +952,7 @@ const struct dma_map_ops alpha_pci_ops = { .dma_supported = alpha_pci_supported, .mmap = dma_common_mmap, .get_sgtable = dma_common_get_sgtable, + .alloc_pages = dma_common_alloc_pages, + .free_pages = dma_common_free_pages, }; EXPORT_SYMBOL(alpha_pci_ops); diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c index 287ef898a55e11..43c6d66b6e733a 100644 --- a/arch/arm/mm/dma-mapping-nommu.c +++ b/arch/arm/mm/dma-mapping-nommu.c @@ -176,6 +176,8 @@ static void arm_nommu_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist const struct dma_map_ops arm_nommu_dma_ops = { .alloc = arm_nommu_dma_alloc, .free = arm_nommu_dma_free, + .alloc_pages = dma_direct_alloc_pages, + .free_pages = dma_direct_free_pages, .mmap = arm_nommu_dma_mmap, .map_page = arm_nommu_dma_map_page, .unmap_page = arm_nommu_dma_unmap_page, diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 8a8949174b1c06..7738b4d23f692f 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -199,6 +199,8 @@ static int arm_dma_supported(struct device *dev, u64 mask) const struct dma_map_ops arm_dma_ops = { .alloc = arm_dma_alloc, .free = arm_dma_free, + .alloc_pages = dma_direct_alloc_pages, + .free_pages = dma_direct_free_pages, .mmap = arm_dma_mmap, .get_sgtable = arm_dma_get_sgtable, .map_page = arm_dma_map_page, @@ -226,6 +228,8 @@ static int arm_coherent_dma_mmap(struct device *dev, struct vm_area_struct *vma, const struct dma_map_ops arm_coherent_dma_ops = { .alloc = arm_coherent_dma_alloc, .free = arm_coherent_dma_free, + .alloc_pages = dma_direct_alloc_pages, + .free_pages = dma_direct_free_pages, .mmap = arm_coherent_dma_mmap, .get_sgtable = arm_dma_get_sgtable, .map_page = arm_coherent_dma_map_page, diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c index b49b73a95067d2..cafbb848a34e4d 100644 --- a/arch/ia64/hp/common/sba_iommu.c +++ b/arch/ia64/hp/common/sba_iommu.c @@ -2070,6 +2070,8 @@ static const struct dma_map_ops sba_dma_ops = { .dma_supported = sba_dma_supported, .mmap = dma_common_mmap, .get_sgtable = dma_common_get_sgtable, + .alloc_pages = dma_common_alloc_pages, + .free_pages = dma_common_free_pages, };
static int __init diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c index 2bf849caf507b1..f53bc043334c01 100644 --- a/arch/mips/jazz/jazzdma.c +++ b/arch/mips/jazz/jazzdma.c @@ -506,9 +506,6 @@ static void *jazz_dma_alloc(struct device *dev, size_t size, *dma_handle = vdma_alloc(virt_to_phys(ret), size); if (*dma_handle == DMA_MAPPING_ERROR) goto out_free_pages; - - if (attrs & DMA_ATTR_NON_CONSISTENT) - return ret; arch_dma_prep_coherent(page, size); return (void *)(UNCAC_BASE + __pa(ret));
@@ -521,8 +518,6 @@ static void jazz_dma_free(struct device *dev, size_t size, void *vaddr, dma_addr_t dma_handle, unsigned long attrs) { vdma_free(dma_handle); - if (!(attrs & DMA_ATTR_NON_CONSISTENT)) - vaddr = __va(vaddr - UNCAC_BASE); __free_pages(virt_to_page(vaddr), get_order(size)); }
@@ -622,5 +617,7 @@ const struct dma_map_ops jazz_dma_ops = { .sync_sg_for_device = jazz_dma_sync_sg_for_device, .mmap = dma_common_mmap, .get_sgtable = dma_common_get_sgtable, + .alloc_pages = dma_common_alloc_pages, + .free_pages = dma_common_free_pages, }; EXPORT_SYMBOL(jazz_dma_ops); diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c index 569fecd7b5b234..d4e702d74b3393 100644 --- a/arch/powerpc/kernel/dma-iommu.c +++ b/arch/powerpc/kernel/dma-iommu.c @@ -137,4 +137,6 @@ const struct dma_map_ops dma_iommu_ops = { .get_required_mask = dma_iommu_get_required_mask, .mmap = dma_common_mmap, .get_sgtable = dma_common_get_sgtable, + .alloc_pages = dma_common_alloc_pages, + .free_pages = dma_common_free_pages, }; diff --git a/arch/powerpc/platforms/ps3/system-bus.c b/arch/powerpc/platforms/ps3/system-bus.c index 3542b7bd6a4689..7bc5f9be3e12d8 100644 --- a/arch/powerpc/platforms/ps3/system-bus.c +++ b/arch/powerpc/platforms/ps3/system-bus.c @@ -696,6 +696,8 @@ static const struct dma_map_ops ps3_sb_dma_ops = { .unmap_page = ps3_unmap_page, .mmap = dma_common_mmap, .get_sgtable = dma_common_get_sgtable, + .alloc_pages = dma_common_alloc_pages, + .free_pages = dma_common_free_pages, };
static const struct dma_map_ops ps3_ioc0_dma_ops = { @@ -708,6 +710,8 @@ static const struct dma_map_ops ps3_ioc0_dma_ops = { .unmap_page = ps3_unmap_page, .mmap = dma_common_mmap, .get_sgtable = dma_common_get_sgtable, + .alloc_pages = dma_common_alloc_pages, + .free_pages = dma_common_free_pages, };
/** diff --git a/arch/powerpc/platforms/pseries/vio.c b/arch/powerpc/platforms/pseries/vio.c index 0487b26f6f1af3..98ed7b09b3fe50 100644 --- a/arch/powerpc/platforms/pseries/vio.c +++ b/arch/powerpc/platforms/pseries/vio.c @@ -608,6 +608,8 @@ static const struct dma_map_ops vio_dma_mapping_ops = { .get_required_mask = dma_iommu_get_required_mask, .mmap = dma_common_mmap, .get_sgtable = dma_common_get_sgtable, + .alloc_pages = dma_common_alloc_pages, + .free_pages = dma_common_free_pages, };
/** diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c index 4a37d8f4de9d9d..9291023e9469c2 100644 --- a/arch/s390/pci/pci_dma.c +++ b/arch/s390/pci/pci_dma.c @@ -668,6 +668,8 @@ const struct dma_map_ops s390_pci_dma_ops = { .unmap_page = s390_dma_unmap_pages, .mmap = dma_common_mmap, .get_sgtable = dma_common_get_sgtable, + .alloc_pages = dma_common_alloc_pages, + .free_pages = dma_common_free_pages, /* dma_supported is unconditionally true without a callback */ }; EXPORT_SYMBOL_GPL(s390_pci_dma_ops); diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c index 153374b996a279..c96dcaa572ebd3 100644 --- a/arch/x86/kernel/amd_gart_64.c +++ b/arch/x86/kernel/amd_gart_64.c @@ -677,6 +677,8 @@ static const struct dma_map_ops gart_dma_ops = { .get_sgtable = dma_common_get_sgtable, .dma_supported = dma_direct_supported, .get_required_mask = dma_direct_get_required_mask, + .alloc_pages = dma_direct_alloc_pages, + .free_pages = dma_direct_free_pages, };
static void gart_iommu_shutdown(void) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 5141d49a046baa..00a5b49248e334 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1120,6 +1120,8 @@ static unsigned long iommu_dma_get_merge_boundary(struct device *dev) static const struct dma_map_ops iommu_dma_ops = { .alloc = iommu_dma_alloc, .free = iommu_dma_free, + .alloc_pages = dma_common_alloc_pages, + .free_pages = dma_common_free_pages, .mmap = iommu_dma_mmap, .get_sgtable = iommu_dma_get_sgtable, .map_page = iommu_dma_map_page, diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 7983c13b9eef7d..26eb7aafa0bda6 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -3669,6 +3669,8 @@ static const struct dma_map_ops intel_dma_ops = { .dma_supported = dma_direct_supported, .mmap = dma_common_mmap, .get_sgtable = dma_common_get_sgtable, + .alloc_pages = dma_common_alloc_pages, + .free_pages = dma_common_free_pages, .get_required_mask = intel_get_required_mask, };
@@ -3922,6 +3924,8 @@ static const struct dma_map_ops bounce_dma_ops = { .sync_sg_for_device = bounce_sync_sg_for_device, .map_resource = bounce_map_resource, .unmap_resource = bounce_unmap_resource, + .alloc_pages = dma_common_alloc_pages, + .free_pages = dma_common_free_pages, .dma_supported = dma_direct_supported, };
diff --git a/drivers/parisc/ccio-dma.c b/drivers/parisc/ccio-dma.c index ba16b7f8f80612..8cf0b9c8bdf795 100644 --- a/drivers/parisc/ccio-dma.c +++ b/drivers/parisc/ccio-dma.c @@ -1024,6 +1024,8 @@ static const struct dma_map_ops ccio_ops = { .map_sg = ccio_map_sg, .unmap_sg = ccio_unmap_sg, .get_sgtable = dma_common_get_sgtable, + .alloc_pages = dma_common_alloc_pages, + .free_pages = dma_common_free_pages, };
#ifdef CONFIG_PROC_FS diff --git a/drivers/parisc/sba_iommu.c b/drivers/parisc/sba_iommu.c index 959bda193b9603..6fcde7980358ae 100644 --- a/drivers/parisc/sba_iommu.c +++ b/drivers/parisc/sba_iommu.c @@ -1076,6 +1076,8 @@ static const struct dma_map_ops sba_ops = { .map_sg = sba_map_sg, .unmap_sg = sba_unmap_sg, .get_sgtable = dma_common_get_sgtable, + .alloc_pages = dma_common_alloc_pages, + .free_pages = dma_common_free_pages, };
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 39a0f2e0847c95..030a225624b060 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -578,4 +578,6 @@ const struct dma_map_ops xen_swiotlb_dma_ops = { .dma_supported = xen_swiotlb_dma_supported, .mmap = dma_common_mmap, .get_sgtable = dma_common_get_sgtable, + .alloc_pages = dma_common_alloc_pages, + .free_pages = dma_common_free_pages, }; diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h index 805010ea5346f9..c11bb935fc7fe3 100644 --- a/include/linux/dma-direct.h +++ b/include/linux/dma-direct.h @@ -77,6 +77,11 @@ void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs); void dma_direct_free(struct device *dev, size_t size, void *cpu_addr, dma_addr_t dma_addr, unsigned long attrs); +struct page *dma_direct_alloc_pages(struct device *dev, size_t size, + dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp); +void dma_direct_free_pages(struct device *dev, size_t size, + struct page *page, dma_addr_t dma_addr, + enum dma_data_direction dir); int dma_direct_get_sgtable(struct device *dev, struct sg_table *sgt, void *cpu_addr, dma_addr_t dma_addr, size_t size, unsigned long attrs); diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 5b4e97b0846fd3..bf592cf0db4acb 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -27,11 +27,6 @@ * buffered to improve performance. */ #define DMA_ATTR_WRITE_COMBINE (1UL << 2) -/* - * DMA_ATTR_NON_CONSISTENT: Lets the platform to choose to return either - * consistent or non-consistent memory as it sees fit. - */ -#define DMA_ATTR_NON_CONSISTENT (1UL << 3) /* * DMA_ATTR_NO_KERNEL_MAPPING: Lets the platform to avoid creating a kernel * virtual mapping for the allocated buffer. @@ -80,6 +75,11 @@ struct dma_map_ops { void (*free)(struct device *dev, size_t size, void *vaddr, dma_addr_t dma_handle, unsigned long attrs); + struct page *(*alloc_pages)(struct device *dev, size_t size, + dma_addr_t *dma_handle, enum dma_data_direction dir, + gfp_t gfp); + void (*free_pages)(struct device *dev, size_t size, struct page *vaddr, + dma_addr_t dma_handle, enum dma_data_direction dir); int (*mmap)(struct device *, struct vm_area_struct *, void *, dma_addr_t, size_t, unsigned long attrs); @@ -381,17 +381,14 @@ static inline unsigned long dma_get_merge_boundary(struct device *dev) } #endif /* CONFIG_HAS_DMA */
-static inline void *dma_alloc_noncoherent(struct device *dev, size_t size, - dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp) -{ - return dma_alloc_attrs(dev, size, dma_handle, gfp, - DMA_ATTR_NON_CONSISTENT); -} -static inline void dma_free_noncoherent(struct device *dev, size_t size, - void *vaddr, dma_addr_t dma_handle, enum dma_data_direction dir) -{ - dma_free_attrs(dev, size, vaddr, dma_handle, DMA_ATTR_NON_CONSISTENT); -} +struct page *dma_alloc_pages(struct device *dev, size_t size, + dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp); +void dma_free_pages(struct device *dev, size_t size, struct page *page, + dma_addr_t dma_handle, enum dma_data_direction dir); +void *dma_alloc_noncoherent(struct device *dev, size_t size, + dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp); +void dma_free_noncoherent(struct device *dev, size_t size, void *vaddr, + dma_addr_t dma_handle, enum dma_data_direction dir);
static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr, size_t size, enum dma_data_direction dir, unsigned long attrs) @@ -517,7 +514,10 @@ static inline void dma_sync_sgtable_for_device(struct device *dev, extern int dma_common_mmap(struct device *dev, struct vm_area_struct *vma, void *cpu_addr, dma_addr_t dma_addr, size_t size, unsigned long attrs); - +struct page *dma_common_alloc_pages(struct device *dev, size_t size, + dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp); +void dma_common_free_pages(struct device *dev, size_t size, struct page *vaddr, + dma_addr_t dma_handle, enum dma_data_direction dir); struct page **dma_common_find_pages(void *cpu_addr); void *dma_common_contiguous_remap(struct page *page, size_t size, pgprot_t prot, const void *caller); diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h index 0888656369a45b..e61283e06576a8 100644 --- a/include/linux/dma-noncoherent.h +++ b/include/linux/dma-noncoherent.h @@ -31,9 +31,6 @@ static __always_inline bool dma_alloc_need_uncached(struct device *dev, return false; if (attrs & DMA_ATTR_NO_KERNEL_MAPPING) return false; - if (IS_ENABLED(CONFIG_DMA_NONCOHERENT_CACHE_SYNC) && - (attrs & DMA_ATTR_NON_CONSISTENT)) - return false; return true; }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 54db9cfdaecc6d..9ba320383b0d19 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 /* - * Copyright (C) 2018 Christoph Hellwig. + * Copyright (C) 2018-2020 Christoph Hellwig. * * DMA operations that map physical memory directly without using an IOMMU. */ @@ -287,6 +287,56 @@ void dma_direct_free(struct device *dev, size_t size, dma_free_contiguous(dev, dma_direct_to_page(dev, dma_addr), size); }
+struct page *dma_direct_alloc_pages(struct device *dev, size_t size, + dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp) +{ + struct page *page; + void *ret; + + if (dma_should_alloc_from_pool(dev, gfp, 0)) { + page = dma_alloc_from_pool(dev, size, &ret, gfp, + dma_coherent_ok); + if (!page) + return NULL; + goto done; + } + + page = __dma_direct_alloc_pages(dev, size, gfp); + if (!page) + return NULL; + ret = page_address(page); + if (force_dma_unencrypted(dev)) { + if (set_memory_decrypted((unsigned long)ret, + 1 << get_order(size))) + goto out_free_pages; + } + memset(ret, 0, size); +done: + *dma_handle = phys_to_dma_direct(dev, page_to_phys(page)); + return page; +out_free_pages: + dma_free_contiguous(dev, page, size); + return NULL; +} + +void dma_direct_free_pages(struct device *dev, size_t size, + struct page *page, dma_addr_t dma_addr, + enum dma_data_direction dir) +{ + unsigned int page_order = get_order(size); + void *vaddr = page_address(page); + + /* If cpu_addr is not from an atomic pool, dma_free_from_pool() fails */ + if (dma_should_free_from_pool(dev, 0) && + dma_free_from_pool(dev, vaddr, size)) + return; + + if (force_dma_unencrypted(dev)) + set_memory_encrypted((unsigned long)vaddr, 1 << page_order); + + dma_free_contiguous(dev, page, size); +} + #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \ defined(CONFIG_SWIOTLB) void dma_direct_sync_sg_for_device(struct device *dev, diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index e71abcec8d3913..6f86c925b8251d 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -330,9 +330,7 @@ pgprot_t dma_pgprot(struct device *dev, pgprot_t prot, unsigned long attrs) { if (force_dma_unencrypted(dev)) prot = pgprot_decrypted(prot); - if (dev_is_dma_coherent(dev) || - (IS_ENABLED(CONFIG_DMA_NONCOHERENT_CACHE_SYNC) && - (attrs & DMA_ATTR_NON_CONSISTENT))) + if (dev_is_dma_coherent(dev)) return prot; #ifdef CONFIG_ARCH_HAS_DMA_WRITE_COMBINE if (attrs & DMA_ATTR_WRITE_COMBINE) @@ -461,6 +459,65 @@ void dma_free_attrs(struct device *dev, size_t size, void *cpu_addr, } EXPORT_SYMBOL(dma_free_attrs);
+struct page *dma_alloc_pages(struct device *dev, size_t size, + dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp) +{ + const struct dma_map_ops *ops = get_dma_ops(dev); + struct page *page; + + if (WARN_ON_ONCE(!dev->coherent_dma_mask)) + return NULL; + if (WARN_ON_ONCE(gfp & (__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM))) + return NULL; + + size = PAGE_ALIGN(size); + if (dma_alloc_direct(dev, ops)) + page = dma_direct_alloc_pages(dev, size, dma_handle, dir, gfp); + else if (ops->alloc_pages) + page = ops->alloc_pages(dev, size, dma_handle, dir, gfp); + else + return NULL; + + debug_dma_map_page(dev, page, 0, size, dir, *dma_handle); + + return page; +} +EXPORT_SYMBOL_GPL(dma_alloc_pages); + +void dma_free_pages(struct device *dev, size_t size, struct page *page, + dma_addr_t dma_handle, enum dma_data_direction dir) +{ + const struct dma_map_ops *ops = get_dma_ops(dev); + + size = PAGE_ALIGN(size); + debug_dma_unmap_page(dev, dma_handle, size, dir); + + if (dma_alloc_direct(dev, ops)) + dma_direct_free_pages(dev, size, page, dma_handle, dir); + else if (ops->free_pages) + ops->free_pages(dev, size, page, dma_handle, dir); +} +EXPORT_SYMBOL_GPL(dma_free_pages); + +void *dma_alloc_noncoherent(struct device *dev, size_t size, + dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp) +{ + struct page *page; + + page = dma_alloc_pages(dev, size, dma_handle, dir, gfp); + if (!page) + return NULL; + return page_address(page); +} +EXPORT_SYMBOL_GPL(dma_alloc_noncoherent); + +void dma_free_noncoherent(struct device *dev, size_t size, void *vaddr, + dma_addr_t dma_handle, enum dma_data_direction dir) +{ + dma_free_pages(dev, size, virt_to_page(vaddr), dma_handle, dir); +} +EXPORT_SYMBOL_GPL(dma_free_noncoherent); + int dma_supported(struct device *dev, u64 mask) { const struct dma_map_ops *ops = get_dma_ops(dev); diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c index e443c69be4299f..5828e5e01b7913 100644 --- a/kernel/dma/ops_helpers.c +++ b/kernel/dma/ops_helpers.c @@ -3,6 +3,7 @@ * Helpers for DMA ops implementations. These generally rely on the fact that * the allocated memory contains normal pages in the direct kernel mapping. */ +#include <linux/dma-contiguous.h> #include <linux/dma-noncoherent.h>
/* @@ -49,3 +50,37 @@ int dma_common_mmap(struct device *dev, struct vm_area_struct *vma, return -ENXIO; #endif /* CONFIG_MMU */ } + +struct page *dma_common_alloc_pages(struct device *dev, size_t size, + dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp) +{ + const struct dma_map_ops *ops = get_dma_ops(dev); + struct page *page; + + page = dma_alloc_contiguous(dev, size, gfp); + if (!page) + page = alloc_pages_node(dev_to_node(dev), gfp, get_order(size)); + if (!page) + return NULL; + + *dma_handle = ops->map_page(dev, page, 0, size, dir, + DMA_ATTR_SKIP_CPU_SYNC); + if (*dma_handle == DMA_MAPPING_ERROR) { + dma_free_contiguous(dev, page, size); + return NULL; + } + + memset(page_address(page), 0, size); + return page; +} + +void dma_common_free_pages(struct device *dev, size_t size, struct page *page, + dma_addr_t dma_handle, enum dma_data_direction dir) +{ + const struct dma_map_ops *ops = get_dma_ops(dev); + + if (ops->unmap_page) + ops->unmap_page(dev, dma_handle, size, dir, + DMA_ATTR_SKIP_CPU_SYNC); + dma_free_contiguous(dev, page, size); +} diff --git a/kernel/dma/virt.c b/kernel/dma/virt.c index ebe128833af7b5..6986bf1fd6689c 100644 --- a/kernel/dma/virt.c +++ b/kernel/dma/virt.c @@ -55,5 +55,7 @@ const struct dma_map_ops dma_virt_ops = { .free = dma_virt_free, .map_page = dma_virt_map_page, .map_sg = dma_virt_map_sg, + .alloc_pages = dma_common_alloc_pages, + .free_pages = dma_common_free_pages, }; EXPORT_SYMBOL(dma_virt_ops);
On Tue, Sep 15, 2020 at 05:51:19PM +0200, Christoph Hellwig wrote:
This API is the equivalent of alloc_pages, except that the returned memory is guaranteed to be DMA addressable by the passed in device. The implementation will also be used to provide a more sensible replacement for DMA_ATTR_NON_CONSISTENT flag.
Additionally dma_alloc_noncoherent is switched over to use dma_alloc_pages as its backend.
Signed-off-by: Christoph Hellwig hch@lst.de
Documentation/core-api/dma-attributes.rst | 8 --- arch/alpha/kernel/pci_iommu.c | 2 + arch/arm/mm/dma-mapping-nommu.c | 2 + arch/arm/mm/dma-mapping.c | 4 ++ arch/ia64/hp/common/sba_iommu.c | 2 + arch/mips/jazz/jazzdma.c | 7 +-- arch/powerpc/kernel/dma-iommu.c | 2 + arch/powerpc/platforms/ps3/system-bus.c | 4 ++ arch/powerpc/platforms/pseries/vio.c | 2 + arch/s390/pci/pci_dma.c | 2 + arch/x86/kernel/amd_gart_64.c | 2 + drivers/iommu/dma-iommu.c | 2 + drivers/iommu/intel/iommu.c | 4 ++ drivers/parisc/ccio-dma.c | 2 + drivers/parisc/sba_iommu.c | 2 + drivers/xen/swiotlb-xen.c | 2 + include/linux/dma-direct.h | 5 ++ include/linux/dma-mapping.h | 34 ++++++------ include/linux/dma-noncoherent.h | 3 -- kernel/dma/direct.c | 52 ++++++++++++++++++- kernel/dma/mapping.c | 63 +++++++++++++++++++++-- kernel/dma/ops_helpers.c | 35 +++++++++++++ kernel/dma/virt.c | 2 + 23 files changed, 206 insertions(+), 37 deletions(-)
Acked-by: Thomas Bogendoerfer tsbogend@alpha.franken.de (MIPS part)
This will allow IOMMU drivers to allocate non-contigous memory and return a vmapped virtual address.
Signed-off-by: Christoph Hellwig hch@lst.de --- include/linux/dma-mapping.h | 5 +++++ kernel/dma/mapping.c | 33 +++++++++++++++++++++++++++------ 2 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index bf592cf0db4acb..b4b5d75260d6dc 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -80,6 +80,11 @@ struct dma_map_ops { gfp_t gfp); void (*free_pages)(struct device *dev, size_t size, struct page *vaddr, dma_addr_t dma_handle, enum dma_data_direction dir); + void* (*alloc_noncoherent)(struct device *dev, size_t size, + dma_addr_t *dma_handle, enum dma_data_direction dir, + gfp_t gfp); + void (*free_noncoherent)(struct device *dev, size_t size, void *vaddr, + dma_addr_t dma_handle, enum dma_data_direction dir); int (*mmap)(struct device *, struct vm_area_struct *, void *, dma_addr_t, size_t, unsigned long attrs); diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index 6f86c925b8251d..8614d7d2ee59a9 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -502,19 +502,40 @@ EXPORT_SYMBOL_GPL(dma_free_pages); void *dma_alloc_noncoherent(struct device *dev, size_t size, dma_addr_t *dma_handle, enum dma_data_direction dir, gfp_t gfp) { - struct page *page; + const struct dma_map_ops *ops = get_dma_ops(dev); + void *vaddr;
- page = dma_alloc_pages(dev, size, dma_handle, dir, gfp); - if (!page) - return NULL; - return page_address(page); + if (!ops || !ops->alloc_noncoherent) { + struct page *page; + + page = dma_alloc_pages(dev, size, dma_handle, dir, gfp); + if (!page) + return NULL; + return page_address(page); + } + + size = PAGE_ALIGN(size); + vaddr = ops->alloc_noncoherent(dev, size, dma_handle, dir, gfp); + if (vaddr) + debug_dma_map_page(dev, virt_to_page(vaddr), 0, size, dir, + *dma_handle); + return vaddr; } EXPORT_SYMBOL_GPL(dma_alloc_noncoherent);
void dma_free_noncoherent(struct device *dev, size_t size, void *vaddr, dma_addr_t dma_handle, enum dma_data_direction dir) { - dma_free_pages(dev, size, virt_to_page(vaddr), dma_handle, dir); + const struct dma_map_ops *ops = get_dma_ops(dev); + + if (!ops || !ops->free_noncoherent) { + dma_free_pages(dev, size, virt_to_page(vaddr), dma_handle, dir); + return; + } + + size = PAGE_ALIGN(size); + debug_dma_unmap_page(dev, dma_handle, size, dir); + ops->free_noncoherent(dev, size, vaddr, dma_handle, dir); } EXPORT_SYMBOL_GPL(dma_free_noncoherent);
Implement the alloc_noncoherent method to provide memory that is neither coherent not contiguous.
Signed-off-by: Christoph Hellwig hch@lst.de --- drivers/iommu/dma-iommu.c | 41 +++++++++++++++++++++++++++++++++++---- 1 file changed, 37 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 00a5b49248e334..c12c1dc43d312e 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -572,6 +572,7 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev, * @size: Size of buffer in bytes * @dma_handle: Out argument for allocated DMA handle * @gfp: Allocation flags + * @prot: pgprot_t to use for the remapped mapping * @attrs: DMA attributes for this allocation * * If @size is less than PAGE_SIZE, then a full CPU page will be allocated, @@ -580,14 +581,14 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev, * Return: Mapped virtual address, or NULL on failure. */ static void *iommu_dma_alloc_remap(struct device *dev, size_t size, - dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs) + dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot, + unsigned long attrs) { struct iommu_domain *domain = iommu_get_dma_domain(dev); struct iommu_dma_cookie *cookie = domain->iova_cookie; struct iova_domain *iovad = &cookie->iovad; bool coherent = dev_is_dma_coherent(dev); int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs); - pgprot_t prot = dma_pgprot(dev, PAGE_KERNEL, attrs); unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap; struct page **pages; struct sg_table sgt; @@ -1030,8 +1031,10 @@ static void *iommu_dma_alloc(struct device *dev, size_t size, gfp |= __GFP_ZERO;
if (IS_ENABLED(CONFIG_DMA_REMAP) && gfpflags_allow_blocking(gfp) && - !(attrs & DMA_ATTR_FORCE_CONTIGUOUS)) - return iommu_dma_alloc_remap(dev, size, handle, gfp, attrs); + !(attrs & DMA_ATTR_FORCE_CONTIGUOUS)) { + return iommu_dma_alloc_remap(dev, size, handle, gfp, + dma_pgprot(dev, PAGE_KERNEL, attrs), attrs); + }
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !gfpflags_allow_blocking(gfp) && !coherent) @@ -1052,6 +1055,34 @@ static void *iommu_dma_alloc(struct device *dev, size_t size, return cpu_addr; }
+#ifdef CONFIG_DMA_REMAP +static void *iommu_dma_alloc_noncoherent(struct device *dev, size_t size, + dma_addr_t *handle, enum dma_data_direction dir, gfp_t gfp) +{ + if (!gfpflags_allow_blocking(gfp)) { + struct page *page; + + page = dma_common_alloc_pages(dev, size, handle, dir, gfp); + if (!page) + return NULL; + return page_address(page); + } + + return iommu_dma_alloc_remap(dev, size, handle, gfp | __GFP_ZERO, + PAGE_KERNEL, 0); +} + +static void iommu_dma_free_noncoherent(struct device *dev, size_t size, + void *cpu_addr, dma_addr_t handle, enum dma_data_direction dir) +{ + __iommu_dma_unmap(dev, handle, size); + __iommu_dma_free(dev, size, cpu_addr); +} +#else +#define iommu_dma_alloc_noncoherent NULL +#define iommu_dma_free_noncoherent NULL +#endif /* CONFIG_DMA_REMAP */ + static int iommu_dma_mmap(struct device *dev, struct vm_area_struct *vma, void *cpu_addr, dma_addr_t dma_addr, size_t size, unsigned long attrs) @@ -1122,6 +1153,8 @@ static const struct dma_map_ops iommu_dma_ops = { .free = iommu_dma_free, .alloc_pages = dma_common_alloc_pages, .free_pages = dma_common_free_pages, + .alloc_noncoherent = iommu_dma_alloc_noncoherent, + .free_noncoherent = iommu_dma_free_noncoherent, .mmap = iommu_dma_mmap, .get_sgtable = iommu_dma_get_sgtable, .map_page = iommu_dma_map_page,
Hi Christoph,
On Tue, Sep 15, 2020 at 05:51:21PM +0200, Christoph Hellwig wrote:
Implement the alloc_noncoherent method to provide memory that is neither coherent not contiguous.
Signed-off-by: Christoph Hellwig hch@lst.de
drivers/iommu/dma-iommu.c | 41 +++++++++++++++++++++++++++++++++++---- 1 file changed, 37 insertions(+), 4 deletions(-)
Sorry for being late to the party and thanks a lot for the patch. Please see my comments inline.
[snip]
@@ -1052,6 +1055,34 @@ static void *iommu_dma_alloc(struct device *dev, size_t size, return cpu_addr; }
+#ifdef CONFIG_DMA_REMAP +static void *iommu_dma_alloc_noncoherent(struct device *dev, size_t size,
dma_addr_t *handle, enum dma_data_direction dir, gfp_t gfp)
+{
- if (!gfpflags_allow_blocking(gfp)) {
struct page *page;
page = dma_common_alloc_pages(dev, size, handle, dir, gfp);
if (!page)
return NULL;
return page_address(page);
- }
- return iommu_dma_alloc_remap(dev, size, handle, gfp | __GFP_ZERO,
PAGE_KERNEL, 0);
iommu_dma_alloc_remap() makes use of the DMA_ATTR_ALLOC_SINGLE_PAGES attribute to optimize the allocations for devices which don't care about how contiguous the backing memory is. Do you think we could add an attrs argument to this function and pass it there?
As ARM is being moved to the common iommu-dma layer as well, we'll probably make use of the argument to support the DMA_ATTR_NO_KERNEL_MAPPING attribute to conserve the vmalloc area.
Best regards, Tomasz
On Fri, Sep 25, 2020 at 06:46:22PM +0000, Tomasz Figa wrote:
+static void *iommu_dma_alloc_noncoherent(struct device *dev, size_t size,
dma_addr_t *handle, enum dma_data_direction dir, gfp_t gfp)
+{
- if (!gfpflags_allow_blocking(gfp)) {
struct page *page;
page = dma_common_alloc_pages(dev, size, handle, dir, gfp);
if (!page)
return NULL;
return page_address(page);
- }
- return iommu_dma_alloc_remap(dev, size, handle, gfp | __GFP_ZERO,
PAGE_KERNEL, 0);
iommu_dma_alloc_remap() makes use of the DMA_ATTR_ALLOC_SINGLE_PAGES attribute to optimize the allocations for devices which don't care about how contiguous the backing memory is. Do you think we could add an attrs argument to this function and pass it there?
As ARM is being moved to the common iommu-dma layer as well, we'll probably make use of the argument to support the DMA_ATTR_NO_KERNEL_MAPPING attribute to conserve the vmalloc area.
We could probably at it. However I wonder why this is something the drivers should care about. Isn't this really something that should be a kernel-wide policy for a given system?
On Sat, Sep 26, 2020 at 4:14 PM Christoph Hellwig hch@lst.de wrote:
On Fri, Sep 25, 2020 at 06:46:22PM +0000, Tomasz Figa wrote:
+static void *iommu_dma_alloc_noncoherent(struct device *dev, size_t size,
dma_addr_t *handle, enum dma_data_direction dir, gfp_t gfp)
+{
- if (!gfpflags_allow_blocking(gfp)) {
struct page *page;
page = dma_common_alloc_pages(dev, size, handle, dir, gfp);
if (!page)
return NULL;
return page_address(page);
- }
- return iommu_dma_alloc_remap(dev, size, handle, gfp | __GFP_ZERO,
PAGE_KERNEL, 0);
iommu_dma_alloc_remap() makes use of the DMA_ATTR_ALLOC_SINGLE_PAGES attribute to optimize the allocations for devices which don't care about how contiguous the backing memory is. Do you think we could add an attrs argument to this function and pass it there?
As ARM is being moved to the common iommu-dma layer as well, we'll probably make use of the argument to support the DMA_ATTR_NO_KERNEL_MAPPING attribute to conserve the vmalloc area.
We could probably at it. However I wonder why this is something the drivers should care about. Isn't this really something that should be a kernel-wide policy for a given system?
There are IOMMUs out there which support huge pages and those can benefit *some* hardware depending on what kind of accesses they perform, possibly on a per-buffer basis. At the same time, order > 0 allocations can be expensive, significantly affecting allocation latency, so for devices which don't care about huge pages anyone would prefer simple single-page allocations. Currently the drivers know the best on whether the hardware they drive would care. There are some decision factors listed in the documentation [1].
I can imagine cases where drivers could not be the best to decide about this - for example, the workload could vary depending on the userspace or a product decision regarding the performance vs allocation latency, but we haven't seen such cases in practice yet.
[1] https://www.kernel.org/doc/html/latest/core-api/dma-attributes.html?highligh...
Best regards, Tomasz
Use dma_alloc_pages to allocate DMAable pages instead of hoping that the architecture either has GFP_DMA32 or not more than 4G of memory.
Signed-off-by: Christoph Hellwig hch@lst.de --- drivers/firewire/ohci.c | 26 +++++++++++--------------- 1 file changed, 11 insertions(+), 15 deletions(-)
diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c index 020cb15a4d8fcc..9811c40956e54d 100644 --- a/drivers/firewire/ohci.c +++ b/drivers/firewire/ohci.c @@ -674,17 +674,16 @@ static void ar_context_link_page(struct ar_context *ctx, unsigned int index)
static void ar_context_release(struct ar_context *ctx) { + struct device *dev = ctx->ohci->card.device; unsigned int i;
vunmap(ctx->buffer);
- for (i = 0; i < AR_BUFFERS; i++) - if (ctx->pages[i]) { - dma_unmap_page(ctx->ohci->card.device, - ar_buffer_bus(ctx, i), - PAGE_SIZE, DMA_FROM_DEVICE); - __free_page(ctx->pages[i]); - } + for (i = 0; i < AR_BUFFERS; i++) { + if (ctx->pages[i]) + dma_free_pages(dev, PAGE_SIZE, ctx->pages[i], + ar_buffer_bus(ctx, i), DMA_FROM_DEVICE); + } }
static void ar_context_abort(struct ar_context *ctx, const char *error_msg) @@ -970,6 +969,7 @@ static void ar_context_tasklet(unsigned long data) static int ar_context_init(struct ar_context *ctx, struct fw_ohci *ohci, unsigned int descriptors_offset, u32 regs) { + struct device *dev = ohci->card.device; unsigned int i; dma_addr_t dma_addr; struct page *pages[AR_BUFFERS + AR_WRAPAROUND_PAGES]; @@ -980,17 +980,13 @@ static int ar_context_init(struct ar_context *ctx, struct fw_ohci *ohci, tasklet_init(&ctx->tasklet, ar_context_tasklet, (unsigned long)ctx);
for (i = 0; i < AR_BUFFERS; i++) { - ctx->pages[i] = alloc_page(GFP_KERNEL | GFP_DMA32); + ctx->pages[i] = dma_alloc_pages(dev, PAGE_SIZE, &dma_addr, + DMA_FROM_DEVICE, GFP_KERNEL); if (!ctx->pages[i]) goto out_of_memory; - dma_addr = dma_map_page(ohci->card.device, ctx->pages[i], - 0, PAGE_SIZE, DMA_FROM_DEVICE); - if (dma_mapping_error(ohci->card.device, dma_addr)) { - __free_page(ctx->pages[i]); - ctx->pages[i] = NULL; - goto out_of_memory; - } set_page_private(ctx->pages[i], dma_addr); + dma_sync_single_for_device(dev, dma_addr, PAGE_SIZE, + DMA_FROM_DEVICE); }
for (i = 0; i < AR_BUFFERS; i++)
Any comments?
Thomas: this should be identical to the git tree I gave you for mips testing, and you add your tested-by (and reviewd-by tags where applicable)?
Helge: for parisc this should effectively be the same as the first version, but I've dropped the tested-by tags due to the reshuffle, and chance you could retest it?
On Tue, Sep 15, 2020 at 05:51:04PM +0200, Christoph Hellwig wrote:
Hi all,
this series replaced the DMA_ATTR_NON_CONSISTENT flag to dma_alloc_attrs with a separate new dma_alloc_pages API, which is available on all platforms. In addition to cleaning up the convoluted code path, this ensures that other drivers that have asked for better support for non-coherent DMA to pages with incurring bounce buffering over can finally be properly supported.
As a follow up I plan to move the implementation of the DMA_ATTR_NO_KERNEL_MAPPING flag over to this framework as well, given that is also is a fundamentally non coherent allocation. The replacement for that flag would then return a struct page, as it is allowed to actually return pages without a kernel mapping as the name suggested (although most of the time they will actually have a kernel mapping..)
In addition to the conversions of the existing non-coherent DMA users, I've also added a patch to convert the firewire ohci driver to use the new dma_alloc_pages API.
The first patch is queued up for 5.9 in the media tree, but included here for completeness.
A git tree is available here:
git://git.infradead.org/users/hch/misc.git dma_alloc_pages
Gitweb:
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma_alloc_pages
Changes since v2:
- fix up the patch reshuffle which wasn't quite correct
- fix up a few commit messages
Changes since v1:
- rebased on the latests dma-mapping tree, which merged many of the cleanups
- fix an argument passing typo in 53c700, caught by sparse
- rename a few macro arguments in 53c700
- pass the right device to the DMA API in the lib82596 drivers
- fix memory ownershiptransfers in sgiseeq
- better document what a page in the direct kernel mapping means
- split into dma_alloc_pages that returns a struct page and is in the direct mapping vs dma_alloc_noncoherent that can be vmapped
- conver the firewire ohci driver to dma_alloc_pages
Diffstat: _______________________________________________ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
---end quoted text---
participants (4)
-
Christoph Hellwig
-
Robin Murphy
-
Thomas Bogendoerfer
-
Tomasz Figa