[alsa-devel] [PATCH 0/2] cuse: implement mmap/munmap
These patches implement mmap support for CUSE (character device in userspace).
The immediate application for this is to get old games that require OSS mmap support working with OSS Proxy (osspd), written by Tejun. So first of all this is a call for testers: please try it out and report any problems.
Three pieces are needed:
kernel: git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git for-next
fuse: git://fuse.git.sourceforge.net/gitroot/fuse/fuse master
osspd: git://fuse.git.sourceforge.net/gitroot/fuse/osspd master
Comments about the interface and implementation in general (not related to OSS proxy) are very welcome as well.
Thanks, Miklos
---
Miklos Szeredi (1):
      fuse: create fuse_conn_operations

Tejun Heo (1):
      cuse: implement memory mapping

---
 fs/fuse/cuse.c       |  426 +++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/fuse/dev.c        |  155 ++----------------
 fs/fuse/fuse_i.h     |   35 ++++-
 fs/fuse/inode.c      |  167 +++++++++++++++++++-
 include/linux/fuse.h |   25 +++
 5 files changed, 656 insertions(+), 152 deletions(-)
From: Miklos Szeredi <mszeredi@suse.cz>
Create a fuse_conn_operations structure that lets cuse implement its own notify_store and notify_retrieve operations.
The "release" operation is also moved to this structure.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/fuse/cuse.c   |    6 ++-
 fs/fuse/dev.c    |  153 +++----------------------------------------------
 fs/fuse/fuse_i.h |   28 ++++++++-
 fs/fuse/inode.c  |  166 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 202 insertions(+), 151 deletions(-)
diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
index 3426521..53df9fe 100644
--- a/fs/fuse/cuse.c
+++ b/fs/fuse/cuse.c
@@ -461,6 +461,10 @@ static void cuse_fc_release(struct fuse_conn *fc)
 	kfree(cc);
 }
 
+static const struct fuse_conn_operations cuse_ops = {
+	.release = cuse_fc_release,
+};
+
 /**
  * cuse_channel_open - open method for /dev/cuse
  * @inode: inode for /dev/cuse
@@ -489,7 +493,7 @@ static int cuse_channel_open(struct inode *inode, struct file *file)
 	fuse_conn_init(&cc->fc);
 
 	INIT_LIST_HEAD(&cc->list);
-	cc->fc.release = cuse_fc_release;
+	cc->fc.ops = &cuse_ops;
 
 	cc->fc.connected = 1;
 	cc->fc.blocked = 0;
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 5f3368a..f1f5994 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -201,6 +201,7 @@ struct fuse_req *fuse_get_req_nofail(struct fuse_conn *fc, struct file *file)
 	req->waiting = 1;
 	return req;
 }
+EXPORT_SYMBOL_GPL(fuse_get_req_nofail);
 
 void fuse_put_request(struct fuse_conn *fc, struct fuse_req *req)
 {
@@ -463,8 +464,8 @@ void fuse_request_send_background(struct fuse_conn *fc, struct fuse_req *req)
 }
 EXPORT_SYMBOL_GPL(fuse_request_send_background);
 
-static int fuse_request_send_notify_reply(struct fuse_conn *fc,
-					  struct fuse_req *req, u64 unique)
+int fuse_request_send_notify_reply(struct fuse_conn *fc,
+				   struct fuse_req *req, u64 unique)
 {
 	int err = -ENODEV;
@@ -813,8 +814,8 @@ static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page, * Copy a page in the request to/from the userspace buffer. Must be * done atomically */ -static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep, - unsigned offset, unsigned count, int zeroing) +int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep, + unsigned offset, unsigned count, int zeroing) { int err; struct page *page = *pagep; @@ -1445,15 +1446,7 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size, struct fuse_copy_state *cs) { struct fuse_notify_store_out outarg; - struct inode *inode; - struct address_space *mapping; - u64 nodeid; int err; - pgoff_t index; - unsigned int offset; - unsigned int num; - loff_t file_size; - loff_t end;
err = -EINVAL; if (size < sizeof(outarg)) @@ -1467,137 +1460,18 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size, if (size - sizeof(outarg) != outarg.size) goto out_finish;
- nodeid = outarg.nodeid; - - down_read(&fc->killsb); - - err = -ENOENT; - if (!fc->sb) - goto out_up_killsb; - - inode = ilookup5(fc->sb, nodeid, fuse_inode_eq, &nodeid); - if (!inode) - goto out_up_killsb; + err = fc->ops->notify_store(fc, cs, outarg.nodeid, outarg.size, + outarg.offset);
- mapping = inode->i_mapping; - index = outarg.offset >> PAGE_CACHE_SHIFT; - offset = outarg.offset & ~PAGE_CACHE_MASK; - file_size = i_size_read(inode); - end = outarg.offset + outarg.size; - if (end > file_size) { - file_size = end; - fuse_write_update_size(inode, file_size); - } - - num = outarg.size; - while (num) { - struct page *page; - unsigned int this_num; - - err = -ENOMEM; - page = find_or_create_page(mapping, index, - mapping_gfp_mask(mapping)); - if (!page) - goto out_iput; - - this_num = min_t(unsigned, num, PAGE_CACHE_SIZE - offset); - err = fuse_copy_page(cs, &page, offset, this_num, 0); - if (!err && offset == 0 && (num != 0 || file_size == end)) - SetPageUptodate(page); - unlock_page(page); - page_cache_release(page); - - if (err) - goto out_iput; - - num -= this_num; - offset = 0; - index++; - } - - err = 0; - -out_iput: - iput(inode); -out_up_killsb: - up_read(&fc->killsb); out_finish: fuse_copy_finish(cs); return err; }
-static void fuse_retrieve_end(struct fuse_conn *fc, struct fuse_req *req) -{ - release_pages(req->pages, req->num_pages, 0); -} - -static int fuse_retrieve(struct fuse_conn *fc, struct inode *inode, - struct fuse_notify_retrieve_out *outarg) -{ - int err; - struct address_space *mapping = inode->i_mapping; - struct fuse_req *req; - pgoff_t index; - loff_t file_size; - unsigned int num; - unsigned int offset; - size_t total_len = 0; - - req = fuse_get_req(fc); - if (IS_ERR(req)) - return PTR_ERR(req); - - offset = outarg->offset & ~PAGE_CACHE_MASK; - - req->in.h.opcode = FUSE_NOTIFY_REPLY; - req->in.h.nodeid = outarg->nodeid; - req->in.numargs = 2; - req->in.argpages = 1; - req->page_offset = offset; - req->end = fuse_retrieve_end; - - index = outarg->offset >> PAGE_CACHE_SHIFT; - file_size = i_size_read(inode); - num = outarg->size; - if (outarg->offset > file_size) - num = 0; - else if (outarg->offset + num > file_size) - num = file_size - outarg->offset; - - while (num && req->num_pages < FUSE_MAX_PAGES_PER_REQ) { - struct page *page; - unsigned int this_num; - - page = find_get_page(mapping, index); - if (!page) - break; - - this_num = min_t(unsigned, num, PAGE_CACHE_SIZE - offset); - req->pages[req->num_pages] = page; - req->num_pages++; - - num -= this_num; - total_len += this_num; - index++; - } - req->misc.retrieve_in.offset = outarg->offset; - req->misc.retrieve_in.size = total_len; - req->in.args[0].size = sizeof(req->misc.retrieve_in); - req->in.args[0].value = &req->misc.retrieve_in; - req->in.args[1].size = total_len; - - err = fuse_request_send_notify_reply(fc, req, outarg->notify_unique); - if (err) - fuse_retrieve_end(fc, req); - - return err; -} - static int fuse_notify_retrieve(struct fuse_conn *fc, unsigned int size, struct fuse_copy_state *cs) { struct fuse_notify_retrieve_out outarg; - struct inode *inode; int err;
err = -EINVAL; @@ -1610,18 +1484,7 @@ static int fuse_notify_retrieve(struct fuse_conn *fc, unsigned int size,
fuse_copy_finish(cs);
- down_read(&fc->killsb); - err = -ENOENT; - if (fc->sb) { - u64 nodeid = outarg.nodeid; - - inode = ilookup5(fc->sb, nodeid, fuse_inode_eq, &nodeid); - if (inode) { - err = fuse_retrieve(fc, inode, &outarg); - iput(inode); - } - } - up_read(&fc->killsb); + err = fc->ops->notify_retrieve(fc, &outarg);
return err;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index a571584..9542f5b 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -312,6 +312,21 @@ struct fuse_req {
 	struct file *stolen_file;
 };
 
+struct fuse_copy_state;
+
+struct fuse_conn_operations {
+	/** Called on final put */
+	void (*release)(struct fuse_conn *);
+
+	/** Called to store data into a mapping */
+	int (*notify_store)(struct fuse_conn *, struct fuse_copy_state *,
+			    u64 nodeid, u32 size, u64 pos);
+
+	/** Called to retrieve data from a mapping */
+	int (*notify_retrieve)(struct fuse_conn *,
+			       struct fuse_notify_retrieve_out *);
+};
+
 /**
  * A Fuse connection.
  *
@@ -511,14 +526,14 @@ struct fuse_conn {
 	/** Version counter for attribute changes */
 	u64 attr_version;
 
-	/** Called on final put */
-	void (*release)(struct fuse_conn *);
-
 	/** Super block for this connection. */
 	struct super_block *sb;
 
 	/** Read/write semaphore to hold when accessing sb. */
 	struct rw_semaphore killsb;
+
+	/** Operations that fuse and cuse can implement differently */
+	const struct fuse_conn_operations *ops;
 };
 
 static inline struct fuse_conn *get_fuse_conn_super(struct super_block *sb)
@@ -778,4 +793,11 @@ int fuse_dev_release(struct inode *inode, struct file *file);
 
 void fuse_write_update_size(struct inode *inode, loff_t pos);
 
+int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep,
+		   unsigned offset, unsigned count, int zeroing);
+
+int fuse_request_send_notify_reply(struct fuse_conn *fc,
+				   struct fuse_req *req, u64 unique);
+
 #endif /* _FS_FUSE_I_H */
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 3e6d727..4bf887f 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -551,7 +551,7 @@ void fuse_conn_put(struct fuse_conn *fc)
 		if (fc->destroy_req)
 			fuse_request_free(fc->destroy_req);
 		mutex_destroy(&fc->inst_mutex);
-		fc->release(fc);
+		fc->ops->release(fc);
 	}
 }
 EXPORT_SYMBOL_GPL(fuse_conn_put);
@@ -915,6 +915,168 @@ static int fuse_bdi_init(struct fuse_conn *fc, struct super_block *sb)
 	return 0;
 }
+static int fuse_notify_store_to_inode(struct fuse_conn *fc, + struct fuse_copy_state *cs, + u64 nodeid, u32 size, u64 pos) +{ + struct inode *inode; + struct address_space *mapping; + pgoff_t index; + unsigned int off; + loff_t file_size; + loff_t end; + int err; + + down_read(&fc->killsb); + + err = -ENOENT; + if (!fc->sb) + goto out_up_killsb; + + inode = ilookup5(fc->sb, nodeid, fuse_inode_eq, &nodeid); + if (!inode) + goto out_up_killsb; + + mapping = inode->i_mapping; + index = pos >> PAGE_CACHE_SHIFT; + off = pos & ~PAGE_CACHE_MASK; + file_size = i_size_read(inode); + end = pos + size; + if (end > file_size) { + file_size = end; + fuse_write_update_size(inode, file_size); + } + + while (size) { + struct page *page; + unsigned int this_num; + + err = -ENOMEM; + page = find_or_create_page(mapping, index, + mapping_gfp_mask(mapping)); + if (!page) + goto out_iput; + + this_num = min_t(unsigned, size, PAGE_CACHE_SIZE - off); + err = fuse_copy_page(cs, &page, off, this_num, 0); + if (!err && off == 0 && (size != 0 || file_size == end)) + SetPageUptodate(page); + unlock_page(page); + page_cache_release(page); + + if (err) + goto out_iput; + + size -= this_num; + off = 0; + index++; + } + + err = 0; + +out_iput: + iput(inode); +out_up_killsb: + up_read(&fc->killsb); + + return err; +} + +static void fuse_retrieve_end(struct fuse_conn *fc, struct fuse_req *req) +{ + release_pages(req->pages, req->num_pages, 0); +} + +static int fuse_retrieve(struct fuse_conn *fc, struct inode *inode, + struct fuse_notify_retrieve_out *outarg) +{ + int err; + struct address_space *mapping = inode->i_mapping; + struct fuse_req *req; + pgoff_t index; + loff_t file_size; + unsigned int num; + unsigned int offset; + size_t total_len = 0; + + req = fuse_get_req(fc); + if (IS_ERR(req)) + return PTR_ERR(req); + + offset = outarg->offset & ~PAGE_CACHE_MASK; + + req->in.h.opcode = FUSE_NOTIFY_REPLY; + req->in.h.nodeid = outarg->nodeid; + req->in.numargs = 2; + req->in.argpages = 1; + 
req->page_offset = offset; + req->end = fuse_retrieve_end; + + index = outarg->offset >> PAGE_CACHE_SHIFT; + file_size = i_size_read(inode); + num = outarg->size; + if (outarg->offset > file_size) + num = 0; + else if (outarg->offset + num > file_size) + num = file_size - outarg->offset; + + while (num && req->num_pages < FUSE_MAX_PAGES_PER_REQ) { + struct page *page; + unsigned int this_num; + + page = find_get_page(mapping, index); + if (!page) + break; + + this_num = min_t(unsigned, num, PAGE_CACHE_SIZE - offset); + req->pages[req->num_pages] = page; + req->num_pages++; + + num -= this_num; + total_len += this_num; + index++; + } + req->misc.retrieve_in.offset = outarg->offset; + req->misc.retrieve_in.size = total_len; + req->in.args[0].size = sizeof(req->misc.retrieve_in); + req->in.args[0].value = &req->misc.retrieve_in; + req->in.args[1].size = total_len; + + err = fuse_request_send_notify_reply(fc, req, outarg->notify_unique); + if (err) + fuse_retrieve_end(fc, req); + + return err; +} + +static int fuse_notify_retrieve_from_inode(struct fuse_conn *fc, + struct fuse_notify_retrieve_out *outarg) +{ + struct inode *inode; + int err; + + down_read(&fc->killsb); + err = -ENOENT; + if (fc->sb) { + u64 nodeid = outarg->nodeid; + + inode = ilookup5(fc->sb, nodeid, fuse_inode_eq, &nodeid); + if (inode) { + err = fuse_retrieve(fc, inode, outarg); + iput(inode); + } + } + up_read(&fc->killsb); + + return err; +} + +static const struct fuse_conn_operations fuse_default_ops = { + .release = fuse_free_conn, + .notify_store = fuse_notify_store_to_inode, + .notify_retrieve = fuse_notify_retrieve_from_inode, +}; + static int fuse_fill_super(struct super_block *sb, void *data, int silent) { struct fuse_conn *fc; @@ -978,7 +1140,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) fc->dont_mask = 1; sb->s_flags |= MS_POSIXACL;
- fc->release = fuse_free_conn; + fc->ops = &fuse_default_ops; fc->flags = d.flags; fc->user_id = d.user_id; fc->group_id = d.group_id;
From: Tejun Heo <htejun@gmail.com>
This implements memory mapping of char devices.
Unlike memory maps for regular files, this needs to allow more than one mapping to be associated with an open device.
The mapping is identified by a 64-bit map ID. This is used in place of the node ID in the STORE and RETRIEVE notifications.
Original patch by Tejun Heo.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---
 fs/fuse/cuse.c       |  420 +++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/fuse/dev.c        |    2 +
 fs/fuse/fuse_i.h     |    7 +
 fs/fuse/inode.c      |    1 +
 include/linux/fuse.h |   25 +++
 5 files changed, 454 insertions(+), 1 deletions(-)
diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c index 53df9fe..fc75f01 100644 --- a/fs/fuse/cuse.c +++ b/fs/fuse/cuse.c @@ -48,6 +48,8 @@ #include <linux/spinlock.h> #include <linux/stat.h> #include <linux/module.h> +#include <linux/mman.h> +#include <linux/pagemap.h>
#include "fuse_i.h"
@@ -174,6 +176,419 @@ static long cuse_file_compat_ioctl(struct file *file, unsigned int cmd, return fuse_do_ioctl(file, cmd, arg, flags); }
+struct fuse_dmmap_region { + u64 mapid; + u64 size; + pgoff_t nr_pages; + struct page **pages; + struct list_head list; + atomic_t ref; +}; + +/* + * fuse_dmmap_vm represents the result of a single mmap() call, which + * can be shared by multiple client vmas created by forking. + */ +struct fuse_dmmap_vm { + atomic_t open_count; + struct fuse_dmmap_region *region; +}; + +static void fuse_dmmap_region_put(struct fuse_conn *fc, + struct fuse_dmmap_region *fdr) +{ + if (atomic_dec_and_lock(&fdr->ref, &fc->lock)) { + pgoff_t idx; + + list_del(&fdr->list); + spin_unlock(&fc->lock); + + for (idx = 0; idx < fdr->nr_pages; idx++) + if (fdr->pages[idx]) + put_page(fdr->pages[idx]); + + kfree(fdr->pages); + kfree(fdr); + } +} + +static void fuse_dmmap_vm_open(struct vm_area_struct *vma) +{ + struct fuse_dmmap_vm *fdvm = vma->vm_private_data; + + /* vma copied */ + atomic_inc(&fdvm->open_count); +} + +static void fuse_dmmap_vm_close(struct vm_area_struct *vma) +{ + struct fuse_dmmap_vm *fdvm = vma->vm_private_data; + struct fuse_file *ff = vma->vm_file->private_data; + struct fuse_conn *fc = ff->fc; + struct fuse_req *req; + struct fuse_munmap_in *inarg; + + if (!atomic_dec_and_test(&fdvm->open_count)) + return; + /* + * Notify server that the mmap region has been unmapped. + * Failing this might lead to resource leak in server, don't + * fail. 
+ */ + req = fuse_get_req_nofail(fc, vma->vm_file); + inarg = &req->misc.munmap_in; + + inarg->fh = ff->fh; + inarg->mapid = fdvm->region->mapid; + inarg->size = fdvm->region->size; + + req->in.h.opcode = FUSE_MUNMAP; + req->in.h.nodeid = ff->nodeid; + req->in.numargs = 1; + req->in.args[0].size = sizeof(*inarg); + req->in.args[0].value = inarg; + + fuse_request_send(fc, req); + fuse_dmmap_region_put(fc, fdvm->region); + kfree(fdvm); +} + +static struct page *fuse_dmmap_find_or_create_page(struct fuse_conn *fc, + struct fuse_dmmap_region *fdr, + pgoff_t index) +{ + struct page *new_page = NULL; + struct page *page; + + BUG_ON(index >= fdr->nr_pages); + + spin_lock(&fc->lock); + page = fdr->pages[index]; + if (!page) { + spin_unlock(&fc->lock); + /* need to allocate and install a new page */ + new_page = alloc_page(GFP_HIGHUSER | __GFP_ZERO); + if (!new_page) + return NULL; + + /* try to install, check whether someone else already did it */ + spin_lock(&fc->lock); + page = fdr->pages[index]; + if (!page) { + page = fdr->pages[index] = new_page; + new_page = NULL; + } + } + get_page(page); + spin_unlock(&fc->lock); + + if (new_page) + put_page(new_page); + + return page; +} + +static int fuse_dmmap_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf) +{ + struct fuse_dmmap_vm *fdvm = vma->vm_private_data; + struct fuse_dmmap_region *fdr = fdvm->region; + struct fuse_file *ff = vma->vm_file->private_data; + struct fuse_conn *fc = ff->fc; + + if (vmf->pgoff >= fdr->nr_pages) + return VM_FAULT_SIGBUS; + + vmf->page = fuse_dmmap_find_or_create_page(fc, fdr, vmf->pgoff); + if (!vmf->page) + return VM_FAULT_OOM; + + return 0; +} + +static const struct vm_operations_struct fuse_dmmap_vm_ops = { + .open = fuse_dmmap_vm_open, + .close = fuse_dmmap_vm_close, + .fault = fuse_dmmap_vm_fault, +}; + +static struct fuse_dmmap_region *fuse_dmmap_find_locked(struct fuse_conn *fc, + u64 mapid) +{ + struct fuse_dmmap_region *curr; + struct fuse_dmmap_region *fdr = NULL; + + 
list_for_each_entry(curr, &fc->dmmap_list, list) { + if (curr->mapid == mapid) { + fdr = curr; + atomic_inc(&fdr->ref); + break; + } + } + + return fdr; +} + +static struct fuse_dmmap_region *fuse_dmmap_find(struct fuse_conn *fc, + u64 mapid) +{ + struct fuse_dmmap_region *fdr; + + spin_lock(&fc->lock); + fdr = fuse_dmmap_find_locked(fc, mapid); + spin_unlock(&fc->lock); + + return fdr; +} + +static struct fuse_dmmap_region *fuse_dmmap_get(struct fuse_conn *fc, + u64 mapid, u64 size) +{ + struct fuse_dmmap_region *fdr; + pgoff_t nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; + + if ((loff_t) (nr_pages << PAGE_SHIFT) < size) + return ERR_PTR(-EIO); + + fdr = fuse_dmmap_find(fc, mapid); + if (fdr) { + if (fdr->size != size) { + fuse_dmmap_region_put(fc, fdr); + return ERR_PTR(-EIO); + } + } else { + struct fuse_dmmap_region *tmp; + + fdr = kzalloc(sizeof(struct fuse_dmmap_region), GFP_KERNEL); + if (!fdr) + return ERR_PTR(-ENOMEM); + + atomic_set(&fdr->ref, 1); + fdr->mapid = mapid; + fdr->size = size; + fdr->nr_pages = nr_pages; + + fdr->pages = kzalloc(sizeof(struct page *) * nr_pages, + GFP_KERNEL); + if (!fdr->pages) { + kfree(fdr); + return ERR_PTR(-ENOMEM); + } + + spin_lock(&fc->lock); + tmp = fuse_dmmap_find_locked(fc, mapid); + if (tmp) { + kfree(fdr->pages); + kfree(fdr); + fdr = tmp; + } else { + list_add(&fdr->list, &fc->dmmap_list); + } + spin_unlock(&fc->lock); + } + + return fdr; +} + +static int cuse_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct fuse_file *ff = file->private_data; + struct fuse_conn *fc = ff->fc; + struct fuse_dmmap_vm *fdvm; + struct fuse_dmmap_region *fdr; + struct fuse_req *req = NULL; + struct fuse_mmap_in inarg; + struct fuse_mmap_out outarg; + int err; + + if (fc->no_dmmap) + return -ENOSYS; + + req = fuse_get_req(fc); + if (IS_ERR(req)) + return PTR_ERR(req); + + /* ask server whether this mmap is okay and what the offset should be */ + memset(&inarg, 0, sizeof(inarg)); + inarg.fh = ff->fh; + inarg.addr = 
vma->vm_start; + inarg.len = vma->vm_end - vma->vm_start; + inarg.prot = ((vma->vm_flags & VM_READ) ? PROT_READ : 0) | + ((vma->vm_flags & VM_WRITE) ? PROT_WRITE : 0) | + ((vma->vm_flags & VM_EXEC) ? PROT_EXEC : 0); + inarg.flags = ((vma->vm_flags & VM_GROWSDOWN) ? MAP_GROWSDOWN : 0) | + ((vma->vm_flags & VM_DENYWRITE) ? MAP_DENYWRITE : 0) | + ((vma->vm_flags & VM_EXECUTABLE) ? MAP_EXECUTABLE : 0) | + ((vma->vm_flags & VM_LOCKED) ? MAP_LOCKED : 0); + inarg.offset = (loff_t)vma->vm_pgoff << PAGE_SHIFT; + + req->in.h.opcode = FUSE_MMAP; + req->in.h.nodeid = ff->nodeid; + req->in.numargs = 1; + req->in.args[0].size = sizeof(inarg); + req->in.args[0].value = &inarg; + req->out.numargs = 1; + req->out.args[0].size = sizeof(outarg); + req->out.args[0].value = &outarg; + + fuse_request_send(fc, req); + err = req->out.h.error; + if (err) { + if (err == -ENOSYS) + fc->no_dmmap = 1; + goto free_req; + } + + fdr = fuse_dmmap_get(fc, outarg.mapid, outarg.size); + err = PTR_ERR(fdr); + if (IS_ERR(fdr)) + goto free_req; + + err = -ENOMEM; + fdvm = kzalloc(sizeof(*fdvm), GFP_KERNEL); + if (!fdvm) { + fuse_dmmap_region_put(fc, fdr); + goto free_req; + } + atomic_set(&fdvm->open_count, 1); + fdvm->region = fdr; + + vma->vm_ops = &fuse_dmmap_vm_ops; + vma->vm_private_data = fdvm; + vma->vm_flags |= VM_DONTEXPAND; /* disallow expansion for now */ + err = 0; + +free_req: + fuse_put_request(fc, req); + return err; +} + +static int fuse_notify_store_to_dmmap(struct fuse_conn *fc, + struct fuse_copy_state *cs, + u64 nodeid, u32 size, u64 pos) +{ + struct fuse_dmmap_region *fdr; + pgoff_t index; + unsigned int off; + int err; + + fdr = fuse_dmmap_find(fc, nodeid); + if (!fdr) + return -ENOENT; + + index = pos >> PAGE_SHIFT; + off = pos & ~PAGE_MASK; + if (pos > fdr->size) + size = 0; + else if (size > fdr->size - pos) + size = fdr->size - pos; + + while (size) { + struct page *page; + unsigned int this_num; + + err = -ENOMEM; + page = fuse_dmmap_find_or_create_page(fc, fdr, index); + if 
(!page) + goto out_iput; + + this_num = min_t(unsigned, size, PAGE_SIZE - off); + err = fuse_copy_page(cs, &page, off, this_num, 0); + put_page(page); + + if (err) + goto out_iput; + + size -= this_num; + off = 0; + index++; + } + + err = 0; + +out_iput: + fuse_dmmap_region_put(fc, fdr); + + return err; +} + +static void fuse_retrieve_dmmap_end(struct fuse_conn *fc, struct fuse_req *req) +{ + release_pages(req->pages, req->num_pages, 0); +} + +static int fuse_notify_retrieve_from_dmmap(struct fuse_conn *fc, + struct fuse_notify_retrieve_out *outarg) +{ + struct fuse_dmmap_region *fdr; + struct fuse_req *req; + pgoff_t index; + unsigned int num; + unsigned int offset; + size_t total_len = 0; + int err; + + fdr = fuse_dmmap_find(fc, outarg->nodeid); + if (!fdr) + return -ENOENT; + + req = fuse_get_req(fc); + err = PTR_ERR(req); + if (IS_ERR(req)) + goto out_put_region; + + offset = outarg->offset & ~PAGE_MASK; + + req->in.h.opcode = FUSE_NOTIFY_REPLY; + req->in.h.nodeid = outarg->nodeid; + req->in.numargs = 2; + req->in.argpages = 1; + req->page_offset = offset; + req->end = fuse_retrieve_dmmap_end; + + index = outarg->offset >> PAGE_SHIFT; + num = outarg->size; + if (outarg->offset > fdr->size) + num = 0; + else if (outarg->offset + num > fdr->size) + num = fdr->size - outarg->offset; + + while (num && req->num_pages < FUSE_MAX_PAGES_PER_REQ) { + struct page *page; + unsigned int this_num; + + BUG_ON(index >= fdr->nr_pages); + spin_lock(&fc->lock); + page = fdr->pages[index]; + if (!page) + page = ZERO_PAGE(0); + get_page(page); + spin_unlock(&fc->lock); + + this_num = min_t(unsigned, num, PAGE_SIZE - offset); + req->pages[req->num_pages] = page; + req->num_pages++; + + num -= this_num; + total_len += this_num; + index++; + } + req->misc.retrieve_in.offset = outarg->offset; + req->misc.retrieve_in.size = total_len; + req->in.args[0].size = sizeof(req->misc.retrieve_in); + req->in.args[0].value = &req->misc.retrieve_in; + req->in.args[1].size = total_len; + + err = 
fuse_request_send_notify_reply(fc, req, outarg->notify_unique); + if (err) + fuse_retrieve_dmmap_end(fc, req); + +out_put_region: + fuse_dmmap_region_put(fc, fdr); + + return err; +} + + static const struct file_operations cuse_frontend_fops = { .owner = THIS_MODULE, .read = cuse_read, @@ -183,7 +598,8 @@ static const struct file_operations cuse_frontend_fops = { .unlocked_ioctl = cuse_file_ioctl, .compat_ioctl = cuse_file_compat_ioctl, .poll = fuse_file_poll, - .llseek = noop_llseek, + .llseek = noop_llseek, + .mmap = cuse_mmap, };
@@ -463,6 +879,8 @@ static void cuse_fc_release(struct fuse_conn *fc)
 
 static const struct fuse_conn_operations cuse_ops = {
 	.release = cuse_fc_release,
+	.notify_store = fuse_notify_store_to_dmmap,
+	.notify_retrieve = fuse_notify_retrieve_from_dmmap,
 };
 
 /**
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index f1f5994..e1b7a06 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -480,6 +480,7 @@ int fuse_request_send_notify_reply(struct fuse_conn *fc,
 
 	return err;
 }
+EXPORT_SYMBOL_GPL(fuse_request_send_notify_reply);
 
 /*
  * Called under fc->lock
@@ -850,6 +851,7 @@ int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep,
 	flush_dcache_page(page);
 	return 0;
 }
+EXPORT_SYMBOL_GPL(fuse_copy_page);
 /* Copy pages in the request to/from userspace buffer */
 static int fuse_copy_pages(struct fuse_copy_state *cs, unsigned nbytes,
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 9542f5b..c878fa9 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -285,6 +285,7 @@ struct fuse_req {
 		} write;
 		struct fuse_notify_retrieve_in retrieve_in;
 		struct fuse_lk_in lk_in;
+		struct fuse_munmap_in munmap_in;
 	} misc;
 
 	/** page vector */
@@ -484,6 +485,9 @@ struct fuse_conn {
 	/** Is poll not implemented by fs? */
 	unsigned no_poll:1;
 
+	/** Is direct mmap not implemented by fs? */
+	unsigned no_dmmap:1;
+
 	/** Do multi-page cached writes */
 	unsigned big_writes:1;
 
@@ -532,6 +536,9 @@ struct fuse_conn {
 	/** Read/write semaphore to hold when accessing sb. */
 	struct rw_semaphore killsb;
 
+	/** List of direct mmaps (currently CUSE only) */
+	struct list_head dmmap_list;
+
 	/** Operations that fuse and cuse can implement differently */
 	const struct fuse_conn_operations *ops;
 };
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 4bf887f..7ffb64a 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -542,6 +542,7 @@ void fuse_conn_init(struct fuse_conn *fc)
 	fc->blocked = 1;
 	fc->attr_version = 1;
 	get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key));
+	INIT_LIST_HEAD(&fc->dmmap_list);
 }
 EXPORT_SYMBOL_GPL(fuse_conn_init);
diff --git a/include/linux/fuse.h b/include/linux/fuse.h
index 8ba2c94..bc18853 100644
--- a/include/linux/fuse.h
+++ b/include/linux/fuse.h
@@ -54,6 +54,7 @@
  * 7.18
  *  - add FUSE_IOCTL_DIR flag
  *  - add FUSE_NOTIFY_DELETE
+ *  - add FUSE_MMAP and FUSE_MUNMAP
  */
 
 #ifndef _LINUX_FUSE_H
@@ -278,6 +279,8 @@ enum fuse_opcode {
 	FUSE_POLL          = 40,
 	FUSE_NOTIFY_REPLY  = 41,
 	FUSE_BATCH_FORGET  = 42,
+	FUSE_MMAP          = 43,
+	FUSE_MUNMAP        = 44,
 
 	/* CUSE specific operations */
 	CUSE_INIT          = 4096,
@@ -571,6 +574,28 @@ struct fuse_notify_poll_wakeup_out {
 	__u64	kh;
 };
 
+struct fuse_mmap_in {
+	__u64	fh;
+	__u64	addr;
+	__u64	len;
+	__u32	prot;
+	__u32	flags;
+	__u64	offset;
+};
+
+struct fuse_mmap_out {
+	__u64	mapid;		/* Mmap ID, same namespace as Inode ID */
+	__u64	size;		/* Size of memory region */
+	__u64	reserved;
+};
+
+struct fuse_munmap_in {
+	__u64	fh;
+	__u64	mapid;
+	__u64	size;		/* Size of memory region */
+	__u64	reserved;
+};
+
 struct fuse_in_header {
 	__u32	len;
 	__u32	opcode;
On Fri, Jan 13, 2012 at 9:06 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> From: Tejun Heo <htejun@gmail.com>
>
> This implements memory mapping of char devices.
I don't think this is how you want to do it.
It seems to maintain a page list of its own, and do the magic page fault etc behavior. Which to me smells like a really bad design.
I would expect that what you actually want to do is to expose it as a shared mmap, and depend on all the normal shmem support. Is there any reason not to do that?
I guess you don't generally have big mappings, so an argument like "that way you can page out pages etc" may not strike you as a very strong argument, but I'd still prefer to at least see that approach explored. Hmm?
Linus
Hello, Linus.
On Fri, Jan 13, 2012 at 10:19:50AM -0800, Linus Torvalds wrote:
> On Fri, Jan 13, 2012 at 9:06 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> > From: Tejun Heo <htejun@gmail.com>
> >
> > This implements memory mapping of char devices.
>
> I don't think this is how you want to do it.
>
> It seems to maintain a page list of its own, and do the magic page
> fault etc behavior. Which to me smells like a really bad design.
>
> I would expect that what you actually want to do is to expose it as a
> shared mmap, and depend on all the normal shmem support. Is there any
> reason not to do that?
>
> I guess you don't generally have big mappings, so an argument like
> "that way you can page out pages etc" may not strike you as a very
> strong argument, but I'd still prefer to at least see that approach
> explored. Hmm?
The patch is years old and the original implementation had different requirements (it mapped the same pages in the client's and server's address spaces instead of using notifications). I don't really remember the details but I tried to use shmem pretty hard but couldn't. I *think* it was about having to accept random mapping offset. ISTR shmem implementation wasn't too happy with randomish high mapping offset and I couldn't shift the mapping offset due to the direct mapping requirement (again, memory is quite fuzzy).
With a notification based implementation, it might be possible to just shift the mapping offset and use shmem. Miklos, what do you think?
Thanks.
participants (3):
  - Linus Torvalds
  - Miklos Szeredi
  - Tejun Heo