Merge branch 'vfs.all' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git

Stephen Rothwell 2024-08-29 08:14:19 +10:00
commit e7034c173d
211 changed files with 3891 additions and 1937 deletions

View File

@ -4788,6 +4788,16 @@
printk.time= Show timing data prefixed to each printk message line
Format: <bool> (1/Y/y=enable, 0/N/n=disable)
proc_mem.force_override= [KNL]
Format: {always | ptrace | never}
Traditionally /proc/pid/mem allows memory permissions to be
overridden without restrictions. This option may be set to
restrict that. Can be one of:
- 'always': traditional behavior always allows mem overrides.
- 'ptrace': only allow mem overrides for active ptracers.
- 'never': never allow mem overrides.
If not specified, default is the CONFIG_PROC_MEM_* choice.
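For example, to limit mem overrides to active ptracers, boot with:
proc_mem.force_override=ptrace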
processor.max_cstate= [HW,ACPI]
Limit processor to maximum C-state
max_cstate=9 overrides any DMI blacklist limit.

View File

@ -821,7 +821,7 @@ the same idmapping to the mount. We now perform three steps:
/* Map the userspace id down into a kernel id in the filesystem's idmapping. */
make_kuid(u0:k20000:r10000, u1000) = k21000
2. Verify that the caller's kernel ids can be mapped to userspace ids in the
3. Verify that the caller's kernel ids can be mapped to userspace ids in the
filesystem's idmapping::
from_kuid(u0:k20000:r10000, k21000) = u1000
@ -854,10 +854,10 @@ The same translation algorithm works with the third example.
/* Map the userspace id down into a kernel id in the filesystem's idmapping. */
make_kuid(u0:k0:r4294967295, u1000) = k1000
2. Verify that the caller's kernel ids can be mapped to userspace ids in the
3. Verify that the caller's kernel ids can be mapped to userspace ids in the
filesystem's idmapping::
from_kuid(u0:k0:r4294967295, k21000) = u1000
from_kuid(u0:k0:r4294967295, k1000) = u1000
So the ownership that lands on disk will be ``u1000``.
@ -994,7 +994,7 @@ from above:::
/* Map the userspace id down into a kernel id in the filesystem's idmapping. */
make_kuid(u0:k0:r4294967295, u1000) = k1000
2. Verify that the caller's filesystem ids can be mapped to userspace ids in the
3. Verify that the caller's filesystem ids can be mapped to userspace ids in the
filesystem's idmapping::
from_kuid(u0:k0:r4294967295, k1000) = u1000

View File

@ -29,6 +29,7 @@ algorithms work.
fiemap
files
locks
multigrain-ts
mount_api
quota
seq_file

View File

@ -142,9 +142,9 @@ Definitions
* **pure overwrite**: A write operation that does not require any
metadata or zeroing operations to perform during either submission
or completion.
This implies that the fileystem must have already allocated space
This implies that the filesystem must have already allocated space
on disk as ``IOMAP_MAPPED`` and the filesystem must not place any
constaints on IO alignment or size.
constraints on IO alignment or size.
The only constraints on I/O alignment are device level (minimum I/O
size and alignment, typically sector size).
@ -394,7 +394,7 @@ iomap is concerned:
* The **upper** level primitive is provided by the filesystem to
coordinate access to different iomap operations.
The exact primitive is specifc to the filesystem and operation,
The exact primitive is specific to the filesystem and operation,
but is often a VFS inode, pagecache invalidation, or folio lock.
For example, a filesystem might take ``i_rwsem`` before calling
``iomap_file_buffered_write`` and ``iomap_file_unshare`` to prevent
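A sketch of that upper-level locking, using hypothetical ``myfs`` names
(note that ``iomap_file_buffered_write`` takes a fourth argument in this
series, passed as NULL here)::

    static ssize_t myfs_buffered_write(struct kiocb *iocb, struct iov_iter *from)
    {
    	struct inode *inode = file_inode(iocb->ki_filp);
    	ssize_t ret;

    	/* i_rwsem is the upper-level primitive guarding this operation */
    	inode_lock(inode);
    	ret = iomap_file_buffered_write(iocb, from, &myfs_iomap_ops, NULL);
    	inode_unlock(inode);
    	return ret;
    }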

View File

@ -251,10 +251,10 @@ prototypes::
void (*readahead)(struct readahead_control *);
int (*write_begin)(struct file *, struct address_space *mapping,
loff_t pos, unsigned len,
struct page **pagep, void **fsdata);
struct folio **foliop, void **fsdata);
int (*write_end)(struct file *, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
struct folio *folio, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
void (*invalidate_folio) (struct folio *, size_t start, size_t len);
bool (*release_folio)(struct folio *, gfp_t);
@ -280,7 +280,7 @@ read_folio: yes, unlocks shared
writepages:
dirty_folio: maybe
readahead: yes, unlocks shared
write_begin: locks the page exclusive
write_begin: locks the folio exclusive
write_end: yes, unlocks exclusive
bmap:
invalidate_folio: yes exclusive

View File

@ -0,0 +1,121 @@
.. SPDX-License-Identifier: GPL-2.0
=====================
Multigrain Timestamps
=====================
Introduction
============
Historically, the kernel has always used coarse time values to stamp inodes.
This value is updated every jiffy, so any change that happens within that jiffy
will end up with the same timestamp.
When the kernel goes to stamp an inode (due to a read or write), it first gets
the current time and then compares it to the existing timestamp(s) to see
whether anything will change. If nothing changed, then it can avoid updating
the inode's metadata.
Coarse timestamps are therefore good from a performance standpoint, since they
reduce the need for metadata updates, but bad from the standpoint of
determining whether anything has changed, since a lot of things can happen in a
jiffy.
They are particularly troublesome with NFSv3, where unchanging timestamps can
make it difficult to tell whether to invalidate caches. NFSv4 provides a
dedicated change attribute that should always show a visible change, but not
all filesystems implement this properly, causing the NFS server to substitute
the ctime in many cases.
Multigrain timestamps aim to remedy this by selectively using fine-grained
timestamps when a file has had its timestamps queried recently, and the current
coarse-grained time does not cause a change.
Inode Timestamps
================
There are currently three timestamps in the inode that are updated to the
current wallclock time in response to different activity:
ctime:
The inode change time. This is stamped with the current time whenever
the inode's metadata is changed. Note that this value is not settable
from userland.
mtime:
The inode modification time. This is stamped with the current time
any time a file's contents change.
atime:
The inode access time. This is stamped whenever an inode's contents are
read. Widely considered to be a terrible mistake. Usually avoided with
options like noatime or relatime.
Updating the mtime always implies a change to the ctime, but updating the
atime due to a read request does not.
Multigrain timestamps are only tracked for the ctime and the mtime. atimes are
not affected and always use the coarse-grained value (subject to the floor).
Inode Timestamp Ordering
========================
In addition to just providing info about changes to individual files, file
timestamps also serve an important purpose in applications like "make". These
programs measure timestamps in order to determine whether source files might be
newer than cached objects.
Userland applications like make can only determine ordering based on
operational boundaries. For a syscall those are the syscall entry and exit
points. For io_uring or nfsd operations, that's the request submission and
response. In the case of concurrent operations, userland can make no
determination about the order in which things will occur.
For instance, if a single thread modifies one file, and then another file in
sequence, the second file must show an equal or later mtime than the first. The
same is true if two threads are issuing similar operations that do not overlap
in time.
If however, two threads have racing syscalls that overlap in time, then there
is no such guarantee, and the second file may appear to have been modified
before, after or at the same time as the first, regardless of which one was
submitted first.
Multigrain Timestamp Implementation
===================================
Multigrain timestamps are aimed at ensuring that changes to a single file are
always recognizable, without violating the ordering guarantees when multiple
different files are modified. This affects the mtime and the ctime, but the
atime will always use coarse-grained timestamps.
It uses an unused bit in the i_ctime_nsec field to indicate whether the mtime
or ctime has been queried. If either or both have, then the kernel takes
special care to ensure the next timestamp update will display a visible change.
This ensures tight cache coherency for use-cases like NFS, without sacrificing
the benefits of reduced metadata updates when files aren't being watched.
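As a rough sketch, assuming the queried flag is a spare nanoseconds bit
(called ``I_CTIME_QUERIED`` here), the stamping path behaves like::

    struct timespec64 now, ctime = inode_get_ctime(inode);

    ktime_get_coarse_real_ts64(&now);
    if ((inode->i_ctime_nsec & I_CTIME_QUERIED) &&
        timespec64_equal(&now, &ctime)) {
    	/* the old value has been seen; force a visible change */
    	ktime_get_real_ts64(&now);
    }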
The Ctime Floor Value
=====================
It's not sufficient to simply use fine or coarse-grained timestamps based on
whether the mtime or ctime has been queried. A file could get a fine-grained
timestamp, and then a second file modified later could get a coarse-grained one
that appears earlier than the first, which would break the kernel's timestamp
ordering guarantees.
To mitigate this problem, we maintain a global floor value that ensures that
this can't happen. The two files in the above example may appear to have been
modified at the same time in such a case, but they will never show the reverse
order. To avoid problems with realtime clock jumps, the floor is managed as a
monotonic ktime_t, and the values are converted to realtime clock values as
needed.
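In sketch form, with illustrative names (the real logic lives in the
timekeeping core)::

    static atomic64_t ctime_floor;	/* monotonic ktime_t, in ns */

    ktime_t old = atomic64_read(&ctime_floor);
    ktime_t fine = ktime_get();		/* fine-grained monotonic time */

    /* advance the floor; losing the race to another updater is fine */
    atomic64_try_cmpxchg(&ctime_floor, &old, fine);

Coarse-grained stamps are then clamped so that, after conversion back to
realtime, they never fall below the floor.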
Implementation Notes
====================
Multigrain timestamps are intended for use by local filesystems that get
ctime values from the local clock. This is in contrast to network filesystems
and the like that just mirror timestamp values from a server.
For most filesystems, it's sufficient to just set the FS_MGTIME flag in the
fstype->fs_flags in order to opt in, provided the ctime is only ever set via
inode_set_ctime_current(). If the filesystem has a ->getattr routine that
doesn't call generic_fillattr, then you should have it call fill_mg_cmtime to
fill those values. For setattr, it should use setattr_copy() to update the
timestamps, or otherwise mimic its behavior.
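For a hypothetical filesystem type, the opt-in mirrors the btrfs change
later in this series::

    static struct file_system_type myfs_fs_type = {
    	.name		= "myfs",
    	.init_fs_context = myfs_init_fs_context,
    	.kill_sb	= kill_block_super,
    	.fs_flags	= FS_REQUIRES_DEV | FS_MGTIME,
    };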

View File

@ -810,7 +810,7 @@ cache in your filesystem. The following members are defined:
struct page **pagep, void **fsdata);
int (*write_end)(struct file *, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata);
struct folio *folio, void *fsdata);
sector_t (*bmap)(struct address_space *, sector_t);
void (*invalidate_folio) (struct folio *, size_t start, size_t len);
bool (*release_folio)(struct folio *, gfp_t);
@ -926,12 +926,12 @@ cache in your filesystem. The following members are defined:
(if they haven't been read already) so that the updated blocks
can be written out properly.
The filesystem must return the locked pagecache page for the
specified offset, in ``*pagep``, for the caller to write into.
The filesystem must return the locked pagecache folio for the
specified offset, in ``*foliop``, for the caller to write into.
It must be able to cope with short writes (where the length
passed to write_begin is greater than the number of bytes copied
into the page).
into the folio).
A void * may be returned in fsdata, which then gets passed into
write_end.
@ -944,8 +944,8 @@ cache in your filesystem. The following members are defined:
called. len is the original len passed to write_begin, and
copied is the amount that was able to be copied.
The filesystem must take care of unlocking the page and
releasing it refcount, and updating i_size.
The filesystem must take care of unlocking the folio,
decrementing its refcount, and updating i_size.
Returns < 0 on failure, otherwise the number of bytes (<=
'copied') that were able to be copied into pagecache.
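With this conversion, a simple filesystem built on the buffer-head
helpers might implement the pair as follows (a sketch with hypothetical
``myfs`` names; the generic helpers now take and return folios)::

    static int myfs_write_begin(struct file *file, struct address_space *mapping,
    			loff_t pos, unsigned len,
    			struct folio **foliop, void **fsdata)
    {
    	return block_write_begin(mapping, pos, len, foliop, myfs_get_block);
    }

    static int myfs_write_end(struct file *file, struct address_space *mapping,
    			loff_t pos, unsigned len, unsigned copied,
    			struct folio *folio, void *fsdata)
    {
    	/* unlocks the folio, drops its reference and updates i_size */
    	return generic_write_end(file, mapping, pos, len, copied, folio, fsdata);
    }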

View File

@ -8599,6 +8599,7 @@ M: Christian Brauner <brauner@kernel.org>
R: Jan Kara <jack@suse.cz>
L: linux-fsdevel@vger.kernel.org
S: Maintained
T: git https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
F: fs/*
F: include/linux/fs.h
F: include/linux/fs_types.h

View File

@ -502,3 +502,7 @@
570 common lsm_set_self_attr sys_lsm_set_self_attr
571 common lsm_list_modules sys_lsm_list_modules
572 common mseal sys_mseal
573 common setxattrat sys_setxattrat
574 common getxattrat sys_getxattrat
575 common listxattrat sys_listxattrat
576 common removexattrat sys_removexattrat

View File

@ -477,3 +477,7 @@
460 common lsm_set_self_attr sys_lsm_set_self_attr
461 common lsm_list_modules sys_lsm_list_modules
462 common mseal sys_mseal
463 common setxattrat sys_setxattrat
464 common getxattrat sys_getxattrat
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat

View File

@ -462,3 +462,7 @@
460 common lsm_set_self_attr sys_lsm_set_self_attr
461 common lsm_list_modules sys_lsm_list_modules
462 common mseal sys_mseal
463 common setxattrat sys_setxattrat
464 common getxattrat sys_getxattrat
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat

View File

@ -468,3 +468,7 @@
460 common lsm_set_self_attr sys_lsm_set_self_attr
461 common lsm_list_modules sys_lsm_list_modules
462 common mseal sys_mseal
463 common setxattrat sys_setxattrat
464 common getxattrat sys_getxattrat
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat

View File

@ -401,3 +401,7 @@
460 n32 lsm_set_self_attr sys_lsm_set_self_attr
461 n32 lsm_list_modules sys_lsm_list_modules
462 n32 mseal sys_mseal
463 n32 setxattrat sys_setxattrat
464 n32 getxattrat sys_getxattrat
465 n32 listxattrat sys_listxattrat
466 n32 removexattrat sys_removexattrat

View File

@ -377,3 +377,7 @@
460 n64 lsm_set_self_attr sys_lsm_set_self_attr
461 n64 lsm_list_modules sys_lsm_list_modules
462 n64 mseal sys_mseal
463 n64 setxattrat sys_setxattrat
464 n64 getxattrat sys_getxattrat
465 n64 listxattrat sys_listxattrat
466 n64 removexattrat sys_removexattrat

View File

@ -450,3 +450,7 @@
460 o32 lsm_set_self_attr sys_lsm_set_self_attr
461 o32 lsm_list_modules sys_lsm_list_modules
462 o32 mseal sys_mseal
463 o32 setxattrat sys_setxattrat
464 o32 getxattrat sys_getxattrat
465 o32 listxattrat sys_listxattrat
466 o32 removexattrat sys_removexattrat

View File

@ -461,3 +461,7 @@
460 common lsm_set_self_attr sys_lsm_set_self_attr
461 common lsm_list_modules sys_lsm_list_modules
462 common mseal sys_mseal
463 common setxattrat sys_setxattrat
464 common getxattrat sys_getxattrat
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat

View File

@ -553,3 +553,7 @@
460 common lsm_set_self_attr sys_lsm_set_self_attr
461 common lsm_list_modules sys_lsm_list_modules
462 common mseal sys_mseal
463 common setxattrat sys_setxattrat
464 common getxattrat sys_getxattrat
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat

View File

@ -465,3 +465,7 @@
460 common lsm_set_self_attr sys_lsm_set_self_attr sys_lsm_set_self_attr
461 common lsm_list_modules sys_lsm_list_modules sys_lsm_list_modules
462 common mseal sys_mseal sys_mseal
463 common setxattrat sys_setxattrat sys_setxattrat
464 common getxattrat sys_getxattrat sys_getxattrat
465 common listxattrat sys_listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat sys_removexattrat

View File

@ -466,3 +466,7 @@
460 common lsm_set_self_attr sys_lsm_set_self_attr
461 common lsm_list_modules sys_lsm_list_modules
462 common mseal sys_mseal
463 common setxattrat sys_setxattrat
464 common getxattrat sys_getxattrat
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat

View File

@ -508,3 +508,7 @@
460 common lsm_set_self_attr sys_lsm_set_self_attr
461 common lsm_list_modules sys_lsm_list_modules
462 common mseal sys_mseal
463 common setxattrat sys_setxattrat
464 common getxattrat sys_getxattrat
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat

View File

@ -468,3 +468,7 @@
460 i386 lsm_set_self_attr sys_lsm_set_self_attr
461 i386 lsm_list_modules sys_lsm_list_modules
462 i386 mseal sys_mseal
463 i386 setxattrat sys_setxattrat
464 i386 getxattrat sys_getxattrat
465 i386 listxattrat sys_listxattrat
466 i386 removexattrat sys_removexattrat

View File

@ -386,6 +386,10 @@
460 common lsm_set_self_attr sys_lsm_set_self_attr
461 common lsm_list_modules sys_lsm_list_modules
462 common mseal sys_mseal
463 common setxattrat sys_setxattrat
464 common getxattrat sys_getxattrat
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat
#
# Due to a historical design error, certain syscalls are numbered differently

View File

@ -433,3 +433,7 @@
460 common lsm_set_self_attr sys_lsm_set_self_attr
461 common lsm_list_modules sys_lsm_list_modules
462 common mseal sys_mseal
463 common setxattrat sys_setxattrat
464 common getxattrat sys_getxattrat
465 common listxattrat sys_listxattrat
466 common removexattrat sys_removexattrat

View File

@ -451,20 +451,20 @@ static void blkdev_readahead(struct readahead_control *rac)
}
static int blkdev_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, struct page **pagep, void **fsdata)
loff_t pos, unsigned len, struct folio **foliop, void **fsdata)
{
return block_write_begin(mapping, pos, len, pagep, blkdev_get_block);
return block_write_begin(mapping, pos, len, foliop, blkdev_get_block);
}
static int blkdev_write_end(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied, struct page *page,
loff_t pos, unsigned len, unsigned copied, struct folio *folio,
void *fsdata)
{
int ret;
ret = block_write_end(file, mapping, pos, len, copied, page, fsdata);
ret = block_write_end(file, mapping, pos, len, copied, folio, fsdata);
unlock_page(page);
put_page(page);
folio_unlock(folio);
folio_put(folio);
return ret;
}
@ -665,7 +665,7 @@ blkdev_direct_write(struct kiocb *iocb, struct iov_iter *from)
static ssize_t blkdev_buffered_write(struct kiocb *iocb, struct iov_iter *from)
{
return iomap_file_buffered_write(iocb, from, &blkdev_iomap_ops);
return iomap_file_buffered_write(iocb, from, &blkdev_iomap_ops, NULL);
}
/*
@ -771,7 +771,7 @@ reexpand:
#define BLKDEV_FALLOC_FL_SUPPORTED \
(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | \
FALLOC_FL_ZERO_RANGE | FALLOC_FL_NO_HIDE_STALE)
FALLOC_FL_ZERO_RANGE)
static long blkdev_fallocate(struct file *file, int mode, loff_t start,
loff_t len)
@ -830,14 +830,6 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start,
len >> SECTOR_SHIFT, GFP_KERNEL,
BLKDEV_ZERO_NOFALLBACK);
break;
case FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE | FALLOC_FL_NO_HIDE_STALE:
error = truncate_bdev_range(bdev, file_to_blk_mode(file), start, end);
if (error)
goto fail;
error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
len >> SECTOR_SHIFT, GFP_KERNEL);
break;
default:
error = -EOPNOTSUPP;
}

View File

@ -14,12 +14,6 @@
#define MAX_BUF_SZ PAGE_SIZE
static int adi_open(struct inode *inode, struct file *file)
{
file->f_mode |= FMODE_UNSIGNED_OFFSET;
return 0;
}
static int read_mcd_tag(unsigned long addr)
{
long err;
@ -206,9 +200,9 @@ static loff_t adi_llseek(struct file *file, loff_t offset, int whence)
static const struct file_operations adi_fops = {
.owner = THIS_MODULE,
.llseek = adi_llseek,
.open = adi_open,
.read = adi_read,
.write = adi_write,
.fop_flags = FOP_UNSIGNED_OFFSET,
};
static struct miscdevice adi_miscdev = {

View File

@ -643,6 +643,7 @@ static const struct file_operations __maybe_unused mem_fops = {
.get_unmapped_area = get_unmapped_area_mem,
.mmap_capabilities = memory_mmap_capabilities,
#endif
.fop_flags = FOP_UNSIGNED_OFFSET,
};
static const struct file_operations null_fops = {
@ -693,7 +694,7 @@ static const struct memdev {
umode_t mode;
} devlist[] = {
#ifdef CONFIG_DEVMEM
[DEVMEM_MINOR] = { "mem", &mem_fops, FMODE_UNSIGNED_OFFSET, 0 },
[DEVMEM_MINOR] = { "mem", &mem_fops, 0, 0 },
#endif
[3] = { "null", &null_fops, FMODE_NOWAIT, 0666 },
#ifdef CONFIG_DEVPORT

View File

@ -2908,6 +2908,7 @@ static const struct file_operations amdgpu_driver_kms_fops = {
#ifdef CONFIG_PROC_FS
.show_fdinfo = drm_show_fdinfo,
#endif
.fop_flags = FOP_UNSIGNED_OFFSET,
};
int amdgpu_file_to_fpriv(struct file *filp, struct amdgpu_fpriv **fpriv)

View File

@ -318,6 +318,8 @@ int drm_open_helper(struct file *filp, struct drm_minor *minor)
if (dev->switch_power_state != DRM_SWITCH_POWER_ON &&
dev->switch_power_state != DRM_SWITCH_POWER_DYNAMIC_OFF)
return -EINVAL;
if (WARN_ON_ONCE(!(filp->f_op->fop_flags & FOP_UNSIGNED_OFFSET)))
return -EINVAL;
drm_dbg_core(dev, "comm=\"%s\", pid=%d, minor=%d\n",
current->comm, task_pid_nr(current), minor->index);
@ -335,7 +337,6 @@ int drm_open_helper(struct file *filp, struct drm_minor *minor)
}
filp->private_data = priv;
filp->f_mode |= FMODE_UNSIGNED_OFFSET;
priv->filp = filp;
mutex_lock(&dev->filelist_mutex);

View File

@ -498,6 +498,7 @@ static const struct file_operations psb_gem_fops = {
.mmap = drm_gem_mmap,
.poll = drm_poll,
.read = drm_read,
.fop_flags = FOP_UNSIGNED_OFFSET,
};
static const struct drm_driver driver = {

View File

@ -424,7 +424,8 @@ shmem_pwrite(struct drm_i915_gem_object *obj,
struct address_space *mapping = obj->base.filp->f_mapping;
const struct address_space_operations *aops = mapping->a_ops;
char __user *user_data = u64_to_user_ptr(arg->data_ptr);
u64 remain, offset;
u64 remain;
loff_t pos;
unsigned int pg;
/* Caller already validated user args */
@ -457,12 +458,12 @@ shmem_pwrite(struct drm_i915_gem_object *obj,
*/
remain = arg->size;
offset = arg->offset;
pg = offset_in_page(offset);
pos = arg->offset;
pg = offset_in_page(pos);
do {
unsigned int len, unwritten;
struct page *page;
struct folio *folio;
void *data, *vaddr;
int err;
char __maybe_unused c;
@ -480,21 +481,19 @@ shmem_pwrite(struct drm_i915_gem_object *obj,
if (err)
return err;
err = aops->write_begin(obj->base.filp, mapping, offset, len,
&page, &data);
err = aops->write_begin(obj->base.filp, mapping, pos, len,
&folio, &data);
if (err < 0)
return err;
vaddr = kmap_local_page(page);
vaddr = kmap_local_folio(folio, offset_in_folio(folio, pos));
pagefault_disable();
unwritten = __copy_from_user_inatomic(vaddr + pg,
user_data,
len);
unwritten = __copy_from_user_inatomic(vaddr, user_data, len);
pagefault_enable();
kunmap_local(vaddr);
err = aops->write_end(obj->base.filp, mapping, offset, len,
len - unwritten, page, data);
err = aops->write_end(obj->base.filp, mapping, pos, len,
len - unwritten, folio, data);
if (err < 0)
return err;
@ -504,7 +503,7 @@ shmem_pwrite(struct drm_i915_gem_object *obj,
remain -= len;
user_data += len;
offset += len;
pos += len;
pg = 0;
} while (remain);
@ -660,7 +659,7 @@ i915_gem_object_create_shmem_from_data(struct drm_i915_private *i915,
struct drm_i915_gem_object *obj;
struct file *file;
const struct address_space_operations *aops;
resource_size_t offset;
loff_t pos;
int err;
GEM_WARN_ON(IS_DGFX(i915));
@ -672,29 +671,27 @@ i915_gem_object_create_shmem_from_data(struct drm_i915_private *i915,
file = obj->base.filp;
aops = file->f_mapping->a_ops;
offset = 0;
pos = 0;
do {
unsigned int len = min_t(typeof(size), size, PAGE_SIZE);
struct page *page;
void *pgdata, *vaddr;
struct folio *folio;
void *fsdata;
err = aops->write_begin(file, file->f_mapping, offset, len,
&page, &pgdata);
err = aops->write_begin(file, file->f_mapping, pos, len,
&folio, &fsdata);
if (err < 0)
goto fail;
vaddr = kmap(page);
memcpy(vaddr, data, len);
kunmap(page);
memcpy_to_folio(folio, offset_in_folio(folio, pos), data, len);
err = aops->write_end(file, file->f_mapping, offset, len, len,
page, pgdata);
err = aops->write_end(file, file->f_mapping, pos, len, len,
folio, fsdata);
if (err < 0)
goto fail;
size -= len;
data += len;
offset += len;
pos += len;
} while (size);
return obj;

View File

@ -1671,6 +1671,7 @@ static const struct file_operations i915_driver_fops = {
#ifdef CONFIG_PROC_FS
.show_fdinfo = drm_show_fdinfo,
#endif
.fop_flags = FOP_UNSIGNED_OFFSET,
};
static int

View File

@ -1274,6 +1274,7 @@ nouveau_driver_fops = {
.compat_ioctl = nouveau_compat_ioctl,
#endif
.llseek = noop_llseek,
.fop_flags = FOP_UNSIGNED_OFFSET,
};
static struct drm_driver

View File

@ -520,6 +520,7 @@ static const struct file_operations radeon_driver_kms_fops = {
#ifdef CONFIG_COMPAT
.compat_ioctl = radeon_kms_compat_ioctl,
#endif
.fop_flags = FOP_UNSIGNED_OFFSET,
};
static const struct drm_ioctl_desc radeon_ioctls_kms[] = {

View File

@ -801,6 +801,7 @@ static const struct file_operations tegra_drm_fops = {
.read = drm_read,
.compat_ioctl = drm_compat_ioctl,
.llseek = noop_llseek,
.fop_flags = FOP_UNSIGNED_OFFSET,
};
static int tegra_drm_context_cleanup(int id, void *p, void *data)

View File

@ -1609,6 +1609,7 @@ static const struct file_operations vmwgfx_driver_fops = {
.compat_ioctl = vmw_compat_ioctl,
#endif
.llseek = noop_llseek,
.fop_flags = FOP_UNSIGNED_OFFSET,
};
static const struct drm_driver driver = {

View File

@ -241,6 +241,7 @@ static const struct file_operations xe_driver_fops = {
#ifdef CONFIG_PROC_FS
.show_fdinfo = drm_show_fdinfo,
#endif
.fop_flags = FOP_UNSIGNED_OFFSET,
};
static struct drm_driver driver = {

View File

@ -3451,6 +3451,12 @@ static int tun_chr_fasync(int fd, struct file *file, int on)
struct tun_file *tfile = file->private_data;
int ret;
if (on) {
ret = file_f_owner_allocate(file);
if (ret)
goto out;
}
if ((ret = fasync_helper(fd, file, on, &tfile->fasync)) < 0)
goto out;

View File

@ -2225,6 +2225,12 @@ static int __tty_fasync(int fd, struct file *filp, int on)
if (tty_paranoia_check(tty, file_inode(filp), "tty_fasync"))
goto out;
if (on) {
retval = file_f_owner_allocate(filp);
if (retval)
goto out;
}
retval = fasync_helper(fd, filp, on, &tty->fasync);
if (retval <= 0)
goto out;

View File

@ -55,12 +55,11 @@ static void adfs_write_failed(struct address_space *mapping, loff_t to)
static int adfs_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len,
struct page **pagep, void **fsdata)
struct folio **foliop, void **fsdata)
{
int ret;
*pagep = NULL;
ret = cont_write_begin(file, mapping, pos, len, pagep, fsdata,
ret = cont_write_begin(file, mapping, pos, len, foliop, fsdata,
adfs_get_block,
&ADFS_I(mapping->host)->mmu_private);
if (unlikely(ret))

View File

@ -417,12 +417,11 @@ affs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
static int affs_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len,
struct page **pagep, void **fsdata)
struct folio **foliop, void **fsdata)
{
int ret;
*pagep = NULL;
ret = cont_write_begin(file, mapping, pos, len, pagep, fsdata,
ret = cont_write_begin(file, mapping, pos, len, foliop, fsdata,
affs_get_block,
&AFFS_I(mapping->host)->mmu_private);
if (unlikely(ret))
@ -433,12 +432,12 @@ static int affs_write_begin(struct file *file, struct address_space *mapping,
static int affs_write_end(struct file *file, struct address_space *mapping,
loff_t pos, unsigned int len, unsigned int copied,
struct page *page, void *fsdata)
struct folio *folio, void *fsdata)
{
struct inode *inode = mapping->host;
int ret;
ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
ret = generic_write_end(file, mapping, pos, len, copied, folio, fsdata);
/* Clear Archived bit on file writes, as AmigaOS would do */
if (AFFS_I(inode)->i_protect & FIBF_ARCHIVED) {
@ -648,7 +647,7 @@ static int affs_read_folio_ofs(struct file *file, struct folio *folio)
static int affs_write_begin_ofs(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len,
struct page **pagep, void **fsdata)
struct folio **foliop, void **fsdata)
{
struct inode *inode = mapping->host;
struct folio *folio;
@ -671,7 +670,7 @@ static int affs_write_begin_ofs(struct file *file, struct address_space *mapping
mapping_gfp_mask(mapping));
if (IS_ERR(folio))
return PTR_ERR(folio);
*pagep = &folio->page;
*foliop = folio;
if (folio_test_uptodate(folio))
return 0;
@ -687,9 +686,8 @@ static int affs_write_begin_ofs(struct file *file, struct address_space *mapping
static int affs_write_end_ofs(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
struct folio *folio, void *fsdata)
{
struct folio *folio = page_folio(page);
struct inode *inode = mapping->host;
struct super_block *sb = inode->i_sb;
struct buffer_head *bh, *prev_bh;
@ -882,14 +880,14 @@ affs_truncate(struct inode *inode)
if (inode->i_size > AFFS_I(inode)->mmu_private) {
struct address_space *mapping = inode->i_mapping;
struct page *page;
struct folio *folio;
void *fsdata = NULL;
loff_t isize = inode->i_size;
int res;
res = mapping->a_ops->write_begin(NULL, mapping, isize, 0, &page, &fsdata);
res = mapping->a_ops->write_begin(NULL, mapping, isize, 0, &folio, &fsdata);
if (!res)
res = mapping->a_ops->write_end(NULL, mapping, isize, 0, 0, page, fsdata);
res = mapping->a_ops->write_end(NULL, mapping, isize, 0, 0, folio, fsdata);
else
inode->i_size = AFFS_I(inode)->mmu_private;
mark_inode_dirty(inode);

View File

@ -100,7 +100,7 @@ struct kioctx {
unsigned long user_id;
struct __percpu kioctx_cpu *cpu;
struct kioctx_cpu __percpu *cpu;
/*
* For percpu reqs_available, number of slots we move to/from global

View File

@ -271,6 +271,42 @@ out_big:
}
EXPORT_SYMBOL(inode_newsize_ok);
/**
* setattr_copy_mgtime - update timestamps for mgtime inodes
* @inode: inode timestamps to be updated
* @attr: attrs for the update
*
* With multigrain timestamps, we need to take more care to prevent races
* when updating the ctime. Always update the ctime to the very latest
* using the standard mechanism, and use that to populate the atime and
* mtime appropriately (unless we're setting those to specific values).
*/
static void setattr_copy_mgtime(struct inode *inode, const struct iattr *attr)
{
unsigned int ia_valid = attr->ia_valid;
struct timespec64 now;
/*
* If the ctime isn't being updated then nothing else should be
* either.
*/
if (!(ia_valid & ATTR_CTIME)) {
WARN_ON_ONCE(ia_valid & (ATTR_ATIME|ATTR_MTIME));
return;
}
now = inode_set_ctime_current(inode);
if (ia_valid & ATTR_ATIME_SET)
inode_set_atime_to_ts(inode, attr->ia_atime);
else if (ia_valid & ATTR_ATIME)
inode_set_atime_to_ts(inode, now);
if (ia_valid & ATTR_MTIME_SET)
inode_set_mtime_to_ts(inode, attr->ia_mtime);
else if (ia_valid & ATTR_MTIME)
inode_set_mtime_to_ts(inode, now);
}
/**
* setattr_copy - copy simple metadata updates into the generic inode
* @idmap: idmap of the mount the inode was found from
@ -303,12 +339,6 @@ void setattr_copy(struct mnt_idmap *idmap, struct inode *inode,
i_uid_update(idmap, attr, inode);
i_gid_update(idmap, attr, inode);
if (ia_valid & ATTR_ATIME)
inode_set_atime_to_ts(inode, attr->ia_atime);
if (ia_valid & ATTR_MTIME)
inode_set_mtime_to_ts(inode, attr->ia_mtime);
if (ia_valid & ATTR_CTIME)
inode_set_ctime_to_ts(inode, attr->ia_ctime);
if (ia_valid & ATTR_MODE) {
umode_t mode = attr->ia_mode;
if (!in_group_or_capable(idmap, inode,
@ -316,6 +346,16 @@ void setattr_copy(struct mnt_idmap *idmap, struct inode *inode,
mode &= ~S_ISGID;
inode->i_mode = mode;
}
if (is_mgtime(inode))
return setattr_copy_mgtime(inode, attr);
if (ia_valid & ATTR_ATIME)
inode_set_atime_to_ts(inode, attr->ia_atime);
if (ia_valid & ATTR_MTIME)
inode_set_mtime_to_ts(inode, attr->ia_mtime);
if (ia_valid & ATTR_CTIME)
inode_set_ctime_to_ts(inode, attr->ia_ctime);
}
EXPORT_SYMBOL(setattr_copy);

View File

@ -62,6 +62,7 @@ struct autofs_info {
struct list_head expiring;
struct autofs_sb_info *sbi;
unsigned long exp_timeout;
unsigned long last_used;
int count;
@ -81,6 +82,9 @@ struct autofs_info {
*/
#define AUTOFS_INF_PENDING (1<<2) /* dentry pending mount */
#define AUTOFS_INF_EXPIRE_SET (1<<3) /* per-dentry expire timeout set for
this mount point.
*/
struct autofs_wait_queue {
wait_queue_head_t queue;
struct autofs_wait_queue *next;

View File

@ -128,7 +128,13 @@ static int validate_dev_ioctl(int cmd, struct autofs_dev_ioctl *param)
goto out;
}
/* Setting the per-dentry expire timeout requires a trailing
* path component, ie. no '/', so invert the logic of the
* check_name() return for AUTOFS_DEV_IOCTL_TIMEOUT_CMD.
*/
err = check_name(param->path);
if (cmd == AUTOFS_DEV_IOCTL_TIMEOUT_CMD)
err = err ? 0 : -EINVAL;
if (err) {
pr_warn("invalid path supplied for cmd(0x%08x)\n",
cmd);
@ -396,16 +402,97 @@ static int autofs_dev_ioctl_catatonic(struct file *fp,
return 0;
}
/* Set the autofs mount timeout */
/*
* Set the autofs mount expire timeout.
*
* There are two places an expire timeout can be set, in the autofs
* super block info. (this is all that's needed for direct and offset
* mounts because there's a distinct mount corresponding to each of
* these) and per-dentry within the dentry info. If a per-dentry
* timeout is set it will override the expire timeout set in the parent
* autofs super block info.
*
* If setting the autofs super block expire timeout the autofs_dev_ioctl
* size field will be equal to the autofs_dev_ioctl structure size. If
* setting the per-dentry expire timeout the mount point name is passed
* in the autofs_dev_ioctl path field and the size field updated to
* reflect this.
*
* Setting the autofs mount expire timeout sets the timeout in the super
* block info struct. Setting the per-dentry timeout does a little more.
* If the timeout is equal to -1 the per-dentry timeout (and flag) is
* cleared, which reverts to using the super block timeout. Otherwise, if
* the timeout is 0 it is set to this value and the flag is left set,
* which disables expiration for the mount point. For any other value the
* flag and the timeout are set, enabling the dentry to use this timeout.
*/
static int autofs_dev_ioctl_timeout(struct file *fp,
struct autofs_sb_info *sbi,
struct autofs_dev_ioctl *param)
{
unsigned long timeout;
unsigned long timeout = param->timeout.timeout;
timeout = param->timeout.timeout;
/* If setting the expire timeout for an individual indirect
* mount point dentry the mount trailing component path is
* placed in param->path and param->size adjusted to account
* for it, otherwise param->size is set to the structure
* size.
*/
if (param->size == AUTOFS_DEV_IOCTL_SIZE) {
param->timeout.timeout = sbi->exp_timeout / HZ;
sbi->exp_timeout = timeout * HZ;
} else {
struct dentry *base = fp->f_path.dentry;
struct inode *inode = base->d_inode;
int path_len = param->size - AUTOFS_DEV_IOCTL_SIZE - 1;
struct dentry *dentry;
struct autofs_info *ino;
if (!autofs_type_indirect(sbi->type))
return -EINVAL;
/* An expire timeout greater than the superblock timeout
* could be a problem at shutdown but the super block
* timeout itself can change so all we can really do is
* warn the user.
*/
if (timeout >= sbi->exp_timeout)
pr_warn("per-mount expire timeout is greater than "
"the parent autofs mount timeout which could "
"prevent shutdown\n");
inode_lock_shared(inode);
dentry = try_lookup_one_len(param->path, base, path_len);
inode_unlock_shared(inode);
if (IS_ERR_OR_NULL(dentry))
return dentry ? PTR_ERR(dentry) : -ENOENT;
ino = autofs_dentry_ino(dentry);
if (!ino) {
dput(dentry);
return -ENOENT;
}
if (ino->exp_timeout && ino->flags & AUTOFS_INF_EXPIRE_SET)
param->timeout.timeout = ino->exp_timeout / HZ;
else
param->timeout.timeout = sbi->exp_timeout / HZ;
if (timeout == -1) {
/* Revert to using the super block timeout */
ino->flags &= ~AUTOFS_INF_EXPIRE_SET;
ino->exp_timeout = 0;
} else {
/* Set the dentry expire flag and timeout.
*
* If timeout is 0 it will prevent the expire
* of this particular automount.
*/
ino->flags |= AUTOFS_INF_EXPIRE_SET;
ino->exp_timeout = timeout * HZ;
}
dput(dentry);
}
return 0;
}

View File

@ -429,8 +429,6 @@ static struct dentry *autofs_expire_indirect(struct super_block *sb,
if (!root)
return NULL;
timeout = sbi->exp_timeout;
dentry = NULL;
while ((dentry = get_next_positive_subdir(dentry, root))) {
spin_lock(&sbi->fs_lock);
@ -441,6 +439,11 @@ static struct dentry *autofs_expire_indirect(struct super_block *sb,
}
spin_unlock(&sbi->fs_lock);
if (ino->flags & AUTOFS_INF_EXPIRE_SET)
timeout = ino->exp_timeout;
else
timeout = sbi->exp_timeout;
expired = should_expire(dentry, mnt, timeout, how);
if (!expired)
continue;

View File

@ -19,6 +19,7 @@ struct autofs_info *autofs_new_ino(struct autofs_sb_info *sbi)
INIT_LIST_HEAD(&ino->expiring);
ino->last_used = jiffies;
ino->sbi = sbi;
ino->exp_timeout = -1;
ino->count = 1;
}
return ino;
@ -28,6 +29,7 @@ void autofs_clean_ino(struct autofs_info *ino)
{
ino->uid = GLOBAL_ROOT_UID;
ino->gid = GLOBAL_ROOT_GID;
ino->exp_timeout = -1;
ino->last_used = jiffies;
}
@ -172,7 +174,6 @@ static int autofs_parse_fd(struct fs_context *fc, struct autofs_sb_info *sbi,
ret = autofs_check_pipe(pipe);
if (ret < 0) {
errorf(fc, "Invalid/unusable pipe");
if (param->type != fs_value_is_file)
fput(pipe);
return -EBADF;
}

View File

@ -648,7 +648,7 @@ int bch2_writepages(struct address_space *mapping, struct writeback_control *wbc
int bch2_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len,
struct page **pagep, void **fsdata)
struct folio **foliop, void **fsdata)
{
struct bch_inode_info *inode = to_bch_ei(mapping->host);
struct bch_fs *c = inode->v.i_sb->s_fs_info;
@ -717,12 +717,11 @@ out:
goto err;
}
*pagep = &folio->page;
*foliop = folio;
return 0;
err:
folio_unlock(folio);
folio_put(folio);
*pagep = NULL;
err_unlock:
bch2_pagecache_add_put(inode);
kfree(res);
@ -732,12 +731,11 @@ err_unlock:
int bch2_write_end(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
struct folio *folio, void *fsdata)
{
struct bch_inode_info *inode = to_bch_ei(mapping->host);
struct bch_fs *c = inode->v.i_sb->s_fs_info;
struct bch2_folio_reservation *res = fsdata;
struct folio *folio = page_folio(page);
unsigned offset = pos - folio_pos(folio);
lockdep_assert_held(&inode->v.i_rwsem);

View File

@ -10,10 +10,10 @@ int bch2_read_folio(struct file *, struct folio *);
int bch2_writepages(struct address_space *, struct writeback_control *);
void bch2_readahead(struct readahead_control *);
int bch2_write_begin(struct file *, struct address_space *, loff_t,
unsigned, struct page **, void **);
int bch2_write_begin(struct file *, struct address_space *, loff_t pos,
unsigned len, struct folio **, void **);
int bch2_write_end(struct file *, struct address_space *, loff_t,
unsigned, unsigned, struct page *, void *);
unsigned len, unsigned copied, struct folio *, void *);
ssize_t bch2_write_iter(struct kiocb *, struct iov_iter *);

View File

@ -1736,14 +1736,16 @@ again:
break;
}
} else if (clean_pass && this_pass_clean) {
wait_queue_head_t *wq = bit_waitqueue(&inode->v.i_state, __I_NEW);
DEFINE_WAIT_BIT(wait, &inode->v.i_state, __I_NEW);
struct wait_bit_queue_entry wqe;
struct wait_queue_head *wq_head;
prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
wq_head = inode_bit_waitqueue(&wqe, &inode->v, __I_NEW);
prepare_to_wait_event(wq_head, &wqe.wq_entry,
TASK_UNINTERRUPTIBLE);
mutex_unlock(&c->vfs_inodes_lock);
schedule();
finish_wait(wq, &wait.wq_entry);
finish_wait(wq_head, &wqe.wq_entry);
goto again;
}
}

View File

@ -172,11 +172,11 @@ static void bfs_write_failed(struct address_space *mapping, loff_t to)
static int bfs_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len,
struct page **pagep, void **fsdata)
struct folio **foliop, void **fsdata)
{
int ret;
ret = block_write_begin(mapping, pos, len, pagep, bfs_get_block);
ret = block_write_begin(mapping, pos, len, foliop, bfs_get_block);
if (unlikely(ret))
bfs_write_failed(mapping, pos + len);

View File

@ -1120,26 +1120,6 @@ void btrfs_check_nocow_unlock(struct btrfs_inode *inode)
btrfs_drew_write_unlock(&inode->root->snapshot_lock);
}
static void update_time_for_write(struct inode *inode)
{
struct timespec64 now, ts;
if (IS_NOCMTIME(inode))
return;
now = current_time(inode);
ts = inode_get_mtime(inode);
if (!timespec64_equal(&ts, &now))
inode_set_mtime_to_ts(inode, now);
ts = inode_get_ctime(inode);
if (!timespec64_equal(&ts, &now))
inode_set_ctime_to_ts(inode, now);
if (IS_I_VERSION(inode))
inode_inc_iversion(inode);
}
int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from, size_t count)
{
struct file *file = iocb->ki_filp;
@ -1170,7 +1150,10 @@ int btrfs_write_check(struct kiocb *iocb, struct iov_iter *from, size_t count)
* need to start yet another transaction to update the inode as we will
* update the inode when we finish writing whatever data we write.
*/
update_time_for_write(inode);
if (!IS_NOCMTIME(inode)) {
inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
inode_inc_iversion(inode);
}
start_pos = round_down(pos, fs_info->sectorsize);
oldsize = i_size_read(inode);

View File

@ -2198,7 +2198,8 @@ static struct file_system_type btrfs_fs_type = {
.init_fs_context = btrfs_init_fs_context,
.parameters = btrfs_fs_parameters,
.kill_sb = btrfs_kill_super,
.fs_flags = FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA | FS_ALLOW_IDMAP,
.fs_flags = FS_REQUIRES_DEV | FS_BINARY_MOUNTDATA |
FS_ALLOW_IDMAP | FS_MGTIME,
};
MODULE_ALIAS_FS("btrfs");

View File

@ -774,12 +774,11 @@ EXPORT_SYMBOL(block_dirty_folio);
static int fsync_buffers_list(spinlock_t *lock, struct list_head *list)
{
struct buffer_head *bh;
struct list_head tmp;
struct address_space *mapping;
int err = 0, err2;
struct blk_plug plug;
LIST_HEAD(tmp);
INIT_LIST_HEAD(&tmp);
blk_start_plug(&plug);
spin_lock(lock);
@ -2168,11 +2167,10 @@ int __block_write_begin_int(struct folio *folio, loff_t pos, unsigned len,
return err;
}
int __block_write_begin(struct page *page, loff_t pos, unsigned len,
int __block_write_begin(struct folio *folio, loff_t pos, unsigned len,
get_block_t *get_block)
{
return __block_write_begin_int(page_folio(page), pos, len, get_block,
NULL);
return __block_write_begin_int(folio, pos, len, get_block, NULL);
}
EXPORT_SYMBOL(__block_write_begin);
@ -2222,33 +2220,33 @@ static void __block_commit_write(struct folio *folio, size_t from, size_t to)
* The filesystem needs to handle block truncation upon failure.
*/
int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len,
struct page **pagep, get_block_t *get_block)
struct folio **foliop, get_block_t *get_block)
{
pgoff_t index = pos >> PAGE_SHIFT;
struct page *page;
struct folio *folio;
int status;
page = grab_cache_page_write_begin(mapping, index);
if (!page)
return -ENOMEM;
folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
mapping_gfp_mask(mapping));
if (IS_ERR(folio))
return PTR_ERR(folio);
status = __block_write_begin(page, pos, len, get_block);
status = __block_write_begin_int(folio, pos, len, get_block, NULL);
if (unlikely(status)) {
unlock_page(page);
put_page(page);
page = NULL;
folio_unlock(folio);
folio_put(folio);
folio = NULL;
}
*pagep = page;
*foliop = folio;
return status;
}
EXPORT_SYMBOL(block_write_begin);
int block_write_end(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
struct folio *folio, void *fsdata)
{
struct folio *folio = page_folio(page);
size_t start = pos - folio_pos(folio);
if (unlikely(copied < len)) {
@ -2280,19 +2278,19 @@ EXPORT_SYMBOL(block_write_end);
int generic_write_end(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
struct folio *folio, void *fsdata)
{
struct inode *inode = mapping->host;
loff_t old_size = inode->i_size;
bool i_size_changed = false;
copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
copied = block_write_end(file, mapping, pos, len, copied, folio, fsdata);
/*
* No need to use i_size_read() here, the i_size cannot change under us
* because we hold i_rwsem.
*
* But it's important to update i_size while still holding page lock:
* But it's important to update i_size while still holding folio lock:
* page writeout could otherwise come in and zero beyond i_size.
*/
if (pos + copied > inode->i_size) {
@ -2300,8 +2298,8 @@ int generic_write_end(struct file *file, struct address_space *mapping,
i_size_changed = true;
}
unlock_page(page);
put_page(page);
folio_unlock(folio);
folio_put(folio);
if (old_size < pos)
pagecache_isize_extended(inode, old_size, pos);
@ -2467,7 +2465,7 @@ int generic_cont_expand_simple(struct inode *inode, loff_t size)
{
struct address_space *mapping = inode->i_mapping;
const struct address_space_operations *aops = mapping->a_ops;
struct page *page;
struct folio *folio;
void *fsdata = NULL;
int err;
@ -2475,11 +2473,11 @@ int generic_cont_expand_simple(struct inode *inode, loff_t size)
if (err)
goto out;
err = aops->write_begin(NULL, mapping, size, 0, &page, &fsdata);
err = aops->write_begin(NULL, mapping, size, 0, &folio, &fsdata);
if (err)
goto out;
err = aops->write_end(NULL, mapping, size, 0, 0, page, fsdata);
err = aops->write_end(NULL, mapping, size, 0, 0, folio, fsdata);
BUG_ON(err > 0);
out:
@ -2493,7 +2491,7 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping,
struct inode *inode = mapping->host;
const struct address_space_operations *aops = mapping->a_ops;
unsigned int blocksize = i_blocksize(inode);
struct page *page;
struct folio *folio;
void *fsdata = NULL;
pgoff_t index, curidx;
loff_t curpos;
@ -2512,12 +2510,12 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping,
len = PAGE_SIZE - zerofrom;
err = aops->write_begin(file, mapping, curpos, len,
&page, &fsdata);
&folio, &fsdata);
if (err)
goto out;
zero_user(page, zerofrom, len);
folio_zero_range(folio, offset_in_folio(folio, curpos), len);
err = aops->write_end(file, mapping, curpos, len, len,
page, fsdata);
folio, fsdata);
if (err < 0)
goto out;
BUG_ON(err != len);
@ -2545,12 +2543,12 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping,
len = offset - zerofrom;
err = aops->write_begin(file, mapping, curpos, len,
&page, &fsdata);
&folio, &fsdata);
if (err)
goto out;
zero_user(page, zerofrom, len);
folio_zero_range(folio, offset_in_folio(folio, curpos), len);
err = aops->write_end(file, mapping, curpos, len, len,
page, fsdata);
folio, fsdata);
if (err < 0)
goto out;
BUG_ON(err != len);
@ -2566,7 +2564,7 @@ out:
*/
int cont_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len,
struct page **pagep, void **fsdata,
struct folio **foliop, void **fsdata,
get_block_t *get_block, loff_t *bytes)
{
struct inode *inode = mapping->host;
@ -2584,7 +2582,7 @@ int cont_write_begin(struct file *file, struct address_space *mapping,
(*bytes)++;
}
return block_write_begin(mapping, pos, len, pagep, get_block);
return block_write_begin(mapping, pos, len, foliop, get_block);
}
EXPORT_SYMBOL(cont_write_begin);

View File

@ -1508,20 +1508,18 @@ static int ceph_netfs_check_write_begin(struct file *file, loff_t pos, unsigned
*/
static int ceph_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len,
struct page **pagep, void **fsdata)
struct folio **foliop, void **fsdata)
{
struct inode *inode = file_inode(file);
struct ceph_inode_info *ci = ceph_inode(inode);
struct folio *folio = NULL;
int r;
r = netfs_write_begin(&ci->netfs, file, inode->i_mapping, pos, len, &folio, NULL);
r = netfs_write_begin(&ci->netfs, file, inode->i_mapping, pos, len, foliop, NULL);
if (r < 0)
return r;
folio_wait_private_2(folio); /* [DEPRECATED] */
WARN_ON_ONCE(!folio_test_locked(folio));
*pagep = &folio->page;
folio_wait_private_2(*foliop); /* [DEPRECATED] */
WARN_ON_ONCE(!folio_test_locked(*foliop));
return 0;
}
@ -1531,9 +1529,8 @@ static int ceph_write_begin(struct file *file, struct address_space *mapping,
*/
static int ceph_write_end(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *subpage, void *fsdata)
struct folio *folio, void *fsdata)
{
struct folio *folio = page_folio(subpage);
struct inode *inode = file_inode(file);
struct ceph_client *cl = ceph_inode_to_client(inode);
bool check_cap = false;

View File

@ -119,31 +119,43 @@ static const struct fs_parameter_spec coda_param_specs[] = {
{}
};
static int coda_parse_fd(struct fs_context *fc, int fd)
static int coda_set_idx(struct fs_context *fc, struct file *file)
{
struct coda_fs_context *ctx = fc->fs_private;
struct fd f;
struct inode *inode;
int idx;
f = fdget(fd);
if (!f.file)
return -EBADF;
inode = file_inode(f.file);
inode = file_inode(file);
if (!S_ISCHR(inode->i_mode) || imajor(inode) != CODA_PSDEV_MAJOR) {
fdput(f);
return invalf(fc, "code: Not coda psdev");
return invalf(fc, "coda: Not coda psdev");
}
idx = iminor(inode);
fdput(f);
if (idx < 0 || idx >= MAX_CODADEVS)
return invalf(fc, "coda: Bad minor number");
ctx->idx = idx;
return 0;
}
static int coda_parse_fd(struct fs_context *fc, struct fs_parameter *param,
struct fs_parse_result *result)
{
struct file *file;
int err;
if (param->type == fs_value_is_file) {
file = param->file;
param->file = NULL;
} else {
file = fget(result->uint_32);
}
if (!file)
return -EBADF;
err = coda_set_idx(fc, file);
fput(file);
return err;
}
static int coda_parse_param(struct fs_context *fc, struct fs_parameter *param)
{
struct fs_parse_result result;
@ -155,7 +167,7 @@ static int coda_parse_param(struct fs_context *fc, struct fs_parameter *param)
switch (opt) {
case Opt_fd:
return coda_parse_fd(fc, result.uint_32);
return coda_parse_fd(fc, param, &result);
}
return 0;
@ -167,6 +179,7 @@ static int coda_parse_param(struct fs_context *fc, struct fs_parameter *param)
*/
static int coda_parse_monolithic(struct fs_context *fc, void *_data)
{
struct file *file;
struct coda_mount_data *data = _data;
if (!data)
@ -175,7 +188,11 @@ static int coda_parse_monolithic(struct fs_context *fc, void *_data)
if (data->version != CODA_MOUNT_VERSION)
return invalf(fc, "coda: Bad mount version");
coda_parse_fd(fc, data->fd);
file = fget(data->fd);
if (file) {
coda_set_idx(fc, file);
fput(file);
}
return 0;
}

View File

@ -1908,8 +1908,13 @@ void d_instantiate_new(struct dentry *entry, struct inode *inode)
__d_instantiate(entry, inode);
WARN_ON(!(inode->i_state & I_NEW));
inode->i_state &= ~I_NEW & ~I_CREATING;
/*
* Pairs with the barrier in prepare_to_wait_event() to make sure
* ___wait_var_event() either sees the bit cleared or
* waitqueue_active() check in wake_up_var() sees the waiter.
*/
smp_mb();
wake_up_bit(&inode->i_state, __I_NEW);
inode_wake_up_bit(inode, __I_NEW);
spin_unlock(&inode->i_lock);
}
EXPORT_SYMBOL(d_instantiate_new);
@ -2163,9 +2168,6 @@ seqretry:
* without taking d_lock and checking d_seq sequence count against @seq
* returned here.
*
* A refcount may be taken on the found dentry with the d_rcu_to_refcount
* function.
*
* Alternatively, __d_lookup_rcu may be called again to look up the child of
* the returned dentry, so long as its parent's seqlock is checked after the
* child is looked up. Thus, an interlocking stepping of sequence lock checks

View File

@ -89,12 +89,14 @@ enum {
Opt_uid,
Opt_gid,
Opt_mode,
Opt_source,
};
static const struct fs_parameter_spec debugfs_param_specs[] = {
fsparam_gid ("gid", Opt_gid),
fsparam_u32oct ("mode", Opt_mode),
fsparam_uid ("uid", Opt_uid),
fsparam_string ("source", Opt_source),
{}
};
@ -126,6 +128,12 @@ static int debugfs_parse_param(struct fs_context *fc, struct fs_parameter *param
case Opt_mode:
opts->mode = result.uint_32 & S_IALLUGO;
break;
case Opt_source:
if (fc->source)
return invalfc(fc, "Multiple sources specified");
fc->source = param->string;
param->string = NULL;
break;
/*
* We might like to report bad mount options here;
* but traditionally debugfs has ignored all mount options

View File

@ -37,7 +37,6 @@
#include <linux/rwsem.h>
#include <linux/uio.h>
#include <linux/atomic.h>
#include <linux/prefetch.h>
#include "internal.h"
@ -1121,11 +1120,6 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
struct blk_plug plug;
unsigned long align = offset | iov_iter_alignment(iter);
/*
* Avoid references to bdev if not absolutely needed to give
* the early prefetch in the caller enough time.
*/
/* watch out for a 0 len io from a tricksy fs */
if (iov_iter_rw(iter) == READ && !count)
return 0;

View File

@ -234,17 +234,17 @@ out:
/*
* Called with lower inode mutex held.
*/
static int fill_zeros_to_end_of_page(struct page *page, unsigned int to)
static int fill_zeros_to_end_of_page(struct folio *folio, unsigned int to)
{
struct inode *inode = page->mapping->host;
struct inode *inode = folio->mapping->host;
int end_byte_in_page;
if ((i_size_read(inode) / PAGE_SIZE) != page->index)
if ((i_size_read(inode) / PAGE_SIZE) != folio->index)
goto out;
end_byte_in_page = i_size_read(inode) % PAGE_SIZE;
if (to > end_byte_in_page)
end_byte_in_page = to;
zero_user_segment(page, end_byte_in_page, PAGE_SIZE);
folio_zero_segment(folio, end_byte_in_page, PAGE_SIZE);
out:
return 0;
}
@ -255,7 +255,7 @@ out:
* @mapping: The eCryptfs object
* @pos: The file offset at which to start writing
* @len: Length of the write
* @pagep: Pointer to return the page
* @foliop: Pointer to return the folio
* @fsdata: Pointer to return fs data (unused)
*
* This function must zero any hole we create
@ -265,38 +265,39 @@ out:
static int ecryptfs_write_begin(struct file *file,
struct address_space *mapping,
loff_t pos, unsigned len,
struct page **pagep, void **fsdata)
struct folio **foliop, void **fsdata)
{
pgoff_t index = pos >> PAGE_SHIFT;
struct page *page;
struct folio *folio;
loff_t prev_page_end_size;
int rc = 0;
page = grab_cache_page_write_begin(mapping, index);
if (!page)
return -ENOMEM;
*pagep = page;
folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
mapping_gfp_mask(mapping));
if (IS_ERR(folio))
return PTR_ERR(folio);
*foliop = folio;
prev_page_end_size = ((loff_t)index << PAGE_SHIFT);
if (!PageUptodate(page)) {
if (!folio_test_uptodate(folio)) {
struct ecryptfs_crypt_stat *crypt_stat =
&ecryptfs_inode_to_private(mapping->host)->crypt_stat;
if (!(crypt_stat->flags & ECRYPTFS_ENCRYPTED)) {
rc = ecryptfs_read_lower_page_segment(
page, index, 0, PAGE_SIZE, mapping->host);
&folio->page, index, 0, PAGE_SIZE, mapping->host);
if (rc) {
printk(KERN_ERR "%s: Error attempting to read "
"lower page segment; rc = [%d]\n",
__func__, rc);
ClearPageUptodate(page);
folio_clear_uptodate(folio);
goto out;
} else
SetPageUptodate(page);
folio_mark_uptodate(folio);
} else if (crypt_stat->flags & ECRYPTFS_VIEW_AS_ENCRYPTED) {
if (crypt_stat->flags & ECRYPTFS_METADATA_IN_XATTR) {
rc = ecryptfs_copy_up_encrypted_with_header(
page, crypt_stat);
&folio->page, crypt_stat);
if (rc) {
printk(KERN_ERR "%s: Error attempting "
"to copy the encrypted content "
@ -304,46 +305,46 @@ static int ecryptfs_write_begin(struct file *file,
"inserting the metadata from "
"the xattr into the header; rc "
"= [%d]\n", __func__, rc);
ClearPageUptodate(page);
folio_clear_uptodate(folio);
goto out;
}
SetPageUptodate(page);
folio_mark_uptodate(folio);
} else {
rc = ecryptfs_read_lower_page_segment(
page, index, 0, PAGE_SIZE,
&folio->page, index, 0, PAGE_SIZE,
mapping->host);
if (rc) {
printk(KERN_ERR "%s: Error reading "
"page; rc = [%d]\n",
__func__, rc);
ClearPageUptodate(page);
folio_clear_uptodate(folio);
goto out;
}
SetPageUptodate(page);
folio_mark_uptodate(folio);
}
} else {
if (prev_page_end_size
>= i_size_read(page->mapping->host)) {
zero_user(page, 0, PAGE_SIZE);
SetPageUptodate(page);
>= i_size_read(mapping->host)) {
folio_zero_range(folio, 0, PAGE_SIZE);
folio_mark_uptodate(folio);
} else if (len < PAGE_SIZE) {
rc = ecryptfs_decrypt_page(page);
rc = ecryptfs_decrypt_page(&folio->page);
if (rc) {
printk(KERN_ERR "%s: Error decrypting "
"page at index [%ld]; "
"rc = [%d]\n",
__func__, page->index, rc);
ClearPageUptodate(page);
__func__, folio->index, rc);
folio_clear_uptodate(folio);
goto out;
}
SetPageUptodate(page);
folio_mark_uptodate(folio);
}
}
}
/* If creating a page or more of holes, zero them out via truncate.
* Note, this will increase i_size. */
if (index != 0) {
if (prev_page_end_size > i_size_read(page->mapping->host)) {
if (prev_page_end_size > i_size_read(mapping->host)) {
rc = ecryptfs_truncate(file->f_path.dentry,
prev_page_end_size);
if (rc) {
@ -359,12 +360,11 @@ static int ecryptfs_write_begin(struct file *file,
* of page? Zero it out. */
if ((i_size_read(mapping->host) == prev_page_end_size)
&& (pos != 0))
zero_user(page, 0, PAGE_SIZE);
folio_zero_range(folio, 0, PAGE_SIZE);
out:
if (unlikely(rc)) {
unlock_page(page);
put_page(page);
*pagep = NULL;
folio_unlock(folio);
folio_put(folio);
}
return rc;
}
@ -457,13 +457,13 @@ int ecryptfs_write_inode_size_to_metadata(struct inode *ecryptfs_inode)
* @pos: The file position
* @len: The length of the data (unused)
* @copied: The amount of data copied
* @page: The eCryptfs page
* @folio: The eCryptfs folio
* @fsdata: The fsdata (unused)
*/
static int ecryptfs_write_end(struct file *file,
struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
struct folio *folio, void *fsdata)
{
pgoff_t index = pos >> PAGE_SHIFT;
unsigned from = pos & (PAGE_SIZE - 1);
@ -476,8 +476,8 @@ static int ecryptfs_write_end(struct file *file,
ecryptfs_printk(KERN_DEBUG, "Calling fill_zeros_to_end_of_page"
"(page w/ index = [0x%.16lx], to = [%d])\n", index, to);
if (!(crypt_stat->flags & ECRYPTFS_ENCRYPTED)) {
rc = ecryptfs_write_lower_page_segment(ecryptfs_inode, page, 0,
to);
rc = ecryptfs_write_lower_page_segment(ecryptfs_inode,
&folio->page, 0, to);
if (!rc) {
rc = copied;
fsstack_copy_inode_size(ecryptfs_inode,
@ -485,21 +485,21 @@ static int ecryptfs_write_end(struct file *file,
}
goto out;
}
if (!PageUptodate(page)) {
if (!folio_test_uptodate(folio)) {
if (copied < PAGE_SIZE) {
rc = 0;
goto out;
}
SetPageUptodate(page);
folio_mark_uptodate(folio);
}
/* Fills in zeros if 'to' goes beyond inode size */
rc = fill_zeros_to_end_of_page(page, to);
rc = fill_zeros_to_end_of_page(folio, to);
if (rc) {
ecryptfs_printk(KERN_WARNING, "Error attempting to fill "
"zeros in page with index = [0x%.16lx]\n", index);
goto out;
}
rc = ecryptfs_encrypt_page(page);
rc = ecryptfs_encrypt_page(&folio->page);
if (rc) {
ecryptfs_printk(KERN_WARNING, "Error encrypting page (upper "
"index [0x%.16lx])\n", index);
@ -518,8 +518,8 @@ static int ecryptfs_write_end(struct file *file,
else
rc = copied;
out:
unlock_page(page);
put_page(page);
folio_unlock(folio);
folio_put(folio);
return rc;
}
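
The eCryptfs conversion above is representative of the mechanical page-to-folio translation applied throughout this series: PageUptodate()/SetPageUptodate()/ClearPageUptodate() become folio_test_uptodate()/folio_mark_uptodate()/folio_clear_uptodate(), zero_user() becomes folio_zero_range(), and unlock_page()+put_page() become folio_unlock()+folio_put(). The remaining &folio->page arguments mark helpers that still take a struct page and are left for a later conversion. As a minimal sketch (the helper name is hypothetical, not part of the diff), the repeated read-and-mark-uptodate pattern reduces to:

	/* Hypothetical helper: fill one folio from the lower file and
	 * update its uptodate state accordingly. */
	static int ecryptfs_fill_folio(struct folio *folio, pgoff_t index,
				       struct inode *host)
	{
		int rc = ecryptfs_read_lower_page_segment(&folio->page, index,
							  0, PAGE_SIZE, host);
		if (rc) {
			folio_clear_uptodate(folio);
			return rc;
		}
		folio_mark_uptodate(folio);
		return 0;
	}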

View File

@ -420,7 +420,7 @@ static bool busy_loop_ep_timeout(unsigned long start_time,
static bool ep_busy_loop_on(struct eventpoll *ep)
{
return !!ep->busy_poll_usecs || net_busy_loop_on();
return !!READ_ONCE(ep->busy_poll_usecs) || net_busy_loop_on();
}
static bool ep_busy_loop_end(void *p, unsigned long start_time)
@ -2200,11 +2200,6 @@ static int do_epoll_create(int flags)
error = PTR_ERR(file);
goto out_free_fd;
}
#ifdef CONFIG_NET_RX_BUSY_POLL
ep->busy_poll_usecs = 0;
ep->busy_poll_budget = 0;
ep->prefer_busy_poll = false;
#endif
ep->file = file;
fd_install(fd, file);
return fd;
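
Two independent fixes here: ep_busy_loop_on() reads busy_poll_usecs without holding ep->mtx, so the load is annotated with READ_ONCE() (the store side, the EPIOCSPARAMS ioctl, is assumed to pair with WRITE_ONCE()); and the explicit zeroing in do_epoll_create() is dropped because the eventpoll struct comes from a zeroing allocation. A minimal sketch of the pairing, under that assumption:

	/* Writer (ioctl path, assumed): */
	WRITE_ONCE(ep->busy_poll_usecs, usecs);

	/* Reader (busy-poll path, as in the hunk above): */
	if (READ_ONCE(ep->busy_poll_usecs) || net_busy_loop_on())
		/* busy poll */;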

View File

@ -145,13 +145,11 @@ SYSCALL_DEFINE1(uselib, const char __user *, library)
goto out;
/*
* may_open() has already checked for this, so it should be
* impossible to trip now. But we need to be extra cautious
* and check again at the very end too.
* Check do_open_execat() for an explanation.
*/
error = -EACCES;
if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode) ||
path_noexec(&file->f_path)))
if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode)) ||
path_noexec(&file->f_path))
goto exit;
error = -ENOEXEC;
@ -954,7 +952,6 @@ EXPORT_SYMBOL(transfer_args_to_stack);
static struct file *do_open_execat(int fd, struct filename *name, int flags)
{
struct file *file;
int err;
struct open_flags open_exec_flags = {
.open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
.acc_mode = MAY_EXEC,
@ -971,24 +968,20 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
file = do_filp_open(fd, name, &open_exec_flags);
if (IS_ERR(file))
goto out;
/*
* may_open() has already checked for this, so it should be
* impossible to trip now. But we need to be extra cautious
* and check again at the very end too.
*/
err = -EACCES;
if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode) ||
path_noexec(&file->f_path)))
goto exit;
out:
return file;
exit:
/*
* In the past the regular type check was here. It moved to may_open() in
* 633fb6ac3980 ("exec: move S_ISREG() check earlier"). Since then it is
* an invariant that all non-regular files error out before we get here.
*/
if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode)) ||
path_noexec(&file->f_path)) {
fput(file);
return ERR_PTR(err);
return ERR_PTR(-EACCES);
}
return file;
}
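
The point of the restructuring: may_open() already rejects non-regular files, so !S_ISREG() can only fire if that invariant breaks and deserves the WARN_ON_ONCE(); path_noexec(), by contrast, is trivially user-triggerable (any binary on a noexec mount) and must not warn. A sketch of the change in grouping:

	/* Old: user-triggerable path_noexec() inside the WARN. */
	if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode) ||
			 path_noexec(&file->f_path)))
		goto exit;

	/* New: only the invariant warns; both conditions still
	 * fail the open with -EACCES. */
	if (WARN_ON_ONCE(!S_ISREG(file_inode(file)->i_mode)) ||
	    path_noexec(&file->f_path))
		goto exit;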
/**

View File

@ -541,20 +541,20 @@ static int exfat_file_zeroed_range(struct file *file, loff_t start, loff_t end)
while (start < end) {
u32 zerofrom, len;
struct page *page = NULL;
struct folio *folio;
zerofrom = start & (PAGE_SIZE - 1);
len = PAGE_SIZE - zerofrom;
if (start + len > end)
len = end - start;
err = ops->write_begin(file, mapping, start, len, &page, NULL);
err = ops->write_begin(file, mapping, start, len, &folio, NULL);
if (err)
goto out;
zero_user_segment(page, zerofrom, zerofrom + len);
folio_zero_range(folio, offset_in_folio(folio, start), len);
err = ops->write_end(file, mapping, start, len, len, page, NULL);
err = ops->write_end(file, mapping, start, len, len, folio, NULL);
if (err < 0)
goto out;
start += len;

View File

@ -424,15 +424,14 @@ static void exfat_write_failed(struct address_space *mapping, loff_t to)
static int exfat_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned int len,
struct page **pagep, void **fsdata)
struct folio **foliop, void **fsdata)
{
int ret;
if (unlikely(exfat_forced_shutdown(mapping->host->i_sb)))
return -EIO;
*pagep = NULL;
ret = block_write_begin(mapping, pos, len, pagep, exfat_get_block);
ret = block_write_begin(mapping, pos, len, foliop, exfat_get_block);
if (ret < 0)
exfat_write_failed(mapping, pos+len);
@ -442,13 +441,13 @@ static int exfat_write_begin(struct file *file, struct address_space *mapping,
static int exfat_write_end(struct file *file, struct address_space *mapping,
loff_t pos, unsigned int len, unsigned int copied,
struct page *pagep, void *fsdata)
struct folio *folio, void *fsdata)
{
struct inode *inode = mapping->host;
struct exfat_inode_info *ei = EXFAT_I(inode);
int err;
err = generic_write_end(file, mapping, pos, len, copied, pagep, fsdata);
err = generic_write_end(file, mapping, pos, len, copied, folio, fsdata);
if (err < len)
exfat_write_failed(mapping, pos+len);
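
exfat, like every filesystem below, now exchanges a struct folio with the generic helpers; the old *pagep = NULL priming is gone because block_write_begin() sets *foliop on success and callers check the return value instead. The post-conversion prototypes, as used by these diffs:

	int (*write_begin)(struct file *, struct address_space *mapping,
			   loff_t pos, unsigned len,
			   struct folio **foliop, void **fsdata);
	int (*write_end)(struct file *, struct address_space *mapping,
			 loff_t pos, unsigned len, unsigned copied,
			 struct folio *folio, void *fsdata);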

View File

@ -87,7 +87,7 @@ static void ext2_commit_chunk(struct folio *folio, loff_t pos, unsigned len)
struct inode *dir = mapping->host;
inode_inc_iversion(dir);
block_write_end(NULL, mapping, pos, len, len, &folio->page, NULL);
block_write_end(NULL, mapping, pos, len, len, folio, NULL);
if (pos+len > dir->i_size) {
i_size_write(dir, pos+len);
@ -434,7 +434,7 @@ int ext2_inode_by_name(struct inode *dir, const struct qstr *child, ino_t *ino)
static int ext2_prepare_chunk(struct folio *folio, loff_t pos, unsigned len)
{
return __block_write_begin(&folio->page, pos, len, ext2_get_block);
return __block_write_begin(folio, pos, len, ext2_get_block);
}
static int ext2_handle_dirsync(struct inode *dir)

View File

@ -916,11 +916,11 @@ static void ext2_readahead(struct readahead_control *rac)
static int
ext2_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, struct page **pagep, void **fsdata)
loff_t pos, unsigned len, struct folio **foliop, void **fsdata)
{
int ret;
ret = block_write_begin(mapping, pos, len, pagep, ext2_get_block);
ret = block_write_begin(mapping, pos, len, foliop, ext2_get_block);
if (ret < 0)
ext2_write_failed(mapping, pos + len);
return ret;
@ -928,11 +928,11 @@ ext2_write_begin(struct file *file, struct address_space *mapping,
static int ext2_write_end(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
struct folio *folio, void *fsdata)
{
int ret;
ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
ret = generic_write_end(file, mapping, pos, len, copied, folio, fsdata);
if (ret < len)
ext2_write_failed(mapping, pos + len);
return ret;

View File

@ -3563,13 +3563,13 @@ int ext4_readpage_inline(struct inode *inode, struct folio *folio);
extern int ext4_try_to_write_inline_data(struct address_space *mapping,
struct inode *inode,
loff_t pos, unsigned len,
struct page **pagep);
struct folio **foliop);
int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len,
unsigned copied, struct folio *folio);
extern int ext4_da_write_inline_data_begin(struct address_space *mapping,
struct inode *inode,
loff_t pos, unsigned len,
struct page **pagep,
struct folio **foliop,
void **fsdata);
extern int ext4_try_add_inline_entry(handle_t *handle,
struct ext4_filename *fname,

View File

@ -601,10 +601,10 @@ retry:
goto out;
if (ext4_should_dioread_nolock(inode)) {
ret = __block_write_begin(&folio->page, from, to,
ret = __block_write_begin(folio, from, to,
ext4_get_block_unwritten);
} else
ret = __block_write_begin(&folio->page, from, to, ext4_get_block);
ret = __block_write_begin(folio, from, to, ext4_get_block);
if (!ret && ext4_should_journal_data(inode)) {
ret = ext4_walk_page_buffers(handle, inode,
@ -660,7 +660,7 @@ out_nofolio:
int ext4_try_to_write_inline_data(struct address_space *mapping,
struct inode *inode,
loff_t pos, unsigned len,
struct page **pagep)
struct folio **foliop)
{
int ret;
handle_t *handle;
@ -708,7 +708,7 @@ int ext4_try_to_write_inline_data(struct address_space *mapping,
goto out;
}
*pagep = &folio->page;
*foliop = folio;
down_read(&EXT4_I(inode)->xattr_sem);
if (!ext4_has_inline_data(inode)) {
ret = 0;
@ -856,7 +856,7 @@ static int ext4_da_convert_inline_data_to_extent(struct address_space *mapping,
goto out;
}
ret = __block_write_begin(&folio->page, 0, inline_size,
ret = __block_write_begin(folio, 0, inline_size,
ext4_da_get_block_prep);
if (ret) {
up_read(&EXT4_I(inode)->xattr_sem);
@ -891,7 +891,7 @@ out:
int ext4_da_write_inline_data_begin(struct address_space *mapping,
struct inode *inode,
loff_t pos, unsigned len,
struct page **pagep,
struct folio **foliop,
void **fsdata)
{
int ret;
@ -954,7 +954,7 @@ retry_journal:
goto out_release_page;
up_read(&EXT4_I(inode)->xattr_sem);
*pagep = &folio->page;
*foliop = folio;
brelse(iloc.bh);
return 1;
out_release_page:

View File

@ -1145,7 +1145,7 @@ static int ext4_block_write_begin(struct folio *folio, loff_t pos, unsigned len,
*/
static int ext4_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len,
struct page **pagep, void **fsdata)
struct folio **foliop, void **fsdata)
{
struct inode *inode = mapping->host;
int ret, needed_blocks;
@ -1170,7 +1170,7 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
if (ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)) {
ret = ext4_try_to_write_inline_data(mapping, inode, pos, len,
pagep);
foliop);
if (ret < 0)
return ret;
if (ret == 1)
@ -1224,10 +1224,10 @@ retry_journal:
ret = ext4_block_write_begin(folio, pos, len, ext4_get_block);
#else
if (ext4_should_dioread_nolock(inode))
ret = __block_write_begin(&folio->page, pos, len,
ret = __block_write_begin(folio, pos, len,
ext4_get_block_unwritten);
else
ret = __block_write_begin(&folio->page, pos, len, ext4_get_block);
ret = __block_write_begin(folio, pos, len, ext4_get_block);
#endif
if (!ret && ext4_should_journal_data(inode)) {
ret = ext4_walk_page_buffers(handle, inode,
@ -1270,7 +1270,7 @@ retry_journal:
folio_put(folio);
return ret;
}
*pagep = &folio->page;
*foliop = folio;
return ret;
}
@ -1298,9 +1298,8 @@ static int write_end_fn(handle_t *handle, struct inode *inode,
static int ext4_write_end(struct file *file,
struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
struct folio *folio, void *fsdata)
{
struct folio *folio = page_folio(page);
handle_t *handle = ext4_journal_current_handle();
struct inode *inode = mapping->host;
loff_t old_size = inode->i_size;
@ -1315,7 +1314,7 @@ static int ext4_write_end(struct file *file,
return ext4_write_inline_data_end(inode, pos, len, copied,
folio);
copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
copied = block_write_end(file, mapping, pos, len, copied, folio, fsdata);
/*
* it's important to update i_size while still holding folio lock:
* page writeout could otherwise come in and zero beyond i_size.
@ -1402,9 +1401,8 @@ static void ext4_journalled_zero_new_buffers(handle_t *handle,
static int ext4_journalled_write_end(struct file *file,
struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
struct folio *folio, void *fsdata)
{
struct folio *folio = page_folio(page);
handle_t *handle = ext4_journal_current_handle();
struct inode *inode = mapping->host;
loff_t old_size = inode->i_size;
@ -2926,7 +2924,7 @@ static int ext4_nonda_switch(struct super_block *sb)
static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len,
struct page **pagep, void **fsdata)
struct folio **foliop, void **fsdata)
{
int ret, retries = 0;
struct folio *folio;
@ -2941,14 +2939,14 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
if (ext4_nonda_switch(inode->i_sb) || ext4_verity_in_progress(inode)) {
*fsdata = (void *)FALL_BACK_TO_NONDELALLOC;
return ext4_write_begin(file, mapping, pos,
len, pagep, fsdata);
len, foliop, fsdata);
}
*fsdata = (void *)0;
trace_ext4_da_write_begin(inode, pos, len);
if (ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)) {
ret = ext4_da_write_inline_data_begin(mapping, inode, pos, len,
pagep, fsdata);
foliop, fsdata);
if (ret < 0)
return ret;
if (ret == 1)
@ -2964,7 +2962,7 @@ retry:
#ifdef CONFIG_FS_ENCRYPTION
ret = ext4_block_write_begin(folio, pos, len, ext4_da_get_block_prep);
#else
ret = __block_write_begin(&folio->page, pos, len, ext4_da_get_block_prep);
ret = __block_write_begin(folio, pos, len, ext4_da_get_block_prep);
#endif
if (ret < 0) {
folio_unlock(folio);
@ -2983,7 +2981,7 @@ retry:
return ret;
}
*pagep = &folio->page;
*foliop = folio;
return ret;
}
@ -3029,7 +3027,7 @@ static int ext4_da_do_write_end(struct address_space *mapping,
 * flag, which is all that's needed to trigger page writeback.
*/
copied = block_write_end(NULL, mapping, pos, len, copied,
&folio->page, NULL);
folio, NULL);
new_i_size = pos + copied;
/*
@ -3080,15 +3078,14 @@ static int ext4_da_do_write_end(struct address_space *mapping,
static int ext4_da_write_end(struct file *file,
struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
struct folio *folio, void *fsdata)
{
struct inode *inode = mapping->host;
int write_mode = (int)(unsigned long)fsdata;
struct folio *folio = page_folio(page);
if (write_mode == FALL_BACK_TO_NONDELALLOC)
return ext4_write_end(file, mapping, pos,
len, copied, &folio->page, fsdata);
len, copied, folio, fsdata);
trace_ext4_da_write_end(inode, pos, len, copied);
@ -6222,7 +6219,7 @@ retry_alloc:
if (folio_pos(folio) + len > size)
len = size - folio_pos(folio);
err = __block_write_begin(&folio->page, 0, len, ext4_get_block);
err = __block_write_begin(folio, 0, len, ext4_get_block);
if (!err) {
ret = VM_FAULT_SIGBUS;
if (ext4_journal_folio_buffers(handle, folio, len))

View File

@ -7307,7 +7307,7 @@ static struct file_system_type ext4_fs_type = {
.init_fs_context = ext4_init_fs_context,
.parameters = ext4_param_specs,
.kill_sb = ext4_kill_sb,
.fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
.fs_flags = FS_REQUIRES_DEV | FS_ALLOW_IDMAP | FS_MGTIME,
};
MODULE_ALIAS_FS("ext4");
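
FS_MGTIME opts ext4 into the multigrain timestamp machinery (see the multigrain-ts document added in this merge): ctime/mtime use coarse-grained stamps by default and switch to fine-grained values only when a timestamp has been observed. A minimal sketch of how a filesystem opts in (examplefs is hypothetical):

	static struct file_system_type example_fs_type = {
		.name     = "examplefs",	/* hypothetical */
		.fs_flags = FS_REQUIRES_DEV | FS_MGTIME,
	};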

View File

@ -76,17 +76,17 @@ static int pagecache_write(struct inode *inode, const void *buf, size_t count,
while (count) {
size_t n = min_t(size_t, count,
PAGE_SIZE - offset_in_page(pos));
struct page *page;
struct folio *folio;
void *fsdata = NULL;
int res;
res = aops->write_begin(NULL, mapping, pos, n, &page, &fsdata);
res = aops->write_begin(NULL, mapping, pos, n, &folio, &fsdata);
if (res)
return res;
memcpy_to_page(page, offset_in_page(pos), buf, n);
memcpy_to_folio(folio, offset_in_folio(folio, pos), buf, n);
res = aops->write_end(NULL, mapping, pos, n, n, page, fsdata);
res = aops->write_end(NULL, mapping, pos, n, n, folio, fsdata);
if (res < 0)
return res;
if (res != n)

View File

@ -3556,12 +3556,12 @@ reserve_block:
}
static int f2fs_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, struct page **pagep, void **fsdata)
loff_t pos, unsigned len, struct folio **foliop, void **fsdata)
{
struct inode *inode = mapping->host;
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct page *page = NULL;
pgoff_t index = ((unsigned long long) pos) >> PAGE_SHIFT;
struct folio *folio;
pgoff_t index = pos >> PAGE_SHIFT;
bool need_balance = false;
bool use_cow = false;
block_t blkaddr = NULL_ADDR;
@ -3577,7 +3577,7 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping,
/*
* We should check this at this moment to avoid deadlock on inode page
* and #0 page. The locking rule for inline_data conversion should be:
* lock_page(page #0) -> lock_page(inode_page)
* folio_lock(folio #0) -> folio_lock(inode_page)
*/
if (index != 0) {
err = f2fs_convert_inline_inode(inode);
@ -3588,18 +3588,20 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping,
#ifdef CONFIG_F2FS_FS_COMPRESSION
if (f2fs_compressed_file(inode)) {
int ret;
struct page *page;
*fsdata = NULL;
if (len == PAGE_SIZE && !(f2fs_is_atomic_file(inode)))
goto repeat;
ret = f2fs_prepare_compress_overwrite(inode, pagep,
ret = f2fs_prepare_compress_overwrite(inode, &page,
index, fsdata);
if (ret < 0) {
err = ret;
goto fail;
} else if (ret) {
*foliop = page_folio(page);
return 0;
}
}
@ -3607,81 +3609,85 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping,
repeat:
/*
* Do not use grab_cache_page_write_begin() to avoid deadlock due to
* wait_for_stable_page. Will wait that below with our IO control.
	 * Do not use FGP_STABLE to avoid deadlock.
	 * We wait for that below with our own IO control.
*/
page = f2fs_pagecache_get_page(mapping, index,
folio = __filemap_get_folio(mapping, index,
FGP_LOCK | FGP_WRITE | FGP_CREAT, GFP_NOFS);
if (!page) {
err = -ENOMEM;
if (IS_ERR(folio)) {
err = PTR_ERR(folio);
goto fail;
}
/* TODO: cluster can be compressed due to race with .writepage */
*pagep = page;
*foliop = folio;
if (f2fs_is_atomic_file(inode))
err = prepare_atomic_write_begin(sbi, page, pos, len,
err = prepare_atomic_write_begin(sbi, &folio->page, pos, len,
&blkaddr, &need_balance, &use_cow);
else
err = prepare_write_begin(sbi, page, pos, len,
err = prepare_write_begin(sbi, &folio->page, pos, len,
&blkaddr, &need_balance);
if (err)
goto fail;
goto put_folio;
if (need_balance && !IS_NOQUOTA(inode) &&
has_not_enough_free_secs(sbi, 0, 0)) {
unlock_page(page);
folio_unlock(folio);
f2fs_balance_fs(sbi, true);
lock_page(page);
if (page->mapping != mapping) {
/* The page got truncated from under us */
f2fs_put_page(page, 1);
folio_lock(folio);
if (folio->mapping != mapping) {
/* The folio got truncated from under us */
folio_unlock(folio);
folio_put(folio);
goto repeat;
}
}
f2fs_wait_on_page_writeback(page, DATA, false, true);
f2fs_wait_on_page_writeback(&folio->page, DATA, false, true);
if (len == PAGE_SIZE || PageUptodate(page))
if (len == folio_size(folio) || folio_test_uptodate(folio))
return 0;
if (!(pos & (PAGE_SIZE - 1)) && (pos + len) >= i_size_read(inode) &&
!f2fs_verity_in_progress(inode)) {
zero_user_segment(page, len, PAGE_SIZE);
folio_zero_segment(folio, len, PAGE_SIZE);
return 0;
}
if (blkaddr == NEW_ADDR) {
zero_user_segment(page, 0, PAGE_SIZE);
SetPageUptodate(page);
folio_zero_segment(folio, 0, folio_size(folio));
folio_mark_uptodate(folio);
} else {
if (!f2fs_is_valid_blkaddr(sbi, blkaddr,
DATA_GENERIC_ENHANCE_READ)) {
err = -EFSCORRUPTED;
goto fail;
goto put_folio;
}
err = f2fs_submit_page_read(use_cow ?
F2FS_I(inode)->cow_inode : inode, page,
F2FS_I(inode)->cow_inode : inode, &folio->page,
blkaddr, 0, true);
if (err)
goto fail;
goto put_folio;
lock_page(page);
if (unlikely(page->mapping != mapping)) {
f2fs_put_page(page, 1);
folio_lock(folio);
if (unlikely(folio->mapping != mapping)) {
folio_unlock(folio);
folio_put(folio);
goto repeat;
}
if (unlikely(!PageUptodate(page))) {
if (unlikely(!folio_test_uptodate(folio))) {
err = -EIO;
goto fail;
goto put_folio;
}
}
return 0;
put_folio:
folio_unlock(folio);
folio_put(folio);
fail:
f2fs_put_page(page, 1);
f2fs_write_failed(inode, pos + len);
return err;
}
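
Besides the mechanical conversion, the error handling changes shape in two ways: f2fs_pagecache_get_page() returned NULL on allocation failure, whereas __filemap_get_folio() returns an ERR_PTR() and never NULL, hence the IS_ERR()/PTR_ERR() pair; and the new put_folio label is needed because folio_unlock()+folio_put() replace the f2fs_put_page(page, 1) that the shared fail path used to do. A minimal sketch of the lookup idiom:

	folio = __filemap_get_folio(mapping, index,
				    FGP_LOCK | FGP_WRITE | FGP_CREAT, GFP_NOFS);
	if (IS_ERR(folio))
		return PTR_ERR(folio);	/* e.g. -ENOMEM; never NULL */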
@ -3689,9 +3695,9 @@ fail:
static int f2fs_write_end(struct file *file,
struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
struct folio *folio, void *fsdata)
{
struct inode *inode = page->mapping->host;
struct inode *inode = folio->mapping->host;
trace_f2fs_write_end(inode, pos, len, copied);
@ -3700,17 +3706,17 @@ static int f2fs_write_end(struct file *file,
* should be PAGE_SIZE. Otherwise, we treat it with zero copied and
* let generic_perform_write() try to copy data again through copied=0.
*/
if (!PageUptodate(page)) {
if (!folio_test_uptodate(folio)) {
if (unlikely(copied != len))
copied = 0;
else
SetPageUptodate(page);
folio_mark_uptodate(folio);
}
#ifdef CONFIG_F2FS_FS_COMPRESSION
/* overwrite compressed file */
if (f2fs_compressed_file(inode) && fsdata) {
f2fs_compress_write_end(inode, fsdata, page->index, copied);
f2fs_compress_write_end(inode, fsdata, folio->index, copied);
f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
if (pos + copied > i_size_read(inode) &&
@ -3723,10 +3729,10 @@ static int f2fs_write_end(struct file *file,
if (!copied)
goto unlock_out;
set_page_dirty(page);
folio_mark_dirty(folio);
if (f2fs_is_atomic_file(inode))
set_page_private_atomic(page);
set_page_private_atomic(&folio->page);
if (pos + copied > i_size_read(inode) &&
!f2fs_verity_in_progress(inode)) {
@ -3736,7 +3742,8 @@ static int f2fs_write_end(struct file *file,
pos + copied);
}
unlock_out:
f2fs_put_page(page, 1);
folio_unlock(folio);
folio_put(folio);
f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
return copied;
}

View File

@ -2676,7 +2676,7 @@ static ssize_t f2fs_quota_write(struct super_block *sb, int type,
const struct address_space_operations *a_ops = mapping->a_ops;
int offset = off & (sb->s_blocksize - 1);
size_t towrite = len;
struct page *page;
struct folio *folio;
void *fsdata = NULL;
int err = 0;
int tocopy;
@ -2686,7 +2686,7 @@ static ssize_t f2fs_quota_write(struct super_block *sb, int type,
towrite);
retry:
err = a_ops->write_begin(NULL, mapping, off, tocopy,
&page, &fsdata);
&folio, &fsdata);
if (unlikely(err)) {
if (err == -ENOMEM) {
f2fs_io_schedule_timeout(DEFAULT_IO_TIMEOUT);
@ -2696,10 +2696,10 @@ retry:
break;
}
memcpy_to_page(page, offset, data, tocopy);
memcpy_to_folio(folio, offset_in_folio(folio, off), data, tocopy);
a_ops->write_end(NULL, mapping, off, tocopy, tocopy,
page, fsdata);
folio, fsdata);
offset = 0;
towrite -= tocopy;
off += tocopy;

View File

@ -80,17 +80,17 @@ static int pagecache_write(struct inode *inode, const void *buf, size_t count,
while (count) {
size_t n = min_t(size_t, count,
PAGE_SIZE - offset_in_page(pos));
struct page *page;
struct folio *folio;
void *fsdata = NULL;
int res;
res = aops->write_begin(NULL, mapping, pos, n, &page, &fsdata);
res = aops->write_begin(NULL, mapping, pos, n, &folio, &fsdata);
if (res)
return res;
memcpy_to_page(page, offset_in_page(pos), buf, n);
memcpy_to_folio(folio, offset_in_folio(folio, pos), buf, n);
res = aops->write_end(NULL, mapping, pos, n, n, page, fsdata);
res = aops->write_end(NULL, mapping, pos, n, n, folio, fsdata);
if (res < 0)
return res;
if (res != n)

View File

@ -221,13 +221,12 @@ static void fat_write_failed(struct address_space *mapping, loff_t to)
static int fat_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len,
struct page **pagep, void **fsdata)
struct folio **foliop, void **fsdata)
{
int err;
*pagep = NULL;
err = cont_write_begin(file, mapping, pos, len,
pagep, fsdata, fat_get_block,
foliop, fsdata, fat_get_block,
&MSDOS_I(mapping->host)->mmu_private);
if (err < 0)
fat_write_failed(mapping, pos + len);
@ -236,11 +235,11 @@ static int fat_write_begin(struct file *file, struct address_space *mapping,
static int fat_write_end(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *pagep, void *fsdata)
struct folio *folio, void *fsdata)
{
struct inode *inode = mapping->host;
int err;
err = generic_write_end(file, mapping, pos, len, copied, pagep, fsdata);
err = generic_write_end(file, mapping, pos, len, copied, folio, fsdata);
if (err < len)
fat_write_failed(mapping, pos + len);
if (!(err < 0) && !(MSDOS_I(inode)->i_attrs & ATTR_ARCH)) {

View File

@ -33,6 +33,8 @@
#include <asm/siginfo.h>
#include <linux/uaccess.h>
#include "internal.h"
#define SETFL_MASK (O_APPEND | O_NONBLOCK | O_NDELAY | O_DIRECT | O_NOATIME)
static int setfl(int fd, struct file * filp, unsigned int arg)
@ -87,22 +89,64 @@ static int setfl(int fd, struct file * filp, unsigned int arg)
return error;
}
/*
 * Allocate a file->f_owner struct if it doesn't exist, handling racing
* allocations correctly.
*/
int file_f_owner_allocate(struct file *file)
{
struct fown_struct *f_owner;
f_owner = file_f_owner(file);
if (f_owner)
return 0;
f_owner = kzalloc(sizeof(struct fown_struct), GFP_KERNEL);
if (!f_owner)
return -ENOMEM;
rwlock_init(&f_owner->lock);
f_owner->file = file;
/* If someone else raced us, drop our allocation. */
if (unlikely(cmpxchg(&file->f_owner, NULL, f_owner)))
kfree(f_owner);
return 0;
}
EXPORT_SYMBOL(file_f_owner_allocate);
void file_f_owner_release(struct file *file)
{
struct fown_struct *f_owner;
f_owner = file_f_owner(file);
if (f_owner) {
put_pid(f_owner->pid);
kfree(f_owner);
}
}
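
f_owner is now allocated on first use instead of living inside struct file. The allocation is lock-free: both racers allocate, exactly one publishes its pointer with cmpxchg(), and the loser frees its copy. The same pattern in generic form (a sketch; 'foo' and 'obj' are placeholders, not names from the diff):

	struct foo *new = kzalloc(sizeof(*new), GFP_KERNEL);
	if (!new)
		return -ENOMEM;
	/* Publish only if the field is still NULL; otherwise we lost the race. */
	if (cmpxchg(&obj->ptr, NULL, new))
		kfree(new);
	return 0;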
static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
int force)
{
write_lock_irq(&filp->f_owner.lock);
if (force || !filp->f_owner.pid) {
put_pid(filp->f_owner.pid);
filp->f_owner.pid = get_pid(pid);
filp->f_owner.pid_type = type;
struct fown_struct *f_owner;
f_owner = file_f_owner(filp);
if (WARN_ON_ONCE(!f_owner))
return;
write_lock_irq(&f_owner->lock);
if (force || !f_owner->pid) {
put_pid(f_owner->pid);
f_owner->pid = get_pid(pid);
f_owner->pid_type = type;
if (pid) {
const struct cred *cred = current_cred();
filp->f_owner.uid = cred->uid;
filp->f_owner.euid = cred->euid;
f_owner->uid = cred->uid;
f_owner->euid = cred->euid;
}
}
write_unlock_irq(&filp->f_owner.lock);
write_unlock_irq(&f_owner->lock);
}
void __f_setown(struct file *filp, struct pid *pid, enum pid_type type,
@ -119,6 +163,8 @@ int f_setown(struct file *filp, int who, int force)
struct pid *pid = NULL;
int ret = 0;
might_sleep();
type = PIDTYPE_TGID;
if (who < 0) {
/* avoid overflow below */
@ -129,6 +175,10 @@ int f_setown(struct file *filp, int who, int force)
who = -who;
}
ret = file_f_owner_allocate(filp);
if (ret)
return ret;
rcu_read_lock();
if (who) {
pid = find_vpid(who);
@ -152,16 +202,21 @@ void f_delown(struct file *filp)
pid_t f_getown(struct file *filp)
{
pid_t pid = 0;
struct fown_struct *f_owner;
read_lock_irq(&filp->f_owner.lock);
f_owner = file_f_owner(filp);
if (!f_owner)
return pid;
read_lock_irq(&f_owner->lock);
rcu_read_lock();
if (pid_task(filp->f_owner.pid, filp->f_owner.pid_type)) {
pid = pid_vnr(filp->f_owner.pid);
if (filp->f_owner.pid_type == PIDTYPE_PGID)
if (pid_task(f_owner->pid, f_owner->pid_type)) {
pid = pid_vnr(f_owner->pid);
if (f_owner->pid_type == PIDTYPE_PGID)
pid = -pid;
}
rcu_read_unlock();
read_unlock_irq(&filp->f_owner.lock);
read_unlock_irq(&f_owner->lock);
return pid;
}
@ -194,6 +249,10 @@ static int f_setown_ex(struct file *filp, unsigned long arg)
return -EINVAL;
}
ret = file_f_owner_allocate(filp);
if (ret)
return ret;
rcu_read_lock();
pid = find_vpid(owner.pid);
if (owner.pid && !pid)
@ -210,13 +269,20 @@ static int f_getown_ex(struct file *filp, unsigned long arg)
struct f_owner_ex __user *owner_p = (void __user *)arg;
struct f_owner_ex owner = {};
int ret = 0;
struct fown_struct *f_owner;
enum pid_type pid_type = PIDTYPE_PID;
read_lock_irq(&filp->f_owner.lock);
f_owner = file_f_owner(filp);
if (f_owner) {
read_lock_irq(&f_owner->lock);
rcu_read_lock();
if (pid_task(filp->f_owner.pid, filp->f_owner.pid_type))
owner.pid = pid_vnr(filp->f_owner.pid);
if (pid_task(f_owner->pid, f_owner->pid_type))
owner.pid = pid_vnr(f_owner->pid);
rcu_read_unlock();
switch (filp->f_owner.pid_type) {
pid_type = f_owner->pid_type;
}
switch (pid_type) {
case PIDTYPE_PID:
owner.type = F_OWNER_TID;
break;
@ -234,7 +300,8 @@ static int f_getown_ex(struct file *filp, unsigned long arg)
ret = -EINVAL;
break;
}
read_unlock_irq(&filp->f_owner.lock);
if (f_owner)
read_unlock_irq(&f_owner->lock);
if (!ret) {
ret = copy_to_user(owner_p, &owner, sizeof(owner));
@ -248,14 +315,18 @@ static int f_getown_ex(struct file *filp, unsigned long arg)
static int f_getowner_uids(struct file *filp, unsigned long arg)
{
struct user_namespace *user_ns = current_user_ns();
struct fown_struct *f_owner;
uid_t __user *dst = (void __user *)arg;
uid_t src[2];
uid_t src[2] = {0, 0};
int err;
read_lock_irq(&filp->f_owner.lock);
src[0] = from_kuid(user_ns, filp->f_owner.uid);
src[1] = from_kuid(user_ns, filp->f_owner.euid);
read_unlock_irq(&filp->f_owner.lock);
f_owner = file_f_owner(filp);
if (f_owner) {
read_lock_irq(&f_owner->lock);
src[0] = from_kuid(user_ns, f_owner->uid);
src[1] = from_kuid(user_ns, f_owner->euid);
read_unlock_irq(&f_owner->lock);
}
err = put_user(src[0], &dst[0]);
err |= put_user(src[1], &dst[1]);
@ -343,6 +414,36 @@ static long f_dupfd_query(int fd, struct file *filp)
return f.file == filp;
}
/* Let the caller figure out whether a given file was just created. */
static long f_created_query(const struct file *filp)
{
return !!(filp->f_mode & FMODE_CREATED);
}
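
F_CREATED_QUERY gives userspace a race-free answer to "did my open(O_CREAT) actually create the file?", which previously required fragile stat() comparisons around the open. A minimal usage sketch (the path is illustrative; the constant needs headers from a kernel with this change):

	int fd = open("/tmp/example", O_RDWR | O_CREAT, 0600);
	if (fd >= 0 && fcntl(fd, F_CREATED_QUERY, 0) == 1)
		/* the file did not exist before this open() */;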
static int f_owner_sig(struct file *filp, int signum, bool setsig)
{
int ret = 0;
struct fown_struct *f_owner;
might_sleep();
if (setsig) {
if (!valid_signal(signum))
return -EINVAL;
ret = file_f_owner_allocate(filp);
if (ret)
return ret;
}
f_owner = file_f_owner(filp);
if (setsig)
f_owner->signum = signum;
else if (f_owner)
ret = f_owner->signum;
return ret;
}
static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
struct file *filp)
{
@ -352,6 +453,9 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
long err = -EINVAL;
switch (cmd) {
case F_CREATED_QUERY:
err = f_created_query(filp);
break;
case F_DUPFD:
err = f_dupfd(argi, filp, 0);
break;
@ -421,15 +525,10 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
err = f_getowner_uids(filp, arg);
break;
case F_GETSIG:
err = filp->f_owner.signum;
err = f_owner_sig(filp, 0, false);
break;
case F_SETSIG:
/* arg == 0 restores default behaviour. */
if (!valid_signal(argi)) {
break;
}
err = 0;
filp->f_owner.signum = argi;
err = f_owner_sig(filp, argi, true);
break;
case F_GETLEASE:
err = fcntl_getlease(filp);
@ -463,6 +562,7 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
static int check_fcntl_cmd(unsigned cmd)
{
switch (cmd) {
case F_CREATED_QUERY:
case F_DUPFD:
case F_DUPFD_CLOEXEC:
case F_DUPFD_QUERY:
@ -844,14 +944,19 @@ static void send_sigurg_to_task(struct task_struct *p,
do_send_sig_info(SIGURG, SEND_SIG_PRIV, p, type);
}
int send_sigurg(struct fown_struct *fown)
int send_sigurg(struct file *file)
{
struct fown_struct *fown;
struct task_struct *p;
enum pid_type type;
struct pid *pid;
unsigned long flags;
int ret = 0;
fown = file_f_owner(file);
if (!fown)
return 0;
read_lock_irqsave(&fown->lock, flags);
type = fown->pid_type;
@ -1027,13 +1132,16 @@ static void kill_fasync_rcu(struct fasync_struct *fa, int sig, int band)
}
read_lock_irqsave(&fa->fa_lock, flags);
if (fa->fa_file) {
fown = &fa->fa_file->f_owner;
fown = file_f_owner(fa->fa_file);
if (!fown)
goto next;
/* Don't send SIGURG to processes which have not set a
queued signum: SIGURG has its own default signalling
mechanism. */
if (!(sig == SIGURG && fown->signum == 0))
send_sigio(fown, fa->fa_fd, band);
}
next:
read_unlock_irqrestore(&fa->fa_lock, flags);
fa = rcu_dereference(fa->fa_next);
}

View File

@ -657,7 +657,7 @@ int close_fd(unsigned fd)
return filp_close(file, files);
}
EXPORT_SYMBOL(close_fd); /* for ksys_close() */
EXPORT_SYMBOL(close_fd);
/**
* last_fd - return last valid index into fd table

View File

@ -136,6 +136,7 @@ static int __init init_fs_stat_sysctls(void)
register_sysctl_init("fs", fs_stat_sysctls);
if (IS_ENABLED(CONFIG_BINFMT_MISC)) {
struct ctl_table_header *hdr;
hdr = register_sysctl_mount_point("fs/binfmt_misc");
kmemleak_not_leak(hdr);
}
@ -155,7 +156,6 @@ static int init_file(struct file *f, int flags, const struct cred *cred)
return error;
}
rwlock_init(&f->f_owner.lock);
spin_lock_init(&f->f_lock);
mutex_init(&f->f_pos_lock);
f->f_flags = flags;
@ -383,7 +383,9 @@ EXPORT_SYMBOL_GPL(alloc_file_pseudo_noaccount);
struct file *alloc_file_clone(struct file *base, int flags,
const struct file_operations *fops)
{
struct file *f = alloc_file(&base->f_path, flags, fops);
struct file *f;
f = alloc_file(&base->f_path, flags, fops);
if (!IS_ERR(f)) {
path_get(&f->f_path);
f->f_mapping = base->f_mapping;
@ -425,7 +427,7 @@ static void __fput(struct file *file)
cdev_put(inode->i_cdev);
}
fops_put(file->f_op);
put_pid(file->f_owner.pid);
file_f_owner_release(file);
put_file_access(file);
dput(dentry);
if (unlikely(mode & FMODE_NEED_UNMOUNT))
@ -512,9 +514,9 @@ EXPORT_SYMBOL(__fput_sync);
void __init files_init(void)
{
filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
SLAB_TYPESAFE_BY_RCU | SLAB_HWCACHE_ALIGN |
SLAB_PANIC | SLAB_ACCOUNT, NULL);
filp_cachep = kmem_cache_create_rcu("filp", sizeof(struct file),
offsetof(struct file, f_freeptr),
SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT);
percpu_counter_init(&nr_files, 0, GFP_KERNEL);
}

View File

@ -1132,6 +1132,7 @@ out_bdi_put:
/**
* cgroup_writeback_umount - flush inode wb switches for umount
* @sb: target super_block
*
* This function is called when a super_block is about to be destroyed and
* flushes in-flight inode wb switches. An inode wb switch goes through
@ -1140,8 +1141,12 @@ out_bdi_put:
* rare occurrences and synchronize_rcu() can take a while, perform
* flushing iff wb switches are in flight.
*/
void cgroup_writeback_umount(void)
void cgroup_writeback_umount(struct super_block *sb)
{
if (!(sb->s_bdi->capabilities & BDI_CAP_WRITEBACK))
return;
/*
* SB_ACTIVE should be reliably cleared before checking
* isw_nr_in_flight, see generic_shutdown_super().
@ -1381,12 +1386,13 @@ static void requeue_io(struct inode *inode, struct bdi_writeback *wb)
static void inode_sync_complete(struct inode *inode)
{
assert_spin_locked(&inode->i_lock);
inode->i_state &= ~I_SYNC;
	/* If inode is clean and unused, put it into LRU now... */
inode_add_lru(inode);
/* Waiters must see I_SYNC cleared before being woken up */
smp_mb();
wake_up_bit(&inode->i_state, __I_SYNC);
/* Called with inode->i_lock which ensures memory ordering. */
inode_wake_up_bit(inode, __I_SYNC);
}
static bool inode_dirtied_after(struct inode *inode, unsigned long t)
@ -1505,30 +1511,27 @@ static int write_inode(struct inode *inode, struct writeback_control *wbc)
* Wait for writeback on an inode to complete. Called with i_lock held.
* Caller must make sure inode cannot go away when we drop i_lock.
*/
static void __inode_wait_for_writeback(struct inode *inode)
__releases(inode->i_lock)
__acquires(inode->i_lock)
{
DEFINE_WAIT_BIT(wq, &inode->i_state, __I_SYNC);
wait_queue_head_t *wqh;
wqh = bit_waitqueue(&inode->i_state, __I_SYNC);
while (inode->i_state & I_SYNC) {
spin_unlock(&inode->i_lock);
__wait_on_bit(wqh, &wq, bit_wait,
TASK_UNINTERRUPTIBLE);
spin_lock(&inode->i_lock);
}
}
/*
* Wait for writeback on an inode to complete. Caller must have inode pinned.
*/
void inode_wait_for_writeback(struct inode *inode)
{
spin_lock(&inode->i_lock);
__inode_wait_for_writeback(inode);
struct wait_bit_queue_entry wqe;
struct wait_queue_head *wq_head;
assert_spin_locked(&inode->i_lock);
if (!(inode->i_state & I_SYNC))
return;
wq_head = inode_bit_waitqueue(&wqe, inode, __I_SYNC);
for (;;) {
prepare_to_wait_event(wq_head, &wqe.wq_entry, TASK_UNINTERRUPTIBLE);
/* Checking I_SYNC with inode->i_lock guarantees memory ordering. */
if (!(inode->i_state & I_SYNC))
break;
spin_unlock(&inode->i_lock);
schedule();
spin_lock(&inode->i_lock);
}
finish_wait(wq_head, &wqe.wq_entry);
}
/*
@ -1539,16 +1542,20 @@ void inode_wait_for_writeback(struct inode *inode)
static void inode_sleep_on_writeback(struct inode *inode)
__releases(inode->i_lock)
{
DEFINE_WAIT(wait);
wait_queue_head_t *wqh = bit_waitqueue(&inode->i_state, __I_SYNC);
int sleep;
struct wait_bit_queue_entry wqe;
struct wait_queue_head *wq_head;
bool sleep;
prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
sleep = inode->i_state & I_SYNC;
assert_spin_locked(&inode->i_lock);
wq_head = inode_bit_waitqueue(&wqe, inode, __I_SYNC);
prepare_to_wait_event(wq_head, &wqe.wq_entry, TASK_UNINTERRUPTIBLE);
/* Checking I_SYNC with inode->i_lock guarantees memory ordering. */
sleep = !!(inode->i_state & I_SYNC);
spin_unlock(&inode->i_lock);
if (sleep)
schedule();
finish_wait(wqh, &wait);
finish_wait(wq_head, &wqe.wq_entry);
}
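
Both wait helpers now build a wait_bit_queue_entry on the caller's stack against the queue returned by inode_bit_waitqueue(); re-checking I_SYNC under inode->i_lock supplies the ordering the old smp_mb()/bit_waitqueue() pairing provided. The wake side, inode_sync_complete() above, pairs with this as:

	/* Both sides run under inode->i_lock, which orders the I_SYNC
	 * update against the waiter's re-check; no explicit barrier. */
	inode->i_state &= ~I_SYNC;
	inode_wake_up_bit(inode, __I_SYNC);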
/*
@ -1752,7 +1759,7 @@ static int writeback_single_inode(struct inode *inode,
*/
if (wbc->sync_mode != WB_SYNC_ALL)
goto out;
__inode_wait_for_writeback(inode);
inode_wait_for_writeback(inode);
}
WARN_ON(inode->i_state & I_SYNC);
/*

View File

@ -12,7 +12,6 @@
#include <linux/posix_acl_xattr.h>
static struct posix_acl *__fuse_get_acl(struct fuse_conn *fc,
struct mnt_idmap *idmap,
struct inode *inode, int type, bool rcu)
{
int size;
@ -74,7 +73,7 @@ struct posix_acl *fuse_get_acl(struct mnt_idmap *idmap,
if (fuse_no_acl(fc, inode))
return ERR_PTR(-EOPNOTSUPP);
return __fuse_get_acl(fc, idmap, inode, type, false);
return __fuse_get_acl(fc, inode, type, false);
}
struct posix_acl *fuse_get_inode_acl(struct inode *inode, int type, bool rcu)
@ -90,8 +89,7 @@ struct posix_acl *fuse_get_inode_acl(struct inode *inode, int type, bool rcu)
*/
if (!fc->posix_acl)
return NULL;
return __fuse_get_acl(fc, &nop_mnt_idmap, inode, type, rcu);
return __fuse_get_acl(fc, inode, type, rcu);
}
int fuse_set_acl(struct mnt_idmap *idmap, struct dentry *dentry,
@ -146,8 +144,8 @@ int fuse_set_acl(struct mnt_idmap *idmap, struct dentry *dentry,
* be stripped.
*/
if (fc->posix_acl &&
!in_group_or_capable(&nop_mnt_idmap, inode,
i_gid_into_vfsgid(&nop_mnt_idmap, inode)))
!in_group_or_capable(idmap, inode,
i_gid_into_vfsgid(idmap, inode)))
extra_flags |= FUSE_SETXATTR_ACL_KILL_SGID;
ret = fuse_setxattr(inode, name, value, size, 0, extra_flags);

View File

@ -572,7 +572,33 @@ static int get_create_supp_group(struct inode *dir, struct fuse_in_arg *ext)
return 0;
}
static int get_create_ext(struct fuse_args *args,
static int get_owner_uid_gid(struct mnt_idmap *idmap, struct fuse_conn *fc, struct fuse_in_arg *ext)
{
struct fuse_ext_header *xh;
struct fuse_owner_uid_gid *owner_creds;
u32 owner_creds_len = fuse_ext_size(sizeof(*owner_creds));
kuid_t owner_fsuid;
kgid_t owner_fsgid;
xh = extend_arg(ext, owner_creds_len);
if (!xh)
return -ENOMEM;
xh->size = owner_creds_len;
xh->type = FUSE_EXT_OWNER_UID_GID;
owner_creds = (struct fuse_owner_uid_gid *) &xh[1];
owner_fsuid = mapped_fsuid(idmap, fc->user_ns);
owner_fsgid = mapped_fsgid(idmap, fc->user_ns);
owner_creds->uid = from_kuid(fc->user_ns, owner_fsuid);
owner_creds->gid = from_kgid(fc->user_ns, owner_fsgid);
return 0;
}
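
FUSE_EXT_OWNER_UID_GID forwards the caller's fs{u,g}id, mapped through the mount's idmapping into the connection's user namespace, as an extension appended to the create request. The wire format is assumed here to be the obvious pair of 32-bit ids (a sketch of the uapi layout implied by the cast above, not verified against the header):

	struct fuse_owner_uid_gid {
		uint32_t	uid;
		uint32_t	gid;
	};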
static int get_create_ext(struct mnt_idmap *idmap,
struct fuse_args *args,
struct inode *dir, struct dentry *dentry,
umode_t mode)
{
@ -584,6 +610,8 @@ static int get_create_ext(struct fuse_args *args,
err = get_security_context(dentry, mode, &ext);
if (!err && fc->create_supp_group)
err = get_create_supp_group(dir, &ext);
if (!err && fc->owner_uid_gid_ext)
err = get_owner_uid_gid(idmap, fc, &ext);
if (!err && ext.size) {
WARN_ON(args->in_numargs >= ARRAY_SIZE(args->in_args));
@ -609,9 +637,9 @@ static void free_ext_value(struct fuse_args *args)
* If the filesystem doesn't support this, then fall back to separate
* 'mknod' + 'open' requests.
*/
static int fuse_create_open(struct inode *dir, struct dentry *entry,
struct file *file, unsigned int flags,
umode_t mode, u32 opcode)
static int fuse_create_open(struct mnt_idmap *idmap, struct inode *dir,
struct dentry *entry, struct file *file,
unsigned int flags, umode_t mode, u32 opcode)
{
int err;
struct inode *inode;
@ -668,7 +696,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
args.out_args[1].size = sizeof(*outopenp);
args.out_args[1].value = outopenp;
err = get_create_ext(&args, dir, entry, mode);
err = get_create_ext(idmap, &args, dir, entry, mode);
if (err)
goto out_put_forget_req;
@ -729,6 +757,7 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
umode_t mode)
{
int err;
struct mnt_idmap *idmap = file_mnt_idmap(file);
struct fuse_conn *fc = get_fuse_conn(dir);
struct dentry *res = NULL;
@ -753,7 +782,7 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
if (fc->no_create)
goto mknod;
err = fuse_create_open(dir, entry, file, flags, mode, FUSE_CREATE);
err = fuse_create_open(idmap, dir, entry, file, flags, mode, FUSE_CREATE);
if (err == -ENOSYS) {
fc->no_create = 1;
goto mknod;
@ -764,7 +793,7 @@ out_dput:
return err;
mknod:
err = fuse_mknod(&nop_mnt_idmap, dir, entry, mode, 0);
err = fuse_mknod(idmap, dir, entry, mode, 0);
if (err)
goto out_dput;
no_open:
@ -774,9 +803,9 @@ no_open:
/*
* Code shared between mknod, mkdir, symlink and link
*/
static int create_new_entry(struct fuse_mount *fm, struct fuse_args *args,
struct inode *dir, struct dentry *entry,
umode_t mode)
static int create_new_entry(struct mnt_idmap *idmap, struct fuse_mount *fm,
struct fuse_args *args, struct inode *dir,
struct dentry *entry, umode_t mode)
{
struct fuse_entry_out outarg;
struct inode *inode;
@ -798,7 +827,7 @@ static int create_new_entry(struct fuse_mount *fm, struct fuse_args *args,
args->out_args[0].value = &outarg;
if (args->opcode != FUSE_LINK) {
err = get_create_ext(args, dir, entry, mode);
err = get_create_ext(idmap, args, dir, entry, mode);
if (err)
goto out_put_forget_req;
}
@ -864,13 +893,13 @@ static int fuse_mknod(struct mnt_idmap *idmap, struct inode *dir,
args.in_args[0].value = &inarg;
args.in_args[1].size = entry->d_name.len + 1;
args.in_args[1].value = entry->d_name.name;
return create_new_entry(fm, &args, dir, entry, mode);
return create_new_entry(idmap, fm, &args, dir, entry, mode);
}
static int fuse_create(struct mnt_idmap *idmap, struct inode *dir,
struct dentry *entry, umode_t mode, bool excl)
{
return fuse_mknod(&nop_mnt_idmap, dir, entry, mode, 0);
return fuse_mknod(idmap, dir, entry, mode, 0);
}
static int fuse_tmpfile(struct mnt_idmap *idmap, struct inode *dir,
@ -882,7 +911,7 @@ static int fuse_tmpfile(struct mnt_idmap *idmap, struct inode *dir,
if (fc->no_tmpfile)
return -EOPNOTSUPP;
err = fuse_create_open(dir, file->f_path.dentry, file, file->f_flags, mode, FUSE_TMPFILE);
err = fuse_create_open(idmap, dir, file->f_path.dentry, file, file->f_flags, mode, FUSE_TMPFILE);
if (err == -ENOSYS) {
fc->no_tmpfile = 1;
err = -EOPNOTSUPP;
@ -909,7 +938,7 @@ static int fuse_mkdir(struct mnt_idmap *idmap, struct inode *dir,
args.in_args[0].value = &inarg;
args.in_args[1].size = entry->d_name.len + 1;
args.in_args[1].value = entry->d_name.name;
return create_new_entry(fm, &args, dir, entry, S_IFDIR);
return create_new_entry(idmap, fm, &args, dir, entry, S_IFDIR);
}
static int fuse_symlink(struct mnt_idmap *idmap, struct inode *dir,
@ -925,7 +954,7 @@ static int fuse_symlink(struct mnt_idmap *idmap, struct inode *dir,
args.in_args[0].value = entry->d_name.name;
args.in_args[1].size = len;
args.in_args[1].value = link;
return create_new_entry(fm, &args, dir, entry, S_IFLNK);
return create_new_entry(idmap, fm, &args, dir, entry, S_IFLNK);
}
void fuse_flush_time_update(struct inode *inode)
@ -1082,6 +1111,9 @@ static int fuse_rename2(struct mnt_idmap *idmap, struct inode *olddir,
if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT))
return -EINVAL;
if ((flags & RENAME_WHITEOUT) && (idmap != &nop_mnt_idmap))
return -EINVAL;
if (flags) {
if (fc->no_rename2 || fc->minor < 23)
return -EINVAL;
@ -1119,7 +1151,7 @@ static int fuse_link(struct dentry *entry, struct inode *newdir,
args.in_args[0].value = &inarg;
args.in_args[1].size = newent->d_name.len + 1;
args.in_args[1].value = newent->d_name.name;
err = create_new_entry(fm, &args, newdir, newent, inode->i_mode);
err = create_new_entry(&nop_mnt_idmap, fm, &args, newdir, newent, inode->i_mode);
if (!err)
fuse_update_ctime_in_cache(inode);
else if (err == -EINTR)
@ -1128,18 +1160,22 @@ static int fuse_link(struct dentry *entry, struct inode *newdir,
return err;
}
static void fuse_fillattr(struct inode *inode, struct fuse_attr *attr,
struct kstat *stat)
static void fuse_fillattr(struct mnt_idmap *idmap, struct inode *inode,
struct fuse_attr *attr, struct kstat *stat)
{
unsigned int blkbits;
struct fuse_conn *fc = get_fuse_conn(inode);
vfsuid_t vfsuid = make_vfsuid(idmap, fc->user_ns,
make_kuid(fc->user_ns, attr->uid));
vfsgid_t vfsgid = make_vfsgid(idmap, fc->user_ns,
make_kgid(fc->user_ns, attr->gid));
stat->dev = inode->i_sb->s_dev;
stat->ino = attr->ino;
stat->mode = (inode->i_mode & S_IFMT) | (attr->mode & 07777);
stat->nlink = attr->nlink;
stat->uid = make_kuid(fc->user_ns, attr->uid);
stat->gid = make_kgid(fc->user_ns, attr->gid);
stat->uid = vfsuid_into_kuid(vfsuid);
stat->gid = vfsgid_into_kgid(vfsgid);
stat->rdev = inode->i_rdev;
stat->atime.tv_sec = attr->atime;
stat->atime.tv_nsec = attr->atimensec;
@ -1178,8 +1214,8 @@ static void fuse_statx_to_attr(struct fuse_statx *sx, struct fuse_attr *attr)
attr->blksize = sx->blksize;
}
static int fuse_do_statx(struct inode *inode, struct file *file,
struct kstat *stat)
static int fuse_do_statx(struct mnt_idmap *idmap, struct inode *inode,
struct file *file, struct kstat *stat)
{
int err;
struct fuse_attr attr;
@ -1232,15 +1268,15 @@ static int fuse_do_statx(struct inode *inode, struct file *file,
stat->result_mask = sx->mask & (STATX_BASIC_STATS | STATX_BTIME);
stat->btime.tv_sec = sx->btime.tv_sec;
stat->btime.tv_nsec = min_t(u32, sx->btime.tv_nsec, NSEC_PER_SEC - 1);
fuse_fillattr(inode, &attr, stat);
fuse_fillattr(idmap, inode, &attr, stat);
stat->result_mask |= STATX_TYPE;
}
return 0;
}
static int fuse_do_getattr(struct inode *inode, struct kstat *stat,
struct file *file)
static int fuse_do_getattr(struct mnt_idmap *idmap, struct inode *inode,
struct kstat *stat, struct file *file)
{
int err;
struct fuse_getattr_in inarg;
@ -1279,15 +1315,15 @@ static int fuse_do_getattr(struct inode *inode, struct kstat *stat,
ATTR_TIMEOUT(&outarg),
attr_version);
if (stat)
fuse_fillattr(inode, &outarg.attr, stat);
fuse_fillattr(idmap, inode, &outarg.attr, stat);
}
}
return err;
}
static int fuse_update_get_attr(struct inode *inode, struct file *file,
struct kstat *stat, u32 request_mask,
unsigned int flags)
static int fuse_update_get_attr(struct mnt_idmap *idmap, struct inode *inode,
struct file *file, struct kstat *stat,
u32 request_mask, unsigned int flags)
{
struct fuse_inode *fi = get_fuse_inode(inode);
struct fuse_conn *fc = get_fuse_conn(inode);
@ -1318,17 +1354,17 @@ retry:
forget_all_cached_acls(inode);
/* Try statx if BTIME is requested */
if (!fc->no_statx && (request_mask & ~STATX_BASIC_STATS)) {
err = fuse_do_statx(inode, file, stat);
err = fuse_do_statx(idmap, inode, file, stat);
if (err == -ENOSYS) {
fc->no_statx = 1;
err = 0;
goto retry;
}
} else {
err = fuse_do_getattr(inode, stat, file);
err = fuse_do_getattr(idmap, inode, stat, file);
}
} else if (stat) {
generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
generic_fillattr(idmap, request_mask, inode, stat);
stat->mode = fi->orig_i_mode;
stat->ino = fi->orig_ino;
if (test_bit(FUSE_I_BTIME, &fi->state)) {
@ -1342,7 +1378,7 @@ retry:
int fuse_update_attributes(struct inode *inode, struct file *file, u32 mask)
{
return fuse_update_get_attr(inode, file, NULL, mask, 0);
return fuse_update_get_attr(&nop_mnt_idmap, inode, file, NULL, mask, 0);
}
int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
@ -1486,7 +1522,7 @@ static int fuse_perm_getattr(struct inode *inode, int mask)
return -ECHILD;
forget_all_cached_acls(inode);
return fuse_do_getattr(inode, NULL, NULL);
return fuse_do_getattr(&nop_mnt_idmap, inode, NULL, NULL);
}
/*
@ -1534,7 +1570,7 @@ static int fuse_permission(struct mnt_idmap *idmap,
}
if (fc->default_permissions) {
err = generic_permission(&nop_mnt_idmap, inode, mask);
err = generic_permission(idmap, inode, mask);
/* If permission is denied, try to refresh file
attributes. This is also needed, because the root
@ -1542,7 +1578,7 @@ static int fuse_permission(struct mnt_idmap *idmap,
if (err == -EACCES && !refreshed) {
err = fuse_perm_getattr(inode, mask);
if (!err)
err = generic_permission(&nop_mnt_idmap,
err = generic_permission(idmap,
inode, mask);
}
@ -1738,17 +1774,27 @@ static bool update_mtime(unsigned ivalid, bool trust_local_mtime)
return true;
}
static void iattr_to_fattr(struct fuse_conn *fc, struct iattr *iattr,
struct fuse_setattr_in *arg, bool trust_local_cmtime)
static void iattr_to_fattr(struct mnt_idmap *idmap, struct fuse_conn *fc,
struct iattr *iattr, struct fuse_setattr_in *arg,
bool trust_local_cmtime)
{
unsigned ivalid = iattr->ia_valid;
if (ivalid & ATTR_MODE)
arg->valid |= FATTR_MODE, arg->mode = iattr->ia_mode;
if (ivalid & ATTR_UID)
arg->valid |= FATTR_UID, arg->uid = from_kuid(fc->user_ns, iattr->ia_uid);
if (ivalid & ATTR_GID)
arg->valid |= FATTR_GID, arg->gid = from_kgid(fc->user_ns, iattr->ia_gid);
if (ivalid & ATTR_UID) {
kuid_t fsuid = from_vfsuid(idmap, fc->user_ns, iattr->ia_vfsuid);
arg->valid |= FATTR_UID;
arg->uid = from_kuid(fc->user_ns, fsuid);
}
if (ivalid & ATTR_GID) {
kgid_t fsgid = from_vfsgid(idmap, fc->user_ns, iattr->ia_vfsgid);
arg->valid |= FATTR_GID;
arg->gid = from_kgid(fc->user_ns, fsgid);
}
if (ivalid & ATTR_SIZE)
arg->valid |= FATTR_SIZE, arg->size = iattr->ia_size;
if (ivalid & ATTR_ATIME) {
@ -1868,8 +1914,8 @@ int fuse_flush_times(struct inode *inode, struct fuse_file *ff)
* vmtruncate() doesn't allow for this case, so do the rlimit checking
* and the actual truncation by hand.
*/
int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
struct file *file)
int fuse_do_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
struct iattr *attr, struct file *file)
{
struct inode *inode = d_inode(dentry);
struct fuse_mount *fm = get_fuse_mount(inode);
@ -1889,7 +1935,7 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
if (!fc->default_permissions)
attr->ia_valid |= ATTR_FORCE;
err = setattr_prepare(&nop_mnt_idmap, dentry, attr);
err = setattr_prepare(idmap, dentry, attr);
if (err)
return err;
@ -1948,7 +1994,7 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
memset(&inarg, 0, sizeof(inarg));
memset(&outarg, 0, sizeof(outarg));
iattr_to_fattr(fc, attr, &inarg, trust_local_cmtime);
iattr_to_fattr(idmap, fc, attr, &inarg, trust_local_cmtime);
if (file) {
struct fuse_file *ff = file->private_data;
inarg.valid |= FATTR_FH;
@ -2065,7 +2111,7 @@ static int fuse_setattr(struct mnt_idmap *idmap, struct dentry *entry,
* ia_mode calculation may have used stale i_mode.
* Refresh and recalculate.
*/
ret = fuse_do_getattr(inode, NULL, file);
ret = fuse_do_getattr(idmap, inode, NULL, file);
if (ret)
return ret;
@ -2083,7 +2129,7 @@ static int fuse_setattr(struct mnt_idmap *idmap, struct dentry *entry,
if (!attr->ia_valid)
return 0;
ret = fuse_do_setattr(entry, attr, file);
ret = fuse_do_setattr(idmap, entry, attr, file);
if (!ret) {
/*
* If filesystem supports acls it may have updated acl xattrs in
@ -2122,7 +2168,7 @@ static int fuse_getattr(struct mnt_idmap *idmap,
return -EACCES;
}
return fuse_update_get_attr(inode, NULL, stat, request_mask, flags);
return fuse_update_get_attr(idmap, inode, NULL, stat, request_mask, flags);
}
static const struct inode_operations fuse_dir_inode_operations = {

View File

@ -2387,76 +2387,77 @@ out:
 * but how to implement it without killing performance needs more thinking.
*/
static int fuse_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, struct page **pagep, void **fsdata)
loff_t pos, unsigned len, struct folio **foliop, void **fsdata)
{
pgoff_t index = pos >> PAGE_SHIFT;
struct fuse_conn *fc = get_fuse_conn(file_inode(file));
struct page *page;
struct folio *folio;
loff_t fsize;
int err = -ENOMEM;
WARN_ON(!fc->writeback_cache);
page = grab_cache_page_write_begin(mapping, index);
if (!page)
folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
mapping_gfp_mask(mapping));
if (IS_ERR(folio))
goto error;
fuse_wait_on_page_writeback(mapping->host, page->index);
fuse_wait_on_page_writeback(mapping->host, folio->index);
if (PageUptodate(page) || len == PAGE_SIZE)
if (folio_test_uptodate(folio) || len >= folio_size(folio))
goto success;
/*
* Check if the start this page comes after the end of file, in which
* case the readpage can be optimized away.
* Check if the start of this folio comes after the end of file,
* in which case the readpage can be optimized away.
*/
fsize = i_size_read(mapping->host);
if (fsize <= (pos & PAGE_MASK)) {
size_t off = pos & ~PAGE_MASK;
if (fsize <= folio_pos(folio)) {
size_t off = offset_in_folio(folio, pos);
if (off)
zero_user_segment(page, 0, off);
folio_zero_segment(folio, 0, off);
goto success;
}
err = fuse_do_readpage(file, page);
err = fuse_do_readpage(file, &folio->page);
if (err)
goto cleanup;
success:
*pagep = page;
*foliop = folio;
return 0;
cleanup:
unlock_page(page);
put_page(page);
folio_unlock(folio);
folio_put(folio);
error:
return err;
}
static int fuse_write_end(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
struct folio *folio, void *fsdata)
{
struct inode *inode = page->mapping->host;
struct inode *inode = folio->mapping->host;
/* Haven't copied anything? Skip zeroing, size extending, dirtying. */
if (!copied)
goto unlock;
pos += copied;
if (!PageUptodate(page)) {
if (!folio_test_uptodate(folio)) {
/* Zero any unwritten bytes at the end of the page */
size_t endoff = pos & ~PAGE_MASK;
if (endoff)
zero_user_segment(page, endoff, PAGE_SIZE);
SetPageUptodate(page);
folio_zero_segment(folio, endoff, PAGE_SIZE);
folio_mark_uptodate(folio);
}
if (pos > inode->i_size)
i_size_write(inode, pos);
set_page_dirty(page);
folio_mark_dirty(folio);
unlock:
unlock_page(page);
put_page(page);
folio_unlock(folio);
folio_put(folio);
return copied;
}
@ -2966,7 +2967,7 @@ static void fuse_do_truncate(struct file *file)
attr.ia_file = file;
attr.ia_valid |= ATTR_FILE;
fuse_do_setattr(file_dentry(file), &attr, file);
fuse_do_setattr(file_mnt_idmap(file), file_dentry(file), &attr, file);
}
static inline loff_t fuse_round_up(struct fuse_conn *fc, loff_t off)

View File

@ -845,6 +845,9 @@ struct fuse_conn {
/* Add supplementary group info when creating a new inode */
unsigned int create_supp_group:1;
/* Add owner_{u,g}id info when creating a new inode */
unsigned int owner_uid_gid_ext:1;
/* Does the filesystem support per inode DAX? */
unsigned int inode_dax:1;
@ -1330,8 +1333,8 @@ bool fuse_write_update_attr(struct inode *inode, loff_t pos, ssize_t written);
int fuse_flush_times(struct inode *inode, struct fuse_file *ff);
int fuse_write_inode(struct inode *inode, struct writeback_control *wbc);
int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
struct file *file);
int fuse_do_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
struct iattr *attr, struct file *file);
void fuse_set_initialized(struct fuse_conn *fc);

View File

@ -1343,6 +1343,14 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
}
if (flags & FUSE_NO_EXPORT_SUPPORT)
fm->sb->s_export_op = &fuse_export_fid_operations;
if (flags & FUSE_OWNER_UID_GID_EXT)
fc->owner_uid_gid_ext = 1;
if (flags & FUSE_ALLOW_IDMAP) {
if (fc->owner_uid_gid_ext && fc->default_permissions)
fm->sb->s_iflags &= ~SB_I_NOIDMAP;
else
ok = false;
}
} else {
ra_pages = fc->max_read / PAGE_SIZE;
fc->no_lock = 1;
@ -1390,7 +1398,8 @@ void fuse_send_init(struct fuse_mount *fm)
FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT |
FUSE_SECURITY_CTX | FUSE_CREATE_SUPP_GROUP |
FUSE_HAS_EXPIRE_ONLY | FUSE_DIRECT_IO_ALLOW_MMAP |
FUSE_NO_EXPORT_SUPPORT | FUSE_HAS_RESEND;
FUSE_NO_EXPORT_SUPPORT | FUSE_HAS_RESEND | FUSE_OWNER_UID_GID_EXT |
FUSE_ALLOW_IDMAP;
#ifdef CONFIG_FUSE_DAX
if (fm->fc->dax)
flags |= FUSE_MAP_ALIGNMENT;
@ -1567,6 +1576,7 @@ static void fuse_sb_defaults(struct super_block *sb)
sb->s_time_gran = 1;
sb->s_export_op = &fuse_export_operations;
sb->s_iflags |= SB_I_IMA_UNVERIFIABLE_SIGNATURE;
sb->s_iflags |= SB_I_NOIDMAP;
if (sb->s_user_ns != &init_user_ns)
sb->s_iflags |= SB_I_UNTRUSTED_MOUNTER;
sb->s_flags &= ~(SB_NOSEC | SB_I_VERSION);
@ -1979,7 +1989,7 @@ static void fuse_kill_sb_anon(struct super_block *sb)
static struct file_system_type fuse_fs_type = {
.owner = THIS_MODULE,
.name = "fuse",
.fs_flags = FS_HAS_SUBTYPE | FS_USERNS_MOUNT,
.fs_flags = FS_HAS_SUBTYPE | FS_USERNS_MOUNT | FS_ALLOW_IDMAP,
.init_fs_context = fuse_init_fs_context,
.parameters = fuse_fs_parameters,
.kill_sb = fuse_kill_sb_anon,
@ -2000,7 +2010,7 @@ static struct file_system_type fuseblk_fs_type = {
.init_fs_context = fuse_init_fs_context,
.parameters = fuse_fs_parameters,
.kill_sb = fuse_kill_sb_blk,
.fs_flags = FS_REQUIRES_DEV | FS_HAS_SUBTYPE,
.fs_flags = FS_REQUIRES_DEV | FS_HAS_SUBTYPE | FS_ALLOW_IDMAP,
};
MODULE_ALIAS_FS("fuseblk");
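
Taken together, these hunks let fuse support idmapped mounts: the superblock starts out with SB_I_NOIDMAP and only clears it when the server negotiates FUSE_ALLOW_IDMAP together with the owner_uid_gid extension and default_permissions, so permission checking stays in the kernel. Once mounted, attaching an idmapping is the usual mount_setattr(2) dance; a minimal userspace sketch, assuming userns_fd and mnt_fd were set up by the caller (the latter via open_tree(..., OPEN_TREE_CLONE)):

	struct mount_attr attr = {
		.attr_set  = MOUNT_ATTR_IDMAP,
		.userns_fd = userns_fd,	/* user namespace holding the mapping */
	};
	mount_setattr(mnt_fd, "", AT_EMPTY_PATH, &attr, sizeof(attr));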

View File

@ -1628,6 +1628,7 @@ static struct file_system_type virtio_fs_type = {
.name = "virtiofs",
.init_fs_context = virtio_fs_init_fs_context,
.kill_sb = virtio_kill_sb,
.fs_flags = FS_ALLOW_IDMAP,
};
static int virtio_fs_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)

View File

@ -1057,7 +1057,7 @@ retry:
}
pagefault_disable();
ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops, NULL);
pagefault_enable();
if (ret > 0)
written += ret;

View File

@ -487,15 +487,15 @@ void hfs_file_truncate(struct inode *inode)
if (inode->i_size > HFS_I(inode)->phys_size) {
struct address_space *mapping = inode->i_mapping;
void *fsdata = NULL;
struct page *page;
struct folio *folio;
/* XXX: Can use generic_cont_expand? */
size = inode->i_size - 1;
res = hfs_write_begin(NULL, mapping, size + 1, 0, &page,
res = hfs_write_begin(NULL, mapping, size + 1, 0, &folio,
&fsdata);
if (!res) {
res = generic_write_end(NULL, mapping, size + 1, 0, 0,
page, fsdata);
folio, fsdata);
}
if (res)
inode->i_size = HFS_I(inode)->phys_size;

View File

@ -202,7 +202,7 @@ extern const struct address_space_operations hfs_aops;
extern const struct address_space_operations hfs_btree_aops;
int hfs_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, struct page **pagep, void **fsdata);
loff_t pos, unsigned len, struct folio **foliop, void **fsdata);
extern struct inode *hfs_new_inode(struct inode *, const struct qstr *, umode_t);
extern void hfs_inode_write_fork(struct inode *, struct hfs_extent *, __be32 *, __be32 *);
extern int hfs_write_inode(struct inode *, struct writeback_control *);

View File

@ -45,12 +45,11 @@ static void hfs_write_failed(struct address_space *mapping, loff_t to)
}
int hfs_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, struct page **pagep, void **fsdata)
loff_t pos, unsigned len, struct folio **foliop, void **fsdata)
{
int ret;
*pagep = NULL;
ret = cont_write_begin(file, mapping, pos, len, pagep, fsdata,
ret = cont_write_begin(file, mapping, pos, len, foliop, fsdata,
hfs_get_block,
&HFS_I(mapping->host)->phys_size);
if (unlikely(ret))

View File

@ -554,16 +554,16 @@ void hfsplus_file_truncate(struct inode *inode)
if (inode->i_size > hip->phys_size) {
struct address_space *mapping = inode->i_mapping;
struct page *page;
struct folio *folio;
void *fsdata = NULL;
loff_t size = inode->i_size;
res = hfsplus_write_begin(NULL, mapping, size, 0,
&page, &fsdata);
&folio, &fsdata);
if (res)
return;
res = generic_write_end(NULL, mapping, size, 0, 0,
page, fsdata);
folio, fsdata);
if (res < 0)
return;
mark_inode_dirty(inode);

View File

@ -472,7 +472,7 @@ extern const struct address_space_operations hfsplus_btree_aops;
extern const struct dentry_operations hfsplus_dentry_operations;
int hfsplus_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, struct page **pagep, void **fsdata);
loff_t pos, unsigned len, struct folio **foliop, void **fsdata);
struct inode *hfsplus_new_inode(struct super_block *sb, struct inode *dir,
umode_t mode);
void hfsplus_delete_inode(struct inode *inode);

View File

@ -39,12 +39,11 @@ static void hfsplus_write_failed(struct address_space *mapping, loff_t to)
}
int hfsplus_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, struct page **pagep, void **fsdata)
loff_t pos, unsigned len, struct folio **foliop, void **fsdata)
{
int ret;
*pagep = NULL;
ret = cont_write_begin(file, mapping, pos, len, pagep, fsdata,
ret = cont_write_begin(file, mapping, pos, len, foliop, fsdata,
hfsplus_get_block,
&HFSPLUS_I(mapping->host)->phys_size);
if (unlikely(ret))

View File

@ -465,31 +465,32 @@ static int hostfs_read_folio(struct file *file, struct folio *folio)
static int hostfs_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len,
struct page **pagep, void **fsdata)
struct folio **foliop, void **fsdata)
{
pgoff_t index = pos >> PAGE_SHIFT;
*pagep = grab_cache_page_write_begin(mapping, index);
if (!*pagep)
*foliop = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
mapping_gfp_mask(mapping));
if (IS_ERR(*foliop))
return PTR_ERR(*foliop);
return 0;
}
static int hostfs_write_end(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
struct folio *folio, void *fsdata)
{
struct inode *inode = mapping->host;
void *buffer;
unsigned from = pos & (PAGE_SIZE - 1);
size_t from = offset_in_folio(folio, pos);
int err;
buffer = kmap_local_page(page);
err = write_file(FILE_HOSTFS_I(file)->fd, &pos, buffer + from, copied);
buffer = kmap_local_folio(folio, from);
err = write_file(FILE_HOSTFS_I(file)->fd, &pos, buffer, copied);
kunmap_local(buffer);
if (!PageUptodate(page) && err == PAGE_SIZE)
SetPageUptodate(page);
if (!folio_test_uptodate(folio) && err == folio_size(folio))
folio_mark_uptodate(folio);
/*
* If err > 0, write_file has added err to pos, so we are comparing
@ -497,8 +498,8 @@ static int hostfs_write_end(struct file *file, struct address_space *mapping,
*/
if (err > 0 && (pos > inode->i_size))
inode->i_size = pos;
unlock_page(page);
put_page(page);
folio_unlock(folio);
folio_put(folio);
return err;
}
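
Note that __filemap_get_folio() reports failure with an ERR_PTR rather than NULL (the jffs2 conversion later in this diff checks IS_ERR() for the same call), so the minimal ->write_begin built on it looks roughly like this — an illustrative sketch, not any particular filesystem's code:

	/* Illustrative: minimal ->write_begin on top of __filemap_get_folio(). */
	static int sketch_write_begin(struct file *file,
			struct address_space *mapping, loff_t pos,
			unsigned len, struct folio **foliop, void **fsdata)
	{
		*foliop = __filemap_get_folio(mapping, pos >> PAGE_SHIFT,
				FGP_WRITEBEGIN, mapping_gfp_mask(mapping));
		if (IS_ERR(*foliop))	/* never NULL: ERR_PTR on failure */
			return PTR_ERR(*foliop);
		return 0;
	}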

View File

@ -190,12 +190,11 @@ static void hpfs_write_failed(struct address_space *mapping, loff_t to)
static int hpfs_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len,
struct page **pagep, void **fsdata)
struct folio **foliop, void **fsdata)
{
int ret;
*pagep = NULL;
ret = cont_write_begin(file, mapping, pos, len, pagep, fsdata,
ret = cont_write_begin(file, mapping, pos, len, foliop, fsdata,
hpfs_get_block,
&hpfs_i(mapping->host)->mmu_private);
if (unlikely(ret))
@ -206,11 +205,11 @@ static int hpfs_write_begin(struct file *file, struct address_space *mapping,
static int hpfs_write_end(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *pagep, void *fsdata)
struct folio *folio, void *fsdata)
{
struct inode *inode = mapping->host;
int err;
err = generic_write_end(file, mapping, pos, len, copied, pagep, fsdata);
err = generic_write_end(file, mapping, pos, len, copied, folio, fsdata);
if (err < len)
hpfs_write_failed(mapping, pos + len);
if (!(err < 0)) {

View File

@ -388,14 +388,14 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
static int hugetlbfs_write_begin(struct file *file,
struct address_space *mapping,
loff_t pos, unsigned len,
struct page **pagep, void **fsdata)
struct folio **foliop, void **fsdata)
{
return -EINVAL;
}
static int hugetlbfs_write_end(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *page, void *fsdata)
struct folio *folio, void *fsdata)
{
BUG();
return -EINVAL;

View File

@ -21,7 +21,12 @@
#include <linux/list_lru.h>
#include <linux/iversion.h>
#include <linux/rw_hint.h>
#include <linux/seq_file.h>
#include <linux/debugfs.h>
#include <trace/events/writeback.h>
#define CREATE_TRACE_POINTS
#include <trace/events/timestamp.h>
#include "internal.h"
/*
@ -60,6 +65,13 @@ static unsigned int i_hash_shift __ro_after_init;
static struct hlist_head *inode_hashtable __ro_after_init;
static __cacheline_aligned_in_smp DEFINE_SPINLOCK(inode_hash_lock);
/*
* This represents the latest fine-grained time that we have handed out as a
* timestamp on the system. Tracked as a monotonic value, and converted to the
* realtime clock on an as-needed basis.
*/
static __cacheline_aligned_in_smp atomic64_t ctime_floor;
/*
* Empty aops. Can be used for the cases where the user does not
* define any of the address_space operations.
@ -98,6 +110,77 @@ long get_nr_dirty_inodes(void)
return nr_dirty > 0 ? nr_dirty : 0;
}
#ifdef CONFIG_DEBUG_FS
static DEFINE_PER_CPU(unsigned long, mg_ctime_updates);
static DEFINE_PER_CPU(unsigned long, mg_fine_stamps);
static DEFINE_PER_CPU(unsigned long, mg_floor_swaps);
static DEFINE_PER_CPU(unsigned long, mg_ctime_swaps);
static long get_mg_ctime_updates(void)
{
int i;
long sum = 0;
for_each_possible_cpu(i)
sum += per_cpu(mg_ctime_updates, i);
return sum < 0 ? 0 : sum;
}
static long get_mg_fine_stamps(void)
{
int i;
long sum = 0;
for_each_possible_cpu(i)
sum += per_cpu(mg_fine_stamps, i);
return sum < 0 ? 0 : sum;
}
static long get_mg_floor_swaps(void)
{
int i;
long sum = 0;
for_each_possible_cpu(i)
sum += per_cpu(mg_floor_swaps, i);
return sum < 0 ? 0 : sum;
}
static long get_mg_ctime_swaps(void)
{
int i;
long sum = 0;
for_each_possible_cpu(i)
sum += per_cpu(mg_ctime_swaps, i);
return sum < 0 ? 0 : sum;
}
#define mgtime_counter_inc(__var) this_cpu_inc(__var)
static int mgts_show(struct seq_file *s, void *p)
{
long ctime_updates = get_mg_ctime_updates();
long ctime_swaps = get_mg_ctime_swaps();
long fine_stamps = get_mg_fine_stamps();
long floor_swaps = get_mg_floor_swaps();
seq_printf(s, "%lu %lu %lu %lu\n",
ctime_updates, ctime_swaps, fine_stamps, floor_swaps);
return 0;
}
DEFINE_SHOW_ATTRIBUTE(mgts);
static int __init mg_debugfs_init(void)
{
debugfs_create_file("multigrain_timestamps", S_IFREG | S_IRUGO, NULL, NULL, &mgts_fops);
return 0;
}
late_initcall(mg_debugfs_init);
#else /* ! CONFIG_DEBUG_FS */
#define mgtime_counter_inc(__var) do { } while (0)
#endif /* CONFIG_DEBUG_FS */
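
When CONFIG_DEBUG_FS is enabled, the counters are exposed as a single line of four space-separated values; per mgts_show() above, the order is ctime updates, ctime swaps, fine-grained stamps, and floor swaps. A trivial userspace reader, assuming debugfs is mounted at the usual /sys/kernel/debug:

	#include <stdio.h>

	int main(void)
	{
		unsigned long updates, cswaps, fine, fswaps;
		FILE *f = fopen("/sys/kernel/debug/multigrain_timestamps", "r");

		/* Field order matches the seq_printf() in mgts_show(). */
		if (!f || fscanf(f, "%lu %lu %lu %lu",
				 &updates, &cswaps, &fine, &fswaps) != 4)
			return 1;
		printf("updates=%lu ctime_swaps=%lu fine=%lu floor_swaps=%lu\n",
		       updates, cswaps, fine, fswaps);
		return 0;
	}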
/*
* Handle nr_inode sysctl
*/
@ -464,6 +547,17 @@ static void __inode_add_lru(struct inode *inode, bool rotate)
inode->i_state |= I_REFERENCED;
}
struct wait_queue_head *inode_bit_waitqueue(struct wait_bit_queue_entry *wqe,
struct inode *inode, u32 bit)
{
void *bit_address;
bit_address = inode_state_wait_address(inode, bit);
init_wait_var_entry(wqe, bit_address, 0);
return __var_waitqueue(bit_address);
}
EXPORT_SYMBOL(inode_bit_waitqueue);
/*
* Add inode to LRU if needed (inode is unused and clean).
*
@ -492,25 +586,35 @@ static void inode_unpin_lru_isolating(struct inode *inode)
spin_lock(&inode->i_lock);
WARN_ON(!(inode->i_state & I_LRU_ISOLATING));
inode->i_state &= ~I_LRU_ISOLATING;
smp_mb();
wake_up_bit(&inode->i_state, __I_LRU_ISOLATING);
/* Called with inode->i_lock which ensures memory ordering. */
inode_wake_up_bit(inode, __I_LRU_ISOLATING);
spin_unlock(&inode->i_lock);
}
static void inode_wait_for_lru_isolating(struct inode *inode)
{
spin_lock(&inode->i_lock);
if (inode->i_state & I_LRU_ISOLATING) {
DEFINE_WAIT_BIT(wq, &inode->i_state, __I_LRU_ISOLATING);
wait_queue_head_t *wqh;
struct wait_bit_queue_entry wqe;
struct wait_queue_head *wq_head;
wqh = bit_waitqueue(&inode->i_state, __I_LRU_ISOLATING);
lockdep_assert_held(&inode->i_lock);
if (!(inode->i_state & I_LRU_ISOLATING))
return;
wq_head = inode_bit_waitqueue(&wqe, inode, __I_LRU_ISOLATING);
for (;;) {
prepare_to_wait_event(wq_head, &wqe.wq_entry, TASK_UNINTERRUPTIBLE);
/*
* Checking I_LRU_ISOLATING with inode->i_lock guarantees
* memory ordering.
*/
if (!(inode->i_state & I_LRU_ISOLATING))
break;
spin_unlock(&inode->i_lock);
__wait_on_bit(wqh, &wq, bit_wait, TASK_UNINTERRUPTIBLE);
schedule();
spin_lock(&inode->i_lock);
WARN_ON(inode->i_state & I_LRU_ISOLATING);
}
spin_unlock(&inode->i_lock);
finish_wait(wq_head, &wqe.wq_entry);
WARN_ON(inode->i_state & I_LRU_ISOLATING);
}
/**
@ -587,6 +691,7 @@ void dump_mapping(const struct address_space *mapping)
struct hlist_node *dentry_first;
struct dentry *dentry_ptr;
struct dentry dentry;
char fname[64] = {};
unsigned long ino;
/*
@ -623,11 +728,14 @@ void dump_mapping(const struct address_space *mapping)
return;
}
if (strncpy_from_kernel_nofault(fname, dentry.d_name.name, 63) < 0)
strscpy(fname, "<invalid>");
/*
* if dentry is corrupted, the %pd handler may still crash,
* but it's unlikely that we reach here with a corrupt mapping
* Even if strncpy_from_kernel_nofault() succeeded,
* the fname could be unreliable
*/
pr_warn("aops:%ps ino:%lx dentry name:\"%pd\"\n", a_ops, ino, &dentry);
pr_warn("aops:%ps ino:%lx dentry name(?):\"%s\"\n",
a_ops, ino, fname);
}
void clear_inode(struct inode *inode)
@ -682,6 +790,7 @@ static void evict(struct inode *inode)
inode_sb_list_del(inode);
spin_lock(&inode->i_lock);
inode_wait_for_lru_isolating(inode);
/*
@ -691,6 +800,7 @@ static void evict(struct inode *inode)
* the inode. We just have to wait for running writeback to finish.
*/
inode_wait_for_writeback(inode);
spin_unlock(&inode->i_lock);
if (op->evict_inode) {
op->evict_inode(inode);
@ -714,7 +824,13 @@ static void evict(struct inode *inode)
* used as an indicator whether blocking on it is safe.
*/
spin_lock(&inode->i_lock);
wake_up_bit(&inode->i_state, __I_NEW);
/*
* Pairs with the barrier in prepare_to_wait_event() to make sure
* ___wait_var_event() either sees the bit cleared or
* waitqueue_active() check in wake_up_var() sees the waiter.
*/
smp_mb();
inode_wake_up_bit(inode, __I_NEW);
BUG_ON(inode->i_state != (I_FREEING | I_CLEAR));
spin_unlock(&inode->i_lock);
@ -762,6 +878,10 @@ again:
continue;
spin_lock(&inode->i_lock);
if (atomic_read(&inode->i_count)) {
spin_unlock(&inode->i_lock);
continue;
}
if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
spin_unlock(&inode->i_lock);
continue;
@ -1122,8 +1242,13 @@ void unlock_new_inode(struct inode *inode)
spin_lock(&inode->i_lock);
WARN_ON(!(inode->i_state & I_NEW));
inode->i_state &= ~I_NEW & ~I_CREATING;
/*
* Pairs with the barrier in prepare_to_wait_event() to make sure
* ___wait_var_event() either sees the bit cleared or
* waitqueue_active() check in wake_up_var() sees the waiter.
*/
smp_mb();
wake_up_bit(&inode->i_state, __I_NEW);
inode_wake_up_bit(inode, __I_NEW);
spin_unlock(&inode->i_lock);
}
EXPORT_SYMBOL(unlock_new_inode);
@ -1134,8 +1259,13 @@ void discard_new_inode(struct inode *inode)
spin_lock(&inode->i_lock);
WARN_ON(!(inode->i_state & I_NEW));
inode->i_state &= ~I_NEW;
/*
* Pairs with the barrier in prepare_to_wait_event() to make sure
* ___wait_var_event() either sees the bit cleared or
* waitqueue_active() check in wake_up_var() sees the waiter.
*/
smp_mb();
wake_up_bit(&inode->i_state, __I_NEW);
inode_wake_up_bit(inode, __I_NEW);
spin_unlock(&inode->i_lock);
iput(inode);
}
@ -1562,9 +1692,7 @@ struct inode *ilookup(struct super_block *sb, unsigned long ino)
struct hlist_head *head = inode_hashtable + hash(sb, ino);
struct inode *inode;
again:
spin_lock(&inode_hash_lock);
inode = find_inode_fast(sb, head, ino, true);
spin_unlock(&inode_hash_lock);
inode = find_inode_fast(sb, head, ino, false);
if (inode) {
if (IS_ERR(inode))
@ -2164,19 +2292,72 @@ int file_remove_privs(struct file *file)
}
EXPORT_SYMBOL(file_remove_privs);
/**
* coarse_ctime - return the current coarse-grained time
* @floor: current (monotonic) ctime_floor value
*
* Get the coarse-grained time, and then determine whether to
* return it or the current floor value. Returns the later of the
* floor and coarse grained timestamps, converted to realtime
* clock value.
*/
static ktime_t coarse_ctime(ktime_t floor)
{
ktime_t coarse = ktime_get_coarse();
/* If coarse time is already newer, return that */
if (!ktime_after(floor, coarse))
return ktime_get_coarse_real();
return ktime_mono_to_real(floor);
}
/**
* current_time - Return FS time (possibly fine-grained)
* @inode: inode.
*
* Return the current time truncated to the time granularity supported by
* the fs, as suitable for a ctime/mtime change. If the ctime is flagged
* as having been QUERIED, get a fine-grained timestamp.
*/
struct timespec64 current_time(struct inode *inode)
{
ktime_t floor = atomic64_read(&ctime_floor);
ktime_t now = coarse_ctime(floor);
struct timespec64 now_ts = ktime_to_timespec64(now);
u32 cns;
if (!is_mgtime(inode))
goto out;
/* If nothing has queried it, then coarse time is fine */
cns = smp_load_acquire(&inode->i_ctime_nsec);
if (cns & I_CTIME_QUERIED) {
/*
* If there is no apparent change, then
* get a fine-grained timestamp.
*/
if (now_ts.tv_nsec == (cns & ~I_CTIME_QUERIED))
ktime_get_real_ts64(&now_ts);
}
out:
return timestamp_truncate(now_ts, inode);
}
EXPORT_SYMBOL(current_time);
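
The QUERIED flag lives in the inode's ctime nanoseconds word: readers set it, and stamping a new ctime clears it. A standalone C11 model of that handshake may make the protocol clearer — all names here are illustrative, not kernel API:

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdint.h>

	#define QUERIED (1u << 31)		/* flag folded into the nsec word */

	static _Atomic uint32_t ctime_nsec;	/* remaining bits: nanoseconds */

	/* Reader (think stat()): fetch nsec and mark it queried, so the next
	 * writer knows a distinct, fine-grained stamp may be needed. */
	static uint32_t ctime_peek(void)
	{
		return atomic_fetch_or(&ctime_nsec, QUERIED) & ~QUERIED;
	}

	/* Writer: pay for a fine-grained clock read only if the old value was
	 * queried and the coarse clock hasn't visibly ticked past it. Storing
	 * the bare nsec value also clears QUERIED. */
	static bool ctime_update(uint32_t coarse_nsec, uint32_t fine_nsec)
	{
		uint32_t old = atomic_load_explicit(&ctime_nsec,
						    memory_order_acquire);
		uint32_t val = ((old & QUERIED) && coarse_nsec == (old & ~QUERIED))
				? fine_nsec : coarse_nsec;

		/* A racing reader setting QUERIED fails the exchange; the
		 * real code retries once in that case. */
		return atomic_compare_exchange_strong(&ctime_nsec, &old, val);
	}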
static int inode_needs_update_time(struct inode *inode)
{
struct timespec64 now, ts;
int sync_it = 0;
struct timespec64 now = current_time(inode);
struct timespec64 ts;
/* First try to exhaust all avenues to not sync */
if (IS_NOCMTIME(inode))
return 0;
now = current_time(inode);
ts = inode_get_mtime(inode);
if (!timespec64_equal(&ts, &now))
sync_it = S_MTIME;
sync_it |= S_MTIME;
ts = inode_get_ctime(inode);
if (!timespec64_equal(&ts, &now))
@ -2326,8 +2507,8 @@ EXPORT_SYMBOL(inode_needs_sync);
*/
static void __wait_on_freeing_inode(struct inode *inode, bool is_inode_hash_locked)
{
wait_queue_head_t *wq;
DEFINE_WAIT_BIT(wait, &inode->i_state, __I_NEW);
struct wait_bit_queue_entry wqe;
struct wait_queue_head *wq_head;
/*
* Handle racing against evict(), see that routine for more details.
@ -2338,14 +2519,14 @@ static void __wait_on_freeing_inode(struct inode *inode, bool is_inode_hash_lock
return;
}
wq = bit_waitqueue(&inode->i_state, __I_NEW);
prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
wq_head = inode_bit_waitqueue(&wqe, inode, __I_NEW);
prepare_to_wait_event(wq_head, &wqe.wq_entry, TASK_UNINTERRUPTIBLE);
spin_unlock(&inode->i_lock);
rcu_read_unlock();
if (is_inode_hash_locked)
spin_unlock(&inode_hash_lock);
schedule();
finish_wait(wq, &wait.wq_entry);
finish_wait(wq_head, &wqe.wq_entry);
if (is_inode_hash_locked)
spin_lock(&inode_hash_lock);
rcu_read_lock();
@ -2494,18 +2675,11 @@ EXPORT_SYMBOL(inode_owner_or_capable);
/*
* Direct i/o helper functions
*/
static void __inode_dio_wait(struct inode *inode)
bool inode_dio_finished(const struct inode *inode)
{
wait_queue_head_t *wq = bit_waitqueue(&inode->i_state, __I_DIO_WAKEUP);
DEFINE_WAIT_BIT(q, &inode->i_state, __I_DIO_WAKEUP);
do {
prepare_to_wait(wq, &q.wq_entry, TASK_UNINTERRUPTIBLE);
if (atomic_read(&inode->i_dio_count))
schedule();
} while (atomic_read(&inode->i_dio_count));
finish_wait(wq, &q.wq_entry);
return atomic_read(&inode->i_dio_count) == 0;
}
EXPORT_SYMBOL(inode_dio_finished);
/**
* inode_dio_wait - wait for outstanding DIO requests to finish
@ -2519,11 +2693,17 @@ static void __inode_dio_wait(struct inode *inode)
*/
void inode_dio_wait(struct inode *inode)
{
if (atomic_read(&inode->i_dio_count))
__inode_dio_wait(inode);
wait_var_event(&inode->i_dio_count, inode_dio_finished(inode));
}
EXPORT_SYMBOL(inode_dio_wait);
void inode_dio_wait_interruptible(struct inode *inode)
{
wait_var_event_interruptible(&inode->i_dio_count,
inode_dio_finished(inode));
}
EXPORT_SYMBOL(inode_dio_wait_interruptible);
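
wait_var_event() keys the waitqueue off the address of i_dio_count itself, so the completion side must wake the same variable. A hedged sketch of the pairing (the wake-up side is not shown in this diff):

	/* Sketch: the completion side that pairs with inode_dio_wait() above. */
	static inline void sketch_inode_dio_end(struct inode *inode)
	{
		/* Last DIO out wakes anyone sleeping in wait_var_event(). */
		if (atomic_dec_and_test(&inode->i_dio_count))
			wake_up_var(&inode->i_dio_count);
	}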
/*
* inode_set_flags - atomically set some inode flags
*
@ -2554,6 +2734,16 @@ void inode_nohighmem(struct inode *inode)
}
EXPORT_SYMBOL(inode_nohighmem);
struct timespec64 inode_set_ctime_to_ts(struct inode *inode, struct timespec64 ts)
{
trace_inode_set_ctime_to_ts(inode, &ts);
set_normalized_timespec64(&ts, ts.tv_sec, ts.tv_nsec);
inode->i_ctime_sec = ts.tv_sec;
inode->i_ctime_nsec = ts.tv_nsec;
return ts;
}
EXPORT_SYMBOL(inode_set_ctime_to_ts);
/**
* timestamp_truncate - Truncate timespec to a granularity
* @t: Timespec
@ -2585,38 +2775,99 @@ struct timespec64 timestamp_truncate(struct timespec64 t, struct inode *inode)
}
EXPORT_SYMBOL(timestamp_truncate);
/**
* current_time - Return FS time
* @inode: inode.
*
* Return the current time truncated to the time granularity supported by
* the fs.
*
* Note that inode and inode->sb cannot be NULL.
* Otherwise, the function warns and returns time without truncation.
*/
struct timespec64 current_time(struct inode *inode)
{
struct timespec64 now;
ktime_get_coarse_real_ts64(&now);
return timestamp_truncate(now, inode);
}
EXPORT_SYMBOL(current_time);
/**
* inode_set_ctime_current - set the ctime to current_time
* @inode: inode
*
* Set the inode->i_ctime to the current value for the inode. Returns
* the current value that was assigned to i_ctime.
* Set the inode's ctime to the current value for the inode. Returns the
* current value that was assigned. If this is not a multigrain inode, then we
* just set it to whatever the coarse_ctime is.
*
* If it is multigrain, then we first see if the coarse-grained timestamp is
* distinct from what we have. If so, then we'll just use that. If we have to
* get a fine-grained timestamp, then do so, and try to swap it into the floor.
* We accept the new floor value regardless of the outcome of the cmpxchg.
* After that, we try to swap the new value into i_ctime_nsec. Again, we take
* the resulting ctime, regardless of the outcome of the swap.
*/
struct timespec64 inode_set_ctime_current(struct inode *inode)
{
struct timespec64 now = current_time(inode);
ktime_t now, floor = atomic64_read(&ctime_floor);
struct timespec64 now_ts;
u32 cns, cur;
inode_set_ctime_to_ts(inode, now);
return now;
now = coarse_ctime(floor);
/* Just return that if this is not a multigrain fs */
if (!is_mgtime(inode)) {
now_ts = timestamp_truncate(ktime_to_timespec64(now), inode);
inode_set_ctime_to_ts(inode, now_ts);
goto out;
}
/*
* We only need a fine-grained time if someone has queried it,
* and the current coarse grained time isn't later than what's
* already there.
*/
cns = smp_load_acquire(&inode->i_ctime_nsec);
if (cns & I_CTIME_QUERIED) {
ktime_t ctime = ktime_set(inode->i_ctime_sec, cns & ~I_CTIME_QUERIED);
if (!ktime_after(now, ctime)) {
ktime_t old, fine;
/* Get a fine-grained time */
fine = ktime_get();
mgtime_counter_inc(mg_fine_stamps);
/*
* If the cmpxchg works, we take the new floor value. If
* not, then that means that someone else changed it after we
* fetched it but before we got here. That value is just
* as good, so keep it.
*/
old = floor;
if (atomic64_try_cmpxchg(&ctime_floor, &old, fine))
mgtime_counter_inc(mg_floor_swaps);
else
fine = old;
now = ktime_mono_to_real(fine);
}
}
mgtime_counter_inc(mg_ctime_updates);
now_ts = timestamp_truncate(ktime_to_timespec64(now), inode);
cur = cns;
/* No need to cmpxchg if it's exactly the same */
if (cns == now_ts.tv_nsec && inode->i_ctime_sec == now_ts.tv_sec) {
trace_ctime_xchg_skip(inode, &now_ts);
goto out;
}
retry:
/* Try to swap the nsec value into place. */
if (try_cmpxchg(&inode->i_ctime_nsec, &cur, now_ts.tv_nsec)) {
/* If swap occurred, then we're (mostly) done */
inode->i_ctime_sec = now_ts.tv_sec;
trace_ctime_ns_xchg(inode, cns, now_ts.tv_nsec, cur);
mgtime_counter_inc(mg_ctime_swaps);
} else {
/*
* Was the change due to someone marking the old ctime QUERIED?
* If so then retry the swap. This can only happen once since
* the only way to clear I_CTIME_QUERIED is to stamp the inode
* with a new ctime.
*/
if (!(cns & I_CTIME_QUERIED) && (cns | I_CTIME_QUERIED) == cur) {
cns = cur;
goto retry;
}
/* Otherwise, keep the existing ctime */
now_ts.tv_sec = inode->i_ctime_sec;
now_ts.tv_nsec = cur & ~I_CTIME_QUERIED;
}
out:
return now_ts;
}
EXPORT_SYMBOL(inode_set_ctime_current);
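
The floor handoff in the middle of inode_set_ctime_current() is the subtle part: a task that loses the cmpxchg simply adopts the winner's fine-grained value, since anything that made it into the floor is at least as new as the coarse time it beat. A standalone C11 model of just that exchange (names illustrative, not kernel API):

	#include <stdatomic.h>
	#include <stdint.h>

	static _Atomic int64_t floor_ns;	/* monotonic ns, like ctime_floor */

	static int64_t floor_update(int64_t old, int64_t fine_ns)
	{
		/* 'old' is the floor value the caller sampled earlier. */
		if (atomic_compare_exchange_strong(&floor_ns, &old, fine_ns))
			return fine_ns;	/* we advanced the floor */
		/* On failure, 'old' now holds the racing writer's value,
		 * which is just as good, so use it. */
		return old;
	}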

View File

@ -267,7 +267,7 @@ struct xattr_name {
char name[XATTR_NAME_MAX + 1];
};
struct xattr_ctx {
struct kernel_xattr_ctx {
/* Value of attribute */
union {
const void __user *cvalue;
@ -283,11 +283,11 @@ struct xattr_ctx {
ssize_t do_getxattr(struct mnt_idmap *idmap,
struct dentry *d,
struct xattr_ctx *ctx);
struct kernel_xattr_ctx *ctx);
int setxattr_copy(const char __user *name, struct xattr_ctx *ctx);
int setxattr_copy(const char __user *name, struct kernel_xattr_ctx *ctx);
int do_setxattr(struct mnt_idmap *idmap, struct dentry *dentry,
struct xattr_ctx *ctx);
struct kernel_xattr_ctx *ctx);
int may_write_xattr(struct mnt_idmap *idmap, struct inode *inode);
#ifdef CONFIG_FS_POSIX_ACL
@ -337,3 +337,4 @@ static inline bool path_mounted(const struct path *path)
{
return path->mnt->mnt_root == path->dentry;
}
void file_f_owner_release(struct file *file);

View File

@ -900,7 +900,7 @@ static bool iomap_write_end(struct iomap_iter *iter, loff_t pos, size_t len,
size_t bh_written;
bh_written = block_write_end(NULL, iter->inode->i_mapping, pos,
len, copied, &folio->page, NULL);
len, copied, folio, NULL);
WARN_ON_ONCE(bh_written != copied && bh_written != 0);
return bh_written == copied;
}
@ -1022,13 +1022,14 @@ retry:
ssize_t
iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
const struct iomap_ops *ops)
const struct iomap_ops *ops, void *private)
{
struct iomap_iter iter = {
.inode = iocb->ki_filp->f_mapping->host,
.pos = iocb->ki_pos,
.len = iov_iter_count(i),
.flags = IOMAP_WRITE,
.private = private,
};
ssize_t ret;
@ -2007,10 +2008,10 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
}
EXPORT_SYMBOL_GPL(iomap_writepages);
static int __init iomap_init(void)
static int __init iomap_buffered_init(void)
{
return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
offsetof(struct iomap_ioend, io_bio),
BIOSET_NEED_BVECS);
}
fs_initcall(iomap_init);
fs_initcall(iomap_buffered_init);

View File

@ -27,6 +27,13 @@
#define IOMAP_DIO_WRITE (1U << 30)
#define IOMAP_DIO_DIRTY (1U << 31)
/*
* Used for sub block zeroing in iomap_dio_zero()
*/
#define IOMAP_ZERO_PAGE_SIZE (SZ_64K)
#define IOMAP_ZERO_PAGE_ORDER (get_order(IOMAP_ZERO_PAGE_SIZE))
static struct page *zero_page;
struct iomap_dio {
struct kiocb *iocb;
const struct iomap_dio_ops *dops;
@ -232,13 +239,20 @@ release_bio:
}
EXPORT_SYMBOL_GPL(iomap_dio_bio_end_io);
static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
loff_t pos, unsigned len)
{
struct inode *inode = file_inode(dio->iocb->ki_filp);
struct page *page = ZERO_PAGE(0);
struct bio *bio;
if (!len)
return 0;
/*
* Max block size supported is 64k
*/
if (WARN_ON_ONCE(len > IOMAP_ZERO_PAGE_SIZE))
return -EINVAL;
bio = iomap_dio_alloc_bio(iter, dio, 1, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
GFP_KERNEL);
@ -246,8 +260,9 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
bio->bi_private = dio;
bio->bi_end_io = iomap_dio_bio_end_io;
__bio_add_page(bio, page, len, 0);
__bio_add_page(bio, zero_page, len, 0);
iomap_dio_submit_bio(iter, dio, bio, pos);
return 0;
}
/*
@ -356,8 +371,10 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
if (need_zeroout) {
/* zero out from the start of the block to the write offset */
pad = pos & (fs_block_size - 1);
if (pad)
iomap_dio_zero(iter, dio, pos - pad, pad);
ret = iomap_dio_zero(iter, dio, pos - pad, pad);
if (ret)
goto out;
}
/*
@ -431,7 +448,8 @@ zero_tail:
/* zero out from the end of the write to the end of the block */
pad = pos & (fs_block_size - 1);
if (pad)
iomap_dio_zero(iter, dio, pos, fs_block_size - pad);
ret = iomap_dio_zero(iter, dio, pos,
fs_block_size - pad);
}
out:
/* Undo iter limitation to current extent */
@ -753,3 +771,15 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
return iomap_dio_complete(dio);
}
EXPORT_SYMBOL_GPL(iomap_dio_rw);
static int __init iomap_dio_init(void)
{
zero_page = alloc_pages(GFP_KERNEL | __GFP_ZERO,
IOMAP_ZERO_PAGE_ORDER);
if (!zero_page)
return -ENOMEM;
return 0;
}
fs_initcall(iomap_dio_init);
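
The head/tail pad arithmetic feeding iomap_dio_zero() is easy to check by hand; the 64k zero page allocated above suffices because the largest supported block size, and hence the largest possible pad, is 64k. For a hypothetical unaligned write on a 4096-byte filesystem block the numbers work out as below (a standalone worked example, not kernel code):

	#include <stdio.h>

	int main(void)
	{
		unsigned fs_block_size = 4096;		/* hypothetical */
		long long pos = 5000, end = 7000;	/* hypothetical write range */
		unsigned head = pos & (fs_block_size - 1);	/* 904 */
		unsigned tail = end & (fs_block_size - 1);	/* 2904 */

		/* zeroes bytes 4096..4999 before and 7000..8191 after */
		printf("zero %u bytes before, %u bytes after\n",
		       head, tail ? fs_block_size - tail : 0);	/* 904, 1192 */
		return 0;
	}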

View File

@ -23,10 +23,10 @@
static int jffs2_write_end(struct file *filp, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *pg, void *fsdata);
struct folio *folio, void *fsdata);
static int jffs2_write_begin(struct file *filp, struct address_space *mapping,
loff_t pos, unsigned len,
struct page **pagep, void **fsdata);
struct folio **foliop, void **fsdata);
static int jffs2_read_folio(struct file *filp, struct folio *folio);
int jffs2_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
@ -77,29 +77,27 @@ const struct address_space_operations jffs2_file_address_operations =
.write_end = jffs2_write_end,
};
static int jffs2_do_readpage_nolock (struct inode *inode, struct page *pg)
static int jffs2_do_readpage_nolock(struct inode *inode, struct folio *folio)
{
struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);
struct jffs2_sb_info *c = JFFS2_SB_INFO(inode->i_sb);
unsigned char *pg_buf;
unsigned char *kaddr;
int ret;
jffs2_dbg(2, "%s(): ino #%lu, page at offset 0x%lx\n",
__func__, inode->i_ino, pg->index << PAGE_SHIFT);
__func__, inode->i_ino, folio->index << PAGE_SHIFT);
BUG_ON(!PageLocked(pg));
BUG_ON(!folio_test_locked(folio));
pg_buf = kmap(pg);
/* FIXME: Can kmap fail? */
ret = jffs2_read_inode_range(c, f, pg_buf, pg->index << PAGE_SHIFT,
kaddr = kmap_local_folio(folio, 0);
ret = jffs2_read_inode_range(c, f, kaddr, folio->index << PAGE_SHIFT,
PAGE_SIZE);
kunmap_local(kaddr);
if (!ret)
SetPageUptodate(pg);
folio_mark_uptodate(folio);
flush_dcache_page(pg);
kunmap(pg);
flush_dcache_folio(folio);
jffs2_dbg(2, "readpage finished\n");
return ret;
@ -107,7 +105,7 @@ static int jffs2_do_readpage_nolock (struct inode *inode, struct page *pg)
int __jffs2_read_folio(struct file *file, struct folio *folio)
{
int ret = jffs2_do_readpage_nolock(folio->mapping->host, &folio->page);
int ret = jffs2_do_readpage_nolock(folio->mapping->host, folio);
folio_unlock(folio);
return ret;
}
@ -125,9 +123,9 @@ static int jffs2_read_folio(struct file *file, struct folio *folio)
static int jffs2_write_begin(struct file *filp, struct address_space *mapping,
loff_t pos, unsigned len,
struct page **pagep, void **fsdata)
struct folio **foliop, void **fsdata)
{
struct page *pg;
struct folio *folio;
struct inode *inode = mapping->host;
struct jffs2_inode_info *f = JFFS2_INODE_INFO(inode);
struct jffs2_sb_info *c = JFFS2_SB_INFO(inode->i_sb);
@ -206,29 +204,30 @@ static int jffs2_write_begin(struct file *filp, struct address_space *mapping,
* page in read_cache_page(), which causes a deadlock.
*/
mutex_lock(&c->alloc_sem);
pg = grab_cache_page_write_begin(mapping, index);
if (!pg) {
ret = -ENOMEM;
folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
mapping_gfp_mask(mapping));
if (IS_ERR(folio)) {
ret = PTR_ERR(folio);
goto release_sem;
}
*pagep = pg;
*foliop = folio;
/*
* Read in the page if it wasn't already present. Cannot optimize away
* the whole page write case until jffs2_write_end can handle the
* Read in the folio if it wasn't already present. Cannot optimize away
* the whole folio write case until jffs2_write_end can handle the
* case of a short-copy.
*/
if (!PageUptodate(pg)) {
if (!folio_test_uptodate(folio)) {
mutex_lock(&f->sem);
ret = jffs2_do_readpage_nolock(inode, pg);
ret = jffs2_do_readpage_nolock(inode, folio);
mutex_unlock(&f->sem);
if (ret) {
unlock_page(pg);
put_page(pg);
folio_unlock(folio);
folio_put(folio);
goto release_sem;
}
}
jffs2_dbg(1, "end write_begin(). pg->flags %lx\n", pg->flags);
jffs2_dbg(1, "end write_begin(). folio->flags %lx\n", folio->flags);
release_sem:
mutex_unlock(&c->alloc_sem);
@ -238,7 +237,7 @@ out_err:
static int jffs2_write_end(struct file *filp, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
struct page *pg, void *fsdata)
struct folio *folio, void *fsdata)
{
/* Actually commit the write from the page cache page we're looking at.
* For now, we write the full page out each time. It sucks, but it's simple
@ -252,16 +251,17 @@ static int jffs2_write_end(struct file *filp, struct address_space *mapping,
unsigned aligned_start = start & ~3;
int ret = 0;
uint32_t writtenlen = 0;
void *buf;
jffs2_dbg(1, "%s(): ino #%lu, page at 0x%lx, range %d-%d, flags %lx\n",
__func__, inode->i_ino, pg->index << PAGE_SHIFT,
start, end, pg->flags);
jffs2_dbg(1, "%s(): ino #%lu, page at 0x%llx, range %d-%d, flags %lx\n",
__func__, inode->i_ino, folio_pos(folio),
start, end, folio->flags);
/* We need to avoid deadlock with page_cache_read() in
jffs2_garbage_collect_pass(). So the page must be
jffs2_garbage_collect_pass(). So the folio must be
up to date to prevent page_cache_read() from trying
to re-lock it. */
BUG_ON(!PageUptodate(pg));
BUG_ON(!folio_test_uptodate(folio));
if (end == PAGE_SIZE) {
/* When writing out the end of a page, write out the
@ -276,8 +276,8 @@ static int jffs2_write_end(struct file *filp, struct address_space *mapping,
if (!ri) {
jffs2_dbg(1, "%s(): Allocation of raw inode failed\n",
__func__);
unlock_page(pg);
put_page(pg);
folio_unlock(folio);
folio_put(folio);
return -ENOMEM;
}
@ -289,15 +289,11 @@ static int jffs2_write_end(struct file *filp, struct address_space *mapping,
ri->isize = cpu_to_je32((uint32_t)inode->i_size);
ri->atime = ri->ctime = ri->mtime = cpu_to_je32(JFFS2_NOW());
/* In 2.4, it was already kmapped by generic_file_write(). Doesn't
hurt to do it again. The alternative is ifdefs, which are ugly. */
kmap(pg);
ret = jffs2_write_inode_range(c, f, ri, page_address(pg) + aligned_start,
(pg->index << PAGE_SHIFT) + aligned_start,
buf = kmap_local_folio(folio, aligned_start);
ret = jffs2_write_inode_range(c, f, ri, buf,
folio_pos(folio) + aligned_start,
end - aligned_start, &writtenlen);
kunmap(pg);
kunmap_local(buf);
if (ret)
mapping_set_error(mapping, ret);
@ -323,12 +319,12 @@ static int jffs2_write_end(struct file *filp, struct address_space *mapping,
it gets reread */
jffs2_dbg(1, "%s(): Not all bytes written. Marking page !uptodate\n",
__func__);
ClearPageUptodate(pg);
folio_clear_uptodate(folio);
}
jffs2_dbg(1, "%s() returning %d\n",
__func__, writtenlen > 0 ? writtenlen : ret);
unlock_page(pg);
put_page(pg);
folio_unlock(folio);
folio_put(folio);
return writtenlen > 0 ? writtenlen : ret;
}

View File

@ -1171,7 +1171,7 @@ static int jffs2_garbage_collect_dnode(struct jffs2_sb_info *c, struct jffs2_era
uint32_t alloclen, offset, orig_end, orig_start;
int ret = 0;
unsigned char *comprbuf = NULL, *writebuf;
struct page *page;
struct folio *folio;
unsigned char *pg_ptr;
memset(&ri, 0, sizeof(ri));
@ -1317,25 +1317,25 @@ static int jffs2_garbage_collect_dnode(struct jffs2_sb_info *c, struct jffs2_era
BUG_ON(start > orig_start);
}
/* The rules state that we must obtain the page lock *before* f->sem, so
/* The rules state that we must obtain the folio lock *before* f->sem, so
* drop f->sem temporarily. Since we also hold c->alloc_sem, nothing's
* actually going to *change* so we're safe; we only allow reading.
*
* It is important to note that jffs2_write_begin() will ensure that its
* page is marked Uptodate before allocating space. That means that if we
* end up here trying to GC the *same* page that jffs2_write_begin() is
* trying to write out, read_cache_page() will not deadlock. */
* folio is marked uptodate before allocating space. That means that if we
* end up here trying to GC the *same* folio that jffs2_write_begin() is
* trying to write out, read_cache_folio() will not deadlock. */
mutex_unlock(&f->sem);
page = read_cache_page(inode->i_mapping, start >> PAGE_SHIFT,
folio = read_cache_folio(inode->i_mapping, start >> PAGE_SHIFT,
__jffs2_read_folio, NULL);
if (IS_ERR(page)) {
pr_warn("read_cache_page() returned error: %ld\n",
PTR_ERR(page));
if (IS_ERR(folio)) {
pr_warn("read_cache_folio() returned error: %ld\n",
PTR_ERR(folio));
mutex_lock(&f->sem);
return PTR_ERR(page);
return PTR_ERR(folio);
}
pg_ptr = kmap(page);
pg_ptr = kmap_local_folio(folio, 0);
mutex_lock(&f->sem);
offset = start;
@ -1400,7 +1400,6 @@ static int jffs2_garbage_collect_dnode(struct jffs2_sb_info *c, struct jffs2_era
}
}
kunmap(page);
put_page(page);
folio_release_kmap(folio, pg_ptr);
return ret;
}
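
folio_release_kmap() bundles the unmap and the reference drop that the removed kunmap()/put_page() pair did by hand; to a first approximation it amounts to:

	/* Sketch of what folio_release_kmap() does for the caller above. */
	static inline void sketch_folio_release_kmap(struct folio *folio, void *addr)
	{
		kunmap_local(addr);	/* drop the temporary mapping */
		folio_put(folio);	/* drop the ref from read_cache_folio() */
	}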

Some files were not shown because too many files have changed in this diff.