-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmbodTUACgkQEVvVyTe/
1Wp5Jw/+LN99lGoKXKZ7zB8gta4hMm2O+wd3//IdfTi6u59uff0QvVNbczC6SRlJ
sW6czIgjKpR1xt6Orp0Rof0VZK09PMqq3Iw7Tt3xEkJtjKxppfMCQHeoi03CH9kF
gOHPW+Ad4jlCk+TYFEuHk5lQIgVtJ11JDtGjr5tNa03MbrLIa0pFpTdz65bg9uTn
JdPqA+fiJ/mVoXlFQEZkQPeT8CgMfTbKtQDGfcmKzrNo0ZyQdW30u5lP3MGROptn
X/QX5n77UrE9Du+UHPSaVDRUd8iA/ZcZe/7RAlKXWElHpKvAiqv84Sz0Mc1fczHx
nQHaUfQixGs/vakkPXxVqKn/Xvnj4GDwUqyM9mG8kN1x4QpdyFRqM66zZxtWsSi5
/PL0zwAY950a5fY4VsyRsSf6Zl/SlDdB+PVVajk6BKqMxWgvxyuVbFRUqfesHsfA
V5GHil7F03sPzO12SvAJg8HuresCiafLdAPJgGXuGoeprmeSP9eYx0EqcUzLpHd7
ORUF4qEylYFp8vZBJdBnZpSGRUv/mmjDDowsKeKyfgKyVR99lLKLCtqMJ0e0eGaE
mpGNg+eq//snJldvBUHrMYQvyVm18/bbOzT0ZdGPp2bMUxLjSbiab7D3S9GKRxRt
8GRg/bJfZw5tz2i1bxbe5IWfNij5pDilLeAQUsoO8pUwu/SrCy8=
=wnbj
-----END PGP SIGNATURE-----
Merge tag 'ovl-update-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs updates from Amir Goldstein:
- Increase robustness of overlayfs to crashes in the case of underlying
filesystems that to not guarantee metadata ordering to persistent
storage (problem was reported with ubifs).
- Deny mount inside container with features that require root
privileges to work properly, instead of failing operations later.
- Some clarifications to overlayfs documentation.
* tag 'ovl-update-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: fail if trusted xattrs are needed but caller lacks permission
overlayfs.rst: update metacopy section in overlayfs documentation
ovl: fsync after metadata copy-up
ovl: don't set the superblock's errseq_t manually
For upper filesystems which do not use strict ordering of persisting
metadata changes (e.g. ubifs), when overlayfs file is modified for
the first time, copy up will create a copy of the lower file and
its parent directories in the upper layer. Permission lost of the
new upper parent directory was observed during power-cut stress test.
Fix by moving the fsync call to after metadata copy to make sure that the
metadata copied up directory and files persists to disk before renaming
from tmp to final destination.
With metacopy enabled, this change will hurt performance of workloads
such as chown -R, so we keep the legacy behavior of fsync only on copyup
of data.
Link: https://lore.kernel.org/linux-unionfs/CAOQ4uxj-pOvmw1-uXR3qVdqtLjSkwcR9nVKcNU_vC10Zyf2miQ@mail.gmail.com/
Reported-and-tested-by: Fei Lv <feilv@asrmicro.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
To be consistent with most LSM hooks, convert the return value of
hook inode_copy_up_xattr to 0 or a negative error code.
Before:
- Hook inode_copy_up_xattr returns 0 when accepting xattr, 1 when
discarding xattr, -EOPNOTSUPP if it does not know xattr, or any
other negative error code otherwise.
After:
- Hook inode_copy_up_xattr returns 0 when accepting xattr, *-ECANCELED*
when discarding xattr, -EOPNOTSUPP if it does not know xattr, or
any other negative error code otherwise.
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Reviewed-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Copying up xattrs is solely based on the security xattr name. For finer
granularity add a dentry parameter to the security_inode_copy_up_xattr
hook definition, allowing decisions to be based on the xattr content as
well.
Co-developed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Paul Moore <paul@paul-moore.com> (LSM,SELinux)
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
syzbot hit an assertion in copy up data loop which looks like it is
the result of a lower file whose size is being changed underneath
overlayfs.
This type of use case is documented to cause undefined behavior, so
returning EIO error for the copy up makes sense, but it should not be
causing a WARN_ON assertion.
Reported-and-tested-by: syzbot+3abd99031b42acf367ef@syzkaller.appspotmail.com
Fixes: ca7ab48240 ("ovl: add permission hooks outside of do_splice_direct()")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
commit dfad37051a ("remap_range: move permission hooks out of
do_clone_file_range()") moved the permission hooks from
do_clone_file_range() out to its caller vfs_clone_file_range(),
but left all the fast sanity checks in do_clone_file_range().
This makes the expensive security hooks be called in situations
that they would not have been called before (e.g. fs does not support
clone).
The only reason for the do_clone_file_range() helper was that overlayfs
did not use to be able to call vfs_clone_file_range() from copy up
context with sb_writers lock held. However, since commit c63e56a4a6
("ovl: do not open/llseek lower file with upper sb_writers held"),
overlayfs just uses an open coded version of vfs_clone_file_range().
Merge_clone_file_range() into vfs_clone_file_range(), restoring the
original order of checks as it was before the regressing commit and adapt
the overlayfs code to call vfs_clone_file_range() before the permission
hooks that were added by commit ca7ab48240 ("ovl: add permission hooks
outside of do_splice_direct()").
Note that in the merge of do_clone_file_range(), the file_start_write()
context was reduced to cover ->remap_file_range() without holding it
over the permission hooks, which was the reason for doing the regressing
commit in the first place.
Reported-and-tested-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202401312229.eddeb9a6-oliver.sang@intel.com
Fixes: dfad37051a ("remap_range: move permission hooks out of do_clone_file_range()")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20240202102258.1582671-1-amir73il@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
broken in 6.5; we really can't lock two unrelated directories
without holding ->s_vfs_rename_mutex first and in case of
same-parent rename of a subdirectory 6.5 ends up doing just
that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZZ+lyQAKCRBZ7Krx/gZQ
60MWAP94hTqeMIpjhsUIkrTnylrIFaiw4UCWFJzIRG1QQYKqCgD/XUaWI9np7dL6
0wR/j4CQSdJjiEFKUFE2pD3QoSuJYAQ=
=+x0+
-----END PGP SIGNATURE-----
Merge tag 'pull-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull rename updates from Al Viro:
"Fix directory locking scheme on rename
This was broken in 6.5; we really can't lock two unrelated directories
without holding ->s_vfs_rename_mutex first and in case of same-parent
rename of a subdirectory 6.5 ends up doing just that"
* tag 'pull-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
rename(): avoid a deadlock in the case of parents having no common ancestor
kill lock_two_inodes()
rename(): fix the locking of subdirectories
f2fs: Avoid reading renamed directory if parent does not change
ext4: don't access the source subdirectory content on same-directory rename
ext2: Avoid reading renamed directory if parent does not change
udf_rename(): only access the child content on cross-directory rename
ocfs2: Avoid touching renamed directory if parent does not change
reiserfs: Avoid touching renamed directory if parent does not change
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmWeozwACgkQEVvVyTe/
1WqnyA//U2Ka5ZIncs/hA5D03LMyuCh9qlH5qAGce5vrBTxogTlFuTGGKtsUuCB5
Y4GALO+fw8aWAowt5X1XfHD3TETLVbCshT7dYjKsKy/ojANCbgkCipXBudYx+l9m
fllwTZyueK0UY14kCU2DAV5PYsI/XVVykk71GSMOMLCUfRJfDI7R0vBD0NaUd7Kz
Wp/M6t0MnXX23nGUdgNoroZPPj3Ts/gK2MXID+QHXGaR2+M1B1lLKfSu6TcRDLtn
tbe/ivaw4y1jj3jfFwMC7sSSDyIJeZh9tBB4Rvv2vsMiYU8zAC6Eg35eIbPONu42
pUMd0QQa79H3cyYEDtUzyskzur0Jry5azzb8JdQWipgVKFh5g3CHce2XAFlVjw2a
9RyCKg41A9LvdB5l/PvBtsxig2PzaYqE09rXAfUM7eLNFlOLbL99uc1WJbIFfG43
Czh9vPxsuJ5RkdwS7R0m4GYDw8+BKW6WjpaC+Eje4I8X1rAQK0H/BLTCxe2dLRB7
0neAg8e3g6NdisRSLOP74xoEn/dhijNP7ENOFF1EdP/BFPHL7+sRsV6XYwwBeUAc
c6YsxeAPylm6gvIq/ESoRiY+e5QWvImHIWP+zB/cySYdT0fQHL9WjO6/uZW0ALuv
oZugICSmZ15pYlACIU8iYztRkS19CJZrUV7Gbq4+AurUKP8kCEI=
=2Ohx
-----END PGP SIGNATURE-----
Merge tag 'ovl-update-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs updates from Amir Goldstein:
"This is a very small update with no bug fixes and no new features.
The larger update of overlayfs for this cycle, the re-factoring of
overlayfs code into generic backing_file helpers, was already merged
via Christian.
Summary:
- Simplify/clarify some code
No bug fixes here, just some changes following questions from Al
about overlayfs code that could be a little more simple to follow.
- Overlayfs documentation style fixes
Mainly fixes for ReST formatting suggested by documentation
developers"
* tag 'ovl-update-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
overlayfs.rst: fix ReST formatting
overlayfs.rst: use consistent feature names
ovl: initialize ovl_copy_up_ctx.destname inside ovl_do_copy_up()
ovl: remove redundant ofs->indexdir member
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZZUzXQAKCRCRxhvAZXjc
ogOtAQDpqUp1zY4dV/dZisCJ5xarZTsSZ1AvgmcxZBtS0NhbdgEAshWvYGA9ryS/
ChL5jjtjjZDLhRA//reoFHTQIrdp2w8=
=bF+R
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs rw updates from Christian Brauner:
"This contains updates from Amir for read-write backing file helpers
for stacking filesystems such as overlayfs:
- Fanotify is currently in the process of introducing pre content
events. Roughly, a new permission event will be added indicating
that it is safe to write to the file being accessed. These events
are used by hierarchical storage managers to e.g., fill the content
of files on first access.
During that work we noticed that our current permission checking is
inconsistent in rw_verify_area() and remap_verify_area().
Especially in the splice code permission checking is done multiple
times. For example, one time for the whole range and then again for
partial ranges inside the iterator.
In addition, we mostly do permission checking before we call
file_start_write() except for a few places where we call it after.
For pre-content events we need such permission checking to be done
before file_start_write(). So this is a nice reason to clean this
all up.
After this series, all permission checking is done before
file_start_write().
As part of this cleanup we also massaged the splice code a bit. We
got rid of a few helpers because we are alredy drowning in special
read-write helpers. We also cleaned up the return types for splice
helpers.
- Introduce generic read-write helpers for backing files. This lifts
some overlayfs code to common code so it can be used by the FUSE
passthrough work coming in over the next cycles. Make Amir and
Miklos the maintainers for this new subsystem of the vfs"
* tag 'vfs-6.8.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits)
fs: fix __sb_write_started() kerneldoc formatting
fs: factor out backing_file_mmap() helper
fs: factor out backing_file_splice_{read,write}() helpers
fs: factor out backing_file_{read,write}_iter() helpers
fs: prepare for stackable filesystems backing file helpers
fsnotify: optionally pass access range in file permission hooks
fsnotify: assert that file_start_write() is not held in permission hooks
fsnotify: split fsnotify_perm() into two hooks
fs: use splice_copy_file_range() inline helper
splice: return type ssize_t from all helpers
fs: use do_splice_direct() for nfsd/ksmbd server-side-copy
fs: move file_start_write() into direct_splice_actor()
fs: fork splice_file_range() from do_splice_direct()
fs: create {sb,file}_write_not_started() helpers
fs: create file_write_started() helper
fs: create __sb_write_started() helper
fs: move kiocb_start_write() into vfs_iocb_iter_write()
fs: move permission hook out of do_iter_read()
fs: move permission hook out of do_iter_write()
fs: move file_start_write() into vfs_iter_write()
...
syzbot excercised the forbidden practice of moving the workdir under
lowerdir while overlayfs is mounted and tripped a dentry reference leak.
Fixes: c63e56a4a6 ("ovl: do not open/llseek lower file with upper sb_writers held")
Reported-and-tested-by: syzbot+8608bb4553edb8c78f41@syzkaller.appspotmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
The callers of do_splice_direct() hold file_start_write() on the output
file.
This may cause file permission hooks to be called indirectly on an
overlayfs lower layer, which is on the same filesystem of the output
file and could lead to deadlock with fanotify permission events.
To fix this potential deadlock, move file_start_write() from the callers
into the direct_splice_actor(), so file_start_write() will not be held
while splicing from the input file.
Suggested-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20231128214258.GA2398475@perftesting/
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231130141624.3338942-3-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
... and fix the directory locking documentation and proof of correctness.
Holding ->s_vfs_rename_mutex *almost* prevents ->d_parent changes; the
case where we really don't want it is splicing the root of disconnected
tree to somewhere.
In other words, ->s_vfs_rename_mutex is sufficient to stabilize "X is an
ancestor of Y" only if X and Y are already in the same tree. Otherwise
it can go from false to true, and one can construct a deadlock on that.
Make lock_two_directories() report an error in such case and update the
callers of lock_rename()/lock_rename_child() to handle such errors.
And yes, such conditions are not impossible to create ;-/
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The main callers of do_splice_direct() also call rw_verify_area() for
the entire range that is being copied, e.g. by vfs_copy_file_range()
or do_sendfile() before calling do_splice_direct().
The only caller that does not have those checks for entire range is
ovl_copy_up_file(). In preparation for removing the checks inside
do_splice_direct(), add rw_verify_area() call in ovl_copy_up_file().
For extra safety, perform minimal sanity checks from rw_verify_area()
for non negative offsets also in the copy up do_splice_direct() loop
without calling the file permission hooks.
This is needed for fanotify "pre content" events.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231122122715.2561213-2-amir73il@gmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
The ->destname member of struct ovl_copy_up_ctx is initialized inside
ovl_copy_up_one() to ->d_name of the overlayfs dentry being copied up
and then it may be overridden by index name inside ovl_do_copy_up().
ovl_inode_lock() in ovl_copy_up_start() and ovl_copy_up() in ovl_rename()
effectively stabilze ->d_name of the overlayfs dentry being copied up,
but ovl_inode_lock() is not held when ->d_name is being read.
It is not a correctness bug, because if ovl_do_copy_up() races with
ovl_rename() and ctx.destname is freed, we will not end up calling
ovl_do_copy_up() with the dead name reference.
The code becomes much easier to understand and to document if the
initialization of c->destname is always done inside ovl_do_copy_up(),
either to the index entry name, or to the overlay dentry ->d_name.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
When lower fs is a nested overlayfs, calling encode_fh() on a lower
directory dentry may trigger copy up and take sb_writers on the upper fs
of the lower nested overlayfs.
The lower nested overlayfs may have the same upper fs as this overlayfs,
so nested sb_writers lock is illegal.
Move all the callers that encode lower fh to before ovl_want_write().
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
overlayfs file open (ovl_maybe_lookup_lowerdata) and overlay file llseek
take the ovl_inode_lock, without holding upper sb_writers.
In case of nested lower overlay that uses same upper fs as this overlay,
lockdep will warn about (possibly false positive) circular lock
dependency when doing open/llseek of lower ovl file during copy up with
our upper sb_writers held, because the locking ordering seems reverse to
the locking order in ovl_copy_up_start():
- lower ovl_inode_lock
- upper sb_writers
Let the copy up "transaction" keeps an elevated mnt write count on upper
mnt, but leaves taking upper sb_writers to lower level helpers only when
they actually need it. This allows to avoid holding upper sb_writers
during lower file open/llseek and prevents the lockdep warning.
Minimizing the scope of upper sb_writers during copy up is also needed
for fixing another possible deadlocks by a following patch.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Make the locking order of ovl_inode_lock() strictly between the two
vfs stacked layers, i.e.:
- ovl vfs locks: sb_writers, inode_lock, ...
- ovl_inode_lock
- upper vfs locks: sb_writers, inode_lock, ...
To that effect, move ovl_want_write() into the helpers ovl_nlink_start()
and ovl_copy_up_start which currently take the ovl_inode_lock() after
ovl_want_write().
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZRKHuAAKCRCRxhvAZXjc
ohOLAQDU9Fxq5UdqCdmsyi/b24XJFZlQhcVIZy2Hrhcor9TiVQEAjuECGlxFPSgj
atVOWLdugDJquiHextqTEMgIecJpNw4=
=uINF
-----END PGP SIGNATURE-----
Merge tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"This contains the usual miscellaneous fixes and cleanups for vfs and
individual fses:
Fixes:
- Revert ki_pos on error from buffered writes for direct io fallback
- Add missing documentation for block device and superblock handling
for changes merged this cycle
- Fix reiserfs flexible array usage
- Ensure that overlayfs sets ctime when setting mtime and atime
- Disable deferred caller completions with overlayfs writes until
proper support exists
Cleanups:
- Remove duplicate initialization in pipe code
- Annotate aio kioctx_table with __counted_by"
* tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
overlayfs: set ctime when setting mtime and atime
ntfs3: put resources during ntfs_fill_super()
ovl: disable IOCB_DIO_CALLER_COMP
porting: document superblock as block device holder
porting: document new block device opening order
fs/pipe: remove duplicate "offset" initializer
fs-writeback: do not requeue a clean inode having skipped pages
aio: Annotate struct kioctx_table with __counted_by
direct_write_fallback(): on error revert the ->ki_pos update from buffered write
reiserfs: Replace 1-element array with C99 style flex-array
Nathan reported that he was seeing the new warning in
setattr_copy_mgtime pop when starting podman containers. Overlayfs is
trying to set the atime and mtime via notify_change without also
setting the ctime.
POSIX states that when the atime and mtime are updated via utimes() that
we must also update the ctime to the current time. The situation with
overlayfs copy-up is analogies, so add ATTR_CTIME to the bitmask.
notify_change will fill in the value.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: Amir Goldstein <amir73il@gmail.com>
Message-Id: <20230913-ctime-v1-1-c6bc509cbc27@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Some local filesystems support setting persistent fileattr flags
(e.g. FS_NOATIME_FL) on directories and regular files via ioctl.
Some of those persistent fileattr flags are reflected to vfs as
in-memory inode flags (e.g. S_NOATIME).
Overlayfs uses the in-memory inode flags (e.g. S_NOATIME) on a lower file
as an indication that a the lower file may have persistent inode fileattr
flags (e.g. FS_NOATIME_FL) that need to be copied to upper file.
However, in some cases, the S_NOATIME in-memory flag could be a false
indication for persistent FS_NOATIME_FL fileattr. For example, with NFS
and FUSE lower fs, as was the case in the two bug reports, the S_NOATIME
flag is set unconditionally for all inodes.
Users cannot set persistent fileattr flags on symlinks and special files,
but in some local fs, such as ext4/btrfs/tmpfs, the FS_NOATIME_FL fileattr
flag are inheritted to symlinks and special files from parent directory.
In both cases described above, when lower symlink has the S_NOATIME flag,
overlayfs will try to copy the symlink's fileattrs and fail with error
ENOXIO, because it could not open the symlink for the ioctl security hook.
To solve this failure, do not attempt to copyup fileattrs for anything
other than directories and regular files.
Reported-by: Ruiwen Zhao <ruiwen@google.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217850
Fixes: 72db82115d ("ovl: copy up sync/noatime fileattr flags")
Cc: <stable@vger.kernel.org> # v5.15
Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Always use OVL_FS() to retrieve the corresponding struct ovl_fs from a
struct super_block.
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
The legacy behavior of ovl_statfs() reports the f_fsid filled by
underlying upper fs. This fsid is not unique among overlayfs instances
on the same upper fs.
With mount option uuid=on, generate a non-persistent uuid per overlayfs
instance and use it as the seed for f_fsid, similar to tmpfs.
This is useful for reporting fanotify events with fid info from different
instances of overlayfs over the same upper fs.
The old behavior of null uuid and upper fs fsid is retained with the
mount option uuid=null, which is the default.
The mount option uuid=off that disables uuid checks in underlying layers
also retains the legacy behavior.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
During regular metacopy, if lowerdata file has fs-verity enabled, and
the verity option is enabled, we add the digest to the metacopy xattr.
If verity is required, and lowerdata does not have fs-verity enabled,
fall back to full copy-up (or the generated metacopy would not
validate).
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
The new digest field in the metacopy xattr is used during lookup to
record whether the header contained a digest in the OVL_HAS_DIGEST
flags.
When accessing file data the first time, if OVL_HAS_DIGEST is set, we
reload the metadata and check that the source lowerdata inode matches
the specified digest in it (according to the enabled verity
options). If the verity check passes we store this info in the inode
flags as OVL_VERIFIED_DIGEST, so that we can avoid doing it again if
the inode remains in memory.
The verification is done in ovl_maybe_validate_verity() which needs to
be called in the same places as ovl_maybe_lookup_lowerdata(), so there
is a new ovl_verify_lowerdata() helper that calls these in the right
order, and all current callers of ovl_maybe_lookup_lowerdata() are
changed to call it instead.
Signed-off-by: Alexander Larsson <alexl@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Defer lookup of lowerdata in the data-only layers to first data access
or before copy up.
We perform lowerdata lookup before copy up even if copy up is metadata
only copy up. We can further optimize this lookup later if needed.
We do best effort lazy lookup of lowerdata for d_real_inode(), because
this interface does not expect errors. The only current in-tree caller
of d_real_inode() is trace_uprobe and this caller is likely going to be
followed reading from the file, before placing uprobes on offset within
the file, so lowerdata should be available when setting the uprobe.
Tested-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
After copy up, we may need to update d_flags if upper dentry is on a
remote fs and lower dentries are not.
Add helpers to allow incremental update of the revalidate flags.
Fixes: bccece1ead ("ovl: allow remote upper")
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
We have decoupled vfs_listxattr() from IOP_XATTR. Instead we just need
to check whether inode->i_op->listxattr is implemented.
Cc: linux-unionfs@vger.kernel.org
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
If st_uid/st_gid doesn't have a mapping in the mounter's user_ns, then
copy-up should fail, just like it would fail if the mounter task was doing
the copy using "cp -a".
There's a corner case where the "cp -a" would succeed but copy up fail: if
there's a mapping of the invalid uid/gid (65534 by default) in the user
namespace. This is because stat(2) will return this value if the mapping
doesn't exist in the current user_ns and "cp -a" will in turn be able to
create a file with this uid/gid.
This behavior would be inconsistent with POSIX ACL's, which return -1 for
invalid uid/gid which result in a failed copy.
For consistency and simplicity fail the copy of the st_uid/st_gid are
invalid.
Fixes: 459c7c565a ("ovl: unprivieged mounts")
Cc: <stable@vger.kernel.org> # v5.11
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Seth Forshee <sforshee@kernel.org>
Now that posix acls have a proper api us it to copy them.
All filesystems that can serve as lower or upper layers for overlayfs
have gained support for the new posix acl api in previous patches.
So switch all internal overlayfs codepaths for copying posix acls to the
new posix acl api.
Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCY0DP2AAKCRBZ7Krx/gZQ
6/+qAQCEGQWpcC5MB17zylaX7gqzhgAsDrwtpevlno3aIv/1pQD/YWr/E8tf7WTW
ERXRXMRx1cAzBJhUhVgIY+3ANfU2Rg4=
=cko4
-----END PGP SIGNATURE-----
Merge tag 'pull-tmpfile' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs tmpfile updates from Al Viro:
"Miklos' ->tmpfile() signature change; pass an unopened struct file to
it, let it open the damn thing. Allows to add tmpfile support to FUSE"
* tag 'pull-tmpfile' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fuse: implement ->tmpfile()
vfs: open inside ->tmpfile()
vfs: move open right after ->tmpfile()
vfs: make vfs_tmpfile() static
ovl: use vfs_tmpfile_open() helper
cachefiles: use vfs_tmpfile_open() helper
cachefiles: only pass inode to *mark_inode_inuse() helpers
cachefiles: tmpfile error handling cleanup
hugetlbfs: cleanup mknod and tmpfile
vfs: add vfs_tmpfile_open() helper
If tmpfile is used for copy up, then use this helper to create the tmpfile
and open it at the same time. This will later allow filesystems such as
fuse to do this operation atomically.
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
magical no_llseek thing and makes checks consistent. In particular,
ad-hoc "can we do splice via internal pipe" checks got saner (and
somewhat more permissive, which is what Jason had been after, AFAICT)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYug2xgAKCRBZ7Krx/gZQ
6wxWAQDqeg+xMq2FGPXmgjCa+Cp3PXH96Lp6f3hHzakIDx+t8gEAxvuiXAD22Mct
6S1SKuGj0iDIuM4L7hUiWTiY/bDXSAc=
=3EC/
-----END PGP SIGNATURE-----
Merge tag 'pull-work.lseek' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs lseek updates from Al Viro:
"Jason's lseek series.
Saner handling of 'lseek should fail with ESPIPE' - this gets rid of
the magical no_llseek thing and makes checks consistent.
In particular, the ad-hoc "can we do splice via internal pipe" checks
got saner (and somewhat more permissive, which is what Jason had been
after, AFAICT)"
* tag 'pull-work.lseek' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: remove no_llseek
fs: check FMODE_LSEEK to control internal pipe splicing
vfio: do not set FMODE_LSEEK flag
dma-buf: remove useless FMODE_LSEEK flag
fs: do not compare against ->llseek
fs: clear or set FMODE_LSEEK based on llseek function
Now vfs_llseek() can simply check for FMODE_LSEEK; if it's set,
we know that ->llseek() won't be NULL and if it's not we should
just fail with -ESPIPE.
A couple of other places where we used to check for special
values of ->llseek() (somewhat inconsistently) switched to
checking FMODE_LSEEK.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that we introduced new infrastructure to increase the type safety
for filesystems supporting idmapped mounts port the first part of the
vfs over to them.
This ports the attribute changes codepaths to rely on the new better
helpers using a dedicated type.
Before this change we used to take a shortcut and place the actual
values that would be written to inode->i_{g,u}id into struct iattr. This
had the advantage that we moved idmappings mostly out of the picture
early on but it made reasoning about changes more difficult than it
should be.
The filesystem was never explicitly told that it dealt with an idmapped
mount. The transition to the value that needed to be stored in
inode->i_{g,u}id appeared way too early and increased the probability of
bugs in various codepaths.
We know place the same value in struct iattr no matter if this is an
idmapped mount or not. The vfs will only deal with type safe
vfs{g,u}id_t. This makes it massively safer to perform permission checks
as the type will tell us what checks we need to perform and what helpers
we need to use.
Fileystems raising FS_ALLOW_IDMAP can't simply write ia_vfs{g,u}id to
inode->i_{g,u}id since they are different types. Instead they need to
use the dedicated vfs{g,u}id_to_k{g,u}id() helpers that map the
vfs{g,u}id into the filesystem.
The other nice effect is that filesystems like overlayfs don't need to
care about idmappings explicitly anymore and can simply set up struct
iattr accordingly directly.
Link: https://lore.kernel.org/lkml/CAHk-=win6+ahs1EwLkcq8apqLi_1wXFWbrPf340zYEhObpz4jA@mail.gmail.com [1]
Link: https://lore.kernel.org/r/20220621141454.2914719-9-brauner@kernel.org
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Add a helper that allows to retrieve ovl xattrs from either lower or
upper layers. To stop passing mnt and dentry separately everywhere use
struct path which more accurately reflects the tight coupling between
mount and dentry in this helper. Swich over all places to pass a path
argument that can operate on either upper or lower layers. This is
needed to support idmapped base layers with overlayfs.
Some helpers are always called with an upper dentry, which is now utilized
by these helpers to create the path. Make this usage explicit by renaming
the argument to "upperdentry" and by renaming the function as well in some
cases. Also add a check in ovl_do_getxattr() to catch misuse of these
functions.
Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Introduce ovl_lookup_upper() as a simple wrapper around lookup_one().
Make it clear in the helper's name that this only operates on the upper
layer. The wrapper will take upper layer's idmapping into account when
checking permission in lookup_one().
Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Introduce ovl_do_notify_change() as a simple wrapper around
notify_change() to support idmapped layers. The helper mirrors other
ovl_do_*() helpers that operate on the upper layers.
When changing ownership of an upper object the intended ownership needs
to be mapped according to the upper layer's idmapping. This mapping is
the inverse to the mapping applied when copying inode information from
an upper layer to the corresponding overlay inode. So e.g., when an
upper mount maps files that are stored on-disk as owned by id 1001 to
1000 this means that calling stat on this object from an idmapped mount
will report the file as being owned by id 1000. Consequently in order to
change ownership of an object in this filesystem so it appears as being
owned by id 1000 in the upper idmapped layer it needs to store id 1001
on disk. The mnt mapping helpers take care of this.
All idmapping helpers are nops when no idmapped base layers are used.
Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Pass down struct ovl_fs to setattr operations so we can ultimately
retrieve the relevant upper mount and take the mount's idmapping into
account when creating new filesystem objects. This is needed to support
idmapped base layers with overlay.
Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Pass down struct ovl_fs to all creation helpers so we can ultimately
retrieve the relevant upper mount and take the mount's idmapping into
account when creating new filesystem objects. This is needed to support
idmapped base layers with overlay.
Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Use helpers ovl_*xattr() to access user/trusted.overlay.* xattrs
and use helpers ovl_do_*xattr() to access generic xattrs. This is a
preparatory patch for using idmapped base layers with overlay.
Note that a few of those places called vfs_*xattr() calls directly to
reduce the amount of debug output. But as Miklos pointed out since
overlayfs has been stable for quite some time the debug output isn't all
that relevant anymore and the additional debug in all locations was
actually quite helpful when developing this patch series.
Cc: <linux-unionfs@vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Christoph Fritz is reporting that failure to copy up fileattr when upper
doesn't support fileattr or xattr results in a regression.
Return success in these failure cases; this reverts overlayfs to the old
behavior.
Add a pr_warn_once() in these cases to still let the user know about the
copy up failures.
Reported-by: Christoph Fritz <chf.fritz@googlemail.com>
Fixes: 72db82115d ("ovl: copy up sync/noatime fileattr flags")
Cc: <stable@vger.kernel.org> # v5.15
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
This patch is fixing a NULL pointer dereference to get a recently
introduced warning message working.
Fixes: 5b0a414d06 ("ovl: fix filattr copy-up failure")
Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
Cc: <stable@vger.kernel.org> # v5.15
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
This regression can be reproduced with ntfs-3g and overlayfs:
mkdir lower upper work overlay
dd if=/dev/zero of=ntfs.raw bs=1M count=2
mkntfs -F ntfs.raw
mount ntfs.raw lower
touch lower/file.txt
mount -t overlay -o lowerdir=lower,upperdir=upper,workdir=work - overlay
mv overlay/file.txt overlay/file2.txt
mv fails and (misleadingly) prints
mv: cannot move 'overlay/file.txt' to a subdirectory of itself, 'overlay/file2.txt'
The reason is that ovl_copy_fileattr() is triggered due to S_NOATIME being
set on all inodes (by fuse) regardless of fileattr.
ovl_copy_fileattr() tries to retrieve file attributes from lower file, but
that fails because filesystem does not support this ioctl (this should fail
with ENOTTY, but ntfs-3g return EINVAL instead). This failure is
propagated to origial operation (in this case rename) that triggered the
copy-up.
The fix is to ignore ENOTTY and EINVAL errors from fileattr_get() in copy
up. This also requires turning the internal ENOIOCTLCMD into ENOTTY.
As a further measure to prevent unnecessary failures, only try the
fileattr_get/set on upper if there are any flags to copy up.
Side note: a number of filesystems set S_NOATIME (and sometimes other inode
flags) irrespective of fileattr flags. This causes unnecessary calls
during copy up, which might lead to a performance issue, especially if
latency is high. To fix this, the kernel would need to differentiate
between the two cases. E.g. introduce SB_NOATIME_UPDATE, a per-sb variant
of S_NOATIME. SB_NOATIME doesn't work, because that's interpreted as
"filesystem doesn't store an atime attribute"
Reported-and-tested-by: Kevin Locke <kevin@kevinlocke.name>
Fixes: 72db82115d ("ovl: copy up sync/noatime fileattr flags")
Cc: <stable@vger.kernel.org> # v5.15
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Extended attributes are usually small, but could be up to 64k in size, so
use the most efficient method for doing the allocation.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
When a lower file has immutable/append-only fileattr flags, the behavior of
overlayfs post copy up is inconsistent.
Immediattely after copy up, ovl inode still has the S_IMMUTABLE/S_APPEND
inode flags copied from lower inode, so vfs code still treats the ovl inode
as immutable/append-only. After ovl inode evict or mount cycle, the ovl
inode does not have these inode flags anymore.
We cannot copy up the immutable and append-only fileattr flags, because
immutable/append-only inodes cannot be linked and because overlayfs will
not be able to set overlay.* xattr on the upper inodes.
Instead, if any of the fileattr flags of interest exist on the lower inode,
we store them in overlay.protattr xattr on the upper inode and we read the
flags from xattr on lookup and on fileattr_get().
This gives consistent behavior post copy up regardless of inode eviction
from cache.
When user sets new fileattr flags, we update or remove the overlay.protattr
xattr.
Storing immutable/append-only fileattr flags in an xattr instead of upper
fileattr also solves other non-standard behavior issues - overlayfs can now
copy up children of "ovl-immutable" directories and lower aliases of
"ovl-immutable" hardlinks.
Reported-by: Chengguang Xu <cgxu519@mykernel.net>
Link: https://lore.kernel.org/linux-unionfs/20201226104618.239739-1-cgxu519@mykernel.net/
Link: https://lore.kernel.org/linux-unionfs/20210210190334.1212210-5-amir73il@gmail.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
When a lower file has sync/noatime fileattr flags, the behavior of
overlayfs post copy up is inconsistent.
Immediately after copy up, ovl inode still has the S_SYNC/S_NOATIME
inode flags copied from lower inode, so vfs code still treats the ovl
inode as sync/noatime. After ovl inode evict or mount cycle,
the ovl inode does not have these inode flags anymore.
To fix this inconsistency, try to copy the fileattr flags on copy up
if the upper fs supports the fileattr_set() method.
This gives consistent behavior post copy up regardless of inode eviction
from cache.
We cannot copy up the immutable/append-only inode flags in a similar
manner, because immutable/append-only inodes cannot be linked and because
overlayfs will not be able to set overlay.* xattr on the upper inodes.
Those flags will be addressed by a followup patch.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>