The new mount API requires additional changes to how DFS
is handled. Additional testing of DFS uncovered problems
with domain based DFS referrals (a follow on patch addresses
DFS links) which this patch addresses.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Also make sure these are displayed in /proc/mounts
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
This is needed so that we display the correct //server/share vs
\\server\share in /proc/mounts for the device name (in the new
mount API).
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
There is no need to load the default nls to check if the iocharset argument
was specified or not since we have it in cifs_sb->ctx
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
and rename it to smb3_cleanup_fs_context[_content]
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
We can already access these from cifs_sb->ctx so we no longer need
a local copy in cifs_sb.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Add 'witness' mount option to register for witness notifications.
Signed-off-by: Samuel Cabrero <scabrero@suse.de>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Register a new generic netlink family to talk to the witness service
userspace daemon.
Signed-off-by: Samuel Cabrero <scabrero@suse.de>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
as we now have a full smb3_fs_context as part of the cifs superblock
we no longer need a local copy of the mount options and can just
reference the copy in the smb3_fs_context.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
and populate it during mount in cifs_smb3_do_mount()
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
See Documentation/filesystems/mount_api.rst for details on new mount API
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Harmonize and change all such variables to 'ctx', where possible.
No changes to actual logic.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Add new module load parameter enable_gcm_256. If set, then add
AES-256-GCM (strongest encryption type) to the list of encryption
types requested. Put it in the list as the second choice (since
AES-128-GCM is faster and much more broadly supported by
SMB3 servers). To make this stronger encryption type, GCM-256,
required (the first and only choice, you would use module parameter
"require_gcm_256."
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Add new module load parameter require_gcm_256. If set, then only
request AES-256-GCM (strongest encryption type).
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Missing the final 's' in "max_channels" mount option when displayed in
/proc/mounts (or by mount command)
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com>
In order to handle workloads where it is important to make sure that
a buggy app did not delete content on the drive, the new mount option
"nodelete" allows standard permission checks on the server to work,
but prevents on the client any attempts to unlink a file or delete
a directory on that mount point. This can be helpful when running
a little understood app on a network mount that contains important
content that should not be deleted.
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Add experimental support for allowing a swap file to be on an SMB3
mount. There are use cases where swapping over a secure network
filesystem is preferable. In some cases there are no local
block devices large enough, and network block devices can be
hard to setup and secure. And in some cases there are no
local block devices at all (e.g. with the recent addition of
remote boot over SMB3 mounts).
There are various enhancements that can be added later e.g.:
- doing a mandatory byte range lock over the swapfile (until
the Linux VFS is modified to notify the file system that an open
is for a swapfile, when the file can be opened "DENY_ALL" to prevent
others from opening it).
- pinning more buffers in the underlying transport to minimize memory
allocations in the TCP stack under the fs
- documenting how to create ACLs (on the server) to secure the
swapfile (or adding additional tools to cifs-utils to make it easier)
Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
See commit 349457ccf2
"Allow file systems to manually d_move() inside of ->rename()"
Lessens possibility of race conditions in rename
Signed-off-by: Steve French <stfrench@microsoft.com>
We were not displaying the mount option "signloosely" in /proc/mounts
for cifs mounts which some users found confusing recently
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Fix display for sec=krb5i which was wrongly interleaved by cruid,
resulting in string "sec=krb5,cruid=<...>i" instead of
"sec=krb5i,cruid=<...>".
Fixes: 96281b9e46 ("smb3: for kerberos mounts display the credential uid used")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When "backup intent" is requested on the mount (e.g. backupuid or
backupgid mount options), the corresponding flag was missing from
some of the operations.
Change all operations to use the macro cifs_create_options() to
set the backup intent flag if needed.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull vfs d_inode/d_flags memory ordering fixes from Al Viro:
"Fallout from tree-wide audit for ->d_inode/->d_flags barriers use.
Basically, the problem is that negative pinned dentries require
careful treatment - unless ->d_lock is locked or parent is held at
least shared, another thread can make them positive right under us.
Most of the uses turned out to be safe - the main surprises as far as
filesystems are concerned were
- race in dget_parent() fastpath, that might end up with the caller
observing the returned dentry _negative_, due to insufficient
barriers. It is positive in memory, but we could end up seeing the
wrong value of ->d_inode in CPU cache. Fixed.
- manual checks that result of lookup_one_len_unlocked() is positive
(and rejection of negatives). Again, insufficient barriers (we
might end up with inconsistent observed values of ->d_inode and
->d_flags). Fixed by switching to a new primitive that does the
checks itself and returns ERR_PTR(-ENOENT) instead of a negative
dentry. That way we get rid of boilerplate converting negatives
into ERR_PTR(-ENOENT) in the callers and have a single place to
deal with the barrier-related mess - inside fs/namei.c rather than
in every caller out there.
The guts of pathname resolution *do* need to be careful - the race
found by Ritesh is real, as well as several similar races.
Fortunately, it turns out that we can take care of that with fairly
local changes in there.
The tree-wide audit had not been fun, and I hate the idea of repeating
it. I think the right approach would be to annotate the places where
we are _not_ guaranteed ->d_inode/->d_flags stability and have sparse
catch regressions. But I'm still not sure what would be the least
invasive way of doing that and it's clearly the next cycle fodder"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs/namei.c: fix missing barriers when checking positivity
fix dget_parent() fastpath race
new helper: lookup_positive_unlocked()
fs/namei.c: pull positivity check into follow_managed()
- Various kerneldoc script enhancements.
- More RST conversions; those are slowing down as we run out of things to
convert, but we're a ways from done still.
- Dan's "maintainer profile entry" work landed at last. Now we just need
to get maintainers to fill in the profiles...
- A reworking of the parallel build setup to work better with a variety of
systems (and to not take over huge systems entirely in particular).
- The MAINTAINERS file is now converted to RST during the build.
Hopefully nobody ever tries to print this thing, or they will need to
load a lot of paper.
- A script and documentation making it easy for maintainers to add Link:
tags at commit time.
Also included is the removal of a bunch of spurious CR characters.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl3j5B0PHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5YtBcH/jIN2cO8/0YW2rjVT+1G6ytSdFUKx5WJ/lpf
5uBeCvuCeYhtCB6+BgnXvjykJ7jDW11/NJNjWqz/gsvD5l5FJK1rXarI/oz2Klyi
kcPtDmBF/ki4wz9qXzEpa0vg8LXdjeys50S1vE75qCzxZoPP7YjuRbPnLrlIJukv
JbDVi4p9kxgeHfRB4+BHOe5rFwA3mMmaxKNIX34Y+UUO2KZ0g/yUi1bAaQwQAdt+
PsORmkVQ8Puh3K9xRIr7dYlcWBlBiPqzYdvDgTVxSjrxdK6wjYjSgVk2VjC5MBUN
mTSTWgyfsIcD/76/s8tq7ZRl2fw+SkCSkFo79Rb/hJwDTb7Vnng=
=LPBr
-----END PGP SIGNATURE-----
Merge tag 'docs-5.5a' of git://git.lwn.net/linux
Pull Documentation updates from Jonathan Corbet:
"Here are the main documentation changes for 5.5:
- Various kerneldoc script enhancements.
- More RST conversions; those are slowing down as we run out of
things to convert, but we're a ways from done still.
- Dan's "maintainer profile entry" work landed at last. Now we just
need to get maintainers to fill in the profiles...
- A reworking of the parallel build setup to work better with a
variety of systems (and to not take over huge systems entirely in
particular).
- The MAINTAINERS file is now converted to RST during the build.
Hopefully nobody ever tries to print this thing, or they will need
to load a lot of paper.
- A script and documentation making it easy for maintainers to add
Link: tags at commit time.
Also included is the removal of a bunch of spurious CR characters"
* tag 'docs-5.5a' of git://git.lwn.net/linux: (91 commits)
docs: remove a bunch of stray CRs
docs: fix up the maintainer profile document
libnvdimm, MAINTAINERS: Maintainer Entry Profile
Maintainer Handbook: Maintainer Entry Profile
MAINTAINERS: Reclaim the P: tag for Maintainer Entry Profile
docs, parallelism: Rearrange how jobserver reservations are made
docs, parallelism: Do not leak blocking mode to other readers
docs, parallelism: Fix failure path and add comment
Documentation: Remove bootmem_debug from kernel-parameters.txt
Documentation: security: core.rst: fix warnings
Documentation/process/howto/kokr: Update for 4.x -> 5.x versioning
Documentation/translation: Use Korean for Korean translation title
docs/memory-barriers.txt: Remove remaining references to mmiowb()
docs/memory-barriers.txt/kokr: Update I/O section to be clearer about CPU vs thread
docs/memory-barriers.txt/kokr: Fix style, spacing and grammar in I/O section
Documentation/kokr: Kill all references to mmiowb()
docs/memory-barriers.txt/kokr: Rewrite "KERNEL I/O BARRIER EFFECTS" section
docs: Add initial documentation for devfreq
Documentation: Document how to get links with git am
docs: Add request_irq() documentation
...
This patch moves the final part of the cifsFileInfo_put() logic where we
need a write lock on lock_sem to be processed in a separate thread that
holds no other locks.
This is to prevent deadlocks like the one below:
> there are 6 processes looping to while trying to down_write
> cinode->lock_sem, 5 of them from _cifsFileInfo_put, and one from
> cifs_new_fileinfo
>
> and there are 5 other processes which are blocked, several of them
> waiting on either PG_writeback or PG_locked (which are both set), all
> for the same page of the file
>
> 2 inode_lock() (inode->i_rwsem) for the file
> 1 wait_on_page_writeback() for the page
> 1 down_read(inode->i_rwsem) for the inode of the directory
> 1 inode_lock()(inode->i_rwsem) for the inode of the directory
> 1 __lock_page
>
>
> so processes are blocked waiting on:
> page flags PG_locked and PG_writeback for one specific page
> inode->i_rwsem for the directory
> inode->i_rwsem for the file
> cifsInodeInflock_sem
>
>
>
> here are the more gory details (let me know if I need to provide
> anything more/better):
>
> [0 00:48:22.765] [UN] PID: 8863 TASK: ffff8c691547c5c0 CPU: 3
> COMMAND: "reopen_file"
> #0 [ffff9965007e3ba8] __schedule at ffffffff9b6e6095
> #1 [ffff9965007e3c38] schedule at ffffffff9b6e64df
> #2 [ffff9965007e3c48] rwsem_down_write_slowpath at ffffffff9af283d7
> #3 [ffff9965007e3cb8] legitimize_path at ffffffff9b0f975d
> #4 [ffff9965007e3d08] path_openat at ffffffff9b0fe55d
> #5 [ffff9965007e3dd8] do_filp_open at ffffffff9b100a33
> #6 [ffff9965007e3ee0] do_sys_open at ffffffff9b0eb2d6
> #7 [ffff9965007e3f38] do_syscall_64 at ffffffff9ae04315
> * (I think legitimize_path is bogus)
>
> in path_openat
> } else {
> const char *s = path_init(nd, flags);
> while (!(error = link_path_walk(s, nd)) &&
> (error = do_last(nd, file, op)) > 0) { <<<<
>
> do_last:
> if (open_flag & O_CREAT)
> inode_lock(dir->d_inode); <<<<
> else
> so it's trying to take inode->i_rwsem for the directory
>
> DENTRY INODE SUPERBLK TYPE PATH
> ffff8c68bb8e79c0 ffff8c691158ef20 ffff8c6915bf9000 DIR /mnt/vm1_smb/
> inode.i_rwsem is ffff8c691158efc0
>
> <struct rw_semaphore 0xffff8c691158efc0>:
> owner: <struct task_struct 0xffff8c6914275d00> (UN - 8856 -
> reopen_file), counter: 0x0000000000000003
> waitlist: 2
> 0xffff9965007e3c90 8863 reopen_file UN 0 1:29:22.926
> RWSEM_WAITING_FOR_WRITE
> 0xffff996500393e00 9802 ls UN 0 1:17:26.700
> RWSEM_WAITING_FOR_READ
>
>
> the owner of the inode.i_rwsem of the directory is:
>
> [0 00:00:00.109] [UN] PID: 8856 TASK: ffff8c6914275d00 CPU: 3
> COMMAND: "reopen_file"
> #0 [ffff99650065b828] __schedule at ffffffff9b6e6095
> #1 [ffff99650065b8b8] schedule at ffffffff9b6e64df
> #2 [ffff99650065b8c8] schedule_timeout at ffffffff9b6e9f89
> #3 [ffff99650065b940] msleep at ffffffff9af573a9
> #4 [ffff99650065b948] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs]
> #5 [ffff99650065ba38] cifs_writepage_locked at ffffffffc0a0b8f3 [cifs]
> #6 [ffff99650065bab0] cifs_launder_page at ffffffffc0a0bb72 [cifs]
> #7 [ffff99650065bb30] invalidate_inode_pages2_range at ffffffff9b04d4bd
> #8 [ffff99650065bcb8] cifs_invalidate_mapping at ffffffffc0a11339 [cifs]
> #9 [ffff99650065bcd0] cifs_revalidate_mapping at ffffffffc0a1139a [cifs]
> #10 [ffff99650065bcf0] cifs_d_revalidate at ffffffffc0a014f6 [cifs]
> #11 [ffff99650065bd08] path_openat at ffffffff9b0fe7f7
> #12 [ffff99650065bdd8] do_filp_open at ffffffff9b100a33
> #13 [ffff99650065bee0] do_sys_open at ffffffff9b0eb2d6
> #14 [ffff99650065bf38] do_syscall_64 at ffffffff9ae04315
>
> cifs_launder_page is for page 0xffffd1e2c07d2480
>
> crash> page.index,mapping,flags 0xffffd1e2c07d2480
> index = 0x8
> mapping = 0xffff8c68f3cd0db0
> flags = 0xfffffc0008095
>
> PAGE-FLAG BIT VALUE
> PG_locked 0 0000001
> PG_uptodate 2 0000004
> PG_lru 4 0000010
> PG_waiters 7 0000080
> PG_writeback 15 0008000
>
>
> inode is ffff8c68f3cd0c40
> inode.i_rwsem is ffff8c68f3cd0ce0
> DENTRY INODE SUPERBLK TYPE PATH
> ffff8c68a1f1b480 ffff8c68f3cd0c40 ffff8c6915bf9000 REG
> /mnt/vm1_smb/testfile.8853
>
>
> this process holds the inode->i_rwsem for the parent directory, is
> laundering a page attached to the inode of the file it's opening, and in
> _cifsFileInfo_put is trying to down_write the cifsInodeInflock_sem
> for the file itself.
>
>
> <struct rw_semaphore 0xffff8c68f3cd0ce0>:
> owner: <struct task_struct 0xffff8c6914272e80> (UN - 8854 -
> reopen_file), counter: 0x0000000000000003
> waitlist: 1
> 0xffff9965005dfd80 8855 reopen_file UN 0 1:29:22.912
> RWSEM_WAITING_FOR_WRITE
>
> this is the inode.i_rwsem for the file
>
> the owner:
>
> [0 00:48:22.739] [UN] PID: 8854 TASK: ffff8c6914272e80 CPU: 2
> COMMAND: "reopen_file"
> #0 [ffff99650054fb38] __schedule at ffffffff9b6e6095
> #1 [ffff99650054fbc8] schedule at ffffffff9b6e64df
> #2 [ffff99650054fbd8] io_schedule at ffffffff9b6e68e2
> #3 [ffff99650054fbe8] __lock_page at ffffffff9b03c56f
> #4 [ffff99650054fc80] pagecache_get_page at ffffffff9b03dcdf
> #5 [ffff99650054fcc0] grab_cache_page_write_begin at ffffffff9b03ef4c
> #6 [ffff99650054fcd0] cifs_write_begin at ffffffffc0a064ec [cifs]
> #7 [ffff99650054fd30] generic_perform_write at ffffffff9b03bba4
> #8 [ffff99650054fda8] __generic_file_write_iter at ffffffff9b04060a
> #9 [ffff99650054fdf0] cifs_strict_writev.cold.70 at ffffffffc0a4469b [cifs]
> #10 [ffff99650054fe48] new_sync_write at ffffffff9b0ec1dd
> #11 [ffff99650054fed0] vfs_write at ffffffff9b0eed35
> #12 [ffff99650054ff00] ksys_write at ffffffff9b0eefd9
> #13 [ffff99650054ff38] do_syscall_64 at ffffffff9ae04315
>
> the process holds the inode->i_rwsem for the file to which it's writing,
> and is trying to __lock_page for the same page as in the other processes
>
>
> the other tasks:
> [0 00:00:00.028] [UN] PID: 8859 TASK: ffff8c6915479740 CPU: 2
> COMMAND: "reopen_file"
> #0 [ffff9965007b39d8] __schedule at ffffffff9b6e6095
> #1 [ffff9965007b3a68] schedule at ffffffff9b6e64df
> #2 [ffff9965007b3a78] schedule_timeout at ffffffff9b6e9f89
> #3 [ffff9965007b3af0] msleep at ffffffff9af573a9
> #4 [ffff9965007b3af8] cifs_new_fileinfo.cold.61 at ffffffffc0a42a07 [cifs]
> #5 [ffff9965007b3b78] cifs_open at ffffffffc0a0709d [cifs]
> #6 [ffff9965007b3cd8] do_dentry_open at ffffffff9b0e9b7a
> #7 [ffff9965007b3d08] path_openat at ffffffff9b0fe34f
> #8 [ffff9965007b3dd8] do_filp_open at ffffffff9b100a33
> #9 [ffff9965007b3ee0] do_sys_open at ffffffff9b0eb2d6
> #10 [ffff9965007b3f38] do_syscall_64 at ffffffff9ae04315
>
> this is opening the file, and is trying to down_write cinode->lock_sem
>
>
> [0 00:00:00.041] [UN] PID: 8860 TASK: ffff8c691547ae80 CPU: 2
> COMMAND: "reopen_file"
> [0 00:00:00.057] [UN] PID: 8861 TASK: ffff8c6915478000 CPU: 3
> COMMAND: "reopen_file"
> [0 00:00:00.059] [UN] PID: 8858 TASK: ffff8c6914271740 CPU: 2
> COMMAND: "reopen_file"
> [0 00:00:00.109] [UN] PID: 8862 TASK: ffff8c691547dd00 CPU: 6
> COMMAND: "reopen_file"
> #0 [ffff9965007c3c78] __schedule at ffffffff9b6e6095
> #1 [ffff9965007c3d08] schedule at ffffffff9b6e64df
> #2 [ffff9965007c3d18] schedule_timeout at ffffffff9b6e9f89
> #3 [ffff9965007c3d90] msleep at ffffffff9af573a9
> #4 [ffff9965007c3d98] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs]
> #5 [ffff9965007c3e88] cifs_close at ffffffffc0a07aaf [cifs]
> #6 [ffff9965007c3ea0] __fput at ffffffff9b0efa6e
> #7 [ffff9965007c3ee8] task_work_run at ffffffff9aef1614
> #8 [ffff9965007c3f20] exit_to_usermode_loop at ffffffff9ae03d6f
> #9 [ffff9965007c3f38] do_syscall_64 at ffffffff9ae0444c
>
> closing the file, and trying to down_write cifsi->lock_sem
>
>
> [0 00:48:22.839] [UN] PID: 8857 TASK: ffff8c6914270000 CPU: 7
> COMMAND: "reopen_file"
> #0 [ffff9965006a7cc8] __schedule at ffffffff9b6e6095
> #1 [ffff9965006a7d58] schedule at ffffffff9b6e64df
> #2 [ffff9965006a7d68] io_schedule at ffffffff9b6e68e2
> #3 [ffff9965006a7d78] wait_on_page_bit at ffffffff9b03cac6
> #4 [ffff9965006a7e10] __filemap_fdatawait_range at ffffffff9b03b028
> #5 [ffff9965006a7ed8] filemap_write_and_wait at ffffffff9b040165
> #6 [ffff9965006a7ef0] cifs_flush at ffffffffc0a0c2fa [cifs]
> #7 [ffff9965006a7f10] filp_close at ffffffff9b0e93f1
> #8 [ffff9965006a7f30] __x64_sys_close at ffffffff9b0e9a0e
> #9 [ffff9965006a7f38] do_syscall_64 at ffffffff9ae04315
>
> in __filemap_fdatawait_range
> wait_on_page_writeback(page);
> for the same page of the file
>
>
>
> [0 00:48:22.718] [UN] PID: 8855 TASK: ffff8c69142745c0 CPU: 7
> COMMAND: "reopen_file"
> #0 [ffff9965005dfc98] __schedule at ffffffff9b6e6095
> #1 [ffff9965005dfd28] schedule at ffffffff9b6e64df
> #2 [ffff9965005dfd38] rwsem_down_write_slowpath at ffffffff9af283d7
> #3 [ffff9965005dfdf0] cifs_strict_writev at ffffffffc0a0c40a [cifs]
> #4 [ffff9965005dfe48] new_sync_write at ffffffff9b0ec1dd
> #5 [ffff9965005dfed0] vfs_write at ffffffff9b0eed35
> #6 [ffff9965005dff00] ksys_write at ffffffff9b0eefd9
> #7 [ffff9965005dff38] do_syscall_64 at ffffffff9ae04315
>
> inode_lock(inode);
>
>
> and one 'ls' later on, to see whether the rest of the mount is available
> (the test file is in the root, so we get blocked up on the directory
> ->i_rwsem), so the entire mount is unavailable
>
> [0 00:36:26.473] [UN] PID: 9802 TASK: ffff8c691436ae80 CPU: 4
> COMMAND: "ls"
> #0 [ffff996500393d28] __schedule at ffffffff9b6e6095
> #1 [ffff996500393db8] schedule at ffffffff9b6e64df
> #2 [ffff996500393dc8] rwsem_down_read_slowpath at ffffffff9b6e9421
> #3 [ffff996500393e78] down_read_killable at ffffffff9b6e95e2
> #4 [ffff996500393e88] iterate_dir at ffffffff9b103c56
> #5 [ffff996500393ec8] ksys_getdents64 at ffffffff9b104b0c
> #6 [ffff996500393f30] __x64_sys_getdents64 at ffffffff9b104bb6
> #7 [ffff996500393f38] do_syscall_64 at ffffffff9ae04315
>
> in iterate_dir:
> if (shared)
> res = down_read_killable(&inode->i_rwsem); <<<<
> else
> res = down_write_killable(&inode->i_rwsem);
>
Reported-by: Frank Sorenson <sorenson@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
adds:
- [no]multichannel to enable/disable multichannel
- max_channels=N to control how many channels to create
these options are then stored in the volume struct.
- store channels and max_channels in cifs_ses
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
It can cause
to fail with
modprobe: FATAL: Module <module> is builtin.
RHBZ: 1767094
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
The flock system call locks the whole file rather than a byte
range and so is currently emulated by various other file systems
by simply sending a byte range lock for the whole file.
Add flock handling for cifs.ko in similar way.
xfstest generic/504 passes with this as well
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Most of the callers of lookup_one_len_unlocked() treat negatives are
ERR_PTR(-ENOENT). Provide a helper that would do just that. Note
that a pinned positive dentry remains positive - it's ->d_inode is
stable, etc.; a pinned _negative_ dentry can become positive at any
point as long as you are not holding its parent at least shared.
So using lookup_one_len_unlocked() needs to be careful;
lookup_positive_unlocked() is safer and that's what the callers
end up open-coding anyway.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl2su/AeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGvm4H/1jkheCrvB/GJS69
wd18vizAg+eFmNCzxlGVhpQTKGymNRy+g6clnoli3cNJ3pSVKcYgVyB3oXaONIhp
g/ANudnBjTdjqYgJzfLij5AGecrGwDpF3YL0kuKrCB63s2I/HwQGYy/aPrYY8emy
gAYdaf1DGRu5/DIIB6soTo/TnpKoAyTE+XY5MaPSug++t/Flov19tlU40IZxXW94
bjTXbm0yklrsIx+LL5mYYGGnygSTCF66JjFg1qhDCBQaS2MZ21h1ZgaOtGZTwZcc
WgEiqLC5S1Iyj96zir1t78RcVQ4RzgvDbhUOgIqUFsYAO2wOicvxyFE3Hj8rPOKd
uGgVPRM=
=xgZa
-----END PGP SIGNATURE-----
Merge tag 'v5.4-rc4' into docs-next
I need to pick up the independent changes made to
Documentation/core-api/memory-allocation.rst to be able to merge further
work without creating a total mess.
It could be confusing why we set granularity to 1 seconds rather
than 2 seconds (1 second is the max the VFS allows) for these
mounts to very old servers ...
Signed-off-by: Steve French <stfrench@microsoft.com>
There are a number of documentation files that got moved or
renamed. update their references.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Shannon Nelson <snelson@pensando.io>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Paul Walmsley <paul.walmsley@sifive.com> # RISC-V
Acked-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Fixes: cb7a69e605 ("cifs: Initialize filesystem timestamp ranges")
Only very old servers (e.g. OS/2 and DOS) did not support
DCE TIME (100 nanosecond granularity). Fix the checks used
to set minimum and maximum times.
Fixes xfstest generic/258 (on 5.4-rc1 and later)
CC: Deepa Dinamani <deepa.kernel@gmail.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl2Dq7wACgkQiiy9cAdy
T1H5Kgv+NdJCBEfMjYzDPCci0plAFjurbPh5GvlrgkrYuRyCEw7KcG0/DltQSoCk
TOQ0JTVYulyKPd8uwha2p8ylscKwEi3BPHa0+CMg6qRmCtrRMMhh85TF2pScy8Jz
s1s9RskEvdJMpZ0disu1uge9O4JHNFB3LnfM7bDMxGTDUYMtuAXYIXTqMpLX3KVq
8NzcXUD0gAdjqo8iYZ/BtdeE7MBUCTvjp4O7pJh0cq+EV7p2G2DzaPhLJ3U5dL/y
IlMgMBfyuYNH6wZueB+R2ZO1GeiPe6M4sq/Beh2/5dXKGaNECxo3feDFOHsvsasT
WTllXAe8rz9AH8g8+QT4mxPQYBYw0IcJaQk5y/12gTiD2CR4BiTcNmgYv49tqiTH
b4Gl6KbjTvyT0iieStUBywCpU93qD04nwuzBOPqT0FG0T6Papx4sTvvYM3ThGs16
X891RLxdvj9QOjpPnQ4vw9yAd1s7PDjVm7BA5CRhRf0Yvdc7q2YVyitzGYBzjqF0
K2f/4HRJ
=4/w5
-----END PGP SIGNATURE-----
Merge tag '5.4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs updates from Steve French:
"Various cifs/smb3 fixes (including for share deleted cases) and
features including improved encrypted read performance, and various
debugging improvements.
Note that since I am at a test event this week with the Samba team,
and at the annual Storage Developer Conference/SMB3 Plugfest test
event next week a higher than usual number of fixes is expected later
next week as other features in progress get additional testing and
review during these two events"
* tag '5.4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: (38 commits)
cifs: update internal module version number
cifs: modefromsid: write mode ACE first
cifs: cifsroot: add more err checking
smb3: add missing worker function for SMB3 change notify
cifs: Add support for root file systems
cifs: modefromsid: make room for 4 ACE
smb3: fix potential null dereference in decrypt offload
smb3: fix unmount hang in open_shroot
smb3: allow disabling requesting leases
smb3: improve handling of share deleted (and share recreated)
smb3: display max smb3 requests in flight at any one time
smb3: only offload decryption of read responses if multiple requests
cifs: add a helper to find an existing readable handle to a file
smb3: enable offload of decryption of large reads via mount option
smb3: allow parallelizing decryption of reads
cifs: add a debug macro that prints \\server\share for errors
smb3: fix signing verification of large reads
smb3: allow skipping signature verification for perf sensitive configurations
smb3: add dynamic tracepoints for flush and close
smb3: log warning if CSC policy conflicts with cache mount option
...
This series from Deepa Dinamani adds a per-superblock minimum/maximum
timestamp limit for a file system, and clamps timestamps as they are
written, to avoid random behavior from integer overflow as well as having
different time stamps on disk vs in memory.
At mount time, a warning is now printed for any file system that can
represent current timestamps but not future timestamps more than 30
years into the future, similar to the arbitrary 30 year limit that was
added to settimeofday().
This was picked as a compromise to warn users to migrate to other file
systems (e.g. ext4 instead of ext3) when they need the file system to
survive beyond 2038 (or similar limits in other file systems), but not
get in the way of normal usage.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJdcs20AAoJEJpsee/mABjZaOwQALl3lBEhg0aV6a0ZZ1uYehtd
vcjZ6OpehfiOAxYJu0wfLPATo4T0FuBxZKz3+trkJDICcxyc68AJ2wijwInIQnZW
MrSKnPyv/fSGp8Jr5w/0CLdp6yT6Dh7z4j2UxhwusR1bQh4cCYSswDg29/nmxgKp
Nu8m7jMvJQ2Q0r4Zy0sT/MaycUcSH5yvpyTcsYFixGOz1niNy91ISs1+aq6HZ3i3
+cuYTUy13y40iNUHzFBTcJItBnikwZOQ/zjNfJFXZ3bVEUPg8ZTLPYQ0OZz+pM0Z
AlXCKghb2EOKgq729LtA6oaY+Nom/1Gm1p80q3G+nGRVOqRgC+dfAVPZQoiER5Y1
zNPEDf2Sf7J9xktvfC+Qqa9QEUPLKs22ZIccG+vYBW65sS8IAiEDH3LAt444GGls
yB/Cx/Qw7BftpR5Om27Mhm5jDQzr43iTkZaPQWq7ydJXpfxnjlg9L19yS1omDFyV
hdbBXY6FikUICPKUW6I49z5BhjL+kmK9M2DVljImmdKNDTrfr0xY5M/EWjJZ7X+I
rnSe9qTY+iQ5/AXANn5wfj1Y6L5IxkmdWI/zDIbKhYMZLCqqFLd3mJERbs+CMDJq
qNrYyFPReFrg50oSduBPAByMTR4x9hus7iIC7r77kpoz5i60DPmIJoTfFm3844Gv
sBEyvWV08CpE9mSzXuv6
=em9y
-----END PGP SIGNATURE-----
Merge tag 'y2038-vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground
Pull y2038 vfs updates from Arnd Bergmann:
"Add inode timestamp clamping.
This series from Deepa Dinamani adds a per-superblock minimum/maximum
timestamp limit for a file system, and clamps timestamps as they are
written, to avoid random behavior from integer overflow as well as
having different time stamps on disk vs in memory.
At mount time, a warning is now printed for any file system that can
represent current timestamps but not future timestamps more than 30
years into the future, similar to the arbitrary 30 year limit that was
added to settimeofday().
This was picked as a compromise to warn users to migrate to other file
systems (e.g. ext4 instead of ext3) when they need the file system to
survive beyond 2038 (or similar limits in other file systems), but not
get in the way of normal usage"
* tag 'y2038-vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
ext4: Reduce ext4 timestamp warnings
isofs: Initialize filesystem timestamp ranges
pstore: fs superblock limits
fs: omfs: Initialize filesystem timestamp ranges
fs: hpfs: Initialize filesystem timestamp ranges
fs: ceph: Initialize filesystem timestamp ranges
fs: sysv: Initialize filesystem timestamp ranges
fs: affs: Initialize filesystem timestamp ranges
fs: fat: Initialize filesystem timestamp ranges
fs: cifs: Initialize filesystem timestamp ranges
fs: nfs: Initialize filesystem timestamp ranges
ext4: Initialize timestamps limits
9p: Fill min and max timestamps in sb
fs: Fill in max and min timestamps in superblock
utimes: Clamp the timestamps before update
mount: Add mount warning for impending timestamp expiry
timestamp_truncate: Replace users of timespec64_trunc
vfs: Add timestamp_truncate() api
vfs: Add file timestamp range support
In some cases to work around server bugs or performance
problems it can be helpful to be able to disable requesting
SMB2.1/SMB3 leases on a particular mount (not to all servers
and all shares we are mounted to). Add new mount parm
"nolease" which turns off requesting leases on directory
or file opens. Currently the only way to disable leases is
globally through a module load parameter. This is more
granular.
Suggested-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
No point in offloading read decryption if no other requests on the
wire
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Disable offload of the decryption of encrypted read responses
by default (equivalent to setting this new mount option "esize=0").
Allow setting the minimum encrypted read response size that we
will choose to offload to a worker thread - it is now configurable
via on a new mount option "esize="
Depending on which encryption mechanism (GCM vs. CCM) and
the number of reads that will be issued in parallel and the
performance of the network and CPU on the client, it may make
sense to enable this since it can provide substantial benefit when
multiple large reads are in flight at the same time.
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
decrypting large reads on encrypted shares can be slow (e.g. adding
multiple milliseconds per-read on non-GCM capable servers or
when mounting with dialects prior to SMB3.1.1) - allow parallelizing
of read decryption by launching worker threads.
Testing to Samba on localhost showed 25% improvement.
Testing to remote server showed very large improvement when
doing more than one 'cp' command was called at one time.
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
If a share is known to be only to be accessed by one client, we
can aggressively cache writes not just reads to it.
Add "cache=" option (cache=singleclient) for mounting read write shares
(that will not be read or written to from other clients while we have
it mounted) in order to improve performance.
Signed-off-by: Steve French <stfrench@microsoft.com>
If a share is immutable (at least for the period that it will
be mounted) it would be helpful to not have to revalidate
dentries repeatedly that we know can not be changed remotely.
Add "cache=" option (cache=ro) for mounting read only shares
in order to improve performance in cases in which we know that
the share will not be changing while it is in use.
Signed-off-by: Steve French <stfrench@microsoft.com>
Some legacy code in the CIFS driver uses single DES to calculate
some password hash, and uses the crypto cipher API to do so. Given
that there is no point in invoking an accelerated cipher for doing
56-bit symmetric encryption on a single 8-byte block of input, the
flexibility of the crypto cipher API does not add much value here,
and so we're much better off using a library call into the generic
C implementation.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
cifs has both source and destination inodes locked throughout the copy.
Like ->write_iter(), we update mtime and strip setuid bits of destination
file before copy and like ->read_iter(), we update atime of source file
after copy.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl0wDEQACgkQiiy9cAdy
T1E3CQv/e+8uTD0dSmU+bEBopYCtihRq7ZGXtCGSE8U/fj0l34qBxds/JLvTSSeY
NhUD+F5e2NYSU7LZx8d9HkOJStcLaNx5Jq1YrxmGvVfUC6s7VKn9637nByXhrgrM
t/rQj8Ot6RDGMNs7PlMUt1jjtP3zL9ugQ2DHsjLoCY+w07qbsVWCZlm9sJEmr8lS
3umvfPPi8LKNsOxTT+DsSwZ+XN/BctCExeojVkdFRCBsYJyHbJtejeJPXWxv4/6m
lQpY0uLwjxgRO6aZxFvMW18vhI8977f1svwA4CmgaVYB0A7yr1VptINWVPfN+mGK
BYJRe1i54JSBZ8/vp1POvKrhLa6Y623BNpa6myjxOXYQ3/M7PDU+PycosI4V61Bp
yyH451jdKGZYojG6O7qGGE8kTDyjCs/k/2GeNeUKvHcNX9juDBMTxx2G5kP+w/xd
2lgvgrYlSWVG/p1ADlHtwsAEupg8xZcl/y3IGBIAw57uKAX2LRzujbeT/CpZ3phm
k5ZljExt
=bdbT
-----END PGP SIGNATURE-----
Merge tag '4.3-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs updates from Steve French:
"Fixes (three for stable) and improvements including much faster
encryption (SMB3.1.1 GCM)"
* tag '4.3-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: (27 commits)
smb3: smbdirect no longer experimental
cifs: fix crash in smb2_compound_op()/smb2_set_next_command()
cifs: fix crash in cifs_dfs_do_automount
cifs: fix parsing of symbolic link error response
cifs: refactor and clean up arguments in the reparse point parsing
SMB3: query inode number on open via create context
smb3: Send netname context during negotiate protocol
smb3: do not send compression info by default
smb3: add new mount option to retrieve mode from special ACE
smb3: Allow query of symlinks stored as reparse points
cifs: Fix a race condition with cifs_echo_request
cifs: always add credits back for unsolicited PDUs
fs: cifs: cifsssmb: Change return type of convert_ace_to_cifs_ace
add some missing definitions
cifs: fix typo in debug message with struct field ia_valid
smb3: minor cleanup of compound_send_recv
CIFS: Fix module dependency
cifs: simplify code by removing CONFIG_CIFS_ACL ifdef
cifs: Fix check for matching with existing mount
cifs: Properly handle auto disabling of serverino option
...
- Create a generic copy_file_range handler and make individual
filesystems responsible for calling it (i.e. no more assuming that
do_splice_direct will work or is appropriate)
- Refactor copy_file_range and remap_range parameter checking where they
are the same
- Install missing copy_file_range parameter checking(!)
- Remove suid/sgid and update mtime like any other file write
- Change the behavior so that a copy range crossing the source file's
eof will result in a short copy to the source file's eof instead of
EINVAL
- Permit filesystems to decide if they want to handle cross-superblock
copy_file_range in their local handlers.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl0BGvAACgkQ+H93GTRK
tOu2aw/+KGG7PiXm9ED3ZXUppKVddrZMOgqM7mSfHo6TBgW3pJUJcRIhawK0Wz/P
stgTsOkurHSl3iT3vQyX4GTZvLoGN/rfsRLPxogJptBUqVv3BOrXsrI53f7V/kbm
rtjlYsgExji7VBUiMTe5kOWWqxyR7B4nXyvY/8rier57rW/8C1I58B0OrxAmTK0k
rz1e5BtE1dg91xA7cSdEc38FInz8MW8cvsrEzW9vyYY4IVE0PBuhhA1EvryxTrAZ
hfthHFfzwxhJkI0mdha8uqNufNWrHLSqiwyjYC7pwAwSQzQPiQz9U17flu+URnfF
kXaR5LdXbBP3pl46RdthrfuonWsv612cC1Qwfjs8PBG9lG7b9PGJ40MGVTiw7LlQ
924/03ho0zAnV0E8Qn5O9nPshQNDJhwhzMS39EmMyFKb1D5XGzdMV0gDdIfx6hdO
HDbw6VQ33S59gvk7v/gxsFB5Bs4PKfamHx/QmwQwpqWM5XExcr0yJ90OTBtAuY4r
S+9gwG6uED3aPh8HbQ5UgnA8bZmMmi8AkcBvqJ9GgNw5SbZl0oyv9Sj6JNpoOejV
8y9JkhoZUxqiihnKTw/vtMrj5RCOfifNBjMSwrShfLdLKtK0AZl1mXC0/1Q3VnEQ
TUcyRHEzrtHgJ9/AK9xIyDNvNYzvHSLZj7maoZZumgQa2FOFrmw=
=qM44
-----END PGP SIGNATURE-----
Merge tag 'copy-file-range-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull copy_file_range updates from Darrick Wong:
"This fixes numerous parameter checking problems and inconsistent
behaviors in the new(ish) copy_file_range system call.
Now the system call will actually check its range parameters
correctly; refuse to copy into files for which the caller does not
have sufficient privileges; update mtime and strip setuid like file
writes are supposed to do; and allows copying up to the EOF of the
source file instead of failing the call like we used to.
Summary:
- Create a generic copy_file_range handler and make individual
filesystems responsible for calling it (i.e. no more assuming that
do_splice_direct will work or is appropriate)
- Refactor copy_file_range and remap_range parameter checking where
they are the same
- Install missing copy_file_range parameter checking(!)
- Remove suid/sgid and update mtime like any other file write
- Change the behavior so that a copy range crossing the source file's
eof will result in a short copy to the source file's eof instead of
EINVAL
- Permit filesystems to decide if they want to handle
cross-superblock copy_file_range in their local handlers"
* tag 'copy-file-range-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
fuse: copy_file_range needs to strip setuid bits and update timestamps
vfs: allow copy_file_range to copy across devices
xfs: use file_modified() helper
vfs: introduce file_modified() helper
vfs: add missing checks to copy_file_range
vfs: remove redundant checks from generic_remap_checks()
vfs: introduce generic_file_rw_checks()
vfs: no fallback for ->copy_file_range
vfs: introduce generic_copy_file_range()