Pull crypto updates from Herbert Xu:
"Here is the crypto update for 5.3:
API:
- Test shash interface directly in testmgr
- cra_driver_name is now mandatory
Algorithms:
- Replace arc4 crypto_cipher with library helper
- Implement 5 way interleave for ECB, CBC and CTR on arm64
- Add xxhash
- Add continuous self-test on noise source to drbg
- Update jitter RNG
Drivers:
- Add support for SHA204A random number generator
- Add support for 7211 in iproc-rng200
- Fix fuzz test failures in inside-secure
- Fix fuzz test failures in talitos
- Fix fuzz test failures in qat"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (143 commits)
crypto: stm32/hash - remove interruptible condition for dma
crypto: stm32/hash - Fix hmac issue more than 256 bytes
crypto: stm32/crc32 - rename driver file
crypto: amcc - remove memset after dma_alloc_coherent
crypto: ccp - Switch to SPDX license identifiers
crypto: ccp - Validate the the error value used to index error messages
crypto: doc - Fix formatting of new crypto engine content
crypto: doc - Add parameter documentation
crypto: arm64/aes-ce - implement 5 way interleave for ECB, CBC and CTR
crypto: arm64/aes-ce - add 5 way interleave routines
crypto: talitos - drop icv_ool
crypto: talitos - fix hash on SEC1.
crypto: talitos - move struct talitos_edesc into talitos.h
lib/scatterlist: Fix mapping iterator when sg->offset is greater than PAGE_SIZE
crypto/NX: Set receive window credits to max number of CRBs in RxFIFO
crypto: asymmetric_keys - select CRYPTO_HASH where needed
crypto: serpent - mark __serpent_setkey_sbox noinline
crypto: testmgr - dynamically allocate crypto_shash
crypto: testmgr - dynamically allocate testvec_config
crypto: talitos - eliminate unneeded 'done' functions at build time
...
There is a special ACE used by some servers to allow the mode
bits to be stored. This can be especially helpful in scenarios
in which the client is trusted, and access checking on the
client vs the POSIX mode bits is sufficient.
Add mount option to allow enabling this behavior.
Follow on patch will add support for chmod and queryinfo
(stat) by retrieving the POSIX mode bits from the special
ACE, SID: S-1-5-88-3
See e.g.
https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2008-R2-and-2008/hh509017(v=ws.10)
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
SMB3 ACL support is needed for many use cases now and should not be
ifdeffed out, even for SMB1 (CIFS). Remove the CONFIG_CIFS_ACL
ifdef so ACL support is always built into cifs.ko
Signed-off-by: Steve French <stfrench@microsoft.com>
If "max_credits" is overridden from its default by specifying
it on the smb3 mount then display it in /proc/mounts
Signed-off-by: Steve French <stfrench@microsoft.com>
The CIFS code uses the sync skcipher API to invoke the ecb(arc4) skcipher,
of which only a single generic C code implementation exists. This means
that going through all the trouble of using scatterlists etc buys us
very little, and we're better off just invoking the arc4 library directly.
This also reverts commit 5f4b55699a ("CIFS: Fix BUG() in calc_seckey()"),
since it is no longer necessary to allocate sec_key on the heap.
Cc: linux-cifs@vger.kernel.org
Cc: Steve French <sfrench@samba.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
We can not depend on the tcon->open_file_lock here since in multiuser mode
we may have the same file/inode open via multiple different tcons.
The current code is race prone and will crash if one user deletes a file
at the same time a different user opens/create the file.
To avoid this we need to have a spinlock attached to the inode and not the tcon.
RHBZ: 1580165
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
We want to enable cross-filesystem copy_file_range functionality
where possible, so push the "same superblock only" checks down to
the individual filesystem callouts so they can make their own
decisions about cross-superblock copy offload and fallack to
generic_copy_file_range() for cross-superblock copy.
[Amir] We do not call ->remap_file_range() in case the files are not
on the same sb and do not call ->copy_file_range() in case the files
do not belong to the same filesystem driver.
This changes behavior of the copy_file_range(2) syscall, which will
now allow cross filesystem in-kernel copy. CIFS already supports
cross-superblock copy, between two shares to the same server. This
functionality will now be available via the copy_file_range(2) syscall.
Cc: Steve French <stfrench@microsoft.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Now that we have generic_copy_file_range(), remove it as a fallback
case when offloads fail. This puts the responsibility for executing
fallbacks on the filesystems that implement ->copy_file_range and
allows us to add operational validity checks to
generic_copy_file_range().
Rework vfs_copy_file_range() to call a new do_copy_file_range()
helper to execute the copying callout, and move calls to
generic_file_copy_range() into filesystem methods where they
currently return failures.
[Amir] overlayfs is not responsible of executing the fallback.
It is the responsibility of the underlying filesystem.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Add llseek op for SEEK_DATA and SEEK_HOLE.
Improves xfstests/285,286,436,445,448 and 490
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Copychunk allows source and target to be on the same file.
For details on restrictions see MS-SMB2 3.3.5.15.6
Signed-off-by: Kovtunenko Oleksandr <alexander198961@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Useful for improved copy performance as well as for
applications which query allocated ranges of sparse
files.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Missing parameter that should be displayed in the mount list
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Kenneth D'souza <kdsouza@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reconnecting after server or network failure can be improved
(to maintain availability and protect data integrity) by allowing
the client to choose the default persistent (or resilient)
handle timeout in some use cases. Today we default to 0 which lets
the server pick the default timeout (usually 120 seconds) but this
can be problematic for some workloads. Add the new mount parameter
to cifs.ko for SMB3 mounts "handletimeout" which enables the user
to override the default handle timeout for persistent (mount
option "persistenthandles") or resilient handles (mount option
"resilienthandles"). Maximum allowed is 16 minutes (960000 ms).
Units for the timeout are expressed in milliseconds. See
section 2.2.14.2.12 and 2.2.31.3 of the MS-SMB2 protocol
specification for more information.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
dedupe_file_range operations is combiled into remap_file_range.
But it's always skipped for dedupe operations in function
cifs_remap_file_range.
Example to test:
Before this patch:
# dd if=/dev/zero of=cifs/file bs=1M count=1
# xfs_io -c "dedupe cifs/file 4k 64k 4k" cifs/file
XFS_IOC_FILE_EXTENT_SAME: Invalid argument
After this patch:
# dd if=/dev/zero of=cifs/file bs=1M count=1
# xfs_io -c "dedupe cifs/file 4k 64k 4k" cifs/file
XFS_IOC_FILE_EXTENT_SAME: Operation not supported
Influence for xfstests:
generic/091
generic/112
generic/127
generic/263
These tests report this error "do_copy_range:: Invalid
argument" instead of "FIDEDUPERANGE: Invalid argument".
Because there are still two bugs cause these test failed.
https://bugzilla.kernel.org/show_bug.cgi?id=202935https://bugzilla.kernel.org/show_bug.cgi?id=202785
Signed-off-by: Xiaoli Feng <fengxiaoli0714@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
For kerberos mounts, the cruid is helpful to display in
/proc/mounts in order to tell which uid's krb5 cache we
got the ticket for and to tell in the multiuser krb5 case
which local users (uids) we have Kerberos authentic sessions
for.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
We negotiate rsize mounts (and it can be overridden by user) to
typically 4MB, so using larger default I/O sizes from userspace
(changing to 1MB default i/o size returned by stat) the
performance is much better (and not just for long latency
network connections) in most use cases for SMB3 than the default I/O
size (which ends up being 128K for cp and can be even smaller for cp).
This can be 4x slower or worse depending on network latency.
By changing inode->blocksize from 32K (which was perhaps ok
for very old SMB1/CIFS) to a larger value, 1MB (but still less than
max size negotiated with the server which is 4MB, in order to minimize
risk) it significantly increases performance for the
noncached case, and slightly increases it for the cached case.
This can be changed by the user on mount (specifying bsize=
values from 16K to 16MB) to tune better for performance
for applications that depend on blocksize.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
This patch will make use of DFS cache routines where appropriate and
do not always request a new referral from server.
Signed-off-by: Paulo Alcantara <palcantara@suse.de>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
There is a null check on dst_file->private data which suggests
it can be potentially null. However, before this check, pointer
smb_file_target is derived from dst_file->private and dereferenced
in the call to tlink_tcon, hence there is a potential null pointer
deference.
Fix this by assigning smb_file_target and target_tcon after the
null pointer sanity checks.
Detected by CoverityScan, CID#1475302 ("Dereference before null check")
Fixes: 04b38d6012 ("vfs: pull btrfs clone API to vfs layer")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
With direct read/write functions implemented, add them to file_operations.
Dircet I/O is used under two conditions:
1. When mounting with "cache=none", CIFS uses direct I/O for all user file
data transfer.
2. When opening a file with O_DIRECT, CIFS uses direct I/O for all data
transfer on this file.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Rework the vfs_clone_file_range and vfs_dedupe_file_range infrastructure to use
a common .remap_file_range method and supply generic bounds and sanity checking
functions that are shared with the data write path. The current VFS
infrastructure has problems with rlimit, LFS file sizes, file time stamps,
maximum filesystem file sizes, stripping setuid bits, etc and so they are
addressed in these commits.
We also introduce the ability for the ->remap_file_range methods to return short
clones so that clones for vfs_copy_file_range() don't get rejected if the entire
range can't be cloned. It also allows filesystems to sliently skip deduplication
of partial EOF blocks if they are not capable of doing so without requiring
errors to be thrown to userspace.
All existing filesystems are converted to user the new .remap_file_range method,
and both XFS and ocfs2 are modified to make use of the new generic checking
infrastructure.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJb29gEAAoJEK3oKUf0dfodpOAQAL2VbHjvKXEwNMDTKscSRMmZ
Z0xXo3gamFKQ+VGOqy2g2lmAYQs9SAnTuCGTJ7zIAp7u+q8gzUy5FzKAwLS4Id6L
8siaY6nzlicfO04d0MdXnWz0f3xykChgzfdQfVUlUi7WrDioBUECLPmx4a+USsp1
DQGjLOZfoOAmn2rijdnH9RTEaHqg+8mcTaLN9TRav4gGqrWxldFKXw2y6ouFC7uo
/hxTRNXR9VI+EdbDelwBNXl9nU9gQA0WLOvRKwgUrtv6bSJohTPsmXt7EbBtNcVR
cl3zDNc1sLD1bLaRLEUAszI/33wXaaQgom1iB51obIcHHef+JxRNG/j6rUMfzxZI
VaauGv5EIvtaKN0LTAqVVLQ8t2MQFYfOr8TykmO+1UFog204aKRANdVMHDSjxD/0
dTGKJGcq+HnKQ+JHDbTdvuXEL8sUUl1FiLjOQbZPw63XmuddLKFUA2TOjXn6htbU
1h1MG5d9KjGLpabp2BQheczD08NuSmcrOBNt7IoeI3+nxr3HpMwprfB9TyaERy9X
iEgyVXmjjc9bLLRW7A2wm77aW64NvPs51wKMnvuNgNwnCewrGS6cB8WVj2zbQjH1
h3f3nku44s9ctNPSBzb/sJLnpqmZQ5t0oSmrMSN+5+En6rNTacoJCzxHRJBA7z/h
Z+C6y1GTZw0euY6Zjiwu
=CE/A
-----END PGP SIGNATURE-----
Merge tag 'xfs-4.20-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull vfs dedup fixes from Dave Chinner:
"This reworks the vfs data cloning infrastructure.
We discovered many issues with these interfaces late in the 4.19 cycle
- the worst of them (data corruption, setuid stripping) were fixed for
XFS in 4.19-rc8, but a larger rework of the infrastructure fixing all
the problems was needed. That rework is the contents of this pull
request.
Rework the vfs_clone_file_range and vfs_dedupe_file_range
infrastructure to use a common .remap_file_range method and supply
generic bounds and sanity checking functions that are shared with the
data write path. The current VFS infrastructure has problems with
rlimit, LFS file sizes, file time stamps, maximum filesystem file
sizes, stripping setuid bits, etc and so they are addressed in these
commits.
We also introduce the ability for the ->remap_file_range methods to
return short clones so that clones for vfs_copy_file_range() don't get
rejected if the entire range can't be cloned. It also allows
filesystems to sliently skip deduplication of partial EOF blocks if
they are not capable of doing so without requiring errors to be thrown
to userspace.
Existing filesystems are converted to user the new remap_file_range
method, and both XFS and ocfs2 are modified to make use of the new
generic checking infrastructure"
* tag 'xfs-4.20-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (28 commits)
xfs: remove [cm]time update from reflink calls
xfs: remove xfs_reflink_remap_range
xfs: remove redundant remap partial EOF block checks
xfs: support returning partial reflink results
xfs: clean up xfs_reflink_remap_blocks call site
xfs: fix pagecache truncation prior to reflink
ocfs2: remove ocfs2_reflink_remap_range
ocfs2: support partial clone range and dedupe range
ocfs2: fix pagecache truncation prior to reflink
ocfs2: truncate page cache for clone destination file before remapping
vfs: clean up generic_remap_file_range_prep return value
vfs: hide file range comparison function
vfs: enable remap callers that can handle short operations
vfs: plumb remap flags through the vfs dedupe functions
vfs: plumb remap flags through the vfs clone functions
vfs: make remap_file_range functions take and return bytes completed
vfs: remap helper should update destination inode metadata
vfs: pass remap flags to generic_remap_checks
vfs: pass remap flags to generic_remap_file_range_prep
vfs: combine the clone and dedupe into a single remap_file_range
...
Change the remap_file_range functions to take a number of bytes to
operate upon and return the number of bytes they operated on. This is a
requirement for allowing fs implementations to return short clone/dedupe
results to the user, which will enable us to obey resource limits in a
graceful manner.
A subsequent patch will enable copy_file_range to signal to the
->clone_file_range implementation that it can handle a short length,
which will be returned in the function's return value. For now the
short return is not implemented anywhere so the behavior won't change --
either copy_file_range manages to clone the entire range or it tries an
alternative.
Neither clone ioctl can take advantage of this, alas.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Combine the clone_file_range and dedupe_file_range operations into a
single remap_file_range file operation dispatch since they're
fundamentally the same operation. The differences between the two can
be made in the prep functions.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Clarify meaning (in comments) meaning of various
options for debug messages in cifs.ko. Also fixed
trivial formatting/style issue with previous patch.
Signed-off-by: Steve French <stfrench@microsoft.com>
Currently, no messages are printed when mounting a CIFS filesystem and
no debug configuration is enabled.
However, a CIFS mount information is valuable when troubleshooting
and/or forensic analyzing a system and finding out if was a CIFS
endpoint mount attempted.
Other filesystems such as XFS, EXT* does issue a printk() when mounting
their filesystems.
A terse log message is printed only if cifsFYI is not enabled. Otherwise,
the default full debug message is printed.
In order to not clutter and classify correctly the event messages, these
are logged as KERN_INFO level.
Sample mount operations:
[root@corinthians ~]# mount -o user=administrator //172.25.250.18/c$ /mnt
(non-existent system)
[root@corinthians ~]# mount -o user=administrator //172.25.250.19/c$ /mnt
(Valid system)
Kernel message log for the mount operations:
[ 450.464543] CIFS: Attempting to mount //172.25.250.18/c$
[ 456.478186] CIFS VFS: Error connecting to socket. Aborting operation.
[ 456.478381] CIFS VFS: cifs_mount failed w/return code = -113
[ 467.688866] CIFS: Attempting to mount //172.25.250.19/c$
Signed-off-by: Rodrigo Freire <rfreire@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
tcon->Flags is only used by SMB1 code and changing it is not permanent
(you lose the setting on tcon reconnect).
* Move the setting to superblock flags (per mount-points).
* Make automount callback exit early when flag present
* Make dfs resolving happening in mount syscall exit early if flag present
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
/proc/fs/cifs/Stats when CONFIG_CIFS_STATS2 is enabled logs 'slow'
responses, but depending on the server you are debugging a
one second timeout may be too fast, so allow setting it to
a larger number of seconds via new module parameter
/sys/module/cifs/parameters/slow_rsp_threshold
or via modprobe:
slow_rsp_threshold:Amount of time (in seconds) to wait before
logging that a response is delayed.
Default: 1 (if set to 0 disables msg). (uint)
Recommended values are 0 (disabled) to 32767 (9 hours) with
the default remaining as 1 second.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
note smb3 (and common more modern servers) in the module description
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reported-by: Xiaoli Feng <xifeng@redhat.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
snapshot mounts were not marked as read-only and did not display the snapshot
time (in /proc/mounts) specified on mount
With this patch - note that can not write to the snapshot mount (see "ro" in
/proc/mounts line) and also the missing snapshot timewarp token time is
dumped. Sample line from /proc/mounts with the patch:
//127.0.0.1/scratch /mnt2 smb3 ro,relatime,vers=default,cache=strict,username=testuser,domain=,uid=0,noforceuid,gid=0,noforcegid,addr=127.0.0.1,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,noperm,rsize=1048576,wsize=1048576,echo_interval=60,snapshot=1234567,actimeo=1 0 0
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
We really, really want to be encouraging use of secure dialects,
and SMB3.1.1 offers useful security features, and will soon
be the recommended dialect for many use cases. Simplify the code
by removing the CONFIG_CIFS_SMB311 ifdef so users don't disable
it in the build, and create compatibility and/or security issues
with modern servers - many of which have been supporting this
dialect for multiple years.
Also clarify some of the Kconfig text for cifs.ko about
SMB3.1.1 and current supported features in the module.
Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Fil in the correct namelen (typically 255 not 4096) in the
statfs response and also fill in a reasonably unique fsid
(in this case taken from the volume id, and the creation time
of the volume).
In the case of the POSIX statfs all fields are now filled in,
and in the case of non-POSIX mounts, all fields are filled
in which can be.
Signed-off-by: Steve French <stfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
if mounting as smb3 do not allow cifs (vers=1.0) or insecure vers=2.0
mounts.
For example:
root@smf-Thinkpad-P51:~/cifs-2.6# mount -t smb3 //127.0.0.1/scratch /mnt -o username=testuser,password=Testpass1
root@smf-Thinkpad-P51:~/cifs-2.6# umount /mnt
root@smf-Thinkpad-P51:~/cifs-2.6# mount -t smb3 //127.0.0.1/scratch /mnt -o username=testuser,password=Testpass1,vers=1.0
mount: /mnt: wrong fs type, bad option, bad superblock on //127.0.0.1/scratch ...
root@smf-Thinkpad-P51:~/cifs-2.6# dmesg | grep smb3
[ 4302.200122] CIFS VFS: vers=1.0 (cifs) not permitted when mounting with smb3
root@smf-Thinkpad-P51:~/cifs-2.6# mount -t smb3 //127.0.0.1/scratch /mnt -o username=testuser,password=Testpass1,vers=3.11
Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Sachin Prabhu <sprabhu@redhat.com>
If "posix" (or synonym "unix" for backward compatibility) specified on mount,
and server advertises support for SMB3.11 POSIX negotiate context, then
enable the new posix extensions on the tcon. This can be viewed by
looking for "posix" in the mount options displayed by /proc/mounts
for that mount (ie if posix extensions allowed by server and the
experimental POSIX extensions also requested on the mount by specifying
"posix" at mount time).
Also add check to warn user if conflicting unix/nounix or posix/noposix specified
on mount.
Signed-off-by: Steve French <smfrench@gmail.com>
To improve security it may be helpful to have additional ways to restrict the
ability to override the default dialects (SMB2.1, SMB3 and SMB3.02) on mount
with old dialects (CIFS/SMB1 and SMB2) since vers=1.0 (CIFS/SMB1) and vers=2.0
are weaker and less secure.
Add a module parameter "disable_legacy_dialects"
(/sys/module/cifs/parameters/disable_legacy_dialects) which can be set to
1 (or equivalently Y) to forbid use of vers=1.0 or vers=2.0 on mount.
Also cleans up a few build warnings about globals for various module parms.
Signed-off-by: Steve French <smfrench@gmail.com>
Note which ones of the module params are cifs dialect only
(N/A for default dialect now that has moved to SMB2.1 or later)
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
We really don't want to be encouraging people to use the old
(less secure) cifs dialect (SMB1) and it can be confusing for them
with SMB3 (or later) being recommended but the module name is cifs.
Add a module alias for "smb3" to cifs.ko to make this less confusing.
Signed-off-by: Steve French <smfrench@gmail.com>
In SMB2/SMB3 unlike in cifs we unnecessarily open the root of the share
over and over again in various places during mount and path revalidation
and also in statfs. This patch cuts redundant traffic (opens and closes)
by simply keeping the directory handle for the root around (and reopening
it as needed on reconnect), so query calls don't require three round
trips to copmlete - just one, and eases load on network, client and
server (on mount alone, cuts network traffic by more than a third).
Also add a new cifs mount parm "nohandlecache" to allow users whose
servers might have resource constraints (eg in case they have a server
with so many users connecting to it that this extra handle per mount
could possibly be a resource concern).
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
As with NFS, which ignores sync on directory handles,
fsync on a directory handle is a noop for CIFS/SMB3.
Do not return an error on it. It breaks some database
apps otherwise.
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
* prepare for SMB3.11 pre-auth integrity
* enable sha512 when SMB311 is enabled in Kconfig
* add sha512 as a soft dependency
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs. To further
restrict what memory is available for copying, this creates a way to
whitelist specific areas of a given slab cache object for copying to/from
userspace, allowing much finer granularity of access control. Slab caches
that are never exposed to userspace can declare no whitelist for their
objects, thereby keeping them unavailable to userspace via dynamic copy
operations. (Note, an implicit form of whitelisting is the use of constant
sizes in usercopy operations and get_user()/put_user(); these bypass all
hardened usercopy checks since these sizes cannot change at runtime.)
This new check is WARN-by-default, so any mistakes can be found over the
next several releases without breaking anyone's system.
The series has roughly the following sections:
- remove %p and improve reporting with offset
- prepare infrastructure and whitelist kmalloc
- update VFS subsystem with whitelists
- update SCSI subsystem with whitelists
- update network subsystem with whitelists
- update process memory with whitelists
- update per-architecture thread_struct with whitelists
- update KVM with whitelists and fix ioctl bug
- mark all other allocations as not whitelisted
- update lkdtm for more sensible test overage
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Kees Cook <kees@outflux.net>
iQIcBAABCgAGBQJabvleAAoJEIly9N/cbcAmO1kQAJnjVPutnLSbnUteZxtsv7W4
43Cggvokfxr6l08Yh3hUowNxZVKjhF9uwMVgRRg9Nl5WdYCN+vCQbHz+ZdzGJXKq
cGqdKWgexMKX+aBdNDrK7BphUeD46sH7JWR+a/lDV/BgPxBCm9i5ZZCgXbPP89AZ
NpLBji7gz49wMsnm/x135xtNlZ3dG0oKETzi7MiR+NtKtUGvoIszSKy5JdPZ4m8q
9fnXmHqmwM6uQFuzDJPt1o+D1fusTuYnjI7EgyrJRRhQ+BB3qEFZApXnKNDRS9Dm
uB7jtcwefJCjlZVCf2+PWTOEifH2WFZXLPFlC8f44jK6iRW2Nc+wVRisJ3vSNBG1
gaRUe/FSge68eyfQj5OFiwM/2099MNkKdZ0fSOjEBeubQpiFChjgWgcOXa5Bhlrr
C4CIhFV2qg/tOuHDAF+Q5S96oZkaTy5qcEEwhBSW15ySDUaRWFSrtboNt6ZVOhug
d8JJvDCQWoNu1IQozcbv6xW/Rk7miy8c0INZ4q33YUvIZpH862+vgDWfTJ73Zy9H
jR/8eG6t3kFHKS1vWdKZzOX1bEcnd02CGElFnFYUEewKoV7ZeeLsYX7zodyUAKyi
Yp5CImsDbWWTsptBg6h9nt2TseXTxYCt2bbmpJcqzsqSCUwOQNQ4/YpuzLeG0ihc
JgOmUnQNJWCTwUUw5AS1
=tzmJ
-----END PGP SIGNATURE-----
Merge tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardened usercopy whitelisting from Kees Cook:
"Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs.
To further restrict what memory is available for copying, this creates
a way to whitelist specific areas of a given slab cache object for
copying to/from userspace, allowing much finer granularity of access
control.
Slab caches that are never exposed to userspace can declare no
whitelist for their objects, thereby keeping them unavailable to
userspace via dynamic copy operations. (Note, an implicit form of
whitelisting is the use of constant sizes in usercopy operations and
get_user()/put_user(); these bypass all hardened usercopy checks since
these sizes cannot change at runtime.)
This new check is WARN-by-default, so any mistakes can be found over
the next several releases without breaking anyone's system.
The series has roughly the following sections:
- remove %p and improve reporting with offset
- prepare infrastructure and whitelist kmalloc
- update VFS subsystem with whitelists
- update SCSI subsystem with whitelists
- update network subsystem with whitelists
- update process memory with whitelists
- update per-architecture thread_struct with whitelists
- update KVM with whitelists and fix ioctl bug
- mark all other allocations as not whitelisted
- update lkdtm for more sensible test overage"
* tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (38 commits)
lkdtm: Update usercopy tests for whitelisting
usercopy: Restrict non-usercopy caches to size 0
kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
kvm: whitelist struct kvm_vcpu_arch
arm: Implement thread_struct whitelist for hardened usercopy
arm64: Implement thread_struct whitelist for hardened usercopy
x86: Implement thread_struct whitelist for hardened usercopy
fork: Provide usercopy whitelisting for task_struct
fork: Define usercopy region in thread_stack slab caches
fork: Define usercopy region in mm_struct slab caches
net: Restrict unwhitelisted proto caches to size 0
sctp: Copy struct sctp_sock.autoclose to userspace using put_user()
sctp: Define usercopy region in SCTP proto slab cache
caif: Define usercopy region in caif proto slab cache
ip: Define usercopy region in IP proto slab cache
net: Define usercopy region in struct proto slab cache
scsi: Define usercopy region in scsi_sense_cache slab cache
cifs: Define usercopy region in cifs_request slab cache
vxfs: Define usercopy region in vxfs_inode slab cache
ufs: Define usercopy region in ufs_inode_cache slab cache
...
add splice_write support in cifs vfs using iter_file_splice_write
Signed-off-by: Andrés Souto <kai670@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Add "rdma" to CIFS mount options to connect to SMB Direct.
Add checks to validate this is used on SMB 3.X dialects.
To connect to SMBDirect, use "mount.cifs -o rdma,vers=3.x".
At the time of this patch, 3.x can be 3.0, 3.02 or 3.1.1.
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Ronnie Sahlberg <lsahlber.redhat.com>
CIFS request buffers, stored in the cifs_request slab cache, need to be
copied to/from userspace.
cache object allocation:
fs/cifs/cifsfs.c:
cifs_init_request_bufs():
...
cifs_req_poolp = mempool_create_slab_pool(cifs_min_rcv,
cifs_req_cachep);
fs/cifs/misc.c:
cifs_buf_get():
...
ret_buf = mempool_alloc(cifs_req_poolp, GFP_NOFS);
...
return ret_buf;
In support of usercopy hardening, this patch defines a region in the
cifs_request slab cache in which userspace copy operations are allowed.
This region is known as the slab cache's usercopy region. Slab caches
can now check that each dynamically sized copy operation involving
cache-managed memory falls entirely within the slab's usercopy region.
This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.
Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Steve French <sfrench@samba.org>
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
This is a pure automated search-and-replace of the internal kernel
superblock flags.
The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.
Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.
The script to do this was:
# places to look in; re security/*: it generally should *not* be
# touched (that stuff parses mount(2) arguments directly), but
# there are two places where we really deal with superblock flags.
FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
include/linux/fs.h include/uapi/linux/bfs_fs.h \
security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
# the list of MS_... constants
SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
ACTIVE NOUSER"
SED_PROG=
for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
# we want files that contain at least one of MS_...,
# with fs/namespace.c and fs/pnode.c excluded.
L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
for f in $L; do sed -i $f $SED_PROG; done
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
memory leak was found by kmemleak. exit_cifs_spnego
should be called before cifs module removed, or
cifs root_cred will not be released.
kmemleak report:
unreferenced object 0xffff880070a3ce40 (size 192):
backtrace:
kmemleak_alloc+0x4a/0xa0
kmem_cache_alloc+0xc7/0x1d0
prepare_kernel_cred+0x20/0x120
init_cifs_spnego+0x2d/0x170 [cifs]
0xffffffffc07801f3
do_one_initcall+0x51/0x1b0
do_init_module+0x60/0x1fd
load_module+0x161e/0x1b60
SYSC_finit_module+0xa9/0x100
SyS_finit_module+0xe/0x10
Signed-off-by: Shu Wang <shuwang@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
CC: Stable <stable@vger.kernel.org>
callers can more safely get random bytes if they can block until the
CRNG is initialized.
Also print a warning if get_random_*() is called before the CRNG is
initialized. By default, only one single-line warning will be printed
per boot. If CONFIG_WARN_ALL_UNSEEDED_RANDOM is defined, then a
warning will be printed for each function which tries to get random
bytes before the CRNG is initialized. This can get spammy for certain
architecture types, so it is not enabled by default.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAllqXNUACgkQ8vlZVpUN
gaPtAgf/aUbXZuWYsDQzslHsbzEWi+qz4QgL885/w4L00pEImTTp91Q06SDxWhtB
KPvGnZHS3IofxBh2DC+6AwN6dPMoWDCfYhhO6po3FSz0DiPRIQCTuvOb8fhKY1X7
rTdDq2xtDxPGxJ25bMJtlrgzH2XlXPpVyPUeoc9uh87zUK5aesXpUn9kBniRexoz
ume+M/cDzPKkwNQpbLq8vzhNjoWMVv0FeW2akVvrjkkWko8nZLZ0R/kIyKQlRPdG
LZDXcz0oTHpDS6+ufEo292ZuWm2IGer2YtwHsKyCAsyEWsUqBz2yurtkSj3mAVyC
hHafyS+5WNaGdgBmg0zJxxwn5qxxLg==
=ua7p
-----END PGP SIGNATURE-----
Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull random updates from Ted Ts'o:
"Add wait_for_random_bytes() and get_random_*_wait() functions so that
callers can more safely get random bytes if they can block until the
CRNG is initialized.
Also print a warning if get_random_*() is called before the CRNG is
initialized. By default, only one single-line warning will be printed
per boot. If CONFIG_WARN_ALL_UNSEEDED_RANDOM is defined, then a
warning will be printed for each function which tries to get random
bytes before the CRNG is initialized. This can get spammy for certain
architecture types, so it is not enabled by default"
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: reorder READ_ONCE() in get_random_uXX
random: suppress spammy warnings about unseeded randomness
random: warn when kernel uses unseeded randomness
net/route: use get_random_int for random counter
net/neighbor: use get_random_u32 for 32-bit hash random
rhashtable: use get_random_u32 for hash_rnd
ceph: ensure RNG is seeded before using
iscsi: ensure RNG is seeded before use
cifs: use get_random_u32 for 32-bit lock random
random: add get_random_{bytes,u32,u64,int,long,once}_wait family
random: add wait_for_random_bytes() API
Remove the CONFIG_CIFS_SMB2 ifdef and Kconfig option since they
must always be on now.
For various security reasons, SMB3 and later are STRONGLY preferred
over CIFS and older dialects, and SMB3 (and later) will now be
the default dialects so we do not want to allow them to be
ifdeffed out.
In the longer term, we may be able to make older CIFS support
disableable in Kconfig with a new set of #ifdef, but we always
want SMB3 and later support enabled.
Signed-off-by: Steven French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Using get_random_u32 here is faster, more fitting of the use case, and
just as cryptographically secure. It also has the benefit of providing
better randomness at early boot, which is sometimes when this is used.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Steve French <sfrench@samba.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Pull cifs fixes from Steve French:
"Various fixes for stable for CIFS/SMB3 especially for better
interoperability for SMB3 to Macs.
It also includes Pavel's improvements to SMB3 async i/o support
(which is much faster now)"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: add misssing SFM mapping for doublequote
SMB3: Work around mount failure when using SMB3 dialect to Macs
cifs: fix CIFS_IOC_GET_MNT_INFO oops
CIFS: fix mapping of SFM_SPACE and SFM_PERIOD
CIFS: fix oplock break deadlocks
cifs: fix CIFS_ENUMERATE_SNAPSHOTS oops
cifs: fix leak in FSCTL_ENUM_SNAPS response handling
Set unicode flag on cifs echo request to avoid Mac error
CIFS: Add asynchronous write support through kernel AIO
CIFS: Add asynchronous read support through kernel AIO
CIFS: Add asynchronous context to support kernel AIO
cifs: fix IPv6 link local, with scope id, address parsing
cifs: small underflow in cnvrtDosUnixTm()
When the final cifsFileInfo_put() is called from cifsiod and an oplock
break work is queued, lockdep complains loudly:
=============================================
[ INFO: possible recursive locking detected ]
4.11.0+ #21 Not tainted
---------------------------------------------
kworker/0:2/78 is trying to acquire lock:
("cifsiod"){++++.+}, at: flush_work+0x215/0x350
but task is already holding lock:
("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock("cifsiod");
lock("cifsiod");
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by kworker/0:2/78:
#0: ("cifsiod"){++++.+}, at: process_one_work+0x255/0x8e0
#1: ((&wdata->work)){+.+...}, at: process_one_work+0x255/0x8e0
stack backtrace:
CPU: 0 PID: 78 Comm: kworker/0:2 Not tainted 4.11.0+ #21
Workqueue: cifsiod cifs_writev_complete
Call Trace:
dump_stack+0x85/0xc2
__lock_acquire+0x17dd/0x2260
? match_held_lock+0x20/0x2b0
? trace_hardirqs_off_caller+0x86/0x130
? mark_lock+0xa6/0x920
lock_acquire+0xcc/0x260
? lock_acquire+0xcc/0x260
? flush_work+0x215/0x350
flush_work+0x236/0x350
? flush_work+0x215/0x350
? destroy_worker+0x170/0x170
__cancel_work_timer+0x17d/0x210
? ___preempt_schedule+0x16/0x18
cancel_work_sync+0x10/0x20
cifsFileInfo_put+0x338/0x7f0
cifs_writedata_release+0x2a/0x40
? cifs_writedata_release+0x2a/0x40
cifs_writev_complete+0x29d/0x850
? preempt_count_sub+0x18/0xd0
process_one_work+0x304/0x8e0
worker_thread+0x9b/0x6a0
kthread+0x1b2/0x200
? process_one_work+0x8e0/0x8e0
? kthread_create_on_node+0x40/0x40
ret_from_fork+0x31/0x40
This is a real warning. Since the oplock is queued on the same
workqueue this can deadlock if there is only one worker thread active
for the workqueue (which will be the case during memory pressure when
the rescuer thread is handling it).
Furthermore, there is at least one other kind of hang possible due to
the oplock break handling if there is only worker. (This can be
reproduced without introducing memory pressure by having passing 1 for
the max_active parameter of cifsiod.) cifs_oplock_break() can wait
indefintely in the filemap_fdatawait() while the cifs_writev_complete()
work is blocked:
sysrq: SysRq : Show Blocked State
task PC stack pid father
kworker/0:1 D 0 16 2 0x00000000
Workqueue: cifsiod cifs_oplock_break
Call Trace:
__schedule+0x562/0xf40
? mark_held_locks+0x4a/0xb0
schedule+0x57/0xe0
io_schedule+0x21/0x50
wait_on_page_bit+0x143/0x190
? add_to_page_cache_lru+0x150/0x150
__filemap_fdatawait_range+0x134/0x190
? do_writepages+0x51/0x70
filemap_fdatawait_range+0x14/0x30
filemap_fdatawait+0x3b/0x40
cifs_oplock_break+0x651/0x710
? preempt_count_sub+0x18/0xd0
process_one_work+0x304/0x8e0
worker_thread+0x9b/0x6a0
kthread+0x1b2/0x200
? process_one_work+0x8e0/0x8e0
? kthread_create_on_node+0x40/0x40
ret_from_fork+0x31/0x40
dd D 0 683 171 0x00000000
Call Trace:
__schedule+0x562/0xf40
? mark_held_locks+0x29/0xb0
schedule+0x57/0xe0
io_schedule+0x21/0x50
wait_on_page_bit+0x143/0x190
? add_to_page_cache_lru+0x150/0x150
__filemap_fdatawait_range+0x134/0x190
? do_writepages+0x51/0x70
filemap_fdatawait_range+0x14/0x30
filemap_fdatawait+0x3b/0x40
filemap_write_and_wait+0x4e/0x70
cifs_flush+0x6a/0xb0
filp_close+0x52/0xa0
__close_fd+0xdc/0x150
SyS_close+0x33/0x60
entry_SYSCALL_64_fastpath+0x1f/0xbe
Showing all locks held in the system:
2 locks held by kworker/0:1/16:
#0: ("cifsiod"){.+.+.+}, at: process_one_work+0x255/0x8e0
#1: ((&cfile->oplock_break)){+.+.+.}, at: process_one_work+0x255/0x8e0
Showing busy workqueues and worker pools:
workqueue cifsiod: flags=0xc
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
in-flight: 16:cifs_oplock_break
delayed: cifs_writev_complete, cifs_echo_request
pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 750 3
Fix these problems by creating a a new workqueue (with a rescuer) for
the oplock break work.
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
Pull networking updates from David Millar:
"Here are some highlights from the 2065 networking commits that
happened this development cycle:
1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri)
2) Add a generic XDP driver, so that anyone can test XDP even if they
lack a networking device whose driver has explicit XDP support
(me).
3) Sparc64 now has an eBPF JIT too (me)
4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei
Starovoitov)
5) Make netfitler network namespace teardown less expensive (Florian
Westphal)
6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana)
7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger)
8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky)
9) Multiqueue support in stmmac driver (Joao Pinto)
10) Remove TCP timewait recycling, it never really could possibly work
well in the real world and timestamp randomization really zaps any
hint of usability this feature had (Soheil Hassas Yeganeh)
11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay
Aleksandrov)
12) Add socket busy poll support to epoll (Sridhar Samudrala)
13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso,
and several others)
14) IPSEC hw offload infrastructure (Steffen Klassert)"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits)
tipc: refactor function tipc_sk_recv_stream()
tipc: refactor function tipc_sk_recvmsg()
net: thunderx: Optimize page recycling for XDP
net: thunderx: Support for XDP header adjustment
net: thunderx: Add support for XDP_TX
net: thunderx: Add support for XDP_DROP
net: thunderx: Add basic XDP support
net: thunderx: Cleanup receive buffer allocation
net: thunderx: Optimize CQE_TX handling
net: thunderx: Optimize RBDR descriptor handling
net: thunderx: Support for page recycling
ipx: call ipxitf_put() in ioctl error path
net: sched: add helpers to handle extended actions
qed*: Fix issues in the ptp filter config implementation.
qede: Fix concurrency issue in PTP Tx path processing.
stmmac: Add support for SIMATIC IOT2000 platform
net: hns: fix ethtool_get_strings overflow in hns driver
tcp: fix wraparound issue in tcp_lp
bpf, arm64: fix jit branch offset related to ldimm64
bpf, arm64: implement jiting of BPF_XADD
...
Pull block layer updates from Jens Axboe:
- Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ
was initially a fork of CFQ, but subsequently changed to implement
fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant
to be used on desktop type single drives, providing good fairness.
From Paolo.
- Add Kyber IO scheduler. This is a full multiqueue aware scheduler,
using a scalable token based algorithm that throttles IO based on
live completion IO stats, similary to blk-wbt. From Omar.
- A series from Jan, moving users to separately allocated backing
devices. This continues the work of separating backing device life
times, solving various problems with hot removal.
- A series of updates for lightnvm, mostly from Javier. Includes a
'pblk' target that exposes an open channel SSD as a physical block
device.
- A series of fixes and improvements for nbd from Josef.
- A series from Omar, removing queue sharing between devices on mostly
legacy drivers. This helps us clean up other bits, if we know that a
queue only has a single device backing. This has been overdue for
more than a decade.
- Fixes for the blk-stats, and improvements to unify the stats and user
windows. This both improves blk-wbt, and enables other users to
register a need to receive IO stats for a device. From Omar.
- blk-throttle improvements from Shaohua. This provides a scalable
framework for implementing scalable priotization - particularly for
blk-mq, but applicable to any type of block device. The interface is
marked experimental for now.
- Bucketized IO stats for IO polling from Stephen Bates. This improves
efficiency of polled workloads in the presence of mixed block size
IO.
- A few fixes for opal, from Scott.
- A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics.
From a variety of folks, mostly Sagi and James Smart.
- A series from Bart, improving our exposed info and capabilities from
the blk-mq debugfs support.
- A series from Christoph, cleaning up how handle WRITE_ZEROES.
- A series from Christoph, cleaning up the block layer handling of how
we track errors in a request. On top of being a nice cleanup, it also
shrinks the size of struct request a bit.
- Removal of mg_disk and hd (sorry Linus) by Christoph. The former was
never used by platforms, and the latter has outlived it's usefulness.
- Various little bug fixes and cleanups from a wide variety of folks.
* 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits)
block: hide badblocks attribute by default
blk-mq: unify hctx delay_work and run_work
block: add kblock_mod_delayed_work_on()
blk-mq: unify hctx delayed_run_work and run_work
nbd: fix use after free on module unload
MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler
blk-mq-sched: alloate reserved tags out of normal pool
mtip32xx: use runtime tag to initialize command header
scsi: Implement blk_mq_ops.show_rq()
blk-mq: Add blk_mq_ops.show_rq()
blk-mq: Show operation, cmd_flags and rq_flags names
blk-mq: Make blk_flags_show() callers append a newline character
blk-mq: Move the "state" debugfs attribute one level down
blk-mq: Unregister debugfs attributes earlier
blk-mq: Only unregister hctxs for which registration succeeded
blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
blk-mq: Let blk_mq_debugfs_register() look up the queue name
blk-mq: Register <dev>/queue/mq after having registered <dev>/queue
ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
ide-pm: always pass 0 error to __blk_end_request_all
..
Allocate struct backing_dev_info separately instead of embedding it
inside superblock. This unifies handling of bdi among users.
CC: Steve French <sfrench@samba.org>
CC: linux-cifs@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
Conflicts were simply overlapping changes. In the net/ipv4/route.c
case the code had simply moved around a little bit and the same fix
was made in both 'net' and 'net-next'.
In the net/sched/sch_generic.c case a fix in 'net' happened at
the same time that a new argument was added to qdisc_hash_add().
Signed-off-by: David S. Miller <davem@davemloft.net>
The earlier changes to copy range for cifs unintentionally disabled the more
common form of server side copy.
The patch introduces the file_operations helper cifs_copy_file_range()
which is used by the syscall copy_file_range. The new file operations
helper allows us to perform server side copies for SMB2.0 and 2.1
servers as well as SMB 3.0+ servers which do not support the ioctl
FSCTL_DUPLICATE_EXTENTS_TO_FILE.
The new helper uses the ioctl FSCTL_SRV_COPYCHUNK_WRITE to perform
server side copies. The helper is called by vfs_copy_file_range() only
once an attempt to clone the file using the ioctl
FSCTL_DUPLICATE_EXTENTS_TO_FILE has failed.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
There is an include loop between netdevice.h, dsa.h, devlink.h because
of NETDEV_ALIGN, making it impossible to use devlink structures in
dsa.h.
Break this loop by taking dsa.h out of netdevice.h, add a forward
declaration of dsa_switch_tree and netdev_set_default_ethtool_ops()
function, which is what netdevice.h requires.
No longer having dsa.h in netdevice.h means the includes in dsa.h no
longer get included. This breaks a few other files which depend on
these includes. Add these directly in the affected file.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change allows to encrypt packets if it is required by a server
for SMB sessions or tree connections.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
List soft dependencies of cifs so that mkinitrd and dracut can include
the required helper modules.
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Steve French <sfrench@samba.org>
Pull cifs fixes from Steve French:
"This ncludes various cifs/smb3 bug fixes, mostly for stable as well.
In the next week I expect that Germano will have some reconnection
fixes, and also I expect to have the remaining pieces of the snapshot
enablement and SMB3 ACLs, but wanted to get this set of bug fixes in"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
cifs_get_root shouldn't use path with tree name
Fix default behaviour for empty domains and add domainauto option
cifs: use %16phN for formatting md5 sum
cifs: Fix smbencrypt() to stop pointing a scatterlist at the stack
CIFS: Fix a possible double locking of mutex during reconnect
CIFS: Fix a possible memory corruption during reconnect
CIFS: Fix a possible memory corruption in push locks
CIFS: Fix missing nls unload in smb2_reconnect()
CIFS: Decrease verbosity of ioctl call
SMB3: parsing for new snapshot timestamp mount parm
When a server returns the optional flag SMB_SHARE_IS_IN_DFS in response
to a tree connect, cifs_build_path_to_root() will return a pathname
which includes the hostname. This causes problems with cifs_get_root()
which separates each component and does a lookup for each component of
the path which in this case will incorrectly include looking up the
hostname component as a path component.
We encountered a problem with dfs shares hosted by a Netapp. When
connecting to nodes pointed to by the DFS share. The tree connect for
these nodes return SMB_SHARE_IS_IN_DFS resulting failures in lookup
in cifs_get_root().
RH bz: 1373153
The patch was tested against a Netapp simulator and by a user using an
actual Netapp server.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reported-by: Pierguido Lambri <plambri@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
If .readlink == NULL implies generic_readlink().
Generated by:
to_del="\.readlink.*=.*generic_readlink"
for i in `git grep -l $to_del`; do sed -i "/$to_del"/d $i; done
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Add "idsfromsid" mount option to indicate to cifs.ko that it should
try to retrieve the uid and gid owner fields from special sids in the
ACL if present. This first patch just adds the parsing for the mount
option.
Signed-off-by: Steve French <steve.french@primarydata.com>
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
/sys/module/cifs/parameters should display the three
other module load time configuration settings for cifs.ko
Signed-off-by: Germano Percossi <germano.percossi@citrix.com>
Signed-off-by: Steve French <steve.french@primarydata.com>
Remove the global file_list_lock to simplify cifs/smb3 locking and
have spinlocks that more closely match the information they are
protecting.
Add new tcon->open_file_lock and file->file_info_lock spinlocks.
Locks continue to follow a heirachy,
cifs_socket --> cifs_ses --> cifs_tcon --> cifs_file
where global tcp_ses_lock still protects socket and cifs_ses, while the
the newer locks protect the lower level structure's information
(tcon and cifs_file respectively).
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <steve.french@primarydata.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Germano Percossi <germano.percossi@citrix.com>
GUIDs although random, and 16 bytes, need to be generated as
proper uuids.
Signed-off-by: Steve French <steve.french@primarydata.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reported-by: David Goebels <davidgoe@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Pull more vfs updates from Al Viro:
">rename2() work from Miklos + current_time() from Deepa"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Replace current_fs_time() with current_time()
fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
fs: Replace CURRENT_TIME with current_time() for inode timestamps
fs: proc: Delete inode time initializations in proc_alloc_inode()
vfs: Add current_time() api
vfs: add note about i_op->rename changes to porting
fs: rename "rename2" i_op to "rename"
vfs: remove unused i_op->rename
fs: make remaining filesystems use .rename2
libfs: support RENAME_NOREPLACE in simple_rename()
fs: support RENAME_NOREPLACE for local filesystems
ncpfs: fix unused variable warning
These inode operations are no longer used; remove them.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Fix memory leaks introduced by the patch
fs/cifs: make share unaccessible at root level mountable
Also move allocation of cifs_sb->prepath to cifs_setup_cifs_sb().
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Tested-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
if, when mounting //HOST/share/sub/dir/foo we can query /sub/dir/foo but
not any of the path components above:
- store the /sub/dir/foo prefix in the cifs super_block info
- in the superblock, set root dentry to the subpath dentry (instead of
the share root)
- set a flag in the superblock to remember it
- use prefixpath when building path from a dentry
fixes bso#8950
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Right now, we send the tgid cross the wire. What we really want to send
though is a hashed fl_owner_t since samba treats this field as a generic
lockowner.
It turns out that because we enforce and release locks locally before
they are ever sent to the server, this patch makes no difference in
behavior. Still, setting OFD locks on the server using the process
pid seems wrong, so I think this patch still makes sense.
Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Pavel Shilovsky <pshilovsky@samba.org>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
Remove some obsolete comments in the cifs inode_operations
structs that were pointed out by Stephen Rothwell.
CC: Stephen Rothwell <sfr@canb.auug.org.au>
CC: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <steve.french@primarydata.com>
The session key is the default keyring set for request_key operations.
This session key is revoked when the user owning the session logs out.
Any long running daemon processes started by this session ends up with
revoked session keyring which prevents these processes from using the
request_key mechanism from obtaining the krb5 keys.
The problem has been reported by a large number of autofs users. The
problem is also seen with multiuser mounts where the share may be used
by processes run by a user who has since logged out. A reproducer using
automount is available on the Red Hat bz.
The patch creates a new keyring which is used to cache cifs spnego
upcalls.
Red Hat bz: 1267754
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reported-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Pull cifs xattr updates from Al Viro:
"This is the remaining parts of the xattr work - the cifs bits"
* 'for-cifs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
cifs: Switch to generic xattr handlers
cifs: Fix removexattr for os2.* xattrs
cifs: Check for equality with ACL_TYPE_ACCESS and ACL_TYPE_DEFAULT
cifs: Fix xattr name checks
Use xattr handlers for resolving attribute names. The amount of setup
code required on cifs is nontrivial, so use the same get and set
functions for all handlers, with switch statements for the different
types of attributes in them.
The set_EA handler can handle NULL values, so we don't need a separate
removexattr function anymore. Remove the cifs_dbg statements related to
xattr name resolution; they don't add much. Don't build xattr.o when
CONFIG_CIFS_XATTR is not defined.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs updates from Al Viro:
- Preparations of parallel lookups (the remaining main obstacle is the
need to move security_d_instantiate(); once that becomes safe, the
rest will be a matter of rather short series local to fs/*.c
- preadv2/pwritev2 series from Christoph
- assorted fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (32 commits)
splice: handle zero nr_pages in splice_to_pipe()
vfs: show_vfsstat: do not ignore errors from show_devname method
dcache.c: new helper: __d_add()
don't bother with __d_instantiate(dentry, NULL)
untangle fsnotify_d_instantiate() a bit
uninline d_add()
replace d_add_unique() with saner primitive
quota: use lookup_one_len_unlocked()
cifs_get_root(): use lookup_one_len_unlocked()
nfs_lookup: don't bother with d_instantiate(dentry, NULL)
kill dentry_unhash()
ceph_fill_trace(): don't bother with d_instantiate(dn, NULL)
autofs4: don't bother with d_instantiate(dentry, NULL) in ->lookup()
configfs: move d_rehash() into configfs_create() for regular files
ceph: don't bother with d_rehash() in splice_dentry()
namei: teach lookup_slow() to skip revalidate
namei: massage lookup_slow() to be usable by lookup_one_len_unlocked()
lookup_one_len_unlocked(): use lookup_dcache()
namei: simplify invalidation logics in lookup_dcache()
namei: change calling conventions for lookup_{fast,slow} and follow_managed()
...
Some callers of strtobool() were passing a pointer to unterminated
strings. In preparation of adding multi-character processing to
kstrtobool(), update the callers to not pass single-character pointers,
and switch to using the new kstrtobool_from_user() helper where
possible.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Amitkumar Karwar <akarwar@marvell.com>
Cc: Nishant Sarmukadam <nishants@marvell.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Steve French <sfrench@samba.org>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 04b38d6012 ("vfs: pull btrfs clone API to vfs layer")
added a duplicated line (in cifsfs.c) which causes a sparse compile
warning.
Signed-off-by: Steve French <steve.french@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Pull SMB3 fixes from Steve French:
"A collection of CIFS/SMB3 fixes.
It includes a couple bug fixes, a few for improved debugging of
cifs.ko and some improvements to the way cifs does key generation.
I do have some additional bug fixes I expect in the next week or two
(to address a problem found by xfstest, and some fixes for SMB3.11
dialect, and a couple patches that just came in yesterday that I am
reviewing)"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
cifs_dbg() outputs an uninitialized buffer in cifs_readdir()
cifs: fix race between call_async() and reconnect()
Prepare for encryption support (first part). Add decryption and encryption key generation. Thanks to Metze for helping with this.
cifs: Allow using O_DIRECT with cache=loose
cifs: Make echo interval tunable
cifs: Check uniqueid for SMB2+ and return -ESTALE if necessary
Print IP address of unresponsive server
cifs: Ratelimit kernel log messages
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).
Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg. For the list, see below:
- threadinfo
- task_struct
- task_delay_info
- pid
- cred
- mm_struct
- vm_area_struct and vm_region (nommu)
- anon_vma and anon_vma_chain
- signal_struct
- sighand_struct
- fs_struct
- files_struct
- fdtable and fdtable->full_fds_bits
- dentry and external_name
- inode for all filesystems. This is the most tedious part, because
most filesystems overwrite the alloc_inode method.
The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds. Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently O_DIRECT is supported with cache=none and cache=strict, but
not cache=loose. Add support for using O_DIRECT when mounted with
cache=loose.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Currently the echo interval is set to 60 seconds using a macro. This
setting determines the interval at which echo requests are sent to the
server on an idling connection. This setting also affects the time
required for a connection to an unresponsive server to timeout.
Making this setting a tunable allows users to control the echo interval
times as well as control the time after which the connecting to an
unresponsive server times out.
To set echo interval, pass the echo_interval=n mount option.
Version four of the patch.
v2: Change MIN and MAX timeout values
v3: Remove incorrect comment in cifs_get_tcp_session
v4: Fix bug in setting echo_intervalw
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Pull vfs copy_file_range updates from Al Viro:
"Several series around copy_file_range/CLONE"
* 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
btrfs: use new dedupe data function pointer
vfs: hoist the btrfs deduplication ioctl to the vfs
vfs: wire up compat ioctl for CLONE/CLONE_RANGE
cifs: avoid unused variable and label
nfsd: implement the NFSv4.2 CLONE operation
nfsd: Pass filehandle to nfs4_preprocess_stateid_op()
vfs: pull btrfs clone API to vfs layer
locks: new locks_mandatory_area calling convention
vfs: Add vfs_copy_file_range() support for pagecache copies
btrfs: add .copy_file_range file operation
x86: add sys_copy_file_range to syscall tables
vfs: add copy_file_range syscall and vfs helper
new method: ->get_link(); replacement of ->follow_link(). The differences
are:
* inode and dentry are passed separately
* might be called both in RCU and non-RCU mode;
the former is indicated by passing it a NULL dentry.
* when called that way it isn't allowed to block
and should return ERR_PTR(-ECHILD) if it needs to be called
in non-RCU mode.
It's a flagday change - the old method is gone, all in-tree instances
converted. Conversion isn't hard; said that, so far very few instances
do not immediately bail out when called in RCU mode. That'll change
in the next commits.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The newly introduced cifs_clone_file_range() function produces
two harmless compile-time warnings:
cifsfs.c: In function 'cifs_clone_file_range':
cifsfs.c:963:1: warning: label 'out_unlock' defined but not used [-Wunused-label]
cifsfs.c:924:20: warning: unused variable 'src_tcon' [-Wunused-variable]
In both cases, removing the extraneous line avoids the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: c6f2a1e2e5f8 ("vfs: pull btrfs clone API to vfs layer")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The btrfs clone ioctls are now adopted by other file systems, with NFS
and CIFS already having support for them, and XFS being under active
development. To avoid growth of various slightly incompatible
implementations, add one to the VFS. Note that clones are different from
file copies in several ways:
- they are atomic vs other writers
- they support whole file clones
- they support 64-bit legth clones
- they do not allow partial success (aka short writes)
- clones are expected to be a fast metadata operation
Because of that it would be rather cumbersome to try to piggyback them on
top of the recent clone_file_range infrastructure. The converse isn't
true and the clone_file_range system call could try clone file range as
a first attempt to copy, something that further patches will enable.
Based on earlier work from Peng Tao.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
FSCTL_SRV_COPYCHUNK_WRITE only requires that the source and target
be on the same server (not the same volume or same share),
so relax the existing check (which required them to be on
the same share). Note that this works to Windows (and presumably
most other NAS) but Samba requires that the source
and target be on the same share. Moving a file across
shares is a common use case and can be very heplful (100x faster).
Signed-off-by: Steve French <steve.french@primarydata.com>
Reviewed-by: David Disseldorp <ddiss@samba.org>
Since many servers (Windows clients, and non-clustered servers) do not
support persistent handles but do support resilient handles, allow
the user to specify a mount option "resilienthandles" in order
to get more reliable connections and less chance of data loss
(at least when SMB2.1 or later). Default resilient handle
timeout (120 seconds to recent Windows server) is used.
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <steve.french@primarydata.com>
When the user specifies "sec=none" in a cifs mount, we set
sec_type as unspecified (and set a flag and the username will be
null) rather than setting sectype as "none" so
cifs_show_security was not properly displaying it in
cifs /proc/mounts entries.
Signed-off-by: Steve French <steve.french@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@poochiereds.net>
Many file systems that implement the show_options hook fail to correctly
escape their output which could lead to unescaped characters (e.g. new
lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows what
else. This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers or
in other situations with delegated mount privileges.
Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use
of "sudo" is something more sneaky:
$ BASE="ovl"
$ MNT="$BASE/mnt"
$ LOW="$BASE/lower"
$ UP="$BASE/upper"
$ WORK="$BASE/work/ 0 0
none /proc fuse.pwn user_id=1000"
$ mkdir -p "$LOW" "$UP" "$WORK"
$ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
$ cat /proc/mounts
none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
none /proc fuse.pwn user_id=1000 0 0
$ fusermount -u /proc
$ cat /proc/mounts
cat: /proc/mounts: No such file or directory
This fixes the problem by adding new seq_show_option and
seq_show_option_n helpers, and updating the vulnerable show_option
handlers to use them as needed. Some, like SELinux, need to be open
coded due to unusual existing escape mechanisms.
[akpm@linux-foundation.org: add lost chunk, per Kees]
[keescook@chromium.org: seq_show_option should be using const parameters]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: J. R. Okajima <hooanon05g@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>