Commit Graph

561 Commits

Author SHA1 Message Date
Linus Torvalds
b4b52b881c Wimplicit-fallthrough patches for 5.2-rc1
Hi Linus,
 
 This is my very first pull-request.  I've been working full-time as
 a kernel developer for more than two years now. During this time I've
 been fixing bugs reported by Coverity all over the tree and, as part
 of my work, I'm also contributing to the KSPP. My work in the kernel
 community has been supervised by Greg KH and Kees Cook.
 
 OK. So, after the quick introduction above, please, pull the following
 patches that mark switch cases where we are expecting to fall through.
 These patches are part of the ongoing efforts to enable -Wimplicit-fallthrough.
 They have been ignored for a long time (most of them more than 3 months,
 even after pinging multiple times), which is the reason why I've created
 this tree. Most of them have been baking in linux-next for a whole development
 cycle. And with Stephen Rothwell's help, we've had linux-next nag-emails
 going out for newly introduced code that triggers -Wimplicit-fallthrough
 to avoid gaining more of these cases while we work to remove the ones
 that are already present.
 
 I'm happy to let you know that we are getting close to completing this
 work.  Currently, there are only 32 of 2311 of these cases left to be
 addressed in linux-next.  I'm auditing every case; I take a look into
 the code and analyze it in order to determine if I'm dealing with an
 actual bug or a false positive, as explained here:
 
 https://lore.kernel.org/lkml/c2fad584-1705-a5f2-d63c-824e9b96cf50@embeddedor.com/
 
 While working on this, I've found and fixed the following missing
 break/return bugs, some of them introduced more than 5 years ago:
 
 84242b82d8
 7850b51b6c
 5e420fe635
 09186e5034
 b5be853181
 7264235ee7
 cc5034a5d2
 479826cc86
 5340f23df8
 df997abeeb
 2f10d82373
 307b00c5e6
 5d25ff7a54
 a7ed5b3e7d
 c24bfa8f21
 ad0eaee619
 9ba8376ce1
 dc586a60a1
 a8e9b186f1
 4e57562b48
 60747828ea
 c5b974bee9
 cc44ba9116
 2c930e3d0a
 
 Once this work is finish, we'll be able to universally enable
 "-Wimplicit-fallthrough" to avoid any of these kinds of bugs from
 entering the kernel again.
 
 Thanks
 
 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAlzQR2IACgkQRwW0y0cG
 2zEJbQ//X930OcBtT/9DRW4XL1Jeq0Mjssz/GLX2Vpup5CwwcTROG65no80Zezf/
 yQRWnUjGX0OBv/fmUK32/nTxI/7k7NkmIXJHe0HiEF069GEENB7FT6tfDzIPjU8M
 qQkB8NsSUWJs3IH6BVynb/9MGE1VpGBDbYk7CBZRtRJT1RMM+3kQPucgiZMgUBPo
 Yd9zKwn4i/8tcOCli++EUdQ29ukMoY2R3qpK4LftdX9sXLKZBWNwQbiCwSkjnvJK
 I6FDiA7RaWH2wWGlL7BpN5RrvAXp3z8QN/JZnivIGt4ijtAyxFUL/9KOEgQpBQN2
 6TBRhfTQFM73NCyzLgGLNzvd8awem1rKGSBBUvevaPbgesgM+Of65wmmTQRhFNCt
 A7+e286X1GiK3aNcjUKrByKWm7x590EWmDzmpmICxNPdt5DHQ6EkmvBdNjnxCMrO
 aGA24l78tBN09qN45LR7wtHYuuyR0Jt9bCmeQZmz7+x3ICDHi/+Gw7XPN/eM9+T6
 lZbbINiYUyZVxOqwzkYDCsdv9+kUvu3e4rPs20NERWRpV8FEvBIyMjXAg6NAMTue
 K+ikkyMBxCvyw+NMimHJwtD7ho4FkLPcoeXb2ZGJTRHixiZAEtF1RaQ7dA05Q/SL
 gbSc0DgLZeHlLBT+BSWC2Z8SDnoIhQFXW49OmuACwCUC68NHKps=
 =k30z
 -----END PGP SIGNATURE-----

Merge tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull Wimplicit-fallthrough updates from Gustavo A. R. Silva:
 "Mark switch cases where we are expecting to fall through.

  This is part of the ongoing efforts to enable -Wimplicit-fallthrough.

  Most of them have been baking in linux-next for a whole development
  cycle. And with Stephen Rothwell's help, we've had linux-next
  nag-emails going out for newly introduced code that triggers
  -Wimplicit-fallthrough to avoid gaining more of these cases while we
  work to remove the ones that are already present.

  We are getting close to completing this work. Currently, there are
  only 32 of 2311 of these cases left to be addressed in linux-next. I'm
  auditing every case; I take a look into the code and analyze it in
  order to determine if I'm dealing with an actual bug or a false
  positive, as explained here:

      https://lore.kernel.org/lkml/c2fad584-1705-a5f2-d63c-824e9b96cf50@embeddedor.com/

  While working on this, I've found and fixed the several missing
  break/return bugs, some of them introduced more than 5 years ago.

  Once this work is finished, we'll be able to universally enable
  "-Wimplicit-fallthrough" to avoid any of these kinds of bugs from
  entering the kernel again"

* tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (27 commits)
  memstick: mark expected switch fall-throughs
  drm/nouveau/nvkm: mark expected switch fall-throughs
  NFC: st21nfca: Fix fall-through warnings
  NFC: pn533: mark expected switch fall-throughs
  block: Mark expected switch fall-throughs
  ASN.1: mark expected switch fall-through
  lib/cmdline.c: mark expected switch fall-throughs
  lib: zstd: Mark expected switch fall-throughs
  scsi: sym53c8xx_2: sym_nvram: Mark expected switch fall-through
  scsi: sym53c8xx_2: sym_hipd: mark expected switch fall-throughs
  scsi: ppa: mark expected switch fall-through
  scsi: osst: mark expected switch fall-throughs
  scsi: lpfc: lpfc_scsi: Mark expected switch fall-throughs
  scsi: lpfc: lpfc_nvme: Mark expected switch fall-through
  scsi: lpfc: lpfc_nportdisc: Mark expected switch fall-through
  scsi: lpfc: lpfc_hbadisc: Mark expected switch fall-throughs
  scsi: lpfc: lpfc_els: Mark expected switch fall-throughs
  scsi: lpfc: lpfc_ct: Mark expected switch fall-throughs
  scsi: imm: mark expected switch fall-throughs
  scsi: csiostor: csio_wr: mark expected switch fall-through
  ...
2019-05-07 12:48:10 -07:00
Linus Torvalds
168e153d5e Merge branch 'work.icache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs inode freeing updates from Al Viro:
 "Introduction of separate method for RCU-delayed part of
  ->destroy_inode() (if any).

  Pretty much as posted, except that destroy_inode() stashes
  ->free_inode into the victim (anon-unioned with ->i_fops) before
  scheduling i_callback() and the last two patches (sockfs conversion
  and folding struct socket_wq into struct socket) are excluded - that
  pair should go through netdev once davem reopens his tree"

* 'work.icache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (58 commits)
  orangefs: make use of ->free_inode()
  shmem: make use of ->free_inode()
  hugetlb: make use of ->free_inode()
  overlayfs: make use of ->free_inode()
  jfs: switch to ->free_inode()
  fuse: switch to ->free_inode()
  ext4: make use of ->free_inode()
  ecryptfs: make use of ->free_inode()
  ceph: use ->free_inode()
  btrfs: use ->free_inode()
  afs: switch to use of ->free_inode()
  dax: make use of ->free_inode()
  ntfs: switch to ->free_inode()
  securityfs: switch to ->free_inode()
  apparmor: switch to ->free_inode()
  rpcpipe: switch to ->free_inode()
  bpf: switch to ->free_inode()
  mqueue: switch to ->free_inode()
  ufs: switch to ->free_inode()
  coda: switch to ->free_inode()
  ...
2019-05-07 10:57:05 -07:00
Al Viro
51b9fe48c4 afs: switch to use of ->free_inode()
debugging printks left in ->destroy_inode() and so's the
update of inode count; we could take the latter to RCU-delayed
part (would take only moving the check on module exit past
rcu_barrier() there), but debugging output ought to either
stay where it is or go into ->evict_inode()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01 22:43:26 -04:00
David Howells
eeba1e9cf3 afs: Fix in-progess ops to ignore server-level callback invalidation
The in-kernel afs filesystem client counts the number of server-level
callback invalidation events (CB.InitCallBackState* RPC operations) that it
receives from the server.  This is stored in cb_s_break in various
structures, including afs_server and afs_vnode.

If an inode is examined by afs_validate(), say, the afs_server copy is
compared, along with other break counters, to those in afs_vnode, and if
one or more of the counters do not match, it is considered that the
server's callback promise is broken.  At points where this happens,
AFS_VNODE_CB_PROMISED is cleared to indicate that the status must be
refetched from the server.

afs_validate() issues an FS.FetchStatus operation to get updated metadata -
and based on the updated data_version may invalidate the pagecache too.

However, the break counters are also used to determine whether to note a
new callback in the vnode (which would set the AFS_VNODE_CB_PROMISED flag)
and whether to cache the permit data included in the YFSFetchStatus record
by the server.


The problem comes when the server sends us a CB.InitCallBackState op.  The
first such instance doesn't cause cb_s_break to be incremented, but rather
causes AFS_SERVER_FL_NEW to be cleared - but thereafter, say some hours
after last use and all the volumes have been automatically unmounted and
the server has forgotten about the client[*], this *will* likely cause an
increment.

 [*] There are other circumstances too, such as the server restarting or
     needing to make space in its callback table.

Note that the server won't send us a CB.InitCallBackState op until we talk
to it again.

So what happens is:

 (1) A mount for a new volume is attempted, a inode is created for the root
     vnode and vnode->cb_s_break and AFS_VNODE_CB_PROMISED aren't set
     immediately, as we don't have a nominated server to talk to yet - and
     we may iterate through a few to find one.

 (2) Before the operation happens, afs_fetch_status(), say, notes in the
     cursor (fc.cb_break) the break counter sum from the vnode, volume and
     server counters, but the server->cb_s_break is currently 0.

 (3) We send FS.FetchStatus to the server.  The server sends us back
     CB.InitCallBackState.  We increment server->cb_s_break.

 (4) Our FS.FetchStatus completes.  The reply includes a callback record.

 (5) xdr_decode_AFSCallBack()/xdr_decode_YFSCallBack() check to see whether
     the callback promise was broken by checking the break counter sum from
     step (2) against the current sum.

     This fails because of step (3), so we don't set the callback record
     and, importantly, don't set AFS_VNODE_CB_PROMISED on the vnode.

This does not preclude the syscall from progressing, and we don't loop here
rechecking the status, but rather assume it's good enough for one round
only and will need to be rechecked next time.

 (6) afs_validate() it triggered on the vnode, probably called from
     d_revalidate() checking the parent directory.

 (7) afs_validate() notes that AFS_VNODE_CB_PROMISED isn't set, so doesn't
     update vnode->cb_s_break and assumes the vnode to be invalid.

 (8) afs_validate() needs to calls afs_fetch_status().  Go back to step (2)
     and repeat, every time the vnode is validated.

This primarily affects volume root dir vnodes.  Everything subsequent to
those inherit an already incremented cb_s_break upon mounting.


The issue is that we assume that the callback record and the cached permit
information in a reply from the server can't be trusted after getting a
server break - but this is wrong since the server makes sure things are
done in the right order, holding up our ops if necessary[*].

 [*] There is an extremely unlikely scenario where a reply from before the
     CB.InitCallBackState could get its delivery deferred till after - at
     which point we think we have a promise when we don't.  This, however,
     requires unlucky mass packet loss to one call.

AFS_SERVER_FL_NEW tries to paper over the cracks for the initial mount from
a server we've never contacted before, but this should be unnecessary.
It's also further insulated from the problem on an initial mount by
querying the server first with FS.GetCapabilities, which triggers the
CB.InitCallBackState.


Fix this by

 (1) Remove AFS_SERVER_FL_NEW.

 (2) In afs_calc_vnode_cb_break(), don't include cb_s_break in the
     calculation.

 (3) In afs_cb_is_broken(), don't include cb_s_break in the check.


Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-13 08:37:37 +01:00
Marc Dionne
21bd68f196 afs: Unlock pages for __pagevec_release()
__pagevec_release() complains loudly if any page in the vector is still
locked.  The pages need to be locked for generic_error_remove_page(), but
that function doesn't actually unlock them.

Unlock the pages afterwards.

Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Jonathan Billings <jsbillin@umich.edu>
2019-04-13 08:37:37 +01:00
David Howells
8022c4b95c afs: Differentiate abort due to unmarshalling from other errors
Differentiate an abort due to an unmarshalling error from an abort due to
other errors, such as ENETUNREACH.  It doesn't make sense to set abort code
RXGEN_*_UNMARSHAL in such a case, so use RX_USER_ABORT instead.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-13 08:37:37 +01:00
Andi Kleen
d2abfa86ff afs: Avoid section confusion in CM_NAME
__tracepoint_str cannot be const because the tracepoint_str
section is not read-only. Remove the stray const.

Cc: dhowells@redhat.com
Cc: viro@zeniv.linux.org.uk
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2019-04-13 08:37:36 +01:00
Arnd Bergmann
ba25b81e3a afs: avoid deprecated get_seconds()
get_seconds() has a limited range on 32-bit architectures and is
deprecated because of that. While AFS uses the same limits for
its inode timestamps on the wire protocol, let's just use the
simpler current_time() as we do for other file systems.

This will still zero out the 'tv_nsec' field of the timestamps
internally.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-13 08:37:36 +01:00
Marc Dionne
f7f1dd3162 afs: Check for rxrpc call completion in wait loop
Check the state of the rxrpc call backing an afs call in each iteration of
the call wait loop in case the rxrpc call has already been terminated at
the rxrpc layer.

Interrupt the wait loop and mark the afs call as complete if the rxrpc
layer call is complete.

There were cases where rxrpc errors were not passed up to afs, which could
result in this loop waiting forever for an afs call to transition to
AFS_CALL_COMPLETE while the rx call was already complete.

Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:57:23 -07:00
Marc Dionne
4611da30d6 rxrpc: Make rxrpc_kernel_check_life() indicate if call completed
Make rxrpc_kernel_check_life() pass back the life counter through the
argument list and return true if the call has not yet completed.

Suggested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:57:23 -07:00
Gustavo A. R. Silva
e690c9e3f4 afs: Mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that in many cases I placed a /* Fall through */ comment
at the bottom of the case, which what GCC is expecting to find.

In other cases I had to tweak a bit the format of the comments.

This patch suppresses ALL missing-break-in-switch false positives
in fs/afs

Addresses-Coverity-ID: 115042 ("Missing break in switch")
Addresses-Coverity-ID: 115043 ("Missing break in switch")
Addresses-Coverity-ID: 115045 ("Missing break in switch")
Addresses-Coverity-ID: 1357430 ("Missing break in switch")
Addresses-Coverity-ID: 115047 ("Missing break in switch")
Addresses-Coverity-ID: 115050 ("Missing break in switch")
Addresses-Coverity-ID: 115051 ("Missing break in switch")
Addresses-Coverity-ID: 1467806 ("Missing break in switch")
Addresses-Coverity-ID: 1467807 ("Missing break in switch")
Addresses-Coverity-ID: 1467811 ("Missing break in switch")
Addresses-Coverity-ID: 115041 ("Missing break in switch")
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2019-04-08 18:35:56 -05:00
David Howells
8c7ae38d1c afs: Fix StoreData op marshalling
The marshalling of AFS.StoreData, AFS.StoreData64 and YFS.StoreData64 calls
generated by ->setattr() ops for the purpose of expanding a file is
incorrect due to older documentation incorrectly describing the way the RPC
'FileLength' parameter is meant to work.

The older documentation says that this is the length the file is meant to
end up at the end of the operation; however, it was never implemented this
way in any of the servers, but rather the file is truncated down to this
before the write operation is effected, and never expanded to it (and,
indeed, it was renamed to 'TruncPos' in 2014).

Fix this by setting the position parameter to the new file length and doing
a zero-lengh write there.

The bug causes Xwayland to SIGBUS due to unexpected non-expansion of a file
it then mmaps.  This can be tested by giving the following test program a
filename in an AFS directory:

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <fcntl.h>
	#include <sys/mman.h>
	int main(int argc, char *argv[])
	{
		char *p;
		int fd;
		if (argc != 2) {
			fprintf(stderr,
				"Format: test-trunc-mmap <file>\n");
			exit(2);
		}
		fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC);
		if (fd < 0) {
			perror(argv[1]);
			exit(1);
		}
		if (ftruncate(fd, 0x140008) == -1) {
			perror("ftruncate");
			exit(1);
		}
		p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			 MAP_SHARED, fd, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			exit(1);
		}
		p[0] = 'a';
		if (munmap(p, 4096) < 0) {
			perror("munmap");
			exit(1);
		}
		if (close(fd) < 0) {
			perror("close");
			exit(1);
		}
		exit(0);
	}

Fixes: 31143d5d51 ("AFS: implement basic file write support")
Reported-by: Jonathan Billings <jsbillin@umich.edu>
Tested-by: Jonathan Billings <jsbillin@umich.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-28 08:54:20 -07:00
Linus Torvalds
7b47a9e7c8 Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs mount infrastructure updates from Al Viro:
 "The rest of core infrastructure; no new syscalls in that pile, but the
  old parts are switched to new infrastructure. At that point
  conversions of individual filesystems can happen independently; some
  are done here (afs, cgroup, procfs, etc.), there's also a large series
  outside of that pile dealing with NFS (quite a bit of option-parsing
  stuff is getting used there - it's one of the most convoluted
  filesystems in terms of mount-related logics), but NFS bits are the
  next cycle fodder.

  It got seriously simplified since the last cycle; documentation is
  probably the weakest bit at the moment - I considered dropping the
  commit introducing Documentation/filesystems/mount_api.txt (cutting
  the size increase by quarter ;-), but decided that it would be better
  to fix it up after -rc1 instead.

  That pile allows to do followup work in independent branches, which
  should make life much easier for the next cycle. fs/super.c size
  increase is unpleasant; there's a followup series that allows to
  shrink it considerably, but I decided to leave that until the next
  cycle"

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (41 commits)
  afs: Use fs_context to pass parameters over automount
  afs: Add fs_context support
  vfs: Add some logging to the core users of the fs_context log
  vfs: Implement logging through fs_context
  vfs: Provide documentation for new mount API
  vfs: Remove kern_mount_data()
  hugetlbfs: Convert to fs_context
  cpuset: Use fs_context
  kernfs, sysfs, cgroup, intel_rdt: Support fs_context
  cgroup: store a reference to cgroup_ns into cgroup_fs_context
  cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper
  cgroup_do_mount(): massage calling conventions
  cgroup: stash cgroup_root reference into cgroup_fs_context
  cgroup2: switch to option-by-option parsing
  cgroup1: switch to option-by-option parsing
  cgroup: take options parsing into ->parse_monolithic()
  cgroup: fold cgroup1_mount() into cgroup1_get_tree()
  cgroup: start switching to fs_context
  ipc: Convert mqueue fs to fs_context
  proc: Add fs_context support to procfs
  ...
2019-03-12 14:08:19 -07:00
Nikolay Borisov
b5420237ec mm: refactor readahead defines in mm.h
All users of VM_MAX_READAHEAD actually convert it to kbytes and then to
pages. Define the macro explicitly as (SZ_128K / PAGE_SIZE). This
simplifies the expression in every filesystem. Also rename the macro to
VM_READAHEAD_PAGES to properly convey its meaning. Finally remove unused
VM_MIN_READAHEAD

[akpm@linux-foundation.org: fix fs/io_uring.c, per Stephen]
Link: http://lkml.kernel.org/r/20181221144053.24318-1-nborisov@suse.com
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-12 10:04:01 -07:00
David Howells
c99c2171fc afs: Use fs_context to pass parameters over automount
Alter the AFS automounting code to create and modify an fs_context struct
when parameterising a new mount triggered by an AFS mountpoint rather than
constructing device name and option strings.

Also remove the cell=, vol= and rwpath options as they are then redundant.
The reason they existed is because the 'device name' may be derived
literally from a mountpoint object in the filesystem, so default cell and
parent-type information needed to be passed in by some other method from
the automount routines.  The vol= option didn't end up being used.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric W. Biederman <ebiederm@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-02-28 03:29:39 -05:00
David Howells
13fcc68370 afs: Add fs_context support
Add fs_context support to the AFS filesystem, converting the parameter
parsing to store options there.

This will form the basis for namespace propagation over mountpoints within
the AFS model, thereby allowing AFS to be used in containers more easily.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-02-28 03:29:38 -05:00
David Howells
7d762d6914 afs: Fix manually set volume location server list
When a cell with a volume location server list is added manually by
echoing the details into /proc/net/afs/cells, a record is added but the
flag saying it has been looked up isn't set.

This causes the VL server rotation code to wait forever, with the top of
/proc/pid/stack looking like:

	afs_select_vlserver+0x3a6/0x6f3
	afs_vl_lookup_vldb+0x4b/0x92
	afs_create_volume+0x25/0x1b9
	...

with the thread stuck in afs_start_vl_iteration() waiting for
AFS_CELL_FL_NO_LOOKUP_YET to be cleared.

Fix this by clearing AFS_CELL_FL_NO_LOOKUP_YET when setting up a record
if that record's details were supplied manually.

Fixes: 0a5143f2f8 ("afs: Implement VL server rotation")
Reported-by: Dave Botsch <dwb7@cornell.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-02-25 11:59:07 -08:00
David Howells
34fa47612b afs: Fix race in async call refcounting
There's a race between afs_make_call() and afs_wake_up_async_call() in the
case that an error is returned from rxrpc_kernel_send_data() after it has
queued the final packet.

afs_make_call() will try and clean up the mess, but the call state may have
been moved on thereby causing afs_process_async_call() to also try and to
delete the call.

Fix this by:

 (1) Getting an extra ref for an asynchronous call for the call itself to
     hold.  This makes sure the call doesn't evaporate on us accidentally
     and will allow the call to be retained by the caller in a future
     patch.  The ref is released on leaving afs_make_call() or
     afs_wait_for_call_to_complete().

 (2) In the event of an error from rxrpc_kernel_send_data():

     (a) Don't set the call state to AFS_CALL_COMPLETE until *after* the
     	 call has been aborted and ended.  This prevents
     	 afs_deliver_to_call() from doing anything with any notifications
     	 it gets.

     (b) Explicitly end the call immediately to prevent further callbacks.

     (c) Cancel any queued async_work and wait for the work if it's
     	 executing.  This allows us to be sure the race won't recur when we
     	 change the state.  We put the work queue's ref on the call if we
     	 managed to cancel it.

     (d) Put the call's ref that we got in (1).  This belongs to us as long
     	 as the call is in state AFS_CALL_CL_REQUESTING.

Fixes: 341f741f04 ("afs: Refcount the afs_call struct")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-17 15:17:28 +00:00
David Howells
7a75b0079a afs: Provide a function to get a ref on a call
Provide a function to get a reference on an afs_call struct.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-17 15:17:28 +00:00
David Howells
59d49076ae afs: Fix key refcounting in file locking code
Fix the refcounting of the authentication keys in the file locking code.
The vnode->lock_key member points to a key on which it expects to be
holding a ref, but it isn't always given an extra ref, however.

Fixes: 0fafdc9f88 ("afs: Fix file locking")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-17 15:17:28 +00:00
Marc Dionne
4882a27cec afs: Don't set vnode->cb_s_break in afs_validate()
A cb_interest record is not necessarily attached to the vnode on entry to
afs_validate(), which can cause an oops when we try to bring the vnode's
cb_s_break up to date in the default case (ie. no current callback promise
and the vnode has not been deleted).

Fix this by simply removing the line, as vnode->cb_s_break will be set when
needed by afs_register_server_cb_interest() when we next get a callback
promise from RPC call.

The oops looks something like:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
    ...
    RIP: 0010:afs_validate+0x66/0x250 [kafs]
    ...
    Call Trace:
     afs_d_revalidate+0x8d/0x340 [kafs]
     ? __d_lookup+0x61/0x150
     lookup_dcache+0x44/0x70
     ? lookup_dcache+0x44/0x70
     __lookup_hash+0x24/0xa0
     do_unlinkat+0x11d/0x2c0
     __x64_sys_unlink+0x23/0x30
     do_syscall_64+0x4d/0xf0
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: ae3b7361dc ("afs: Fix validation/callback interaction")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-17 15:15:52 +00:00
Marc Dionne
5edc22cc1d afs: Set correct lock type for the yfs CreateFile
A lock type of 0 is "LockRead", which makes the fileserver record an
unintentional read lock on the new file.  This will cause problems
later on if the file is the subject of locking operations.

The correct default value should be -1 ("LockNone").

Fix the operation marshalling code to set the value and provide an enum to
symbolise the values whilst we're at it.

Fixes: 30062bd13e ("afs: Implement YFS support in the fs client")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-10 17:12:05 +00:00
Gustavo A. R. Silva
c2b8bd49d3 afs: Use struct_size() in kzalloc()
One of the more common cases of allocation size calculations is finding the
size of a structure that has a zero-sized array at the end, along with
memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    void *entry[];
};

instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-01-10 17:12:05 +00:00
Nikolay Borisov
f86196ea87 fs: don't open code lru_to_page()
Multiple filesystems open code lru_to_page().  Rectify this by moving
the macro from mm_inline (which is specific to lru stuff) to the more
generic mm.h header and start using the macro where appropriate.

No functional changes.

Link: http://lkml.kernel.org/r/20181129104810.23361-1-nborisov@suse.com
Link: https://lkml.kernel.org/r/20181129075301.29087-1-nborisov@suse.com
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Pankaj gupta <pagupta@redhat.com>
Acked-by: "Yan, Zheng" <zyan@redhat.com>		[ceph]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:48 -08:00
Davidlohr Bueso
08d405c8b8 fs/: remove caller signal_pending branch predictions
This is already done for us internally by the signal machinery.

[akpm@linux-foundation.org: fix fs/buffer.c]
Link: http://lkml.kernel.org/r/20181116002713.8474-7-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:48 -08:00
Linus Torvalds
5f1ca5c619 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Assorted fixes all over the place.

  The iov_iter one is this cycle regression (splice from UDP triggering
  WARN_ON()), the rest is older"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  afs: Use d_instantiate() rather than d_add() and don't d_drop()
  afs: Fix missing net error handling
  afs: Fix validation/callback interaction
  iov_iter: teach csum_and_copy_to_iter() to handle pipe-backed ones
  exportfs: do not read dentry after free
  exportfs: fix 'passing zero to ERR_PTR()' warning
  aio: fix failure to put the file pointer
  sysv: return 'err' instead of 0 in __sysv_write_inode
2018-11-30 10:47:50 -08:00
David Howells
73116df7bb afs: Use d_instantiate() rather than d_add() and don't d_drop()
Use d_instantiate() rather than d_add() and don't d_drop() in
afs_vnode_new_inode().  The dentry shouldn't be removed as it's not
changing its name.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-29 21:08:14 -05:00
David Howells
4584ae96ae afs: Fix missing net error handling
kAFS can be given certain network errors (EADDRNOTAVAIL, EHOSTDOWN and
ERFKILL) that it doesn't handle in its server/address rotation algorithms.
They cause the probing and rotation to abort immediately rather than
rotating.

Fix this by:

 (1) Abstracting out the error prioritisation from the VL and FS rotation
     algorithms into a common function and expand usage into the server
     probing code.

     When multiple errors are available, this code selects the one we'd
     prefer to return.

 (2) Add handling for EADDRNOTAVAIL, EHOSTDOWN and ERFKILL.

Fixes: 0fafdc9f88 ("afs: Fix file locking")
Fixes: 0338747d8454 ("afs: Probe multiple fileservers simultaneously")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-29 21:08:14 -05:00
David Howells
ae3b7361dc afs: Fix validation/callback interaction
When afs_validate() is called to validate a vnode (inode), there are two
unhandled cases in the fastpath at the top of the function:

 (1) If the vnode is promised (AFS_VNODE_CB_PROMISED is set), the break
     counters match and the data has expired, then there's an implicit case
     in which the vnode needs revalidating.

     This has no consequences since the default "valid = false" set at the
     top of the function happens to do the right thing.

 (2) If the vnode is not promised and it hasn't been deleted
     (AFS_VNODE_DELETED is not set) then there's a default case we're not
     handling in which the vnode is invalid.  If the vnode is invalid, we
     need to bring cb_s_break and cb_v_break up to date before we refetch
     the status.

     As a consequence, once the server loses track of the client
     (ie. sufficient time has passed since we last sent it an operation),
     it will send us a CB.InitCallBackState* operation when we next try to
     talk to it.  This calls afs_init_callback_state() which increments
     afs_server::cb_s_break, but this then doesn't propagate to the
     afs_vnode record.

     The result being that every afs_validate() call thereafter sends a
     status fetch operation to the server.

Clarify and fix this by:

 (A) Setting valid in all the branches rather than initialising it at the
     top so that the compiler catches where we've missed.

 (B) Restructuring the logic in the 'promised' branch so that we set valid
     to false if the callback is due to expire (or has expired) and so that
     the final case is that the vnode is still valid.

 (C) Adding an else-statement that ups cb_s_break and cb_v_break if the
     promised and deleted cases don't match.

Fixes: c435ee3455 ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-29 21:08:14 -05:00
David Howells
7150ceaacb rxrpc: Fix life check
The life-checking function, which is used by kAFS to make sure that a call
is still live in the event of a pending signal, only samples the received
packet serial number counter; it doesn't actually provoke a change in the
counter, rather relying on the server to happen to give us a packet in the
time window.

Fix this by adding a function to force a ping to be transmitted.

kAFS then keeps track of whether there's been a stall, and if so, uses the
new function to ping the server, resetting the timeout to allow the reply
to come back.

If there's a stall, a ping and the call is *still* stalled in the same
place after another period, then the call will be aborted.

Fixes: bc5e3a546d ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Fixes: f4d15fb6f9 ("rxrpc: Provide functions for allowing cleaner handling of signals")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15 11:35:40 -08:00
David Howells
3bf0fb6f33 afs: Probe multiple fileservers simultaneously
Send probes to all the unprobed fileservers in a fileserver list on all
addresses simultaneously in an attempt to find out the fastest route whilst
not getting stuck for 20s on any server or address that we don't get a
reply from.

This alleviates the problem whereby attempting to access a new server can
take a long time because the rotation algorithm ends up rotating through
all servers and addresses until it finds one that responds.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:09 +01:00
David Howells
18ac61853c afs: Fix callback handling
In some circumstances, the callback interest pointer is NULL, so in such a
case we can't dereference it when checking to see if the callback is
broken.  This causes an oops in some circumstances.

Fix this by replacing the function that worked out the aggregate break
counter with one that actually does the comparison, and then make that
return true (ie. broken) if there is no callback interest as yet (ie. the
pointer is NULL).

Fixes: 68251f0a68 ("afs: Fix whole-volume callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:09 +01:00
David Howells
2feeaf8433 afs: Eliminate the address pointer from the address list cursor
Eliminate the address pointer from the address list cursor as it's
redundant (ac->addrs[ac->index] can be used to find the same address) and
address lists must be replaced rather than being rearranged, so is of
limited value.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:09 +01:00
David Howells
744bcd713a afs: Allow dumping of server cursor on operation failure
Provide an option to allow the file or volume location server cursor to be
dumped if the rotation routine falls off the end without managing to
contact a server.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:09 +01:00
David Howells
30062bd13e afs: Implement YFS support in the fs client
Implement support for talking to YFS-variant fileservers in the cache
manager and the filesystem client.  These implement upgraded services on
the same port as their AFS services.

YFS fileservers provide expanded capabilities over AFS.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells
d4936803a9 afs: Expand data structure fields to support YFS
Expand fields in various data structures to support the expanded
information that YFS is capable of returning.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells
f58db83fd3 afs: Get the target vnode in afs_rmdir() and get a callback on it
Get the target vnode in afs_rmdir() and validate it before we attempt the
deletion, The vnode pointer will be passed through to the delivery function
in a later patch so that the delivery function can mark it deleted.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells
12d8e95a91 afs: Calc callback expiry in op reply delivery
Calculate the callback expiration time at the point of operation reply
delivery, using the reply time queried from AF_RXRPC on that call as a
base.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells
36bb5f490a afs: Fix FS.FetchStatus delivery from updating wrong vnode
The FS.FetchStatus reply delivery function was updating inode of the
directory in which a lookup had been done with the status of the looked up
file.  This corrupts some of the directory state.

Fixes: 5cf9dd55a0 ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells
35dbfba311 afs: Implement the YFS cache manager service
Implement the YFS cache manager service which gives extra capabilities on
top of AFS.  This is done by listening for an additional service on the
same port and indicating that anyone requesting an upgrade should be
upgraded to the YFS port.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells
06aeb29714 afs: Remove callback details from afs_callback_break struct
Remove unnecessary details of a broken callback, such as version, expiry
and type, from the afs_callback_break struct as they're not actually used
and make the list take more memory.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells
0067191201 afs: Commit the status on a new file/dir/symlink
Call the function to commit the status on a new file, dir or symlink so
that the access rights for the caller's key are cached for that object.

Without this, the next access to the file will cause a FetchStatus
operation to be emitted to retrieve the access rights.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells
3b6492df41 afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS
Increase the sizes of the volume ID to 64 bits and the vnode ID (inode
number equivalent) to 96 bits to allow the support of YFS.

This requires the iget comparator to check the vnode->fid rather than i_ino
and i_generation as i_ino is not sufficiently capacious.  It also requires
this data to be placed into the vnode cache key for fscache.

For the moment, just discard the top 32 bits of the vnode ID when returning
it though stat.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells
2a0b4f64c9 afs: Don't invoke the server to read data beyond EOF
When writing a new page, clear space in the page rather than attempting to
load it from the server if the space is beyond the EOF.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells
f51375cd9e afs: Add a couple of tracepoints to log I/O errors
Add a couple of tracepoints to log the production of I/O errors within the AFS
filesystem.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:07 +01:00
David Howells
4ac15ea536 afs: Handle EIO from delivery function
Fix afs_deliver_to_call() to handle -EIO being returned by the operation
delivery function, indicating that the call found itself in the wrong
state, by printing an error and aborting the call.

Currently, an assertion failure will occur.  This can happen, say, if the
delivery function falls off the end without calling afs_extract_data() with
the want_more parameter set to false to collect the end of the Rx phase of
a call.

The assertion failure looks like:

	AFS: Assertion failed
	4 == 7 is false
	0x4 == 0x7 is false
	------------[ cut here ]------------
	kernel BUG at fs/afs/rxrpc.c:462!

and is matched in the trace buffer by a line like:

kworker/7:3-3226 [007] ...1 85158.030203: afs_io_error: c=0003be0c r=-5 CM_REPLY

Fixes: 98bf40cd99 ("afs: Protect call->state changes against signals")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:07 +01:00
David Howells
ded2f4c58a afs: Fix TTL on VL server and address lists
Currently the TTL on VL server and address lists isn't set in all
circumstances and may be set to poor choices in others, since the TTL is
derived from the SRV/AFSDB DNS record if and when available.

Fix the TTL by limiting the range to a minimum and maximum from the current
time.  At some point these can be made into sysctl knobs.  Further, use the
TTL we obtained from the upcall to set the expiry on negative results too;
in future a mechanism can be added to force reloading of such data.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:07 +01:00
David Howells
0a5143f2f8 afs: Implement VL server rotation
Track VL servers as independent entities rather than lumping all their
addresses together into one set and implement server-level rotation by:

 (1) Add the concept of a VL server list, where each server has its own
     separate address list.  This code is similar to the FS server list.

 (2) Use the DNS resolver to retrieve a set of servers and their associated
     addresses, ports, preference and weight ratings.

 (3) In the case of a legacy DNS resolver or an address list given directly
     through /proc/net/afs/cells, create a list containing just a dummy
     server record and attach all the addresses to that.

 (4) Implement a simple rotation policy, for the moment ignoring the
     priorities and weights assigned to the servers.

 (5) Show the address list through /proc/net/afs/<cell>/vlservers.  This
     also displays the source and status of the data as indicated by the
     upcall.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:07 +01:00
David Howells
e7f680f45b afs: Improve FS server rotation error handling
Improve the error handling in FS server rotation by:

 (1) Cache the latest useful error value for the fs operation as a whole in
     struct afs_fs_cursor separately from the error cached in the
     afs_addr_cursor struct.  The one in the address cursor gets clobbered
     occasionally.  Copy over the error to the fs operation only when it's
     something we'd be interested in passing to userspace.

 (2) Make it so that EDESTADDRREQ is the default that is seen only if no
     addresses are available to be accessed.

 (3) When calling utility functions, such as checking a volume status or
     probing a fileserver, don't let a successful result clobber the cached
     error in the cursor; instead, stash the result in a temporary variable
     until it has been assessed.

 (4) Don't return ETIMEDOUT or ETIME if a better error, such as
     ENETUNREACH, is already cached.

 (5) On leaving the rotation loop, turn any remote abort code into a more
     useful error than ECONNABORTED.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:07 +01:00
David Howells
12bdcf333f afs: Set up the iov_iter before calling afs_extract_data()
afs_extract_data sets up a temporary iov_iter and passes it to AF_RXRPC
each time it is called to describe the remaining buffer to be filled.

Instead:

 (1) Put an iterator in the afs_call struct.

 (2) Set the iterator for each marshalling stage to load data into the
     appropriate places.  A number of convenience functions are provided to
     this end (eg. afs_extract_to_buf()).

     This iterator is then passed to afs_extract_data().

 (3) Use the new ITER_DISCARD iterator to discard any excess data provided
     by FetchData.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:07 +01:00