Commit Graph

72 Commits

Author SHA1 Message Date
Marc Dionne
bfacaf71a1
afs: Fix ignored callbacks over ipv4
When searching for a matching peer, all addresses need to be searched,
not just the ipv6 ones in the fs_addresses6 list.

Given that the lists no longer contain addresses, there is little
reason to splitting things between separate lists, so unify them
into a single list.

When processing an incoming callback from an ipv4 address, this would
lead to a failure to set call->server, resulting in the callback being
ignored and the client seeing stale contents.

Fixes: 72904d7b9b ("rxrpc, afs: Allow afs to pin rxrpc_peer objects")
Reported-by: Markus Suvanto <markus.suvanto@gmail.com>
Link: https://lists.infradead.org/pipermail/linux-afs/2024-February/008035.html
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lists.infradead.org/pipermail/linux-afs/2024-February/008037.html # v1
Link: https://lists.infradead.org/pipermail/linux-afs/2024-February/008066.html # v2
Link: https://lore.kernel.org/r/20240219143906.138346-2-dhowells@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-20 09:51:21 +01:00
David Howells
32222f0978 afs: Apply server breaks to mmap'd files in the call processor
Apply server breaks to mmap'd files that are being used from that server
from the call processor work function rather than punting it off to a
workqueue.  The work item, afs_server_init_callback(), then bumps each
individual inode off to its own work item introducing a potentially lengthy
delay.  This reduces that delay at the cost of extending the amount of time
we delay replying to the CB.InitCallBack3 notification RPC from the server.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2024-01-01 16:37:27 +00:00
David Howells
ca0e79a460 afs: Make it possible to find the volumes that are using a server
Make it possible to find the afs_volume structs that are using an
afs_server struct to aid in breaking volume callbacks.

The way this is done is that each afs_volume already has an array of
afs_server_entry records that point to the servers where that volume might
be found.  An afs_volume backpointer and a list node is added to each entry
and each entry is then added to an RCU-traversable list on the afs_server
to which it points.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2024-01-01 16:37:27 +00:00
David Howells
f49b594df3 afs: Keep a record of the current fileserver endpoint state
Keep a record of the current fileserver endpoint state, including the probe
state, and replace it when a new probe is started rather than just
squelching the old state and overwriting it.  Clearance of the old state
can cause a race if there's another thread also currently trying to
communicate with that server.

It appears that this race might be the culprit for some occasions where
kafs complains about invalid data in the RPC reply because the rotation
algorithm fell all the way through without actually issuing an RPC call and
the error return got filled in from the probe state (which has a zero error
recorded).  Whatever happens to be in the caller's reply buffer is then
taken as the response.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2024-01-01 16:37:27 +00:00
David Howells
98f9fda205 afs: Fold the afs_addr_cursor struct in
Fold the afs_addr_cursor struct into the afs_operation struct and the
afs_vl_cursor struct and fold its operations into their callers also.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-12-24 15:22:55 +00:00
David Howells
e38f299ece afs: Use peer + service_id as call address
Use the rxrpc_peer plus the service ID as the call address instead of
passing in a sockaddr_srx down to rxrpc.  The peer record is obtained by
using rxrpc_kernel_get_peer().  This avoids the need to repeatedly look up
the peer and allows rxrpc to hold on to resources for it.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-12-24 15:22:55 +00:00
David Howells
1e5d849325 afs: Add a tracepoint for struct afs_addr_list
Add a tracepoint to track the lifetime of the afs_addr_list struct.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-12-24 15:22:54 +00:00
David Howells
aa453becce afs: Simplify error handling
Simplify error handling a bit by moving it from the afs_addr_cursor struct
to the afs_operation and afs_vl_cursor structs and using the error
prioritisation function for accumulating errors from multiple sources (AFS
tries to rotate between multiple fileservers, some of which may be
inaccessible or in some state of offlinedness).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-12-24 15:22:53 +00:00
David Howells
2de5599f63 afs: Wrap most op->error accesses with inline funcs
Wrap most op->error accesses with inline funcs which will make it easier
for a subsequent patch to replace op->error with something else.  Two
functions are added to this end:

 (1) afs_op_error() - Get the error code.

 (2) afs_op_set_error() - Set the error code.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-12-24 15:22:53 +00:00
David Howells
72904d7b9b rxrpc, afs: Allow afs to pin rxrpc_peer objects
Change rxrpc's API such that:

 (1) A new function, rxrpc_kernel_lookup_peer(), is provided to look up an
     rxrpc_peer record for a remote address and a corresponding function,
     rxrpc_kernel_put_peer(), is provided to dispose of it again.

 (2) When setting up a call, the rxrpc_peer object used during a call is
     now passed in rather than being set up by rxrpc_connect_call().  For
     afs, this meenat passing it to rxrpc_kernel_begin_call() rather than
     the full address (the service ID then has to be passed in as a
     separate parameter).

 (3) A new function, rxrpc_kernel_remote_addr(), is added so that afs can
     get a pointer to the transport address for display purposed, and
     another, rxrpc_kernel_remote_srx(), to gain a pointer to the full
     rxrpc address.

 (4) The function to retrieve the RTT from a call, rxrpc_kernel_get_srtt(),
     is then altered to take a peer.  This now returns the RTT or -1 if
     there are insufficient samples.

 (5) Rename rxrpc_kernel_get_peer() to rxrpc_kernel_call_get_peer().

 (6) Provide a new function, rxrpc_kernel_get_peer(), to get a ref on a
     peer the caller already has.

This allows the afs filesystem to pin the rxrpc_peer records that it is
using, allowing faster lookups and pointer comparisons rather than
comparing sockaddr_rxrpc contents.  It also makes it easier to get hold of
the RTT.  The following changes are made to afs:

 (1) The addr_list struct's addrs[] elements now hold a peer struct pointer
     and a service ID rather than a sockaddr_rxrpc.

 (2) When displaying the transport address, rxrpc_kernel_remote_addr() is
     used.

 (3) The port arg is removed from afs_alloc_addrlist() since it's always
     overridden.

 (4) afs_merge_fs_addr4() and afs_merge_fs_addr6() do peer lookup and may
     now return an error that must be handled.

 (5) afs_find_server() now takes a peer pointer to specify the address.

 (6) afs_find_server(), afs_compare_fs_alists() and afs_merge_fs_addr[46]{}
     now do peer pointer comparison rather than address comparison.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-12-24 15:22:50 +00:00
David Howells
07f3502b33 afs: Turn the afs_addr_list address array into an array of structs
Turn the afs_addr_list address array into an array of structs, thereby
allowing per-address (such as RTT) info to be added.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
2023-12-24 15:22:50 +00:00
Oleg Nesterov
1702e0654c afs: fix the usage of read_seqbegin_or_lock() in afs_find_server*()
David Howells says:

 (5) afs_find_server().

     There could be a lot of servers in the list and each server can have
     multiple addresses, so I think this would be better with an exclusive
     second pass.

     The server list isn't likely to change all that often, but when it does
     change, there's a good chance several servers are going to be
     added/removed one after the other.  Further, this is only going to be
     used for incoming cache management/callback requests from the server,
     which hopefully aren't going to happen too often - but it is remotely
     drivable.

 (6) afs_find_server_by_uuid().

     Similarly to (5), there could be a lot of servers to search through, but
     they are in a tree not a flat list, so it should be faster to process.
     Again, it's not likely to change that often and, again, when it does
     change it's likely to involve multiple changes.  This can be driven
     remotely by an incoming cache management request but is mostly going to
     be driven by setting up or reconfiguring a volume's server list -
     something that also isn't likely to happen often.

Make the "seq" counter odd on the 2nd pass, otherwise read_seqbegin_or_lock()
never takes the lock.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/20231130115614.GA21581@redhat.com/
2023-12-24 15:22:48 +00:00
Marc Dionne
ef4d3ea405 afs: Fix server->active leak in afs_put_server
The atomic_read was accidentally replaced with atomic_inc_return,
which prevents the server from getting cleaned up and causes rmmod
to hang with a warning:

    Can't purge s=00000001

Fixes: 2757a4dc18 ("afs: Fix access after dec in put functions")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20221130174053.2665818-1-marc.dionne@auristor.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-11-30 10:02:37 -08:00
David Howells
2757a4dc18 afs: Fix access after dec in put functions
Reference-putting functions should not access the object being put after
decrementing the refcount unless they reduce the refcount to zero.

Fix a couple of instances of this in afs by copying the information to be
logged by tracepoint to local variables before doing the decrement.

[Fixed a bit in afs_put_server() that I'd missed but Marc caught]

Fixes: 341f741f04 ("afs: Refcount the afs_call struct")
Fixes: 4521819369 ("afs: Trace afs_server usage")
Fixes: 977e5f8ed0 ("afs: Split the usage count on struct afs_server")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/165911278430.3745403.16526310736054780645.stgit@warthog.procyon.org.uk/ # v1
2022-08-02 18:21:29 +01:00
David Howells
c56f9ec8b2 afs: Use refcount_t rather than atomic_t
Use refcount_t rather than atomic_t in afs to make use of the count
checking facilities provided.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/165911277768.3745403.423349776836296452.stgit@warthog.procyon.org.uk/ # v1
2022-08-02 18:10:11 +01:00
David Howells
6e0e99d58a afs: Fix mmap coherency vs 3rd-party changes
Fix the coherency management of mmap'd data such that 3rd-party changes
become visible as soon as possible after the callback notification is
delivered by the fileserver.  This is done by the following means:

 (1) When we break a callback on a vnode specified by the CB.CallBack call
     from the server, we queue a work item (vnode->cb_work) to go and
     clobber all the PTEs mapping to that inode.

     This causes the CPU to trip through the ->map_pages() and
     ->page_mkwrite() handlers if userspace attempts to access the page(s)
     again.

     (Ideally, this would be done in the service handler for CB.CallBack,
     but the server is waiting for our reply before considering, and we
     have a list of vnodes, all of which need breaking - and the process of
     getting the mmap_lock and stripping the PTEs on all CPUs could be
     quite slow.)

 (2) Call afs_validate() from the ->map_pages() handler to check to see if
     the file has changed and to get a new callback promise from the
     server.

Also handle the fileserver telling us that it's dropping all callbacks,
possibly after it's been restarted by sending us a CB.InitCallBackState*
call by the following means:

 (3) Maintain a per-cell list of afs files that are currently mmap'd
     (cell->fs_open_mmaps).

 (4) Add a work item to each server that is invoked if there are any open
     mmaps when CB.InitCallBackState happens.  This work item goes through
     the aforementioned list and invokes the vnode->cb_work work item for
     each one that is currently using this server.

     This causes the PTEs to be cleared, causing ->map_pages() or
     ->page_mkwrite() to be called again, thereby calling afs_validate()
     again.

I've chosen to simply strip the PTEs at the point of notification reception
rather than invalidate all the pages as well because (a) it's faster, (b)
we may get a notification for other reasons than the data being altered (in
which case we don't want to clobber the pagecache) and (c) we need to ask
the server to find out - and I don't want to wait for the reply before
holding up userspace.

This was tested using the attached test program:

	#include <stdbool.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <fcntl.h>
	#include <sys/mman.h>
	int main(int argc, char *argv[])
	{
		size_t size = getpagesize();
		unsigned char *p;
		bool mod = (argc == 3);
		int fd;
		if (argc != 2 && argc != 3) {
			fprintf(stderr, "Format: %s <file> [mod]\n", argv[0]);
			exit(2);
		}
		fd = open(argv[1], mod ? O_RDWR : O_RDONLY);
		if (fd < 0) {
			perror(argv[1]);
			exit(1);
		}

		p = mmap(NULL, size, mod ? PROT_READ|PROT_WRITE : PROT_READ,
			 MAP_SHARED, fd, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			exit(1);
		}
		for (;;) {
			if (mod) {
				p[0]++;
				msync(p, size, MS_ASYNC);
				fsync(fd);
			}
			printf("%02x", p[0]);
			fflush(stdout);
			sleep(1);
		}
	}

It runs in two modes: in one mode, it mmaps a file, then sits in a loop
reading the first byte, printing it and sleeping for a second; in the
second mode it mmaps a file, then sits in a loop incrementing the first
byte and flushing, then printing and sleeping.

Two instances of this program can be run on different machines, one doing
the reading and one doing the writing.  The reader should see the changes
made by the writer, but without this patch, they aren't because validity
checking is being done lazily - only on entry to the filesystem.

Testing the InitCallBackState change is more complicated.  The server has
to be taken offline, the saved callback state file removed and then the
server restarted whilst the reading-mode program continues to run.  The
client machine then has to poke the server to trigger the InitCallBackState
call.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Markus Suvanto <markus.suvanto@gmail.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/163111668833.283156.382633263709075739.stgit@warthog.procyon.org.uk/
2021-09-13 09:10:39 +01:00
David Howells
7530d3eb3d afs: Don't assert on unpurgeable server records
Don't give an assertion failure on unpurgeable afs_server records - which
kills the thread - but rather emit a trace line when we are purging a
record (which only happens during network namespace removal or rmmod) and
print a notice of the problem.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-10-16 14:39:34 +01:00
David Howells
5481fc6eb8 afs: Fix hang on rmmod due to outstanding timer
The fileserver probe timer, net->fs_probe_timer, isn't cancelled when
the kafs module is being removed and so the count it holds on
net->servers_outstanding doesn't get dropped..

This causes rmmod to wait forever.  The hung process shows a stack like:

	afs_purge_servers+0x1b5/0x23c [kafs]
	afs_net_exit+0x44/0x6e [kafs]
	ops_exit_list+0x72/0x93
	unregister_pernet_operations+0x14c/0x1ba
	unregister_pernet_subsys+0x1d/0x2a
	afs_exit+0x29/0x6f [kafs]
	__do_sys_delete_module.isra.0+0x1a2/0x24b
	do_syscall_64+0x51/0x95
	entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix this by:

 (1) Attempting to cancel the probe timer and, if successful, drop the
     count that the timer was holding.

 (2) Make the timer function just drop the count and not schedule the
     prober if the afs portion of net namespace is being destroyed.

Also, whilst we're at it, make the following changes:

 (3) Initialise net->servers_outstanding to 1 and decrement it before
     waiting on it so that it doesn't generate wake up events by being
     decremented to 0 until we're cleaning up.

 (4) Switch the atomic_dec() on ->servers_outstanding for ->fs_timer in
     afs_purge_servers() to use the helper function for that.

Fixes: f6cbb368bc ("afs: Actively poll fileservers to maintain NAT or firewall openings")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-20 12:01:58 -07:00
David Howells
f3c130e6e6 afs: Don't use probe running state to make decisions outside probe code
Don't use the running state for fileserver probes to make decisions about
which server to use as the state is cleared at the start of a probe and
also intermediate values might be misleading.

Instead, add a separate 'latest known' rtt in the afs_server struct and a
flag to indicate if the server is known to be responding and update these
as and when we know what to change them to.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04 15:37:58 +01:00
David Howells
3c4c4075fc afs: Fix the by-UUID server tree to allow servers with the same UUID
Whilst it shouldn't happen, it is possible for multiple fileservers to
share a UUID, particularly if an entire cell has been duplicated, UUIDs and
all.  In such a case, it's not necessarily possible to map the effect of
the CB.InitCallBackState3 incoming RPC to a specific server unambiguously
by UUID and thus to a specific cell.

Indeed, there's a problem whereby multiple server records may need to
occupy the same spot in the rb_tree rooted in the afs_net struct.

Fix this by allowing servers to form a list, with the head of the list in
the tree.  When the front entry in the list is removed, the second in the
list just replaces it.  afs_init_callback_state() then just goes down the
line, poking each server in the list.

This means that some servers will be unnecessarily poked, unfortunately.
An alternative would be to route by call parameters.

Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
2020-06-04 15:37:57 +01:00
David Howells
20325960f8 afs: Reorganise volume and server trees to be rooted on the cell
Reorganise afs_volume objects such that they're in a tree keyed on volume
ID, rooted at on an afs_cell object rather than being in multiple trees,
each of which is rooted on an afs_server object.

afs_server structs become per-cell and acquire a pointer to the cell.

The process of breaking a callback then starts with finding the server by
its network address, following that to the cell and then looking up each
volume ID in the volume tree.

This is simpler than the afs_vol_interest/afs_cb_interest N:M mapping web
and allows those structs and the code for maintaining them to be simplified
or removed.

It does make a couple of things a bit more tricky, though:

 (1) Operations now start with a volume, not a server, so there can be more
     than one answer as to whether or not the server we'll end up using
     supports the FS.InlineBulkStatus RPC.

 (2) CB RPC operations that specify the server UUID.  There's still a tree
     of servers by UUID on the afs_net struct, but the UUIDs in it aren't
     guaranteed unique.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04 15:37:57 +01:00
David Howells
e49c7b2f6d afs: Build an abstraction around an "operation" concept
Turn the afs_operation struct into the main way that most fileserver
operations are managed.  Various things are added to the struct, including
the following:

 (1) All the parameters and results of the relevant operations are moved
     into it, removing corresponding fields from the afs_call struct.
     afs_call gets a pointer to the op.

 (2) The target volume is made the main focus of the operation, rather than
     the target vnode(s), and a bunch of op->vnode->volume are made
     op->volume instead.

 (3) Two vnode records are defined (op->file[]) for the vnode(s) involved
     in most operations.  The vnode record (struct afs_vnode_param)
     contains:

	- The vnode pointer.

	- The fid of the vnode to be included in the parameters or that was
          returned in the reply (eg. FS.MakeDir).

	- The status and callback information that may be returned in the
     	  reply about the vnode.

	- Callback break and data version tracking for detecting
          simultaneous third-parth changes.

 (4) Pointers to dentries to be updated with new inodes.

 (5) An operations table pointer.  The table includes pointers to functions
     for issuing AFS and YFS-variant RPCs, handling the success and abort
     of an operation and handling post-I/O-lock local editing of a
     directory.

To make this work, the following function restructuring is made:

 (A) The rotation loop that issues calls to fileservers that can be found
     in each function that wants to issue an RPC (such as afs_mkdir()) is
     extracted out into common code, in a new file called fs_operation.c.

 (B) The rotation loops, such as the one in afs_mkdir(), are replaced with
     a much smaller piece of code that allocates an operation, sets the
     parameters and then calls out to the common code to do the actual
     work.

 (C) The code for handling the success and failure of an operation are
     moved into operation functions (as (5) above) and these are called
     from the core code at appropriate times.

 (D) The pseudo inode getting stuff used by the dynamic root code is moved
     over into dynroot.c.

 (E) struct afs_iget_data is absorbed into the operation struct and
     afs_iget() expects to be given an op pointer and a vnode record.

 (F) Point (E) doesn't work for the root dir of a volume, but we know the
     FID in advance (it's always vnode 1, unique 1), so a separate inode
     getter, afs_root_iget(), is provided to special-case that.

 (G) The inode status init/update functions now also take an op and a vnode
     record.

 (H) The RPC marshalling functions now, for the most part, just take an
     afs_operation struct as their only argument.  All the data they need
     is held there.  The result delivery functions write their answers
     there as well.

 (I) The call is attached to the operation and then the operation core does
     the waiting.

And then the new operation code is, for the moment, made to just initialise
the operation, get the appropriate vnode I/O locks and do the same rotation
loop as before.

This lays the foundation for the following changes in the future:

 (*) Overhauling the rotation (again).

 (*) Support for asynchronous I/O, where the fileserver rotation must be
     done asynchronously also.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04 15:37:17 +01:00
David Howells
a310082f6d afs: Rename struct afs_fs_cursor to afs_operation
As a prelude to implementing asynchronous fileserver operations in the afs
filesystem, rename struct afs_fs_cursor to afs_operation.

This struct is going to form the core of the operation management and is
going to acquire more members in later.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31 15:19:52 +01:00
David Howells
8230fd8217 afs: Make callback processing more efficient.
afs_vol_interest objects represent the volume IDs currently being accessed
from a fileserver.  These hold lists of afs_cb_interest objects that
repesent the superblocks using that volume ID on that server.

When a callback notification from the server telling of a modification by
another client arrives, the volume ID specified in the notification is
looked up in the server's afs_vol_interest list.  Through the
afs_cb_interest list, the relevant superblocks can be iterated over and the
specific inode looked up and marked in each one.

Make the following efficiency improvements:

 (1) Hold rcu_read_lock() over the entire processing rather than locking it
     each time.

 (2) Do all the callbacks for each vid together rather than individually.
     Each volume then only needs to be looked up once.

 (3) afs_vol_interest objects are now stored in an rb_tree rather than a
     flat list to reduce the lookup step count.

 (4) afs_vol_interest lookup is now done with RCU, but because it's in an
     rb_tree which may rotate under us, a seqlock is used so that if it
     changes during the walk, we repeat the walk with a lock held.

With this and the preceding patch which adds RCU-based lookups in the inode
cache, target volumes/vnodes can be taken without the need to take any
locks, except on the target itself.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31 15:19:51 +01:00
David Howells
f6cbb368bc afs: Actively poll fileservers to maintain NAT or firewall openings
When an AFS client accesses a file, it receives a limited-duration callback
promise that the server will notify it if another client changes a file.
This callback duration can be a few hours in length.

If a client mounts a volume and then an application prevents it from being
unmounted, say by chdir'ing into it, but then does nothing for some time,
the rxrpc_peer record will expire and rxrpc-level keepalive will cease.

If there is NAT or a firewall between the client and the server, the route
back for the server may close after a comparatively short duration, meaning
that attempts by the server to notify the client may then bounce.

The client, however, may (so far as it knows) still have a valid unexpired
promise and will then rely on its cached data and will not see changes made
on the server by a third party until it incidentally rechecks the status or
the promise needs renewal.

To deal with this, the client needs to regularly probe the server.  This
has two effects: firstly, it keeps a route open back for the server, and
secondly, it causes the server to disgorge any notifications that got
queued up because they couldn't be sent.

Fix this by adding a mechanism to emit regular probes.

Two levels of probing are made available: Under normal circumstances the
'slow' queue will be used for a fileserver - this just probes the preferred
address once every 5 mins or so; however, if server fails to respond to any
probes, the server will shift to the 'fast' queue from which all its
interfaces will be probed every 30s.  When it finally responds, the record
will switch back to the slow queue.

Further notes:

 (1) Probing is now no longer driven from the fileserver rotation
     algorithm.

 (2) Probes are dispatched to all interfaces on a fileserver when that an
     afs_server object is set up to record it.

 (3) The afs_server object is removed from the probe queues when we start
     to probe it.  afs_is_probing_server() returns true if it's not listed
     - ie. it's undergoing probing.

 (4) The afs_server object is added back on to the probe queue when the
     final outstanding probe completes, but the probed_at time is set when
     we're about to launch a probe so that it's not dependent on the probe
     duration.

 (5) The timer and the work item added for this must be handed a count on
     net->servers_outstanding, which they hand on or release.  This makes
     sure that network namespace cleanup waits for them.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Dave Botsch <botsch@cnf.cornell.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31 15:19:51 +01:00
David Howells
977e5f8ed0 afs: Split the usage count on struct afs_server
Split the usage count on the afs_server struct to have an active count that
registers who's actually using it separately from the reference count on
the object.

This allows a future patch to dispatch polling probes without advancing the
"unuse" time into the future each time we emit a probe, which would
otherwise prevent unused server records from expiring.

Included in this:

 (1) The latter part of afs_destroy_server() in which the RCU destruction
     of afs_server objects is invoked and the outstanding server count is
     decremented is split out into __afs_put_server().

 (2) afs_put_server() now calls __afs_put_server() rather then setting the
     management timer.

 (3) The calls begun by afs_fs_give_up_all_callbacks() and
     afs_fs_get_capabilities() can now take a ref on the server record, so
     afs_destroy_server() can just drop its ref and needn't wait for the
     completion of these calls.  They'll put the ref when they're done.

 (4) Because of (3), afs_fs_probe_done() no longer needs to wake up
     afs_destroy_server() with server->probe_outstanding.

 (5) afs_gc_servers can be simplified.  It only needs to check if
     server->active is 0 rather than playing games with the refcount.

 (6) afs_manage_servers() can propose a server for gc if usage == 0 rather
     than if ref == 1.  The gc is effected by (5).

Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31 15:19:51 +01:00
David Howells
8100680592 afs: Use the serverUnique field in the UVLDB record to reduce rpc ops
The U-version VLDB volume record retrieved by the VL.GetEntryByNameU rpc op
carries a change counter (the serverUnique field) for each fileserver
listed in the record as backing that volume.  This is incremented whenever
the registration details for a fileserver change (such as its address
list).  Note that the same value will be seen in all UVLDB records that
refer to that fileserver.

This should be checked before calling the VL server to re-query the address
list for a fileserver.  If it's the same, there's no point doing the query.

Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31 15:19:51 +01:00
David Howells
c4bfda16d1 afs: Make record checking use TASK_UNINTERRUPTIBLE when appropriate
When an operation is meant to be done uninterruptibly (such as
FS.StoreData), we should not be allowing volume and server record checking
to be interrupted.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-24 16:33:32 +01:00
Marc Dionne
9bd0160d12 afs: Fix afs_find_server lookups for ipv4 peers
afs_find_server tries to find a server that has an address that
matches the transport address of an rxrpc peer.  The code assumes
that the transport address is always ipv6, with ipv4 represented
as ipv4 mapped addresses, but that's not the case.  If the transport
family is AF_INET, srx->transport.sin6.sin6_addr.s6_addr32[] will
be beyond the actual ipv4 address and will always be 0, and all
ipv4 addresses will be seen as matching.

As a result, the first ipv4 address seen on any server will be
considered a match, and the server returned may be the wrong one.

One of the consequences is that callbacks received over ipv4 will
only be correctly applied for the server that happens to have the
first ipv4 address on the fs_addresses4 list.  Callbacks over ipv4
from all other servers are dropped, causing the client to serve stale
data.

This is fixed by looking at the transport family, and comparing ipv4
addresses based on a sockaddr_in structure rather than a sockaddr_in6.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-12-09 15:04:43 +00:00
zhengbin
4fe171bb81 afs: Remove set but not used variable 'ret'
Fixes gcc '-Wunused-but-set-variable' warning:

fs/afs/server.c: In function afs_install_server:
fs/afs/server.c:157:6: warning: variable ret set but not used [-Wunused-but-set-variable]

It is not used since commit d2ddc776a4 ("afs:
Overhaul volume and server record caching and fileserver rotation")

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-11-21 20:36:04 +00:00
Linus Torvalds
8dda9957e3 AFS development
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAXRyW8vu3V2unywtrAQIhsw//cVtxLx4ZCox5Z/93cdqych8RoCrwcUEG
 Cli0NAjlp/0HETvCsIqdkPKf+4OYCW1tHB2KTdbFdQLZptLgoEhykx89k70z9ggb
 ViieEa1GvAKhdamVqkPUC+3Q33uzyRaK7Gi5N3phJoaO+o328SlrPG0LerQgY0Np
 Rf3je56A1gIjEgWTmpStxiY262jlgaR3IuvpOqbu2G0TQVWV8CsBKw61fTdmEEQp
 dIkNO/xFXS+PvPdmQe5zCAjD/W2D+ggeBMbBwHF411qA60plGinubBYKZ98ikliZ
 OnQQPExI7mroIMzpYT+rzEQyxui2nz5t+Hj+d6t7iIvitNcX/Q53sVTq3RfQ0FjG
 QCd+j/l2p7fkXK4Sxgb/UBkj/pRr6W+FYSbQ/tmpD8UypEf5B3ln6GuA6yTMuNRF
 wVb744slKWq0c7KUuXmz806B2qJoyFG206jyFnoByvs6cPmB1+JqhBBYOKHcwjbo
 HIK+oUKkEfE6ofjQ3B9xOQ1anfbRnjjfJCmXvns9v57y/nRP2P78HUJNnEsOolk2
 nc3Ep41OgeZdwkts9KnSjmwy6VF3UZ2NQEiWXsUIOxGMtcodw9ci1bpquJ71oyut
 4sFMJvMU4eJD+XuCOlAgpbTaQ0Wuf11kFpl1Cof4fj0Z09C25Ahj6iKEKnumtO+4
 edfNLlwO6oo=
 =wgib
 -----END PGP SIGNATURE-----

Merge tag 'afs-next-20190628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull afs updates from David Howells:
 "A set of minor changes for AFS:

   - Remove an unnecessary check in afs_unlink()

   - Add a tracepoint for tracking callback management

   - Add a tracepoint for afs_server object usage

   - Use struct_size()

   - Add mappings for AFS UAE abort codes to Linux error codes, using
     symbolic names rather than hex numbers in the .c file"

* tag 'afs-next-20190628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Add support for the UAE error table
  fs/afs: use struct_size() in kzalloc()
  afs: Trace afs_server usage
  afs: Add some callback management tracepoints
  afs: afs_unlink() doesn't need to check dentry->d_inode
2019-07-10 20:55:33 -07:00
David Howells
4521819369 afs: Trace afs_server usage
Add a tracepoint (afs_server) to track the afs_server object usage count.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-20 18:12:17 +01:00
Thomas Gleixner
2874c5fd28 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:26:32 -07:00
David Howells
20b8391fff afs: Make some RPC operations non-interruptible
Make certain RPC operations non-interruptible, including:

 (*) Set attributes
 (*) Store data

     We don't want to get interrupted during a flush on close, flush on
     unlock, writeback or an inode update, leaving us in a state where we
     still need to do the writeback or update.

 (*) Extend lock
 (*) Release lock

     We don't want to get lock extension interrupted as the file locks on
     the server are time-limited.  Interruption during lock release is less
     of an issue since the lock is time-limited, but it's better to
     complete the release to avoid a several-minute wait to recover it.

     *Setting* the lock isn't a problem if it's interrupted since we can
      just return to the user and tell them they were interrupted - at
      which point they can elect to retry.

 (*) Silly unlink

     We want to remove silly unlink files if we can, rather than leaving
     them for the salvager to clear up.

Note that whilst these calls are no longer interruptible, they do have
timeouts on them, so if the server stops responding the call will fail with
something like ETIME or ECONNRESET.

Without this, the following:

	kAFS: Unexpected error from FS.StoreData -512

appears in dmesg when a pending store data gets interrupted and some
processes may just hang.

Additionally, make the code that checks/updates the server record ignore
failure due to interruption if the main call is uninterruptible and if the
server has an address list.  The next op will check it again since the
expiration time on the old list has past.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Jonathan Billings <jsbillings@jsbillings.org>
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 16:25:20 +01:00
David Howells
0ab4c95948 afs: Fix error propagation from server record check/update
afs_check/update_server_record() should be setting fc->error rather than
fc->ac.error as they're called from within the cursor iteration function.

afs_fs_cursor::error is where the error code of the attempt to call the
operation on multiple servers is integrated and is the final result,
whereas afs_addr_cursor::error is used to hold the error from individual
iterations of the call loop.  (Note there's also an afs_vl_cursor which
also wraps afs_addr_cursor for accessing VL servers rather than file
servers).

Fix this by setting fc->error in the afs_check/update_server_record() so
that any error incurred whilst talking to the VL server correctly
propagates to the final result.

This results in:

	kAFS: Unexpected error from FS.StoreData -512

being seen, even though the store-data op is non-interruptible.  The error
is actually coming from the server record update getting interrupted.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 16:25:20 +01:00
David Howells
eeba1e9cf3 afs: Fix in-progess ops to ignore server-level callback invalidation
The in-kernel afs filesystem client counts the number of server-level
callback invalidation events (CB.InitCallBackState* RPC operations) that it
receives from the server.  This is stored in cb_s_break in various
structures, including afs_server and afs_vnode.

If an inode is examined by afs_validate(), say, the afs_server copy is
compared, along with other break counters, to those in afs_vnode, and if
one or more of the counters do not match, it is considered that the
server's callback promise is broken.  At points where this happens,
AFS_VNODE_CB_PROMISED is cleared to indicate that the status must be
refetched from the server.

afs_validate() issues an FS.FetchStatus operation to get updated metadata -
and based on the updated data_version may invalidate the pagecache too.

However, the break counters are also used to determine whether to note a
new callback in the vnode (which would set the AFS_VNODE_CB_PROMISED flag)
and whether to cache the permit data included in the YFSFetchStatus record
by the server.


The problem comes when the server sends us a CB.InitCallBackState op.  The
first such instance doesn't cause cb_s_break to be incremented, but rather
causes AFS_SERVER_FL_NEW to be cleared - but thereafter, say some hours
after last use and all the volumes have been automatically unmounted and
the server has forgotten about the client[*], this *will* likely cause an
increment.

 [*] There are other circumstances too, such as the server restarting or
     needing to make space in its callback table.

Note that the server won't send us a CB.InitCallBackState op until we talk
to it again.

So what happens is:

 (1) A mount for a new volume is attempted, a inode is created for the root
     vnode and vnode->cb_s_break and AFS_VNODE_CB_PROMISED aren't set
     immediately, as we don't have a nominated server to talk to yet - and
     we may iterate through a few to find one.

 (2) Before the operation happens, afs_fetch_status(), say, notes in the
     cursor (fc.cb_break) the break counter sum from the vnode, volume and
     server counters, but the server->cb_s_break is currently 0.

 (3) We send FS.FetchStatus to the server.  The server sends us back
     CB.InitCallBackState.  We increment server->cb_s_break.

 (4) Our FS.FetchStatus completes.  The reply includes a callback record.

 (5) xdr_decode_AFSCallBack()/xdr_decode_YFSCallBack() check to see whether
     the callback promise was broken by checking the break counter sum from
     step (2) against the current sum.

     This fails because of step (3), so we don't set the callback record
     and, importantly, don't set AFS_VNODE_CB_PROMISED on the vnode.

This does not preclude the syscall from progressing, and we don't loop here
rechecking the status, but rather assume it's good enough for one round
only and will need to be rechecked next time.

 (6) afs_validate() it triggered on the vnode, probably called from
     d_revalidate() checking the parent directory.

 (7) afs_validate() notes that AFS_VNODE_CB_PROMISED isn't set, so doesn't
     update vnode->cb_s_break and assumes the vnode to be invalid.

 (8) afs_validate() needs to calls afs_fetch_status().  Go back to step (2)
     and repeat, every time the vnode is validated.

This primarily affects volume root dir vnodes.  Everything subsequent to
those inherit an already incremented cb_s_break upon mounting.


The issue is that we assume that the callback record and the cached permit
information in a reply from the server can't be trusted after getting a
server break - but this is wrong since the server makes sure things are
done in the right order, holding up our ops if necessary[*].

 [*] There is an extremely unlikely scenario where a reply from before the
     CB.InitCallBackState could get its delivery deferred till after - at
     which point we think we have a promise when we don't.  This, however,
     requires unlucky mass packet loss to one call.

AFS_SERVER_FL_NEW tries to paper over the cracks for the initial mount from
a server we've never contacted before, but this should be unnecessary.
It's also further insulated from the problem on an initial mount by
querying the server first with FS.GetCapabilities, which triggers the
CB.InitCallBackState.


Fix this by

 (1) Remove AFS_SERVER_FL_NEW.

 (2) In afs_calc_vnode_cb_break(), don't include cb_s_break in the
     calculation.

 (3) In afs_cb_is_broken(), don't include cb_s_break in the check.


Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-13 08:37:37 +01:00
David Howells
3bf0fb6f33 afs: Probe multiple fileservers simultaneously
Send probes to all the unprobed fileservers in a fileserver list on all
addresses simultaneously in an attempt to find out the fastest route whilst
not getting stuck for 20s on any server or address that we don't get a
reply from.

This alleviates the problem whereby attempting to access a new server can
take a long time because the rotation algorithm ends up rotating through
all servers and addresses until it finds one that responds.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:09 +01:00
David Howells
2feeaf8433 afs: Eliminate the address pointer from the address list cursor
Eliminate the address pointer from the address list cursor as it's
redundant (ac->addrs[ac->index] can be used to find the same address) and
address lists must be replaced rather than being rearranged, so is of
limited value.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:09 +01:00
David Howells
30062bd13e afs: Implement YFS support in the fs client
Implement support for talking to YFS-variant fileservers in the cache
manager and the filesystem client.  These implement upgraded services on
the same port as their AFS services.

YFS fileservers provide expanded capabilities over AFS.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells
f51375cd9e afs: Add a couple of tracepoints to log I/O errors
Add a couple of tracepoints to log the production of I/O errors within the AFS
filesystem.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:07 +01:00
David Howells
0a5143f2f8 afs: Implement VL server rotation
Track VL servers as independent entities rather than lumping all their
addresses together into one set and implement server-level rotation by:

 (1) Add the concept of a VL server list, where each server has its own
     separate address list.  This code is similar to the FS server list.

 (2) Use the DNS resolver to retrieve a set of servers and their associated
     addresses, ports, preference and weight ratings.

 (3) In the case of a legacy DNS resolver or an address list given directly
     through /proc/net/afs/cells, create a list containing just a dummy
     server record and attach all the addresses to that.

 (4) Implement a simple rotation policy, for the moment ignoring the
     priorities and weights assigned to the servers.

 (5) Show the address list through /proc/net/afs/<cell>/vlservers.  This
     also displays the source and status of the data as indicated by the
     upcall.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:07 +01:00
David Howells
f0a7d1883d afs: Fix clearance of reply
The recent patch to fix the afs_server struct leak didn't actually fix the
bug, but rather fixed some of the symptoms.  The problem is that an
asynchronous call that holds a resource pointed to by call->reply[0] will
find the pointer cleared in the call destructor, thereby preventing the
resource from being cleaned up.

In the case of the server record leak, the afs_fs_get_capabilities()
function in devel code sets up a call with reply[0] pointing at the server
record that should be altered when the result is obtained, but this was
being cleared before the destructor was called, so the put in the
destructor does nothing and the record is leaked.

Commit f014ffb025 removed the additional ref obtained by
afs_install_server(), but the removal of this ref is actually used by the
garbage collector to mark a server record as being defunct after the record
has expired through lack of use.

The offending clearance of call->reply[0] upon completion in
afs_process_async_call() has been there from the origin of the code, but
none of the asynchronous calls actually use that pointer currently, so it
should be safe to remove (note that synchronous calls don't involve this
function).

Fix this by the following means:

 (1) Revert commit f014ffb025.

 (2) Remove the clearance of reply[0] from afs_process_async_call().

Without this, afs_manage_servers() will suffer an assertion failure if it
sees a server record that didn't get used because the usage count is not 1.

Fixes: f014ffb025 ("afs: Fix afs_server struct leak")
Fixes: 08e0e7c82e ("[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC.")
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-15 15:31:47 +02:00
David Howells
f014ffb025 afs: Fix afs_server struct leak
Fix a leak of afs_server structs.  The routine that installs them in the
various lookup lists and trees gets a ref on leaving the function, whether
it added the server or a server already exists.  It shouldn't increment
the refcount if it added the server.

The effect of this that "rmmod kafs" will hang waiting for the leaked
server to become unused.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-12 17:36:40 +02:00
David Howells
47ea0f2ebf afs: Optimise callback breaking by not repeating volume lookup
At the moment, afs_break_callbacks calls afs_break_one_callback() for each
separate FID it was given, and the latter looks up the volume individually
for each one.

However, this is inefficient if two or more FIDs have the same vid as we
could reuse the volume.  This is complicated by cell aliasing whereby we
may have multiple cells sharing a volume and can therefore have multiple
callback interests for any particular volume ID.

At the moment afs_break_one_callback() scans the entire list of volumes
we're getting from a server and breaks the appropriate callback in every
matching volume, regardless of cell.  This scan is done for every FID.

Optimise callback breaking by the following means:

 (1) Sort the FID list by vid so that all FIDs belonging to the same volume
     are clumped together.

     This is done through the use of an indirection table as we cannot do
     an insertion sort on the afs_callback_break array as we decode FIDs
     into it as we subsequently also have to decode callback info into it
     that corresponds by array index only.

     We also don't really want to bubblesort afterwards if we can avoid it.

 (2) Sort the server->cb_interests array by vid so that all the matching
     volumes are grouped together.  This permits the scan to stop after
     finding a record that has a higher vid.

 (3) When breaking FIDs, we try to keep server->cb_break_lock as long as
     possible, caching the start point in the array for that volume group
     as long as possible.

     It might make sense to add another layer in that list and have a
     refcounted volume ID anchor that has the matching interests attached
     to it rather than being in the list.  This would allow the lock to be
     dropped without losing the cursor.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-06-15 15:27:09 +01:00
Marc Dionne
f9c1bba3d3 afs: Fix afs_find_server search loop
The code that looks up servers by addresses makes the assumption
that the list of addresses for a server is sorted.  It exits the
loop if it finds that the target address is larger than the
current candidate.  As the list is not currently sorted, this
can lead to a failure to find a matching server, which can cause
callbacks from that server to be ignored.

Remove the early exit case so that the complete list is searched.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:18 +01:00
David Howells
f2686b0926 afs: Fix giving up callbacks on server destruction
When a server record is destroyed, we want to send a message to the server
telling it that we're giving up all the callbacks it has promised us.

Apply two fixes to this:

 (1) Only send the FS.GiveUpAllCallBacks message if we actually got a
     callback from that server.  We assume this to be the case if we
     performed at least one successful FS operation on that server.

 (2) Send it to the address last used for that server rather than always
     picking the first address in the list (which might be unreachable).

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 13:17:35 +01:00
David Howells
660625922b afs: Fix server record deletion
AFS server records get removed from the net->fs_servers tree when
they're deleted, but not from the net->fs_addresses{4,6} lists, which
can lead to an oops in afs_find_server() when a server record has been
removed, for instance during rmmod.

Fix this by deleting the record from the by-address lists before posting
it for RCU destruction.

The reason this hasn't been noticed before is that the fileserver keeps
probing the local cache manager, thereby keeping the service record
alive, so the oops would only happen when a fileserver eventually gets
bored and stops pinging or if the module gets rmmod'd and a call comes
in from the fileserver during the window between the server records
being destroyed and the socket being closed.

The oops looks something like:

  BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
  ...
  Workqueue: kafsd afs_process_async_call [kafs]
  RIP: 0010:afs_find_server+0x271/0x36f [kafs]
  ...
  Call Trace:
   afs_deliver_cb_init_call_back_state3+0x1f2/0x21f [kafs]
   afs_deliver_to_call+0x1ee/0x5e8 [kafs]
   afs_process_async_call+0x5b/0xd0 [kafs]
   process_one_work+0x2c2/0x504
   worker_thread+0x1d4/0x2ac
   kthread+0x11f/0x127
   ret_from_fork+0x24/0x30

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-20 09:59:33 -07:00
David Howells
fe342cf77b afs: Fix checker warnings
Fix warnings raised by checker, including:

 (*) Warnings raised by unequal comparison for the purposes of sorting,
     where the endianness doesn't matter:

fs/afs/addr_list.c:246:21: warning: restricted __be16 degrades to integer
fs/afs/addr_list.c:246:30: warning: restricted __be16 degrades to integer
fs/afs/addr_list.c:248:21: warning: restricted __be32 degrades to integer
fs/afs/addr_list.c:248:49: warning: restricted __be32 degrades to integer
fs/afs/addr_list.c:283:21: warning: restricted __be16 degrades to integer
fs/afs/addr_list.c:283:30: warning: restricted __be16 degrades to integer

 (*) afs_set_cb_interest() is not actually used and can be removed.

 (*) afs_cell_gc_delay() should be provided with a sysctl.

 (*) afs_cell_destroy() needs to use rcu_access_pointer() to read
     cell->vl_addrs.

 (*) afs_init_fs_cursor() should be static.

 (*) struct afs_vnode::permit_cache needs to be marked __rcu.

 (*) afs_server_rcu() needs to use rcu_access_pointer().

 (*) afs_destroy_server() should use rcu_access_pointer() on
     server->addresses as the server object is no longer accessible.

 (*) afs_find_server() casts __be16/__be32 values to int in order to
     directly compare them for the purpose of finding a match in a list,
     but is should also annotate the cast with __force to avoid checker
     warnings.

 (*) afs_check_permit() accesses vnode->permit_cache outside of the RCU
     readlock, though it doesn't then access the value; the extraneous
     access is deleted.

False positives:

 (*) Conditional locking around the code in xdr_decode_AFSFetchStatus.  This
     can be dealt with in a separate patch.

fs/afs/fsclient.c:148:9: warning: context imbalance in 'xdr_decode_AFSFetchStatus' - different lock contexts for basic block

 (*) Incorrect handling of seq-retry lock context balance:

fs/afs/inode.c:455:38: warning: context imbalance in 'afs_getattr' - different
lock contexts for basic block
fs/afs/server.c:52:17: warning: context imbalance in 'afs_find_server' - different lock contexts for basic block
fs/afs/server.c:128:17: warning: context imbalance in 'afs_find_server_by_uuid' - different lock contexts for basic block

Errors:

 (*) afs_lookup_cell_rcu() needs to break out of the seq-retry loop, not go
     round again if it successfully found the workstation cell.

 (*) Fix UUID decode in afs_deliver_cb_probe_uuid().

 (*) afs_cache_permit() has a missing rcu_read_unlock() before one of the
     jumps to the someone_else_changed_it label.  Move the unlock to after
     the label.

 (*) afs_vl_get_addrs_u() is using ntohl() rather than htonl() when
     encoding to XDR.

 (*) afs_deliver_yfsvl_get_endpoints() is using htonl() rather than ntohl()
     when decoding from XDR.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:12:31 +01:00
Peter Zijlstra
ab1fbe3247 sched/wait, fs/afs: Convert wait_on_atomic_t() usage to the new wait_var_event() API
The old wait_on_atomic_t() is going to get removed, use the more
flexible wait_var_event() API instead.

No change in functionality.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-20 08:23:19 +01:00
David Howells
bf99a53ce2 afs: Make use of the YFS service upgrade to fully support IPv6
YFS VL servers offer an upgraded Volume Location service that can return
IPv6 addresses to fileservers and volume servers in addition to IPv4
addresses using the YFSVL.GetEndpoints operation which we should use if
it's available.

To this end:

 (1) Make rxrpc_kernel_recv_data() return the call's current service ID so
     that the caller can detect service upgrade and see what the service
     was upgraded to.

 (2) When we see a VL server address we haven't seen before, send a
     VL.GetCapabilities operation to it with the service upgrade bit set.

     If we get an upgrade to the YFS VL service, change the service ID in
     the address list for that address to use the upgraded service and set
     a flag to note that this appears to be a YFS-compatible server.

 (3) If, when a server's addresses are being looked up, we note that we
     previously detected a YFS-compatible server, then send the
     YFSVL.GetEndpoints operation rather than VL.GetAddrsU.

 (4) Build a fileserver address list from the reply of YFSVL.GetEndpoints,
     including both IPv4 and IPv6 addresses.  Volume server addresses are
     discarded.

 (5) The address list is sorted by address and port now, instead of just
     address.  This allows multiple servers on the same host sitting on
     different ports.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:19 +00:00