Commit Graph

245 Commits

Author SHA1 Message Date
David Howells
194d28cf19 afs: Retain more of the VLDB record for alias detection
Save more bits from the volume location database record obtained for a
server so that we can use this information in cell alias detection.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04 15:37:57 +01:00
David Howells
44746355cc afs: Don't get epoch from a server because it may be ambiguous
Don't get the epoch from a server, particularly one that we're looking up
by UUID, as UUIDs may be ambiguous and may map to more than one server - so
we can't draw any conclusions from it.

Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04 15:37:56 +01:00
David Howells
e49c7b2f6d afs: Build an abstraction around an "operation" concept
Turn the afs_operation struct into the main way that most fileserver
operations are managed.  Various things are added to the struct, including
the following:

 (1) All the parameters and results of the relevant operations are moved
     into it, removing corresponding fields from the afs_call struct.
     afs_call gets a pointer to the op.

 (2) The target volume is made the main focus of the operation, rather than
     the target vnode(s), and a bunch of op->vnode->volume are made
     op->volume instead.

 (3) Two vnode records are defined (op->file[]) for the vnode(s) involved
     in most operations.  The vnode record (struct afs_vnode_param)
     contains:

	- The vnode pointer.

	- The fid of the vnode to be included in the parameters or that was
          returned in the reply (eg. FS.MakeDir).

	- The status and callback information that may be returned in the
     	  reply about the vnode.

	- Callback break and data version tracking for detecting
          simultaneous third-parth changes.

 (4) Pointers to dentries to be updated with new inodes.

 (5) An operations table pointer.  The table includes pointers to functions
     for issuing AFS and YFS-variant RPCs, handling the success and abort
     of an operation and handling post-I/O-lock local editing of a
     directory.

To make this work, the following function restructuring is made:

 (A) The rotation loop that issues calls to fileservers that can be found
     in each function that wants to issue an RPC (such as afs_mkdir()) is
     extracted out into common code, in a new file called fs_operation.c.

 (B) The rotation loops, such as the one in afs_mkdir(), are replaced with
     a much smaller piece of code that allocates an operation, sets the
     parameters and then calls out to the common code to do the actual
     work.

 (C) The code for handling the success and failure of an operation are
     moved into operation functions (as (5) above) and these are called
     from the core code at appropriate times.

 (D) The pseudo inode getting stuff used by the dynamic root code is moved
     over into dynroot.c.

 (E) struct afs_iget_data is absorbed into the operation struct and
     afs_iget() expects to be given an op pointer and a vnode record.

 (F) Point (E) doesn't work for the root dir of a volume, but we know the
     FID in advance (it's always vnode 1, unique 1), so a separate inode
     getter, afs_root_iget(), is provided to special-case that.

 (G) The inode status init/update functions now also take an op and a vnode
     record.

 (H) The RPC marshalling functions now, for the most part, just take an
     afs_operation struct as their only argument.  All the data they need
     is held there.  The result delivery functions write their answers
     there as well.

 (I) The call is attached to the operation and then the operation core does
     the waiting.

And then the new operation code is, for the moment, made to just initialise
the operation, get the appropriate vnode I/O locks and do the same rotation
loop as before.

This lays the foundation for the following changes in the future:

 (*) Overhauling the rotation (again).

 (*) Support for asynchronous I/O, where the fileserver rotation must be
     done asynchronously also.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04 15:37:17 +01:00
David Howells
a310082f6d afs: Rename struct afs_fs_cursor to afs_operation
As a prelude to implementing asynchronous fileserver operations in the afs
filesystem, rename struct afs_fs_cursor to afs_operation.

This struct is going to form the core of the operation management and is
going to acquire more members in later.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31 15:19:52 +01:00
David Howells
7126ead910 afs: Remove the error argument from afs_protocol_error()
Remove the error argument from afs_protocol_error() as it's always
-EBADMSG.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31 15:19:52 +01:00
David Howells
38355eec6a afs: Set error flag rather than return error from file status decode
Set a flag in the call struct to indicate an unmarshalling error rather
than return and handle an error from the decoding of file statuses.  This
flag is checked on a successful return from the delivery function.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31 15:19:51 +01:00
David Howells
8230fd8217 afs: Make callback processing more efficient.
afs_vol_interest objects represent the volume IDs currently being accessed
from a fileserver.  These hold lists of afs_cb_interest objects that
repesent the superblocks using that volume ID on that server.

When a callback notification from the server telling of a modification by
another client arrives, the volume ID specified in the notification is
looked up in the server's afs_vol_interest list.  Through the
afs_cb_interest list, the relevant superblocks can be iterated over and the
specific inode looked up and marked in each one.

Make the following efficiency improvements:

 (1) Hold rcu_read_lock() over the entire processing rather than locking it
     each time.

 (2) Do all the callbacks for each vid together rather than individually.
     Each volume then only needs to be looked up once.

 (3) afs_vol_interest objects are now stored in an rb_tree rather than a
     flat list to reduce the lookup step count.

 (4) afs_vol_interest lookup is now done with RCU, but because it's in an
     rb_tree which may rotate under us, a seqlock is used so that if it
     changes during the walk, we repeat the walk with a lock held.

With this and the preceding patch which adds RCU-based lookups in the inode
cache, target volumes/vnodes can be taken without the need to take any
locks, except on the target itself.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31 15:19:51 +01:00
David Howells
f6cbb368bc afs: Actively poll fileservers to maintain NAT or firewall openings
When an AFS client accesses a file, it receives a limited-duration callback
promise that the server will notify it if another client changes a file.
This callback duration can be a few hours in length.

If a client mounts a volume and then an application prevents it from being
unmounted, say by chdir'ing into it, but then does nothing for some time,
the rxrpc_peer record will expire and rxrpc-level keepalive will cease.

If there is NAT or a firewall between the client and the server, the route
back for the server may close after a comparatively short duration, meaning
that attempts by the server to notify the client may then bounce.

The client, however, may (so far as it knows) still have a valid unexpired
promise and will then rely on its cached data and will not see changes made
on the server by a third party until it incidentally rechecks the status or
the promise needs renewal.

To deal with this, the client needs to regularly probe the server.  This
has two effects: firstly, it keeps a route open back for the server, and
secondly, it causes the server to disgorge any notifications that got
queued up because they couldn't be sent.

Fix this by adding a mechanism to emit regular probes.

Two levels of probing are made available: Under normal circumstances the
'slow' queue will be used for a fileserver - this just probes the preferred
address once every 5 mins or so; however, if server fails to respond to any
probes, the server will shift to the 'fast' queue from which all its
interfaces will be probed every 30s.  When it finally responds, the record
will switch back to the slow queue.

Further notes:

 (1) Probing is now no longer driven from the fileserver rotation
     algorithm.

 (2) Probes are dispatched to all interfaces on a fileserver when that an
     afs_server object is set up to record it.

 (3) The afs_server object is removed from the probe queues when we start
     to probe it.  afs_is_probing_server() returns true if it's not listed
     - ie. it's undergoing probing.

 (4) The afs_server object is added back on to the probe queue when the
     final outstanding probe completes, but the probed_at time is set when
     we're about to launch a probe so that it's not dependent on the probe
     duration.

 (5) The timer and the work item added for this must be handed a count on
     net->servers_outstanding, which they hand on or release.  This makes
     sure that network namespace cleanup waits for them.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Dave Botsch <botsch@cnf.cornell.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31 15:19:51 +01:00
David Howells
977e5f8ed0 afs: Split the usage count on struct afs_server
Split the usage count on the afs_server struct to have an active count that
registers who's actually using it separately from the reference count on
the object.

This allows a future patch to dispatch polling probes without advancing the
"unuse" time into the future each time we emit a probe, which would
otherwise prevent unused server records from expiring.

Included in this:

 (1) The latter part of afs_destroy_server() in which the RCU destruction
     of afs_server objects is invoked and the outstanding server count is
     decremented is split out into __afs_put_server().

 (2) afs_put_server() now calls __afs_put_server() rather then setting the
     management timer.

 (3) The calls begun by afs_fs_give_up_all_callbacks() and
     afs_fs_get_capabilities() can now take a ref on the server record, so
     afs_destroy_server() can just drop its ref and needn't wait for the
     completion of these calls.  They'll put the ref when they're done.

 (4) Because of (3), afs_fs_probe_done() no longer needs to wake up
     afs_destroy_server() with server->probe_outstanding.

 (5) afs_gc_servers can be simplified.  It only needs to check if
     server->active is 0 rather than playing games with the refcount.

 (6) afs_manage_servers() can propose a server for gc if usage == 0 rather
     than if ref == 1.  The gc is effected by (5).

Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31 15:19:51 +01:00
David Howells
8100680592 afs: Use the serverUnique field in the UVLDB record to reduce rpc ops
The U-version VLDB volume record retrieved by the VL.GetEntryByNameU rpc op
carries a change counter (the serverUnique field) for each fileserver
listed in the record as backing that volume.  This is incremented whenever
the registration details for a fileserver change (such as its address
list).  Note that the same value will be seen in all UVLDB records that
refer to that fileserver.

This should be checked before calling the VL server to re-query the address
list for a fileserver.  If it's the same, there's no point doing the query.

Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31 15:19:51 +01:00
David Howells
c4bfda16d1 afs: Make record checking use TASK_UNINTERRUPTIBLE when appropriate
When an operation is meant to be done uninterruptibly (such as
FS.StoreData), we should not be allowing volume and server record checking
to be interrupted.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-24 16:33:32 +01:00
David Howells
be59167c8f afs: Remove some unused bits
Remove three bits:

 (1) afs_server::no_epoch is neither set nor used.

 (2) afs_server::have_result is set and a wakeup is applied to it, but
     nothing looks at it or waits on it.

 (3) afs_vl_dump_edestaddrreq() prints afs_addr_list::probed, but nothing
     sets it for VL servers.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-04-24 16:32:49 +01:00
Linus Torvalds
1b649e0bca Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix deadlock in bpf_send_signal() from Yonghong Song.

 2) Fix off by one in kTLS offload of mlx5, from Tariq Toukan.

 3) Add missing locking in iwlwifi mvm code, from Avraham Stern.

 4) Fix MSG_WAITALL handling in rxrpc, from David Howells.

 5) Need to hold RTNL mutex in tcindex_partial_destroy_work(), from Cong
    Wang.

 6) Fix producer race condition in AF_PACKET, from Willem de Bruijn.

 7) cls_route removes the wrong filter during change operations, from
    Cong Wang.

 8) Reject unrecognized request flags in ethtool netlink code, from
    Michal Kubecek.

 9) Need to keep MAC in reset until PHY is up in bcmgenet driver, from
    Doug Berger.

10) Don't leak ct zone template in act_ct during replace, from Paul
    Blakey.

11) Fix flushing of offloaded netfilter flowtable flows, also from Paul
    Blakey.

12) Fix throughput drop during tx backpressure in cxgb4, from Rahul
    Lakkireddy.

13) Don't let a non-NULL skb->dev leave the TCP stack, from Eric
    Dumazet.

14) TCP_QUEUE_SEQ socket option has to update tp->copied_seq as well,
    also from Eric Dumazet.

15) Restrict macsec to ethernet devices, from Willem de Bruijn.

16) Fix reference leak in some ethtool *_SET handlers, from Michal
    Kubecek.

17) Fix accidental disabling of MSI for some r8169 chips, from Heiner
    Kallweit.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (138 commits)
  net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build
  net: ena: Add PCI shutdown handler to allow safe kexec
  selftests/net/forwarding: define libs as TEST_PROGS_EXTENDED
  selftests/net: add missing tests to Makefile
  r8169: re-enable MSI on RTL8168c
  net: phy: mdio-bcm-unimac: Fix clock handling
  cxgb4/ptp: pass the sign of offset delta in FW CMD
  net: dsa: tag_8021q: replace dsa_8021q_remove_header with __skb_vlan_pop
  net: cbs: Fix software cbs to consider packet sending time
  net/mlx5e: Do not recover from a non-fatal syndrome
  net/mlx5e: Fix ICOSQ recovery flow with Striding RQ
  net/mlx5e: Fix missing reset of SW metadata in Striding RQ reset
  net/mlx5e: Enhance ICOSQ WQE info fields
  net/mlx5_core: Set IB capability mask1 to fix ib_srpt connection failure
  selftests: netfilter: add nfqueue test case
  netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress
  netfilter: nft_fwd_netdev: validate family and chain type
  netfilter: nft_set_rbtree: Detect partial overlaps on insertion
  netfilter: nft_set_rbtree: Introduce and use nft_rbtree_interval_start()
  netfilter: nft_set_pipapo: Separate partial and complete overlap cases on insertion
  ...
2020-03-25 13:58:05 -07:00
David Howells
dde9f09558 afs: Fix handling of an abort from a service handler
When an AFS service handler function aborts a call, AF_RXRPC marks the call
as complete - which means that it's not going to get any more packets from
the receiver.  This is a problem because reception of the final ACK is what
triggers afs_deliver_to_call() to drop the final ref on the afs_call
object.

Instead, aborted AFS service calls may then just sit around waiting for
ever or until they're displaced by a new call on the same connection
channel or a connection-level abort.

Fix this by calling afs_set_call_complete() to finalise the afs_call struct
representing the call.

However, we then need to drop the ref that stops the call from being
deallocated.  We can do this in afs_set_call_complete(), as the work queue
is holding a separate ref of its own, but then we shouldn't do it in
afs_process_async_call() and afs_delete_async_call().

call->drop_ref is set to indicate that a ref needs dropping for a call and
this is dealt with when we transition a call to AFS_CALL_COMPLETE.

But then we also need to get rid of the ref that pins an asynchronous
client call.  We can do this by the same mechanism, setting call->drop_ref
for an async client call too.

We can also get rid of call->incoming since nothing ever sets it and only
one thing ever checks it (futilely).


A trace of the rxrpc_call and afs_call struct ref counting looks like:

          <idle>-0     [001] ..s5   164.764892: rxrpc_call: c=00000002 SEE u=3 sp=rxrpc_new_incoming_call+0x473/0xb34 a=00000000442095b5
          <idle>-0     [001] .Ns5   164.766001: rxrpc_call: c=00000002 QUE u=4 sp=rxrpc_propose_ACK+0xbe/0x551 a=00000000442095b5
          <idle>-0     [001] .Ns4   164.766005: rxrpc_call: c=00000002 PUT u=3 sp=rxrpc_new_incoming_call+0xa3f/0xb34 a=00000000442095b5
          <idle>-0     [001] .Ns7   164.766433: afs_call: c=00000002 WAKE  u=2 o=11 sp=rxrpc_notify_socket+0x196/0x33c
     kworker/1:2-1810  [001] ...1   164.768409: rxrpc_call: c=00000002 SEE u=3 sp=rxrpc_process_call+0x25/0x7ae a=00000000442095b5
     kworker/1:2-1810  [001] ...1   164.769439: rxrpc_tx_packet: c=00000002 e9f1a7a8:95786a88:00000008:09c5 00000001 00000000 02 22 ACK CallAck
     kworker/1:2-1810  [001] ...1   164.769459: rxrpc_call: c=00000002 PUT u=2 sp=rxrpc_process_call+0x74f/0x7ae a=00000000442095b5
     kworker/1:2-1810  [001] ...1   164.770794: afs_call: c=00000002 QUEUE u=3 o=12 sp=afs_deliver_to_call+0x449/0x72c
     kworker/1:2-1810  [001] ...1   164.770829: afs_call: c=00000002 PUT   u=2 o=12 sp=afs_process_async_call+0xdb/0x11e
     kworker/1:2-1810  [001] ...2   164.771084: rxrpc_abort: c=00000002 95786a88:00000008 s=0 a=1 e=1 K-1
     kworker/1:2-1810  [001] ...1   164.771461: rxrpc_tx_packet: c=00000002 e9f1a7a8:95786a88:00000008:09c5 00000002 00000000 04 00 ABORT CallAbort
     kworker/1:2-1810  [001] ...1   164.771466: afs_call: c=00000002 PUT   u=1 o=12 sp=SRXAFSCB_ProbeUuid+0xc1/0x106

The abort generated in SRXAFSCB_ProbeUuid(), labelled "K-1", indicates that
the local filesystem/cache manager didn't recognise the UUID as its own.

Fixes: 2067b2b3f4 ("afs: Fix the CB.ProbeUuid service handler to reply correctly")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-03-13 23:04:35 +00:00
Jann Horn
ddd2b85ff7 afs: Use kfree_rcu() instead of casting kfree() to rcu_callback_t
afs_put_addrlist() casts kfree() to rcu_callback_t. Apart from being wrong
in theory, this might also blow up when people start enforcing function
types via compiler instrumentation, and it means the rcu_head has to be
first in struct afs_addr_list.

Use kfree_rcu() instead, it's simpler and more correct.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-13 10:47:33 -07:00
David Howells
d4438a2529 afs: Introduce an afs_get_read() refcount helper
Introduce an afs_get_read() helper to get a reference on an afs_read
object.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-11-21 09:12:17 +00:00
David Howells
fc27612249 afs: Switch the naming of call->iter and call->_iter
Change the name of call->iter to call->def_iter to represent the default
iterator.

Change the name of call->_iter to call->iter to represent the iterator
actually being used.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-11-21 09:12:17 +00:00
Linus Torvalds
1eb80d6ffb Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 "A couple of misc patches"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  afs dynroot: switch to simple_dir_operations
  fs/handle.c - fix up kerneldoc
2019-09-29 19:42:07 -07:00
Al Viro
473ef57ad8 afs dynroot: switch to simple_dir_operations
no point reinventing it (with wrong ->read(), BTW).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-09-15 12:19:48 -04:00
David Howells
8b6a666a97 afs: Provide an RCU-capable key lookup
Provide an RCU-capable key lookup function.  We don't want to call
afs_request_key() in RCU-mode pathwalk as request_key() might sleep, even if
we don't ask it to construct anything as it might find a key that is currently
undergoing construction.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-09-02 11:43:54 +01:00
Linus Torvalds
237f83dfbe Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Some highlights from this development cycle:

   1) Big refactoring of ipv6 route and neigh handling to support
      nexthop objects configurable as units from userspace. From David
      Ahern.

   2) Convert explored_states in BPF verifier into a hash table,
      significantly decreased state held for programs with bpf2bpf
      calls, from Alexei Starovoitov.

   3) Implement bpf_send_signal() helper, from Yonghong Song.

   4) Various classifier enhancements to mvpp2 driver, from Maxime
      Chevallier.

   5) Add aRFS support to hns3 driver, from Jian Shen.

   6) Fix use after free in inet frags by allocating fqdirs dynamically
      and reworking how rhashtable dismantle occurs, from Eric Dumazet.

   7) Add act_ctinfo packet classifier action, from Kevin
      Darbyshire-Bryant.

   8) Add TFO key backup infrastructure, from Jason Baron.

   9) Remove several old and unused ISDN drivers, from Arnd Bergmann.

  10) Add devlink notifications for flash update status to mlxsw driver,
      from Jiri Pirko.

  11) Lots of kTLS offload infrastructure fixes, from Jakub Kicinski.

  12) Add support for mv88e6250 DSA chips, from Rasmus Villemoes.

  13) Various enhancements to ipv6 flow label handling, from Eric
      Dumazet and Willem de Bruijn.

  14) Support TLS offload in nfp driver, from Jakub Kicinski, Dirk van
      der Merwe, and others.

  15) Various improvements to axienet driver including converting it to
      phylink, from Robert Hancock.

  16) Add PTP support to sja1105 DSA driver, from Vladimir Oltean.

  17) Add mqprio qdisc offload support to dpaa2-eth, from Ioana
      Radulescu.

  18) Add devlink health reporting to mlx5, from Moshe Shemesh.

  19) Convert stmmac over to phylink, from Jose Abreu.

  20) Add PTP PHC (Physical Hardware Clock) support to mlxsw, from
      Shalom Toledo.

  21) Add nftables SYNPROXY support, from Fernando Fernandez Mancera.

  22) Convert tcp_fastopen over to use SipHash, from Ard Biesheuvel.

  23) Track spill/fill of constants in BPF verifier, from Alexei
      Starovoitov.

  24) Support bounded loops in BPF, from Alexei Starovoitov.

  25) Various page_pool API fixes and improvements, from Jesper Dangaard
      Brouer.

  26) Just like ipv4, support ref-countless ipv6 route handling. From
      Wei Wang.

  27) Support VLAN offloading in aquantia driver, from Igor Russkikh.

  28) Add AF_XDP zero-copy support to mlx5, from Maxim Mikityanskiy.

  29) Add flower GRE encap/decap support to nfp driver, from Pieter
      Jansen van Vuuren.

  30) Protect against stack overflow when using act_mirred, from John
      Hurley.

  31) Allow devmap map lookups from eBPF, from Toke Høiland-Jørgensen.

  32) Use page_pool API in netsec driver, Ilias Apalodimas.

  33) Add Google gve network driver, from Catherine Sullivan.

  34) More indirect call avoidance, from Paolo Abeni.

  35) Add kTLS TX HW offload support to mlx5, from Tariq Toukan.

  36) Add XDP_REDIRECT support to bnxt_en, from Andy Gospodarek.

  37) Add MPLS manipulation actions to TC, from John Hurley.

  38) Add sending a packet to connection tracking from TC actions, and
      then allow flower classifier matching on conntrack state. From
      Paul Blakey.

  39) Netfilter hw offload support, from Pablo Neira Ayuso"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2080 commits)
  net/mlx5e: Return in default case statement in tx_post_resync_params
  mlx5: Return -EINVAL when WARN_ON_ONCE triggers in mlx5e_tls_resync().
  net: dsa: add support for BRIDGE_MROUTER attribute
  pkt_sched: Include const.h
  net: netsec: remove static declaration for netsec_set_tx_de()
  net: netsec: remove superfluous if statement
  netfilter: nf_tables: add hardware offload support
  net: flow_offload: rename tc_cls_flower_offload to flow_cls_offload
  net: flow_offload: add flow_block_cb_is_busy() and use it
  net: sched: remove tcf block API
  drivers: net: use flow block API
  net: sched: use flow block API
  net: flow_offload: add flow_block_cb_{priv, incref, decref}()
  net: flow_offload: add list handling functions
  net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free()
  net: flow_offload: rename TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_*
  net: flow_offload: rename TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND
  net: flow_offload: add flow_block_cb_setup_simple()
  net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC
  net: hisilicon: Add an rx_desc to adapt HI13X1_GMAC
  ...
2019-07-11 10:55:49 -07:00
Linus Torvalds
8dda9957e3 AFS development
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAXRyW8vu3V2unywtrAQIhsw//cVtxLx4ZCox5Z/93cdqych8RoCrwcUEG
 Cli0NAjlp/0HETvCsIqdkPKf+4OYCW1tHB2KTdbFdQLZptLgoEhykx89k70z9ggb
 ViieEa1GvAKhdamVqkPUC+3Q33uzyRaK7Gi5N3phJoaO+o328SlrPG0LerQgY0Np
 Rf3je56A1gIjEgWTmpStxiY262jlgaR3IuvpOqbu2G0TQVWV8CsBKw61fTdmEEQp
 dIkNO/xFXS+PvPdmQe5zCAjD/W2D+ggeBMbBwHF411qA60plGinubBYKZ98ikliZ
 OnQQPExI7mroIMzpYT+rzEQyxui2nz5t+Hj+d6t7iIvitNcX/Q53sVTq3RfQ0FjG
 QCd+j/l2p7fkXK4Sxgb/UBkj/pRr6W+FYSbQ/tmpD8UypEf5B3ln6GuA6yTMuNRF
 wVb744slKWq0c7KUuXmz806B2qJoyFG206jyFnoByvs6cPmB1+JqhBBYOKHcwjbo
 HIK+oUKkEfE6ofjQ3B9xOQ1anfbRnjjfJCmXvns9v57y/nRP2P78HUJNnEsOolk2
 nc3Ep41OgeZdwkts9KnSjmwy6VF3UZ2NQEiWXsUIOxGMtcodw9ci1bpquJ71oyut
 4sFMJvMU4eJD+XuCOlAgpbTaQ0Wuf11kFpl1Cof4fj0Z09C25Ahj6iKEKnumtO+4
 edfNLlwO6oo=
 =wgib
 -----END PGP SIGNATURE-----

Merge tag 'afs-next-20190628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull afs updates from David Howells:
 "A set of minor changes for AFS:

   - Remove an unnecessary check in afs_unlink()

   - Add a tracepoint for tracking callback management

   - Add a tracepoint for afs_server object usage

   - Use struct_size()

   - Add mappings for AFS UAE abort codes to Linux error codes, using
     symbolic names rather than hex numbers in the .c file"

* tag 'afs-next-20190628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Add support for the UAE error table
  fs/afs: use struct_size() in kzalloc()
  afs: Trace afs_server usage
  afs: Add some callback management tracepoints
  afs: afs_unlink() doesn't need to check dentry->d_inode
2019-07-10 20:55:33 -07:00
David S. Miller
d96ff269a0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The new route handling in ip_mc_finish_output() from 'net' overlapped
with the new support for returning congestion notifications from BPF
programs.

In order to handle this I had to take the dev_loopback_xmit() calls
out of the switch statement.

The aquantia driver conflicts were simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 21:06:39 -07:00
Linus Torvalds
cd0f3aaebc AFS fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAXRMn5vu3V2unywtrAQICpA/+IIINk6MJVQDzGhOnvWrbGdPnOdJEUyLN
 B9U4bLZJRg/j+Sqodn+fXIfsEO4FQflkSJD+xoBi4pzBZcr0xkLUVOog/1S7dv4J
 bPVT9p2f3ITNiatmisOrUe1InuHa6Wb/cUnQaLLRhd7NqbawKGRQG4tv4CGwKn67
 dJIOOm/iTCs1ACES4C5QOpU7/DWK38Pn3BbnN21bFzDgfbtbdDTaFFkhFtXy78oB
 Gcj5g+ULpkKBcuJThFuJUPZ9E4qICNZR4kJXEULSvykDDRzluhJmQ+v8btm6NJsq
 hMqTrT9M2y114V1OqXj3me7tA6wOEAfTQ0WzpzF2SmyFQKnSly/EkWc4HZXFD/8O
 BczCcABUbuKNE/pJSELx6k1M0+00QfeLcjHPc6joZFCni3lMdYWOncn/syyHw5P+
 rc9JQsy3+dLcFsaVQ5eGmX6NDc70dCrAlS6MllIzSBcwAVCctTKwm0meaSW6B2y6
 VymPy+cqi1RxMKyiQ0hAeU7Xe6yqFcl6rtonfCQqRLxkfzrCXkDp6/ELOXBzDft1
 ey6+N3WsmWW7YSPuM/SIZKV66rshlflj0w+FRluZEEAF1NYeYqXUDvK/S8KC9kPG
 AXUDvhI+tBpxg1AVz94JN714VmkbY23xV0g44eQsdqSQm2YvsxiFCSWZZ6L/KEWe
 kWQc6BGDCB0=
 =YTdG
 -----END PGP SIGNATURE-----

Merge tag 'afs-fixes-20190620' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull AFS fixes from David Howells:
 "The in-kernel AFS client has been undergoing testing on opendev.org on
  one of their mirror machines. They are using AFS to hold data that is
  then served via apache, and Ian Wienand had reported seeing oopses,
  spontaneous machine reboots and updates to volumes going missing. This
  patch series appears to have fixed the problem, very probably due to
  patch (2), but it's not 100% certain.

  (1) Fix the printing of the "vnode modified" warning to exclude checks
      on files for which we don't have a callback promise from the
      server (and so don't expect the server to tell us when it
      changes).

      Without this, for every file or directory for which we still have
      an in-core inode that gets changed on the server, we may get a
      message logged when we next look at it. This can happen in bulk
      if, for instance, someone does "vos release" to update a R/O
      volume from a R/W volume and a whole set of files are all changed
      together.

      We only really want to log a message if the file changed and the
      server didn't tell us about it or we failed to track the state
      internally.

  (2) Fix accidental corruption of either afs_vlserver struct objects or
      the the following memory locations (which could hold anything).
      The issue is caused by a union that points to two different
      structs in struct afs_call (to save space in the struct). The call
      cleanup code assumes that it can simply call the cleanup for one
      of those structs if not NULL - when it might be actually pointing
      to the other struct.

      This means that every Volume Location RPC op is going to corrupt
      something.

  (3) Fix an uninitialised spinlock. This isn't too bad, it just causes
      a one-off warning if lockdep is enabled when "vos release" is
      called, but the spinlock still behaves correctly.

  (4) Fix the setting of i_block in the inode. This causes du, for
      example, to produce incorrect results, but otherwise should not be
      dangerous to the kernel"

* tag 'afs-fixes-20190620' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Fix setting of i_blocks
  afs: Fix uninitialised spinlock afs_volume::cb_break_lock
  afs: Fix vlserver record corruption
  afs: Fix over zealous "vnode modified" warnings
2019-06-28 08:34:12 +08:00
David Howells
4521819369 afs: Trace afs_server usage
Add a tracepoint (afs_server) to track the afs_server object usage count.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-20 18:12:17 +01:00
David Howells
051d25250b afs: Add some callback management tracepoints
Add a couple of tracepoints to track callback management:

 (1) afs_cb_miss - Logs when we were unable to apply a callback, either due
     to the inode being discarded or due to a competing thread applying a
     callback first.

 (2) afs_cb_break - Logs when we attempted to clear the noted callback
     promise, either due to the server explicitly breaking the callback,
     the callback promise lapsing or a local event obsoleting it.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-20 18:12:16 +01:00
David Howells
90fa9b6452 afs: Fix uninitialised spinlock afs_volume::cb_break_lock
Fix the cb_break_lock spinlock in afs_volume struct by initialising it when
the volume record is allocated.

Also rename the lock to cb_v_break_lock to distinguish it from the lock of
the same name in the afs_server struct.

Without this, the following trace may be observed when a volume-break
callback is received:

  INFO: trying to register non-static key.
  the code is fine but needs lockdep annotation.
  turning off the locking correctness validator.
  CPU: 2 PID: 50 Comm: kworker/2:1 Not tainted 5.2.0-rc1-fscache+ #3045
  Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
  Workqueue: afs SRXAFSCB_CallBack
  Call Trace:
   dump_stack+0x67/0x8e
   register_lock_class+0x23b/0x421
   ? check_usage_forwards+0x13c/0x13c
   __lock_acquire+0x89/0xf73
   lock_acquire+0x13b/0x166
   ? afs_break_callbacks+0x1b2/0x3dd
   _raw_write_lock+0x2c/0x36
   ? afs_break_callbacks+0x1b2/0x3dd
   afs_break_callbacks+0x1b2/0x3dd
   ? trace_event_raw_event_afs_server+0x61/0xac
   SRXAFSCB_CallBack+0x11f/0x16c
   process_one_work+0x2c5/0x4ee
   ? worker_thread+0x234/0x2ac
   worker_thread+0x1d8/0x2ac
   ? cancel_delayed_work_sync+0xf/0xf
   kthread+0x11f/0x127
   ? kthread_park+0x76/0x76
   ret_from_fork+0x24/0x30

Fixes: 68251f0a68 ("afs: Fix whole-volume callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-20 16:49:35 +01:00
David Howells
a6853b9ce8 afs: Fix vlserver record corruption
Because I made the afs_call struct share pointers to an afs_server object
and an afs_vlserver object to save space, afs_put_call() calls
afs_put_server() on afs_vlserver object (which is only meant for the
afs_server object) because it sees that call->server isn't NULL.

This means that the afs_vlserver object gets unpredictably and randomly
modified, depending on what config options are set (such as lockdep).

Fix this by getting rid of the union and having two non-overlapping
pointers in the afs_call struct.

Fixes: ffba718e93 ("afs: Get rid of afs_call::reply[]")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-20 16:49:35 +01:00
David S. Miller
a6cdeeb16b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Some ISDN files that got removed in net-next had some changes
done in mainline, take the removals.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-07 11:00:14 -07:00
Florian Westphal
35ebfc22fe afs: do not send list of client addresses
David Howells says:
  I'm told that there's not really any point populating the list.
  Current OpenAFS ignores it, as does AuriStor - and IBM AFS 3.6 will
  do the right thing.
  The list is actually useless as it's the client's view of the world,
  not the servers, so if there's any NAT in the way its contents are
  invalid.  Further, it doesn't support IPv6 addresses.

  On that basis, feel free to make it an empty list and remove all the
  interface enumeration.

V1 of this patch reworked the function to use a new helper for the
ifa_list iteration to avoid sparse warnings once the proper __rcu
annotations get added in struct in_device later.

But, in light of the above, just remove afs_get_ipv4_interfaces.

Compile tested only.

Cc: David Howells <dhowells@redhat.com>
Cc: linux-afs@lists.infradead.org
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-02 18:06:26 -07:00
Thomas Gleixner
2874c5fd28 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:26:32 -07:00
David Howells
b835915325 afs: Pass pre-fetch server and volume break counts into afs_iget5_set()
Pass the server and volume break counts from before the status fetch
operation that queried the attributes of a file into afs_iget5_set() so
that the new vnode's break counters can be initialised appropriately.

This allows detection of a volume or server break that happened whilst we
were fetching the status or setting up the vnode.

Fixes: c435ee3455 ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 22:23:21 +01:00
David Howells
a38a75581e afs: Fix unlink to handle YFS.RemoveFile2 better
Make use of the status update for the target file that the YFS.RemoveFile2
RPC op returns to correctly update the vnode as to whether the file was
actually deleted or just had nlink reduced.

Fixes: 30062bd13e ("afs: Implement YFS support in the fs client")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 22:23:21 +01:00
David Howells
f642404a04 afs: Make vnode->cb_interest RCU safe
Use RCU-based freeing for afs_cb_interest struct objects and use RCU on
vnode->cb_interest.  Use that change to allow afs_check_validity() to use
read_seqbegin_or_lock() instead of read_seqlock_excl().

This also requires the caller of afs_check_validity() to hold the RCU read
lock across the call.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 22:23:21 +01:00
David Howells
c925bd0ac4 afs: Split afs_validate() so first part can be used under LOOKUP_RCU
Split afs_validate() so that the part that decides if the vnode is still
valid can be used under LOOKUP_RCU conditions from afs_d_revalidate().

Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 22:23:21 +01:00
David Howells
7c71245866 afs: Don't save callback version and type fields
Don't save callback version and type fields as the version is about the
format of the callback information and the type is relative to the
particular RPC call.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 22:23:21 +01:00
David Howells
a58823ac45 afs: Fix application of status and callback to be under same lock
When applying the status and callback in the response of an operation,
apply them in the same critical section so that there's no race between
checking the callback state and checking status-dependent state (such as
the data version).

Fix this by:

 (1) Allocating a joint {status,callback} record (afs_status_cb) before
     calling the RPC function for each vnode for which the RPC reply
     contains a status or a status plus a callback.  A flag is set in the
     record to indicate if a callback was actually received.

 (2) These records are passed into the RPC functions to be filled in.  The
     afs_decode_status() and yfs_decode_status() functions are removed and
     the cb_lock is no longer taken.

 (3) xdr_decode_AFSFetchStatus() and xdr_decode_YFSFetchStatus() no longer
     update the vnode.

 (4) xdr_decode_AFSCallBack() and xdr_decode_YFSCallBack() no longer update
     the vnode.

 (5) vnodes, expected data-version numbers and callback break counters
     (cb_break) no longer need to be passed to the reply delivery
     functions.

     Note that, for the moment, the file locking functions still need
     access to both the call and the vnode at the same time.

 (6) afs_vnode_commit_status() is now given the cb_break value and the
     expected data_version and the task of applying the status and the
     callback to the vnode are now done here.

     This is done under a single taking of vnode->cb_lock.

 (7) afs_pages_written_back() is now called by afs_store_data() rather than
     by the reply delivery function.

     afs_pages_written_back() has been moved to before the call point and
     is now given the first and last page numbers rather than a pointer to
     the call.

 (8) The indicator from YFS.RemoveFile2 as to whether the target file
     actually got removed (status.abort_code == VNOVNODE) rather than
     merely dropping a link is now checked in afs_unlink rather than in
     xdr_decode_YFSFetchStatus().

Supplementary fixes:

 (*) afs_cache_permit() now gets the caller_access mask from the
     afs_status_cb object rather than picking it out of the vnode's status
     record.  afs_fetch_status() returns caller_access through its argument
     list for this purpose also.

 (*) afs_inode_init_from_status() now uses a write lock on cb_lock rather
     than a read lock and now sets the callback inside the same critical
     section.

Fixes: c435ee3455 ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 16:25:21 +01:00
David Howells
4571577f16 afs: Always get the reply time
Always ask for the reply time from AF_RXRPC as it's used to calculate the
callback expiry time and lock expiry times, so it's needed by most FS
operations.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 16:25:21 +01:00
David Howells
87182759cd afs: Fix order-1 allocation in afs_do_lookup()
afs_do_lookup() will do an order-1 allocation to allocate status records if
there are more than 39 vnodes to stat.

Fix this by allocating an array of {status,callback} records for each vnode
we want to examine using vmalloc() if larger than a page.

This not only gets rid of the order-1 allocation, but makes it easier to
grow beyond 50 records for YFS servers.  It also allows us to move to
{status,callback} tuples for other calls too and makes it easier to lock
across the application of the status and the callback to the vnode.

Fixes: 5cf9dd55a0 ("afs: Prospectively look up extra files when doing a single lookup")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 16:25:21 +01:00
David Howells
ffba718e93 afs: Get rid of afs_call::reply[]
Replace the afs_call::reply[] array with a bunch of typed members so that
the compiler can use type-checking on them.  It's also easier for the eye
to see what's going on.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 16:25:21 +01:00
David Howells
20b8391fff afs: Make some RPC operations non-interruptible
Make certain RPC operations non-interruptible, including:

 (*) Set attributes
 (*) Store data

     We don't want to get interrupted during a flush on close, flush on
     unlock, writeback or an inode update, leaving us in a state where we
     still need to do the writeback or update.

 (*) Extend lock
 (*) Release lock

     We don't want to get lock extension interrupted as the file locks on
     the server are time-limited.  Interruption during lock release is less
     of an issue since the lock is time-limited, but it's better to
     complete the release to avoid a several-minute wait to recover it.

     *Setting* the lock isn't a problem if it's interrupted since we can
      just return to the user and tell them they were interrupted - at
      which point they can elect to retry.

 (*) Silly unlink

     We want to remove silly unlink files if we can, rather than leaving
     them for the salvager to clear up.

Note that whilst these calls are no longer interruptible, they do have
timeouts on them, so if the server stops responding the call will fail with
something like ETIME or ECONNRESET.

Without this, the following:

	kAFS: Unexpected error from FS.StoreData -512

appears in dmesg when a pending store data gets interrupted and some
processes may just hang.

Additionally, make the code that checks/updates the server record ignore
failure due to interruption if the main call is uninterruptible and if the
server has an address list.  The next op will check it again since the
expiration time on the old list has past.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Jonathan Billings <jsbillings@jsbillings.org>
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 16:25:20 +01:00
David Howells
94f699c9cd afs: Fix the maximum lifespan of VL and probe calls
If an older AFS server doesn't support an operation, it may accept the call
and then sit on it forever, happily responding to pings that make kafs
think that the call is still alive.

Fix this by setting the maximum lifespan of Volume Location service calls
in particular and probe calls in general so that they don't run on
endlessly if they're not supported.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 16:25:20 +01:00
David Howells
d5c32c89b2 afs: Fix cell DNS lookup
Currently, once configured, AFS cells are looked up in the DNS at regular
intervals - which is a waste of resources if those cells aren't being
used.  It also leads to a problem where cells preloaded, but not
configured, before the network is brought up end up effectively statically
configured with no VL servers and are unable to get any.

Fix this by not doing the DNS lookup until the first time a cell is
touched.  It is waited for if we don't have any cached records yet,
otherwise the DNS lookup to maintain the record is done in the background.

This has the downside that the first time you touch a cell, you now have to
wait for the upcall to do the required DNS lookups rather than them already
being cached.

Further, the record is not replaced if the old record has at least one
server in it and the new record doesn't have any.

Fixes: 0a5143f2f8 ("afs: Implement VL server rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 12:58:23 +01:00
David Howells
773e0c4025 afs: Fix afs_xattr_get_yfs() to not try freeing an error value
afs_xattr_get_yfs() tries to free yacl, which may hold an error value (say
if yfs_fs_fetch_opaque_acl() failed and returned an error).

Fix this by allocating yacl up front (since it's a fixed-length struct,
unlike afs_acl) and passing it in to the RPC function.  This also allows
the flags to be placed in the object rather than passing them through to
the RPC function.

Fixes: ae46578b96 ("afs: Get YFS ACLs and information through xattrs")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-15 17:35:53 +01:00
David Howells
f5e4546347 afs: Implement YFS ACL setting
Implement the setting of YFS ACLs in AFS through the interface of setting
the afs.yfs.acl extended attribute on the file.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-07 16:48:44 +01:00
David Howells
ae46578b96 afs: Get YFS ACLs and information through xattrs
The YFS/AuriStor variant of AFS provides more capable ACLs and provides
per-volume ACLs and per-file ACLs as well as per-directory ACLs.  It also
provides some extra information that can be retrieved through four ACLs:

 (1) afs.yfs.acl

     The YFS file ACL (not the same format as afs.acl).

 (2) afs.yfs.vol_acl

     The YFS volume ACL.

 (3) afs.yfs.acl_inherited

     "1" if a file's ACL is inherited from its parent directory, "0"
     otherwise.

 (4) afs.yfs.acl_num_cleaned

     The number of of ACEs removed from the ACL by the server because the
     PT entries were removed from the PTS database (ie. the subject is no
     longer known).

Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-07 16:48:44 +01:00
Joe Gorse
b10494af49 afs: implement acl setting
Implements the setting of ACLs in AFS by means of setting the
afs.acl extended attribute on the file.

Signed-off-by: Joe Gorse <jhgorse@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-07 16:48:44 +01:00
David Howells
260f082bae afs: Get an AFS3 ACL as an xattr
Implement an xattr on AFS files called "afs.acl" that retrieves a file's
ACL.  It returns the raw AFS3 ACL from the result of calling FS.FetchACL,
leaving any interpretation to userspace.

Note that whilst YFS servers will respond to FS.FetchACL, this will render
a more-advanced YFS ACL down.  Use "afs.yfs.acl" instead for that.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-07 16:48:44 +01:00
David Howells
b134d687dd afs: Log more information for "kAFS: AFS vnode with undefined type\n"
Log more information when "kAFS: AFS vnode with undefined type\n" is
displayed due to a vnode record being retrieved from the server that
appears to have a duff file type (usually 0).  This prints more information
to try and help pin down the problem.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-07 16:48:44 +01:00
David Howells
6c6c1d63c2 afs: Provide mount-time configurable byte-range file locking emulation
Provide byte-range file locking emulation that can be configured at mount
time to one of four modes:

 (1) flock=local.  Locking is done locally only and no reference is made to
     the server.

 (2) flock=openafs.  Byte-range locking is done locally only; whole-file
     locking is done with reference to the server.  Whole-file locks cannot
     be upgraded unless the client holds an exclusive lock.

 (3) flock=strict.  Byte-range and whole-file locking both require a
     sufficient whole-file lock on the server.

 (4) flock=write.  As strict, but the client always gets an exclusive
     whole-file lock on the server.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-25 14:26:52 +01:00
David Howells
79ddbfa500 afs: Implement sillyrename for unlink and rename
Implement sillyrename for AFS unlink and rename, using the NFS variant
implementation as a basis.

Note that the asynchronous file locking extender/releaser has to be
notified with a state change to stop it complaining if there's a race
between that and the actual file deletion.

A tracepoint, afs_silly_rename, is also added to note the silly rename and
the cleanup.  The afs_edit_dir tracepoint is given some extra reason
indicators and the afs_flock_ev tracepoint is given a silly-delete file
lock cancellation indicator.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-25 14:26:51 +01:00
David Howells
cdfb26b40d afs: Handle lock rpc ops failing on a file that got deleted
Holding a file lock on an AFS file does not prevent it from being deleted
on the server, so we need to handle an error resulting from that when we
try setting, extending or releasing a lock.

Fix this by adding a "deleted" lock state and cancelling the lock extension
process for that file and aborting all waiters for the lock.

Fixes: 0fafdc9f88 ("afs: Fix file locking")
Reported-by: Jonathan Billings <jsbillin@umich.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-25 14:26:51 +01:00
David Howells
a690f60a2b afs: Calculate lock extend timer from set/extend reply reception
Record the timestamp on the first reply DATA packet received in response to
a set- or extend-lock operation, then use this to calculate the time
remaining till the lock expires rather than using whatever time the
requesting process wakes up and finishes processing the operation as a
base.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-25 14:26:50 +01:00
David Howells
0b9bf3812a afs: Split wait from afs_make_call()
Split the call to afs_wait_for_call_to_complete() from afs_make_call() to
make it easier to handle asynchronous calls and to make it easier to
convert a synchronous call to an asynchronous one in future, for instance
when someone tries to interrupt an operation by pressing Ctrl-C.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-25 14:26:50 +01:00
David Howells
eeba1e9cf3 afs: Fix in-progess ops to ignore server-level callback invalidation
The in-kernel afs filesystem client counts the number of server-level
callback invalidation events (CB.InitCallBackState* RPC operations) that it
receives from the server.  This is stored in cb_s_break in various
structures, including afs_server and afs_vnode.

If an inode is examined by afs_validate(), say, the afs_server copy is
compared, along with other break counters, to those in afs_vnode, and if
one or more of the counters do not match, it is considered that the
server's callback promise is broken.  At points where this happens,
AFS_VNODE_CB_PROMISED is cleared to indicate that the status must be
refetched from the server.

afs_validate() issues an FS.FetchStatus operation to get updated metadata -
and based on the updated data_version may invalidate the pagecache too.

However, the break counters are also used to determine whether to note a
new callback in the vnode (which would set the AFS_VNODE_CB_PROMISED flag)
and whether to cache the permit data included in the YFSFetchStatus record
by the server.


The problem comes when the server sends us a CB.InitCallBackState op.  The
first such instance doesn't cause cb_s_break to be incremented, but rather
causes AFS_SERVER_FL_NEW to be cleared - but thereafter, say some hours
after last use and all the volumes have been automatically unmounted and
the server has forgotten about the client[*], this *will* likely cause an
increment.

 [*] There are other circumstances too, such as the server restarting or
     needing to make space in its callback table.

Note that the server won't send us a CB.InitCallBackState op until we talk
to it again.

So what happens is:

 (1) A mount for a new volume is attempted, a inode is created for the root
     vnode and vnode->cb_s_break and AFS_VNODE_CB_PROMISED aren't set
     immediately, as we don't have a nominated server to talk to yet - and
     we may iterate through a few to find one.

 (2) Before the operation happens, afs_fetch_status(), say, notes in the
     cursor (fc.cb_break) the break counter sum from the vnode, volume and
     server counters, but the server->cb_s_break is currently 0.

 (3) We send FS.FetchStatus to the server.  The server sends us back
     CB.InitCallBackState.  We increment server->cb_s_break.

 (4) Our FS.FetchStatus completes.  The reply includes a callback record.

 (5) xdr_decode_AFSCallBack()/xdr_decode_YFSCallBack() check to see whether
     the callback promise was broken by checking the break counter sum from
     step (2) against the current sum.

     This fails because of step (3), so we don't set the callback record
     and, importantly, don't set AFS_VNODE_CB_PROMISED on the vnode.

This does not preclude the syscall from progressing, and we don't loop here
rechecking the status, but rather assume it's good enough for one round
only and will need to be rechecked next time.

 (6) afs_validate() it triggered on the vnode, probably called from
     d_revalidate() checking the parent directory.

 (7) afs_validate() notes that AFS_VNODE_CB_PROMISED isn't set, so doesn't
     update vnode->cb_s_break and assumes the vnode to be invalid.

 (8) afs_validate() needs to calls afs_fetch_status().  Go back to step (2)
     and repeat, every time the vnode is validated.

This primarily affects volume root dir vnodes.  Everything subsequent to
those inherit an already incremented cb_s_break upon mounting.


The issue is that we assume that the callback record and the cached permit
information in a reply from the server can't be trusted after getting a
server break - but this is wrong since the server makes sure things are
done in the right order, holding up our ops if necessary[*].

 [*] There is an extremely unlikely scenario where a reply from before the
     CB.InitCallBackState could get its delivery deferred till after - at
     which point we think we have a promise when we don't.  This, however,
     requires unlucky mass packet loss to one call.

AFS_SERVER_FL_NEW tries to paper over the cracks for the initial mount from
a server we've never contacted before, but this should be unnecessary.
It's also further insulated from the problem on an initial mount by
querying the server first with FS.GetCapabilities, which triggers the
CB.InitCallBackState.


Fix this by

 (1) Remove AFS_SERVER_FL_NEW.

 (2) In afs_calc_vnode_cb_break(), don't include cb_s_break in the
     calculation.

 (3) In afs_cb_is_broken(), don't include cb_s_break in the check.


Signed-off-by: David Howells <dhowells@redhat.com>
2019-04-13 08:37:37 +01:00
David Howells
c99c2171fc afs: Use fs_context to pass parameters over automount
Alter the AFS automounting code to create and modify an fs_context struct
when parameterising a new mount triggered by an AFS mountpoint rather than
constructing device name and option strings.

Also remove the cell=, vol= and rwpath options as they are then redundant.
The reason they existed is because the 'device name' may be derived
literally from a mountpoint object in the filesystem, so default cell and
parent-type information needed to be passed in by some other method from
the automount routines.  The vol= option didn't end up being used.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric W. Biederman <ebiederm@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-02-28 03:29:39 -05:00
David Howells
13fcc68370 afs: Add fs_context support
Add fs_context support to the AFS filesystem, converting the parameter
parsing to store options there.

This will form the basis for namespace propagation over mountpoints within
the AFS model, thereby allowing AFS to be used in containers more easily.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-02-28 03:29:38 -05:00
David Howells
4584ae96ae afs: Fix missing net error handling
kAFS can be given certain network errors (EADDRNOTAVAIL, EHOSTDOWN and
ERFKILL) that it doesn't handle in its server/address rotation algorithms.
They cause the probing and rotation to abort immediately rather than
rotating.

Fix this by:

 (1) Abstracting out the error prioritisation from the VL and FS rotation
     algorithms into a common function and expand usage into the server
     probing code.

     When multiple errors are available, this code selects the one we'd
     prefer to return.

 (2) Add handling for EADDRNOTAVAIL, EHOSTDOWN and ERFKILL.

Fixes: 0fafdc9f88 ("afs: Fix file locking")
Fixes: 0338747d8454 ("afs: Probe multiple fileservers simultaneously")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-29 21:08:14 -05:00
David Howells
3bf0fb6f33 afs: Probe multiple fileservers simultaneously
Send probes to all the unprobed fileservers in a fileserver list on all
addresses simultaneously in an attempt to find out the fastest route whilst
not getting stuck for 20s on any server or address that we don't get a
reply from.

This alleviates the problem whereby attempting to access a new server can
take a long time because the rotation algorithm ends up rotating through
all servers and addresses until it finds one that responds.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:09 +01:00
David Howells
18ac61853c afs: Fix callback handling
In some circumstances, the callback interest pointer is NULL, so in such a
case we can't dereference it when checking to see if the callback is
broken.  This causes an oops in some circumstances.

Fix this by replacing the function that worked out the aggregate break
counter with one that actually does the comparison, and then make that
return true (ie. broken) if there is no callback interest as yet (ie. the
pointer is NULL).

Fixes: 68251f0a68 ("afs: Fix whole-volume callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:09 +01:00
David Howells
2feeaf8433 afs: Eliminate the address pointer from the address list cursor
Eliminate the address pointer from the address list cursor as it's
redundant (ac->addrs[ac->index] can be used to find the same address) and
address lists must be replaced rather than being rearranged, so is of
limited value.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:09 +01:00
David Howells
744bcd713a afs: Allow dumping of server cursor on operation failure
Provide an option to allow the file or volume location server cursor to be
dumped if the rotation routine falls off the end without managing to
contact a server.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:09 +01:00
David Howells
30062bd13e afs: Implement YFS support in the fs client
Implement support for talking to YFS-variant fileservers in the cache
manager and the filesystem client.  These implement upgraded services on
the same port as their AFS services.

YFS fileservers provide expanded capabilities over AFS.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells
12d8e95a91 afs: Calc callback expiry in op reply delivery
Calculate the callback expiration time at the point of operation reply
delivery, using the reply time queried from AF_RXRPC on that call as a
base.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells
f51375cd9e afs: Add a couple of tracepoints to log I/O errors
Add a couple of tracepoints to log the production of I/O errors within the AFS
filesystem.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:07 +01:00
David Howells
0a5143f2f8 afs: Implement VL server rotation
Track VL servers as independent entities rather than lumping all their
addresses together into one set and implement server-level rotation by:

 (1) Add the concept of a VL server list, where each server has its own
     separate address list.  This code is similar to the FS server list.

 (2) Use the DNS resolver to retrieve a set of servers and their associated
     addresses, ports, preference and weight ratings.

 (3) In the case of a legacy DNS resolver or an address list given directly
     through /proc/net/afs/cells, create a list containing just a dummy
     server record and attach all the addresses to that.

 (4) Implement a simple rotation policy, for the moment ignoring the
     priorities and weights assigned to the servers.

 (5) Show the address list through /proc/net/afs/<cell>/vlservers.  This
     also displays the source and status of the data as indicated by the
     upcall.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:07 +01:00
David Howells
e7f680f45b afs: Improve FS server rotation error handling
Improve the error handling in FS server rotation by:

 (1) Cache the latest useful error value for the fs operation as a whole in
     struct afs_fs_cursor separately from the error cached in the
     afs_addr_cursor struct.  The one in the address cursor gets clobbered
     occasionally.  Copy over the error to the fs operation only when it's
     something we'd be interested in passing to userspace.

 (2) Make it so that EDESTADDRREQ is the default that is seen only if no
     addresses are available to be accessed.

 (3) When calling utility functions, such as checking a volume status or
     probing a fileserver, don't let a successful result clobber the cached
     error in the cursor; instead, stash the result in a temporary variable
     until it has been assessed.

 (4) Don't return ETIMEDOUT or ETIME if a better error, such as
     ENETUNREACH, is already cached.

 (5) On leaving the rotation loop, turn any remote abort code into a more
     useful error than ECONNABORTED.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:07 +01:00
David Howells
12bdcf333f afs: Set up the iov_iter before calling afs_extract_data()
afs_extract_data sets up a temporary iov_iter and passes it to AF_RXRPC
each time it is called to describe the remaining buffer to be filled.

Instead:

 (1) Put an iterator in the afs_call struct.

 (2) Set the iterator for each marshalling stage to load data into the
     appropriate places.  A number of convenience functions are provided to
     this end (eg. afs_extract_to_buf()).

     This iterator is then passed to afs_extract_data().

 (3) Use the new ITER_DISCARD iterator to discard any excess data provided
     by FetchData.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:07 +01:00
David Howells
160cb9574b afs: Better tracing of protocol errors
Include the site of detection of AFS protocol errors in trace lines to
better be able to determine what went wrong.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:07 +01:00
David S. Miller
d864991b22 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts were easy to resolve using immediate context mostly,
except the cls_u32.c one where I simply too the entire HEAD
chunk.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12 21:38:46 -07:00
David Howells
6b3944e42e afs: Fix cell proc list
Access to the list of cells by /proc/net/afs/cells has a couple of
problems:

 (1) It should be checking against SEQ_START_TOKEN for the keying the
     header line.

 (2) It's only holding the RCU read lock, so it can't just walk over the
     list without following the proper RCU methods.

Fix these by using an hlist instead of an ordinary list and using the
appropriate accessor functions to follow it with RCU.

Since the code that adds a cell to the list must also necessarily change,
sort the list on insertion whilst we're at it.

Fixes: 989782dcdc ("afs: Overhaul cell database management")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-12 13:18:57 +02:00
David Howells
68eb64c3d2 afs: Do better max capacity handling on address lists
Note the maximum allocated capacity in an afs_addr_list struct and discard
addresses that would exceed it in afs_merge_fs_addr{4,6}().

Also, since the current maximum capacity is less than 255, reduce the
relevant members to bytes.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-04 09:32:27 +01:00
Souptick Joarder
0722f18620 fs/afs: use new return type vm_fault_t
Use new return type vm_fault_t for fault handler in struct
vm_operations_struct.  For now, this is just documenting that the
function returns a VM_FAULT value rather than an errno.  Once all
instances are converted, vm_fault_t will become a distinct type.

See 1c8f422059 ("mm: change return type to vm_fault_t") for reference.

Link: http://lkml.kernel.org/r/20180702152017.GA3780@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-23 18:48:43 -07:00
David Howells
47ea0f2ebf afs: Optimise callback breaking by not repeating volume lookup
At the moment, afs_break_callbacks calls afs_break_one_callback() for each
separate FID it was given, and the latter looks up the volume individually
for each one.

However, this is inefficient if two or more FIDs have the same vid as we
could reuse the volume.  This is complicated by cell aliasing whereby we
may have multiple cells sharing a volume and can therefore have multiple
callback interests for any particular volume ID.

At the moment afs_break_one_callback() scans the entire list of volumes
we're getting from a server and breaks the appropriate callback in every
matching volume, regardless of cell.  This scan is done for every FID.

Optimise callback breaking by the following means:

 (1) Sort the FID list by vid so that all FIDs belonging to the same volume
     are clumped together.

     This is done through the use of an indirection table as we cannot do
     an insertion sort on the afs_callback_break array as we decode FIDs
     into it as we subsequently also have to decode callback info into it
     that corresponds by array index only.

     We also don't really want to bubblesort afterwards if we can avoid it.

 (2) Sort the server->cb_interests array by vid so that all the matching
     volumes are grouped together.  This permits the scan to stop after
     finding a record that has a higher vid.

 (3) When breaking FIDs, we try to keep server->cb_break_lock as long as
     possible, caching the start point in the array for that volume group
     as long as possible.

     It might make sense to add another layer in that list and have a
     refcounted volume ID anchor that has the matching interests attached
     to it rather than being in the list.  This would allow the lock to be
     dropped without losing the cursor.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-06-15 15:27:09 +01:00
David Howells
0da0b7fd73 afs: Display manually added cells in dynamic root mount
Alter the dynroot mount so that cells created by manipulation of
/proc/fs/afs/cells and /proc/fs/afs/rootcell and by specification of a root
cell as a module parameter will cause directories for those cells to be
created in the dynamic root superblock for the network namespace[*].

To this end:

 (1) Only one dynamic root superblock is now created per network namespace
     and this is shared between all attempts to mount it.  This makes it
     easier to find the superblock to modify.

 (2) When a dynamic root superblock is created, the list of cells is walked
     and directories created for each cell already defined.

 (3) When a new cell is added, if a dynamic root superblock exists, a
     directory is created for it.

 (4) When a cell is destroyed, the directory is removed.

 (5) These directories are created by calling lookup_one_len() on the root
     dir which automatically creates them if they don't exist.

[*] Inasmuch as network namespaces are currently supported here.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-06-15 15:27:09 +01:00
David Howells
b6cfbecafb afs: Handle CONFIG_PROC_FS=n
The AFS filesystem depends at the moment on /proc for configuration and
also presents information that way - however, this causes a compilation
failure if procfs is disabled.

Fix it so that the procfs bits aren't compiled in if procfs is disabled.

This means that you can't configure the AFS filesystem directly, but it is
still usable provided that an up-to-date keyutils is installed to look up
cells by SRV or AFSDB DNS records.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-06-15 00:52:55 -04:00
Al Viro
de52cf922a AFS fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAWvmaZvu3V2unywtrAQKZoA/9HzO6QsB7h7hWY6tTuoL0gD8T8S4hC7l3
 UYFtTgq0rFHJYiET4SWoy0Sfs8rY1iFPtaIeFVQG804SrnXu5/Q1tsv+1lRhZIuo
 /upAtZ3xEcqvAqU8pgcksKl/KUdmm7ZHUbhAFCasu+1eczGF5Q55UAUgonFrnEMi
 9N0WviRUkRAlTre7cvCMRI05c+HJV+PCYrJPjStAkJeuS1CuTEAT/d58NumquMAt
 6ENkpR4OhRUJZDhYH7XIRLm7hsYjr9v3VIeCiLpYqUZGuvhaj3jzPi0e9zD5PDzZ
 lyyodQVegBs88V2rXrjjZHohNQRiuSzI+42pMXrdaDu5jBFFqYLEeaBoperJY7nl
 W6l6HSb/I8VValM7iwkyzNWeQ6KhdUhYvA5ljYaJufZvqxp4di9xT4mAxRqbHSX+
 H5I/n+R27FEOFAqnWInaksj5IO80HGThrGhdz9O/4pa8xITz7W2ZKg5YMLEoF9yp
 /QUxsn3lz4VD4tjPrqampJ+IwbpQB+XDiJhM4boI47kC2IxEc9L2QiYWlFl/okZ4
 CGuXsluQFPleR3Mo8xq1WaQzmT40iYQ+aBOPq1/OhDisexZJ55Cjha1GHk/8aHDu
 GL5UiL7AfWEwY20mJiCObg8u2nnkwg/0YPR3awDBlCMDBeYhxbSFOLrKiQxUjWM9
 Pp6PUhTtSjU=
 =1ow3
 -----END PGP SIGNATURE-----

Merge tag 'afs-fixes-20180514' into afs-proc

backmerge AFS fixes that went into mainline and deal with
the conflict in fs/afs/fsclient.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-02 18:09:27 -04:00
David Howells
5b86d4ff5d afs: Implement network namespacing
Implement network namespacing within AFS, but don't yet let mounts occur
outside the init namespace.  An additional patch will be required propagate
the network namespace across automounts.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-23 12:01:15 +01:00
David Howells
1588def91d afs: Mark afs_net::ws_cell as __rcu and set using rcu functions
The afs_net::ws_cell member is sometimes used under RCU conditions from
within an seq-readlock.  It isn't, however, marked __rcu and it isn't set
using the proper RCU barrier-imposing functions.

Fix this by annotating it with __rcu and using appropriate barriers to
make sure accesses are correctly ordered.

Without this, the code can produce the following warning:

>> fs/afs/proc.c:151:24: sparse: incompatible types in comparison expression (different address spaces)

Fixes: f044c8847b ("afs: Lay the groundwork for supporting network namespaces")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-23 11:51:29 +01:00
David Howells
68251f0a68 afs: Fix whole-volume callback handling
It's possible for an AFS file server to issue a whole-volume notification
that callbacks on all the vnodes in the file have been broken.  This is
done for R/O and backup volumes (which don't have per-file callbacks) and
for things like a volume being taken offline.

Fix callback handling to detect whole-volume notifications, to track it
across operations and to check it during inode validation.

Fixes: c435ee3455 ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:18 +01:00
David Howells
d4a96bec7a afs: Fix refcounting in callback registration
The refcounting on afs_cb_interest struct objects in
afs_register_server_cb_interest() is wrong as it uses the server list
entry's call back interest pointer without regard for the fact that it
might be replaced at any time and the object thrown away.

Fix this by:

 (1) Put a lock on the afs_server_list struct that can be used to
     mediate access to the callback interest pointers in the servers array.

 (2) Keep a ref on the callback interest that we get from the entry.

 (3) Dropping the old reference held by vnode->cb_interest if we replace
     the pointer.

Fixes: c435ee3455 ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 13:17:35 +01:00
David Howells
f2686b0926 afs: Fix giving up callbacks on server destruction
When a server record is destroyed, we want to send a message to the server
telling it that we're giving up all the callbacks it has promised us.

Apply two fixes to this:

 (1) Only send the FS.GiveUpAllCallBacks message if we actually got a
     callback from that server.  We assume this to be the case if we
     performed at least one successful FS operation on that server.

 (2) Send it to the address last used for that server rather than always
     picking the first address in the list (which might be unreachable).

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 13:17:35 +01:00
David Howells
b61f7dcf4e afs: Fix directory page locking
The afs directory loading code (primarily afs_read_dir()) locks all the
pages that hold a directory's content blob to defend against
getdents/getdents races and getdents/lookup races where the competitors
issue conflicting reads on the same data.  As the reads will complete
consecutively, they may retrieve different versions of the data and
one may overwrite the data that the other is busy parsing.

Fix this by not locking the pages at all, but rather by turning the
validation lock into an rwsem and getting an exclusive lock on it whilst
reading the data or validating the attributes and a shared lock whilst
parsing the data.  Sharing the attribute validation lock should be fine as
the data fetch will retrieve the attributes also.

The individual page locks aren't needed at all as the only place they're
being used is to serialise data loading.

Without this patch, the:

 	if (!test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) {
		...
	}

part of afs_read_dir() may be skipped, leaving the pages unlocked when we
hit the success: clause - in which case we try to unlock the not-locked
pages, leading to the following oops:

  page:ffffe38b405b4300 count:3 mapcount:0 mapping:ffff98156c83a978 index:0x0
  flags: 0xfffe000001004(referenced|private)
  raw: 000fffe000001004 ffff98156c83a978 0000000000000000 00000003ffffffff
  raw: dead000000000100 dead000000000200 0000000000000001 ffff98156b27c000
  page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
  page->mem_cgroup:ffff98156b27c000
  ------------[ cut here ]------------
  kernel BUG at mm/filemap.c:1205!
  ...
  RIP: 0010:unlock_page+0x43/0x50
  ...
  Call Trace:
   afs_dir_iterate+0x789/0x8f0 [kafs]
   ? _cond_resched+0x15/0x30
   ? kmem_cache_alloc_trace+0x166/0x1d0
   ? afs_do_lookup+0x69/0x490 [kafs]
   ? afs_do_lookup+0x101/0x490 [kafs]
   ? key_default_cmp+0x20/0x20
   ? request_key+0x3c/0x80
   ? afs_lookup+0xf1/0x340 [kafs]
   ? __lookup_slow+0x97/0x150
   ? lookup_slow+0x35/0x50
   ? walk_component+0x1bf/0x490
   ? path_lookupat.isra.52+0x75/0x200
   ? filename_lookup.part.66+0xa0/0x170
   ? afs_end_vnode_operation+0x41/0x60 [kafs]
   ? __check_object_size+0x9c/0x171
   ? strncpy_from_user+0x4a/0x170
   ? vfs_statx+0x73/0xe0
   ? __do_sys_newlstat+0x39/0x70
   ? __x64_sys_getdents+0xc9/0x140
   ? __x64_sys_getdents+0x140/0x140
   ? do_syscall_64+0x5b/0x160
   ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: f3ddee8dc4 ("afs: Fix directory handling")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 13:17:35 +01:00
David Howells
5a81327616 afs: Do better accretion of small writes on newly created content
Processes like ld that do lots of small writes that aren't necessarily
contiguous result in a lot of small StoreData operations to the server, the
idea being that if someone else changes the data on the server, we only
write our changes over that and not the space between.  Further, we don't
want to write back empty space if we can avoid it to make it easier for the
server to do sparse files.

However, making lots of tiny RPC ops is a lot less efficient for the server
than one big one because each op requires allocation of resources and the
taking of locks, so we want to compromise a bit.

Reduce the load by the following:

 (1) If a file is just created locally or has just been truncated with
     O_TRUNC locally, allow subsequent writes to the file to be merged with
     intervening space if that space doesn't cross an entire intervening
     page.

 (2) Don't flush the file on ->flush() but rather on ->release() if the
     file was open for writing.

Just linking vmlinux.o, without this patch, looking in /proc/fs/afs/stats:

	file-wr : n=441 nb=513581204

and after the patch:

	file-wr : n=62 nb=513668555

there were 379 fewer StoreData RPC operations at the expense of an extra
87K being written.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells
76a5cb6fc1 afs: Add stats for data transfer operations
Add statistics to /proc/fs/afs/stats for data transfer RPC operations.  New
lines are added that look like:

	file-rd : n=55794 nb=10252282150
	file-wr : n=9789 nb=3247763645

where n= indicates the number of ops completed and nb= indicates the number
of bytes successfully transferred.  file-rd is the counts for read/fetch
operations and file-wr the counts for write/store operations.

Note that directory and symlink downloading are included in the file-rd
stats at the moment.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells
5f702c8e12 afs: Trace protocol errors
Trace protocol errors detected in afs.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells
63a4681ff3 afs: Locally edit directory data for mkdir/create/unlink/...
Locally edit the contents of an AFS directory upon a successful inode
operation that modifies that directory (such as mkdir, create and unlink)
so that we can avoid the current practice of re-downloading the directory
after each change.

This is viable provided that the directory version number we get back from
the modifying RPC op is exactly incremented by 1 from what we had
previously.  The data in the directory contents is in a defined format that
we have to parse locally to perform lookups and readdir, so modifying isn't
a problem.

If the edit fails, we just clear the VALID flag on the directory and it
will be reloaded next time it is needed.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells
f3ddee8dc4 afs: Fix directory handling
AFS directories are structured blobs that are downloaded just like files
and then parsed by the lookup and readdir code and, as such, are currently
handled in the pagecache like any other file, with the entire directory
content being thrown away each time the directory changes.

However, since the blob is a known structure and since the data version
counter on a directory increases by exactly one for each change committed
to that directory, we can actually edit the directory locally rather than
fetching it from the server after each locally-induced change.

What we can't do, though, is mix data from the server and data from the
client since the server is technically at liberty to rearrange or compress
a directory if it sees fit, provided it updates the data version number
when it does so and breaks the callback (ie. sends a notification).

Further, lookup with lookup-ahead, readdir and, when it arrives, local
editing are likely want to scan the whole of a directory.

So directory handling needs to be improved to maintain the coherency of the
directory blob prior to permitting local directory editing.

To this end:

 (1) If any directory page gets discarded, invalidate and reread the entire
     directory.

 (2) If readpage notes that if when it fetches a single page that the
     version number has changed, the entire directory is flagged for
     invalidation.

 (3) Read as much of the directory in one go as we can.

Note that this removes local caching of directories in fscache for the
moment as we can't pass the pages to fscache_read_or_alloc_pages() since
page->lru is in use by the LRU.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells
66c7e1d319 afs: Split the dynroot stuff out and give it its own ops tables
Split the AFS dynamic root stuff out of the main directory handling file
and into its own file as they share little in common.

The dynamic root code also gets its own dentry and inode ops tables.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:00 +01:00
David Howells
a4ff7401fb afs: Keep track of invalid-before version for dentry coherency
Each afs dentry is tagged with the version that the parent directory was at
last time it was validated and, currently, if this differs, the directory
is scanned and the dentry is refreshed.

However, this leads to an excessive amount of revalidation on directories
that get modified on the client without conflict with another client.  We
know there's no conflict because the parent directory's data version number
got incremented by exactly 1 on any create, mkdir, unlink, etc., therefore
we can trust the current state of the unaffected dentries when we perform a
local directory modification.

Optimise by keeping track of the last version of the parent directory that
was changed outside of the client in the parent directory's vnode and using
that to validate the dentries rather than the current version.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:53:59 +01:00
David Howells
dd9fbcb8e1 afs: Rearrange status mapping
Rearrange the AFSFetchStatus to inode attribute mapping code in a number of
ways:

 (1) Use an XDR structure rather than a series of incremented pointer
     accesses when decoding an AFSFetchStatus object.  This allows
     out-of-order decode.

 (2) Don't store the if_version value but rather just check it and abort if
     it's not something we can handle.

 (3) Store the owner and group in the status record as raw values rather
     than converting them to kuid/kgid.  Do that when they're mapped into
     i_uid/i_gid.

 (4) Validate the type and abort code up front and abort if they're wrong.

 (5) Split the inode attribute setting out into its own function from the
     XDR decode of an AFSFetchStatus object.  This allows it to be called
     from elsewhere too.

 (6) Differentiate changes to data from changes to metadata.

 (7) Use the split-out attribute mapping function from afs_iget().

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:53:59 +01:00
David Howells
0c3a5ac281 afs: Make it possible to get the data version in readpage
Store the data version number indicated by an FS.FetchData op into the read
request structure so that it's accessible by the page reader.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:53:56 +01:00
David Howells
d55b4da433 afs: Introduce a statistics proc file
Introduce a proc file that displays a bunch of statistics for the AFS
filesystem in the current network namespace.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:53:54 +01:00
David Howells
6f8880d8e6 afs: Implement @sys substitution handling
Implement the AFS feature by which @sys at the end of a pathname component
may be substituted for one of a list of values, typically naming the
operating system.  Up to 16 alternatives may be specified and these are
tried in turn until one works.  Each network namespace has[*] a separate
independent list.

Upon creation of a new network namespace, the list of values is
initialised[*] to a single OpenAFS-compatible string representing arch type
plus "_linux26".  For example, on x86_64, the sysname is "amd64_linux26".

[*] Or will, once network namespace support is finalised in kAFS.

The list may be set by:

	# for i in foo bar linux-x86_64; do echo $i; done >/proc/fs/afs/sysname

for which separate writes to the same fd are amalgamated and applied on
close.  The LF character may be used as a separator to specify multiple
items in the same write() call.

The list may be cleared by:

	# echo >/proc/fs/afs/sysname

and read by:

	# cat /proc/fs/afs/sysname
	foo
	bar
	linux-x86_64

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:12:31 +01:00
David Howells
5cf9dd55a0 afs: Prospectively look up extra files when doing a single lookup
When afs_lookup() is called, prospectively look up the next 50 uncached
fids also from that same directory and cache the results, rather than just
looking up the one file requested.

This allows us to use the FS.InlineBulkStatus RPC op to increase efficiency
by fetching up to 50 file statuses at a time.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:12:31 +01:00
David Howells
fe342cf77b afs: Fix checker warnings
Fix warnings raised by checker, including:

 (*) Warnings raised by unequal comparison for the purposes of sorting,
     where the endianness doesn't matter:

fs/afs/addr_list.c:246:21: warning: restricted __be16 degrades to integer
fs/afs/addr_list.c:246:30: warning: restricted __be16 degrades to integer
fs/afs/addr_list.c:248:21: warning: restricted __be32 degrades to integer
fs/afs/addr_list.c:248:49: warning: restricted __be32 degrades to integer
fs/afs/addr_list.c:283:21: warning: restricted __be16 degrades to integer
fs/afs/addr_list.c:283:30: warning: restricted __be16 degrades to integer

 (*) afs_set_cb_interest() is not actually used and can be removed.

 (*) afs_cell_gc_delay() should be provided with a sysctl.

 (*) afs_cell_destroy() needs to use rcu_access_pointer() to read
     cell->vl_addrs.

 (*) afs_init_fs_cursor() should be static.

 (*) struct afs_vnode::permit_cache needs to be marked __rcu.

 (*) afs_server_rcu() needs to use rcu_access_pointer().

 (*) afs_destroy_server() should use rcu_access_pointer() on
     server->addresses as the server object is no longer accessible.

 (*) afs_find_server() casts __be16/__be32 values to int in order to
     directly compare them for the purpose of finding a match in a list,
     but is should also annotate the cast with __force to avoid checker
     warnings.

 (*) afs_check_permit() accesses vnode->permit_cache outside of the RCU
     readlock, though it doesn't then access the value; the extraneous
     access is deleted.

False positives:

 (*) Conditional locking around the code in xdr_decode_AFSFetchStatus.  This
     can be dealt with in a separate patch.

fs/afs/fsclient.c:148:9: warning: context imbalance in 'xdr_decode_AFSFetchStatus' - different lock contexts for basic block

 (*) Incorrect handling of seq-retry lock context balance:

fs/afs/inode.c:455:38: warning: context imbalance in 'afs_getattr' - different
lock contexts for basic block
fs/afs/server.c:52:17: warning: context imbalance in 'afs_find_server' - different lock contexts for basic block
fs/afs/server.c:128:17: warning: context imbalance in 'afs_find_server_by_uuid' - different lock contexts for basic block

Errors:

 (*) afs_lookup_cell_rcu() needs to break out of the seq-retry loop, not go
     round again if it successfully found the workstation cell.

 (*) Fix UUID decode in afs_deliver_cb_probe_uuid().

 (*) afs_cache_permit() has a missing rcu_read_unlock() before one of the
     jumps to the someone_else_changed_it label.  Move the unlock to after
     the label.

 (*) afs_vl_get_addrs_u() is using ntohl() rather than htonl() when
     encoding to XDR.

 (*) afs_deliver_yfsvl_get_endpoints() is using htonl() rather than ntohl()
     when decoding from XDR.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:12:31 +01:00
David Howells
402cb8dda9 fscache: Attach the index key and aux data to the cookie
Attach copies of the index key and auxiliary data to the fscache cookie so
that:

 (1) The callbacks to the netfs for this stuff can be eliminated.  This
     can simplify things in the cache as the information is still
     available, even after the cache has relinquished the cookie.

 (2) Simplifies the locking requirements of accessing the information as we
     don't have to worry about the netfs object going away on us.

 (3) The cache can do lazy updating of the coherency information on disk.
     As long as the cache is flushed before reboot/poweroff, there's no
     need to update the coherency info on disk every time it changes.

 (4) Cookies can be hashed or put in a tree as the index key is easily
     available.  This allows:

     (a) Checks for duplicate cookies can be made at the top fscache layer
     	 rather than down in the bowels of the cache backend.

     (b) Caching can be added to a netfs object that has a cookie if the
     	 cache is brought online after the netfs object is allocated.

A certain amount of space is made in the cookie for inline copies of the
data, but if it won't fit there, extra memory will be allocated for it.

The downside of this is that live cache operation requires more memory.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Anna Schumaker <anna.schumaker@netapp.com>
Tested-by: Steve Dickson <steved@redhat.com>
2018-04-04 13:41:28 +01:00
David Howells
a25e21f0bc rxrpc, afs: Use debug_ids rather than pointers in traces
In rxrpc and afs, use the debug_ids that are monotonically allocated to
various objects as they're allocated rather than pointers as kernel
pointers are now hashed making them less useful.  Further, the debug ids
aren't reused anywhere nearly as quickly.

In addition, allow kernel services that use rxrpc, such as afs, to take
numbers from the rxrpc counter, assign them to their own call struct and
pass them in to rxrpc for both client and service calls so that the trace
lines for each will have the same ID tag.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-03-27 23:03:00 +01:00
David Howells
4d673da145 afs: Support the AFS dynamic root
Support the AFS dynamic root which is a pseudo-volume that doesn't connect
to any server resource, but rather is just a root directory that
dynamically creates mountpoint directories where the name of such a
directory is the name of the cell.

Such a mount can be created thus:

	mount -t afs none /afs -o dyn

Dynamic root superblocks aren't shared except by bind mounts and
propagation.  Cell root volumes can then be mounted by referring to them by
name, e.g.:

	ls /afs/grand.central.org/
	ls /afs/.grand.central.org/

The kernel will upcall to consult the DNS if the address wasn't supplied
directly.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-02-06 14:43:37 +00:00
David Howells
f8de483e74 afs: Properly reset afs_vnode (inode) fields
When an AFS inode is allocated by afs_alloc_inode(), the allocated
afs_vnode struct isn't necessarily reset from the last time it was used as
an inode because the slab constructor is only invoked once when the memory
is obtained from the page allocator.

This means that information can leak from one inode to the next because
we're not calling kmem_cache_zalloc().  Some of the information isn't
reset, in particular the permit cache pointer.

Bring the clearances up to date.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
2017-12-01 11:51:24 +00:00