Commit Graph

87 Commits

Author SHA1 Message Date
David Howells
78525c74d9 netfs, 9p, afs, ceph: Use folios
Convert the netfs helper library to use folios throughout, convert the 9p
and afs filesystems to use folios in their file I/O paths and convert the
ceph filesystem to use just enough folios to compile.

With these changes, afs passes -g quick xfstests.

Changes
=======
ver #5:
 - Got rid of folio_end{io,_read,_write}() and inlined the stuff it does
   instead (Willy decided he didn't want this after all).

ver #4:
 - Fixed a bug in afs_redirty_page() whereby it didn't set the next page
   index in the loop and returned too early.
 - Simplified a check in v9fs_vfs_write_folio_locked()[1].
 - Undid a change to afs_symlink_readpage()[1].
 - Used offset_in_folio() in afs_write_end()[1].
 - Changed from using page_endio() to folio_end{io,_read,_write}()[1].

ver #2:
 - Add 9p foliation.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dominique Martinet <asmadeus@codewreck.org>
Tested-by: kafs-testing@auristor.com
cc: Matthew Wilcox (Oracle) <willy@infradead.org>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: v9fs-developer@lists.sourceforge.net
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/YYKa3bfQZxK5/wDN@casper.infradead.org/ [1]
Link: https://lore.kernel.org/r/2408234.1628687271@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/162877311459.3085614.10601478228012245108.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/162981153551.1901565.3124454657133703341.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/163005745264.2472992.9852048135392188995.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163584187452.4023316.500389675405550116.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/163649328026.309189.1124218109373941936.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/163657852454.834781.9265101983152100556.stgit@warthog.procyon.org.uk/ # v5
2021-11-10 21:16:56 +00:00
David Howells
75bd228d56 afs: Sort out symlink reading
afs_readpage() doesn't get a file pointer when called for a symlink, so
separate it from regular file pointer handling.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Link: https://lore.kernel.org/r/162687508008.276387.6418924257569297305.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/162981152280.1901565.2264055504466731917.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/163005742570.2472992.7800423440314043178.stgit@warthog.procyon.org.uk/ # v2
2021-11-02 09:40:18 +00:00
David Howells
6e0e99d58a afs: Fix mmap coherency vs 3rd-party changes
Fix the coherency management of mmap'd data such that 3rd-party changes
become visible as soon as possible after the callback notification is
delivered by the fileserver.  This is done by the following means:

 (1) When we break a callback on a vnode specified by the CB.CallBack call
     from the server, we queue a work item (vnode->cb_work) to go and
     clobber all the PTEs mapping to that inode.

     This causes the CPU to trip through the ->map_pages() and
     ->page_mkwrite() handlers if userspace attempts to access the page(s)
     again.

     (Ideally, this would be done in the service handler for CB.CallBack,
     but the server is waiting for our reply before considering, and we
     have a list of vnodes, all of which need breaking - and the process of
     getting the mmap_lock and stripping the PTEs on all CPUs could be
     quite slow.)

 (2) Call afs_validate() from the ->map_pages() handler to check to see if
     the file has changed and to get a new callback promise from the
     server.

Also handle the fileserver telling us that it's dropping all callbacks,
possibly after it's been restarted by sending us a CB.InitCallBackState*
call by the following means:

 (3) Maintain a per-cell list of afs files that are currently mmap'd
     (cell->fs_open_mmaps).

 (4) Add a work item to each server that is invoked if there are any open
     mmaps when CB.InitCallBackState happens.  This work item goes through
     the aforementioned list and invokes the vnode->cb_work work item for
     each one that is currently using this server.

     This causes the PTEs to be cleared, causing ->map_pages() or
     ->page_mkwrite() to be called again, thereby calling afs_validate()
     again.

I've chosen to simply strip the PTEs at the point of notification reception
rather than invalidate all the pages as well because (a) it's faster, (b)
we may get a notification for other reasons than the data being altered (in
which case we don't want to clobber the pagecache) and (c) we need to ask
the server to find out - and I don't want to wait for the reply before
holding up userspace.

This was tested using the attached test program:

	#include <stdbool.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <fcntl.h>
	#include <sys/mman.h>
	int main(int argc, char *argv[])
	{
		size_t size = getpagesize();
		unsigned char *p;
		bool mod = (argc == 3);
		int fd;
		if (argc != 2 && argc != 3) {
			fprintf(stderr, "Format: %s <file> [mod]\n", argv[0]);
			exit(2);
		}
		fd = open(argv[1], mod ? O_RDWR : O_RDONLY);
		if (fd < 0) {
			perror(argv[1]);
			exit(1);
		}

		p = mmap(NULL, size, mod ? PROT_READ|PROT_WRITE : PROT_READ,
			 MAP_SHARED, fd, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			exit(1);
		}
		for (;;) {
			if (mod) {
				p[0]++;
				msync(p, size, MS_ASYNC);
				fsync(fd);
			}
			printf("%02x", p[0]);
			fflush(stdout);
			sleep(1);
		}
	}

It runs in two modes: in one mode, it mmaps a file, then sits in a loop
reading the first byte, printing it and sleeping for a second; in the
second mode it mmaps a file, then sits in a loop incrementing the first
byte and flushing, then printing and sleeping.

Two instances of this program can be run on different machines, one doing
the reading and one doing the writing.  The reader should see the changes
made by the writer, but without this patch, they aren't because validity
checking is being done lazily - only on entry to the filesystem.

Testing the InitCallBackState change is more complicated.  The server has
to be taken offline, the saved callback state file removed and then the
server restarted whilst the reading-mode program continues to run.  The
client machine then has to poke the server to trigger the InitCallBackState
call.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Markus Suvanto <markus.suvanto@gmail.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/163111668833.283156.382633263709075739.stgit@warthog.procyon.org.uk/
2021-09-13 09:10:39 +01:00
David Howells
3978d81652 afs: Add missing vnode validation checks
afs_d_revalidate() should only be validating the directory entry it is
given and the directory to which that belongs; it shouldn't be validating
the inode/vnode to which that dentry points.  Besides, validation need to
be done even if we don't call afs_d_revalidate() - which might be the case
if we're starting from a file descriptor.

In order for afs_d_revalidate() to be fixed, validation points must be
added in some other places.  Certain directory operations, such as
afs_unlink(), already check this, but not all and not all file operations
either.

Note that the validation of a vnode not only checks to see if the
attributes we have are correct, but also gets a promise from the server to
notify us if that file gets changed by a third party.

Add the following checks:

 - Check the vnode we're going to make a hard link to.
 - Check the vnode we're going to move/rename.
 - Check the vnode we're going to read from.
 - Check the vnode we're going to write to.
 - Check the vnode we're going to sync.
 - Check the vnode we're going to make a mapped page writable for.

Some of these aren't strictly necessary as we're going to perform a server
operation that might get the attributes anyway from which we can determine
if something changed - though it might not get us a callback promise.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Markus Suvanto <markus.suvanto@gmail.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/163111667354.283156.12720698333342917516.stgit@warthog.procyon.org.uk/
2021-09-13 09:10:39 +01:00
David Howells
345e1ae0c6 afs: Fix missing put on afs_read objects and missing get on the key therein
The afs_read objects created by afs_req_issue_op() get leaked because
afs_alloc_read() returns a ref and then afs_fetch_data() gets its own ref
which is released when the operation completes, but the initial ref is
never released.

Fix this by discarding the initial ref at the end of afs_req_issue_op().

This leak also covered another bug whereby a ref isn't got on the key
attached to the read record by afs_req_issue_op().  This isn't a problem as
long as the afs_read req never goes away...

Fix this by calling key_get() in afs_req_issue_op().

This was found by the generic/074 test.  It leaks a bunch of kmalloc-192
objects each time it is run, which can be observed by watching
/proc/slabinfo.

Fixes: f7605fa869cf ("afs: Fix leak of afs_read objects")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-and-tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/163010394740.3035676.8516846193899793357.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/163111665914.283156.3038561975681836591.stgit@warthog.procyon.org.uk/
2021-09-10 22:14:51 +01:00
David Howells
3003bbd069 afs: Use the netfs_write_begin() helper
Make AFS use the new netfs_write_begin() helper to do the pre-reading
required before the write.  If successful, the helper returns with the
required page filled in and locked.  It may read more than just one page,
expanding the read to meet cache granularity requirements as necessary.

Note: A more advanced version of this could be made that does
generic_perform_write() for a whole cache granule.  This would make it
easier to avoid doing the download/read for the data to be overwritten.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/160588546422.3465195.1546354372589291098.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161539563244.286939.16537296241609909980.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653819291.2770958.406013201547420544.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789102743.6155.17396591236631761195.stgit@warthog.procyon.org.uk/ # v6
2021-04-23 10:17:28 +01:00
David Howells
5cbf03985c afs: Use new netfs lib read helper API
Make AFS use the new netfs read helpers to implement the VM read
operations:

 - afs_readpage() now hands off responsibility to netfs_readpage().

 - afs_readpages() is gone and replaced with afs_readahead().

 - afs_readahead() just hands off responsibility to netfs_readahead().

These make use of the cache if a cookie is supplied, otherwise just call
the ->issue_op() method a sufficient number of times to complete the entire
request.

Changes:
v5:
- Use proper wait function for PG_fscache in afs_page_mkwrite()[1].
- Use killable wait for PG_writeback in afs_page_mkwrite()[1].

v4:
- Folded in error handling fixes to afs_req_issue_op().
- Added flag to netfs_subreq_terminated() to indicate that the caller may
  have been running async and stuff that might sleep needs punting to a
  workqueue.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/2499407.1616505440@warthog.procyon.org.uk [1]
Link: https://lore.kernel.org/r/160588542733.3465195.7526541422073350302.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118158436.1232039.3884845981224091996.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161053540.2537118.14904446369309535330.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340418739.1303470.5908092911600241280.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539561926.286939.5729036262354802339.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653817977.2770958.17696456811587237197.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789101258.6155.3879271028895121537.stgit@warthog.procyon.org.uk/ # v6
2021-04-23 10:17:28 +01:00
David Howells
dc4191841d afs: Use the fs operation ops to handle FetchData completion
Use the 'success' and 'aborted' afs_operations_ops methods and add a
'failed' method to handle the completion of an AFS.FetchData,
AFS.FetchData64 or YFS.FetchData64 RPC operation rather than directly
calling the done func pointed to by the afs_read struct from the call
delivery handler.

This means the done function will be called back on error also, not just on
successful completion.

This allows motion towards asynchronous data reception on data fetch calls
and allows any error to be handed off to the fscache read helper in the
same place as a successful completion.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/160588541471.3465195.8807019223378490810.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118157260.1232039.6549085372718234792.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161052647.2537118.12922380836599003659.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340417106.1303470.3502017303898569631.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539560673.286939.391310781674212229.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653816367.2770958.5856904574822446404.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789099994.6155.473719823490561190.stgit@warthog.procyon.org.uk/ # v6
2021-04-23 10:17:28 +01:00
David Howells
e87b03f583 afs: Prepare for use of THPs
As a prelude to supporting transparent huge pages, use thp_size() and
similar rather than PAGE_SIZE/SHIFT.

Further, try and frame everything in terms of file positions and lengths
rather than page indices and numbers of pages.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/160588540227.3465195.4752143929716269062.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118155821.1232039.540445038028845740.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161051439.2537118.15577827510426326534.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340415869.1303470.6040191748634322355.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539559365.286939.18344613540296085269.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653815142.2770958.454490670311230206.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789098713.6155.16394227991842480300.stgit@warthog.procyon.org.uk/ # v6
2021-04-23 10:17:27 +01:00
David Howells
630f5dda84 afs: Wait on PG_fscache before modifying/releasing a page
PG_fscache is going to be used to indicate that a page is being written to
the cache, and that the page should not be modified or released until it's
finished.

Make afs_invalidatepage() and afs_releasepage() wait for it.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/158861253957.340223.7465334678444521655.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/159465832417.1377938.3571599385208729791.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/160588536286.3465195.13231895135369807920.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118153708.1232039.3535103645871176749.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161049369.2537118.11591934943429117060.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340412903.1303470.6424701655031380012.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539556890.286939.5873470593519458598.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653812726.2770958.18167145829938766503.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789096241.6155.5907241930823579235.stgit@warthog.procyon.org.uk/ # v6
2021-04-23 10:17:27 +01:00
David Howells
c450846461 afs: Set up the iov_iter before calling afs_extract_data()
afs_extract_data() sets up a temporary iov_iter and passes it to AF_RXRPC
each time it is called to describe the remaining buffer to be filled.

Instead:

 (1) Put an iterator in the afs_call struct.

 (2) Set the iterator for each marshalling stage to load data into the
     appropriate places.  A number of convenience functions are provided to
     this end (eg. afs_extract_to_buf()).

     This iterator is then passed to afs_extract_data().

 (3) Use the new ITER_XARRAY iterator when reading data to load directly
     into the inode's pages without needing to create a list of them.

This will allow O_DIRECT calls to be supported in future patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/152898380012.11616.12094591785228251717.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/153685394431.14766.3178466345696987059.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/153999787395.866.11218209749223643998.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/154033911195.12041.3882700371848894587.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/158861250059.340223.1248231474865140653.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/159465827399.1377938.11181327349704960046.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/160588533776.3465195.3612752083351956948.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118151238.1232039.17015723405750601161.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161047240.2537118.14721975104810564022.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340410333.1303470.16260122230371140878.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539554187.286939.15305559004905459852.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653810525.2770958.4630666029125411789.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789093719.6155.7877160739235087723.stgit@warthog.procyon.org.uk/ # v6
2021-04-23 10:17:27 +01:00
David Howells
c69bf479ba afs: Move key to afs_read struct
Stash the key used to authenticate read operations in the afs_read struct.
This will be necessary to reissue the operation against the server if a
read from the cache fails in upcoming cache changes.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/158861248336.340223.1851189950710196001.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/159465823899.1377938.11925978022348532049.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/160588529557.3465195.7303323479305254243.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118147693.1232039.13780672951838643842.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161043340.2537118.511899217704140722.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340406678.1303470.12676824086429446370.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539550819.286939.1268332875889175195.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653806683.2770958.11300984379283401542.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789089556.6155.14603302893431820997.stgit@warthog.procyon.org.uk/ # v6
2021-04-23 10:17:26 +01:00
David Howells
67d78a6f6e afs: Pass page into dirty region helpers to provide THP size
Pass a pointer to the page being accessed into the dirty region helpers so
that the size of the page can be determined in case it's a transparent huge
page.

This also required the page to be passed into the afs_page_dirty trace
point - so there's no need to specifically pass in the index or private
data as these can be retrieved directly from the page struct.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/160588527183.3465195.16107942526481976308.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118144921.1232039.11377711180492625929.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161040747.2537118.11435394902674511430.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340404553.1303470.11414163641767769882.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539548385.286939.8864598314493255313.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653804285.2770958.3497360004849598038.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789087043.6155.16922142208140170528.stgit@warthog.procyon.org.uk/ # v6
2021-04-23 10:17:26 +01:00
David Howells
03ffae9092 afs: Disable use of the fscache I/O routines
Disable use of the fscache I/O routined by the AFS filesystem.  It's about
to transition to passing iov_iters down and fscache is about to have its
I/O path to use iov_iter, so all that needs to change.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/158861209824.340223.1864211542341758994.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/159465768717.1376105.2229314852486665807.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/160588457929.3465195.1730097418904945578.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118143744.1232039.2727898205333669064.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161039077.2537118.7986870854927176905.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340403323.1303470.8159439948319423431.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539547167.286939.3536238932531122332.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653802797.2770958.547311814861545911.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789085806.6155.2596146255056027428.stgit@warthog.procyon.org.uk/ # v6
2021-04-23 10:17:25 +01:00
David Howells
a7889c6320 afs: Stop listxattr() from listing "afs.*" attributes
afs_listxattr() lists all the available special afs xattrs (i.e. those in
the "afs.*" space), no matter what type of server we're dealing with.  But
OpenAFS servers, for example, cannot deal with some of the extra-capable
attributes that AuriStor (YFS) servers provide.  Unfortunately, the
presence of the afs.yfs.* attributes causes errors[1] for anything that
tries to read them if the server is of the wrong type.

Fix the problem by removing afs_listxattr() so that none of the special
xattrs are listed (AFS doesn't support xattrs).  It does mean, however,
that getfattr won't list them, though they can still be accessed with
getxattr() and setxattr().

This can be tested with something like:

	getfattr -d -m ".*" /afs/example.com/path/to/file

With this change, none of the afs.* attributes should be visible.

Changes:
ver #2:
 - Hide all of the afs.* xattrs, not just the ACL ones.

Fixes: ae46578b96 ("afs: Get YFS ACLs and information through xattrs")
Reported-by: Gaja Sophie Peters <gaja.peters@math.uni-hamburg.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Gaja Sophie Peters <gaja.peters@math.uni-hamburg.de>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003502.html [1]
Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003567.html # v1
Link: http://lists.infradead.org/pipermail/linux-afs/2021-March/003573.html # v2
2021-03-15 17:09:54 +00:00
David Howells
f86726a69d afs: Fix afs_invalidatepage to adjust the dirty region
Fix afs_invalidatepage() to adjust the dirty region recorded in
page->private when truncating a page.  If the dirty region is entirely
removed, then the private data is cleared and the page dirty state is
cleared.

Without this, if the page is truncated and then expanded again by truncate,
zeros from the expanded, but no-longer dirty region may get written back to
the server if the page gets laundered due to a conflicting 3rd-party write.

It mustn't, however, shorten the dirty region of the page if that page is
still mmapped and has been marked dirty by afs_page_mkwrite(), so a flag is
stored in page->private to record this.

Fixes: 4343d00872 ("afs: Get rid of the afs_writeback record")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-10-29 13:53:04 +00:00
David Howells
fa04a40b16 afs: Fix to take ref on page when PG_private is set
Fix afs to take a ref on a page when it sets PG_private on it and to drop
the ref when removing the flag.

Note that in afs_write_begin(), a lot of the time, PG_private is already
set on a page to which we're going to add some data.  In such a case, we
leave the bit set and mustn't increment the page count.

As suggested by Matthew Wilcox, use attach/detach_page_private() where
possible.

Fixes: 31143d5d51 ("AFS: implement basic file write support")
Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2020-10-29 13:53:04 +00:00
David Howells
06a17bbe1d afs: Fix copy_file_range()
The prevention of splice-write without explicit ops made the
copy_file_write() syscall to an afs file (as done by the generic/112
xfstest) fail with EINVAL.

Fix by using iter_file_splice_write() for afs.

Fixes: 36e2c7421f ("fs: don't allow splice read/write without explicit ops")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-10-27 22:05:56 +00:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
David Howells
728279a5a1 afs: Fix use of afs_check_for_remote_deletion()
afs_check_for_remote_deletion() checks to see if error ENOENT is returned
by the server in response to an operation and, if so, marks the primary
vnode as having been deleted as the FID is no longer valid.

However, it's being called from the operation success functions, where no
abort has happened - and if an inline abort is recorded, it's handled by
afs_vnode_commit_status().

Fix this by actually calling the operation aborted method if provided and
having that point to afs_check_for_remote_deletion().

Fixes: e49c7b2f6d ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-16 16:26:57 +01:00
David Howells
e49c7b2f6d afs: Build an abstraction around an "operation" concept
Turn the afs_operation struct into the main way that most fileserver
operations are managed.  Various things are added to the struct, including
the following:

 (1) All the parameters and results of the relevant operations are moved
     into it, removing corresponding fields from the afs_call struct.
     afs_call gets a pointer to the op.

 (2) The target volume is made the main focus of the operation, rather than
     the target vnode(s), and a bunch of op->vnode->volume are made
     op->volume instead.

 (3) Two vnode records are defined (op->file[]) for the vnode(s) involved
     in most operations.  The vnode record (struct afs_vnode_param)
     contains:

	- The vnode pointer.

	- The fid of the vnode to be included in the parameters or that was
          returned in the reply (eg. FS.MakeDir).

	- The status and callback information that may be returned in the
     	  reply about the vnode.

	- Callback break and data version tracking for detecting
          simultaneous third-parth changes.

 (4) Pointers to dentries to be updated with new inodes.

 (5) An operations table pointer.  The table includes pointers to functions
     for issuing AFS and YFS-variant RPCs, handling the success and abort
     of an operation and handling post-I/O-lock local editing of a
     directory.

To make this work, the following function restructuring is made:

 (A) The rotation loop that issues calls to fileservers that can be found
     in each function that wants to issue an RPC (such as afs_mkdir()) is
     extracted out into common code, in a new file called fs_operation.c.

 (B) The rotation loops, such as the one in afs_mkdir(), are replaced with
     a much smaller piece of code that allocates an operation, sets the
     parameters and then calls out to the common code to do the actual
     work.

 (C) The code for handling the success and failure of an operation are
     moved into operation functions (as (5) above) and these are called
     from the core code at appropriate times.

 (D) The pseudo inode getting stuff used by the dynamic root code is moved
     over into dynroot.c.

 (E) struct afs_iget_data is absorbed into the operation struct and
     afs_iget() expects to be given an op pointer and a vnode record.

 (F) Point (E) doesn't work for the root dir of a volume, but we know the
     FID in advance (it's always vnode 1, unique 1), so a separate inode
     getter, afs_root_iget(), is provided to special-case that.

 (G) The inode status init/update functions now also take an op and a vnode
     record.

 (H) The RPC marshalling functions now, for the most part, just take an
     afs_operation struct as their only argument.  All the data they need
     is held there.  The result delivery functions write their answers
     there as well.

 (I) The call is attached to the operation and then the operation core does
     the waiting.

And then the new operation code is, for the moment, made to just initialise
the operation, get the appropriate vnode I/O locks and do the same rotation
loop as before.

This lays the foundation for the following changes in the future:

 (*) Overhauling the rotation (again).

 (*) Support for asynchronous I/O, where the fileserver rotation must be
     done asynchronously also.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-06-04 15:37:17 +01:00
David Howells
a310082f6d afs: Rename struct afs_fs_cursor to afs_operation
As a prelude to implementing asynchronous fileserver operations in the afs
filesystem, rename struct afs_fs_cursor to afs_operation.

This struct is going to form the core of the operation management and is
going to acquire more members in later.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-05-31 15:19:52 +01:00
David Howells
0b9c0174d6 afs: Rename desc -> req in afs_fetch_data()
Rename the desc parameter to req in afs_fetch_data() for consistency with
other functions.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-11-21 09:12:17 +00:00
Jia-Ju Bai
a6eed4ab5d fs: afs: Fix a possible null-pointer dereference in afs_put_read()
In afs_read_dir(), there is an if statement on line 255 to check whether
req->pages is NULL:
	if (!req->pages)
		goto error;

If req->pages is NULL, afs_put_read() on line 337 is executed.
In afs_put_read(), req->pages[i] is used on line 195.
Thus, a possible null-pointer dereference may occur in this case.

To fix this possible bug, an if statement is added in afs_put_read() to
check req->pages.

This bug is found by a static analysis tool STCheck written by us.

Fixes: f3ddee8dc4 ("afs: Fix directory handling")
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-07-30 14:38:51 +01:00
Linus Torvalds
8dda9957e3 AFS development
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAXRyW8vu3V2unywtrAQIhsw//cVtxLx4ZCox5Z/93cdqych8RoCrwcUEG
 Cli0NAjlp/0HETvCsIqdkPKf+4OYCW1tHB2KTdbFdQLZptLgoEhykx89k70z9ggb
 ViieEa1GvAKhdamVqkPUC+3Q33uzyRaK7Gi5N3phJoaO+o328SlrPG0LerQgY0Np
 Rf3je56A1gIjEgWTmpStxiY262jlgaR3IuvpOqbu2G0TQVWV8CsBKw61fTdmEEQp
 dIkNO/xFXS+PvPdmQe5zCAjD/W2D+ggeBMbBwHF411qA60plGinubBYKZ98ikliZ
 OnQQPExI7mroIMzpYT+rzEQyxui2nz5t+Hj+d6t7iIvitNcX/Q53sVTq3RfQ0FjG
 QCd+j/l2p7fkXK4Sxgb/UBkj/pRr6W+FYSbQ/tmpD8UypEf5B3ln6GuA6yTMuNRF
 wVb744slKWq0c7KUuXmz806B2qJoyFG206jyFnoByvs6cPmB1+JqhBBYOKHcwjbo
 HIK+oUKkEfE6ofjQ3B9xOQ1anfbRnjjfJCmXvns9v57y/nRP2P78HUJNnEsOolk2
 nc3Ep41OgeZdwkts9KnSjmwy6VF3UZ2NQEiWXsUIOxGMtcodw9ci1bpquJ71oyut
 4sFMJvMU4eJD+XuCOlAgpbTaQ0Wuf11kFpl1Cof4fj0Z09C25Ahj6iKEKnumtO+4
 edfNLlwO6oo=
 =wgib
 -----END PGP SIGNATURE-----

Merge tag 'afs-next-20190628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull afs updates from David Howells:
 "A set of minor changes for AFS:

   - Remove an unnecessary check in afs_unlink()

   - Add a tracepoint for tracking callback management

   - Add a tracepoint for afs_server object usage

   - Use struct_size()

   - Add mappings for AFS UAE abort codes to Linux error codes, using
     symbolic names rather than hex numbers in the .c file"

* tag 'afs-next-20190628' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Add support for the UAE error table
  fs/afs: use struct_size() in kzalloc()
  afs: Trace afs_server usage
  afs: Add some callback management tracepoints
  afs: afs_unlink() doesn't need to check dentry->d_inode
2019-07-10 20:55:33 -07:00
Zhengyuan Liu
ee102584ef fs/afs: use struct_size() in kzalloc()
As Gustavo said in other patches doing the same replace, we can now
use the new struct_size() helper to avoid leaving these open-coded and
prone to type mistake.

Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-20 18:12:17 +01:00
Thomas Gleixner
2874c5fd28 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:26:32 -07:00
David Howells
a58823ac45 afs: Fix application of status and callback to be under same lock
When applying the status and callback in the response of an operation,
apply them in the same critical section so that there's no race between
checking the callback state and checking status-dependent state (such as
the data version).

Fix this by:

 (1) Allocating a joint {status,callback} record (afs_status_cb) before
     calling the RPC function for each vnode for which the RPC reply
     contains a status or a status plus a callback.  A flag is set in the
     record to indicate if a callback was actually received.

 (2) These records are passed into the RPC functions to be filled in.  The
     afs_decode_status() and yfs_decode_status() functions are removed and
     the cb_lock is no longer taken.

 (3) xdr_decode_AFSFetchStatus() and xdr_decode_YFSFetchStatus() no longer
     update the vnode.

 (4) xdr_decode_AFSCallBack() and xdr_decode_YFSCallBack() no longer update
     the vnode.

 (5) vnodes, expected data-version numbers and callback break counters
     (cb_break) no longer need to be passed to the reply delivery
     functions.

     Note that, for the moment, the file locking functions still need
     access to both the call and the vnode at the same time.

 (6) afs_vnode_commit_status() is now given the cb_break value and the
     expected data_version and the task of applying the status and the
     callback to the vnode are now done here.

     This is done under a single taking of vnode->cb_lock.

 (7) afs_pages_written_back() is now called by afs_store_data() rather than
     by the reply delivery function.

     afs_pages_written_back() has been moved to before the call point and
     is now given the first and last page numbers rather than a pointer to
     the call.

 (8) The indicator from YFS.RemoveFile2 as to whether the target file
     actually got removed (status.abort_code == VNOVNODE) rather than
     merely dropping a link is now checked in afs_unlink rather than in
     xdr_decode_YFSFetchStatus().

Supplementary fixes:

 (*) afs_cache_permit() now gets the caller_access mask from the
     afs_status_cb object rather than picking it out of the vnode's status
     record.  afs_fetch_status() returns caller_access through its argument
     list for this purpose also.

 (*) afs_inode_init_from_status() now uses a write lock on cb_lock rather
     than a read lock and now sets the callback inside the same critical
     section.

Fixes: c435ee3455 ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 16:25:21 +01:00
David Howells
ffba718e93 afs: Get rid of afs_call::reply[]
Replace the afs_call::reply[] array with a bunch of typed members so that
the compiler can use type-checking on them.  It's also easier for the eye
to see what's going on.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 16:25:21 +01:00
David Howells
20b8391fff afs: Make some RPC operations non-interruptible
Make certain RPC operations non-interruptible, including:

 (*) Set attributes
 (*) Store data

     We don't want to get interrupted during a flush on close, flush on
     unlock, writeback or an inode update, leaving us in a state where we
     still need to do the writeback or update.

 (*) Extend lock
 (*) Release lock

     We don't want to get lock extension interrupted as the file locks on
     the server are time-limited.  Interruption during lock release is less
     of an issue since the lock is time-limited, but it's better to
     complete the release to avoid a several-minute wait to recover it.

     *Setting* the lock isn't a problem if it's interrupted since we can
      just return to the user and tell them they were interrupted - at
      which point they can elect to retry.

 (*) Silly unlink

     We want to remove silly unlink files if we can, rather than leaving
     them for the salvager to clear up.

Note that whilst these calls are no longer interruptible, they do have
timeouts on them, so if the server stops responding the call will fail with
something like ETIME or ECONNRESET.

Without this, the following:

	kAFS: Unexpected error from FS.StoreData -512

appears in dmesg when a pending store data gets interrupted and some
processes may just hang.

Additionally, make the code that checks/updates the server record ignore
failure due to interruption if the main call is uninterruptible and if the
server has an address list.  The next op will check it again since the
expiration time on the old list has past.

Fixes: d2ddc776a4 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Jonathan Billings <jsbillings@jsbillings.org>
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-16 16:25:20 +01:00
David Howells
a1b879eefc afs: Fix key leak in afs_release() and afs_evict_inode()
Fix afs_release() to go through the cleanup part of the function if
FMODE_WRITE is set rather than exiting through vfs_fsync() (which skips the
cleanup).  The cleanup involves discarding the refs on the key used for
file ops and the writeback key record.

Also fix afs_evict_inode() to clean up any left over wb keys attached to
the inode/vnode when it is removed.

Fixes: 5a81327616 ("afs: Do better accretion of small writes on newly created content")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-05-15 12:32:34 +01:00
Gustavo A. R. Silva
e690c9e3f4 afs: Mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that in many cases I placed a /* Fall through */ comment
at the bottom of the case, which what GCC is expecting to find.

In other cases I had to tweak a bit the format of the comments.

This patch suppresses ALL missing-break-in-switch false positives
in fs/afs

Addresses-Coverity-ID: 115042 ("Missing break in switch")
Addresses-Coverity-ID: 115043 ("Missing break in switch")
Addresses-Coverity-ID: 115045 ("Missing break in switch")
Addresses-Coverity-ID: 1357430 ("Missing break in switch")
Addresses-Coverity-ID: 115047 ("Missing break in switch")
Addresses-Coverity-ID: 115050 ("Missing break in switch")
Addresses-Coverity-ID: 115051 ("Missing break in switch")
Addresses-Coverity-ID: 1467806 ("Missing break in switch")
Addresses-Coverity-ID: 1467807 ("Missing break in switch")
Addresses-Coverity-ID: 1467811 ("Missing break in switch")
Addresses-Coverity-ID: 115041 ("Missing break in switch")
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2019-04-08 18:35:56 -05:00
Nikolay Borisov
f86196ea87 fs: don't open code lru_to_page()
Multiple filesystems open code lru_to_page().  Rectify this by moving
the macro from mm_inline (which is specific to lru stuff) to the more
generic mm.h header and start using the macro where appropriate.

No functional changes.

Link: http://lkml.kernel.org/r/20181129104810.23361-1-nborisov@suse.com
Link: https://lkml.kernel.org/r/20181129075301.29087-1-nborisov@suse.com
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Pankaj gupta <pagupta@redhat.com>
Acked-by: "Yan, Zheng" <zyan@redhat.com>		[ceph]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:48 -08:00
David Howells
3b6492df41 afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS
Increase the sizes of the volume ID to 64 bits and the vnode ID (inode
number equivalent) to 96 bits to allow the support of YFS.

This requires the iget comparator to check the vnode->fid rather than i_ino
and i_generation as i_ino is not sufficiently capacious.  It also requires
this data to be placed into the vnode cache key for fscache.

For the moment, just discard the top 32 bits of the vnode ID when returning
it though stat.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:08 +01:00
David Howells
68251f0a68 afs: Fix whole-volume callback handling
It's possible for an AFS file server to issue a whole-volume notification
that callbacks on all the vnodes in the file have been broken.  This is
done for R/O and backup volumes (which don't have per-file callbacks) and
for things like a volume being taken offline.

Fix callback handling to detect whole-volume notifications, to track it
across operations and to check it during inode validation.

Fixes: c435ee3455 ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
2018-05-14 15:15:18 +01:00
David Howells
5a81327616 afs: Do better accretion of small writes on newly created content
Processes like ld that do lots of small writes that aren't necessarily
contiguous result in a lot of small StoreData operations to the server, the
idea being that if someone else changes the data on the server, we only
write our changes over that and not the space between.  Further, we don't
want to write back empty space if we can avoid it to make it easier for the
server to do sparse files.

However, making lots of tiny RPC ops is a lot less efficient for the server
than one big one because each op requires allocation of resources and the
taking of locks, so we want to compromise a bit.

Reduce the load by the following:

 (1) If a file is just created locally or has just been truncated with
     O_TRUNC locally, allow subsequent writes to the file to be merged with
     intervening space if that space doesn't cross an entire intervening
     page.

 (2) Don't flush the file on ->flush() but rather on ->release() if the
     file was open for writing.

Just linking vmlinux.o, without this patch, looking in /proc/fs/afs/stats:

	file-wr : n=441 nb=513581204

and after the patch:

	file-wr : n=62 nb=513668555

there were 379 fewer StoreData RPC operations at the expense of an extra
87K being written.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells
76a5cb6fc1 afs: Add stats for data transfer operations
Add statistics to /proc/fs/afs/stats for data transfer RPC operations.  New
lines are added that look like:

	file-rd : n=55794 nb=10252282150
	file-wr : n=9789 nb=3247763645

where n= indicates the number of ops completed and nb= indicates the number
of bytes successfully transferred.  file-rd is the counts for read/fetch
operations and file-wr the counts for write/store operations.

Note that directory and symlink downloading are included in the file-rd
stats at the moment.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells
f3ddee8dc4 afs: Fix directory handling
AFS directories are structured blobs that are downloaded just like files
and then parsed by the lookup and readdir code and, as such, are currently
handled in the pagecache like any other file, with the entire directory
content being thrown away each time the directory changes.

However, since the blob is a known structure and since the data version
counter on a directory increases by exactly one for each change committed
to that directory, we can actually edit the directory locally rather than
fetching it from the server after each locally-induced change.

What we can't do, though, is mix data from the server and data from the
client since the server is technically at liberty to rearrange or compress
a directory if it sees fit, provided it updates the data version number
when it does so and breaks the callback (ie. sends a notification).

Further, lookup with lookup-ahead, readdir and, when it arrives, local
editing are likely want to scan the whole of a directory.

So directory handling needs to be improved to maintain the coherency of the
directory blob prior to permitting local directory editing.

To this end:

 (1) If any directory page gets discarded, invalidate and reread the entire
     directory.

 (2) If readpage notes that if when it fetches a single page that the
     version number has changed, the entire directory is flagged for
     invalidation.

 (3) Read as much of the directory in one go as we can.

Note that this removes local caching of directories in fscache for the
moment as we can't pass the pages to fscache_read_or_alloc_pages() since
page->lru is in use by the LRU.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09 21:54:48 +01:00
David Howells
ee1235a9a0 fscache: Pass object size in rather than calling back for it
Pass the object size in to fscache_acquire_cookie() and
fscache_write_page() rather than the netfs providing a callback by which it
can be received.  This makes it easier to update the size of the object
when a new page is written that extends the object.

The current object size is also passed by fscache to the check_aux
function, obviating the need to store it in the aux data.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Anna Schumaker <anna.schumaker@netapp.com>
Tested-by: Steve Dickson <steved@redhat.com>
2018-04-06 14:05:14 +01:00
David Howells
13524ab3c6 afs: Trace page dirty/clean
Add a trace event that logs the dirtying and cleaning of pages attached to
AFS inodes.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:21 +00:00
David Howells
1cf7a1518a afs: Implement shared-writeable mmap
Implement shared-writeable mmap for AFS.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:21 +00:00
David Howells
4343d00872 afs: Get rid of the afs_writeback record
Get rid of the afs_writeback record that kAFS is using to match keys with
writes made by that key.

Instead, keep a list of keys that have a file open for writing and/or
sync'ing and iterate through those.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:20 +00:00
David Howells
215804a992 afs: Introduce a file-private data record
Introduce a file-private data record for kAFS and put the key into it
rather than storing the key in file->private_data.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:20 +00:00
David Howells
dab17c1add afs: Fix directory read/modify race
Because parsing of the directory wasn't being done under any sort of lock,
the pages holding the directory content can get invalidated whilst the
parsing is ongoing.

Further, the directory page check function gets called outside of the page
lock, so if the page gets cleared or updated, this may return reports of
bad magic numbers in the directory page.

Also, the directory may change size whilst checking and parsing are
ongoing, so more care needs to be taken here.

Fix this by:

 (1) Perform the page check from the page filling function before we set
     PageUptodate and drop the page lock.

 (2) Check for the file having shrunk and the page having been abandoned
     before checking the page contents.

 (3) Lock the page whilst parsing it for the directory iterator.

Whilst we're at it, add a tracepoint to report check failure.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:20 +00:00
David Howells
d2ddc776a4 afs: Overhaul volume and server record caching and fileserver rotation
The current code assumes that volumes and servers are per-cell and are
never shared, but this is not enforced, and, indeed, public cells do exist
that are aliases of each other.  Further, an organisation can, say, set up
a public cell and a private cell with overlapping, but not identical, sets
of servers.  The difference is purely in the database attached to the VL
servers.

The current code will malfunction if it sees a server in two cells as it
assumes global address -> server record mappings and that each server is in
just one cell.

Further, each server may have multiple addresses - and may have addresses
of different families (IPv4 and IPv6, say).

To this end, the following structural changes are made:

 (1) Server record management is overhauled:

     (a) Server records are made independent of cell.  The namespace keeps
     	 track of them, volume records have lists of them and each vnode
     	 has a server on which its callback interest currently resides.

     (b) The cell record no longer keeps a list of servers known to be in
     	 that cell.

     (c) The server records are now kept in a flat list because there's no
     	 single address to sort on.

     (d) Server records are now keyed by their UUID within the namespace.

     (e) The addresses for a server are obtained with the VL.GetAddrsU
     	 rather than with VL.GetEntryByName, using the server's UUID as a
     	 parameter.

     (f) Cached server records are garbage collected after a period of
     	 non-use and are counted out of existence before purging is allowed
     	 to complete.  This protects the work functions against rmmod.

     (g) The servers list is now in /proc/fs/afs/servers.

 (2) Volume record management is overhauled:

     (a) An RCU-replaceable server list is introduced.  This tracks both
     	 servers and their coresponding callback interests.

     (b) The superblock is now keyed on cell record and numeric volume ID.

     (c) The volume record is now tied to the superblock which mounts it,
     	 and is activated when mounted and deactivated when unmounted.
     	 This makes it easier to handle the cache cookie without causing a
     	 double-use in fscache.

     (d) The volume record is loaded from the VLDB using VL.GetEntryByNameU
     	 to get the server UUID list.

     (e) The volume name is updated if it is seen to have changed when the
     	 volume is updated (the update is keyed on the volume ID).

 (3) The vlocation record is got rid of and VLDB records are no longer
     cached.  Sufficient information is stored in the volume record, though
     an update to a volume record is now no longer shared between related
     volumes (volumes come in bundles of three: R/W, R/O and backup).

and the following procedural changes are made:

 (1) The fileserver cursor introduced previously is now fleshed out and
     used to iterate over fileservers and their addresses.

 (2) Volume status is checked during iteration, and the server list is
     replaced if a change is detected.

 (3) Server status is checked during iteration, and the address list is
     replaced if a change is detected.

 (4) The abort code is saved into the address list cursor and -ECONNABORTED
     returned in afs_make_call() if a remote abort happened rather than
     translating the abort into an error message.  This allows actions to
     be taken depending on the abort code more easily.

     (a) If a VMOVED abort is seen then this is handled by rechecking the
     	 volume and restarting the iteration.

     (b) If a VBUSY, VRESTARTING or VSALVAGING abort is seen then this is
         handled by sleeping for a short period and retrying and/or trying
         other servers that might serve that volume.  A message is also
         displayed once until the condition has cleared.

     (c) If a VOFFLINE abort is seen, then this is handled as VBUSY for the
     	 moment.

     (d) If a VNOVOL abort is seen, the volume is rechecked in the VLDB to
     	 see if it has been deleted; if not, the fileserver is probably
     	 indicating that the volume couldn't be attached and needs
     	 salvaging.

     (e) If statfs() sees one of these aborts, it does not sleep, but
     	 rather returns an error, so as not to block the umount program.

 (5) The fileserver iteration functions in vnode.c are now merged into
     their callers and more heavily macroised around the cursor.  vnode.c
     is removed.

 (6) Operations on a particular vnode are serialised on that vnode because
     the server will lock that vnode whilst it operates on it, so a second
     op sent will just have to wait.

 (7) Fileservers are probed with FS.GetCapabilities before being used.
     This is where service upgrade will be done.

 (8) A callback interest on a fileserver is set up before an FS operation
     is performed and passed through to afs_make_call() so that it can be
     set on the vnode if the operation returns a callback.  The callback
     interest is passed through to afs_iget() also so that it can be set
     there too.

In general, record updating is done on an as-needed basis when we try to
access servers, volumes or vnodes rather than offloading it to work items
and special threads.

Notes:

 (1) Pre AFS-3.4 servers are no longer supported, though this can be added
     back if necessary (AFS-3.4 was released in 1998).

 (2) VBUSY is retried forever for the moment at intervals of 1s.

 (3) /proc/fs/afs/<cell>/servers no longer exists.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:19 +00:00
David Howells
97e3043ad8 afs: Condense afs_call's reply{,2,3,4} into an array
Condense struct afs_call's reply anchor members - reply{,2,3,4} - into an
array.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13 15:38:17 +00:00
David Howells
d3e3b7eac8 afs: Add metadata xattrs
Add xattrs to allow the user to get/set metadata in lieu of having pioctl()
available.  The following xattrs are now available:

 - "afs.cell"

   The name of the cell in which the vnode's volume resides.

 - "afs.fid"

   The volume ID, vnode ID and vnode uniquifier of the file as three hex
   numbers separated by colons.

 - "afs.volume"

   The name of the volume in which the vnode resides.

For example:

	# getfattr -d -m ".*" /mnt/scratch
	getfattr: Removing leading '/' from absolute path names
	# file: mnt/scratch
	afs.cell="mycell.myorg.org"
	afs.fid="10000b:1:1"
	afs.volume="scratch"

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-09 14:40:12 -07:00
David Howells
68ae849d7e afs: Don't set PG_error on local EINTR or ENOMEM when filling a page
Don't set PG_error on a page if we get local EINTR or ENOMEM when filling a
page for writing.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16 16:27:48 +00:00
David Howells
58fed94dfb afs: Flush outstanding writes when an fd is closed
Flush outstanding writes in afs when an fd is closed.  This is what NFS and
CIFS do.

Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16 16:27:45 +00:00
David Howells
6db3ac3c4b afs: Handle better the server returning excess or short data
When an AFS server is given an FS.FetchData{,64} request to read data from
a file, it is permitted by the protocol to return more or less than was
requested.  kafs currently relies on the latter behaviour in readpage{,s}
to handle a partial page at the end of the file (we just ask for a whole
page and clear space beyond the short read).

However, we don't handle all cases.  Add:

 (1) Handle excess data by discarding it rather than aborting.  Note that
     we use a common static buffer to discard into so that the decryption
     algorithm advances the PCBC state.

 (2) Handle a short read that affects more than just the last page.

Note that if a read comes up unexpectedly short of long, it's possible that
the server's copy of the file changed - in which case the data version
number will have been incremented and the callback will have been broken -
in which case all the pages currently attached to the inode will be zapped
anyway at some point.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-03-16 16:27:44 +00:00