linux/include/trace/events/netfs.h

688 lines
21 KiB
C

netfs: Add tracepoints

Add three tracepoints to track the activity of the read helpers:

 (1) netfs/netfs_read

     This logs entry to the read helpers and also expansion of the range in
     a readahead request.

 (2) netfs/netfs_rreq

     This logs the progress of netfs_read_request objects which track read
     requests.  A read request may be a compound of multiple subrequests.

 (3) netfs/netfs_sreq

     This logs the progress of netfs_read_subrequest objects, which track
     the contributions from various sources to a read request.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-and-tested-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-mm@kvack.org
cc: linux-cachefs@redhat.com
cc: linux-afs@lists.infradead.org
cc: linux-nfs@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: ceph-devel@vger.kernel.org
cc: v9fs-developer@lists.sourceforge.net
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/161118138060.1232039.5353374588021776217.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161033468.2537118.14021843889844001905.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340395843.1303470.7355519662919639648.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539538693.286939.10171713520419106334.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653796447.2770958.1870655382450862155.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789078003.6155.17814844411672989942.stgit@warthog.procyon.org.uk/ # v6
2020-09-18 16:25:13 +08:00
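
The header that follows declares its trace names as lists of EM()/E_() pairs. Lists of that shape are typically expanded more than once by redefining the two macros between passes: once to produce enum constants, and once to produce a value-to-string table for printing. The sketch below shows that technique in plain userspace C. It is an illustration only, not the kernel's code: the demo_* names, the table layout and the lookup helper are hypothetical, and the real header relies on the tracepoint infrastructure rather than a hand-written lookup.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical list, shaped like the EM()/E_() lists in the header.
 * EM() is used for every entry but the last; E_() ends the list. */
#define demo_read_traces \
	EM(demo_read_trace_readahead,	"READAHEAD") \
	EM(demo_read_trace_readpage,	"READPAGE ") \
	E_(demo_read_trace_write_begin,	"WRITEBEGN")

/* Pass 1: expand the list into enum constants. */
#define EM(a, b) a,
#define E_(a, b) a
enum demo_read_trace { demo_read_traces };
#undef EM
#undef E_

/* Pass 2: expand the same list into a value-to-name table. */
#define EM(a, b) { a, b },
#define E_(a, b) { a, b }
static const struct {
	int value;
	const char *name;
} demo_read_trace_names[] = { demo_read_traces };
#undef EM
#undef E_

/* Look up the display string for a trace value; "?" if unknown. */
static const char *demo_read_trace_name(int value)
{
	size_t n = sizeof(demo_read_trace_names) /
		   sizeof(demo_read_trace_names[0]);
	size_t i;

	for (i = 0; i < n; i++)
		if (demo_read_trace_names[i].value == value)
			return demo_read_trace_names[i].name;
	return "?";
}

#ifdef DEMO_STANDALONE
/* Build with -DDEMO_STANDALONE to print every entry in the table. */
int main(void)
{
	int v;

	for (v = demo_read_trace_readahead; v <= demo_read_trace_write_begin; v++)
		printf("%d -> \"%s\"\n", v, demo_read_trace_name(v));
	return 0;
}
#endif
```

Keeping the names and strings in one list means a new trace value only has to be added in one place, and the enum and the printed names cannot drift apart.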
/* SPDX-License-Identifier: GPL-2.0-or-later */
/* Network filesystem support module tracepoints
 *
 * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
 * Written by David Howells (dhowells@redhat.com)
 */
#undef TRACE_SYSTEM
#define TRACE_SYSTEM netfs
#if !defined(_TRACE_NETFS_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_NETFS_H
#include <linux/tracepoint.h>
/*
 * Define enums for tracing information.
 */
#define netfs_read_traces \
	EM(netfs_read_trace_dio_read, "DIO-READ ") \
	EM(netfs_read_trace_expanded, "EXPANDED ") \
	EM(netfs_read_trace_readahead, "READAHEAD") \
	EM(netfs_read_trace_readpage, "READPAGE ") \
	EM(netfs_read_trace_prefetch_for_write, "PREFETCHW") \
	E_(netfs_read_trace_write_begin, "WRITEBEGN")

#define netfs_write_traces \
	EM(netfs_write_trace_copy_to_cache, "COPY2CACH") \
	EM(netfs_write_trace_dio_write, "DIO-WRITE") \
	EM(netfs_write_trace_unbuffered_write, "UNB-WRITE") \
	EM(netfs_write_trace_writeback, "WRITEBACK") \
	E_(netfs_write_trace_writethrough, "WRITETHRU")

#define netfs_rreq_origins \
	EM(NETFS_READAHEAD, "RA") \
	EM(NETFS_READPAGE, "RP") \
	EM(NETFS_READ_FOR_WRITE, "RW") \
	EM(NETFS_COPY_TO_CACHE, "CC") \
	EM(NETFS_WRITEBACK, "WB") \
	EM(NETFS_WRITETHROUGH, "WT") \
	EM(NETFS_UNBUFFERED_WRITE, "UW") \
	EM(NETFS_DIO_READ, "DR") \
	E_(NETFS_DIO_WRITE, "DW")

#define netfs_rreq_traces \
	EM(netfs_rreq_trace_assess, "ASSESS ") \
	EM(netfs_rreq_trace_copy, "COPY   ") \
	EM(netfs_rreq_trace_collect, "COLLECT") \
	EM(netfs_rreq_trace_done, "DONE   ") \
	EM(netfs_rreq_trace_free, "FREE   ") \
	EM(netfs_rreq_trace_redirty, "REDIRTY") \
	EM(netfs_rreq_trace_resubmit, "RESUBMT") \
EM(netfs_rreq_trace_set_pause, "PAUSE ") \
EM(netfs_rreq_trace_unlock, "UNLOCK ") \
EM(netfs_rreq_trace_unmark, "UNMARK ") \
EM(netfs_rreq_trace_wait_ip, "WAIT-IP") \
EM(netfs_rreq_trace_wait_pause, "WT-PAUS") \
EM(netfs_rreq_trace_wake_ip, "WAKE-IP") \
EM(netfs_rreq_trace_unpause, "UNPAUSE") \
E_(netfs_rreq_trace_write_done, "WR-DONE")
#define netfs_sreq_sources \
EM(NETFS_FILL_WITH_ZEROES, "ZERO") \
EM(NETFS_DOWNLOAD_FROM_SERVER, "DOWN") \
EM(NETFS_READ_FROM_CACHE, "READ") \
EM(NETFS_INVALID_READ, "INVL") \
EM(NETFS_UPLOAD_TO_SERVER, "UPLD") \
EM(NETFS_WRITE_TO_CACHE, "WRIT") \
E_(NETFS_INVALID_WRITE, "INVL")
#define netfs_sreq_traces \
EM(netfs_sreq_trace_discard, "DSCRD") \
EM(netfs_sreq_trace_download_instead, "RDOWN") \
EM(netfs_sreq_trace_fail, "FAIL ") \
EM(netfs_sreq_trace_free, "FREE ") \
EM(netfs_sreq_trace_limited, "LIMIT") \
EM(netfs_sreq_trace_prepare, "PREP ") \
EM(netfs_sreq_trace_prep_failed, "PRPFL") \
EM(netfs_sreq_trace_resubmit_short, "SHORT") \
EM(netfs_sreq_trace_retry, "RETRY") \
EM(netfs_sreq_trace_submit, "SUBMT") \
EM(netfs_sreq_trace_terminated, "TERM ") \
EM(netfs_sreq_trace_write, "WRITE") \
EM(netfs_sreq_trace_write_skip, "SKIP ") \
E_(netfs_sreq_trace_write_term, "WTERM")
#define netfs_failures \
EM(netfs_fail_check_write_begin, "check-write-begin") \
EM(netfs_fail_copy_to_cache, "copy-to-cache") \
EM(netfs_fail_dio_read_short, "dio-read-short") \
EM(netfs_fail_dio_read_zero, "dio-read-zero") \
EM(netfs_fail_read, "read") \
EM(netfs_fail_short_read, "short-read") \
EM(netfs_fail_prepare_write, "prep-write") \
E_(netfs_fail_write, "write")
#define netfs_rreq_ref_traces \
EM(netfs_rreq_trace_get_for_outstanding,"GET OUTSTND") \
EM(netfs_rreq_trace_get_subreq, "GET SUBREQ ") \
EM(netfs_rreq_trace_get_work, "GET WORK ") \
EM(netfs_rreq_trace_put_complete, "PUT COMPLT ") \
EM(netfs_rreq_trace_put_discard, "PUT DISCARD") \
EM(netfs_rreq_trace_put_failed, "PUT FAILED ") \
EM(netfs_rreq_trace_put_no_submit, "PUT NO-SUBM") \
EM(netfs_rreq_trace_put_return, "PUT RETURN ") \
EM(netfs_rreq_trace_put_subreq, "PUT SUBREQ ") \
EM(netfs_rreq_trace_put_work, "PUT WORK ") \
EM(netfs_rreq_trace_put_work_complete, "PUT WORK CP") \
EM(netfs_rreq_trace_put_work_nq, "PUT WORK NQ") \
netfs: Dispatch write requests to process a writeback slice Dispatch one or more write requests to process a writeback slice, where a slice is tailored more to logical block divisions within the file (such as crypto blocks, an object layout or cache granules) than the protocol RPC maximum capacity. The dispatch doesn't happen until throttling allows, at which point the entire writeback slice is processed and queued. A slice may be written to multiple destinations (one or more servers and the local cache) and the writes to each destination might be split up along different lines. The writeback slice holds the required folios pinned. An iov_iter is provided in netfs_write_request that describes the buffer to be used. This may be part of the pagecache, may have auxiliary padding pages attached or may be a bounce buffer resulting from crypto or compression. Consequently, the filesystem must not twiddle the folio markings directly. The following API is available to the filesystem: (1) The ->create_write_requests() method is called to ask the filesystem to create the requests it needs. This is passed the writeback slice to be processed. (2) The filesystem should then call netfs_create_write_request() to create the requests it needs. (3) Once a request is initialised, netfs_queue_write_request() can be called to dispatch it asynchronously, if not completed immediately. (4) netfs_write_request_completed() should be called to note the completion of a request. (5) netfs_get_write_request() and netfs_put_write_request() are provided to refcount a request. These take constants from the netfs_wreq_trace enum for logging into ftrace. (6) The ->free_write_request() method is called to ask the filesystem to clean up a request. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2021-06-30 05:31:48 +08:00
EM(netfs_rreq_trace_see_work, "SEE WORK ") \
E_(netfs_rreq_trace_new, "NEW ")
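The EM()/E_() lists in this header are written once and can then be expanded more than once by redefining the two macros. The expansion itself is not shown in this part of the file, so the following is only a userspace sketch of the general technique, assuming a hypothetical list demo_ref_traces and hypothetical names prefixed demo_; it is not the kernel's own expansion code.

```c
#include <stddef.h>

/* Hypothetical list in the same shape as the netfs_*_traces macros. */
#define demo_ref_traces \
	EM(demo_trace_get_submit,	"GET SUBMIT ") \
	EM(demo_trace_put_done,		"PUT DONE   ") \
	E_(demo_trace_new,		"NEW        ")

/* Pass 1: expand the list into enum constants. */
#define EM(a, b) a,
#define E_(a, b) a
enum demo_ref_trace { demo_ref_traces };
#undef EM
#undef E_

/* Pass 2: expand the same list into a value-to-label table. */
struct demo_trace_name {
	int value;
	const char *label;
};

#define EM(a, b) { a, b },
#define E_(a, b) { a, b }
static const struct demo_trace_name demo_ref_names[] = { demo_ref_traces };
#undef EM
#undef E_

/* Look up the label for a trace value; returns "?" if it is unknown. */
static const char *demo_trace_label(int value)
{
	size_t i;

	for (i = 0; i < sizeof(demo_ref_names) / sizeof(demo_ref_names[0]); i++)
		if (demo_ref_names[i].value == value)
			return demo_ref_names[i].label;
	return "?";
}
```

Keeping the constant and its label on one line means a new trace point needs a single added entry, and the enum and the label table cannot drift apart.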
#define netfs_sreq_ref_traces \
EM(netfs_sreq_trace_get_copy_to_cache, "GET COPY2C ") \
EM(netfs_sreq_trace_get_resubmit, "GET RESUBMIT") \
cifs: Implement netfslib hooks Provide implementation of the netfslib hooks that will be used by netfslib to ask cifs to set up and perform operations. Of particular note are (*) cifs_clamp_length() - This is used to negotiate the size of the next subrequest in a read request, taking into account the credit available and the rsize. The credits are attached to the subrequest. (*) cifs_req_issue_read() - This is used to issue a subrequest that has been set up and clamped. (*) cifs_prepare_write() - This prepares to fill a subrequest by picking a channel, reopening the file and requesting credits so that we can set the maximum size of the subrequest and also sets the maximum number of segments if we're doing RDMA. (*) cifs_issue_write() - This releases any unneeded credits and issues an asynchronous data write for the contiguous slice of file covered by the subrequest. This should possibly be folded in to all ->async_writev() ops and that called directly. (*) cifs_begin_writeback() - This gets the cached writable handle through which we do writeback (this does not affect writethrough, unbuffered or direct writes). At this point, cifs is not wired up to actually *use* netfslib; that will be done in a subsequent patch. Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <sfrench@samba.org> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-10-07 01:16:15 +08:00
EM(netfs_sreq_trace_get_submit, "GET SUBMIT") \
EM(netfs_sreq_trace_get_short_read, "GET SHORTRD") \
EM(netfs_sreq_trace_new, "NEW ") \
EM(netfs_sreq_trace_put_cancel, "PUT CANCEL ") \
EM(netfs_sreq_trace_put_clear, "PUT CLEAR ") \
EM(netfs_sreq_trace_put_discard, "PUT DISCARD") \
EM(netfs_sreq_trace_put_done, "PUT DONE ") \
EM(netfs_sreq_trace_put_failed, "PUT FAILED ") \
EM(netfs_sreq_trace_put_merged, "PUT MERGED ") \
EM(netfs_sreq_trace_put_no_copy, "PUT NO COPY") \
EM(netfs_sreq_trace_put_oom, "PUT OOM ") \
EM(netfs_sreq_trace_put_wip, "PUT WIP ") \
EM(netfs_sreq_trace_put_work, "PUT WORK ") \
E_(netfs_sreq_trace_put_terminated, "PUT TERM ")
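The GET and PUT labels above name the reason for each reference-count change, in the same spirit as the dispatch commit message's note that the get/put helpers take a trace constant for logging. A minimal userspace sketch of that pattern follows, assuming a hypothetical struct demo_sreq and hypothetical helpers demo_get() and demo_put(); the fprintf() call merely stands in for a tracepoint and none of these names come from the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical trace tags, modelled on the GET/PUT labels above. */
enum demo_sreq_trace {
	demo_sreq_trace_get_submit,
	demo_sreq_trace_put_done,
};

/* Hypothetical object; not the kernel's subrequest structure. */
struct demo_sreq {
	atomic_int ref;
	unsigned int debug_id;
};

/* Stand-in for a tracepoint: log the object, the new count and the reason. */
static void demo_trace_ref(const struct demo_sreq *s, int ref,
			   enum demo_sreq_trace what)
{
	fprintf(stderr, "sreq=%08x ref=%d what=%d\n", s->debug_id, ref, (int)what);
}

/* Take a reference and record why it was taken. */
static void demo_get(struct demo_sreq *s, enum demo_sreq_trace what)
{
	int ref = atomic_fetch_add(&s->ref, 1) + 1;

	demo_trace_ref(s, ref, what);
}

/* Drop a reference and record why; returns true if it was the last one. */
static bool demo_put(struct demo_sreq *s, enum demo_sreq_trace what)
{
	int ref = atomic_fetch_sub(&s->ref, 1) - 1;

	demo_trace_ref(s, ref, what);
	return ref == 0;
}
```

Passing the reason at every call site lets a trace log show which code path took or dropped each reference, which is what makes leaked or over-released objects traceable.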
#define netfs_folio_traces \
/* The first few correspond to enum netfs_how_to_modify */ \
EM(netfs_folio_is_uptodate, "mod-uptodate") \
EM(netfs_just_prefetch, "mod-prefetch") \
EM(netfs_whole_folio_modify, "mod-whole-f") \
EM(netfs_modify_and_clear, "mod-n-clear") \
EM(netfs_streaming_write, "mod-streamw") \
EM(netfs_streaming_write_cont, "mod-streamw+") \
EM(netfs_flush_content, "flush") \
EM(netfs_streaming_filled_page, "mod-streamw-f") \
EM(netfs_streaming_cont_filled_page, "mod-streamw-f+") \
/* The rest are for writeback */ \
EM(netfs_folio_trace_cancel_copy, "cancel-copy") \
EM(netfs_folio_trace_clear, "clear") \
EM(netfs_folio_trace_clear_cc, "clear-cc") \
EM(netfs_folio_trace_clear_g, "clear-g") \
EM(netfs_folio_trace_clear_s, "clear-s") \
netfs: Replace PG_fscache by setting folio->private and marking dirty

When dirty data is being written to the cache, setting/waiting on/clearing the fscache flag is always done in tandem with setting/waiting on/clearing the writeback flag. The netfslib buffered write routines wait on and set both flags and the write request cleanup clears both flags, so the fscache flag is almost superfluous.

The reason it isn't superfluous is because the fscache flag is also used to indicate that data just read from the server is being written to the cache. The flag is used to prevent a race involving overlapping direct-I/O writes to the cache.

Change this to indicate that a page is in need of being copied to the cache by placing a magic value in folio->private and marking the folios dirty. Then when the writeback code sees a folio marked in this way, it only writes it to the cache and not to the server.

If a folio that has this magic value set is modified, the value is just replaced and the folio will then be uploaded too.

With this, PG_fscache is no longer required by the netfslib core, 9p and afs.

Ceph and nfs, however, still need to use the old PG_fscache-based tracking. To deal with this, a flag, NETFS_ICTX_USE_PGPRIV2, now has to be set on the flags in the netfs_inode struct for those filesystems. This reenables the use of PG_fscache in that inode. 9p and afs use the netfslib write helpers so get switched over; cifs, for the moment, does page-by-page manual access to the cache, so doesn't use PG_fscache and is unaffected.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: Matthew Wilcox (Oracle) <willy@infradead.org>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Xiubo Li <xiubli@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
cc: Shyam Prasad N <sprasad@microsoft.com>
cc: Tom Talpey <tom@talpey.com>
cc: Bharath SM <bharathsm@microsoft.com>
cc: Trond Myklebust <trond.myklebust@hammerspace.com>
cc: Anna Schumaker <anna@kernel.org>
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
2024-03-19 18:00:09 +08:00
EM(netfs_folio_trace_copy_to_cache, "mark-copy") \
EM(netfs_folio_trace_filled_gaps, "filled-gaps") \
EM(netfs_folio_trace_kill, "kill") \
EM(netfs_folio_trace_kill_cc, "kill-cc") \
EM(netfs_folio_trace_kill_g, "kill-g") \
EM(netfs_folio_trace_kill_s, "kill-s") \
EM(netfs_folio_trace_mkwrite, "mkwrite") \
EM(netfs_folio_trace_mkwrite_plus, "mkwrite+") \
EM(netfs_folio_trace_not_under_wback, "!wback") \
EM(netfs_folio_trace_read_gaps, "read-gaps") \
EM(netfs_folio_trace_redirtied, "redirtied") \
EM(netfs_folio_trace_store, "store") \
EM(netfs_folio_trace_store_copy, "store-copy") \
EM(netfs_folio_trace_store_plus, "store+") \
EM(netfs_folio_trace_wthru, "wthru") \
E_(netfs_folio_trace_wthru_plus, "wthru+")
#define netfs_collect_contig_traces \
EM(netfs_contig_trace_collect, "Collect") \
EM(netfs_contig_trace_jump, "-->JUMP-->") \
E_(netfs_contig_trace_unlock, "Unlock")
#ifndef __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY
#define __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY
#undef EM
#undef E_
#define EM(a, b) a,
#define E_(a, b) a
enum netfs_read_trace { netfs_read_traces } __mode(byte);
netfs: Dispatch write requests to process a writeback slice

Dispatch one or more write requests to process a writeback slice, where a slice is tailored more to logical block divisions within the file (such as crypto blocks, an object layout or cache granules) than the protocol RPC maximum capacity.

The dispatch doesn't happen until throttling allows, at which point the entire writeback slice is processed and queued. A slice may be written to multiple destinations (one or more servers and the local cache) and the writes to each destination might be split up along different lines.

The writeback slice holds the required folios pinned. An iov_iter is provided in netfs_write_request that describes the buffer to be used. This may be part of the pagecache, may have auxiliary padding pages attached or may be a bounce buffer resulting from crypto or compression. Consequently, the filesystem must not twiddle the folio markings directly.

The following API is available to the filesystem:

 (1) The ->create_write_requests() method is called to ask the filesystem to create the requests it needs. This is passed the writeback slice to be processed.

 (2) The filesystem should then call netfs_create_write_request() to create the requests it needs.

 (3) Once a request is initialised, netfs_queue_write_request() can be called to dispatch it asynchronously, if not completed immediately.

 (4) netfs_write_request_completed() should be called to note the completion of a request.

 (5) netfs_get_write_request() and netfs_put_write_request() are provided to refcount a request. These take constants from the netfs_wreq_trace enum for logging into ftrace.

 (6) The ->free_write_request() method is called to ask the filesystem to clean up a request.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
2021-06-30 05:31:48 +08:00
enum netfs_write_trace { netfs_write_traces } __mode(byte);
enum netfs_rreq_trace { netfs_rreq_traces } __mode(byte);
enum netfs_sreq_trace { netfs_sreq_traces } __mode(byte);
enum netfs_failure { netfs_failures } __mode(byte);
enum netfs_rreq_ref_trace { netfs_rreq_ref_traces } __mode(byte);
enum netfs_sreq_ref_trace { netfs_sreq_ref_traces } __mode(byte);
enum netfs_folio_trace { netfs_folio_traces } __mode(byte);
enum netfs_collect_contig_trace { netfs_collect_contig_traces } __mode(byte);
#endif
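The EM()/E_() lists above are expanded more than once with different macro definitions, so a single list yields both the enum and the matching display strings. A minimal userspace sketch of that pattern follows. All demo_* names, the demo_symbol struct and the lookup function are hypothetical and exist only for illustration; the kernel's __mode(byte) attribute is omitted, and the kernel maps values to strings through its own tracing macros rather than a hand-written table like this one.

```c
#include <stddef.h>

/* Hypothetical trace list, modelled on netfs_collect_contig_traces. */
#define demo_contig_traces \
	EM(demo_contig_trace_collect,	"Collect")	\
	EM(demo_contig_trace_jump,	"-->JUMP-->")	\
	E_(demo_contig_trace_unlock,	"Unlock")

/* First expansion: keep only the enumerator names. */
#define EM(a, b) a,
#define E_(a, b) a
enum demo_contig_trace { demo_contig_traces };
#undef EM
#undef E_

/* Second expansion: turn each entry into a { value, string } pair. */
struct demo_symbol {
	int value;
	const char *name;
};

#define EM(a, b) { a, b },
#define E_(a, b) { a, b }
static const struct demo_symbol demo_contig_symbols[] = { demo_contig_traces };
#undef EM
#undef E_

/* Look up the display string for a value; "?" if it is not in the list. */
const char *demo_contig_name(enum demo_contig_trace t)
{
	size_t n = sizeof(demo_contig_symbols) / sizeof(demo_contig_symbols[0]);
	size_t i;

	for (i = 0; i < n; i++)
		if (demo_contig_symbols[i].value == (int)t)
			return demo_contig_symbols[i].name;
	return "?";
}
```

Because both expansions read the same list, adding an entry to demo_contig_traces updates the enum and the string table together, which is the main reason for writing the list as a macro rather than as two separate definitions.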
/*
 * Export enum symbols to userspace.
*/
#undef EM
#undef E_
#define EM(a, b) TRACE_DEFINE_ENUM(a);
#define E_(a, b) TRACE_DEFINE_ENUM(a);
netfs_read_traces;
netfs_write_traces;
netfs_rreq_origins;
netfs_rreq_traces;
netfs_sreq_sources;
netfs_sreq_traces;
netfs_failures;
netfs_rreq_ref_traces;
netfs_sreq_ref_traces;
netfs_folio_traces;
netfs_collect_contig_traces;
/*
* Now redefine the EM() and E_() macros to map the enums to the strings that
* will be printed in the output.
*/
#undef EM
#undef E_
#define EM(a, b) { a, b },
#define E_(a, b) { a, b }
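/*
 * Illustration only, using a hypothetical list that is not defined in this
 * header: given
 *
 *	#define example_traces EM(example_a, "A") E_(example_b, "B")
 *
 * the definitions above would expand example_traces to
 *
 *	{ example_a, "A" }, { example_b, "B" }
 *
 * which is the value/string table form passed to __print_symbolic() in the
 * TP_printk() calls below.
 */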
TRACE_EVENT(netfs_read,
TP_PROTO(struct netfs_io_request *rreq,
loff_t start, size_t len,
enum netfs_read_trace what),
TP_ARGS(rreq, start, len, what),
TP_STRUCT__entry(
__field(unsigned int, rreq )
__field(unsigned int, cookie )
__field(loff_t, start )
__field(size_t, len )
__field(enum netfs_read_trace, what )
__field(unsigned int, netfs_inode )
),
TP_fast_assign(
__entry->rreq = rreq->debug_id;
__entry->cookie = rreq->cache_resources.debug_id;
__entry->start = start;
__entry->len = len;
__entry->what = what;
__entry->netfs_inode = rreq->inode->i_ino;
),
TP_printk("R=%08x %s c=%08x ni=%x s=%llx %zx",
__entry->rreq,
__print_symbolic(__entry->what, netfs_read_traces),
__entry->cookie,
__entry->netfs_inode,
__entry->start, __entry->len)
);
TRACE_EVENT(netfs_rreq,
TP_PROTO(struct netfs_io_request *rreq,
enum netfs_rreq_trace what),
TP_ARGS(rreq, what),
TP_STRUCT__entry(
__field(unsigned int, rreq )
__field(unsigned int, flags )
__field(enum netfs_io_origin, origin )
__field(enum netfs_rreq_trace, what )
),
TP_fast_assign(
__entry->rreq = rreq->debug_id;
__entry->flags = rreq->flags;
__entry->origin = rreq->origin;
__entry->what = what;
),
TP_printk("R=%08x %s %s f=%02x",
__entry->rreq,
__print_symbolic(__entry->origin, netfs_rreq_origins),
__print_symbolic(__entry->what, netfs_rreq_traces),
__entry->flags)
);
TRACE_EVENT(netfs_sreq,
TP_PROTO(struct netfs_io_subrequest *sreq,
enum netfs_sreq_trace what),
TP_ARGS(sreq, what),
TP_STRUCT__entry(
__field(unsigned int, rreq )
__field(unsigned short, index )
__field(short, error )
__field(unsigned short, flags )
__field(enum netfs_io_source, source )
__field(enum netfs_sreq_trace, what )
__field(size_t, len )
__field(size_t, transferred )
__field(loff_t, start )
),
TP_fast_assign(
__entry->rreq = sreq->rreq->debug_id;
__entry->index = sreq->debug_index;
__entry->error = sreq->error;
__entry->flags = sreq->flags;
__entry->source = sreq->source;
__entry->what = what;
__entry->len = sreq->len;
__entry->transferred = sreq->transferred;
__entry->start = sreq->start;
),
TP_printk("R=%08x[%x] %s %s f=%02x s=%llx %zx/%zx e=%d",
__entry->rreq, __entry->index,
__print_symbolic(__entry->source, netfs_sreq_sources),
__print_symbolic(__entry->what, netfs_sreq_traces),
__entry->flags,
__entry->start, __entry->transferred, __entry->len,
__entry->error)
);
TRACE_EVENT(netfs_failure,
TP_PROTO(struct netfs_io_request *rreq,
struct netfs_io_subrequest *sreq,
int error, enum netfs_failure what),
TP_ARGS(rreq, sreq, error, what),
TP_STRUCT__entry(
__field(unsigned int, rreq )
__field(short, index )
__field(short, error )
__field(unsigned short, flags )
__field(enum netfs_io_source, source )
__field(enum netfs_failure, what )
__field(size_t, len )
__field(size_t, transferred )
__field(loff_t, start )
),
TP_fast_assign(
__entry->rreq = rreq->debug_id;
__entry->index = sreq ? sreq->debug_index : -1;
__entry->error = error;
__entry->flags = sreq ? sreq->flags : 0;
__entry->source = sreq ? sreq->source : NETFS_INVALID_READ;
__entry->what = what;
__entry->len = sreq ? sreq->len : rreq->len;
__entry->transferred = sreq ? sreq->transferred : 0;
__entry->start = sreq ? sreq->start : 0;
),
TP_printk("R=%08x[%x] %s f=%02x s=%llx %zx/%zx %s e=%d",
__entry->rreq, __entry->index,
__print_symbolic(__entry->source, netfs_sreq_sources),
__entry->flags,
__entry->start, __entry->transferred, __entry->len,
__print_symbolic(__entry->what, netfs_failures),
__entry->error)
);
TRACE_EVENT(netfs_rreq_ref,
TP_PROTO(unsigned int rreq_debug_id, int ref,
enum netfs_rreq_ref_trace what),
TP_ARGS(rreq_debug_id, ref, what),
TP_STRUCT__entry(
__field(unsigned int, rreq )
__field(int, ref )
__field(enum netfs_rreq_ref_trace, what )
),
TP_fast_assign(
__entry->rreq = rreq_debug_id;
__entry->ref = ref;
__entry->what = what;
),
TP_printk("R=%08x %s r=%u",
__entry->rreq,
__print_symbolic(__entry->what, netfs_rreq_ref_traces),
__entry->ref)
);
TRACE_EVENT(netfs_sreq_ref,
TP_PROTO(unsigned int rreq_debug_id, unsigned int subreq_debug_index,
int ref, enum netfs_sreq_ref_trace what),
TP_ARGS(rreq_debug_id, subreq_debug_index, ref, what),
TP_STRUCT__entry(
__field(unsigned int, rreq )
__field(unsigned int, subreq )
__field(int, ref )
__field(enum netfs_sreq_ref_trace, what )
),
TP_fast_assign(
__entry->rreq = rreq_debug_id;
__entry->subreq = subreq_debug_index;
__entry->ref = ref;
__entry->what = what;
),
TP_printk("R=%08x[%x] %s r=%u",
__entry->rreq,
__entry->subreq,
__print_symbolic(__entry->what, netfs_sreq_ref_traces),
__entry->ref)
);
TRACE_EVENT(netfs_folio,
TP_PROTO(struct folio *folio, enum netfs_folio_trace why),
TP_ARGS(folio, why),
TP_STRUCT__entry(
__field(ino_t, ino)
__field(pgoff_t, index)
__field(unsigned int, nr)
__field(enum netfs_folio_trace, why)
),
TP_fast_assign(
__entry->ino = folio->mapping->host->i_ino;
__entry->why = why;
__entry->index = folio_index(folio);
__entry->nr = folio_nr_pages(folio);
),
TP_printk("i=%05lx ix=%05lx-%05lx %s",
__entry->ino, __entry->index, __entry->index + __entry->nr - 1,
__print_symbolic(__entry->why, netfs_folio_traces))
);
TRACE_EVENT(netfs_write_iter,
TP_PROTO(const struct kiocb *iocb, const struct iov_iter *from),
TP_ARGS(iocb, from),
TP_STRUCT__entry(
__field(unsigned long long, start )
__field(size_t, len )
__field(unsigned int, flags )
__field(unsigned int, ino )
),
TP_fast_assign(
__entry->start = iocb->ki_pos;
__entry->len = iov_iter_count(from);
__entry->ino = iocb->ki_filp->f_inode->i_ino;
__entry->flags = iocb->ki_flags;
),
TP_printk("WRITE-ITER i=%x s=%llx l=%zx f=%x",
__entry->ino, __entry->start, __entry->len, __entry->flags)
);
TRACE_EVENT(netfs_write,
TP_PROTO(const struct netfs_io_request *wreq,
enum netfs_write_trace what),
TP_ARGS(wreq, what),
TP_STRUCT__entry(
__field(unsigned int, wreq )
__field(unsigned int, cookie )
__field(unsigned int, ino )
__field(enum netfs_write_trace, what )
__field(unsigned long long, start )
__field(unsigned long long, len )
),
TP_fast_assign(
struct netfs_inode *__ctx = netfs_inode(wreq->inode);
struct fscache_cookie *__cookie = netfs_i_cookie(__ctx);
__entry->wreq = wreq->debug_id;
__entry->cookie = __cookie ? __cookie->debug_id : 0;
__entry->ino = wreq->inode->i_ino;
__entry->what = what;
__entry->start = wreq->start;
__entry->len = wreq->len;
),
TP_printk("R=%08x %s c=%08x i=%x by=%llx-%llx",
__entry->wreq,
__print_symbolic(__entry->what, netfs_write_traces),
__entry->cookie,
__entry->ino,
__entry->start, __entry->start + __entry->len - 1)
);
TRACE_EVENT(netfs_collect,
TP_PROTO(const struct netfs_io_request *wreq),
TP_ARGS(wreq),
TP_STRUCT__entry(
__field(unsigned int, wreq )
__field(unsigned int, len )
__field(unsigned long long, transferred )
__field(unsigned long long, start )
),
TP_fast_assign(
__entry->wreq = wreq->debug_id;
__entry->start = wreq->start;
__entry->len = wreq->len;
__entry->transferred = wreq->transferred;
),
TP_printk("R=%08x s=%llx-%llx",
__entry->wreq,
__entry->start + __entry->transferred,
__entry->start + __entry->len)
);
TRACE_EVENT(netfs_collect_contig,
TP_PROTO(const struct netfs_io_request *wreq, unsigned long long to,
enum netfs_collect_contig_trace type),
TP_ARGS(wreq, to, type),
TP_STRUCT__entry(
__field(unsigned int, wreq)
__field(enum netfs_collect_contig_trace, type)
__field(unsigned long long, contiguity)
__field(unsigned long long, to)
),
TP_fast_assign(
__entry->wreq = wreq->debug_id;
__entry->type = type;
__entry->contiguity = wreq->contiguity;
__entry->to = to;
),
TP_printk("R=%08x %llx -> %llx %s",
__entry->wreq,
__entry->contiguity,
__entry->to,
__print_symbolic(__entry->type, netfs_collect_contig_traces))
);
TRACE_EVENT(netfs_collect_sreq,
TP_PROTO(const struct netfs_io_request *wreq,
const struct netfs_io_subrequest *subreq),
TP_ARGS(wreq, subreq),
TP_STRUCT__entry(
__field(unsigned int, wreq )
__field(unsigned int, subreq )
__field(unsigned int, stream )
__field(unsigned int, len )
__field(unsigned int, transferred )
__field(unsigned long long, start )
),
TP_fast_assign(
__entry->wreq = wreq->debug_id;
__entry->subreq = subreq->debug_index;
__entry->stream = subreq->stream_nr;
__entry->start = subreq->start;
__entry->len = subreq->len;
__entry->transferred = subreq->transferred;
),
TP_printk("R=%08x[%u:%02x] s=%llx t=%x/%x",
__entry->wreq, __entry->stream, __entry->subreq,
__entry->start, __entry->transferred, __entry->len)
);
TRACE_EVENT(netfs_collect_folio,
TP_PROTO(const struct netfs_io_request *wreq,
const struct folio *folio,
unsigned long long fend,
unsigned long long collected_to),
TP_ARGS(wreq, folio, fend, collected_to),
TP_STRUCT__entry(
__field(unsigned int, wreq )
__field(unsigned long, index )
__field(unsigned long long, fend )
__field(unsigned long long, cleaned_to )
__field(unsigned long long, collected_to )
),
TP_fast_assign(
__entry->wreq = wreq->debug_id;
__entry->index = folio->index;
__entry->fend = fend;
__entry->cleaned_to = wreq->cleaned_to;
__entry->collected_to = collected_to;
),
TP_printk("R=%08x ix=%05lx r=%llx-%llx t=%llx/%llx",
__entry->wreq, __entry->index,
(unsigned long long)__entry->index * PAGE_SIZE, __entry->fend,
__entry->cleaned_to, __entry->collected_to)
);
TRACE_EVENT(netfs_collect_state,
TP_PROTO(const struct netfs_io_request *wreq,
unsigned long long collected_to,
unsigned int notes),
TP_ARGS(wreq, collected_to, notes),
TP_STRUCT__entry(
__field(unsigned int, wreq )
__field(unsigned int, notes )
__field(unsigned long long, collected_to )
__field(unsigned long long, cleaned_to )
__field(unsigned long long, contiguity )
),
TP_fast_assign(
__entry->wreq = wreq->debug_id;
__entry->notes = notes;
__entry->collected_to = collected_to;
__entry->cleaned_to = wreq->cleaned_to;
__entry->contiguity = wreq->contiguity;
),
TP_printk("R=%08x cto=%llx fto=%llx ctg=%llx n=%x",
__entry->wreq, __entry->collected_to,
__entry->cleaned_to, __entry->contiguity,
__entry->notes)
);
TRACE_EVENT(netfs_collect_gap,
TP_PROTO(const struct netfs_io_request *wreq,
const struct netfs_io_stream *stream,
unsigned long long jump_to, char type),
TP_ARGS(wreq, stream, jump_to, type),
TP_STRUCT__entry(
__field(unsigned int, wreq)
__field(unsigned char, stream)
__field(unsigned char, type)
__field(unsigned long long, from)
__field(unsigned long long, to)
),
TP_fast_assign(
__entry->wreq = wreq->debug_id;
__entry->stream = stream->stream_nr;
__entry->from = stream->collected_to;
__entry->to = jump_to;
__entry->type = type;
),
TP_printk("R=%08x[%x:] %llx->%llx %c",
__entry->wreq, __entry->stream,
__entry->from, __entry->to, __entry->type)
);
TRACE_EVENT(netfs_collect_stream,
TP_PROTO(const struct netfs_io_request *wreq,
const struct netfs_io_stream *stream),
TP_ARGS(wreq, stream),
TP_STRUCT__entry(
__field(unsigned int, wreq)
__field(unsigned char, stream)
__field(unsigned long long, collected_to)
__field(unsigned long long, front)
),
TP_fast_assign(
__entry->wreq = wreq->debug_id;
__entry->stream = stream->stream_nr;
__entry->collected_to = stream->collected_to;
__entry->front = stream->front ? stream->front->start : UINT_MAX;
),
TP_printk("R=%08x[%x:] cto=%llx frn=%llx",
__entry->wreq, __entry->stream,
__entry->collected_to, __entry->front)
);
#undef EM
#undef E_
#endif /* _TRACE_NETFS_H */
/* This part must be outside protection */
#include <trace/define_trace.h>