Changes since the last update:

  - Enable large folios for iomap/fscache mode;
 
  - Avoid sysfs warning due to mounting twice with the same fsid and
    domain_id in fscache mode;
 
  - Refine fscache interface among erofs, fscache, and cachefiles;
 
  - Use kmap_local_page() only for metabuf;
 
  - Fixes around crafted images found by syzbot;
 
  - Minor cleanups and documentation updates.
 -----BEGIN PGP SIGNATURE-----
 
 iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCY5S3khEceGlhbmdAa2Vy
 bmVsLm9yZwAKCRA5NzHcH7XmBLr3AQDA5xpztSsxfe0Gp+bwf12ySuntimJxXmAj
 83EHCfSC+AEAu4fcWkIF38MBBVJvFVjFaXCZKmFossbI5Rp8TuqPpgk=
 =HDsJ
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs updates from Gao Xiang:
 "In this cycle, large folios are now enabled in the iomap/fscache mode
  for uncompressed files first. In order to do that, we've also cleaned
  up better interfaces between erofs and fscache, which are acked by
  fscache/netfs folks and included in this pull request.

  Other than that, there are random fixes around erofs over fscache and
  crafted images by syzbot, minor cleanups and documentation updates.

  Summary:

   - Enable large folios for iomap/fscache mode

   - Avoid sysfs warning due to mounting twice with the same fsid and
     domain_id in fscache mode

   - Refine fscache interface among erofs, fscache, and cachefiles

   - Use kmap_local_page() only for metabuf

   - Fixes around crafted images found by syzbot

   - Minor cleanups and documentation updates"

* tag 'erofs-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: validate the extent length for uncompressed pclusters
  erofs: fix missing unmap if z_erofs_get_extent_compressedlen() fails
  erofs: Fix pcluster memleak when its block address is zero
  erofs: use kmap_local_page() only for erofs_bread()
  erofs: enable large folios for fscache mode
  erofs: support large folios for fscache mode
  erofs: switch to prepare_ondemand_read() in fscache mode
  fscache,cachefiles: add prepare_ondemand_read() callback
  erofs: clean up cached I/O strategies
  erofs: update documentation
  erofs: check the uniqueness of fsid in shared domain in advance
  erofs: enable large folios for iomap mode
Linus Torvalds 2022-12-12 20:14:04 -08:00
commit 4a6bff1187
12 changed files with 348 additions and 348 deletions


@@ -30,12 +30,18 @@ It is implemented to be a better choice for the following scenarios:
 especially for those embedded devices with limited memory and high-density
 hosts with numerous containers.
 
-Here is the main features of EROFS:
+Here are the main features of EROFS:
 
 - Little endian on-disk design;
 
-- 4KiB block size and 32-bit block addresses, therefore 16TiB address space
-  at most for now;
+- Block-based distribution and file-based distribution over fscache are
+  supported;
+
+- Support multiple devices to refer to external blobs, which can be used
+  for container images;
+
+- 4KiB block size and 32-bit block addresses for each device, therefore
+  16TiB address space at most for now;
 
 - Two inode layouts for different requirements:
@@ -50,28 +56,31 @@ Here is the main features of EROFS:
    Metadata reserved     8 bytes      18 bytes
    ===================== ============ ======================================
 
-- Metadata and data could be mixed as an option;
-
-- Support extended attributes (xattrs) as an option;
+- Support extended attributes as an option;
 
-- Support tailpacking data and xattr inline compared to byte-addressed
-  unaligned metadata or smaller block size alternatives;
-
-- Support POSIX.1e ACLs by using xattrs;
+- Support POSIX.1e ACLs by using extended attributes;
 
 - Support transparent data compression as an option:
   LZ4 and MicroLZMA algorithms can be used on a per-file basis; In addition,
   inplace decompression is also supported to avoid bounce compressed buffers
   and page cache thrashing.
 
+- Support chunk-based data deduplication and rolling-hash compressed data
+  deduplication;
+
+- Support tailpacking inline compared to byte-addressed unaligned metadata
+  or smaller block size alternatives;
+
+- Support merging tail-end data into a special inode as fragments.
+
+- Support large folios for uncompressed files.
+
 - Support direct I/O on uncompressed files to avoid double caching for loop
   devices;
 
 - Support FSDAX on uncompressed images for secure containers and ramdisks in
   order to get rid of unnecessary page cache.
 
-- Support multiple devices for multi blob container images;
-
 - Support file-based on-demand loading with the Fscache infrastructure.
@@ -259,7 +268,7 @@ By the way, chunk-based files are all uncompressed for now.
 Data compression
 ----------------
 
-EROFS implements LZ4 fixed-sized output compression which generates fixed-sized
+EROFS implements fixed-sized output compression which generates fixed-sized
 compressed data blocks from variable-sized input in contrast to other existing
 fixed-sized input solutions. Relatively higher compression ratios can be gotten
 by using fixed-sized output compression since nowadays popular data compression
@@ -314,3 +323,6 @@ to understand its delta0 is constantly 1, as illustrated below::
 If another HEAD follows a HEAD lcluster, there is no room to record CBLKCNT,
 but it's easy to know the size of such pcluster is 1 lcluster as well.
+
+Since Linux v6.1, each pcluster can be used for multiple variable-sized extents,
+therefore it can be used for compressed data deduplication.
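The "fixed-sized output" scheme described in the documentation hunk above can be illustrated with a toy model. The run-length "compressor" below is purely hypothetical (it is not the EROFS LZ4 format and `rle_fixed_out`/`OUT_BLK` are invented names); it only shows how fixed-sized compressed output blocks end up covering variable-sized spans of input:

```c
#include <assert.h>
#include <stddef.h>

#define OUT_BLK 8	/* fixed compressed block size (toy value, not EROFS's) */

/* Greedily RLE-encode the input as (char, count) pairs, cutting a new
 * fixed-sized output block whenever the current one is full.  Returns the
 * number of output blocks and stores how many input bytes each block
 * covers in in_per_blk[] -- the output size is constant, the input span
 * per block varies. */
static int rle_fixed_out(const char *in, size_t len, size_t *in_per_blk)
{
	int nblk = 0;
	size_t i = 0, used = 0, covered = 0;

	while (i < len) {
		size_t run = 1;

		while (i + run < len && in[i + run] == in[i] && run < 255)
			run++;
		if (used + 2 > OUT_BLK) {	/* block full: start the next one */
			in_per_blk[nblk++] = covered;
			used = covered = 0;
		}
		used += 2;			/* one (char, count) pair */
		covered += run;
		i += run;
	}
	if (used)
		in_per_blk[nblk++] = covered;
	return nblk;
}
```

For the input `aaaaaaaabbcdefgh`, the first 8-byte output block covers 12 input bytes (runs a×8, b×2, c, d) while the second covers only 4 (e, f, g, h) — variable-sized input, fixed-sized output.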


@@ -385,38 +385,35 @@ static int cachefiles_write(struct netfs_cache_resources *cres,
 			   term_func, term_func_priv);
 }
 
-/*
- * Prepare a read operation, shortening it to a cached/uncached
- * boundary as appropriate.
- */
-static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest *subreq,
-						    loff_t i_size)
+static inline enum netfs_io_source
+cachefiles_do_prepare_read(struct netfs_cache_resources *cres,
+			   loff_t start, size_t *_len, loff_t i_size,
+			   unsigned long *_flags, ino_t netfs_ino)
 {
 	enum cachefiles_prepare_read_trace why;
-	struct netfs_io_request *rreq = subreq->rreq;
-	struct netfs_cache_resources *cres = &rreq->cache_resources;
-	struct cachefiles_object *object;
+	struct cachefiles_object *object = NULL;
 	struct cachefiles_cache *cache;
 	struct fscache_cookie *cookie = fscache_cres_cookie(cres);
 	const struct cred *saved_cred;
 	struct file *file = cachefiles_cres_file(cres);
 	enum netfs_io_source ret = NETFS_DOWNLOAD_FROM_SERVER;
+	size_t len = *_len;
 	loff_t off, to;
 	ino_t ino = file ? file_inode(file)->i_ino : 0;
 	int rc;
 
-	_enter("%zx @%llx/%llx", subreq->len, subreq->start, i_size);
+	_enter("%zx @%llx/%llx", len, start, i_size);
 
-	if (subreq->start >= i_size) {
+	if (start >= i_size) {
 		ret = NETFS_FILL_WITH_ZEROES;
 		why = cachefiles_trace_read_after_eof;
 		goto out_no_object;
 	}
 
 	if (test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags)) {
-		__set_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
+		__set_bit(NETFS_SREQ_COPY_TO_CACHE, _flags);
 		why = cachefiles_trace_read_no_data;
-		if (!test_bit(NETFS_SREQ_ONDEMAND, &subreq->flags))
+		if (!test_bit(NETFS_SREQ_ONDEMAND, _flags))
 			goto out_no_object;
 	}
@@ -437,7 +434,7 @@ static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest *
 retry:
 	off = cachefiles_inject_read_error();
 	if (off == 0)
-		off = vfs_llseek(file, subreq->start, SEEK_DATA);
+		off = vfs_llseek(file, start, SEEK_DATA);
 	if (off < 0 && off >= (loff_t)-MAX_ERRNO) {
 		if (off == (loff_t)-ENXIO) {
 			why = cachefiles_trace_read_seek_nxio;
@@ -449,21 +446,22 @@ retry:
 		goto out;
 	}
 
-	if (off >= subreq->start + subreq->len) {
+	if (off >= start + len) {
 		why = cachefiles_trace_read_found_hole;
 		goto download_and_store;
 	}
 
-	if (off > subreq->start) {
+	if (off > start) {
 		off = round_up(off, cache->bsize);
-		subreq->len = off - subreq->start;
+		len = off - start;
+		*_len = len;
 		why = cachefiles_trace_read_found_part;
 		goto download_and_store;
 	}
 
 	to = cachefiles_inject_read_error();
 	if (to == 0)
-		to = vfs_llseek(file, subreq->start, SEEK_HOLE);
+		to = vfs_llseek(file, start, SEEK_HOLE);
 	if (to < 0 && to >= (loff_t)-MAX_ERRNO) {
 		trace_cachefiles_io_error(object, file_inode(file), to,
 					  cachefiles_trace_seek_error);
@@ -471,12 +469,13 @@ retry:
 		goto out;
 	}
 
-	if (to < subreq->start + subreq->len) {
-		if (subreq->start + subreq->len >= i_size)
+	if (to < start + len) {
+		if (start + len >= i_size)
 			to = round_up(to, cache->bsize);
 		else
 			to = round_down(to, cache->bsize);
-		subreq->len = to - subreq->start;
+		len = to - start;
+		*_len = len;
 	}
 
 	why = cachefiles_trace_read_have_data;
@@ -484,12 +483,11 @@ retry:
 	goto out;
 
 download_and_store:
-	__set_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
-	if (test_bit(NETFS_SREQ_ONDEMAND, &subreq->flags)) {
-		rc = cachefiles_ondemand_read(object, subreq->start,
-					      subreq->len);
+	__set_bit(NETFS_SREQ_COPY_TO_CACHE, _flags);
+	if (test_bit(NETFS_SREQ_ONDEMAND, _flags)) {
+		rc = cachefiles_ondemand_read(object, start, len);
 		if (!rc) {
-			__clear_bit(NETFS_SREQ_ONDEMAND, &subreq->flags);
+			__clear_bit(NETFS_SREQ_ONDEMAND, _flags);
 			goto retry;
 		}
 		ret = NETFS_INVALID_READ;
@@ -497,10 +495,34 @@ download_and_store:
 out:
 	cachefiles_end_secure(cache, saved_cred);
 out_no_object:
-	trace_cachefiles_prep_read(subreq, ret, why, ino);
+	trace_cachefiles_prep_read(object, start, len, *_flags, ret, why, ino, netfs_ino);
 	return ret;
 }
 
+/*
+ * Prepare a read operation, shortening it to a cached/uncached
+ * boundary as appropriate.
+ */
+static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest *subreq,
+						    loff_t i_size)
+{
+	return cachefiles_do_prepare_read(&subreq->rreq->cache_resources,
+					  subreq->start, &subreq->len, i_size,
+					  &subreq->flags, subreq->rreq->inode->i_ino);
+}
+
+/*
+ * Prepare an on-demand read operation, shortening it to a cached/uncached
+ * boundary as appropriate.
+ */
+static enum netfs_io_source
+cachefiles_prepare_ondemand_read(struct netfs_cache_resources *cres,
+				 loff_t start, size_t *_len, loff_t i_size,
+				 unsigned long *_flags, ino_t ino)
+{
+	return cachefiles_do_prepare_read(cres, start, _len, i_size, _flags, ino);
+}
+
 /*
  * Prepare for a write to occur.
  */
@@ -621,6 +643,7 @@ static const struct netfs_cache_ops cachefiles_netfs_cache_ops = {
 	.write = cachefiles_write,
 	.prepare_read = cachefiles_prepare_read,
 	.prepare_write = cachefiles_prepare_write,
+	.prepare_ondemand_read = cachefiles_prepare_ondemand_read,
 	.query_occupancy = cachefiles_query_occupancy,
 };
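The prepare-read path above shortens a read so it never straddles a cached/uncached boundary. That arithmetic can be sketched in isolation; the helper below is a hypothetical userspace model (`shorten_read`, `BSIZE` and the rounding helpers are illustrative names, not the kernel API) that applies the same leading-hole/trailing-hole rules to precomputed SEEK_DATA/SEEK_HOLE results:

```c
#include <assert.h>
#include <stddef.h>

#define BSIZE 4096LL	/* cache granule size (illustrative) */

static long long round_up_(long long x, long long g)   { return (x + g - 1) / g * g; }
static long long round_down_(long long x, long long g) { return x / g * g; }

/* Mirrors the shortening rules in cachefiles_do_prepare_read():
 * @data_off: first cached byte at/after @start (a SEEK_DATA result)
 * @hole_off: first hole byte at/after @start (a SEEK_HOLE result)
 * Shortens *_len in place; returns 1 if the span must be downloaded,
 * 0 if it can be read from the cache. */
static int shorten_read(long long start, size_t *_len, long long i_size,
			long long data_off, long long hole_off)
{
	size_t len = *_len;

	if (data_off >= start + (long long)len)
		return 1;			/* the whole span is a hole */
	if (data_off > start) {			/* leading hole: shorten */
		*_len = round_up_(data_off, BSIZE) - start;
		return 1;
	}
	if (hole_off < start + (long long)len) {	/* trailing hole: shorten */
		if (start + (long long)len >= i_size)
			hole_off = round_up_(hole_off, BSIZE);
		else
			hole_off = round_down_(hole_off, BSIZE);
		*_len = hole_off - start;
	}
	return 0;				/* read from the cache */
}
```

For example, a 16 KiB read over data that turns into a hole at 8 KiB is shortened to 8 KiB and served from the cache, while a read whose data only starts at 4 KiB is shortened to the first granule and downloaded.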


@@ -13,9 +13,7 @@
 void erofs_unmap_metabuf(struct erofs_buf *buf)
 {
 	if (buf->kmap_type == EROFS_KMAP)
-		kunmap(buf->page);
-	else if (buf->kmap_type == EROFS_KMAP_ATOMIC)
-		kunmap_atomic(buf->base);
+		kunmap_local(buf->base);
 	buf->base = NULL;
 	buf->kmap_type = EROFS_NO_KMAP;
 }
@@ -54,9 +52,7 @@ void *erofs_bread(struct erofs_buf *buf, struct inode *inode,
 	}
 	if (buf->kmap_type == EROFS_NO_KMAP) {
 		if (type == EROFS_KMAP)
-			buf->base = kmap(page);
-		else if (type == EROFS_KMAP_ATOMIC)
-			buf->base = kmap_atomic(page);
+			buf->base = kmap_local_page(page);
 		buf->kmap_type = type;
 	} else if (buf->kmap_type != type) {
 		DBG_BUGON(1);
@@ -403,6 +399,8 @@ const struct address_space_operations erofs_raw_access_aops = {
 	.readahead = erofs_readahead,
 	.bmap = erofs_bmap,
 	.direct_IO = noop_direct_IO,
+	.release_folio = iomap_release_folio,
+	.invalidate_folio = iomap_invalidate_folio,
 };
 
 #ifdef CONFIG_FS_DAX
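After the erofs_bread() change above, a metabuf has only two mapping states. A hypothetical userspace model of that state machine (the `toy_*` names are invented, and kmap_local_page()/kunmap_local() are stubbed with a static buffer) shows the invariant the DBG_BUGON() enforces: a buffer already mapped with one type cannot be handed out with another:

```c
#include <assert.h>
#include <stddef.h>

/* Mapping states, as in the erofs_buf kmap_type field after the patch. */
enum toy_kmap_type { TOY_NO_KMAP, TOY_KMAP };

struct toy_buf {
	void *base;
	enum toy_kmap_type kmap_type;
};

static char fake_page[4096];	/* stands in for a page's contents */

/* Toy counterpart of erofs_bread()'s mapping logic: map once, hand back
 * the same mapping on repeated calls with the same type, refuse a
 * conflicting type (the kernel fires DBG_BUGON(1) there). */
static void *toy_bread(struct toy_buf *buf, enum toy_kmap_type type)
{
	if (buf->kmap_type == TOY_NO_KMAP) {
		if (type == TOY_KMAP)
			buf->base = fake_page;	/* kmap_local_page(page) */
		buf->kmap_type = type;
	} else if (buf->kmap_type != type) {
		return NULL;			/* DBG_BUGON(1) in the kernel */
	}
	return buf->base;
}

/* Toy counterpart of erofs_unmap_metabuf(). */
static void toy_unmap(struct toy_buf *buf)
{
	/* kunmap_local(buf->base) in the kernel */
	buf->base = NULL;
	buf->kmap_type = TOY_NO_KMAP;
}
```

Dropping the EROFS_KMAP_ATOMIC case is what lets the real code collapse to a single kmap_local_page()/kunmap_local() pair.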


@@ -11,265 +11,201 @@
 static DEFINE_MUTEX(erofs_domain_cookies_lock);
 static LIST_HEAD(erofs_domain_list);
 static struct vfsmount *erofs_pseudo_mnt;
 
-static struct netfs_io_request *erofs_fscache_alloc_request(struct address_space *mapping,
+struct erofs_fscache_request {
+	struct erofs_fscache_request *primary;
+	struct netfs_cache_resources cache_resources;
+	struct address_space	*mapping;	/* The mapping being accessed */
+	loff_t			start;		/* Start position */
+	size_t			len;		/* Length of the request */
+	size_t			submitted;	/* Length of submitted */
+	short			error;		/* 0 or error that occurred */
+	refcount_t		ref;
+};
+
+static struct erofs_fscache_request *erofs_fscache_req_alloc(struct address_space *mapping,
 					     loff_t start, size_t len)
 {
-	struct netfs_io_request *rreq;
+	struct erofs_fscache_request *req;
 
-	rreq = kzalloc(sizeof(struct netfs_io_request), GFP_KERNEL);
-	if (!rreq)
+	req = kzalloc(sizeof(struct erofs_fscache_request), GFP_KERNEL);
+	if (!req)
 		return ERR_PTR(-ENOMEM);
 
-	rreq->start	= start;
-	rreq->len	= len;
-	rreq->mapping	= mapping;
-	rreq->inode	= mapping->host;
-	INIT_LIST_HEAD(&rreq->subrequests);
-	refcount_set(&rreq->ref, 1);
-	return rreq;
+	req->mapping = mapping;
+	req->start   = start;
+	req->len     = len;
+	refcount_set(&req->ref, 1);
+
+	return req;
 }
 
-static void erofs_fscache_put_request(struct netfs_io_request *rreq)
+static struct erofs_fscache_request *erofs_fscache_req_chain(struct erofs_fscache_request *primary,
+					     size_t len)
 {
-	if (!refcount_dec_and_test(&rreq->ref))
-		return;
-	if (rreq->cache_resources.ops)
-		rreq->cache_resources.ops->end_operation(&rreq->cache_resources);
-	kfree(rreq);
-}
+	struct erofs_fscache_request *req;
 
-static void erofs_fscache_put_subrequest(struct netfs_io_subrequest *subreq)
-{
-	if (!refcount_dec_and_test(&subreq->ref))
-		return;
-	erofs_fscache_put_request(subreq->rreq);
-	kfree(subreq);
-}
+	/* use primary request for the first submission */
+	if (!primary->submitted) {
+		refcount_inc(&primary->ref);
+		return primary;
+	}
 
-static void erofs_fscache_clear_subrequests(struct netfs_io_request *rreq)
-{
-	struct netfs_io_subrequest *subreq;
-
-	while (!list_empty(&rreq->subrequests)) {
-		subreq = list_first_entry(&rreq->subrequests,
-				struct netfs_io_subrequest, rreq_link);
-		list_del(&subreq->rreq_link);
-		erofs_fscache_put_subrequest(subreq);
-	}
+	req = erofs_fscache_req_alloc(primary->mapping,
+			primary->start + primary->submitted, len);
+	if (!IS_ERR(req)) {
+		req->primary = primary;
+		refcount_inc(&primary->ref);
+	}
+	return req;
 }
 
-static void erofs_fscache_rreq_unlock_folios(struct netfs_io_request *rreq)
+static void erofs_fscache_req_complete(struct erofs_fscache_request *req)
 {
-	struct netfs_io_subrequest *subreq;
 	struct folio *folio;
-	unsigned int iopos = 0;
-	pgoff_t start_page = rreq->start / PAGE_SIZE;
-	pgoff_t last_page = ((rreq->start + rreq->len) / PAGE_SIZE) - 1;
-	bool subreq_failed = false;
+	bool failed = req->error;
+	pgoff_t start_page = req->start / PAGE_SIZE;
+	pgoff_t last_page = ((req->start + req->len) / PAGE_SIZE) - 1;
 
-	XA_STATE(xas, &rreq->mapping->i_pages, start_page);
-
-	subreq = list_first_entry(&rreq->subrequests,
-				  struct netfs_io_subrequest, rreq_link);
-	subreq_failed = (subreq->error < 0);
+	XA_STATE(xas, &req->mapping->i_pages, start_page);
 
 	rcu_read_lock();
 	xas_for_each(&xas, folio, last_page) {
-		unsigned int pgpos, pgend;
-		bool pg_failed = false;
-
 		if (xas_retry(&xas, folio))
 			continue;
-
-		pgpos = (folio_index(folio) - start_page) * PAGE_SIZE;
-		pgend = pgpos + folio_size(folio);
-
-		for (;;) {
-			if (!subreq) {
-				pg_failed = true;
-				break;
-			}
-
-			pg_failed |= subreq_failed;
-			if (pgend < iopos + subreq->len)
-				break;
-
-			iopos += subreq->len;
-			if (!list_is_last(&subreq->rreq_link,
-					  &rreq->subrequests)) {
-				subreq = list_next_entry(subreq, rreq_link);
-				subreq_failed = (subreq->error < 0);
-			} else {
-				subreq = NULL;
-				subreq_failed = false;
-			}
-			if (pgend == iopos)
-				break;
-		}
-
-		if (!pg_failed)
+		if (!failed)
 			folio_mark_uptodate(folio);
-
 		folio_unlock(folio);
 	}
 	rcu_read_unlock();
 }
 
-static void erofs_fscache_rreq_complete(struct netfs_io_request *rreq)
+static void erofs_fscache_req_put(struct erofs_fscache_request *req)
 {
-	erofs_fscache_rreq_unlock_folios(rreq);
-	erofs_fscache_clear_subrequests(rreq);
-	erofs_fscache_put_request(rreq);
+	if (refcount_dec_and_test(&req->ref)) {
+		if (req->cache_resources.ops)
+			req->cache_resources.ops->end_operation(&req->cache_resources);
+		if (!req->primary)
+			erofs_fscache_req_complete(req);
+		else
+			erofs_fscache_req_put(req->primary);
+		kfree(req);
+	}
 }
 
-static void erofc_fscache_subreq_complete(void *priv,
+static void erofs_fscache_subreq_complete(void *priv,
 		ssize_t transferred_or_error, bool was_async)
 {
-	struct netfs_io_subrequest *subreq = priv;
-	struct netfs_io_request *rreq = subreq->rreq;
+	struct erofs_fscache_request *req = priv;
 
-	if (IS_ERR_VALUE(transferred_or_error))
-		subreq->error = transferred_or_error;
-
-	if (atomic_dec_and_test(&rreq->nr_outstanding))
-		erofs_fscache_rreq_complete(rreq);
-
-	erofs_fscache_put_subrequest(subreq);
+	if (IS_ERR_VALUE(transferred_or_error)) {
+		if (req->primary)
+			req->primary->error = transferred_or_error;
+		else
+			req->error = transferred_or_error;
+	}
+	erofs_fscache_req_put(req);
 }
 
 /*
- * Read data from fscache and fill the read data into page cache described by
- * @rreq, which shall be both aligned with PAGE_SIZE. @pstart describes
- * the start physical address in the cache file.
+ * Read data from fscache (cookie, pstart, len), and fill the read data into
+ * page cache described by (req->mapping, lstart, len). @pstart describes the
+ * start physical address in the cache file.
  */
 static int erofs_fscache_read_folios_async(struct fscache_cookie *cookie,
-				struct netfs_io_request *rreq, loff_t pstart)
+		struct erofs_fscache_request *req, loff_t pstart, size_t len)
 {
 	enum netfs_io_source source;
-	struct super_block *sb = rreq->mapping->host->i_sb;
-	struct netfs_io_subrequest *subreq;
-	struct netfs_cache_resources *cres = &rreq->cache_resources;
+	struct super_block *sb = req->mapping->host->i_sb;
+	struct netfs_cache_resources *cres = &req->cache_resources;
 	struct iov_iter iter;
-	loff_t start = rreq->start;
-	size_t len = rreq->len;
+	loff_t lstart = req->start + req->submitted;
 	size_t done = 0;
 	int ret;
 
-	atomic_set(&rreq->nr_outstanding, 1);
+	DBG_BUGON(len > req->len - req->submitted);
 
 	ret = fscache_begin_read_operation(cres, cookie);
 	if (ret)
-		goto out;
+		return ret;
 
 	while (done < len) {
-		subreq = kzalloc(sizeof(struct netfs_io_subrequest),
-				 GFP_KERNEL);
-		if (subreq) {
-			INIT_LIST_HEAD(&subreq->rreq_link);
-			refcount_set(&subreq->ref, 2);
-			subreq->rreq = rreq;
-			refcount_inc(&rreq->ref);
-		} else {
-			ret = -ENOMEM;
-			goto out;
-		}
+		loff_t sstart = pstart + done;
+		size_t slen = len - done;
+		unsigned long flags = 1 << NETFS_SREQ_ONDEMAND;
 
-		subreq->start = pstart + done;
-		subreq->len = len - done;
-		subreq->flags = 1 << NETFS_SREQ_ONDEMAND;
-
-		list_add_tail(&subreq->rreq_link, &rreq->subrequests);
-
-		source = cres->ops->prepare_read(subreq, LLONG_MAX);
-		if (WARN_ON(subreq->len == 0))
+		source = cres->ops->prepare_ondemand_read(cres,
+				sstart, &slen, LLONG_MAX, &flags, 0);
+		if (WARN_ON(slen == 0))
 			source = NETFS_INVALID_READ;
 		if (source != NETFS_READ_FROM_CACHE) {
-			erofs_err(sb, "failed to fscache prepare_read (source %d)",
-				  source);
-			ret = -EIO;
-			subreq->error = ret;
-			erofs_fscache_put_subrequest(subreq);
-			goto out;
+			erofs_err(sb, "failed to fscache prepare_read (source %d)", source);
+			return -EIO;
 		}
 
-		atomic_inc(&rreq->nr_outstanding);
+		refcount_inc(&req->ref);
+		iov_iter_xarray(&iter, ITER_DEST, &req->mapping->i_pages,
+				lstart + done, slen);
 
-		iov_iter_xarray(&iter, ITER_DEST, &rreq->mapping->i_pages,
-				start + done, subreq->len);
-
-		ret = fscache_read(cres, subreq->start, &iter,
-				   NETFS_READ_HOLE_FAIL,
-				   erofc_fscache_subreq_complete, subreq);
+		ret = fscache_read(cres, sstart, &iter, NETFS_READ_HOLE_FAIL,
+				   erofs_fscache_subreq_complete, req);
 		if (ret == -EIOCBQUEUED)
 			ret = 0;
 		if (ret) {
 			erofs_err(sb, "failed to fscache_read (ret %d)", ret);
-			goto out;
+			return ret;
 		}
 
-		done += subreq->len;
+		done += slen;
 	}
-out:
-	if (atomic_dec_and_test(&rreq->nr_outstanding))
-		erofs_fscache_rreq_complete(rreq);
-
-	return ret;
+	DBG_BUGON(done != len);
+	return 0;
 }
 
 static int erofs_fscache_meta_read_folio(struct file *data, struct folio *folio)
 {
 	int ret;
 	struct super_block *sb = folio_mapping(folio)->host->i_sb;
-	struct netfs_io_request *rreq;
+	struct erofs_fscache_request *req;
 	struct erofs_map_dev mdev = {
 		.m_deviceid = 0,
 		.m_pa = folio_pos(folio),
 	};
 
 	ret = erofs_map_dev(sb, &mdev);
-	if (ret)
-		goto out;
+	if (ret) {
+		folio_unlock(folio);
+		return ret;
+	}
 
-	rreq = erofs_fscache_alloc_request(folio_mapping(folio),
-				folio_pos(folio), folio_size(folio));
-	if (IS_ERR(rreq)) {
-		ret = PTR_ERR(rreq);
-		goto out;
+	req = erofs_fscache_req_alloc(folio_mapping(folio),
+				folio_pos(folio), folio_size(folio));
+	if (IS_ERR(req)) {
+		folio_unlock(folio);
+		return PTR_ERR(req);
 	}
 
-	return erofs_fscache_read_folios_async(mdev.m_fscache->cookie,
-				rreq, mdev.m_pa);
-out:
-	folio_unlock(folio);
+	ret = erofs_fscache_read_folios_async(mdev.m_fscache->cookie,
+				req, mdev.m_pa, folio_size(folio));
+	if (ret)
+		req->error = ret;
+
+	erofs_fscache_req_put(req);
 	return ret;
 }
 
-/*
- * Read into page cache in the range described by (@pos, @len).
- *
- * On return, the caller is responsible for page unlocking if the output @unlock
- * is true, or the callee will take this responsibility through netfs_io_request
- * interface.
- *
- * The return value is the number of bytes successfully handled, or negative
- * error code on failure. The only exception is that, the length of the range
- * instead of the error code is returned on failure after netfs_io_request is
- * allocated, so that .readahead() could advance rac accordingly.
- */
-static int erofs_fscache_data_read(struct address_space *mapping,
-				   loff_t pos, size_t len, bool *unlock)
+static int erofs_fscache_data_read_slice(struct erofs_fscache_request *primary)
 {
+	struct address_space *mapping = primary->mapping;
 	struct inode *inode = mapping->host;
 	struct super_block *sb = inode->i_sb;
-	struct netfs_io_request *rreq;
+	struct erofs_fscache_request *req;
 	struct erofs_map_blocks map;
 	struct erofs_map_dev mdev;
 	struct iov_iter iter;
+	loff_t pos = primary->start + primary->submitted;
 	size_t count;
 	int ret;
 
-	*unlock = true;
-
 	map.m_la = pos;
 	ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW);
 	if (ret)
@@ -297,17 +233,19 @@ static int erofs_fscache_data_read(struct address_space *mapping,
 		}
 		iov_iter_zero(PAGE_SIZE - size, &iter);
 		erofs_put_metabuf(&buf);
-		return PAGE_SIZE;
+		primary->submitted += PAGE_SIZE;
+		return 0;
 	}
 
+	count = primary->len - primary->submitted;
 	if (!(map.m_flags & EROFS_MAP_MAPPED)) {
-		count = len;
 		iov_iter_xarray(&iter, ITER_DEST, &mapping->i_pages, pos, count);
 		iov_iter_zero(count, &iter);
-		return count;
+		primary->submitted += count;
+		return 0;
 	}
 
-	count = min_t(size_t, map.m_llen - (pos - map.m_la), len);
+	count = min_t(size_t, map.m_llen - (pos - map.m_la), count);
 	DBG_BUGON(!count || count % PAGE_SIZE);
 
 	mdev = (struct erofs_map_dev) {
@@ -318,64 +256,65 @@ static int erofs_fscache_data_read(struct address_space *mapping,
 	if (ret)
 		return ret;
 
-	rreq = erofs_fscache_alloc_request(mapping, pos, count);
-	if (IS_ERR(rreq))
-		return PTR_ERR(rreq);
+	req = erofs_fscache_req_chain(primary, count);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
 
-	*unlock = false;
-	erofs_fscache_read_folios_async(mdev.m_fscache->cookie,
-			rreq, mdev.m_pa + (pos - map.m_la));
-	return count;
+	ret = erofs_fscache_read_folios_async(mdev.m_fscache->cookie,
+			req, mdev.m_pa + (pos - map.m_la), count);
+	erofs_fscache_req_put(req);
+	primary->submitted += count;
+	return ret;
+}
+
+static int erofs_fscache_data_read(struct erofs_fscache_request *req)
+{
+	int ret;
+
+	do {
+		ret = erofs_fscache_data_read_slice(req);
+		if (ret)
+			req->error = ret;
+	} while (!ret && req->submitted < req->len);
+
+	return ret;
 }
 
 static int erofs_fscache_read_folio(struct file *file, struct folio *folio)
 {
-	bool unlock;
+	struct erofs_fscache_request *req;
 	int ret;
 
-	DBG_BUGON(folio_size(folio) != EROFS_BLKSIZ);
-
-	ret = erofs_fscache_data_read(folio_mapping(folio), folio_pos(folio),
-				      folio_size(folio), &unlock);
-	if (unlock) {
-		if (ret > 0)
-			folio_mark_uptodate(folio);
+	req = erofs_fscache_req_alloc(folio_mapping(folio),
+			folio_pos(folio), folio_size(folio));
+	if (IS_ERR(req)) {
 		folio_unlock(folio);
+		return PTR_ERR(req);
 	}
-	return ret < 0 ? ret : 0;
+
+	ret = erofs_fscache_data_read(req);
+	erofs_fscache_req_put(req);
+	return ret;
 }
 
 static void erofs_fscache_readahead(struct readahead_control *rac)
 {
-	struct folio *folio;
-	size_t len, done = 0;
-	loff_t start, pos;
-	bool unlock;
-	int ret, size;
+	struct erofs_fscache_request *req;
 
 	if (!readahead_count(rac))
 		return;
 
-	start = readahead_pos(rac);
-	len = readahead_length(rac);
+	req = erofs_fscache_req_alloc(rac->mapping,
+			readahead_pos(rac), readahead_length(rac));
+	if (IS_ERR(req))
+		return;
 
-	do {
-		pos = start + done;
-		ret = erofs_fscache_data_read(rac->mapping, pos,
-					      len - done, &unlock);
-		if (ret <= 0)
-			return;
+	/* The request completion will drop refs on the folios. */
+	while (readahead_folio(rac))
+		;
 
-		size = ret;
-		while (size) {
-			folio = readahead_folio(rac);
-			size -= folio_size(folio);
-			if (unlock) {
-				folio_mark_uptodate(folio);
-				folio_unlock(folio);
-			}
-		}
-	} while ((done += ret) < len);
+	erofs_fscache_data_read(req);
+	erofs_fscache_req_put(req);
 }
 
 static const struct address_space_operations erofs_fscache_meta_aops = {
@@ -494,7 +433,8 @@ static int erofs_fscache_register_domain(struct super_block *sb)
 static
 struct erofs_fscache *erofs_fscache_acquire_cookie(struct super_block *sb,
-						   char *name, bool need_inode)
+						   char *name,
+						   unsigned int flags)
 {
 	struct fscache_volume *volume = EROFS_SB(sb)->volume;
 	struct erofs_fscache *ctx;
@@ -516,7 +456,7 @@ struct erofs_fscache *erofs_fscache_acquire_cookie(struct super_block *sb,
 	fscache_use_cookie(cookie, false);
 	ctx->cookie = cookie;
 
-	if (need_inode) {
+	if (flags & EROFS_REG_COOKIE_NEED_INODE) {
 		struct inode *const inode = new_inode(sb);
 
 		if (!inode) {
@@ -554,14 +494,15 @@ static void erofs_fscache_relinquish_cookie(struct erofs_fscache *ctx)
 static
 struct erofs_fscache *erofs_fscache_domain_init_cookie(struct super_block *sb,
-						       char *name, bool need_inode)
+						       char *name,
+						       unsigned int flags)
 {
 	int err;
 	struct inode *inode;
 	struct erofs_fscache *ctx;
 	struct erofs_domain *domain = EROFS_SB(sb)->domain;
 
-	ctx = erofs_fscache_acquire_cookie(sb, name, need_inode);
+	ctx = erofs_fscache_acquire_cookie(sb, name, flags);
 	if (IS_ERR(ctx))
 		return ctx;
@@ -589,7 +530,8 @@ out:
 static
 struct erofs_fscache *erofs_domain_register_cookie(struct super_block *sb,
-						   char *name, bool need_inode)
+						   char *name,
+						   unsigned int flags)
 {
 	struct inode *inode;
 	struct erofs_fscache *ctx;
@@ -602,23 +544,30 @@ struct erofs_fscache *erofs_domain_register_cookie(struct super_block *sb,
 		ctx = inode->i_private;
 		if (!ctx || ctx->domain != domain || strcmp(ctx->name, name))
 			continue;
-		igrab(inode);
+		if (!(flags & EROFS_REG_COOKIE_NEED_NOEXIST)) {
+			igrab(inode);
+		} else {
+			erofs_err(sb, "%s already exists in domain %s", name,
+				  domain->domain_id);
+			ctx = ERR_PTR(-EEXIST);
+		}
 		spin_unlock(&psb->s_inode_list_lock);
 		mutex_unlock(&erofs_domain_cookies_lock);
 		return ctx;
 	}
 	spin_unlock(&psb->s_inode_list_lock);
-	ctx = erofs_fscache_domain_init_cookie(sb, name, need_inode);
+	ctx = erofs_fscache_domain_init_cookie(sb, name, flags);
 	mutex_unlock(&erofs_domain_cookies_lock);
 	return ctx;
 }
 
 struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb,
-						    char *name, bool need_inode)
+						    char *name,
+						    unsigned int flags)
 {
 	if (EROFS_SB(sb)->domain_id)
-		return erofs_domain_register_cookie(sb, name, need_inode);
-	return erofs_fscache_acquire_cookie(sb, name, need_inode);
+		return erofs_domain_register_cookie(sb, name, flags);
+	return erofs_fscache_acquire_cookie(sb, name, flags);
 }
 
 void erofs_fscache_unregister_cookie(struct erofs_fscache *ctx)
@@ -647,6 +596,7 @@ int erofs_fscache_register_fs(struct super_block *sb)
 	int ret;
 	struct erofs_sb_info *sbi = EROFS_SB(sb);
 	struct erofs_fscache *fscache;
+	unsigned int flags;
 
 	if (sbi->domain_id)
 		ret = erofs_fscache_register_domain(sb);
@@ -655,8 +605,20 @@ int erofs_fscache_register_fs(struct super_block *sb)
 	if (ret)
 		return ret;
 
-	/* acquired domain/volume will be relinquished in kill_sb() on error */
-	fscache = erofs_fscache_register_cookie(sb, sbi->fsid, true);
+	/*
+	 * When shared domain is enabled, using NEED_NOEXIST to guarantee
+	 * the primary data blob (aka fsid) is unique in the shared domain.
+	 *
+	 * For non-shared-domain case, fscache_acquire_volume() invoked by
+	 * erofs_fscache_register_volume() has already guaranteed
+	 * the uniqueness of primary data blob.
+	 *
+	 * Acquired domain/volume will be relinquished in kill_sb() on error.
+	 */
+	flags = EROFS_REG_COOKIE_NEED_INODE;
if (sbi->domain_id)
flags |= EROFS_REG_COOKIE_NEED_NOEXIST;
fscache = erofs_fscache_register_cookie(sb, sbi->fsid, flags);
if (IS_ERR(fscache)) if (IS_ERR(fscache))
return PTR_ERR(fscache); return PTR_ERR(fscache);
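The bool-to-bitmask conversion above lets one registration path express both "allocate a backing inode" and "reject duplicates in a shared domain". A minimal userspace sketch of the flag selection done in erofs_fscache_register_fs() — the `erofs_fsid_cookie_flags()` helper name is hypothetical, while the flag names and values mirror the patch:

```c
#include <stdbool.h>

/* Userspace mirror of the flag values added to fs/erofs/internal.h. */
#define EROFS_REG_COOKIE_NEED_INODE	1
#define EROFS_REG_COOKIE_NEED_NOEXIST	2

/* Sketch: the primary data blob (fsid) always needs a backing inode, and
 * must additionally be unique when a shared domain (domain_id) is in use,
 * hence the extra NEED_NOEXIST bit. */
unsigned int erofs_fsid_cookie_flags(bool has_domain_id)
{
	unsigned int flags = EROFS_REG_COOKIE_NEED_INODE;

	if (has_domain_id)
		flags |= EROFS_REG_COOKIE_NEED_NOEXIST;
	return flags;
}
```

Secondary device blobs, by contrast, pass `0` (see the erofs_init_device() hunk below in super.c): they need neither a dedicated inode nor the duplicate check.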


@@ -268,6 +268,7 @@ static int erofs_fill_inode(struct inode *inode)
 	case S_IFDIR:
 		inode->i_op = &erofs_dir_iops;
 		inode->i_fop = &erofs_dir_fops;
+		inode_nohighmem(inode);
 		break;
 	case S_IFLNK:
 		err = erofs_fill_symlink(inode, kaddr, ofs);
@@ -295,6 +296,7 @@ static int erofs_fill_inode(struct inode *inode)
 		goto out_unlock;
 	}
 	inode->i_mapping->a_ops = &erofs_raw_access_aops;
+	mapping_set_large_folios(inode->i_mapping);
 #ifdef CONFIG_EROFS_FS_ONDEMAND
 	if (erofs_is_fscache_mode(inode->i_sb))
 		inode->i_mapping->a_ops = &erofs_fscache_access_aops;


@@ -255,8 +255,7 @@ static inline int erofs_wait_on_workgroup_freezed(struct erofs_workgroup *grp)
 enum erofs_kmap_type {
 	EROFS_NO_KMAP,		/* don't map the buffer */
-	EROFS_KMAP,		/* use kmap() to map the buffer */
-	EROFS_KMAP_ATOMIC,	/* use kmap_atomic() to map the buffer */
+	EROFS_KMAP,		/* use kmap_local_page() to map the buffer */
 };
 
 struct erofs_buf {
@@ -604,13 +603,18 @@ static inline int z_erofs_load_lzma_config(struct super_block *sb,
 }
 #endif	/* !CONFIG_EROFS_FS_ZIP */
 
+/* flags for erofs_fscache_register_cookie() */
+#define EROFS_REG_COOKIE_NEED_INODE	1
+#define EROFS_REG_COOKIE_NEED_NOEXIST	2
+
 /* fscache.c */
 #ifdef CONFIG_EROFS_FS_ONDEMAND
 int erofs_fscache_register_fs(struct super_block *sb);
 void erofs_fscache_unregister_fs(struct super_block *sb);
 
 struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb,
-						    char *name, bool need_inode);
+						    char *name,
+						    unsigned int flags);
 void erofs_fscache_unregister_cookie(struct erofs_fscache *fscache);
 
 extern const struct address_space_operations erofs_fscache_access_aops;
@@ -623,7 +627,8 @@ static inline void erofs_fscache_unregister_fs(struct super_block *sb) {}
 static inline
 struct erofs_fscache *erofs_fscache_register_cookie(struct super_block *sb,
-						    char *name, bool need_inode)
+						    char *name,
+						    unsigned int flags)
 {
 	return ERR_PTR(-EOPNOTSUPP);
 }


@@ -245,7 +245,7 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb,
 	}
 
 	if (erofs_is_fscache_mode(sb)) {
-		fscache = erofs_fscache_register_cookie(sb, dif->path, false);
+		fscache = erofs_fscache_register_cookie(sb, dif->path, 0);
 		if (IS_ERR(fscache))
 			return PTR_ERR(fscache);
 		dif->fscache = fscache;


@@ -148,7 +148,7 @@ static inline int xattr_iter_fixup(struct xattr_iter *it)
 	it->blkaddr += erofs_blknr(it->ofs);
 	it->kaddr = erofs_read_metabuf(&it->buf, it->sb, it->blkaddr,
-				       EROFS_KMAP_ATOMIC);
+				       EROFS_KMAP);
 	if (IS_ERR(it->kaddr))
 		return PTR_ERR(it->kaddr);
 	it->ofs = erofs_blkoff(it->ofs);
@@ -174,7 +174,7 @@ static int inline_xattr_iter_begin(struct xattr_iter *it,
 	it->ofs = erofs_blkoff(iloc(sbi, vi->nid) + inline_xattr_ofs);
 	it->kaddr = erofs_read_metabuf(&it->buf, inode->i_sb, it->blkaddr,
-				       EROFS_KMAP_ATOMIC);
+				       EROFS_KMAP);
 	if (IS_ERR(it->kaddr))
 		return PTR_ERR(it->kaddr);
 	return vi->xattr_isize - xattr_header_sz;
@@ -368,7 +368,7 @@ static int shared_getxattr(struct inode *inode, struct getxattr_iter *it)
 		it->it.ofs = xattrblock_offset(sbi, vi->xattr_shared_xattrs[i]);
 		it->it.kaddr = erofs_read_metabuf(&it->it.buf, sb, blkaddr,
-						  EROFS_KMAP_ATOMIC);
+						  EROFS_KMAP);
 		if (IS_ERR(it->it.kaddr))
 			return PTR_ERR(it->it.kaddr);
 		it->it.blkaddr = blkaddr;
@@ -580,7 +580,7 @@ static int shared_listxattr(struct listxattr_iter *it)
 		it->it.ofs = xattrblock_offset(sbi, vi->xattr_shared_xattrs[i]);
 		it->it.kaddr = erofs_read_metabuf(&it->it.buf, sb, blkaddr,
-						  EROFS_KMAP_ATOMIC);
+						  EROFS_KMAP);
 		if (IS_ERR(it->it.kaddr))
 			return PTR_ERR(it->it.kaddr);
 		it->it.blkaddr = blkaddr;


@@ -175,16 +175,6 @@ static void z_erofs_free_pcluster(struct z_erofs_pcluster *pcl)
 	DBG_BUGON(1);
 }
 
-/* how to allocate cached pages for a pcluster */
-enum z_erofs_cache_alloctype {
-	DONTALLOC,	/* don't allocate any cached pages */
-	/*
-	 * try to use cached I/O if page allocation succeeds or fallback
-	 * to in-place I/O instead to avoid any direct reclaim.
-	 */
-	TRYALLOC,
-};
-
 /*
  * tagged pointer with 1-bit tag for all compressed pages
  * tag 0 - the page is just found with an extra page reference
@@ -292,12 +282,29 @@ struct z_erofs_decompress_frontend {
 	.inode = __i, .owned_head = Z_EROFS_PCLUSTER_TAIL, \
 	.mode = Z_EROFS_PCLUSTER_FOLLOWED, .backmost = true }
 
+static bool z_erofs_should_alloc_cache(struct z_erofs_decompress_frontend *fe)
+{
+	unsigned int cachestrategy = EROFS_I_SB(fe->inode)->opt.cache_strategy;
+
+	if (cachestrategy <= EROFS_ZIP_CACHE_DISABLED)
+		return false;
+
+	if (fe->backmost)
+		return true;
+
+	if (cachestrategy >= EROFS_ZIP_CACHE_READAROUND &&
+	    fe->map.m_la < fe->headoffset)
+		return true;
+
+	return false;
+}
+
 static void z_erofs_bind_cache(struct z_erofs_decompress_frontend *fe,
-			       enum z_erofs_cache_alloctype type,
 			       struct page **pagepool)
 {
 	struct address_space *mc = MNGD_MAPPING(EROFS_I_SB(fe->inode));
 	struct z_erofs_pcluster *pcl = fe->pcl;
+	bool shouldalloc = z_erofs_should_alloc_cache(fe);
 	bool standalone = true;
 	/*
 	 * optimistic allocation without direct reclaim since inplace I/O
@@ -326,18 +333,19 @@ static void z_erofs_bind_cache(struct z_erofs_decompress_frontend *fe,
 		} else {
 			/* I/O is needed, no possible to decompress directly */
 			standalone = false;
-			switch (type) {
-			case TRYALLOC:
-				newpage = erofs_allocpage(pagepool, gfp);
-				if (!newpage)
-					continue;
-				set_page_private(newpage,
-						 Z_EROFS_PREALLOCATED_PAGE);
-				t = tag_compressed_page_justfound(newpage);
-				break;
-			default:        /* DONTALLOC */
+			if (!shouldalloc)
 				continue;
-			}
+
+			/*
+			 * try to use cached I/O if page allocation
+			 * succeeds or fallback to in-place I/O instead
+			 * to avoid any direct reclaim.
+			 */
+			newpage = erofs_allocpage(pagepool, gfp);
+			if (!newpage)
+				continue;
+			set_page_private(newpage, Z_EROFS_PREALLOCATED_PAGE);
+			t = tag_compressed_page_justfound(newpage);
 		}
 
 		if (!cmpxchg_relaxed(&pcl->compressed_bvecs[i].page, NULL,
@@ -488,7 +496,8 @@ static int z_erofs_register_pcluster(struct z_erofs_decompress_frontend *fe)
 	struct erofs_workgroup *grp;
 	int err;
 
-	if (!(map->m_flags & EROFS_MAP_ENCODED)) {
+	if (!(map->m_flags & EROFS_MAP_ENCODED) ||
+	    (!ztailpacking && !(map->m_pa >> PAGE_SHIFT))) {
 		DBG_BUGON(1);
 		return -EFSCORRUPTED;
 	}
@@ -637,20 +646,6 @@ static bool z_erofs_collector_end(struct z_erofs_decompress_frontend *fe)
 	return true;
 }
 
-static bool should_alloc_managed_pages(struct z_erofs_decompress_frontend *fe,
-				       unsigned int cachestrategy,
-				       erofs_off_t la)
-{
-	if (cachestrategy <= EROFS_ZIP_CACHE_DISABLED)
-		return false;
-
-	if (fe->backmost)
-		return true;
-
-	return cachestrategy >= EROFS_ZIP_CACHE_READAROUND &&
-		la < fe->headoffset;
-}
-
 static int z_erofs_read_fragment(struct inode *inode, erofs_off_t pos,
 				 struct page *page, unsigned int pageofs,
 				 unsigned int len)
@@ -687,12 +682,9 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
 				struct page *page, struct page **pagepool)
 {
 	struct inode *const inode = fe->inode;
-	struct erofs_sb_info *const sbi = EROFS_I_SB(inode);
 	struct erofs_map_blocks *const map = &fe->map;
 	const loff_t offset = page_offset(page);
 	bool tight = true, exclusive;
-	enum z_erofs_cache_alloctype cache_strategy;
 	unsigned int cur, end, spiltted;
 	int err = 0;
@@ -746,13 +738,7 @@ repeat:
 		fe->mode = Z_EROFS_PCLUSTER_FOLLOWED_NOINPLACE;
 	} else {
 		/* bind cache first when cached decompression is preferred */
-		if (should_alloc_managed_pages(fe, sbi->opt.cache_strategy,
-					       map->m_la))
-			cache_strategy = TRYALLOC;
-		else
-			cache_strategy = DONTALLOC;
-		z_erofs_bind_cache(fe, cache_strategy, pagepool);
+		z_erofs_bind_cache(fe, pagepool);
 	}
 hitted:
 	/*
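The cleanup above folds the old should_alloc_managed_pages()/alloctype pair into a single predicate, z_erofs_should_alloc_cache(). Its decision logic can be restated as a standalone userspace sketch; the function and enumerator ordering below are assumptions mirroring the patch (disabled < readahead < readaround), not the kernel's actual definitions:

```c
#include <stdbool.h>

/* Assumed ordering of the EROFS_ZIP_CACHE_* mount strategies. */
enum {
	EROFS_ZIP_CACHE_DISABLED,
	EROFS_ZIP_CACHE_READAHEAD,
	EROFS_ZIP_CACHE_READAROUND,
};

/* Sketch of z_erofs_should_alloc_cache(): always cache the backmost
 * (currently requested) pcluster when caching is enabled at all, and
 * additionally cache pclusters in front of the read head when the
 * strategy is at least READAROUND. */
bool should_alloc_cache(int strategy, bool backmost,
			unsigned long long m_la, unsigned long long headoffset)
{
	if (strategy <= EROFS_ZIP_CACHE_DISABLED)
		return false;
	if (backmost)
		return true;
	return strategy >= EROFS_ZIP_CACHE_READAROUND && m_la < headoffset;
}
```

Moving the check into z_erofs_bind_cache() means the caller no longer threads a cache_strategy value through, which is what allows the z_erofs_cache_alloctype enum to be deleted outright.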


@@ -178,7 +178,7 @@ static int legacy_load_cluster_from_disk(struct z_erofs_maprecorder *m,
 	unsigned int advise, type;
 
 	m->kaddr = erofs_read_metabuf(&m->map->buf, inode->i_sb,
-				      erofs_blknr(pos), EROFS_KMAP_ATOMIC);
+				      erofs_blknr(pos), EROFS_KMAP);
 	if (IS_ERR(m->kaddr))
 		return PTR_ERR(m->kaddr);
 
@@ -416,7 +416,7 @@ static int compacted_load_cluster_from_disk(struct z_erofs_maprecorder *m,
 out:
 	pos += lcn * (1 << amortizedshift);
 	m->kaddr = erofs_read_metabuf(&m->map->buf, inode->i_sb,
-				      erofs_blknr(pos), EROFS_KMAP_ATOMIC);
+				      erofs_blknr(pos), EROFS_KMAP);
 	if (IS_ERR(m->kaddr))
 		return PTR_ERR(m->kaddr);
 	return unpack_compacted_index(m, amortizedshift, pos, lookahead);
@@ -694,10 +694,15 @@ static int z_erofs_do_map_blocks(struct inode *inode,
 		map->m_pa = blknr_to_addr(m.pblk);
 		err = z_erofs_get_extent_compressedlen(&m, initial_lcn);
 		if (err)
-			goto out;
+			goto unmap_out;
 	}
 
 	if (m.headtype == Z_EROFS_VLE_CLUSTER_TYPE_PLAIN) {
+		if (map->m_llen > map->m_plen) {
+			DBG_BUGON(1);
+			err = -EFSCORRUPTED;
+			goto unmap_out;
+		}
 		if (vi->z_advise & Z_EROFS_ADVISE_INTERLACED_PCLUSTER)
 			map->m_algorithmformat =
 				Z_EROFS_COMPRESSION_INTERLACED;
@@ -718,14 +723,12 @@ static int z_erofs_do_map_blocks(struct inode *inode,
 		if (!err)
 			map->m_flags |= EROFS_MAP_FULL_MAPPED;
 	}
 
 unmap_out:
 	erofs_unmap_metabuf(&m.map->buf);
-out:
 	erofs_dbg("%s, m_la %llu m_pa %llu m_llen %llu m_plen %llu m_flags 0%o",
 		  __func__, map->m_la, map->m_pa,
 		  map->m_llen, map->m_plen, map->m_flags);
 	return err;
 }
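Two syzbot fixes land in this hunk: the `goto unmap_out` change keeps the metabuf mapped-then-unmapped paths balanced when z_erofs_get_extent_compressedlen() fails, and the new PLAIN check rejects crafted images whose uncompressed extent claims more logical bytes than physical ones. The invariant behind the latter can be sketched in isolation; `check_plain_extent()` is a hypothetical helper name, and EFSCORRUPTED is aliased to EUCLEAN as the kernel does:

```c
#include <errno.h>

#ifndef EFSCORRUPTED
#define EFSCORRUPTED	EUCLEAN	/* kernel's errno alias for on-disk corruption */
#endif

/* Sketch of the new sanity check in z_erofs_do_map_blocks(): for an
 * uncompressed (PLAIN) pcluster the logical extent length can never
 * exceed the physical one, so a crafted image violating this is
 * rejected as corrupted instead of being decoded out of bounds. */
int check_plain_extent(unsigned long long m_llen, unsigned long long m_plen)
{
	if (m_llen > m_plen)
		return -EFSCORRUPTED;
	return 0;
}
```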


@@ -267,6 +267,14 @@ struct netfs_cache_ops {
 			     loff_t *_start, size_t *_len, loff_t i_size,
 			     bool no_space_allocated_yet);
 
+	/* Prepare an on-demand read operation, shortening it to a cached/uncached
+	 * boundary as appropriate.
+	 */
+	enum netfs_io_source (*prepare_ondemand_read)(struct netfs_cache_resources *cres,
+						      loff_t start, size_t *_len,
+						      loff_t i_size,
+						      unsigned long *_flags, ino_t ino);
+
 	/* Query the occupancy of the cache in a region, returning where the
 	 * next chunk of data starts and how long it is.
 	 */
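The new ->prepare_ondemand_read() hook is what lets erofs ask cachefiles, per read, whether a span should come from the cache or from the server, with *_len shortened so a single answer covers the whole span. A toy userspace sketch of that contract — the stub enum and function are illustrative stand-ins, not the real netfs types, and the "cache" here is simply pretended to hold the bytes [0, cached_to):

```c
#include <stddef.h>

/* Reduced stand-ins for the netfs read sources; illustrative only. */
enum netfs_io_source_stub {
	STUB_DOWNLOAD_FROM_SERVER,
	STUB_READ_FROM_CACHE,
};

/* Sketch of the ->prepare_ondemand_read() contract: given a read of
 * *_len bytes at @start, report where the data should come from and
 * clamp *_len so the span never crosses a cached/uncached boundary. */
enum netfs_io_source_stub
prepare_ondemand_read_stub(long long start, size_t *_len, long long cached_to)
{
	if (start < cached_to) {
		if (start + (long long)*_len > cached_to)
			*_len = (size_t)(cached_to - start);
		return STUB_READ_FROM_CACHE;
	}
	return STUB_DOWNLOAD_FROM_SERVER;
}
```

This per-span interface is also why the cachefiles_prep_read tracepoint below now takes the raw (object, start, len, flags) tuple instead of a netfs_io_subrequest: in the on-demand path there is no subrequest to pass.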


@@ -428,16 +428,18 @@ TRACE_EVENT(cachefiles_vol_coherency,
 	    );
 
 TRACE_EVENT(cachefiles_prep_read,
-	    TP_PROTO(struct netfs_io_subrequest *sreq,
+	    TP_PROTO(struct cachefiles_object *obj,
+		     loff_t start,
+		     size_t len,
+		     unsigned short flags,
 		     enum netfs_io_source source,
 		     enum cachefiles_prepare_read_trace why,
-		     ino_t cache_inode),
+		     ino_t cache_inode, ino_t netfs_inode),
 
-	    TP_ARGS(sreq, source, why, cache_inode),
+	    TP_ARGS(obj, start, len, flags, source, why, cache_inode, netfs_inode),
 
 	    TP_STRUCT__entry(
-		    __field(unsigned int,			rreq		)
-		    __field(unsigned short,			index		)
+		    __field(unsigned int,			obj		)
 		    __field(unsigned short,			flags		)
 		    __field(enum netfs_io_source,		source		)
 		    __field(enum cachefiles_prepare_read_trace,	why		)
@@ -448,19 +450,18 @@ TRACE_EVENT(cachefiles_prep_read,
 		    ),
 
 	    TP_fast_assign(
-		    __entry->rreq	= sreq->rreq->debug_id;
-		    __entry->index	= sreq->debug_index;
-		    __entry->flags	= sreq->flags;
+		    __entry->obj	= obj ? obj->debug_id : 0;
+		    __entry->flags	= flags;
 		    __entry->source	= source;
 		    __entry->why	= why;
-		    __entry->len	= sreq->len;
-		    __entry->start	= sreq->start;
-		    __entry->netfs_inode = sreq->rreq->inode->i_ino;
+		    __entry->len	= len;
+		    __entry->start	= start;
+		    __entry->netfs_inode = netfs_inode;
 		    __entry->cache_inode = cache_inode;
 		    ),
 
-	    TP_printk("R=%08x[%u] %s %s f=%02x s=%llx %zx ni=%x B=%x",
-		      __entry->rreq, __entry->index,
+	    TP_printk("o=%08x %s %s f=%02x s=%llx %zx ni=%x B=%x",
+		      __entry->obj,
 		      __print_symbolic(__entry->source, netfs_sreq_sources),
 		      __print_symbolic(__entry->why, cachefiles_prepare_read_traces),
 		      __entry->flags,