Commit Graph

317 Commits

Author SHA1 Message Date
Gao Xiang
e424638474 erofs: ensure m_llen is reset to 0 if metadata is invalid
[ Upstream commit 9b32b063be ]

Sometimes, the on-disk metadata might be invalid due to user
interrupts, storage failures, or other unknown causes.

In that case, z_erofs_map_blocks_iter() may still return a valid
m_llen while other fields remain invalid (e.g., m_plen can be 0).

Due to the return value of z_erofs_scan_folio() in some path will
be ignored on purpose, the following z_erofs_scan_folio() could
then use the invalid value by accident.

Let's reset m_llen to 0 to prevent this.

Link: https://lore.kernel.org/r/20240629185743.2819229-1-hsiangkao@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-07-25 09:49:18 +02:00
Gao Xiang
6e49f3ac43 erofs: apply proper VMA alignment for memory mapped files on THP
[ Upstream commit 4127caee89 ]

There are mainly two reasons that thp_get_unmapped_area() should be
used for EROFS as other filesystems:

 - It's needed to enable PMD mappings as a FSDAX filesystem, see
   commit 74d2fad133 ("thp, dax: add thp_get_unmapped_area for pmd
   mappings");

 - It's useful together with large folios and
   CONFIG_READ_ONLY_THP_FOR_FS which enable THPs for mmapped files
   (e.g. shared libraries) even without FSDAX.  See commit 1854bc6e24
   ("mm/readahead: Align file mappings for non-DAX").

Fixes: 06252e9ce0 ("erofs: dax support for non-tailpacking regular file")
Fixes: ce529cc25b ("erofs: enable large folios for iomap mode")
Fixes: e6687b8922 ("erofs: enable large folios for fscache mode")
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240306053138.2240206-1-hsiangkao@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-15 10:48:16 -04:00
Gao Xiang
47467e0481 erofs: fix inconsistent per-file compression format
commit 118a8cf504 upstream.

EROFS can select compression algorithms on a per-file basis, and each
per-file compression algorithm needs to be marked in the on-disk
superblock for initialization.

However, syzkaller can generate inconsistent crafted images that use
an unsupported algorithmtype for specific inodes, e.g. use MicroLZMA
algorithmtype even it's not set in `sbi->available_compr_algs`.  This
can lead to an unexpected "BUG: kernel NULL pointer dereference" if
the corresponding decompressor isn't built-in.

Fix this by checking against `sbi->available_compr_algs` for each
m_algorithmformat request.  Incorrect !erofs_sb_has_compr_cfgs preset
bitmap is now fixed together since it was harmless previously.

Reported-by: <bugreport@ubisectech.com>
Fixes: 8f89926290 ("erofs: get compression algorithms directly on mapping")
Fixes: 622ceaddb7 ("erofs: lzma compression support")
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Link: https://lore.kernel.org/r/20240113150602.1471050-1-hsiangkao@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Yue Hu <huyue2@coolpad.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01 13:26:39 +01:00
Gao Xiang
54407d9bc5 erofs: simplify compression configuration parser
commit efb4fb02ce upstream.

Move erofs_load_compr_cfgs() into decompressor.c as well as introduce
a callback instead of a hard-coded switch for each algorithm for
simplicity.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231022130957.11398-1-xiang@kernel.org
Stable-dep-of: 118a8cf504 ("erofs: fix inconsistent per-file compression format")
Signed-off-by: Yue Hu <huyue2@coolpad.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01 13:26:38 +01:00
Sandeep Dhavale
d9d2426253 erofs: fix refcount on the metabuf used for inode lookup
commit 56ee7db311 upstream.

In erofs_find_target_block() when erofs_dirnamecmp() returns 0,
we do not assign the target metabuf. This causes the caller
erofs_namei()'s erofs_put_metabuf() at the end to be not effective
leaving the refcount on the page.
As the page from metabuf (buf->page) is never put, such page cannot be
migrated or reclaimed. Fix it now by putting the metabuf from
previous loop and assigning the current metabuf to target before
returning so caller erofs_namei() can do the final put as it was
intended.

Fixes: 500edd0956 ("erofs: use meta buffers for inode lookup")
Cc: <stable@vger.kernel.org> # 5.18+
Signed-off-by: Sandeep Dhavale <dhavale@google.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20240221210348.3667795-1-dhavale@google.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01 13:26:33 +01:00
Gao Xiang
e0e78522b4 erofs: fix ztailpacking for subpage compressed blocks
[ Upstream commit e5aba911de ]

`pageofs_in` should be the compressed data offset of the page rather
than of the block.

Acked-by: Chao Yu <chao@kernel.org>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231214161337.753049-1-hsiangkao@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-05 20:12:48 +00:00
Gao Xiang
33bf23c994 erofs: fix lz4 inplace decompression
[ Upstream commit 3c12466b6b ]

Currently EROFS can map another compressed buffer for inplace
decompression, that was used to handle the cases that some pages of
compressed data are actually not in-place I/O.

However, like most simple LZ77 algorithms, LZ4 expects the compressed
data is arranged at the end of the decompressed buffer and it
explicitly uses memmove() to handle overlapping:
  __________________________________________________________
 |_ direction of decompression --> ____ |_ compressed data _|

Although EROFS arranges compressed data like this, it typically maps two
individual virtual buffers so the relative order is uncertain.
Previously, it was hardly observed since LZ4 only uses memmove() for
short overlapped literals and x86/arm64 memmove implementations seem to
completely cover it up and they don't have this issue.  Juhyung reported
that EROFS data corruption can be found on a new Intel x86 processor.
After some analysis, it seems that recent x86 processors with the new
FSRM feature expose this issue with "rep movsb".

Let's strictly use the decompressed buffer for lz4 inplace
decompression for now.  Later, as an useful improvement, we could try
to tie up these two buffers together in the correct order.

Reported-and-tested-by: Juhyung Park <qkrwngud825@gmail.com>
Closes: https://lore.kernel.org/r/CAD14+f2AVKf8Fa2OO1aAUdDNTDsVzzR6ctU_oJSmTyd6zSYR2Q@mail.gmail.com
Fixes: 0ffd71bcc3 ("staging: erofs: introduce LZ4 decompression inplace")
Fixes: 598162d050 ("erofs: support decompress big pcluster for lz4 backend")
Cc: stable <stable@vger.kernel.org> # 5.4+
Tested-by: Yifan Zhao <zhaoyifan@sjtu.edu.cn>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231206045534.3920847-1-hsiangkao@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-31 16:17:09 -08:00
Gao Xiang
2197389e1a erofs: get rid of the remaining kmap_atomic()
[ Upstream commit 123ec246eb ]

It's unnecessary to use kmap_atomic() compared with kmap_local_page().
In addition, kmap_atomic() is deprecated now.

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230627161240.331-1-hsiangkao@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Stable-dep-of: 3c12466b6b ("erofs: fix lz4 inplace decompression")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-31 16:17:09 -08:00
Gao Xiang
9c285df949 erofs: fix memory leak on short-lived bounced pages
[ Upstream commit 93d6fda7f9 ]

Both MicroLZMA and DEFLATE algorithms can use short-lived pages on
demand for the overlapped inplace I/O decompression.

However, those short-lived pages are actually added to
`be->compressed_pages`.  Thus, it should be checked instead of
`pcl->compressed_bvecs`.

The LZ4 algorithm doesn't work like this, so it won't be impacted.

Fixes: 67139e36d9 ("erofs: introduce `z_erofs_parse_in_bvecs'")
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231128180431.4116991-1-hsiangkao@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-25 15:27:22 -08:00
Gao Xiang
6a5a8f0a97 erofs: fix memory leak of LZMA global compressed deduplication
[ Upstream commit 75a5221630 ]

When stressing microLZMA EROFS images with the new global compressed
deduplication feature enabled (`-Ededupe`), I found some short-lived
temporary pages weren't properly released, which could slowly cause
unexpected OOMs hours later.

Let's fix it now (LZ4 and DEFLATE don't have this issue.)

Fixes: 5c2a64252c ("erofs: introduce partial-referenced pclusters")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230907050542.97152-1-hsiangkao@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-10-10 22:00:39 +02:00
Gao Xiang
1ce9ebc96e erofs: ensure that the post-EOF tails are all zeroed
commit e4c1cf523d upstream.

This was accidentally fixed up in commit e4c1cf523d but we can't
take the full change due to other dependancy issues, so here is just
the actual bugfix that is needed.

[Background]

keltargw reported an issue [1] that with mmaped I/Os, sometimes the
tail of the last page (after file ends) is not filled with zeroes.

The root cause is that such tail page could be wrongly selected for
inplace I/Os so the zeroed part will then be filled with compressed
data instead of zeroes.

A simple fix is to avoid doing inplace I/Os for such tail parts,
actually that was already fixed upstream in commit e4c1cf523d
("erofs: tidy up z_erofs_do_read_page()") by accident.

[1] https://lore.kernel.org/r/3ad8b469-25db-a297-21f9-75db2d6ad224@linux.alibaba.com

Reported-by: keltargw <keltar.gw@gmail.com>
Fixes: 3883a79abd ("staging: erofs: introduce VLE decompression support")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-06 21:26:59 +01:00
Gao Xiang
05e0952ddb erofs: fix wrong primary bvec selection on deduplicated extents
[ Upstream commit 94c43de735 ]

When handling deduplicated compressed data, there can be multiple
decompressed extents pointing to the same compressed data in one shot.

In such cases, the bvecs which belong to the longest extent will be
selected as the primary bvecs for real decompressors to decode and the
other duplicated bvecs will be directly copied from the primary bvecs.

Previously, only relative offsets of the longest extent were checked to
decompress the primary bvecs.  On rare occasions, it can be incorrect
if there are several extents with the same start relative offset.
As a result, some short bvecs could be selected for decompression and
then cause data corruption.

For example, as Shijie Sun reported off-list, considering the following
extents of a file:
 117:   903345..  915250 |   11905 :     385024..    389120 |    4096
...
 119:   919729..  930323 |   10594 :     385024..    389120 |    4096
...
 124:   968881..  980786 |   11905 :     385024..    389120 |    4096

The start relative offset is the same: 2225, but extent 119 (919729..
930323) is shorter than the others.

Let's restrict the bvec length in addition to the start offset if bvecs
are not full.

Reported-by: Shijie Sun <sunshijie@xiaomi.com>
Fixes: 5c2a64252c ("erofs: introduce partial-referenced pclusters")
Tested-by Shijie Sun <sunshijie@xiaomi.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230719065459.60083-1-hsiangkao@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-11 12:08:11 +02:00
Xin Yin
9e6474e5d7 erofs: fix fsdax unavailability for chunk-based regular files
[ Upstream commit 18bddc5b67 ]

DAX can be used to share page cache between VMs, reducing guest memory
overhead. And chunk based data format is widely used for VM and
container image. So enable dax support for it, make erofs better used
for VM scenarios.

Fixes: c5aa903a59 ("erofs: support reading chunk-based uncompressed files")
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230711062130.7860-1-yinxin.x@bytedance.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-23 13:49:26 +02:00
Chunhai Guo
dc8158a95f erofs: avoid infinite loop in z_erofs_do_read_page() when reading beyond EOF
[ Upstream commit 8191213a58 ]

z_erofs_do_read_page() may loop infinitely due to the inappropriate
truncation in the below statement. Since the offset is 64 bits and min_t()
truncates the result to 32 bits. The solution is to replace unsigned int
with a 64-bit type, such as erofs_off_t.
    cur = end - min_t(unsigned int, offset + end - map->m_la, end);

    - For example:
        - offset = 0x400160000
        - end = 0x370
        - map->m_la = 0x160370
        - offset + end - map->m_la = 0x400000000
        - offset + end - map->m_la = 0x00000000 (truncated as unsigned int)
    - Expected result:
        - cur = 0
    - Actual result:
        - cur = 0x370

Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Fixes: 3883a79abd ("staging: erofs: introduce VLE decompression support")
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230710093410.44071-1-guochunhai@vivo.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-23 13:49:26 +02:00
Chunhai Guo
83879f72e0 erofs: avoid useless loops in z_erofs_pcluster_readmore() when reading beyond EOF
[ Upstream commit 936aa701d8 ]

z_erofs_pcluster_readmore() may take a long time to loop when the page
offset is large enough, which is unnecessary should be prevented.

For example, when the following case is encountered, it will loop 4691368
times, taking about 27 seconds:
    - offset = 19217289215
    - inode_size = 1442672

Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Fixes: 386292919c ("erofs: introduce readmore decompression strategy")
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230710042531.28761-1-guochunhai@vivo.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-23 13:49:26 +02:00
Gao Xiang
9a53410038 erofs: fix compact 4B support for 16k block size
[ Upstream commit 001b8ccd06 ]

In compact 4B, two adjacent lclusters are packed together as a unit to
form on-disk indexes for effective random access, as below:

(amortized = 4, vcnt = 2)
       _____________________________________________
      |___@_____ encoded bits __________|_ blkaddr _|
      0        .                                    amortized * vcnt = 8
      .             .
      .                  .              amortized * vcnt - 4 = 4
      .                        .
      .____________________________.
      |_type (2 bits)_|_clusterofs_|

Therefore, encoded bits for each pack are 32 bits (4 bytes). IOWs,
since each lcluster can get 16 bits for its type and clusterofs, the
maximum supported lclustersize for compact 4B format is 16k (14 bits).

Fix this to enable compact 4B format for 16k lclusters (blocks), which
is tested on an arm64 server with 16k page size.

Fixes: 152a333a58 ("staging: erofs: add compacted compression indexes support")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230601112341.56960-1-hsiangkao@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:20:59 +02:00
Gao Xiang
ec94df6bcf erofs: simplify iloc()
[ Upstream commit b780d3fc61 ]

Actually we could pass in inodes directly to clean up all callers.
Also rename iloc() as erofs_iloc().

Link: https://lore.kernel.org/r/20230114150823.432069-1-xiang@kernel.org
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Stable-dep-of: 001b8ccd06 ("erofs: fix compact 4B support for 16k block size")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:20:59 +02:00
Gao Xiang
d3b39ea248 erofs: kill hooked chains to avoid loops on deduplicated compressed images
[ Upstream commit 967c28b23f ]

After heavily stressing EROFS with several images which include a
hand-crafted image of repeated patterns for more than 46 days, I found
two chains could be linked with each other almost simultaneously and
form a loop so that the entire loop won't be submitted.  As a
consequence, the corresponding file pages will remain locked forever.

It can be _only_ observed on data-deduplicated compressed images.
For example, consider two chains with five pclusters in total:
	Chain 1:  2->3->4->5    -- The tail pcluster is 5;
        Chain 2:  5->1->2       -- The tail pcluster is 2.

Chain 2 could link to Chain 1 with pcluster 5; and Chain 1 could link
to Chain 2 at the same time with pcluster 2.

Since hooked chains are all linked locklessly now, I have no idea how
to simply avoid the race.  Instead, let's avoid hooked chains completely
until I could work out a proper way to fix this and end users finally
tell us that it's needed to add it back.

Actually, this optimization can be found with multi-threaded workloads
(especially even more often on deduplicated compressed images), yet I'm
not sure about the overall system impacts of not having this compared
with implementation complexity.

Fixes: 267f2492c8 ("erofs: introduce multi-reference pclusters (fully-referenced)")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Link: https://lore.kernel.org/r/20230526201459.128169-4-hsiangkao@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:20:55 +02:00
Gao Xiang
daed10290b erofs: move zdata.h into zdata.c
[ Upstream commit a9a94d9373 ]

Definitions in zdata.h are only used in zdata.c and for internal
use only.  No logic changes.

Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230204093040.97967-4-hsiangkao@linux.alibaba.com
Stable-dep-of: 967c28b23f ("erofs: kill hooked chains to avoid loops on deduplicated compressed images")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:20:54 +02:00
Gao Xiang
041ff2c21b erofs: remove tagged pointer helpers
[ Upstream commit b1ed220c62 ]

Just open-code the remaining one to simplify the code.

Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230204093040.97967-3-hsiangkao@linux.alibaba.com
Stable-dep-of: 967c28b23f ("erofs: kill hooked chains to avoid loops on deduplicated compressed images")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:20:54 +02:00
Gao Xiang
3379f13ebc erofs: avoid tagged pointers to mark sync decompression
[ Upstream commit cdba55067f ]

We could just use a boolean in z_erofs_decompressqueue for sync
decompression to simplify the code.

Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230204093040.97967-2-hsiangkao@linux.alibaba.com
Stable-dep-of: 967c28b23f ("erofs: kill hooked chains to avoid loops on deduplicated compressed images")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:20:54 +02:00
Gao Xiang
3564500b0d erofs: clean up cached I/O strategies
[ Upstream commit 1282dea37b ]

After commit 4c7e42552b ("erofs: remove useless cache strategy of
DELAYEDALLOC"), only one cached I/O allocation strategy is supported:

  When cached I/O is preferred, page allocation is applied without
  direct reclaim.  If allocation fails, fall back to inplace I/O.

Let's get rid of z_erofs_cache_alloctype.  No logical changes.

Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Yue Hu <huyue2@coolpad.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221206060352.152830-1-xiang@kernel.org
Stable-dep-of: 967c28b23f ("erofs: kill hooked chains to avoid loops on deduplicated compressed images")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:20:54 +02:00
Jingbo Xu
0955b8eac3 erofs: fix potential overflow calculating xattr_isize
[ Upstream commit 1b3567a196 ]

Given on-disk i_xattr_icount is 16 bits and xattr_isize is calculated
from i_xattr_icount multiplying 4, xattr_isize has a theoretical maximum
of 256K (64K * 4).

Thus declare xattr_isize as unsigned int to avoid the potential overflow.

Fixes: bfb8674dc0 ("staging: erofs: add erofs in-memory stuffs")
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230414061810.6479-1-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-11 23:03:07 +09:00
Jingbo Xu
50f1c1fba0 erofs: initialize packed inode after root inode is assigned
[ Upstream commit cb9bce7951 ]

As commit 8f7acdae2c ("staging: erofs: kill all failure handling in
fill_super()"), move the initialization of packed inode after root
inode is assigned, so that the iput() in .put_super() is adequate as
the failure handling.

Otherwise, iput() is also needed in .kill_sb(), in case of the mounting
fails halfway.

Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Fixes: b15b2e307c ("erofs: support on-disk compressed fragments data")
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Acked-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230407141710.113882-3-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-11 23:03:07 +09:00
Gao Xiang
7ee7a86e28 erofs: stop parsing non-compact HEAD index if clusterofs is invalid
[ Upstream commit cc4efd3dd2 ]

Syzbot generated a crafted image [1] with a non-compact HEAD index of
clusterofs 33024 while valid numbers should be 0 ~ lclustersize-1,
which causes the following unexpected behavior as below:

 BUG: unable to handle page fault for address: fffff52101a3fff9
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 23ffed067 P4D 23ffed067 PUD 0
 Oops: 0000 [#1] PREEMPT SMP KASAN
 CPU: 1 PID: 4398 Comm: kworker/u5:1 Not tainted 6.3.0-rc6-syzkaller-g09a9639e56c0 #0
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/30/2023
 Workqueue: erofs_worker z_erofs_decompressqueue_work
 RIP: 0010:z_erofs_decompress_queue+0xb7e/0x2b40
 ...
 Call Trace:
  <TASK>
  z_erofs_decompressqueue_work+0x99/0xe0
  process_one_work+0x8f6/0x1170
  worker_thread+0xa63/0x1210
  kthread+0x270/0x300
  ret_from_fork+0x1f/0x30

Note that normal images or images using compact indexes are not
impacted.  Let's fix this now.

[1] https://lore.kernel.org/r/000000000000ec75b005ee97fbaa@google.com

Reported-and-tested-by: syzbot+aafb3f37cfeb6534c4ac@syzkaller.appspotmail.com
Fixes: 02827e1796 ("staging: erofs: add erofs_map_blocks_iter")
Fixes: 152a333a58 ("staging: erofs: add compacted compression indexes support")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230410173714.104604-1-hsiangkao@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-05-11 23:03:06 +09:00
Gao Xiang
99e9188f9c erofs: Revert "erofs: fix kvcalloc() misuse with __GFP_NOFAIL"
[ Upstream commit 647dd2c3f0 ]

Let's revert commit 12724ba389 ("erofs: fix kvcalloc() misuse with
__GFP_NOFAIL") since kvmalloc() already supports __GFP_NOFAIL in commit
a421ef3030 ("mm: allow !GFP_KERNEL allocations for kvmalloc").  So
the original fix was wrong.

Actually there was some issue as [1] discussed, so before that mm fix
is landed, the warn could still happen but applying this commit first
will cause less.

[1] https://lore.kernel.org/r/20230305053035.1911-1-hsiangkao@linux.alibaba.com

Fixes: 12724ba389 ("erofs: fix kvcalloc() misuse with __GFP_NOFAIL")
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230309053148.9223-1-hsiangkao@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-17 08:50:28 +01:00
Gao Xiang
fa4056781a erofs: fix wrong kunmap when using LZMA on HIGHMEM platforms
commit 8f121dfb15 upstream.

As the call trace shown, the root cause is kunmap incorrect pages:

 BUG: kernel NULL pointer dereference, address: 00000000
 CPU: 1 PID: 40 Comm: kworker/u5:0 Not tainted 6.2.0-rc5 #4
 Workqueue: erofs_worker z_erofs_decompressqueue_work
 EIP: z_erofs_lzma_decompress+0x34b/0x8ac
  z_erofs_decompress+0x12/0x14
  z_erofs_decompress_queue+0x7e7/0xb1c
  z_erofs_decompressqueue_work+0x32/0x60
  process_one_work+0x24b/0x4d8
  ? process_one_work+0x1a4/0x4d8
  worker_thread+0x14c/0x3fc
  kthread+0xe6/0x10c
  ? rescuer_thread+0x358/0x358
  ? kthread_complete_and_exit+0x18/0x18
  ret_from_fork+0x1c/0x28
 ---[ end trace 0000000000000000 ]---

The bug is trivial and should be fixed now.  It has no impact on
!HIGHMEM platforms.

Fixes: 622ceaddb7 ("erofs: lzma compression support")
Cc: <stable@vger.kernel.org> # 5.16+
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230305134455.88236-1-hsiangkao@linux.alibaba.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-17 08:50:14 +01:00
Jingbo Xu
afc9831894 erofs: relinquish volume with mutex held
[ Upstream commit 7032809a44 ]

Relinquish fscache volume with mutex held.  Otherwise if a new domain is
registered when the old domain with the same name gets removed from the
list but not relinquished yet, fscache may complain the collision.

Fixes: 8b7adf1dff ("erofs: introduce fscache-based domain")
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Link: https://lore.kernel.org/r/20230209063913.46341-4-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-03-10 09:32:46 +01:00
Al Viro
5a19095103 use less confusing names for iov_iter direction initializers
[ Upstream commit de4eda9de2 ]

READ/WRITE proved to be actively confusing - the meanings are
"data destination, as used with read(2)" and "data source, as
used with write(2)", but people keep interpreting those as
"we read data from it" and "we write data to it", i.e. exactly
the wrong way.

Call them ITER_DEST and ITER_SOURCE - at least that is harder
to misinterpret...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Stable-dep-of: 6dd88fd59d ("vhost-scsi: unbreak any layout for response")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-09 11:28:04 +01:00
Jingbo Xu
6db03adf07 erofs: clean up parsing of fscache related options
[ Upstream commit e02ac3e732 ]

... to avoid the mess of conditional preprocessing as we are continually
adding fscache related mount options.

Reviewd-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230112065431.124926-3-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-06 08:06:32 +01:00
Siddh Raman Pant
9f31d8c889 erofs/zmap.c: Fix incorrect offset calculation
[ Upstream commit 6acd87d509 ]

Effective offset to add to length was being incorrectly calculated,
which resulted in iomap->length being set to 0, triggering a WARN_ON
in iomap_iter_done().

Fix that, and describe it in comments.

This was reported as a crash by syzbot under an issue about a warning
encountered in iomap_iter_done(), but unrelated to erofs.

C reproducer: https://syzkaller.appspot.com/text?tag=ReproC&x=1037a6b2880000
Kernel config: https://syzkaller.appspot.com/text?tag=KernelConfig&x=e2021a61197ebe02
Dashboard link: https://syzkaller.appspot.com/bug?extid=a8e049cd3abd342936b6

Reported-by: syzbot+a8e049cd3abd342936b6@syzkaller.appspotmail.com
Suggested-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Siddh Raman Pant <code@siddh.me>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20221209102151.311049-1-code@siddh.me
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-06 08:06:31 +01:00
Gao Xiang
7b28a89208 erofs: fix kvcalloc() misuse with __GFP_NOFAIL
[ Upstream commit 12724ba389 ]

As reported by syzbot [1], kvcalloc() cannot work with  __GFP_NOFAIL.
Let's use kcalloc() instead.

[1] https://lore.kernel.org/r/0000000000007796bd05f1852ec2@google.com

Reported-by: syzbot+c3729cda01706a04fb98@syzkaller.appspotmail.com
Fixes: fe3e5914e6 ("erofs: try to leave (de)compressed_pages on stack if possible")
Fixes: 4f05687fd7 ("erofs: introduce struct z_erofs_decompress_backend")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230110074927.41651-1-hsiangkao@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-01 08:34:07 +01:00
Gao Xiang
40c73b2ea9 erofs: validate the extent length for uncompressed pclusters
[ Upstream commit c505feba4c ]

syzkaller reported a KASAN use-after-free:
https://syzkaller.appspot.com/bug?extid=2ae90e873e97f1faf6f2

The referenced fuzzed image actually has two issues:
 - m_pa == 0 as a non-inlined pcluster;
 - The logical length is longer than its physical length.

The first issue has already been addressed.  This patch addresses
the second issue by checking the extent length validity.

Reported-by: syzbot+2ae90e873e97f1faf6f2@syzkaller.appspotmail.com
Fixes: 02827e1796 ("staging: erofs: add erofs_map_blocks_iter")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221205150050.47784-2-hsiangkao@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31 13:31:59 +01:00
Gao Xiang
373b6f350a erofs: fix missing unmap if z_erofs_get_extent_compressedlen() fails
[ Upstream commit d5d188b8f8 ]

Otherwise, meta buffers could be leaked.

Fixes: cec6e93bea ("erofs: support parsing big pcluster compress indexes")
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221205150050.47784-1-hsiangkao@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31 13:31:59 +01:00
Chen Zhongjin
618e712b99 erofs: Fix pcluster memleak when its block address is zero
[ Upstream commit c42c0ffe81 ]

syzkaller reported a memleak:
https://syzkaller.appspot.com/bug?id=62f37ff612f0021641eda5b17f056f1668aa9aed

unreferenced object 0xffff88811009c7f8 (size 136):
  ...
  backtrace:
    [<ffffffff821db19b>] z_erofs_do_read_page+0x99b/0x1740
    [<ffffffff821dee9e>] z_erofs_readahead+0x24e/0x580
    [<ffffffff814bc0d6>] read_pages+0x86/0x3d0
    ...

syzkaller constructed a case: in z_erofs_register_pcluster(),
ztailpacking = false and map->m_pa = zero. This makes pcl->obj.index be
zero although pcl is not a inline pcluster.

Then following path adds refcount for grp, but the refcount won't be put
because pcl is inline.

z_erofs_readahead()
  z_erofs_do_read_page() # for another page
    z_erofs_collector_begin()
      erofs_find_workgroup()
        erofs_workgroup_get()

Since it's illegal for the block address of a non-inlined pcluster to
be zero, add check here to avoid registering the pcluster which would
be leaked.

Fixes: cecf864d3d ("erofs: support inline data decompression")
Reported-by: syzbot+6f8cd9a0155b366d227f@syzkaller.appspotmail.com
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/Y42Kz6sVkf+XqJRB@debian
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31 13:31:59 +01:00
Hou Tao
18067e2624 erofs: check the uniqueness of fsid in shared domain in advance
[ Upstream commit 27f2a2dcc6 ]

When shared domain is enabled, doing mount twice with the same fsid and
domain_id will trigger sysfs warning as shown below:

 sysfs: cannot create duplicate filename '/fs/erofs/d0,meta.bin'
 CPU: 15 PID: 1051 Comm: mount Not tainted 6.1.0-rc6+ #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
 Call Trace:
  <TASK>
  dump_stack_lvl+0x38/0x49
  dump_stack+0x10/0x12
  sysfs_warn_dup.cold+0x17/0x27
  sysfs_create_dir_ns+0xb8/0xd0
  kobject_add_internal+0xb1/0x240
  kobject_init_and_add+0x71/0xa0
  erofs_register_sysfs+0x89/0x110
  erofs_fc_fill_super+0x98c/0xaf0
  vfs_get_super+0x7d/0x100
  get_tree_nodev+0x16/0x20
  erofs_fc_get_tree+0x20/0x30
  vfs_get_tree+0x24/0xb0
  path_mount+0x2fa/0xa90
  do_mount+0x7c/0xa0
  __x64_sys_mount+0x8b/0xe0
  do_syscall_64+0x30/0x60
  entry_SYSCALL_64_after_hwframe+0x46/0xb0

The reason is erofs_fscache_register_cookie() doesn't guarantee the primary
data blob (aka fsid) is unique in the shared domain and
erofs_register_sysfs() invoked by the second mount will fail due to the
duplicated fsid in the shared domain and report warning.

It would be better to check the uniqueness of fsid before doing
erofs_register_sysfs(), so adding a new flags parameter for
erofs_fscache_register_cookie() and doing the uniqueness check if
EROFS_REG_COOKIE_NEED_NOEXIST is enabled.

After the patch, the error in dmesg for the duplicated mount would be:

 erofs: ...: erofs_domain_register_cookie: XX already exists in domain YY

Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20221125110822.3812942-1-houtao@huaweicloud.com
Fixes: 7d41963759 ("erofs: Support sharing cookies in the same domain")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-12-31 13:31:59 +01:00
Linus Torvalds
81e7cfa3a9 Changes since last update:
- Fix packed_inode invalid access when reading fragments on crafted
    images;
 
  - Add a missing erofs_put_metabuf() in an error path in fscache mode;
 
  - Fix incorrect `count' for unmapped extents in fscache mode;
 
  - Fix use-after-free of fsid and domain_id string when remounting;
 
  - Fix missing xas_retry() in fscache mode.
 -----BEGIN PGP SIGNATURE-----
 
 iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCY3OcchEceGlhbmdAa2Vy
 bmVsLm9yZwAKCRA5NzHcH7XmBCzgAP92t7Lfu7gBuyhXfCJwJVFK0Iku8j9mhOiT
 +C/RVB+9zQEAk/2vy3ULcGN5k6k2q7OgEzNxQ/jM3hVQnoK+sQzwVwA=
 =XHmr
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-6.1-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:
 "Most patches randomly fix error paths or corner cases in fscache mode
  reported recently. One fixes an invalid access relating to fragments
  on crafted images.

  Summary:

   - Fix packed_inode invalid access when reading fragments on crafted
     images

   - Add a missing erofs_put_metabuf() in an error path in fscache mode

   - Fix incorrect `count' for unmapped extents in fscache mode

   - Fix use-after-free of fsid and domain_id string when remounting

   - Fix missing xas_retry() in fscache mode"

* tag 'erofs-for-6.1-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix missing xas_retry() in fscache mode
  erofs: fix use-after-free of fsid and domain_id string
  erofs: get correct count for unmapped range in fscache mode
  erofs: put metabuf in error path in fscache mode
  erofs: fix general protection fault when reading fragment
2022-11-15 10:30:34 -08:00
Jingbo Xu
37020bbb71 erofs: fix missing xas_retry() in fscache mode
The xarray iteration only holds the RCU read lock and thus may encounter
XA_RETRY_ENTRY if there's process modifying the xarray concurrently.
This will cause oops when referring to the invalid entry.

Fix this by adding the missing xas_retry(), which will make the
iteration wind back to the root node if XA_RETRY_ENTRY is encountered.

Fixes: d435d53228 ("erofs: change to use asynchronous io for fscache readpage/readahead")
Suggested-by: David Howells <dhowells@redhat.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221114121943.29987-1-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-11-14 23:48:38 +08:00
Jingbo Xu
39bfcb8138 erofs: fix use-after-free of fsid and domain_id string
When erofs instance is remounted with fsid or domain_id mount option
specified, the original fsid and domain_id string pointer in sbi->opt
is directly overridden with the fsid and domain_id string in the new
fs_context, without freeing the original fsid and domain_id string.
What's worse, when the new fsid and domain_id string is transferred to
sbi, they are not reset to NULL in fs_context, and thus they are freed
when remount finishes, while sbi is still referring to these strings.

Reconfiguration for fsid and domain_id seems unusual. Thus clarify this
restriction explicitly and dump a warning when users are attempting to
do this.

Besides, to fix the use-after-free issue, move fsid and domain_id from
erofs_mount_opts to outside.

Fixes: c6be2bd0a5 ("erofs: register fscache volume")
Fixes: 8b7adf1dff ("erofs: introduce fscache-based domain")
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20221021023153.1330-1-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-11-10 09:53:20 +08:00
Johannes Weiner
82e60d00b7 fs: fix leaked psi pressure state
When psi annotations were added to to btrfs compression reads, the psi
state tracking over add_ra_bio_pages and btrfs_submit_compressed_read was
faulty.  A pressure state, once entered, is never left.  This results in
incorrectly elevated pressure, which triggers OOM kills.

pflags record the *previous* memstall state when we enter a new one.  The
code tried to initialize pflags to 1, and then optimize the leave call
when we either didn't enter a memstall, or were already inside a nested
stall.  However, there can be multiple PageWorkingset pages in the bio, at
which point it's that path itself that enters repeatedly and overwrites
pflags.  This causes us to miss the exit.

Enter the stall only once if needed, then unwind correctly.

erofs has the same problem, fix that up too.  And move the memstall exit
past submit_bio() to restore submit accounting originally added by
b8e24a9300 ("block: annotate refault stalls from IO submission").

Link: https://lkml.kernel.org/r/Y2UHRqthNUwuIQGS@cmpxchg.org
Fixes: 4088a47e78 ("btrfs: add manual PSI accounting for compressed reads")
Fixes: 99486c511f ("erofs: add manual PSI accounting for the compressed address space")
Fixes: 118f3663fb ("block: remove PSI accounting from the bio layer")
Link: https://lore.kernel.org/r/d20a0a85-e415-cf78-27f9-77dd7a94bc8d@leemhuis.info/
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Thorsten Leemhuis <linux@leemhuis.info>
Tested-by: Thorsten Leemhuis <linux@leemhuis.info>
Cc: Chao Yu <chao@kernel.org>
Cc: Chris Mason <clm@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-08 15:57:25 -08:00
Jingbo Xu
e6d9f9ba11 erofs: get correct count for unmapped range in fscache mode
For unmapped range, the returned map.m_llen is zero, and thus the
calculated count is unexpected zero.

Prior to the refactoring introduced by commit 1ae9470c3e ("erofs:
clean up .read_folio() and .readahead() in fscache mode"), only the
readahead routine suffers from this. With the refactoring of making
.read_folio() and .readahead() calling one common routine, both
read_folio and readahead have this issue now.

Fix this by calculating count separately in unmapped condition.

Fixes: c665b394b9 ("erofs: implement fscache-based data readahead")
Fixes: 1ae9470c3e ("erofs: clean up .read_folio() and .readahead() in fscache mode")
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20221104054028.52208-3-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-11-08 14:46:30 +08:00
Jingbo Xu
75e43355cb erofs: put metabuf in error path in fscache mode
For tail packing layout, put metabuf when error is encountered.

Fixes: 1ae9470c3e ("erofs: clean up .read_folio() and .readahead() in fscache mode")
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20221104054028.52208-2-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-11-08 14:45:44 +08:00
Yue Hu
e5126de138 erofs: fix general protection fault when reading fragment
As syzbot reported [1], the fragment feature sb flag is not set, so
packed_inode != NULL needs to be checked in z_erofs_read_fragment().

[1] https://lore.kernel.org/all/0000000000002e7a8905eb841ddd@google.com/

Reported-by: syzbot+3faecbfd845a895c04cb@syzkaller.appspotmail.com
Fixes: b15b2e307c ("erofs: support on-disk compressed fragments data")
Signed-off-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20221021085325.25788-1-zbestahu@gmail.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-11-08 14:44:13 +08:00
Dawei Li
ce4b815686 erofs: protect s_inodes with s_inode_list_lock for fscache
s_inodes is superblock-specific resource, which should be
protected by sb's specific lock s_inode_list_lock.

Link: https://lore.kernel.org/r/TYCP286MB23238380DE3B74874E8D78ABCA299@TYCP286MB2323.JPNP286.PROD.OUTLOOK.COM
Fixes: 7d41963759 ("erofs: Support sharing cookies in the same domain")
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Dawei Li <set_pte_at@outlook.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-10-17 14:57:57 +08:00
Gao Xiang
e7933278b4 erofs: fix up inplace decompression success rate
Partial decompression should be checked after updating length.
It's a new regression when introducing multi-reference pclusters.

Fixes: 2bfab9c0ed ("erofs: record the longest decompressed size in this round")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221014064915.8103-1-hsiangkao@linux.alibaba.com
2022-10-17 06:55:49 +08:00
Gao Xiang
63bbb85658 erofs: shouldn't churn the mapping page for duplicated copies
If other duplicated copies exist in one decompression shot, should
leave the old page as is rather than replace it with the new duplicated
one.  Otherwise, the following cold path to deal with duplicated copies
will use the invalid bvec.  It impacts compressed data deduplication.

Also, shift the onlinepage EIO bit to avoid touching the signed bit.

Fixes: 267f2492c8 ("erofs: introduce multi-reference pclusters (fully-referenced)")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221012045056.13421-1-hsiangkao@linux.alibaba.com
2022-10-17 06:55:49 +08:00
Yue Hu
664609e49f erofs: fix illegal unmapped accesses in z_erofs_fill_inode_lazy()
Note that we are still accessing 'h_idata_size' and 'h_fragmentoff'
after calling erofs_put_metabuf(), that is not correct. Fix it.

Fixes: ab92184ff8 ("erofs: add on-disk compressed tail-packing inline support")
Fixes: b15b2e307c ("erofs: support on-disk compressed fragments data")
Signed-off-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20221005013528.62977-1-zbestahu@163.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-10-17 06:55:48 +08:00
Linus Torvalds
513389809e for-6.1/block-2022-10-03
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmM67XkQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpiHoD/9eN+6YnNRPu5+2zeGnnm1Nlwic6YMZeORr
 KFIeC0COMWoFhNBIPFkgAKT+0qIH+uGt5UsHSM3Y5La7wMR8yLxD4PAnvTZ/Ijtt
 yxVIOmonJoQ0OrQ2kTbvDXL/9OCUrzwXXyUIEPJnH0Ca1mxeNOgDHbE7VGF6DMul
 0D3pI8qs2WLnHlDi1V/8kH5qZ6WoAJSDcb8sTzOUVnyveZPNaZhGQJuHA2XAYMtg
 fqKMDJqgmNk6jdTMUgdF5B+rV64PQoCy28I7fXqGkEe+RE5TBy57vAa0XY84V8XR
 /a8CEuwMts2ypk1hIcJG8Vv8K6u5war9yPM5MTngKsoMpzNIlhrhaJQVyjKdcs+E
 Ixwzexu6xTYcrcq+mUARgeTh79FzTBM/uXEdbCG2G3S6HPd6UZWUJZGfxw/l0Aem
 V4xB7lj6SQaJDU1iJCYUaHcekNXhQAPvyVG+R2ED1SO3McTpTPIM1aeigxw6vj7u
 bH3Kfdr94Z8HNuoLuiS6YYfjNt2Shf4LEB6GxKJ9TYHtyhdOyO0H64jGHpygrWqN
 cSnkWPUqUUNpF7srKM0ZgbliCshvmyJc4aMOFd0gBY/kXf5J/j7IXvh8TFCi9rHH
 0KyZH3/3Zsu9geUn3ynznlr4FXU+BcqE6boaa/iWb9sN1m+Rvaahv8cSch/dh44a
 vQNj/iOBQA==
 =R05e
 -----END PGP SIGNATURE-----

Merge tag 'for-6.1/block-2022-10-03' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - NVMe pull requests via Christoph:
      - handle number of queue changes in the TCP and RDMA drivers
        (Daniel Wagner)
      - allow changing the number of queues in nvmet (Daniel Wagner)
      - also consider host_iface when checking ip options (Daniel
        Wagner)
      - don't map pages which can't come from HIGHMEM (Fabio M. De
        Francesco)
      - avoid unnecessary flush bios in nvmet (Guixin Liu)
      - shrink and better pack the nvme_iod structure (Keith Busch)
      - add comment for unaligned "fake" nqn (Linjun Bao)
      - print actual source IP address through sysfs "address" attr
        (Martin Belanger)
      - various cleanups (Jackie Liu, Wolfram Sang, Genjian Zhang)
      - handle effects after freeing the request (Keith Busch)
      - copy firmware_rev on each init (Keith Busch)
      - restrict management ioctls to admin (Keith Busch)
      - ensure subsystem reset is single threaded (Keith Busch)
      - report the actual number of tagset maps in nvme-pci (Keith
        Busch)
      - small fabrics authentication fixups (Christoph Hellwig)
      - add common code for tagset allocation and freeing (Christoph
        Hellwig)
      - stop using the request_queue in nvmet (Christoph Hellwig)
      - set min_align_mask before calculating max_hw_sectors (Rishabh
        Bhatnagar)
      - send a rediscover uevent when a persistent discovery controller
        reconnects (Sagi Grimberg)
      - misc nvmet-tcp fixes (Varun Prakash, zhenwei pi)

 - MD pull request via Song:
      - Various raid5 fix and clean up, by Logan Gunthorpe and David
        Sloan.
      - Raid10 performance optimization, by Yu Kuai.

 - sbitmap wakeup hang fixes (Hugh, Keith, Jan, Yu)

 - IO scheduler switching quisce fix (Keith)

 - s390/dasd block driver updates (Stefan)

 - support for recovery for the ublk driver (ZiyangZhang)

 - rnbd drivers fixes and updates (Guoqing, Santosh, ye, Christoph)

 - blk-mq and null_blk map fixes (Bart)

 - various bcache fixes (Coly, Jilin, Jules)

 - nbd signal hang fix (Shigeru)

 - block writeback throttling fix (Yu)

 - optimize the passthrough mapping handling (me)

 - prepare block cgroups to being gendisk based (Christoph)

 - get rid of an old PSI hack in the block layer, moving it to the
   callers instead where it belongs (Christoph)

 - blk-throttle fixes and cleanups (Yu)

 - misc fixes and cleanups (Liu Shixin, Liu Song, Miaohe, Pankaj,
   Ping-Xiang, Wolfram, Saurabh, Li Jinlin, Li Lei, Lin, Li zeming,
   Miaohe, Bart, Coly, Gaosheng

* tag 'for-6.1/block-2022-10-03' of git://git.kernel.dk/linux: (162 commits)
  sbitmap: fix lockup while swapping
  block: add rationale for not using blk_mq_plug() when applicable
  block: adapt blk_mq_plug() to not plug for writes that require a zone lock
  s390/dasd: use blk_mq_alloc_disk
  blk-cgroup: don't update the blkg lookup hint in blkg_conf_prep
  nvmet: don't look at the request_queue in nvmet_bdev_set_limits
  nvmet: don't look at the request_queue in nvmet_bdev_zone_mgmt_emulate_all
  blk-mq: use quiesced elevator switch when reinitializing queues
  block: replace blk_queue_nowait with bdev_nowait
  nvme: remove nvme_ctrl_init_connect_q
  nvme-loop: use the tagset alloc/free helpers
  nvme-loop: store the generic nvme_ctrl in set->driver_data
  nvme-loop: initialize sqsize later
  nvme-fc: use the tagset alloc/free helpers
  nvme-fc: store the generic nvme_ctrl in set->driver_data
  nvme-fc: keep ctrl->sqsize in sync with opts->queue_size
  nvme-rdma: use the tagset alloc/free helpers
  nvme-rdma: store the generic nvme_ctrl in set->driver_data
  nvme-tcp: use the tagset alloc/free helpers
  nvme-tcp: store the generic nvme_ctrl in set->driver_data
  ...
2022-10-07 09:19:14 -07:00
Gao Xiang
312fe643ad erofs: clean up erofs_iget()
isdir indicated REQ_META|REQ_PRIO which no longer works now.
Get rid of isdir entirely.

Link: https://lore.kernel.org/r/20220927063607.54832-2-hsiangkao@linux.alibaba.com
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-09-27 17:27:45 +08:00
Gao Xiang
53a7f9961c erofs: clean up unnecessary code and comments
Some conditional macros and comments are useless.

Link: https://lore.kernel.org/r/20220927063607.54832-1-hsiangkao@linux.alibaba.com
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-09-27 17:27:25 +08:00