Recently we catched ENOSPC returned by make_reservation() while doing
fsstress on UBIFS, we got following information when it occurred (See
details in Link):
UBIFS error (ubi0:0 pid 3640152): make_reservation [ubifs]: cannot
reserve 112 bytes in jhead 2, error -28
CPU: 2 PID: 3640152 Comm: kworker/u16:2 Tainted: G B W
Hardware name: Hisilicon PhosphorHi1230 EMU (DT)
Workqueue: writeback wb_workfn (flush-ubifs_0_0)
Call trace:
dump_stack+0x114/0x198
make_reservation+0x564/0x610 [ubifs]
ubifs_jnl_write_data+0x328/0x48c [ubifs]
do_writepage+0x2a8/0x3e4 [ubifs]
ubifs_writepage+0x16c/0x374 [ubifs]
generic_writepages+0xb4/0x114
do_writepages+0xcc/0x11c
writeback_sb_inodes+0x2d0/0x564
wb_writeback+0x20c/0x2b4
wb_workfn+0x404/0x510
process_one_work+0x304/0x4ac
worker_thread+0x31c/0x4e4
kthread+0x23c/0x290
Budgeting info: data budget sum 17576, total budget sum 17768
budg_data_growth 4144, budg_dd_growth 13432, budg_idx_growth 192
min_idx_lebs 13, old_idx_sz 988640, uncommitted_idx 0
page_budget 4144, inode_budget 160, dent_budget 312
nospace 0, nospace_rp 0
dark_wm 8192, dead_wm 4096, max_idx_node_sz 192
freeable_cnt 0, calc_idx_sz 988640, idx_gc_cnt 0
dirty_pg_cnt 4, dirty_zn_cnt 0, clean_zn_cnt 4811
gc_lnum 21, ihead_lnum 14
jhead 0 (GC) LEB 16
jhead 1 (base) LEB 34
jhead 2 (data) LEB 23
bud LEB 16
bud LEB 23
bud LEB 34
old bud LEB 33
old bud LEB 31
old bud LEB 15
commit state 4
Budgeting predictions:
available: 33832, outstanding 17576, free 15356
(pid 3640152) start dumping LEB properties
(pid 3640152) Lprops statistics: empty_lebs 3, idx_lebs 11
taken_empty_lebs 1, total_free 1253376, total_dirty 2445736
total_used 3438712, total_dark 65536, total_dead 17248
LEB 15 free 0 dirty 248000 used 5952 (taken)
LEB 16 free 110592 dirty 896 used 142464 (taken, jhead 0 (GC))
LEB 21 free 253952 dirty 0 used 0 (taken, GC LEB)
LEB 23 free 0 dirty 248104 used 5848 (taken, jhead 2 (data))
LEB 29 free 253952 dirty 0 used 0 (empty)
LEB 33 free 0 dirty 253952 used 0 (taken)
LEB 34 free 217088 dirty 36544 used 320 (taken, jhead 1 (base))
LEB 37 free 253952 dirty 0 used 0 (empty)
OTHERS: index lebs, zero-available non-index lebs
According to the budget algorithm, there are 5 LEBs reserved for budget:
three journal heads(16,23,34), 1 GC LEB(21) and 1 deletion LEB(can be
used in make_reservation()). There are 2 empty LEBs used for index nodes,
which is calculated as min_idx_lebs - idx_lebs = 2. In theory, LEB 15
and 33 should be reclaimed as free state after committing, but it is now
in taken state. After looking the realization of reserve_space(), there's
a possible situation:
LEB 15: free 2000 dirty 248000 used 3952 (jhead 2)
LEB 23: free 2000 dirty 248104 used 3848 (bud, taken)
LEB 33: free 2000 dirty 251952 used 0 (bud, taken)
wb_workfn wb_workfn_2
do_writepage // write 3000 bytes
ubifs_jnl_write_data
make_reservation
reserve_space
ubifs_garbage_collect
ubifs_find_dirty_leb // ret ENOSPC, dirty LEBs are taken
nospc_retries++ // 1
ubifs_run_commit
do_commit
LEB 15: free 2000 dirty 248000 used 3952 (jhead 2)
LEB 23: free 2000 dirty 248104 used 3848 (dirty)
LEB 33: free 2000 dirty 251952 used 0 (dirty)
do_writepage // write 2000 bytes for 3 times
ubifs_jnl_write_data
// grabs 15\23\33
LEB 15: free 0 dirty 248000 used 5952 (bud, taken)
LEB 23: free 0 dirty 248104 used 5848 (jhead 2)
LEB 33: free 0 dirty 253952 used 0 (bud, taken)
reserve_space
ubifs_garbage_collect
ubifs_find_dirty_leb // ret ENOSPC, dirty LEBs are taken
if (nospc_retries++ < 2) // false
ubifs_ro_mode !
Fetch a reproducer in Link.
The dirty LEBs could be grabbed by other threads, which fails finding dirty
LEBs of GC in current thread, so make_reservation() could try many times to
invoke GC&&committing, but current realization limits the times of retrying
as 'nospc_retries'(twice).
Fix it by adding a wait queue, start queuing up space reservation tasks
when someone task has retried gc + commit for many times. Then there is
only one task making space reservation at any time, and it can always make
success under the premise of correct budgeting.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218164
Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
If function dbg_check_idx_size() failed by loading znode in mounting
process, there are two problems:
1. Allocated znodes won't be freed, which causes kmemleak in kernel:
ubifs_mount
dbg_check_idx_size
dbg_walk_index
c->zroot.znode = ubifs_load_znode
child = ubifs_load_znode // failed
// Loaded znodes won't be freed in error handling path.
2. Global variable ubifs_clean_zn_cnt is not decreased, because
ubifs_tnc_close() is not invoked in error handling path, which
triggers a warning in ubifs_exit():
WARNING: CPU: 1 PID: 1576 at fs/ubifs/super.c:2486 ubifs_exit
Modules linked in: zstd ubifs(-) ubi nandsim
CPU: 1 PID: 1576 Comm: rmmod Not tainted 6.7.0-rc6
Call Trace:
ubifs_exit+0xca/0xc70 [ubifs]
__do_sys_delete_module+0x29a/0x4a0
do_syscall_64+0x6f/0x140
Fix it by adding error handling path in dbg_check_idx_size() to release
tnc tree.
Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Suggested-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
API:
- Add virtual-address based lskcipher interface.
- Optimise ahash/shash performance in light of costly indirect calls.
- Remove ahash alignmask attribute.
Algorithms:
- Improve AES/XTS performance of 6-way unrolling for ppc.
- Remove some uses of obsolete algorithms (md4, md5, sha1).
- Add FIPS 202 SHA-3 support in pkcs1pad.
- Add fast path for single-page messages in adiantum.
- Remove zlib-deflate.
Drivers:
- Add support for S4 in meson RNG driver.
- Add STM32MP13x support in stm32.
- Add hwrng interface support in qcom-rng.
- Add support for deflate algorithm in hisilicon/zip.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEn51F/lCuNhUwmDeSxycdCkmxi6cFAmVB3vgACgkQxycdCkmx
i6dsOBAAykbnX8BpnpnOXYywE9ZWrl98rAk51MK0N9olZNfg78zRPIv7fFxFdC20
SDJrDSNPmn0Qvaa5e0EfoAdklsm0k2GkXL/BwPKMKWUsyIoJVYI3WrBMnjBy9xMp
yfME+h0bKoXJCZKnYkIUSGUejmUPSyRlEylrXoFlH/VWYwAaii/x9zwreQoF+0LR
KI24A1q8AYs6Dw9HSfndaAub9GOzrqKYs6fSaMG+77Y4UC5aoi5J9Bp2G3uVyHay
x/0bZtIxKXS9wn+LeG/3GspX23x/I5VwBOdAoMigrYmAIaIg5qgyMszudltTAs4R
zF1Kh7WsnM5+vpnBSeigzo+/GGOU3QTz8y3tBTg+3ZR7GWGOwQLiizhOYqCyOfAH
pIm6c++sZw/OOHiL69Nt4HeLKzGNYYWk3s4X/B/6cqoouPfOsfBaQobZNx9zfy7q
ZNEvSVBjrFX/L6wDSotny1LTWLUNjHbmLaMV5uQZ/SQKEtv19fp2Dl7SsLkHH+3v
ldOAwfoJR6QcSwz3Ez02TUAvQhtP172Hnxi7u44eiZu2aUboLhCFr7aEU6kVdBCx
1rIRVHD1oqlOEDRwPRXzhF3I8R4QDORJIxZ6UUhg7yueuI+XCGDsBNC+LqBrBmSR
IbdjqmSDUBhJyM5yMnt1VFYhqKQ/ZzwZ3JQviwW76Es9pwEIolM=
=IZmR
-----END PGP SIGNATURE-----
Merge tag 'v6.7-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Add virtual-address based lskcipher interface
- Optimise ahash/shash performance in light of costly indirect calls
- Remove ahash alignmask attribute
Algorithms:
- Improve AES/XTS performance of 6-way unrolling for ppc
- Remove some uses of obsolete algorithms (md4, md5, sha1)
- Add FIPS 202 SHA-3 support in pkcs1pad
- Add fast path for single-page messages in adiantum
- Remove zlib-deflate
Drivers:
- Add support for S4 in meson RNG driver
- Add STM32MP13x support in stm32
- Add hwrng interface support in qcom-rng
- Add support for deflate algorithm in hisilicon/zip"
* tag 'v6.7-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (283 commits)
crypto: adiantum - flush destination page before unmapping
crypto: testmgr - move pkcs1pad(rsa,sha3-*) to correct place
Documentation/module-signing.txt: bring up to date
module: enable automatic module signing with FIPS 202 SHA-3
crypto: asymmetric_keys - allow FIPS 202 SHA-3 signatures
crypto: rsa-pkcs1pad - Add FIPS 202 SHA-3 support
crypto: FIPS 202 SHA-3 register in hash info for IMA
x509: Add OIDs for FIPS 202 SHA-3 hash and signatures
crypto: ahash - optimize performance when wrapping shash
crypto: ahash - check for shash type instead of not ahash type
crypto: hash - move "ahash wrapping shash" functions to ahash.c
crypto: talitos - stop using crypto_ahash::init
crypto: chelsio - stop using crypto_ahash::init
crypto: ahash - improve file comment
crypto: ahash - remove struct ahash_request_priv
crypto: ahash - remove crypto_ahash_alignmask
crypto: gcm - stop using alignmask of ahash
crypto: chacha20poly1305 - stop using alignmask of ahash
crypto: ccm - stop using alignmask of ahash
net: ipv6: stop checking crypto_ahash_alignmask
...
This makes it harder for accidental or malicious changes to
ubifs_xattr_handlers at runtime.
Cc: Richard Weinberger <richard@nod.at>
Cc: linux-mtd@lists.infradead.org
Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
Link: https://lore.kernel.org/r/20230930050033.41174-26-wedsonaf@gmail.com
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
The header file crypto/algapi.h is for internal use only. Use the
header file crypto/utils.h instead.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Now that all of the update_time operations are prepared for it, we can
drop the timespec64 argument from the update_time operation. Do that and
remove it from some associated functions like inode_update_time and
inode_needs_update_time.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230807-mgctime-v7-8-d1dec143a704@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
JFFS2:
- Fix memory corruption in error path
- Spelling and coding style fixes
UBI:
- Switch to BLK_MQ_F_BLOCKING in ubiblock
- Wire up partent device (for sysfs)
- Multiple UAF bugfixes
- Fix for an infinite loop in WL error path
UBIFS:
- Fix for multiple memory leaks in error paths
- Fixes for wrong space accounting
- Minor cleanups
- Spelling and coding style fixes
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmP/BSMWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wX5QEADnPsAW87AIi44UlkbAsJEjkEqU
yQrI9UPsdqfE7K5OR7Vb2tOti3MLfaiJ5MTG64xNaTEgmrbcqo4GgENk4Pe6zJ9q
azO/xWTr6r8G4aE70KhsUBc9Vc/99Ok48rxLHJS+u7s+FvOJ//WirGXjyNZEVDyx
TgvH80rhUlfj1ExqEXLcUfQ53WUnR3PefOw+zIi29ldtgaI/TFmnWObl2A/XtsGr
0ExIdqDnoTfSsYecfGP71jVbYH8u2mLFe2zNOYW1pvrHYhcet6Q6e+69fgnyNRf9
5s5dbDPmmWN2Qdz63AWwHfC4mncoQdc7HbnnKIk6bTm3v6gqXQSrgy/hZEX3LmWr
G41j7RLhBMZ2mljRfFQH97V71n9TL010T6jZ5zurvqErXtVdFImmInGmqY3AR13I
fzUWnJf81xKgcMv6My2/nEVGDtrGLFdILoT8j6VIXQO6teqtF7Vip7UfamrwL399
57fy0Iwbx2aume/IdwI3TZYacBQ2d8GvQxTZshJ+qx+gCvhb6EyiSIn3AOgtrbAo
srXMy+6xXJecKNHR7Bl1C5BLei/25HDIYjk/yiAH/IhhV20c5eiySSvA+q720Gcz
reBLFh7EtcP51B2GXMdSCOgYY8wGaMl5zlbiLYzY79ptgnnBJ7luuYjc3c02MGB9
25qVLnpvC/ZREzZqLg==
=fTYq
-----END PGP SIGNATURE-----
Merge tag 'ubifs-for-linus-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull jffs2, ubi and ubifs updates from Richard Weinberger:
"JFFS2:
- Fix memory corruption in error path
- Spelling and coding style fixes
UBI:
- Switch to BLK_MQ_F_BLOCKING in ubiblock
- Wire up partent device (for sysfs)
- Multiple UAF bugfixes
- Fix for an infinite loop in WL error path
UBIFS:
- Fix for multiple memory leaks in error paths
- Fixes for wrong space accounting
- Minor cleanups
- Spelling and coding style fixes"
* tag 'ubifs-for-linus-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: (36 commits)
ubi: block: Fix a possible use-after-free bug in ubiblock_create()
ubifs: make kobj_type structures constant
mtd: ubi: block: wire-up device parent
mtd: ubi: wire-up parent MTD device
ubi: use correct names in function kernel-doc comments
ubi: block: set BLK_MQ_F_BLOCKING
jffs2: Fix list_del corruption if compressors initialized failed
jffs2: Use function instead of macro when initialize compressors
jffs2: fix spelling mistake "neccecary"->"necessary"
ubifs: Fix kernel-doc
ubifs: Fix some kernel-doc comments
UBI: Fastmap: Fix kernel-doc
ubi: ubi_wl_put_peb: Fix infinite loop when wear-leveling work failed
ubi: Fix UAF wear-leveling entry in eraseblk_count_seq_show()
ubi: fastmap: Fix missed fm_anchor PEB in wear-leveling after disabling fastmap
ubifs: ubifs_releasepage: Remove ubifs_assert(0) to valid this process
ubifs: ubifs_writepage: Mark page dirty after writing inode failed
ubifs: dirty_cow_znode: Fix memleak in error handling path
ubifs: Re-statistic cleaned znode count if commit failed
ubi: Fix permission display of the debugfs files
...
With CONFIG_UBIFS_FS_AUTHENTICATION not set, the compiler can assume that
ubifs_node_check_hash() is never true and drops the call to ubifs_bad_hash().
Is CONFIG_CC_OPTIMIZE_FOR_SIZE enabled this optimization does not happen anymore.
So When CONFIG_UBIFS_FS and CONFIG_CC_OPTIMIZE_FOR_SIZE is enabled but
CONFIG_UBIFS_FS_AUTHENTICATION is not set, the build errors is as followd:
ERROR: modpost: "ubifs_bad_hash" [fs/ubifs/ubifs.ko] undefined!
Fix it by add no-op ubifs_bad_hash() for the CONFIG_UBIFS_FS_AUTHENTICATION=n case.
Fixes: 16a26b20d2 ("ubifs: authentication: Add hashes to index nodes")
Signed-off-by: Li Hua <hucool.lihua@huawei.com>
Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
FS_CRYPTO_BLOCK_SIZE is neither the filesystem block size nor the
granularity of encryption. Rather, it defines two logically separate
constraints that both arise from the block size of the AES cipher:
- The alignment required for the lengths of file contents blocks
- The minimum input/output length for the filenames encryption modes
Since there are way too many things called the "block size", and the
connection with the AES block size is not easily understood, split
FS_CRYPTO_BLOCK_SIZE into two constants FSCRYPT_CONTENTS_ALIGNMENT and
FSCRYPT_FNAME_MIN_MSG_LEN that more clearly describe what they are.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20220405010914.18519-1-ebiggers@kernel.org
Since 9ec64962afb1702f75b("ubifs: Implement RENAME_EXCHANGE") and
9e0a1fff8db56eaaebb("ubifs: Implement RENAME_WHITEOUT") are applied,
ubifs_rename locks and changes 4 ubifs inodes, correct the comment
for ui_mutex in ubifs_inode.
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Not all ubifs filesystem errors are propagated to userspace.
Export bad magic, bad node and crc errors via sysfs. This allows userspace
to notice filesystem errors:
/sys/fs/ubifs/ubiX_Y/errors_magic
/sys/fs/ubifs/ubiX_Y/errors_node
/sys/fs/ubifs/ubiX_Y/errors_crc
The counters are reset to 0 with a remount.
Signed-off-by: Stefan Schaeckeler <sschaeck@cisco.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Fix some spelling mistakes in comments:
withoug ==> without
numer ==> number
aswell ==> as well
referes ==> refers
childs ==> children
unnecesarry ==> unnecessary
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Reviewed-by: Alexander Dahl <ada@thorsis.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
UBIFS may occur some problems with concurrent xattr_{set|get} and
listxattr operations, such as assertion failure, memory corruption,
stale xattr value[1].
Fix it by importing a new rw-lock in @ubifs_inode to serilize write
operations on xattr, concurrent read operations are still effective,
just like ext4.
[1] https://lore.kernel.org/linux-mtd/20200630130438.141649-1-houtao1@huawei.com
Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Cc: stable@vger.kernel.org # v2.6+
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Use the fileattr API to let the VFS handle locking, permission checking and
conversion.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Richard Weinberger <richard@nod.at>
Extend some inode methods with an additional user namespace argument. A
filesystem that is aware of idmapped mounts will receive the user
namespace the mount has been marked with. This can be used for
additional permission checking and also to enable filesystems to
translate between uids and gids if they need to. We have implemented all
relevant helpers in earlier patches.
As requested we simply extend the exisiting inode method instead of
introducing new ones. This is a little more code churn but it's mostly
mechanical and doesnt't leave us with additional inode methods.
Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Function ubifs_dump_node() has been modified to avoid memory oob
accessing while dumping node, node length (corresponding to the
size of allocated memory for node) should be passed into all node
dumping callers.
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Define ubifs_listxattr and ubifs_xattr_handlers to NULL
when CONFIG_UBIFS_FS_XATTR is not enabled, then we can
remove many ugly ifdef macros in the code.
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Signed-off-by: Richard Weinberger <richard@nod.at>
Instead of creating ubifs file systems with UBIFS_FORMAT_VERSION
by default, add a module parameter ubifs.default_version to allow
the user to specify the desired version. Valid values are 4 to
UBIFS_FORMAT_VERSION (currently 5).
This way, one can for example create a file system with version 4
on kernel 4.19 which can still be mounted rw when downgrading to
kernel 4.9.
Signed-off-by: Martin Kaistra <martin.kaistra@linutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
There's no need for the ubifs_crypt_is_encrypted() function anymore.
Just use IS_ENCRYPTED() instead, like ext4 and f2fs do. IS_ENCRYPTED()
checks the VFS-level flag instead of the UBIFS-specific flag, but it
shouldn't change any behavior since the flags are kept in sync.
Link: https://lore.kernel.org/r/20191209212721.244396-1-ebiggers@kernel.org
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Eric Biggers <ebiggers@google.com>
HMACs can only be generated on the system the UBIFS image is running on.
To support offline signed images we add a PKCS#7 signature to the UBIFS
image which can be created by mkfs.ubifs.
Both the master node and the superblock need to be authenticated, during
normal runtime both are protected with HMACs. For offline signature
support however only a single signature is desired. We add a signature
covering the superblock node directly behind it. To protect the master
node a hash of the master node is added to the superblock which is used
when the master node doesn't contain a HMAC.
Transition to a read/write filesystem is also supported. During
transition first the master node is rewritten with a HMAC (implicitly,
it is written anyway as the FS is marked dirty). Afterwards the
superblock is rewritten with a HMAC. Once after the image has been
mounted read/write it is HMAC only, the signature is no longer required
or even present on the filesystem.
In an offline signed image the master node is authenticated by the
superblock. In a transition to r/w we have to make sure that the master
node is rewritten before the superblock node. In this case the master
node gets a HMAC and its authenticity no longer depends on the
superblock node. There are some cases in which the current code first
writes the superblock node though, so with this patch writing of the
superblock node is delayed until the master node is written.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not write to the free
software foundation inc 51 franklin st fifth floor boston ma 02110
1301 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 246 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190530000436.674189849@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix gcc build error while CONFIG_UBIFS_FS_XATTR
is not set
fs/ubifs/dir.o: In function `ubifs_unlink':
dir.c:(.text+0x260): undefined reference to `ubifs_purge_xattrs'
fs/ubifs/dir.o: In function `do_rename':
dir.c:(.text+0x1edc): undefined reference to `ubifs_purge_xattrs'
fs/ubifs/dir.o: In function `ubifs_rmdir':
dir.c:(.text+0x2638): undefined reference to `ubifs_purge_xattrs'
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 9ca2d73264 ("ubifs: Limit number of xattrs per inode")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
ifdefs reduce readability and compile coverage. This removes the ifdefs
around CONFIG_UBIFS_ATIME_SUPPORT by replacing them with IS_ENABLED()
where applicable. The fs layer would fall back to generic_update_time()
when .update_time doesn't exist. We do this fallback explicitly now.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Since we have to write one deletion inode per xattr
into the journal, limit the max number of xattrs.
In theory UBIFS supported up to 65535 xattrs per inode.
But this never worked correctly, expect no powercuts happened.
Now we support only as many xattrs as we can store in 50% of a
LEB.
Even for tiny flashes this allows dozens of xattrs per inode,
which is for an embedded filesystem still fine.
In case someone has existing inodes with much more xattrs, it is
still possible to delete them.
UBIFS will fall back to an non-atomic deletion mode.
Reported-by: Stefan Agner <stefan@agner.ch>
Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Signed-off-by: Richard Weinberger <richard@nod.at>
Like for the journal case, make sure that we track all xattr
inodes.
Otherwise UBIFS might not be able to locate stale xattr inodes
upon recovery.
Reported-by: Stefan Agner <stefan@agner.ch>
Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Signed-off-by: Richard Weinberger <richard@nod.at>
In order to have a common code base for fscrypt "post read" processing
for all filesystems which support encryption, this commit removes
filesystem specific build config option (e.g. CONFIG_EXT4_FS_ENCRYPTION)
and replaces it with a build option (i.e. CONFIG_FS_ENCRYPTION) whose
value affects all the filesystems making use of fscrypt.
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
In authenticated mode we cannot fixup the inode sizes in-place
during recovery as this would invalidate the hashes and HMACs
we stored for this inode.
Instead, we just write the updated inodes to the journal. We can
only do this after ubifs_rcvry_gc_commit() is done though, so for
authenticated mode call ubifs_recover_size() after
ubifs_rcvry_gc_commit() and not vice versa as normally done.
Calling ubifs_recover_size() after ubifs_rcvry_gc_commit() has the
drawback that after a commit the size fixup information is gone, so
when a powercut happens while recovering from another powercut
we may lose some data written right before the first powercut.
This is why we only do this in authenticated mode and leave the
behaviour for unauthenticated mode untouched.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
During creation of the default filesystem on an empty flash the default
LPT is created. With this patch a hash over the default LPT is
calculated which can be added to the default filesystems master node.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
The master node contains hashes over the root index node and the LPT.
This patch adds a HMAC to authenticate the master node itself.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
The LPT needs to be authenticated aswell. Since the LPT is only written
during commit it is enough to authenticate the whole LPT with a single
hash which is stored in the master node. Only the leaf nodes (pnodes)
are hashed which makes the implementation much simpler than it would be
to hash the complete LPT.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
Nodes that are written to flash can only be authenticated through the
index after the next commit. When a journal replay is necessary the
nodes are not yet referenced by the index and thus can't be
authenticated.
This patch overcomes this situation by creating a hash over all nodes
beginning from the commit start node over the reference node(s) and
the buds themselves. From
time to time we insert authentication nodes. Authentication nodes
contain a HMAC from the current hash state, so that they can be
used to authenticate a journal replay up to the point where the
authentication node is. The hash is continued afterwards
so that theoretically we would only have to check the HMAC of
the last authentication node we find.
Overall we get this picture:
,,,,,,,,
,......,...........................................
,. CS , hash1.----. hash2.----.
,. | , . |hmac . |hmac
,. v , . v . v
,.REF#0,-> bud -> bud -> bud.-> auth -> bud -> bud.-> auth ...
,..|...,...........................................
, | ,
, | ,,,,,,,,,,,,,,,
. | hash3,----.
, | , |hmac
, v , v
, REF#1 -> bud -> bud,-> auth ...
,,,|,,,,,,,,,,,,,,,,,,
v
REF#2 -> ...
|
V
...
Note how hash3 covers CS, REF#0 and REF#1 so that it is not possible to
exchange or skip any reference nodes. Unlike the picture suggests the
auth nodes themselves are not hashed.
With this it is possible for an offline attacker to cut each journal
head or to drop the last reference node(s), but not to skip any journal
heads or to reorder any operations.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
With this patch the hashes over the index nodes stored in the tree node
cache are written to flash and are checked when read back from flash.
The hash of the root index node is stored in the master node.
During journal replay the hashes are regenerated from the read nodes
and stored in the tree node cache. This means the nodes must previously
be authenticated by other means. This is done in a later patch.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
As part of the UBIFS authentication support every branch in the index
gets a hash covering the referenced node. To make that happen the tree
node cache needs hashes over the nodes. This patch adds a hash argument
to ubifs_tnc_add() and ubifs_tnc_add_nm(). The hashes are calculated
from the callers of these functions which actually prepare the nodes.
With this patch all the leaf nodes of the index tree get hashes, but
currently nothing is done with these hashes, this is left for a later
patch.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
With authentication support some nodes (master node, super block node)
get a HMAC embedded into them. This patch adds functions to prepare and
write such a node.
The difficulty is that besides the HMAC the nodes also have a CRC which
must stay valid. This means we first have to initialize all fields in
the node, then calculate the HMAC (not covering the CRC) and finally
calculate the CRC.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
This patch adds the various helper functions needed for authentication
support. We need functions to hash nodes, to embed HMACs into a node and
to compare hashes and HMACs. Most functions first check if this
filesystem is authenticated and bail out early if not, which makes the
functions safe to be called with disabled authentication.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
When adding authentication support we will embed a HMAC into some
nodes. To prepare these nodes we have to first initialize the nodes,
then add a HMAC and finally add a CRC. To accomplish this add separate
ubifs_init_node/ubifs_crc_node functions.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
The superblock node is read/modified/written several times throughout
the UBIFS code. Instead of reading it from the device each time just
keep a copy in memory and write back the modified copy when necessary.
This patch helps for authentication support, here we not only have to
read the superblock node, but also have to authenticate it, which
is easier if we do it once during initialization.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
ubifs_lpt_lookup could be implemented using pnode_lookup. To make that
possible move pnode_lookup from lpt.c to lpt_commit.c. Rename it to
ubifs_pnode_lookup since it's now exported.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
With having access to struct ubifs_info in ubifs_assert() we can
give more information when an assert is failing.
By using ubifs_err() we can tell which UBIFS instance failed.
Also multiple actions can be taken now.
We support:
- report: This is what UBIFS did so far, just report the failure and go
on.
- read-only: Switch to read-only mode.
- panic: shoot the kernel in the head.
Signed-off-by: Richard Weinberger <richard@nod.at>
This allows us to have more context in ubifs_assert()
and take different actions depending on the configuration.
Signed-off-by: Richard Weinberger <richard@nod.at>
Allow to disable extended attribute support.
This aids in reliability testing, especially since some xattr
related bugs have surfaced.
Also an embedded system might not need it, so this allows for a
slightly smaller kernel (about 4KiB).
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Richard Weinberger <richard@nod.at>
The tnc uses get_seconds() based timestamps to check the age of a znode,
which has two problems: on 32-bit architectures this may overflow in
2038 or 2106, and it gives incorrect information when the system time
is updated using settimeofday().
Using montonic timestamps with ktime_get_seconds() solves both thes
problems.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
This is a late set of changes from Deepa Dinamani doing an automated
treewide conversion of the inode and iattr structures from 'timespec'
to 'timespec64', to push the conversion from the VFS layer into the
individual file systems.
There were no conflicts between this and the contents of linux-next
until just before the merge window, when we saw multiple problems:
- A minor conflict with my own y2038 fixes, which I could address
by adding another patch on top here.
- One semantic conflict with late changes to the NFS tree. I addressed
this by merging Deepa's original branch on top of the changes that
now got merged into mainline and making sure the merge commit includes
the necessary changes as produced by coccinelle.
- A trivial conflict against the removal of staging/lustre.
- Multiple conflicts against the VFS changes in the overlayfs tree.
These are still part of linux-next, but apparently this is no longer
intended for 4.18 [1], so I am ignoring that part.
As Deepa writes:
The series aims to switch vfs timestamps to use struct timespec64.
Currently vfs uses struct timespec, which is not y2038 safe.
The series involves the following:
1. Add vfs helper functions for supporting struct timepec64 timestamps.
2. Cast prints of vfs timestamps to avoid warnings after the switch.
3. Simplify code using vfs timestamps so that the actual
replacement becomes easy.
4. Convert vfs timestamps to use struct timespec64 using a script.
This is a flag day patch.
Next steps:
1. Convert APIs that can handle timespec64, instead of converting
timestamps at the boundaries.
2. Update internal data structures to avoid timestamp conversions.
Thomas Gleixner adds:
I think there is no point to drag that out for the next merge window.
The whole thing needs to be done in one go for the core changes which
means that you're going to play that catchup game forever. Let's get
over with it towards the end of the merge window.
[1] https://www.spinics.net/lists/linux-fsdevel/msg128294.html
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJbInZAAAoJEGCrR//JCVInReoQAIlVIIMt5ZX6wmaKbrjy9Itf
MfgbFihQ/djLnuSPVQ3nztcxF0d66BKHZ9puVjz6+mIHqfDvJTRwZs9nU+sOF/T1
g78fRkM1cxq6ZCkGYAbzyjyo5aC4PnSMP/NQLmwqvi0MXqqrbDoq5ZdP9DHJw39h
L9lD8FM/P7T29Fgp9tq/pT5l9X8VU8+s5KQG1uhB5hii4VL6pD6JyLElDita7rg+
Z7/V7jkxIGEUWF7vGaiR1QTFzEtpUA/exDf9cnsf51OGtK/LJfQ0oiZPPuq3oA/E
LSbt8YQQObc+dvfnGxwgxEg1k5WP5ekj/Wdibv/+rQKgGyLOTz6Q4xK6r8F2ahxs
nyZQBdXqHhJYyKr1H1reUH3mrSgQbE5U5R1i3My0xV2dSn+vtK5vgF21v2Ku3A1G
wJratdtF/kVBzSEQUhsYTw14Un+xhBLRWzcq0cELonqxaKvRQK9r92KHLIWNE7/v
c0TmhFbkZA+zR8HdsaL3iYf1+0W/eYy8PcvepyldKNeW2pVk3CyvdTfY2Z87G2XK
tIkK+BUWbG3drEGG3hxZ3757Ln3a9qWyC5ruD3mBVkuug/wekbI8PykYJS7Mx4s/
WNXl0dAL0Eeu1M8uEJejRAe1Q3eXoMWZbvCYZc+wAm92pATfHVcKwPOh8P7NHlfy
A3HkjIBrKW5AgQDxfgvm
=CZX2
-----END PGP SIGNATURE-----
Merge tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground
Pull inode timestamps conversion to timespec64 from Arnd Bergmann:
"This is a late set of changes from Deepa Dinamani doing an automated
treewide conversion of the inode and iattr structures from 'timespec'
to 'timespec64', to push the conversion from the VFS layer into the
individual file systems.
As Deepa writes:
'The series aims to switch vfs timestamps to use struct timespec64.
Currently vfs uses struct timespec, which is not y2038 safe.
The series involves the following:
1. Add vfs helper functions for supporting struct timepec64
timestamps.
2. Cast prints of vfs timestamps to avoid warnings after the switch.
3. Simplify code using vfs timestamps so that the actual replacement
becomes easy.
4. Convert vfs timestamps to use struct timespec64 using a script.
This is a flag day patch.
Next steps:
1. Convert APIs that can handle timespec64, instead of converting
timestamps at the boundaries.
2. Update internal data structures to avoid timestamp conversions'
Thomas Gleixner adds:
'I think there is no point to drag that out for the next merge
window. The whole thing needs to be done in one go for the core
changes which means that you're going to play that catchup game
forever. Let's get over with it towards the end of the merge window'"
* tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
pstore: Remove bogus format string definition
vfs: change inode times to use struct timespec64
pstore: Convert internal records to timespec64
udf: Simplify calls to udf_disk_stamp_to_time
fs: nfs: get rid of memcpys for inode times
ceph: make inode time prints to be long long
lustre: Use long long type to print inode time
fs: add timespec64_truncate()
This is a pure automated search-and-replace of the internal kernel
superblock flags.
The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.
Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.
The script to do this was:
# places to look in; re security/*: it generally should *not* be
# touched (that stuff parses mount(2) arguments directly), but
# there are two places where we really deal with superblock flags.
FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
include/linux/fs.h include/uapi/linux/bfs_fs.h \
security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
# the list of MS_... constants
SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
ACTIVE NOUSER"
SED_PROG=
for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
# we want files that contain at least one of MS_...,
# with fs/namespace.c and fs/pnode.c excluded.
L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
for f in $L; do sed -i $f $SED_PROG; done
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>