An online resize of a file system with the bigalloc feature enabled
and a 1k block size would be refused since ext4_resize_begin() did not
understand s_first_data_block is 0 for all bigalloc file systems, even
when the block size is 1k.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Avoid growing the file system to an extent so that the last block
group is too small to hold all of the metadata that must be stored in
the block group.
This problem can be triggered with the following reproducer:
umount /mnt
mke2fs -F -m0 -b 4096 -t ext4 -O resize_inode,^has_journal \
-E resize=1073741824 /tmp/foo.img 128M
mount /tmp/foo.img /mnt
truncate --size 1708M /tmp/foo.img
resize2fs /dev/loop0 295400
umount /mnt
e2fsck -fy /tmp/foo.img
Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
When mounting the superblock, ext4_fill_super() calculates the free
blocks and free inodes and stores them in the superblock. It's not
strictly necessary, since we don't use them any more, but it's nice to
keep them roughly aligned to reality.
Since it's not critical for file system correctness, the code doesn't
call ext4_commit_super(). The problem is that it's in
ext4_commit_super() that we recalculate the superblock checksum. So
if we're not going to call ext4_commit_super(), we need to call
ext4_superblock_csum_set() to make sure the superblock checksum is
consistent.
Most of the time, this doesn't matter, since we end up calling
ext4_commit_super() very soon thereafter, and definitely by the time
the file system is unmounted. However, it doesn't work in this
sequence:
mke2fs -Fq -t ext4 /dev/vdc 128M
mount /dev/vdc /vdc
cp xfstests/git-versions /vdc
godown /vdc
umount /vdc
mount /dev/vdc
tune2fs -l /dev/vdc
With this commit, the "tune2fs -l" no longer fails.
Reported-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
A maliciously crafted file system can cause an overflow when the
results of a 64-bit calculation is stored into a 32-bit length
parameter.
https://bugzilla.kernel.org/show_bug.cgi?id=200623
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Wen Xu <wen.xu@gatech.edu>
Cc: stable@vger.kernel.org
A specially crafted file system can trick empty_inline_dir() into
reading past the last valid entry in a inline directory, and then run
into the end of xattr marker. This will trigger a divide by zero
fault. Fix this by using the size of the inline directory instead of
dir->i_size.
Also clean up error reporting in __ext4_check_dir_entry so that the
message is clearer and more understandable --- and avoids the division
by zero trap if the size passed in is zero. (I'm not sure why we
coded it that way in the first place; printing offset % size is
actually more confusing and less useful.)
https://bugzilla.kernel.org/show_bug.cgi?id=200933
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Wen Xu <wen.xu@gatech.edu>
Cc: stable@vger.kernel.org
If the destination of the rename(2) system call exists, the inode's
link count (i_nlinks) must be non-zero. If it is, the inode can end
up on the orphan list prematurely, leading to all sorts of hilarity,
including a use-after-free.
https://bugzilla.kernel.org/show_bug.cgi?id=200931
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Wen Xu <wen.xu@gatech.edu>
Cc: stable@vger.kernel.org
This suppresses some false positives in gcc 8's -Wstringop-truncation
Suggested by Miguel Ojeda (hopefully the __nonstring definition will
eventually get accepted in the compiler-gcc.h header file).
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
The err is not used after initalization. So just remove the variable.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
'ac->ac_g_ex.fe_len' is a user-controlled value which is used in the
derivation of 'ac->ac_2order'. 'ac->ac_2order', in turn, is used to
index arrays which makes it a potential spectre gadget. Fix this by
sanitizing the value assigned to 'ac->ac2_order'. This covers the
following accesses found with the help of smatch:
* fs/ext4/mballoc.c:1896 ext4_mb_simple_scan_group() warn: potential
spectre issue 'grp->bb_counters' [w] (local cap)
* fs/ext4/mballoc.c:445 mb_find_buddy() warn: potential spectre issue
'EXT4_SB(e4b->bd_sb)->s_mb_offsets' [r] (local cap)
* fs/ext4/mballoc.c:446 mb_find_buddy() warn: potential spectre issue
'EXT4_SB(e4b->bd_sb)->s_mb_maxs' [r] (local cap)
Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jeremy Cline <jcline@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Extended attribute names are defined to be NUL-terminated, so the name
must not contain a NUL character. This is important because there are
places when remove extended attribute, the code uses strlen to
determine the length of the entry. That should probably be fixed at
some point, but code is currently really messy, so the simplest fix
for now is to simply validate that the extended attributes are sane.
https://bugzilla.kernel.org/show_bug.cgi?id=200401
Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Out of memory should not be considered as critical errors; so replace
ext4_error() with ext4_warnig().
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Whenever we hit block or inode bitmap corruptions we set
bit and then reduce this block group free inode/clusters
counter to expose right available space.
However some of ext4_mark_group_bitmap_corrupted() is called
inside group spinlock, some are not, this could make it happen
that we double reduce one block group free counters from system.
Always hold group spinlock for it could fix it, but it looks
a little heavy, we could use test_and_set_bit() to fix race
problems here.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
When ext4_find_entry() falls back to "searching the old fashioned
way" due to a corrupt dx dir, it needs to reset the error code
to NULL so that the nonstandard ERR_BAD_DX_DIR code isn't returned
to userspace.
https://bugzilla.kernel.org/show_bug.cgi?id=199947
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@yandex.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Follow the lead of xfs_break_dax_layouts() and add synchronization between
operations in ext4 which remove blocks from an inode (hole punch, truncate
down, etc.) and pages which are pinned due to DAX DMA operations.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Inodes using DAX should only ever have exceptional entries in their page
caches. Make this clear by warning if the iteration in
dax_layout_busy_page() ever sees a non-exceptional entry, and by adding a
comment for the pagevec_release() call which only deals with struct page
pointers.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
The superblock timestamp fields were enlarged by u8 to be 40 bits wide.
Update the documentation to reflect this.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Create a new top-level section for documentation of filesystem usage,
on-disk format information, and anything else.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Make use of the swap macro and remove unnecessary variable *tmp*.
This makes the code easier to read and maintain.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
There is no check for allocation failure when duplicating
"data" in ext4_remount(). Check for failure and return
error -ENOMEM in this case.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Output the warning message before we clobber type and be -1 all the time.
The error message would now be
[ 1.519791] EXT4-fs warning (device vdb): ext4_enable_quotas:5402:
Failed to enable quota tracking (type=0, err=-3). Please run e2fsck to fix.
Signed-off-by: Junichi Uekawa <uekawa@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
The inode timestamps use 34 bits in ext4, but the various timestamps in
the superblock are limited to 32 bits. If every user accesses these as
'unsigned', then this is good until year 2106, but it seems better to
extend this a bit further in the process of removing the deprecated
get_seconds() function.
This adds another byte for each timestamp in the superblock, making
them long enough to store timestamps beyond what is in the inodes,
which seems good enough here (in ocfs2, they are already 64-bit wide,
which is appropriate for a new layout).
I did not modify e2fsprogs, which obviously needs the same change to
actually interpret future timestamps correctly.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
jbd2 is one of the few callers of current_kernel_time64(), which
is a wrapper around ktime_get_coarse_real_ts64(). This calls the
latter directly for consistency with the rest of the kernel that
is moving to the ktime_get_ family of time accessors.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
This is the last missing piece for the inode times on 32-bit systems:
now that VFS interfaces use timespec64, we just need to stop truncating
the tv_sec values for y2038 compatibililty.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
We only care about the low 32-bit for i_dtime as explained in commit
b5f515735b ("ext4: avoid Y2038 overflow in recently_deleted()"), so
the use of get_seconds() is correct here, but that function is getting
removed in the process of the y2038 fixes, so let's use the modern
ktime_get_real_seconds() here.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The mmp_time field is 64 bits wide, which is good, but calling
get_seconds() results in a 32-bit value on 32-bit architectures. Using
ktime_get_real_seconds() instead returns 64 bits everywhere.
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
While working on extended rand for last_error/first_error timestamps,
I noticed that the endianess is wrong; we access the little-endian
fields in struct ext4_super_block as native-endian when we print them.
This adds a special case in ext4_attr_show() and ext4_attr_store()
to byteswap the superblock fields if needed.
In older kernels, this code was part of super.c, it got moved to
sysfs.c in linux-4.4.
Cc: stable@vger.kernel.org
Fixes: 52c198c682 ("ext4: add sysfs entry showing whether the fs contains errors")
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Import the chapter about extended attributes from the on-disk format wiki
page into the kernel documentation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Import the chapter about directory layout from the on-disk format wiki
page into the kernel documentation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Import the chapter about inode data fork from the on-disk format wiki
page into the kernel documentation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Import the chapter about inodes from the on-disk format wiki
page into the kernel documentation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Import the chapter about the journal from the on-disk format wiki
page into the kernel documentation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Import the chapter about multi-mount protection from the on-disk format
wiki page into the kernel documentation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Import the chapter about bitmaps from the on-disk format wiki
page into the kernel documentation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Import the chapter about group descriptors from the on-disk format wiki
page into the kernel documentation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Import the chapter about superblocks from the on-disk format wiki
page into the kernel documentation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Import the chapter about high level design from the on-disk format wiki
page into the kernel documentation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Create the basic structure of the "new" data structures & algorithms
book to be ported over from the on-disk format wiki, and then start by
pulling in the introductory information.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Convert the existing ext4 documentation into rst format and link it in
with the rest of the kernel documentation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Move Documentation/filesystems/ext4.txt into
Documentation/filesystems/ext4/ext4.rst in preparation for adding more
ext4 documentation.
Note that the documentation isn't in rst format yet, but as it's not
linked from anywhere it won't cause build errors.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Commit 8844618d8a: "ext4: only look at the bg_flags field if it is
valid" will complain if block group zero does not have the
EXT4_BG_INODE_ZEROED flag set. Unfortunately, this is not correct,
since a freshly created file system has this flag cleared. It gets
almost immediately after the file system is mounted read-write --- but
the following somewhat unlikely sequence will end up triggering a
false positive report of a corrupted file system:
mkfs.ext4 /dev/vdc
mount -o ro /dev/vdc /vdc
mount -o remount,rw /dev/vdc
Instead, when initializing the inode table for block group zero, test
to make sure that itable_unused count is not too large, since that is
the case that will result in some or all of the reserved inodes
getting cleared.
This fixes the failures reported by Eric Whiteney when running
generic/230 and generic/231 in the the nojournal test case.
Fixes: 8844618d8a ("ext4: only look at the bg_flags field if it is valid")
Reported-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
With commit 044e6e3d74: "ext4: don't update checksum of new
initialized bitmaps" the buffer valid bit will get set without
actually setting up the checksum for the allocation bitmap, since the
checksum will get calculated once we actually allocate an inode or
block.
If we are doing this, then we need to (re-)check the verified bit
after we take the block group lock. Otherwise, we could race with
another process reading and verifying the bitmap, which would then
complain about the checksum being invalid.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1780137
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
The inline data code was updating the raw inode directly; this is
problematic since if metadata checksums are enabled,
ext4_mark_inode_dirty() must be called to update the inode's checksum.
In addition, the jbd2 layer requires that get_write_access() be called
before the metadata buffer is modified. Fix both of these problems.
https://bugzilla.kernel.org/show_bug.cgi?id=200443
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Previously, when an MMP-protected file system is remounted read-only,
the kmmpd thread would exit the next time it woke up (a few seconds
later), without resetting the MMP sequence number back to
EXT4_MMP_SEQ_CLEAN.
Fix this by explicitly killing the MMP thread when the file system is
remounted read-only.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Andreas Dilger <adilger@dilger.ca>
Ext4_check_descriptors() was getting called before s_gdb_count was
initialized. So for file systems w/o the meta_bg feature, allocation
bitmaps could overlap the block group descriptors and ext4 wouldn't
notice.
For file systems with the meta_bg feature enabled, there was a
fencepost error which would cause the ext4_check_descriptors() to
incorrectly believe that the block allocation bitmap overlaps with the
block group descriptor blocks, and it would reject the mount.
Fix both of these problems.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
A small collection of fixes, sort of the usual at this point, all for
i.MX or OMAP:
- Enable ULPI drivers on i.MX to avoid a hang
- Pinctrl fix for touchscreen on i.MX51 ZII RDU1
- Fixes for ethernet clock references on am3517
- mmc0 write protect detection fix for am335x
- kzalloc->kcalloc conversion in an OMAP driver
- USB metastability fix for USB on dra7
- Fix touchscreen wakeup on am437x
-----BEGIN PGP SIGNATURE-----
iQJDBAABCAAtFiEElf+HevZ4QCAJmMQ+jBrnPN6EHHcFAltCOpkPHG9sb2ZAbGl4
b20ubmV0AAoJEIwa5zzehBx38LsP/1V0UGTgU3qz7vo2JkZxa9HAT+vroL1PDUmi
+WCtndQA5kOGZkAEB8q5Ry6MH9P372JfJrkSy3v0B8e7gRXXH3Ojke9FQA2N5z60
cKLaiWq3euZ/y5UWvfkd8WX29uBC63y9+AyJcFaacyWYvwwvC6TxMOCcSNBUG24l
bCGkXgwF9UCPxUWD+ZDgDhrTCNWr6E9jf/guAJ8ozsR/XR1xTSVlWJOq2qY9x9h/
QvZLa+fYXj55z/XBNWxqlzWu24xHU6yPAizdSOqHVmCe51QTub2ZVlx0qJuWUuvn
POdVgmwbvKixfX8ivxvIeEb3yEPTOl5WI6xBt3Sf5PcaE+Oby91koFaqq2kOT/fs
XCMBU9mRFQdryiFuvl8g5cqV26s6AyC+ACxfAhCr4BQoyBZgjITCrR4zfyf/YO1U
e94VsDawdFG1QSRlbgpYCIeV4uQJJeCeZFKbQ/nZH9R2yBuGEOWry/dET6FkfmiE
NbWBzIIdrjs1TS5BtCRAOM4KAMKhKcF7EZ67F6azeIt0WcO+w3hf0+8NAQdJqS59
WjNWS6szmZgEblbJmCgk0CJPUGyLD8voVvQ9AWHTNEHM3CMD+2Mn8ZhYxVxhXCLz
odQxPOuUk9l3eA4Y5t2DfDaCHA4JRNygsVLyPugzx5XcV2RA30daPhWO9IzKMTPO
t58/SeW6
=N/ik
-----END PGP SIGNATURE-----
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"A small collection of fixes, sort of the usual at this point, all for
i.MX or OMAP:
- Enable ULPI drivers on i.MX to avoid a hang
- Pinctrl fix for touchscreen on i.MX51 ZII RDU1
- Fixes for ethernet clock references on am3517
- mmc0 write protect detection fix for am335x
- kzalloc->kcalloc conversion in an OMAP driver
- USB metastability fix for USB on dra7
- Fix touchscreen wakeup on am437x"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: imx_v4_v5_defconfig: Select ULPI support
ARM: imx_v6_v7_defconfig: Select ULPI support
ARM: dts: omap3: Fix am3517 mdio and emac clock references
ARM: dts: am335x-bone-common: Fix mmc0 Write Protect
bus: ti-sysc: Use 2-factor allocator arguments
ARM: dts: dra7: Disable metastability workaround for USB2
ARM: dts: imx51-zii-rdu1: fix touchscreen pinctrl
ARM: dts: am437x: make edt-ft5x06 a wakeup source
Pull x86/pti updates from Thomas Gleixner:
"Two small fixes correcting the handling of SSB mitigations on AMD
processors"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Fix the AMD SSBD usage of the SPEC_CTRL MSR
x86/bugs: Update when to check for the LS_CFG SSBD mitigation
Pull x86 fixes from Thomas Gleixner:
- Prevent an out-of-bounds access in mtrr_write()
- Break a circular dependency in the new hyperv IPI acceleration code
- Address the build breakage related to inline functions by enforcing
gnu_inline and explicitly bringing native_save_fl() out of line,
which also adds a set of _ARM_ARG macros which provide 32/64bit
safety.
- Initialize the shadow CR4 per cpu variable before using it.
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mtrr: Don't copy out-of-bounds data in mtrr_write
x86/hyper-v: Fix the circular dependency in IPI enlightenment
x86/paravirt: Make native_save_fl() extern inline
x86/asm: Add _ASM_ARG* constants for argument registers to <asm/asm.h>
compiler-gcc.h: Add __attribute__((gnu_inline)) to all inline declarations
x86/mm/32: Initialize the CR4 shadow before __flush_tlb_all()