linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-16 08:44:21 +08:00

Author	SHA1	Message	Date
Theodore Ts'o	01daf94525	ext4: propagate error values from ext4_inline_data_truncate() Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-01-22 19:35:49 -05:00
Theodore Ts'o	b907f2d519	ext4: avoid calling ext4_mark_inode_dirty() under unneeded semaphores There is no need to call ext4_mark_inode_dirty while holding xattr_sem or i_data_sem, so where it's easy to avoid it, move it out from the critical region. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-01-11 22:14:49 -05:00
Theodore Ts'o	c755e25135	ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea() The xattr_sem deadlock problems fixed in commit `2e81a4eeed`: "ext4: avoid deadlock when expanding inode size" didn't include the use of xattr_sem in fs/ext4/inline.c. With the addition of project quota which added a new extra inode field, this exposed deadlocks in the inline_data code similar to the ones fixed by `2e81a4eeed`. The deadlock can be reproduced via: dmesg -n 7 mke2fs -t ext4 -O inline_data -Fq -I 256 /dev/vdc 32768 mount -t ext4 -o debug_want_extra_isize=24 /dev/vdc /vdc mkdir /vdc/a umount /vdc mount -t ext4 /dev/vdc /vdc echo foo > /vdc/a/foo and looks like this: [ 11.158815] [ 11.160276] ============================================= [ 11.161960] [ INFO: possible recursive locking detected ] [ 11.161960] 4.10.0-rc3-00015-g011b30a8a3cf #160 Tainted: G W [ 11.161960] --------------------------------------------- [ 11.161960] bash/2519 is trying to acquire lock: [ 11.161960] (&ei->xattr_sem){++++..}, at: [<c1225a4b>] ext4_expand_extra_isize_ea+0x3d/0x4cd [ 11.161960] [ 11.161960] but task is already holding lock: [ 11.161960] (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152 [ 11.161960] [ 11.161960] other info that might help us debug this: [ 11.161960] Possible unsafe locking scenario: [ 11.161960] [ 11.161960] CPU0 [ 11.161960] ---- [ 11.161960] lock(&ei->xattr_sem); [ 11.161960] lock(&ei->xattr_sem); [ 11.161960] [ 11.161960] * DEADLOCK * [ 11.161960] [ 11.161960] May be due to missing lock nesting notation [ 11.161960] [ 11.161960] 4 locks held by bash/2519: [ 11.161960] #0: (sb_writers#3){.+.+.+}, at: [<c11a2414>] mnt_want_write+0x1e/0x3e [ 11.161960] #1: (&type->i_mutex_dir_key){++++++}, at: [<c119508b>] path_openat+0x338/0x67a [ 11.161960] #2: (jbd2_handle){++++..}, at: [<c123314a>] start_this_handle+0x582/0x622 [ 11.161960] #3: (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152 [ 11.161960] [ 11.161960] stack backtrace: [ 11.161960] CPU: 0 PID: 2519 Comm: bash Tainted: G W 4.10.0-rc3-00015-g011b30a8a3cf #160 [ 11.161960] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014 [ 11.161960] Call Trace: [ 11.161960] dump_stack+0x72/0xa3 [ 11.161960] __lock_acquire+0xb7c/0xcb9 [ 11.161960] ? kvm_clock_read+0x1f/0x29 [ 11.161960] ? __lock_is_held+0x36/0x66 [ 11.161960] ? __lock_is_held+0x36/0x66 [ 11.161960] lock_acquire+0x106/0x18a [ 11.161960] ? ext4_expand_extra_isize_ea+0x3d/0x4cd [ 11.161960] down_write+0x39/0x72 [ 11.161960] ? ext4_expand_extra_isize_ea+0x3d/0x4cd [ 11.161960] ext4_expand_extra_isize_ea+0x3d/0x4cd [ 11.161960] ? _raw_read_unlock+0x22/0x2c [ 11.161960] ? jbd2_journal_extend+0x1e2/0x262 [ 11.161960] ? __ext4_journal_get_write_access+0x3d/0x60 [ 11.161960] ext4_mark_inode_dirty+0x17d/0x26d [ 11.161960] ? ext4_add_dirent_to_inline.isra.12+0xa5/0xb2 [ 11.161960] ext4_add_dirent_to_inline.isra.12+0xa5/0xb2 [ 11.161960] ext4_try_add_inline_entry+0x69/0x152 [ 11.161960] ext4_add_entry+0xa3/0x848 [ 11.161960] ? __brelse+0x14/0x2f [ 11.161960] ? _raw_spin_unlock_irqrestore+0x44/0x4f [ 11.161960] ext4_add_nondir+0x17/0x5b [ 11.161960] ext4_create+0xcf/0x133 [ 11.161960] ? ext4_mknod+0x12f/0x12f [ 11.161960] lookup_open+0x39e/0x3fb [ 11.161960] ? __wake_up+0x1a/0x40 [ 11.161960] ? lock_acquire+0x11e/0x18a [ 11.161960] path_openat+0x35c/0x67a [ 11.161960] ? sched_clock_cpu+0xd7/0xf2 [ 11.161960] do_filp_open+0x36/0x7c [ 11.161960] ? _raw_spin_unlock+0x22/0x2c [ 11.161960] ? __alloc_fd+0x169/0x173 [ 11.161960] do_sys_open+0x59/0xcc [ 11.161960] SyS_open+0x1d/0x1f [ 11.161960] do_int80_syscall_32+0x4f/0x61 [ 11.161960] entry_INT80_32+0x2f/0x2f [ 11.161960] EIP: 0xb76ad469 [ 11.161960] EFLAGS: 00000286 CPU: 0 [ 11.161960] EAX: ffffffda EBX: 08168ac8 ECX: 00008241 EDX: 000001b6 [ 11.161960] ESI: b75e46bc EDI: b7755000 EBP: bfbdb108 ESP: bfbdafc0 [ 11.161960] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b Cc: stable@vger.kernel.org # 3.10 (requires `2e81a4eeed` as a prereq) Reported-by: George Spelvin <linux@sciencehorizons.net> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-01-11 21:50:46 -05:00
Theodore Ts'o	670e9875eb	ext4: add debug_want_extra_isize mount option In order to test the inode extra isize expansion code, it is useful to be able to easily create file systems that have inodes with extra isize values smaller than the current desired value. Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-01-11 15:32:22 -05:00
Roman Pen	03e916fa8b	ext4: do not polute the extents cache while shifting extents Inside ext4_ext_shift_extents() function ext4_find_extent() is called without EXT4_EX_NOCACHE flag, which should prevent cache population. This leads to oudated offsets in the extents tree and wrong blocks afterwards. Patch fixes the problem providing EXT4_EX_NOCACHE flag for each ext4_find_extents() call inside ext4_ext_shift_extents function. Fixes: `331573febb` Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: stable@vger.kernel.org	2017-01-08 21:00:35 -05:00
Roman Pen	2a9b8cba62	ext4: Include forgotten start block on fallocate insert range While doing 'insert range' start block should be also shifted right. The bug can be easily reproduced by the following test: ptr = malloc(4096); assert(ptr); fd = open("./ext4.file", O_CREAT \| O_TRUNC \| O_RDWR, 0600); assert(fd >= 0); rc = fallocate(fd, 0, 0, 8192); assert(rc == 0); for (i = 0; i < 2048; i++) ((unsigned short )ptr + i) = 0xbeef; rc = pwrite(fd, ptr, 4096, 0); assert(rc == 4096); rc = pwrite(fd, ptr, 4096, 4096); assert(rc == 4096); for (block = 2; block < 1000; block++) { rc = fallocate(fd, FALLOC_FL_INSERT_RANGE, 4096, 4096); assert(rc == 0); for (i = 0; i < 2048; i++) ((unsigned short )ptr + i) = block; rc = pwrite(fd, ptr, 4096, 4096); assert(rc == 4096); } Because start block is not included in the range the hole appears at the wrong offset (just after the desired offset) and the following pwrite() overwrites already existent block, keeping hole untouched. Simple way to verify wrong behaviour is to check zeroed blocks after the test: $ hexdump ./ext4.file \| grep '0000 0000' The root cause of the bug is a wrong range (start, stop], where start should be inclusive, i.e. [start, stop]. This patch fixes the problem by including start into the range. But not to break left shift (range collapse) stop points to the beginning of the a block, not to the end. The other not obvious change is an iterator check on validness in a main loop. Because iterator is unsigned the following corner case should be considered with care: insert a block at 0 offset, when stop variables overflows and never becomes less than start, which is 0. To handle this special case iterator is set to NULL to indicate that end of the loop is reached. Fixes: `331573febb` Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: stable@vger.kernel.org	2017-01-08 20:59:35 -05:00
Theodore Ts'o	56735be053	Merge branch 'fscrypt' into d	2017-01-08 20:57:35 -05:00
Eric Biggers	a5d431eff2	fscrypt: make fscrypt_operations.key_prefix a string There was an unnecessary amount of complexity around requesting the filesystem-specific key prefix. It was unclear why; perhaps it was envisioned that different instances of the same filesystem type could use different key prefixes, or that key prefixes could be binary. However, neither of those things were implemented or really make sense at all. So simplify the code by making key_prefix a const char *. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-01-08 01:03:41 -05:00
Theodore Ts'o	173b8439e1	ext4: don't allow encrypted operations without keys While we allow deletes without the key, the following should not be permitted: # cd /vdc/encrypted-dir-without-key # ls -l total 4 -rw-r--r-- 1 root root 0 Dec 27 22:35 6,LKNRJsp209FbXoSvJWzB -rw-r--r-- 1 root root 286 Dec 27 22:35 uRJ5vJh9gE7vcomYMqTAyD # mv uRJ5vJh9gE7vcomYMqTAyD 6,LKNRJsp209FbXoSvJWzB Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-01-08 00:58:23 -05:00
Eric Biggers	54475f531b	fscrypt: use ENOKEY when file cannot be created w/o key As part of an effort to clean up fscrypt-related error codes, make attempting to create a file in an encrypted directory that hasn't been "unlocked" fail with ENOKEY. Previously, several error codes were used for this case, including ENOENT, EACCES, and EPERM, and they were not consistent between and within filesystems. ENOKEY is a better choice because it expresses that the failure is due to lacking the encryption key. It also matches the error code returned when trying to open an encrypted regular file without the key. I am not aware of any users who might be relying on the previous inconsistent error codes, which were never documented anywhere. This failure case will be exercised by an xfstest. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-12-31 16:26:20 -05:00
Jan Kara	1db175428e	ext4: Simplify DAX fault path Now that dax_iomap_fault() calls ->iomap_begin() without entry lock, we can use transaction starting in ext4_iomap_begin() and thus simplify ext4_dax_fault(). It also provides us proper retries in case of ENOSPC. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-12-26 20:29:25 -08:00
Linus Torvalds	7c0f6ba682	Replace <asm/uaccess.h> with <linux/uaccess.h> globally This was entirely automated, using the script by Al: PATT='^[[:blank:]]#[[:blank:]]include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"\|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-24 11:46:01 -08:00
Linus Torvalds	231753ef78	Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull partial readlink cleanups from Miklos Szeredi. This is the uncontroversial part of the readlink cleanup patch-set that simplifies the default readlink handling. Miklos and Al are still discussing the rest of the series. * git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: vfs: make generic_readlink() static vfs: remove ".readlink = generic_readlink" assignments vfs: default to generic_readlink() vfs: replace calling i_op->readlink with vfs_readlink() proc/self: use generic_readlink ecryptfs: use vfs_get_link() bad_inode: add missing i_op initializers	2016-12-17 19:16:12 -08:00
Linus Torvalds	0110c350c8	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "In this pile: - autofs-namespace series - dedupe stuff - more struct path constification" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (40 commits) ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features ocfs2: charge quota for reflinked blocks ocfs2: fix bad pointer cast ocfs2: always unlock when completing dio writes ocfs2: don't eat io errors during _dio_end_io_write ocfs2: budget for extent tree splits when adding refcount flag ocfs2: prohibit refcounted swapfiles ocfs2: add newlines to some error messages ocfs2: convert inode refcount test to a helper simple_write_end(): don't zero in short copy into uptodate exofs: don't mess with simple_write_{begin,end} 9p: saner ->write_end() on failing copy into non-uptodate page fix gfs2_stuffed_write_end() on short copies fix ceph_write_end() nfs_write_end(): fix handling of short copies vfs: refactor clone/dedupe_file_range common functions fs: try to clone files first in vfs_copy_file_range vfs: misc struct path constification namespace.c: constify struct path passed to a bunch of primitives quota: constify struct path in quota_on ...	2016-12-17 18:44:00 -08:00
Linus Torvalds	80eabba702	Merge branch 'for-4.10/fs-unmap' of git://git.kernel.dk/linux-block Pull fs meta data unmap optimization from Jens Axboe: "A series from Jan Kara, providing a more efficient way for unmapping meta data from in the buffer cache than doing it block-by-block. Provide a general helper that existing callers can use" * 'for-4.10/fs-unmap' of git://git.kernel.dk/linux-block: fs: Remove unmap_underlying_metadata fs: Add helper to clean bdev aliases under a bh and use it ext2: Use clean_bdev_aliases() instead of iteration ext4: Use clean_bdev_aliases() instead of iteration direct-io: Use clean_bdev_aliases() instead of handmade iteration fs: Provide function to unmap metadata for a range of blocks	2016-12-14 17:09:00 -08:00
Linus Torvalds	5084fdf081	This merge request includes the dax-4.0-iomap-pmd branch which is needed for both ext4 and xfs dax changes to use iomap for DAX. It also includes the fscrypt branch which is needed for ubifs encryption work as well as ext4 encryption and fscrypt cleanups. Lots of cleanups and bug fixes, especially making sure ext4 is robust against maliciously corrupted file systems --- especially maliciously corrupted xattr blocks and a maliciously corrupted superblock. Also fix ext4 support for 64k block sizes so it works well on ppcle. Fixed mbcache so we don't miss some common xattr blocks that can be merged. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlhQQVEACgkQ8vlZVpUN gaN9TQgAoCD+V4kJjMCFhiV8u6QR3hqD6bOZbggo5wJf4CHglWkmrbAmc3jANOgH CKsXDRRjxuDjPXf1ukB1i4M7ArLYjkbbzKdsu7lismoJLS+w8uwUKSNdep+LYMjD alxUcf5DCzLlUmdOdW4yE22L+CwRfqfs8IpBvKmJb7DrAKiwJVA340ys6daBGuu1 63xYx0QIyPzq0xjqLb6TVf88HUI4NiGVXmlm2wcrnYd5966hEZd/SztOZTVCVWOf Z0Z0fGQ1WJzmaBB9+YV3aBi+BObOx4m2PUprIa531+iEW02E+ot5Xd4vVQFoV/r4 NX3XtoBrT1XlKagy2sJLMBoCavqrKw== =j4KP -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "This merge request includes the dax-4.0-iomap-pmd branch which is needed for both ext4 and xfs dax changes to use iomap for DAX. It also includes the fscrypt branch which is needed for ubifs encryption work as well as ext4 encryption and fscrypt cleanups. Lots of cleanups and bug fixes, especially making sure ext4 is robust against maliciously corrupted file systems --- especially maliciously corrupted xattr blocks and a maliciously corrupted superblock. Also fix ext4 support for 64k block sizes so it works well on ppcle. Fixed mbcache so we don't miss some common xattr blocks that can be merged" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (86 commits) dax: Fix sleep in atomic contex in grab_mapping_entry() fscrypt: Rename FS_WRITE_PATH_FL to FS_CTX_HAS_BOUNCE_BUFFER_FL fscrypt: Delay bounce page pool allocation until needed fscrypt: Cleanup page locking requirements for fscrypt_{decrypt,encrypt}_page() fscrypt: Cleanup fscrypt_{decrypt,encrypt}_page() fscrypt: Never allocate fscrypt_ctx on in-place encryption fscrypt: Use correct index in decrypt path. fscrypt: move the policy flags and encryption mode definitions to uapi header fscrypt: move non-public structures and constants to fscrypt_private.h fscrypt: unexport fscrypt_initialize() fscrypt: rename get_crypt_info() to fscrypt_get_crypt_info() fscrypto: move ioctl processing more fully into common code fscrypto: remove unneeded Kconfig dependencies MAINTAINERS: fscrypto: recommend linux-fsdevel for fscrypto patches ext4: do not perform data journaling when data is encrypted ext4: return -ENOMEM instead of success ext4: reject inodes with negative size ext4: remove another test in ext4_alloc_file_blocks() Documentation: fix description of ext4's block_validity mount option ext4: fix checks for data=ordered and journal_async_commit options ...	2016-12-14 09:17:42 -08:00
Linus Torvalds	36869cb93d	Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block Pull block layer updates from Jens Axboe: "This is the main block pull request this series. Contrary to previous release, I've kept the core and driver changes in the same branch. We always ended up having dependencies between the two for obvious reasons, so makes more sense to keep them together. That said, I'll probably try and keep more topical branches going forward, especially for cycles that end up being as busy as this one. The major parts of this pull request is: - Improved support for O_DIRECT on block devices, with a small private implementation instead of using the pig that is fs/direct-io.c. From Christoph. - Request completion tracking in a scalable fashion. This is utilized by two components in this pull, the new hybrid polling and the writeback queue throttling code. - Improved support for polling with O_DIRECT, adding a hybrid mode that combines pure polling with an initial sleep. From me. - Support for automatic throttling of writeback queues on the block side. This uses feedback from the device completion latencies to scale the queue on the block side up or down. From me. - Support from SMR drives in the block layer and for SD. From Hannes and Shaun. - Multi-connection support for nbd. From Josef. - Cleanup of request and bio flags, so we have a clear split between which are bio (or rq) private, and which ones are shared. From Christoph. - A set of patches from Bart, that improve how we handle queue stopping and starting in blk-mq. - Support for WRITE_ZEROES from Chaitanya. - Lightnvm updates from Javier/Matias. - Supoort for FC for the nvme-over-fabrics code. From James Smart. - A bunch of fixes from a whole slew of people, too many to name here" * 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits) blk-stat: fix a few cases of missing batch flushing blk-flush: run the queue when inserting blk-mq flush elevator: make the rqhash helpers exported blk-mq: abstract out blk_mq_dispatch_rq_list() helper blk-mq: add blk_mq_start_stopped_hw_queue() block: improve handling of the magic discard payload blk-wbt: don't throttle discard or write zeroes nbd: use dev_err_ratelimited in io path nbd: reset the setup task for NBD_CLEAR_SOCK nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME nvme-fabrics: Add target support for FC transport nvme-fabrics: Add host support for FC transport nvme-fabrics: Add FC transport LLDD api definitions nvme-fabrics: Add FC transport FC-NVME definitions nvme-fabrics: Add FC transport error codes to nvme.h Add type 0x28 NVME type code to scsi fc headers nvme-fabrics: patch target code in prep for FC transport support nvme-fabrics: set sqe.command_id in core not transports parser: add u64 number parser nvme-rdma: align to generic ib_event logging helper ...	2016-12-13 10:19:16 -08:00
Theodore Ts'o	a551d7c8de	Merge branch 'fscrypt' into dev	2016-12-12 21:50:28 -05:00
David Gstir	bd7b829038	fscrypt: Cleanup page locking requirements for fscrypt_{decrypt,encrypt}_page() Rename the FS_CFLG_INPLACE_ENCRYPTION flag to FS_CFLG_OWN_PAGES which, when set, indicates that the fs uses pages under its own control as opposed to writeback pages which require locking and a bounce buffer for encryption. Signed-off-by: David Gstir <david@sigma-star.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-12-11 16:26:12 -05:00
Eric Biggers	db717d8e26	fscrypto: move ioctl processing more fully into common code Multiple bugs were recently fixed in the "set encryption policy" ioctl. To make it clear that fscrypt_process_policy() and fscrypt_get_policy() implement ioctls and therefore their implementations must take standard security and correctness precautions, rename them to fscrypt_ioctl_set_policy() and fscrypt_ioctl_get_policy(). Make the latter take in a struct file * to make it consistent with the former. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-12-11 16:26:07 -05:00
Sergey Karamov	73b92a2a5e	ext4: do not perform data journaling when data is encrypted Currently data journalling is incompatible with encryption: enabling both at the same time has never been supported by design, and would result in unpredictable behavior. However, users are not precluded from turning on both features simultaneously. This change programmatically replaces data journaling for encrypted regular files with ordered data journaling mode. Background: Journaling encrypted data has not been supported because it operates on buffer heads of the page in the page cache. Namely, when the commit happens, which could be up to five seconds after caching, the commit thread uses the buffer heads attached to the page to copy the contents of the page to the journal. With encryption, it would have been required to keep the bounce buffer with ciphertext for up to the aforementioned five seconds, since the page cache can only hold plaintext and could not be used for journaling. Alternatively, it would be required to setup the journal to initiate a callback at the commit time to perform deferred encryption - in this case, not only would the data have to be written twice, but it would also have to be encrypted twice. This level of complexity was not justified for a mode that in practice is very rarely used because of the overhead from the data journalling. Solution: If data=journaled has been set as a mount option for a filesystem, or if journaling is enabled on a regular file, do not perform journaling if the file is also encrypted, instead fall back to the data=ordered mode for the file. Rationale: The intent is to allow seamless and proper filesystem operation when journaling and encryption have both been enabled, and have these two conflicting features gracefully resolved by the filesystem. Fixes: `4461471107` Signed-off-by: Sergey Karamov <skaramov@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2016-12-10 17:54:58 -05:00
Dan Carpenter	578620f451	ext4: return -ENOMEM instead of success We should set the error code if kzalloc() fails. Fixes: `67cf5b09a4` ("ext4: add the basic function for inline data support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2016-12-10 09:56:01 -05:00
Darrick J. Wong	7e6e1ef48f	ext4: reject inodes with negative size Don't load an inode with a negative size; this causes integer overflow problems in the VFS. [ Added EXT4_ERROR_INODE() to mark file system as corrupted. -TYT] Fixes: `a48380f769` (ext4: rename i_dir_acl to i_size_high) Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2016-12-10 09:55:01 -05:00
Miklos Szeredi	dfeef68862	vfs: remove ".readlink = generic_readlink" assignments If .readlink == NULL implies generic_readlink(). Generated by: to_del="\.readlink.=.generic_readlink" for i in `git grep -l $to_del`; do sed -i "/$to_del"/d $i; done Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-12-09 16:45:04 +01:00
Al Viro	8c54ca9c68	quota: constify struct path in quota_on Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-12-05 19:03:06 -05:00
Dan Carpenter	011c88e36c	ext4: remove another test in ext4_alloc_file_blocks() Before commit `c3fe493ccd` ('ext4: remove unneeded test in ext4_alloc_file_blocks()') then it was possible for "depth" to be -1 but now, it's not possible that it is negative. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2016-12-03 16:46:58 -05:00
Jan Kara	ab04df7818	ext4: fix checks for data=ordered and journal_async_commit options Combination of data=ordered mode and journal_async_commit mount option is invalid. However the check in parse_options() fails to detect the case where we simply end up defaulting to data=ordered mode and we detect the problem only on remount which triggers hard to understand failure to remount the filesystem. Fix the checking of mount options to take into account also the default mode by moving the check somewhat later in the mount sequence. Reported-by: Wolfgang Walter <linux@stwm.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-12-03 16:20:53 -05:00
Theodore Ts'o	4db0d88e2e	ext4: fix reading new encrypted symlinks on no-journal file systems On a filesystem with no journal, a symlink longer than about 32 characters (exact length depending on padding for encryption) could not be followed or read immediately after being created in an encrypted directory. This happened because when the symlink data went through the delayed allocation path instead of the journaling path, the symlink was incorrectly detected as a "fast" symlink rather than a "slow" symlink until its data was written out. To fix this, disable delayed allocation for symlinks, since there is no benefit for delayed allocation anyway. Reported-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-12-02 12:12:53 -05:00
Eryu Guan	3a4b77cd47	ext4: validate s_first_meta_bg at mount time Ralf Spenneberg reported that he hit a kernel crash when mounting a modified ext4 image. And it turns out that kernel crashed when calculating fs overhead (ext4_calculate_overhead()), this is because the image has very large s_first_meta_bg (debug code shows it's 842150400), and ext4 overruns the memory in count_overhead() when setting bitmap buffer, which is PAGE_SIZE. ext4_calculate_overhead(): buf = get_zeroed_page(GFP_NOFS); <=== PAGE_SIZE buffer blks = count_overhead(sb, i, buf); count_overhead(): for (j = ext4_bg_num_gdb(sb, grp); j > 0; j--) { <=== j = 842150400 ext4_set_bit(EXT4_B2C(sbi, s++), buf); <=== buffer overrun count++; } This can be reproduced easily for me by this script: #!/bin/bash rm -f fs.img mkdir -p /mnt/ext4 fallocate -l 16M fs.img mke2fs -t ext4 -O bigalloc,meta_bg,^resize_inode -F fs.img debugfs -w -R "ssv first_meta_bg 842150400" fs.img mount -o loop fs.img /mnt/ext4 Fix it by validating s_first_meta_bg first at mount time, and refusing to mount if its value exceeds the largest possible meta_bg number. Reported-by: Ralf Spenneberg <ralf@os-t.de> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>	2016-12-01 15:08:37 -05:00
Eric Biggers	d7614cc161	ext4: correctly detect when an xattr value has an invalid size It was possible for an xattr value to have a very large size, which would then pass validation on 32-bit architectures due to a pointer wraparound. Fix this by validating the size in a way which avoids pointer wraparound. It was also possible that a value's size would fit in the available space but its padded size would not. This would cause an out-of-bounds memory write in ext4_xattr_set_entry when replacing the xattr value. For example, if an xattr value of unpadded size 253 bytes went until the very end of the inode or block, then using setxattr(2) to replace this xattr's value with 256 bytes would cause a write to the 3 bytes past the end of the inode or buffer, and the new xattr value would be incorrectly truncated. Fix this by requiring that the padded size fit in the available space rather than the unpadded size. This patch shouldn't have any noticeable effect on non-corrupted/non-malicious filesystems. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-12-01 14:57:29 -05:00
Eric Biggers	290ab23001	ext4: don't read out of bounds when checking for in-inode xattrs With i_extra_isize equal to or close to the available space, it was possible for us to read past the end of the inode when trying to detect or validate in-inode xattrs. Fix this by checking for the needed extra space first. This patch shouldn't have any noticeable effect on non-corrupted/non-malicious filesystems. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>	2016-12-01 14:51:58 -05:00
Eric Biggers	2dc8d9e19b	ext4: forbid i_extra_isize not divisible by 4 i_extra_isize not divisible by 4 is problematic for several reasons: - It causes the in-inode xattr space to be misaligned, but the xattr header and entries are not declared __packed to express this possibility. This may cause poor performance or incorrect code generation on some platforms. - When validating the xattr entries we can read past the end of the inode if the size available for xattrs is not a multiple of 4. - It allows the nonsensical i_extra_isize=1, which doesn't even leave enough room for i_extra_isize itself. Therefore, update ext4_iget() to consider i_extra_isize not divisible by 4 to be an error, like the case where i_extra_isize is too large. This also matches the rule recently added to e2fsck for determining whether an inode has valid i_extra_isize. This patch shouldn't have any noticeable effect on non-corrupted/non-malicious filesystems, since the size of ext4_inode has always been a multiple of 4. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>	2016-12-01 14:43:33 -05:00
Eric Biggers	ba679017ef	ext4: disable pwsalt ioctl when encryption disabled by config On a CONFIG_EXT4_FS_ENCRYPTION=n kernel, the ioctls to get and set encryption policies were disabled but EXT4_IOC_GET_ENCRYPTION_PWSALT was not. But there's no good reason to expose the pwsalt ioctl if the kernel doesn't support encryption. The pwsalt ioctl was also disabled pre-4.8 (via ext4_sb_has_crypto() previously returning 0 when encryption was disabled by config) and seems to have been enabled by mistake when ext4 encryption was refactored to use fs/crypto/. So let's disable it again. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-12-01 11:55:51 -05:00
Eric Biggers	35997d1ce8	ext4: get rid of ext4_sb_has_crypto() ext4_sb_has_crypto() just called through to ext4_has_feature_encrypt(), and all callers except one were already using the latter. So remove it and switch its one caller to ext4_has_feature_encrypt(). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-12-01 11:54:18 -05:00
Daeho Jeong	05ac5aa18a	ext4: fix inode checksum calculation problem if i_extra_size is small We've fixed the race condition problem in calculating ext4 checksum value in commit `b47820edd1` ("ext4: avoid modifying checksum fields directly during checksum veficationon"). However, by this change, when calculating the checksum value of inode whose i_extra_size is less than 4, we couldn't calculate the checksum value in a proper way. This problem was found and reported by Nix, Thank you. Reported-by: Nix <nix@esperi.org.uk> Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-12-01 11:49:12 -05:00
Jan Kara	6dcc693bc5	ext4: warn when page is dirtied without buffers Warn when a page is dirtied without buffers (as that will likely lead to a crash in ext4_writepages()) or when it gets newly dirtied without the page being locked (as there is nothing that prevents buffers to get stripped just before calling set_page_dirty() under memory pressure). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-12-01 11:46:40 -05:00
Jan Kara	d14e7683ec	ext4: be more strict when verifying flags set via SETFLAGS ioctls Currently we just silently ignore flags that we don't understand (or that cannot be manipulated) through EXT4_IOC_SETFLAGS and EXT4_IOC_FSSETXATTR ioctls. This makes it problematic for the unused flags to be used in future (some app may be inadvertedly setting them and we won't notice until the flag gets used). Also this is inconsistent with other filesystems like XFS or BTRFS which return EOPNOTSUPP when they see a flag they cannot set. ext4 has the additional problem that there are flags which are returned by EXT4_IOC_GETFLAGS ioctl but which cannot be modified via EXT4_IOC_SETFLAGS. So we have to be careful to ignore value of these flags and not fail the ioctl when they are set (as e.g. chattr(1) passes flags returned from EXT4_IOC_GETFLAGS to EXT4_IOC_SETFLAGS without any masking and thus we'd break this utility). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-11-29 11:18:39 -05:00
Jan Kara	f8011d93a2	ext4: add EXT4_JOURNAL_DATA_FL and EXT4_EXTENTS_FL to modifiable mask Add EXT4_JOURNAL_DATA_FL and EXT4_EXTENTS_FL to EXT4_FL_USER_MODIFIABLE to recognize that they are modifiable by userspace. So far we got away without having them there because ext4_ioctl_setflags() treats them in a special way. But it was really confusing like that. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-11-29 11:13:13 -05:00
Eric Sandeen	9060dd2c50	ext4: fix mmp use after free during unmount In ext4_put_super, we call brelse on the buffer head containing the ext4 superblock, but then try to use it when we stop the mmp thread, because when the thread shuts down it does: write_mmp_block ext4_mmp_csum_set ext4_has_metadata_csum WARN_ON_ONCE(ext4_has_feature_metadata_csum(sb)...) which reaches into sb->s_fs_info->s_es->s_feature_ro_compat, which lives in the superblock buffer s_sbh which we just released. Fix this by moving the brelse down to a point where we are no longer using it. Reported-by: Wang Shu <shuwang@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>	2016-11-26 14:24:51 -05:00
Jan Kara	4f5a763c9a	ext4: Add select for CONFIG_FS_IOMAP When ext4 is compiled with DAX support, it now needs the iomap code. Add appropriate select to Kconfig. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-11-22 23:21:58 -05:00
Eric Biggers	2f8f5e76c7	ext4: avoid lockdep warning when inheriting encryption context On a lockdep-enabled kernel, xfstests generic/027 fails due to a lockdep warning when run on ext4 mounted with -o test_dummy_encryption: xfs_io/4594 is trying to acquire lock: (jbd2_handle ){++++.+}, at: [<ffffffff813096ef>] jbd2_log_wait_commit+0x5/0x11b but task is already holding lock: (jbd2_handle ){++++.+}, at: [<ffffffff813000de>] start_this_handle+0x354/0x3d8 The abbreviated call stack is: [<ffffffff813096ef>] ? jbd2_log_wait_commit+0x5/0x11b [<ffffffff8130972a>] jbd2_log_wait_commit+0x40/0x11b [<ffffffff813096ef>] ? jbd2_log_wait_commit+0x5/0x11b [<ffffffff8130987b>] ? __jbd2_journal_force_commit+0x76/0xa6 [<ffffffff81309896>] __jbd2_journal_force_commit+0x91/0xa6 [<ffffffff813098b9>] jbd2_journal_force_commit_nested+0xe/0x18 [<ffffffff812a6049>] ext4_should_retry_alloc+0x72/0x79 [<ffffffff812f0c1f>] ext4_xattr_set+0xef/0x11f [<ffffffff812cc35b>] ext4_set_context+0x3a/0x16b [<ffffffff81258123>] fscrypt_inherit_context+0xe3/0x103 [<ffffffff812ab611>] __ext4_new_inode+0x12dc/0x153a [<ffffffff812bd371>] ext4_create+0xb7/0x161 When a file is created in an encrypted directory, ext4_set_context() is called to set an encryption context on the new file. This calls ext4_xattr_set(), which contains a retry loop where the journal is forced to commit if an ENOSPC error is encountered. If the task actually were to wait for the journal to commit in this case, then it would deadlock because a handle remains open from __ext4_new_inode(), so the running transaction can't be committed yet. Fortunately, __jbd2_journal_force_commit() avoids the deadlock by not allowing the running transaction to be committed while the current task has it open. However, the above lockdep warning is still triggered. This was a false positive which was introduced by: `1eaa566d36`: jbd2: track more dependencies on transaction commit Fix the problem by passing the handle through the 'fs_data' argument to ext4_set_context(), then using ext4_xattr_set_handle() instead of ext4_xattr_set(). And in the case where no journal handle is specified and ext4_set_context() has to open one, add an ENOSPC retry loop since in that case it is the outermost transaction. Signed-off-by: Eric Biggers <ebiggers@google.com>	2016-11-21 11:52:44 -05:00
Ross Zwisler	d086630e19	ext4: remove unused function ext4_aligned_io() The last user of ext4_aligned_io() was the DAX path in ext4_direct_IO_write(). This usage was removed by Jan Kara's patch entitled "ext4: Rip out DAX handling from direct IO path". Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-11-21 11:51:44 -05:00
Jan Kara	0bd2d5ec3d	ext4: rip out DAX handling from direct IO path Reads and writes for DAX inodes should no longer end up in direct IO code. Rip out the support and add a warning. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-11-20 18:53:30 -05:00
Jan Kara	e2ae766c1b	ext4: convert DAX faults to iomap infrastructure Convert DAX faults to use iomap infrastructure. We would not have to start transaction in ext4_dax_fault() anymore since ext4_iomap_begin takes care of that but so far we do that to avoid lock inversion of transaction start with DAX entry lock which gets acquired in dax_iomap_fault() before calling ->iomap_begin handler. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-11-20 18:51:24 -05:00
Jan Kara	96f8ba3dd6	ext4: avoid split extents for DAX writes Currently mapping of blocks for DAX writes happen with EXT4_GET_BLOCKS_PRE_IO flag set. That has a result that each ext4_map_blocks() call creates a separate written extent, although it could be merged to the neighboring extents in the extent tree. The reason for using this flag is that in case the extent is unwritten, we need to convert it to written one and zero it out. However this "convert mapped range to written" operation is already implemented by ext4_map_blocks() for the case of data writes into unwritten extent. So just use flags for that mode of operation, simplify the code, and avoid unnecessary split extents. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-11-20 18:10:09 -05:00
Jan Kara	776722e85d	ext4: DAX iomap write support Implement DAX writes using the new iomap infrastructure instead of overloading the direct IO path. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-11-20 18:09:11 -05:00
Jan Kara	47e6935136	ext4: use iomap for zeroing blocks in DAX mode Use iomap infrastructure for zeroing blocks when in DAX mode. ext4_iomap_begin() handles read requests just fine and that's all that is needed for iomap_zero_range(). Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-11-20 18:08:05 -05:00
Jan Kara	364443cbcf	ext4: convert DAX reads to iomap infrastructure Implement basic iomap_begin function that handles reading and use it for DAX reads. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-11-20 17:36:06 -05:00
Jan Kara	a3caa24b70	ext4: only set S_DAX if DAX is really supported Currently we have S_DAX set inode->i_flags for a regular file whenever ext4 is mounted with dax mount option. However in some cases we cannot really do DAX - e.g. when inode is marked to use data journalling, when inode data is being encrypted, or when inode is stored inline. Make sure S_DAX flag is appropriately set/cleared in these cases. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-11-20 17:32:59 -05:00
Jan Kara	213bcd9ccb	ext4: factor out checks from ext4_file_write_iter() Factor out checks of 'from' and whether we are overwriting out of ext4_file_write_iter() so that the function is easier to follow. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-11-20 17:29:51 -05:00

1 2 3 4 5 ...

2904 Commits