linux/fs
Dave Chinner 8c110d43c6 iomap: readpages doesn't zero page tail beyond EOF
When we read the EOF page of the file via readpages, we need
to zero the region beyond EOF that we either do not read or
should not contain data so that mmap does not expose stale data to
user applications.

However, iomap_adjust_read_range() fails to detect EOF correctly,
and so fsx on 1k block size filesystems fails very quickly with
mapreads exposing data beyond EOF. There are two problems here.

Firstly, when calculating the end block of the EOF byte, we have
to round the size by one to avoid a block aligned EOF from reporting
a block too large. i.e. a size of 1024 bytes is 1 block, which in
index terms is block 0. Therefore we have to calculate the end block
from (isize - 1), not isize.

The second bug is determining if the current page spans EOF, and so
whether we need split it into two half, one for the IO, and the
other for zeroing. Unfortunately, the code that checks whether
we should split the block doesn't actually check if we span EOF, it
just checks if the read spans the /offset in the page/ that EOF
sits on. So it splits every read into two if EOF is not page
aligned, regardless of whether we are reading the EOF block or not.

Hence we need to restrict the "does the read span EOF" check to
just the page that spans EOF, not every page we read.

This patch results in correct EOF detection through readpages:

xfs_vm_readpages:     dev 259:0 ino 0x43 nr_pages 24
xfs_iomap_found:      dev 259:0 ino 0x43 size 0x66c00 offset 0x4f000 count 98304 type hole startoff 0x13c startblock 1368 blockcount 0x4
iomap_readpage_actor: orig pos 323584 pos 323584, length 4096, poff 0 plen 4096, isize 420864
xfs_iomap_found:      dev 259:0 ino 0x43 size 0x66c00 offset 0x50000 count 94208 type hole startoff 0x140 startblock 1497 blockcount 0x5c
iomap_readpage_actor: orig pos 327680 pos 327680, length 94208, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 331776 pos 331776, length 90112, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 335872 pos 335872, length 86016, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 339968 pos 339968, length 81920, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 344064 pos 344064, length 77824, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 348160 pos 348160, length 73728, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 352256 pos 352256, length 69632, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 356352 pos 356352, length 65536, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 360448 pos 360448, length 61440, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 364544 pos 364544, length 57344, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 368640 pos 368640, length 53248, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 372736 pos 372736, length 49152, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 376832 pos 376832, length 45056, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 380928 pos 380928, length 40960, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 385024 pos 385024, length 36864, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 389120 pos 389120, length 32768, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 393216 pos 393216, length 28672, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 397312 pos 397312, length 24576, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 401408 pos 401408, length 20480, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 405504 pos 405504, length 16384, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 409600 pos 409600, length 12288, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 413696 pos 413696, length 8192, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 417792 pos 417792, length 4096, poff 0 plen 3072, isize 420864
iomap_readpage_actor: orig pos 420864 pos 420864, length 1024, poff 3072 plen 1024, isize 420864

As you can see, it now does full page reads until the last one which
is split correctly at the block aligned EOF, reading 3072 bytes and
zeroing the last 1024 bytes. The original version of the patch got
this right, but it got another case wrong.

The EOF detection crossing really needs to the the original length
as plen, while it starts at the end of the block, will be shortened
as up-to-date blocks are found on the page. This means "orig_pos +
plen" no longer points to the end of the page, and so will not
correctly detect EOF crossing. Hence we have to use the length
passed in to detect this partial page case:

xfs_filemap_fault:    dev 259:1 ino 0x43  write_fault 0
xfs_vm_readpage:      dev 259:1 ino 0x43 nr_pages 1
xfs_iomap_found:      dev 259:1 ino 0x43 size 0x2cc00 offset 0x2c000 count 4096 type hole startoff 0xb0 startblock 282 blockcount 0x4
iomap_readpage_actor: orig pos 180224 pos 181248, length 4096, poff 1024 plen 2048, isize 183296
xfs_iomap_found:      dev 259:1 ino 0x43 size 0x2cc00 offset 0x2cc00 count 1024 type hole startoff 0xb3 startblock 285 blockcount 0x1
iomap_readpage_actor: orig pos 183296 pos 183296, length 1024, poff 3072 plen 1024, isize 183296

Heere we see a trace where the first block on the EOF page is up to
date, hence poff = 1024 bytes. The offset into the page of EOF is
3072, so the range we want to read is 1024 - 3071, and the range we
want to zero is 3072 - 4095. You can see this is split correctly
now.

This fixes the stale data beyond EOF problem that fsx quickly
uncovers on 1k block size filesystems.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-11-21 10:10:54 -08:00
..
9p Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-11-01 19:58:52 -07:00
adfs adfs: use timespec64 for time conversion 2018-08-22 10:52:51 -07:00
affs affs: fix potential memory leak when parsing option 'prefix' 2018-05-28 12:36:41 +02:00
afs afs: Probe multiple fileservers simultaneously 2018-10-24 00:41:09 +01:00
autofs Merge branch 'akpm' (patches from Andrew) 2018-08-22 12:34:08 -07:00
befs fix a series of Documentation/ broken file name references 2018-06-15 18:10:01 -03:00
bfs bfs: add sanity check at bfs_fill_super() 2018-11-03 10:09:38 -07:00
btrfs vfs: rework data cloning infrastructure 2018-11-02 09:33:08 -07:00
cachefiles cachefiles: fix the race between cachefiles_bury_object() and rmdir(2) 2018-10-18 11:32:21 +02:00
ceph Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-11-01 19:58:52 -07:00
cifs cifs: fix signed/unsigned mismatch on aio_read patch 2018-11-02 14:09:42 -05:00
coda vfs: change inode times to use struct timespec64 2018-06-05 16:57:31 -07:00
configfs configfs: fix registered group removal 2018-07-17 06:14:07 -07:00
cramfs Make the Cramfs code more robust against filesystem corruptions, 2018-10-30 12:46:25 -07:00
crypto crypto: speck - remove Speck 2018-09-04 11:35:03 +08:00
debugfs Revert "debugfs: inode: debugfs_create_dir uses mode permission from parent" 2018-06-12 20:52:16 -07:00
devpts devpts: Convert to new IDA API 2018-08-21 23:54:17 -04:00
dlm iov_iter: Separate type from direction and use accessor functions 2018-10-24 00:41:07 +01:00
ecryptfs ecryptfs_rename(): verify that lower dentries are still OK after lock_rename() 2018-10-09 23:33:17 -04:00
efivarfs efivars: Call guid_parse() against guid_t type of variable 2018-07-22 14:13:44 +02:00
efs
exofs fs/exofs: only use true/false for asignment of bool type variable 2018-10-18 02:04:59 -04:00
exportfs ovl: do not try to reconnect a disconnected origin dentry 2018-04-12 12:04:49 +02:00
ext2 \n 2018-10-29 10:23:36 -07:00
ext4 for-linus-20181102 2018-11-02 11:25:48 -07:00
f2fs Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax 2018-10-28 11:35:40 -07:00
fat fat: truncate inode timestamp updates in setattr 2018-10-31 08:54:14 -07:00
freevxfs freevxfs_lookup(): use d_splice_alias() 2018-05-22 14:27:51 -04:00
fscache fscache: Fix out of bound read in long cookie keys 2018-10-18 11:32:21 +02:00
fuse Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-11-01 19:58:52 -07:00
gfs2 Merge branch 'xarray' of git://git.infradead.org/users/willy/linux-dax 2018-10-28 11:35:40 -07:00
hfs fs/hfs/extent.c: fix array out of bounds read of array extent 2018-10-31 08:54:13 -07:00
hfsplus hfsplus: update timestamps on truncate() 2018-10-31 08:54:13 -07:00
hostfs vfs: discard ATTR_ATTR_FLAG 2018-08-17 16:20:28 -07:00
hpfs hpfs: remove unnecessary checks on the value of r when assigning error code 2018-08-25 12:42:33 -07:00
hugetlbfs mm: zero out the vma in vma_init() 2018-08-22 10:52:44 -07:00
isofs Update email address 2018-09-29 22:47:48 -04:00
jbd2 jbd2: fix use after free in jbd2_log_do_checkpoint() 2018-10-05 18:44:40 -04:00
jffs2 Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2018-10-24 11:22:39 +01:00
jfs jfs: remove redundant dquot_initialize() in jfs_evict_inode() 2018-09-20 09:28:49 -05:00
kernfs mm: zero-seek shrinkers 2018-10-26 16:26:33 -07:00
lockd lockd: fix access beyond unterminated strings in prints 2018-10-29 16:58:04 -04:00
minix minix_lookup: use d_splice_alias() 2018-05-22 14:27:52 -04:00
nfs NFS client bugfixes for Linux 4.20 2018-11-04 08:20:09 -08:00
nfs_common
nfsd vfs: rework data cloning infrastructure 2018-11-02 09:33:08 -07:00
nilfs2 nilfs2: Convert to XArray 2018-10-21 10:46:42 -04:00
nls
notify fsnotify: Fix busy inodes during unmount 2018-10-25 15:49:19 +02:00
ntfs ntfs: don't open-code ERR_CAST 2018-10-12 22:46:50 -04:00
ocfs2 ocfs2: fix clusters leak in ocfs2_defrag_extent() 2018-11-03 10:09:37 -07:00
omfs omfs_lookup(): report IO errors, use d_splice_alias() 2018-05-22 14:27:58 -04:00
openpromfs openpromfs: switch to d_splice_alias() 2018-05-22 14:27:57 -04:00
orangefs Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-11-01 19:58:52 -07:00
overlayfs vfs: rework data cloning infrastructure 2018-11-02 09:33:08 -07:00
proc New gcc plugin: stackleak 2018-11-01 11:46:27 -07:00
pstore mm: remove CONFIG_HAVE_MEMBLOCK 2018-10-31 08:54:15 -07:00
qnx4 qnx4_lookup: use d_splice_alias() 2018-05-22 14:27:52 -04:00
qnx6 qnx6_lookup: switch to d_splice_alias() 2018-05-22 14:27:54 -04:00
quota fs/quota: Fix spectre gadget in do_quotactl 2018-08-22 18:17:48 +02:00
ramfs
reiserfs reiserfs: remove workaround code for GCC 3.x 2018-10-31 08:54:14 -07:00
romfs romfs_lookup: switch to d_splice_alias() 2018-05-22 14:27:55 -04:00
squashfs Squashfs: Compute expected length from inode size rather than block length 2018-08-02 09:34:02 -07:00
sysfs Driver core patches for 4.19-rc1 2018-08-18 11:44:53 -07:00
sysv fs/sysv/inode.c: use ktime_get_real_seconds() for superblock stamp 2018-08-22 10:52:51 -07:00
tracefs tracefs: Annotate tracefs_ops with __ro_after_init 2018-07-31 11:32:44 -04:00
ubifs ubifs: Remove unneeded semicolon 2018-10-23 13:49:02 +02:00
udf udf: Drop pack pragma from udf_sb.h 2018-09-07 10:32:23 +02:00
ufs fs/ufs: use ktime_get_real_seconds for sb and cg timestamps 2018-08-17 16:20:27 -07:00
xfs xfs: delalloc -> unwritten COW fork allocation can go wrong 2018-11-21 10:10:53 -08:00
aio.c y2038: globally rename compat_time to old_time32 2018-08-27 14:48:48 +02:00
anon_inodes.c anon_inode_getfile(): switch to alloc_file_pseudo() 2018-07-12 10:04:27 -04:00
attr.c fs: Fix attr.c kernel-doc 2018-07-03 16:44:45 -04:00
bad_inode.c get rid of 'opened' argument of ->atomic_open() - part 3 2018-07-12 10:04:20 -04:00
binfmt_aout.c exec: introduce finalize_exec() before start_thread() 2018-04-11 10:28:37 -07:00
binfmt_elf_fdpic.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
binfmt_elf.c signal: Distinguish between kernel_siginfo and siginfo 2018-10-03 16:47:43 +02:00
binfmt_em86.c
binfmt_flat.c exec: introduce finalize_exec() before start_thread() 2018-04-11 10:28:37 -07:00
binfmt_misc.c turn filp_clone_open() into inline wrapper for dentry_open() 2018-07-10 23:29:03 -04:00
binfmt_script.c
block_dev.c iov_iter: Use accessor function 2018-10-24 00:40:44 +01:00
buffer.c for-linus-20181102 2018-11-02 11:25:48 -07:00
char_dev.c
compat_binfmt_elf.c y2038: globally rename compat_time to old_time32 2018-08-27 14:48:48 +02:00
compat_ioctl.c media updates for v4.20-rc1 2018-10-29 14:29:58 -07:00
compat.c ncpfs: remove compat functionality 2018-06-05 19:23:26 +02:00
coredump.c signal: Distinguish between kernel_siginfo and siginfo 2018-10-03 16:47:43 +02:00
d_path.c
dax.c dax: Convert page fault handlers to XArray 2018-10-21 10:46:44 -04:00
dcache.c mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
dcookies.c
direct-io.c iov_iter: Use accessor function 2018-10-24 00:40:44 +01:00
drop_caches.c
eventfd.c Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
eventpoll.c fs/eventpoll.c: simplify ep_is_linked() callers 2018-08-22 10:52:49 -07:00
exec.c vfs: require i_size <= SIZE_MAX in kernel_read_file() 2018-10-10 12:56:14 -04:00
fcntl.c signal: Distinguish between kernel_siginfo and siginfo 2018-10-03 16:47:43 +02:00
fhandle.c
file_table.c overlayfs update for 4.19 2018-08-21 18:19:09 -07:00
file.c fs: add ksys_close() wrapper; remove in-kernel calls to sys_close() 2018-04-02 20:16:00 +02:00
filesystems.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
fs_pin.c
fs_struct.c
fs-writeback.c fs: Convert writeback to XArray 2018-10-21 10:46:42 -04:00
inode.c mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
internal.h overlayfs update for 4.19 2018-08-21 18:19:09 -07:00
ioctl.c vfs: rework data cloning infrastructure 2018-11-02 09:33:08 -07:00
iomap.c iomap: readpages doesn't zero page tail beyond EOF 2018-11-21 10:10:54 -08:00
Kconfig autofs: remove left-over autofs4 stubs 2018-06-11 08:22:34 -07:00
Kconfig.binfmt kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
libfs.c
locks.c overlayfs update for 4.19 2018-08-21 18:19:09 -07:00
Makefile autofs: remove left-over autofs4 stubs 2018-06-11 08:22:34 -07:00
mbcache.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
mount.h
mpage.c mpage: mpage_readpages() should submit IO as read-ahead 2018-08-17 16:20:29 -07:00
namei.c namei: allow restricted O_CREAT of FIFOs and regular files 2018-08-23 18:48:43 -07:00
namespace.c mm: remove include/linux/bootmem.h 2018-10-31 08:54:16 -07:00
no-block.c
nsfs.c
open.c overlayfs update for 4.19 2018-08-21 18:19:09 -07:00
pipe.c Merge branch 'work.open3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-08-13 19:58:36 -07:00
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c vfs: vfs_dedupe_file_range() doesn't return EOPNOTSUPP 2018-11-21 10:10:54 -08:00
readdir.c fs: add ksys_getdents64() helper; remove in-kernel calls to sys_getdents64() 2018-04-02 20:16:02 +02:00
select.c y2038: globally rename compat_time to old_time32 2018-08-27 14:48:48 +02:00
seq_file.c fs/seq_file.c: simplify seq_file iteration code and interface 2018-08-17 16:20:28 -07:00
signalfd.c signal: Distinguish between kernel_siginfo and siginfo 2018-10-03 16:47:43 +02:00
splice.c iov_iter: Separate type from direction and use accessor functions 2018-10-24 00:41:07 +01:00
stack.c
stat.c y2038: Remove newstat family from default syscall set 2018-08-29 15:42:20 +02:00
statfs.c kernel: add kcompat_sys_{f,}statfs64() 2018-07-12 14:49:48 +01:00
super.c fsnotify: add super block object type 2018-09-03 15:14:01 +02:00
sync.c Changes for this release: 2018-04-04 12:44:02 -07:00
timerfd.c y2038: globally rename compat_time to old_time32 2018-08-27 14:48:48 +02:00
userfaultfd.c userfaultfd: disable irqs when taking the waitqueue lock 2018-10-26 16:25:18 -07:00
utimes.c y2038: utimes: Rework #ifdef guards for compat syscalls 2018-08-29 15:42:23 +02:00
xattr.c sysfs: Do not return POSIX ACL xattrs via listxattr 2018-09-18 07:30:48 -04:00