linux/fs/cachefiles
Kiran Kumar Modukuri 934140ab02 cachefiles: Fix refcounting bug in backing-file read monitoring
cachefiles_read_waiter() has the right to access a 'monitor' object by
virtue of being called under the waitqueue lock for one of the pages in its
purview.  However, it has no ref on that monitor object or on the
associated operation.

What it is allowed to do is to move the monitor object to the operation's
to_do list, but once it drops the work_lock, it's actually no longer
permitted to access that object.  However, it is trying to enqueue the
retrieval operation for processing - but it can only do this via a pointer
in the monitor object, something it shouldn't be doing.

If it doesn't enqueue the operation, the operation may not get processed.
If the order is flipped so that the enqueue is first, then it's possible
for the work processor to look at the to_do list before the monitor is
enqueued upon it.

Fix this by getting a ref on the operation so that we can trust that it
will still be there once we've added the monitor to the to_do list and
dropped the work_lock.  The op can then be enqueued after the lock is
dropped.

The bug can manifest in one of a couple of ways.  The first manifestation
looks like:

 FS-Cache:
 FS-Cache: Assertion failed
 FS-Cache: 6 == 5 is false
 ------------[ cut here ]------------
 kernel BUG at fs/fscache/operation.c:494!
 RIP: 0010:fscache_put_operation+0x1e3/0x1f0
 ...
 fscache_op_work_func+0x26/0x50
 process_one_work+0x131/0x290
 worker_thread+0x45/0x360
 kthread+0xf8/0x130
 ? create_worker+0x190/0x190
 ? kthread_cancel_work_sync+0x10/0x10
 ret_from_fork+0x1f/0x30

This is due to the operation being in the DEAD state (6) rather than
INITIALISED, COMPLETE or CANCELLED (5) because it's already passed through
fscache_put_operation().

The bug can also manifest like the following:

 kernel BUG at fs/fscache/operation.c:69!
 ...
    [exception RIP: fscache_enqueue_operation+246]
 ...
 #7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6
 #8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48
 #9 [ffff883fff083c48] __wake_up_common at ffffffff810af028

I'm not entirely certain as to which is line 69 in Lei's kernel, so I'm not
entirely clear which assertion failed.

Fixes: 9ae326a690 ("CacheFiles: A cache that backs onto a mounted filesystem")
Reported-by: Lei Xue <carmark.dlut@gmail.com>
Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Reported-by: Anthony DeRobertis <aderobertis@metrics.net>
Reported-by: NeilBrown <neilb@suse.com>
Reported-by: Daniel Axtens <dja@axtens.net>
Reported-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
2018-07-25 14:49:00 +01:00
..
bind.c VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb) 2017-07-17 08:45:34 +01:00
daemon.c vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
interface.c fscache: Pass object size in rather than calling back for it 2018-04-06 14:05:14 +01:00
internal.h fscache: Add tracepoints 2018-04-04 13:41:27 +01:00
Kconfig CacheFiles: A cache that backs onto a mounted filesystem 2009-04-03 16:42:41 +01:00
key.c CacheFiles: Downgrade the requirements passed to the allocator 2012-12-20 21:58:25 +00:00
main.c fscache: Add tracepoints 2018-04-04 13:41:27 +01:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
namei.c cachefiles: vfs_mkdir() might succeed leaving dentry negative unhashed 2018-05-21 14:30:10 -04:00
proc.c proc: introduce proc_create_seq{,_data} 2018-05-16 07:23:35 +02:00
rdwr.c cachefiles: Fix refcounting bug in backing-file read monitoring 2018-07-25 14:49:00 +01:00
security.c VFS: fs/cachefiles: d_backing_inode() annotations 2015-04-15 15:06:59 -04:00
xattr.c fscache: Pass object size in rather than calling back for it 2018-04-06 14:05:14 +01:00