linux/fs
Oleg Nesterov d80e731eca epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree()
This patch is intentionally incomplete to simplify the review.
It ignores ep_unregister_pollwait() which plays with the same wqh.
See the next change.

epoll assumes that the EPOLL_CTL_ADD'ed file controls everything
f_op->poll() needs. In particular it assumes that the wait queue
can't go away until eventpoll_release(). This is not true in case
of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand
which is not connected to the file.

This patch adds the special event, POLLFREE, currently only for
epoll. It expects that init_poll_funcptr()'ed hook should do the
necessary cleanup. Perhaps it should be defined as EPOLLFREE in
eventpoll.

__cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if
->signalfd_wqh is not empty, we add the new signalfd_cleanup()
helper.

ep_poll_callback(POLLFREE) simply does list_del_init(task_list).
This make this poll entry inconsistent, but we don't care. If you
share epoll fd which contains our sigfd with another process you
should blame yourself. signalfd is "really special". I simply do
not know how we can define the "right" semantics if it used with
epoll.

The main problem is, epoll calls signalfd_poll() once to establish
the connection with the wait queue, after that signalfd_poll(NULL)
returns the different/inconsistent results depending on who does
EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd
has nothing to do with the file, it works with the current thread.

In short: this patch is the hack which tries to fix the symptoms.
It also assumes that nobody can take tasklist_lock under epoll
locks, this seems to be true.

Note:

	- we do not have wake_up_all_poll() but wake_up_poll()
	  is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE.

	- signalfd_cleanup() uses POLLHUP along with POLLFREE,
	  we need a couple of simple changes in eventpoll.c to
	  make sure it can't be "lost".

Reported-by: Maxime Bizon <mbizon@freebox.fr>
Cc: <stable@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-24 11:42:50 -08:00
..
9p Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs 2012-01-10 15:09:01 -08:00
adfs vfs: switch ->show_options() to struct dentry * 2012-01-06 23:19:54 -05:00
affs affs: propagate umode_t 2012-01-03 22:55:04 -05:00
afs switch ->create() to umode_t 2012-01-03 22:54:53 -05:00
autofs4 autofs4 - fix lockdep splat in autofs 2012-02-13 20:45:37 -05:00
befs vfs: fix the stupidity with i_dentry in inode destructors 2012-01-03 22:52:40 -05:00
bfs switch ->create() to umode_t 2012-01-03 22:54:53 -05:00
btrfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs 2012-02-24 09:02:53 -08:00
cachefiles fs: move code out of buffer.c 2012-01-03 22:54:07 -05:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2012-02-02 15:47:33 -08:00
cifs cifs: don't return error from standard_receive3 after marking response malformed 2012-02-07 22:25:31 -06:00
coda coda: switch coda_cnode_make() to sane API as well, clean coda_lookup() 2012-01-10 11:13:16 -05:00
configfs configfs: convert to umode_t 2012-01-03 22:54:57 -05:00
cramfs Merge branches 'vfsmount-guts', 'umode_t' and 'partitions' into Z 2012-01-06 23:15:54 -05:00
debugfs kernel-doc: fix new warnings in debugfs 2012-01-24 10:47:41 -08:00
devpts devpts: fix double-free on mount failure 2012-01-08 20:19:03 -05:00
dlm Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm 2012-01-10 14:55:55 -08:00
ecryptfs ecryptfs: remove the second argument of k[un]map_atomic() 2012-02-16 16:06:27 -06:00
efs vfs: fix the stupidity with i_dentry in inode destructors 2012-01-03 22:52:40 -05:00
exofs Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd 2012-01-09 12:51:01 -08:00
exportfs
ext2 ext2: protect inode changes in the SETVERSION and SETFLAGS ioctls 2012-01-11 13:39:02 +01:00
ext3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2012-01-09 12:51:21 -08:00
ext4 Merge branch 'for_linus' into for_linus_merged 2012-01-10 11:54:07 -05:00
fat Merge branch 'usb-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb 2012-01-09 12:09:47 -08:00
freevxfs fs: propagate umode_t, misc bits 2012-01-03 22:55:10 -05:00
fscache FS-Cache: Fix __fscache_uncache_all_inode_pages()'s outer loop 2011-07-21 10:59:16 -07:00
fuse Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse 2012-01-12 12:39:21 -08:00
gfs2 GFS2: Fix nlink setting on inode creation 2012-01-11 12:35:05 +00:00
hfs vfs: switch ->show_options() to struct dentry * 2012-01-06 23:19:54 -05:00
hfsplus hfsplus: creation of hidden dir on mount can fail 2012-01-10 17:48:52 -05:00
hostfs vfs: switch ->show_options() to struct dentry * 2012-01-06 23:19:54 -05:00
hpfs switch ->mknod() to umode_t 2012-01-03 22:54:54 -05:00
hppfs vfs: for usbfs, etc. internal vfsmounts ->mnt_sb->s_root == ->mnt_root 2012-01-03 22:52:41 -05:00
hugetlbfs mm: compaction: introduce sync-light migration for use by compaction 2012-01-12 20:13:09 -08:00
isofs isofs: inode leak on mount failure 2012-01-09 10:48:11 -05:00
jbd jbd: Issue cache flush after checkpointing 2012-01-11 13:36:57 +01:00
jbd2 Merge branch 'for_linus' into for_linus_merged 2012-01-10 11:54:07 -05:00
jffs2 jffs2: do not initialize variable unnecessarily 2012-01-11 09:53:51 +00:00
jfs Merge branch 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm 2012-01-08 13:10:57 -08:00
lockd module_param: make bool parameters really bool (drivers & misc) 2012-01-13 09:32:20 +10:30
logfs mtd: fix merge conflict resolution breakage 2012-02-01 11:10:24 -08:00
minix Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-01-08 12:19:57 -08:00
ncpfs vfs: switch ->show_options() to struct dentry * 2012-01-06 23:19:54 -05:00
nfs NFSv4: fix server_scope memory leak 2012-02-17 17:34:03 -05:00
nfs_common
nfsd Merge branch 'for-3.3' of git://linux-nfs.org/~bfields/linux 2012-01-14 12:26:41 -08:00
nilfs2 nilfs2: avoid overflowing segment numbers in nilfs_ioctl_clean_segments() 2012-02-08 19:03:51 -08:00
nls NLS: raname "maxlen" to "maxout" in UTF conversion routines 2011-11-26 19:58:47 -08:00
notify fsnotify: don't BUG in fsnotify_destroy_mark() 2012-01-14 18:01:42 -08:00
ntfs module_param: avoid bool abuse, add bint for special cases. 2012-01-13 09:32:17 +10:30
ocfs2 ocfs2: deal with wraparounds of i_nlink in ocfs2_rename() 2012-02-13 20:45:39 -05:00
omfs omfs: propagate umode_t 2012-01-03 22:55:01 -05:00
openpromfs vfs: fix the stupidity with i_dentry in inode destructors 2012-01-03 22:52:40 -05:00
proc Fix race in process_vm_rw_core 2012-02-02 12:55:17 -08:00
pstore pstore: gracefully handle NULL pstore_info functions 2011-11-18 13:49:00 -08:00
qnx4 qnx4: don't leak ->BitMap on late failure exits 2012-01-19 13:54:36 -05:00
quota quota: Fix deadlock with suspend and quotas 2012-02-13 20:45:39 -05:00
ramfs pohmelfs: propagate umode_t 2012-01-03 22:55:07 -05:00
reiserfs reiserfs: don't lock root inode searching 2012-01-10 16:30:54 -08:00
romfs MTD pull for 3.3 2012-01-10 13:45:22 -08:00
squashfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next 2012-01-13 10:34:57 -08:00
sysfs sysfs: Complain bitterly about attempts to remove files from nonexistent directories. 2012-01-24 12:12:32 -08:00
sysv vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb 2012-01-06 23:16:53 -05:00
ubifs UBIFS: fix non-debug configuration build 2012-01-15 13:46:02 +02:00
udf Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2012-01-09 12:51:21 -08:00
ufs vfs: switch ->show_options() to struct dentry * 2012-01-06 23:19:54 -05:00
xfs xfs: make inode quota check more general 2012-02-21 10:12:43 -06:00
aio.c Unused iocbs in a batch should not be accounted as active. 2012-01-13 20:39:44 -08:00
anon_inodes.c vfs: dont chain pipe/anon/socket on superblock s_inodes list 2011-07-26 12:57:09 -04:00
attr.c switch is_sxid() to umode_t 2012-01-03 22:55:11 -05:00
bad_inode.c switch ->mknod() to umode_t 2012-01-03 22:54:54 -05:00
binfmt_aout.c
binfmt_elf_fdpic.c consolidate BINPRM_FLAGS_ENFORCE_NONDUMP handling 2011-07-20 01:43:10 -04:00
binfmt_elf.c fs: binfmt_elf: create Kconfig variable for PIE randomization 2012-01-10 16:30:51 -08:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb 2012-01-06 23:16:53 -05:00
binfmt_script.c
binfmt_som.c
bio-integrity.c fs: add export.h to files using EXPORT_SYMBOL/THIS_MODULE macros 2011-10-31 19:30:31 -04:00
bio.c bio: don't overflow in bio_get_nr_vecs() 2012-02-08 22:07:18 +01:00
block_dev.c vfs: cache request_queue in struct block_device 2012-01-12 20:13:12 -08:00
buffer.c fs: move code out of buffer.c 2012-01-03 22:54:07 -05:00
char_dev.c char_dev.c: fix up some whitespace errors 2011-12-13 11:18:17 -08:00
compat_binfmt_elf.c
compat_ioctl.c Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media 2012-01-15 12:49:56 -08:00
compat.c vfs: fix compat_sys_stat() handling of overflows in st_nlink 2012-02-13 20:45:39 -05:00
dcache.c vfs: fix panic in __d_lookup() with high dentry hashtable counts 2012-02-13 20:45:38 -05:00
dcookies.c
direct-io.c Restore direct_io / truncate locking API 2012-02-23 15:56:21 -08:00
drop_caches.c
eventfd.c
eventpoll.c epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree() 2012-02-24 11:42:50 -08:00
exec.c exec: fix use-after-free bug in setup_new_exec() 2012-02-06 15:15:20 -08:00
fcntl.c
fhandle.c vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb 2012-01-06 23:16:53 -05:00
fifo.c
file_table.c vfs: prevent remount read-only if pending removes 2012-01-06 23:20:13 -05:00
file.c
filesystems.c vfs: convert fs_supers to hlist 2012-01-03 22:52:39 -05:00
fs_struct.c
fs-writeback.c writeback: fix NULL bdi->dev in trace writeback_single_inode 2012-02-01 16:53:40 +08:00
generic_acl.c switch posix_acl_equiv_mode() to umode_t * 2011-08-01 02:10:06 -04:00
inode.c vfs: fix panic in __d_lookup() with high dentry hashtable counts 2012-02-13 20:45:38 -05:00
internal.h vfs: protect remounting superblock read-only 2012-01-06 23:20:12 -05:00
ioctl.c vfs: fix up ENOIOCTLCMD error handling 2012-01-05 15:40:12 -08:00
ioprio.c block: strip out locking optimization in put_io_context() 2012-02-07 07:51:30 +01:00
Kconfig Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd 2012-01-09 12:51:01 -08:00
Kconfig.binfmt fs: binfmt_elf: create Kconfig variable for PIE randomization 2012-01-10 16:30:51 -08:00
libfs.c fs: move code out of buffer.c 2012-01-03 22:54:07 -05:00
locks.c vfs: fix handling of lock allocation failure in lease-break case 2011-12-26 10:25:26 -08:00
Makefile Merge branches 'vfsmount-guts', 'umode_t' and 'partitions' into Z 2012-01-06 23:15:54 -05:00
mbcache.c
mount.h vfs: keep list of mounts for each superblock 2012-01-06 23:20:12 -05:00
mpage.c fs: remove unneeded plug in mpage_readpages() 2012-01-12 09:19:54 +01:00
namei.c vfs: fix d_inode_lookup() dentry ref leak 2012-02-13 20:45:37 -05:00
namespace.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2012-01-08 13:21:22 -08:00
no-block.c
open.c switch security_path_chmod() to struct path * 2012-01-06 23:16:53 -05:00
pipe.c pipe: fail cleanly when root tries F_SETPIPE_SZ with big size 2012-01-12 20:13:04 -08:00
pnode.c vfs: switch pnode.h macros to struct mount * 2012-01-03 22:57:11 -05:00
pnode.h vfs: switch pnode.h macros to struct mount * 2012-01-03 22:57:11 -05:00
posix_acl.c vfs: pass all mask flags check_acl and posix_acl_permission 2011-10-28 14:58:54 +02:00
proc_namespace.c vfs: switch ->show_options() to struct dentry * 2012-01-06 23:19:54 -05:00
read_write.c Cross Memory Attach 2011-10-31 17:30:44 -07:00
read_write.h
readdir.c
select.c sys_poll: fix incorrect type for 'timeout' parameter 2012-02-21 17:24:20 -08:00
seq_file.c constify seq_file stuff 2012-01-03 22:52:40 -05:00
signalfd.c epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree() 2012-02-24 11:42:50 -08:00
splice.c fs: move code out of buffer.c 2012-01-03 22:54:07 -05:00
stack.c filesystems: add set_nlink() 2011-11-02 12:53:43 +01:00
stat.c readlinkat: ensure we return ENOENT for the empty pathname for normal lookups 2011-11-02 12:53:42 +01:00
statfs.c vfs: new helper - vfs_ustat() 2012-01-03 22:53:07 -05:00
super.c vfs: Provide function to get superblock and wait for it to thaw 2012-02-13 20:45:38 -05:00
sync.c fs: move code out of buffer.c 2012-01-03 22:54:07 -05:00
timerfd.c
utimes.c
xattr_acl.c
xattr.c vfs: mnt_drop_write_file() 2012-01-03 22:52:40 -05:00