linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-22 20:23:57 +08:00

Author	SHA1	Message	Date
Fred Isaman	9f52c2525e	pnfs: do not need to clear NFS_LAYOUT_BULK_RECALL flag We do not need to clear the NFS_LAYOUT_BULK_RECALL, as setting it guarantees that NFS_LAYOUT_DESTROYED will be set once any outstanding io is finished. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:40 -05:00
Fred Isaman	3851172244	pnfs: avoid incorrect use of layout stateid The code could violate the following from RFC5661, section 12.5.3: "Once a client has no more layouts on a file, the layout stateid is no longer valid and MUST NOT be used." This can occur when a layout already has an lseg, starts another non-overlapping LAYOUTGET, and a CB_LAYOUTRECALL for the existing lseg is processed before we hit pnfs_layout_process(). Solve by setting, each time the client has no more lsegs for a file, a flag which blocks further use of the layout and triggers its removal. This also fixes a second bug which occurs in the same instance as above. If we actually use pnfs_layout_process, we add the new lseg to the layout, but the layout has been removed from the nfs_client list by the intervening CB_LAYOUTRECALL and will not be added back. Thus the newly acquired lseg will not be properly returned in the event of a subsequent CB_LAYOUTRECALL. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:39 -05:00
Chuck Lever	53d4737580	NFS: NFSROOT should default to "proto=udp" There have been a number of recent reports that NFSROOT is no longer working with default mount options, but fails only with certain NICs. Brian Downing <bdowning@lavos.net> bisected to commit `56463e50` "NFS: Use super.c for NFSROOT mount option parsing". Among other things, this commit changes the default mount options for NFSROOT to use TCP instead of UDP as the underlying transport. TCP seems less able to deal with NICs that are slow to initialize. The system logs that have accompanied reports of problems all show that NFSROOT attempts to establish a TCP connection before the NIC is fully initialized, and thus the TCP connection attempt fails. When a TCP connection attempt fails during a mount operation, the NFS stack needs to fail the operation. Usually user space knows how and when to retry it. The network layer does not report a distinct error code for this particular failure mode. Thus, there isn't a clean way for the RPC client to see that it needs to retry in this case, but not in others. Because NFSROOT is used in some environments where it is not possible to update the kernel command line to specify "udp", the proper thing to do is change NFSROOT to use UDP by default, as it did before commit `56463e50`. To make it easier to see how to change default mount options for NFSROOT and to distinguish default settings from mandatory settings, I've adjusted a couple of areas to document the specifics. root_nfs_cat() is also modified to deal with commas properly when concatenating strings containing mount option lists. This keeps root_nfs_cat() call sites simpler, now that we may be concatenating multiple mount option strings. 
Tested-by: Brian Downing <bdowning@lavos.net> Tested-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@kernel.org> # 2.6.37 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:07 -05:00
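The comma handling described for root_nfs_cat() in the commit above can be illustrated with a minimal sketch. The helper below, `opts_cat`, is a hypothetical stand-in written for this example, not the kernel's root_nfs_cat(); its name, signature, and return values are assumptions.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the kernel's root_nfs_cat()): append the
 * mount option string `src` to `dest`, inserting a comma only when
 * `dest` already holds options. Returns 0 on success, or -1 if the
 * result (including the terminating NUL) would not fit in `destlen`.
 */
static int opts_cat(char *dest, const char *src, size_t destlen)
{
    size_t len = strlen(dest);
    size_t srclen = strlen(src);

    if (srclen == 0)
        return 0;                       /* nothing to add, no stray comma */

    /* existing text + optional comma + new text + NUL must fit */
    if (len + (len ? 1 : 0) + srclen + 1 > destlen)
        return -1;

    if (len)
        dest[len++] = ',';
    strcpy(dest + len, src);
    return 0;
}
```

Keeping the separator logic inside the helper means call sites can pass default options and user-supplied options in sequence without checking whether the buffer is still empty.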
Huang Weiyi	57df216bd8	nfs4: remove duplicated #include Remove duplicated #include('s) in fs/nfs/nfs4proc.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:37 -05:00
Trond Myklebust	f9feab1e18	NFSv4: nfs4_state_mark_reclaim_nograce() should be static There are no more external users of nfs4_state_mark_reclaim_nograce() or nfs4_state_mark_reclaim_reboot(), so mark them as static. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:36 -05:00
Trond Myklebust	ecac799a5e	NFSv4: Fix the setlk error handler Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:36 -05:00
Trond Myklebust	b4410c2f7f	NFSv4.1: Fix the handling of the SEQUENCE status bits We want SEQUENCE status bits to be handled by the state manager in order to avoid threading issues. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:35 -05:00
Trond Myklebust	0400a6b0cb	NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses nfs4_schedule_state_recovery() should only be used when we need to force the state manager to check the lease. If we just want to start the state manager in order to handle a state recovery situation, we should be using nfs4_schedule_state_manager(). This patch fixes the abuses of nfs4_schedule_state_recovery() by replacing its use with a set of helper functions that do the right thing. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:22 -05:00
Andy Adamson	c34c32ea97	NFSv4.1 reclaim complete must wait for completion Signed-off-by: Andy Adamson <andros@netapp.com> [Trond: fix whitespace errors] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:05:01 -05:00
Andy Adamson	114f64b5f2	NFSv4: remove duplicate clientid in struct nfs_client Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:05:00 -05:00
Ricardo Labiaga	7d6d63d642	NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY Fix bug where we currently retry the EXCHANGEID call again, even though we already have a valid clientid. Instead, delay and retry the CREATE_SESSION call. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:59 -05:00
Frank Filz	3fa0b4e201	(try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid The problem was use of an int32, which when converted to a uint64 is sign extended resulting in a fileid that doesn't fit in 32 bits even though the intent of the function is to fit the fileid into 32 bits. Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> [Trond: Added an include for compat.h] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:58 -05:00
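The sign-extension problem described in the commit above can be reproduced in a few lines. This is a sketch under assumptions: `fold_signed` and `fold_unsigned` are hypothetical names, and folding by XOR of the two 32-bit halves is chosen here for illustration rather than taken from the commit text.

```c
#include <stdint.h>

/*
 * Buggy variant: the folded value is held in a signed 32-bit integer.
 * Converting it back to 64 bits sign-extends, so a result with bit 31
 * set no longer fits in 32 bits. (The narrowing conversion to int32_t
 * is implementation-defined; common compilers keep the low 32 bits.)
 */
static uint64_t fold_signed(uint64_t fileid)
{
    int32_t ino = (int32_t)(fileid ^ (fileid >> 32));
    return (uint64_t)ino;
}

/* Fixed variant: an unsigned 32-bit intermediate zero-extends. */
static uint64_t fold_unsigned(uint64_t fileid)
{
    uint32_t ino = (uint32_t)(fileid ^ (fileid >> 32));
    return ino;
}
```

For a fileid of 0x80000000, the signed variant yields 0xFFFFFFFF80000000 on typical two's-complement targets, while the unsigned variant yields 0x80000000.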
Jovi Zhang	43b7c3f051	nfs: fix compilation warning This commit fixes the following compilation warning: linux-2.6/fs/nfs/nfs4proc.c:3265: warning: comparison of distinct pointer types lacks a cast Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:56 -05:00
Stanislav Fomichev	b9f810570d	nfs: add kmalloc return value check in decode_and_add_ds add kmalloc return value check in decode_and_add_ds Signed-off-by: Stanislav Fomichev <kernel@fomichev.me> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:55 -05:00
Jeff Layton	d2224e7afb	nfs: close NFSv4 COMMIT vs. CLOSE race I've been adding in more artificial delays in the NFSv4 commit and close codepaths to uncover races. The kernel I'm testing has the patch to close the race in __rpc_wait_for_completion_task that's in Trond's cthon2011 branch. The reproducer I've been using does this in a loop: mkdir("DIR"); fd = open("DIR/FILE", O_WRONLY|O_CREAT|O_EXCL, 0644); write(fd, "abcdefg", 7); close(fd); unlink("DIR/FILE"); rmdir("DIR"); The above reproducer shouldn't result in any silly-renaming. However, when I add a "msleep(100)" just after the nfs_commit_clear_lock call in nfs_commit_release, I can almost always force one to occur. If I can force it to occur with that, then it can happen without that delay given the right timing. nfs_commit_inode waits for the NFS_INO_COMMIT bit to clear when called with FLUSH_SYNC set. nfs_commit_rpcsetup on the other hand does not wait for the task to complete before putting its reference to it, so the last reference gets put in rpc_release task and gets queued to a workqueue. In this situation, the last open context reference may be put by the COMMIT release instead of the close() syscall. The close() syscall returns too quickly and the unlink runs while the d_count is still high since the COMMIT release hasn't put its dentry reference yet. Fix this by having nfs_commit_rpcsetup wait for the RPC call to complete before putting the task reference when FLUSH_SYNC is set. With this, the last reference is put by the process that's initiating the FLUSH_SYNC commit and the race is closed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:53 -05:00
Trond Myklebust	bf294b41ce	SUNRPC: Close a race in __rpc_wait_for_completion_task() Although they run as rpciod background tasks, under normal operation (i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck() and nfs4_do_close() want to be fully synchronous. This means that when we exit, we want all references to the rpc_task to be gone, and we want any dentry references etc. held by that task to be released. For this reason these functions call __rpc_wait_for_completion_task(), followed by rpc_put_task() in the expectation that the latter will be releasing the last reference to the rpc_task, and thus ensuring that the callback_ops->rpc_release() has been called synchronously. This patch fixes a race which exists due to the fact that rpciod calls rpc_complete_task() (in order to wake up the callers of __rpc_wait_for_completion_task()) and then subsequently calls rpc_put_task() without ensuring that these two steps are done atomically. In order to avoid adding new spin locks, the patch uses the existing waitqueue spin lock to order the rpc_task reference count releases between the waiting process and rpciod. The common case where nobody is waiting for completion is optimised for by checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task reference count is 1: in those cases we drop trying to grab the spin lock, and immediately free up the rpc_task. Those few processes that need to put the rpc_task from inside an asynchronous context and that do not care about ordering are given a new helper: rpc_put_task_async(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:52 -05:00
Linus Torvalds	fb62c00a6d	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: no .snap inside of snapped namespace libceph: fix msgr standby handling libceph: fix msgr keepalive flag libceph: fix msgr backoff libceph: retry after authorization failure libceph: fix handling of short returns from get_user_pages ceph: do not clear I_COMPLETE from d_release ceph: do not set I_COMPLETE Revert "ceph: keep reference to parent inode on ceph_dentry"	2011-03-05 10:43:22 -08:00
Neil Horman	e9e3d724e2	nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3) The "bad_page()" page allocator sanity check was reported recently (call chain as follows): bad_page+0x69/0x91 free_hot_cold_page+0x81/0x144 skb_release_data+0x5f/0x98 __kfree_skb+0x11/0x1a tcp_ack+0x6a3/0x1868 tcp_rcv_established+0x7a6/0x8b9 tcp_v4_do_rcv+0x2a/0x2fa tcp_v4_rcv+0x9a2/0x9f6 do_timer+0x2df/0x52c ip_local_deliver+0x19d/0x263 ip_rcv+0x539/0x57c netif_receive_skb+0x470/0x49f :virtio_net:virtnet_poll+0x46b/0x5c5 net_rx_action+0xac/0x1b3 __do_softirq+0x89/0x133 call_softirq+0x1c/0x28 do_softirq+0x2c/0x7d do_IRQ+0xec/0xf5 default_idle+0x0/0x50 ret_from_intr+0x0/0xa default_idle+0x29/0x50 cpu_idle+0x95/0xb8 start_kernel+0x220/0x225 _sinittext+0x22f/0x236 It occurs because an skb with a fraglist was freed from the tcp retransmit queue when it was acked, but a page on that fraglist had PG_Slab set (indicating it was allocated from the Slab allocator), which means the free path above can't safely free it via put_page. We tracked this back to an nfsv4 setacl operation, in which the nfs code attempted to convert the passed in buffer to an array of pages in __nfs4_proc_set_acl, which gets used by the skb->frags list in xs_sendpages. __nfs4_proc_set_acl just converts each page in the buffer to a page struct via virt_to_page, but the vfs allocates the buffer via kmalloc, meaning the PG_slab bit is set. We can't create a buffer with kmalloc and free it later in the tcp ack path with put_page, so we need to either: 1) ensure that when we create the list of pages, no page struct has PG_Slab set or 2) not use a page list to send this data Given that these buffers can be multiple pages and arbitrarily sized, I think (1) is the right way to go. I've written the below patch to allocate a page from the buddy allocator directly and copy the data over to it. 
This ensures that we have a put_page free-able page for every entry that winds up on an skb frag list, so it can be safely freed when the frame is acked. We do a put page on each entry after the rpc_call_sync call so as to drop our own reference count to the page, leaving only the ref count taken by tcp_sendpages. This way the data will be properly freed when the ack comes in. Successfully tested by myself to solve the above oops. Note, as this is the result of a setacl operation that exceeded a page of data, I think this amounts to a local DoS triggerable by an unprivileged user, so I'm CCing security on this as well. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Trond Myklebust <Trond.Myklebust@netapp.com> CC: security@kernel.org CC: Jeff Layton <jlayton@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-04 17:28:52 -08:00
Sage Weil	455cec0abf	ceph: no .snap inside of snapped namespace Otherwise you can do things like # mkdir .snap/foo # cd .snap/foo/.snap # ls <badness> Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-04 12:25:09 -08:00
Linus Torvalds	8336026942	Merge branch 'i_nlink' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'i_nlink' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: hfs: fix rename() over non-empty directory udf: fix i_nlink limit fix reiserfs mkdir() breakage exofs: i_nlink races in rename() nilfs2: i_nlink races in rename() minix: i_nlink races in rename() ufs: i_nlink races in rename() sysv: i_nlink races in rename()	2011-03-03 15:37:59 -08:00
Linus Torvalds	4c7fd114c6	Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: zero proper structure size for geometry calls	2011-03-03 12:44:22 -08:00
Linus Torvalds	c640e13f8e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix regression that i-flag is not set on changeless checkpoints	2011-03-03 12:42:48 -08:00
Sage Weil	16a8b70a5a	ceph: do not clear I_COMPLETE from d_release First, this was racy anyway: d_release isn't called until well after the dentry is unhashed. Second, this runs afoul of the recent dcache change that clears d_parent prior to calling d_release (`949854d0`), causing a NULL pointer dereference. Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-03 10:09:52 -08:00
Sage Weil	b545cc1505	ceph: do not set I_COMPLETE Do not set the I_COMPLETE flag on directories until we resolve races with dcache pruning. Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-03 10:09:51 -08:00
Sage Weil	9bde178d05	Revert "ceph: keep reference to parent inode on ceph_dentry" This reverts commit `97d79b403e`. This fails to account for d_parent changes due to rename or disconnected dentries due to submounts or NFS reexports. Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-03 10:09:50 -08:00
Al Viro	69102e9b4b	hfs: fix rename() over non-empty directory merge hfs_unlink() and hfs_rmdir(), while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-03 01:28:40 -05:00
Al Viro	810c1b2e48	udf: fix i_nlink limit (256 << sizeof(x)) - 1 is not the maximal possible value of x... In reality, the maximal allowed value for UDF FileLinkCount is 65535. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-03 01:28:40 -05:00
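The arithmetic behind the udf commit above is easy to check. In this sketch the function names `wrong_limit` and `right_limit` are hypothetical, and the corrected formula is one common way to compute the maximum of an unsigned type; it is not quoted from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* The expression criticised in the commit: shifts by the BYTE count. */
static unsigned long wrong_limit(size_t size)
{
    return (256UL << size) - 1;
}

/*
 * Maximum value of an unsigned field that is `size` bytes wide.
 * Only valid while 8 * size is smaller than the width of unsigned long.
 */
static unsigned long right_limit(size_t size)
{
    return (1UL << (8 * size)) - 1;
}
```

For a 16-bit field such as the UDF FileLinkCount, `wrong_limit(sizeof(uint16_t))` gives 1023, while `right_limit(sizeof(uint16_t))` gives 65535, the limit stated in the commit message.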
Al Viro	99890a3be1	fix reiserfs mkdir() breakage if directory has so many subdirectories that its link count is set to 1 (i.e. "can't tell accurately") and reiserfs_new_inode() fails, we shouldn't decrement the parent's link count in cleanup path; that's what DEC_DIR_INODE_NLINK() is for. As it is, we end up with parent suddenly getting zero i_nlink, with very unpleasant effects. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-03 01:28:40 -05:00
Al Viro	babfe56046	exofs: i_nlink races in rename() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-03 01:28:17 -05:00
Al Viro	30eb43d314	nilfs2: i_nlink races in rename() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-03 01:28:17 -05:00
Al Viro	6f88049caf	minix: i_nlink races in rename() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-03 01:28:16 -05:00
Al Viro	37750cdda3	ufs: i_nlink races in rename() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-03 01:28:16 -05:00
Al Viro	4787d45fa7	sysv: i_nlink races in rename() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-03 01:28:16 -05:00
Linus Torvalds	f7d222ea2a	Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6 * 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6: of/promtree: allow DT device matching by fixing 'name' brokenness (v5) x86: OLPC: have prom_early_alloc BUG rather than return NULL of/flattree: Drop an uninteresting message to pr_debug level of: Add missing of_address.h to xilinx ehci driver	2011-03-02 20:01:57 -08:00
Paul Bolle	8aaccf7fa2	of/flattree: Drop an uninteresting message to pr_debug level This message looks like an error (which it isn't) when booting with a flattened device tree. Remove the message from normal kernel builds. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2011-03-02 13:45:18 -07:00
Josh Hunt	e8a80c6f76	ext2: Fix link count corruption under heavy link+rename load vfs_rename_other() does not lock renamed inode with i_mutex. Thus changing i_nlink in a non-atomic manner (which happens in ext2_rename()) can corrupt it as reported and analyzed by Josh. In fact, there is no good reason to mess with i_nlink of the moved file. We did it presumably to simulate linking into the new directory and unlinking from an old one. But the practical effect of this is disputable because fsck can possibly treat file as being properly linked into both directories without writing any error which is confusing. So we just stop increment-decrement games with i_nlink which also fixes the corruption. CC: stable@kernel.org CC: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Jan Kara <jack@suse.cz>	2011-03-02 11:03:52 +01:00
Alex Elder	af24ee9ea8	xfs: zero proper structure size for geometry calls Commit `493f3358cb` added this call to xfs_fs_geometry() in order to avoid passing kernel stack data back to user space: + memset(geo, 0, sizeof(*geo)); Unfortunately, one of the callers of that function passes the address of a smaller data type, cast to fit the type that xfs_fs_geometry() requires. As a result, this can happen: Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: f87aca93 Pid: 262, comm: xfs_fsr Not tainted 2.6.38-rc6-493f3358cb2+ #1 Call Trace: [<c12991ac>] ? panic+0x50/0x150 [<c102ed71>] ? __stack_chk_fail+0x10/0x18 [<f87aca93>] ? xfs_ioc_fsgeometry_v1+0x56/0x5d [xfs] Fix this by fixing that one caller to pass the right type and then copy out the subset it is interested in. Note: This patch is an alternative to one originally proposed by Eric Sandeen. Reported-by: Jeffrey Hundstad <jeffrey.hundstad@mnsu.edu> Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Tested-by: Jeffrey Hundstad <jeffrey.hundstad@mnsu.edu>	2011-03-01 21:21:13 -06:00
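The failure mode and the shape of the fix in the xfs commit above can be sketched in user-space C. The structure layouts and the names `geom_v1`, `geom`, `fill_geometry`, and `get_geometry_v1` are hypothetical; they only mirror the idea of filling a full-size structure and copying out the subset the caller wants.

```c
#include <string.h>

/* Hypothetical layouts: the v1 structure carries fewer fields. */
struct geom_v1 {
    unsigned int blocksize;
    unsigned int agcount;
};

struct geom {
    unsigned int blocksize;
    unsigned int agcount;
    unsigned int logsunit;
    unsigned int version;
};

/* Zeroes and fills the FULL structure, so it must be given one. */
static void fill_geometry(struct geom *geo)
{
    memset(geo, 0, sizeof(*geo));
    geo->blocksize = 4096;
    geo->agcount = 4;
    geo->logsunit = 1;
    geo->version = 4;
}

/*
 * Safe v1 path: fill a correctly sized local, then copy the subset.
 * Casting a struct geom_v1 pointer and passing it to fill_geometry()
 * would instead let memset() write past the smaller object.
 */
static void get_geometry_v1(struct geom_v1 *out)
{
    struct geom full;

    fill_geometry(&full);
    out->blocksize = full.blocksize;
    out->agcount = full.agcount;
}
```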
Ryusuke Konishi	72746ac643	nilfs2: fix regression that i-flag is not set on changeless checkpoints According to the report from Jiro SEKIBA titled "regression in 2.6.37?" (Message-Id: <8739n8vs1f.wl%jir@sekiba.com>), on 2.6.37 and later kernels, lscp command no longer displays "i" flag on checkpoints that snapshot operations or garbage collection created. This is a regression of nilfs2 checkpointing function, and it's critical since it broke behavior of a part of nilfs2 applications. For instance, snapshot manager of TimeBrowse gets to create meaningless snapshots continuously; snapshot creation triggers another checkpoint, but applications cannot distinguish whether the new checkpoint contains meaningful changes or not without the i-flag. This patch fixes the regression and brings that application behavior back to normal. Reported-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Jiro SEKIBA <jir@unicus.jp> Cc: stable <stable@kernel.org> [2.6.37]	2011-03-02 09:55:18 +09:00
Randy Dunlap	e6eb5ce1b2	fs/block_dev.c: fix new kernel-doc warning Fix new kernel-doc warning in fs/block_dev.c: Warning(fs/block_dev.c:937): No description found for parameter 'kill_dirty' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-28 18:08:31 -08:00
Linus Torvalds	58da94f013	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix truncate after open fuse: fix hang of single threaded fuseblk filesystem	2011-02-28 17:53:04 -08:00
Linus Torvalds	158a5d61f7	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: Check heartbeat mode for kernel stacks only Ocfs2/refcounttree: Fix a bug for refcounttree to writeback clusters in a right number. ocfs2: Fix estimate of necessary credits for mkdir	2011-02-28 17:52:47 -08:00
Jan Kara	7137c6bd45	aio: fix race between io_destroy() and io_submit() A race can occur when io_submit() races with io_destroy(): CPU1 CPU2 io_submit() do_io_submit() ... ctx = lookup_ioctx(ctx_id); io_destroy() Now do_io_submit() holds the last reference to ctx. ... queue new AIO put_ioctx(ctx) - frees ctx with active AIOs We solve this issue by checking whether ctx is being destroyed in AIO submission path after adding new AIO to ctx. Then we are guaranteed that either io_destroy() waits for new AIO or we see that ctx is being destroyed and bail out. Cc: Nick Piggin <npiggin@kernel.dk> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-25 15:07:37 -08:00
Nick Piggin	3bd9a5d734	aio: fix rcu ioctx lookup aio-dio-invalidate-failure GPFs in aio_put_req from io_submit. lookup_ioctx doesn't implement the rcu lookup pattern properly. rcu_read_lock does not prevent refcount going to zero, so we might take a refcount on a zero count ioctx. Fix the bug by atomically testing for zero refcount before incrementing. [jack@suse.cz: added comment into the code] Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-25 15:07:37 -08:00
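The fix described in the commit above, "atomically testing for zero refcount before incrementing", follows a general pattern that can be sketched with C11 atomics. This is a user-space illustration under assumptions: `ref_get_unless_zero` is a hypothetical name and the code is not the kernel's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the object is still live. Returns true and
 * increments the count when it was non-zero; returns false and leaves
 * the count untouched when it had already dropped to zero.
 */
static bool ref_get_unless_zero(atomic_int *count)
{
    int old = atomic_load(count);

    while (old != 0) {
        /* On failure the CAS reloads `old` with the current value. */
        if (atomic_compare_exchange_weak(count, &old, old + 1))
            return true;
    }
    return false;
}
```

A lookup that finds an object under a read-side lock would call this helper and treat a `false` result as "not found", since the object is already on its way to being freed.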
Timo Warns	294f6cf486	ldm: corrupted partition table can cause kernel oops The kernel automatically evaluates partition tables of storage devices. The code for evaluating LDM partitions (in fs/partitions/ldm.c) contains a bug that causes a kernel oops on certain corrupted LDM partitions. A kernel subsystem seems to crash, because, after the oops, the kernel no longer recognizes newly connected storage devices. The patch changes ldm_parse_vmdb() to validate the value of vblk_size. Signed-off-by: Timo Warns <warns@pre-sense.de> Cc: Eugene Teo <eugeneteo@kernel.sg> Acked-by: Richard Russon <ldm@flatcap.org> Cc: Harvey Harrison <harvey.harrison@gmail.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-25 15:07:36 -08:00
Davide Libenzi	22bacca48a	epoll: prevent creating circular epoll structures In several places, an epoll fd can call another file's ->f_op->poll() method with ep->mtx held. This is in general unsafe, because that other file could itself be an epoll fd that contains the original epoll fd. The code defends against this possibility in its own ->poll() method using ep_call_nested, but there are several other unsafe calls to ->poll elsewhere that can be made to deadlock. For example, the following simple program causes the call in ep_insert to recursively call the original fd's ->poll, leading to deadlock: #include <unistd.h> #include <sys/epoll.h> int main(void) { int e1, e2, p[2]; struct epoll_event evt = { .events = EPOLLIN }; e1 = epoll_create(1); e2 = epoll_create(2); pipe(p); epoll_ctl(e2, EPOLL_CTL_ADD, e1, &evt); epoll_ctl(e1, EPOLL_CTL_ADD, p[0], &evt); write(p[1], p, sizeof p); epoll_ctl(e1, EPOLL_CTL_ADD, e2, &evt); return 0; } On insertion, check whether the inserted file is itself a struct epoll, and if so, do a recursive walk to detect whether inserting this file would create a loop of epoll structures, which could lead to deadlock. [nelhage@ksplice.com: Use epmutex to serialize concurrent inserts] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Nelson Elhage <nelhage@ksplice.com> Reported-by: Nelson Elhage <nelhage@ksplice.com> Tested-by: Nelson Elhage <nelhage@ksplice.com> Cc: <stable@kernel.org> [2.6.34+, possibly earlier] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-25 15:07:36 -08:00
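The "recursive walk" on insertion described in the epoll commit above can be modelled as a reachability check on a small graph. This is a simplified user-space sketch, not the kernel code: `struct ep_node`, `ep_reaches`, `ep_add`, the fixed-size array, and the -1 error value are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_WATCHED 8

/* Hypothetical model of an epoll instance that watches other instances. */
struct ep_node {
    struct ep_node *watched[MAX_WATCHED];
    size_t nwatched;
};

/*
 * Returns true if `target` can be reached by following watch links
 * starting at `from`. Terminates because ep_add() never lets a cycle
 * form, so the graph is always acyclic.
 */
static bool ep_reaches(const struct ep_node *from, const struct ep_node *target)
{
    if (from == target)
        return true;
    for (size_t i = 0; i < from->nwatched; i++)
        if (ep_reaches(from->watched[i], target))
            return true;
    return false;
}

/*
 * Make `parent` watch `child`. Refuses (returns -1) when `parent` is
 * already reachable from `child`, since the new link would close a loop.
 */
static int ep_add(struct ep_node *parent, struct ep_node *child)
{
    if (ep_reaches(child, parent))
        return -1;
    if (parent->nwatched == MAX_WATCHED)
        return -1;
    parent->watched[parent->nwatched++] = child;
    return 0;
}
```

Replaying the program from the commit message against this model, adding e1 to e2 succeeds and the later attempt to add e2 to e1 is rejected.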
Linus Torvalds	4660ba63f1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix fiemap bugs with delalloc Btrfs: set FMODE_EXCL in btrfs_device->mode Btrfs: make btrfs_rm_device() fail gracefully Btrfs: Avoid accessing unmapped kernel address Btrfs: Fix BTRFS_IOC_SUBVOL_SETFLAGS ioctl Btrfs: allow balance to explicitly allocate chunks as it relocates Btrfs: put ENOSPC debugging under a mount option	2011-02-25 14:03:39 -08:00
Linus Torvalds	638691a7a4	Merge branch 'for-linus' of git://neil.brown.name/md * 'for-linus' of git://neil.brown.name/md: md: Fix - again - partition detection when array becomes active Fix over-zealous flush_disk when changing device size. md: avoid spinlock problem in blk_throtl_exit md: correctly handle probe of an 'mdp' device. md: don't set_capacity before array is active. md: Fix raid1->raid0 takeover	2011-02-25 11:13:26 -08:00
Anton Blanchard	f129ccc923	afs: Fix oops in afs_unlink_writeback I'm seeing the following oops when testing afs: Unable to handle kernel paging request for data at address 0x00000008 ... NIP [c0000000003393b0] .afs_unlink_writeback+0x38/0xc0 LR [c00000000033987c] .afs_put_writeback+0x98/0xec Call Trace: [c00000000345f600] [c00000000033987c] .afs_put_writeback+0x98/0xec [c00000000345f690] [c00000000033ae80] .afs_write_begin+0x6a4/0x75c [c00000000345f790] [c00000000012b77c] .generic_file_buffered_write+0x148/0x320 [c00000000345f8d0] [c00000000012e1b8] .__generic_file_aio_write+0x37c/0x3e4 [c00000000345f9d0] [c00000000012e2a8] .generic_file_aio_write+0x88/0xfc [c00000000345fa90] [c0000000003390a8] .afs_file_write+0x10c/0x178 [c00000000345fb40] [c000000000188788] .do_sync_write+0xc4/0x128 [c00000000345fcc0] [c000000000189658] .vfs_write+0xe8/0x1d8 [c00000000345fd70] [c000000000189884] .SyS_write+0x68/0xb0 [c00000000345fe30] [c000000000008564] syscall_exit+0x0/0x40 afs_write_begin hits an error and calls afs_unlink_writeback. In there we do list_del_init on an uninitialised list. The patch below initialises ->link when creating the afs_writeback struct. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-25 11:12:37 -08:00
Miklos Szeredi	8d56addd70	fuse: fix truncate after open Commit `e1181ee6` "vfs: pass struct file to do_truncate on O_TRUNC opens" broke the behavior of open(O_TRUNC|O_RDONLY) in fuse. Fuse assumed that when called from open, a truncate() will be done, not an ftruncate(). Fix by restoring the old behavior, based on the ATTR_OPEN flag. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2011-02-25 14:44:58 +01:00
Miklos Szeredi	5a18ec176c	fuse: fix hang of single threaded fuseblk filesystem Single threaded NTFS-3G could get stuck if a delayed RELEASE reply triggered a DESTROY request via path_put(). Fix this by a) making RELEASE requests synchronous, whenever possible, on fuseblk filesystems b) if not possible (triggered by an asynchronous read/write) then do the path_put() in a separate thread with schedule_work(). Reported-by: Oliver Neukum <oneukum@suse.de> Cc: stable@kernel.org Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2011-02-25 14:44:58 +01:00
Tejun Heo	e7407d1619	block: bd_link_disk_holder() should hold on to holder_dir The new implementation of bd_link_disk_holder() added by `49731baa41` (block: restore multiple bd_link_disk_holder() support) didn't get an extra reference for the holder_dir kobject of the slave bdev; however, bdev kills holder_dir on removal, not release, so if the slave bdev is removed while there are holder links, the holder_dir will be destroyed while there still are holder links, which leads to oops later when bd_unlink_disk_holder() tries to remove those links. Make bd_link_disk_holder() grab an extra reference for the slave's holder_dir and put it in bd_unlink_disk_holder(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: "Hawrylewicz Czarnowski, Przemyslaw" <przemyslaw.hawrylewicz.czarnowski@intel.com> Tested-by: "Hawrylewicz Czarnowski, Przemyslaw" <przemyslaw.hawrylewicz.czarnowski@intel.com> Cc: Neil Brown <neilb@suse.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-24 08:55:55 -08:00
J. R. Okajima	bf9faa2aa3	Unlock vfsmount_lock in do_umount By the commit `b3e19d9` 2011-01-07 fs: scale mntget/mntput vfsmount_lock was introduced around testing mnt_count. Fix the mis-typed 'unlock' Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-02-24 02:10:57 -05:00
NeilBrown	93b270f76e	Fix over-zealous flush_disk when changing device size. There are two cases when we call flush_disk. In one, the device has disappeared (check_disk_change) so any data we hold becomes irrelevant. In the other, the device has changed size (check_disk_size_change) so data we hold may be irrelevant. In both cases it makes sense to discard any 'clean' buffers, so they will be read back from the device if needed. In the former case it makes sense to discard 'dirty' buffers as there will never be anywhere safe to write the data. In the second case it does not make sense to discard dirty buffers as that will lead to file system corruption when you simply enlarge the containing devices. flush_disk calls __invalidate_device. __invalidate_device calls both invalidate_inodes and invalidate_bdev. invalidate_inodes does discard I_DIRTY inodes and this does lead to fs corruption. invalidate_bdev does not discard dirty pages, but I don't really care about that at present. So this patch adds a flag to __invalidate_device (calling it __invalidate_device2) to indicate whether dirty buffers should be killed, and this is passed to invalidate_inodes which can choose to skip dirty inodes. flush_disk then passes true from check_disk_change and false from check_disk_size_change. dm avoids tripping over this problem by calling i_size_write directly rather than using check_disk_size_change. md does use check_disk_size_change and so is affected. This regression was introduced by commit `608aeef17a` which causes check_disk_size_change to call flush_disk, so it is suitable for any kernel since 2.6.27. Cc: stable@kernel.org Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: Andrew Patterson <andrew.patterson@hp.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NeilBrown <neilb@suse.de>	2011-02-24 17:25:47 +11:00
Miklos Szeredi	2aa15890f3	mm: prevent concurrent unmap_mapping_range() on the same inode Michael Leun reported that running parallel opens on a fuse filesystem can trigger a "kernel BUG at mm/truncate.c:475" Gurudas Pai reported the same bug on NFS. The reason is, unmap_mapping_range() is not prepared for more than one concurrent invocation per inode. For example: thread1: going through a big range, stops in the middle of a vma and stores the restart address in vm_truncate_count. thread2: comes in with a small (e.g. single page) unmap request on the same vma, somewhere before restart_address, finds that the vma was already unmapped up to the restart address and happily returns without doing anything. Another scenario would be two big unmap requests, both having to restart the unmapping and each one setting vm_truncate_count to its own value. This could go on forever without any of them being able to finish. Truncate and hole punching already serialize with i_mutex. Other callers of unmap_mapping_range() do not, and it's difficult to get i_mutex protection for all callers. In particular ->d_revalidate(), which calls invalidate_inode_pages2_range() in fuse, may be called with or without i_mutex. This patch adds a new mutex to 'struct address_space' to prevent running multiple concurrent unmap_mapping_range() on the same mapping. [ We'll hopefully get rid of all this with the upcoming mm preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex lockbreak" patch in particular. But that is for 2.6.39 ] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reported-by: Michael Leun <lkml20101129@newton.leun.net> Reported-by: Gurudas Pai <gurudas.pai@oracle.com> Tested-by: Gurudas Pai <gurudas.pai@oracle.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-23 19:52:52 -08:00
Chris Mason	ec29ed5b40	Btrfs: fix fiemap bugs with delalloc The Btrfs fiemap code wasn't properly returning delalloc extents, so applications that trust fiemap to decide if there are holes in the file see holes instead of delalloc. This reworks the btrfs fiemap code, adding a get_extent helper that searches for delalloc ranges and also adding a helper for extent_fiemap that skips past holes in the file. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-23 16:23:20 -05:00
Lukas Czerner	be715140b5	xfs: check if device support discard in xfs_ioc_trim() Right now we are relying on the fact that when we attempt to actually do the discard, blkdev_issue_discard() returns -EOPNOTSUPP and the user is informed that the device does not support discard. However, in the case where we do not hit any suitable free extent to trim in FITRIM code, it will finish without any error. This is very confusing, because it seems that FITRIM was successful even though the device does not actually support discard. Solution: Check for the discard support before attempting to search for free extents. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-02-22 15:08:44 -06:00
Dan Rosenberg	3a3675b7f2	xfs: prevent leaking uninitialized stack memory in FSGEOMETRY_V1 The FSGEOMETRY_V1 ioctl (and its compat equivalent) calls out to xfs_fs_geometry() with a version number of 3. This code path does not fill in the logsunit member of the passed xfs_fsop_geom_t, leading to the leaking of four bytes of uninitialized stack data to potentially unprivileged callers. v2 switches to memset() to avoid future issues if structure members change, on suggestion of Dave Chinner. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Reviewed-by: Eugene Teo <eugeneteo@kernel.org> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-02-22 15:06:47 -06:00
Linus Torvalds	3b71710f08	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6: eCryptfs: Copy up lower inode attrs in getattr ecryptfs: read on a directory should return EISDIR if not supported eCryptfs: Handle NULL nameidata pointers eCryptfs: Revert "dont call lookup_one_len to avoid NULL nameidata"	2011-02-21 17:25:00 -08:00
Randy Dunlap	361821854b	Docbook: add fs/eventfd.c and fix typos in it Add fs/eventfd.c to filesystems docbook. Make typo corrections in fs/eventfd.c. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-21 15:07:04 -08:00
Linus Torvalds	8bd89ca220	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: keep reference to parent inode on ceph_dentry ceph: queue cap_snaps once per realm libceph: fix socket write error handling libceph: fix socket read error handling	2011-02-21 15:01:38 -08:00
Linus Torvalds	b4f5c46245	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] update cifs version cifs: Fix regression in LANMAN (LM) auth code cifs: fix handling of scopeid in cifs_convert_address	2011-02-21 14:57:39 -08:00
Steve French	eed9e8307e	[CIFS] update cifs version Update version to 1.71 so we can more easily spot modules with the last two fixes Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-21 22:31:47 +00:00
Shirish Pargaonkar	5e640927a5	cifs: Fix regression in LANMAN (LM) auth code LANMAN response length was changed to 16 bytes instead of 24 bytes. Revert it back to 24 bytes. Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> CC: stable@kernel.org Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-21 21:53:30 +00:00
Tyler Hicks	55f9cf6bba	eCryptfs: Copy up lower inode attrs in getattr The lower filesystem may do some type of inode revalidation during a getattr call. eCryptfs should take advantage of that by copying the lower inode attributes to the eCryptfs inode after a call to vfs_getattr() on the lower inode. I originally wrote this fix while working on eCryptfs on nfsv3 support, but discovered it also fixed an eCryptfs on ext4 nanosecond timestamp bug that was reported. https://bugs.launchpad.net/bugs/613873 Cc: <stable@kernel.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-02-21 14:46:36 -06:00
Andy Whitcroft	323ef68faf	ecryptfs: read on a directory should return EISDIR if not supported read() calls against a file descriptor connected to a directory are incorrectly returning EINVAL rather than EISDIR: [EISDIR] [XSI] [Option Start] The fildes argument refers to a directory and the implementation does not allow the directory to be read using read() or pread(). The readdir() function should be used instead. [Option End] This occurs because we do not have a .read operation defined for ecryptfs directories. Connect this up to generic_read_dir(). BugLink: http://bugs.launchpad.net/bugs/719691 Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-02-21 14:46:36 -06:00
Tyler Hicks	70b8902199	eCryptfs: Handle NULL nameidata pointers Allow for NULL nameidata pointers in eCryptfs create, lookup, and d_revalidate functions. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-02-21 14:45:57 -06:00
Mark Fasheh	52c303c56c	ocfs2: Check heartbeat mode for kernel stacks only Commit `2c442719e9` added some checks for proper heartbeat mode when the o2cb stack is running. Unfortunately, it didn't take into account that a userspace stack could be running. Fix this by only doing the check if o2cb is in use. This patch allows userspace stacks to mount the fs again. Cc: stable@kernel.org Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-02-20 02:36:28 -08:00
Tristan Ye	acf3bb007e	Ocfs2/refcounttree: Fix a bug for refcounttree to writeback clusters in a right number. Current refcounttree codes actually didn't writeback the new pages out in write-back mode, due to a bug of always passing a ZERO number of clusters to 'ocfs2_cow_sync_writeback', the patch tries to pass a proper one in. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Cc: stable@kernel.org Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-02-20 02:36:12 -08:00
Jan Kara	705773a665	ocfs2: Fix estimate of necessary credits for mkdir In the rare case that INLINE_DATA, INDEX_DIR, QUOTA, XATTR features are disabled and both the allocation of the directory inode and the allocation of the first directory block need to relink allocation group, there need not be enough credits reserved in a transaction. Fix the estimate. CC: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <jlbec@evilplan.org>	2011-02-20 02:33:32 -08:00
Yehuda Sadeh	97d79b403e	ceph: keep reference to parent inode on ceph_dentry When creating a new dentry we now hold a reference to the parent inode in the ceph_dentry. This is required due to the new RCU changes from `949854d0`, which set dentry->d_parent to NULL in d_kill before calling the ->release() callback. If/when that behavior is changed, we can revert this hack. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-02-19 19:59:14 -08:00
Linus Torvalds	bc3adfc670	Merge branch 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq * 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: make sure MAYDAY_INITIAL_TIMEOUT is at least 2 jiffies long workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable' workqueue: wake up a worker when a rescuer is leaving a gcwq	2011-02-18 12:36:06 -08:00
Tyler Hicks	8787c7a3e0	eCryptfs: Revert "dont call lookup_one_len to avoid NULL nameidata" This reverts commit `21edad3220` and commit `93c3fe40c2`, which fixed a regression by the former. Al Viro pointed out bypassed dcache lookups in ecryptfs_new_lower_dentry(), misuse of vfs_path_lookup() in ecryptfs_lookup_one_lower() and a dislike of passing nameidata to the lower filesystem. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>	2011-02-17 20:30:29 -06:00
Timo Warns	fa7ea87a05	fs/partitions: Validate map_count in Mac partition tables Validate number of blocks in map and remove redundant variable. Signed-off-by: Timo Warns <warns@pre-sense.de> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-17 17:50:51 -08:00
Linus Torvalds	ee71508702	Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux * 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: nfsd: correctly handle return value from nfsd_map_name_to_*	2011-02-16 21:53:41 -08:00
Jeff Layton	9616125611	cifs: fix handling of scopeid in cifs_convert_address The code finds the '%' sign in an ipv6 address and copies that to a buffer allocated on the stack. It then ignores that buffer, and passes 'pct' to simple_strtoul(), which doesn't work right because we're comparing 'endp' against a completely different string. Fix it by passing the correct pointer. While we're at it, this is a good candidate for conversion to strict_strtoul as well. Cc: stable@kernel.org Cc: David Howells <dhowells@redhat.com> Reported-by: Björn JACKE <bj@sernet.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2011-02-17 05:35:33 +00:00
Chuck Ebbert	e51900f7d3	block: revert block_dev read-only check This reverts commit `75f1dc0d07` ("block: check bdev_read_only() from blkdev_get()"). That commit added stricter checking to make sure devices that were being used read-only were actually opened in that mode. It turns out that the change breaks a bunch of kernel code that opens block devices. Affected systems include dm, md, and the loop device. Because strict checking for read-only opens of block devices was not done before this, the code that opens the devices was opening them read-write even if they were being used read-only. Auditing all that code will take time, and new userspace packages for dm, mdadm, etc. will also be required. Signed-off-by: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-16 16:48:13 -08:00
NeilBrown	47c85291d3	nfsd: correctly handle return value from nfsd_map_name_to_* These functions return an nfs status, not a host_err. So don't try to convert before returning. This is a regression introduced by 3c726023402a2f3b28f49b9d90ebf9e71151157d; I fixed up two of the callers, but missed these two. Cc: stable@kernel.org Reported-by: Herbert Poetzl <herbert@13thfloor.at> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-02-16 18:31:05 -05:00
Ilya Dryomov	fb01aa85b8	Btrfs: set FMODE_EXCL in btrfs_device->mode This fixes a bug introduced in `d4d77629`, where the device added online (and therefore initialized via btrfs_init_new_device()) would be left with the positive bdev->bd_holders after unmount. Since `d4d77629` we no longer OR FMODE_EXCL explicitly on blkdev_put(), set it in btrfs_device->mode. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-16 16:34:00 -05:00
Ilya Dryomov	9b3517e913	Btrfs: make btrfs_rm_device() fail gracefully If shrinking done as part of the online device removal fails add that device back to the allocation list and increment the rw_devices counter. This fixes two bugs: 1) we could have a perfectly good device out of alloc list for no good reason; 2) in the btrfs consisting of two devices, failure in btrfs_rm_device() could lead to a situation where it was impossible to remove any of the devices because of the "unable to remove the only writeable device" error. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-16 15:37:59 -05:00
Li Zefan	ca9b688c1c	Btrfs: Avoid accessing unmapped kernel address When decompressing a chunk of data, we'll copy the data out to a working buffer if the data is stored in more than one page, otherwise we'll use the mapped page directly to avoid memory copy. In the latter case, we'll end up accessing the kernel address after we've unmapped the page in a corner case. Reported-by: Juan Francisco Cantero Hurtado <iam@juanfra.info> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-16 15:37:58 -05:00
Li Zefan	b4dc2b8c69	Btrfs: Fix BTRFS_IOC_SUBVOL_SETFLAGS ioctl - Check user-specified flags correctly - Check the inode ownership - Search root item in root tree but not fs tree Reported-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-16 15:37:58 -05:00
Chris Mason	c87f08ca44	Btrfs: allow balance to explicitly allocate chunks as it relocates Btrfs device shrinking and balancing ends up reallocating all the blocks in order to allow COW to move them to new destinations. It is somewhat awkward in terms of ENOSPC because most of the enospc code is built around the idea that some operation on a reference counted tree triggers allocations in the non-reference counted trees. This commit changes the balancing code to deal with enospc by trying to allocate a new chunk. If that allocation succeeds, we go ahead and retry whatever failed due to enospc. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-16 15:28:47 -05:00
Chris Mason	91435650c2	Btrfs: put ENOSPC debugging under a mount option ENOSPC in btrfs is getting to the point where the extra debugging isn't required. I've put it under mount -o enospc_debug just in case someone is having difficult problems. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-16 15:28:36 -05:00
Linus Torvalds	3abb17e82f	vfs: fix BUG_ON() in fs/namei.c:1461 When Al moved the nameidata_dentry_drop_rcu_maybe() call into the do_follow_link function in commit `844a391799` ("nothing in do_follow_link() is going to see RCU"), he mistakenly left the BUG_ON(inode != path->dentry->d_inode); behind. Which would otherwise be ok, but that BUG_ON() really needs to be _after_ dropping RCU, since the dentry isn't necessarily stable otherwise. So complete the code movement in that commit, and move the BUG_ON() into do_follow_link() too. This means that we need to pass in 'inode' as an argument (just for this one use), but that's a small thing. And eventually we may be confident enough in our path lookup that we can just remove the BUG_ON() and the unnecessary inode argument. Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-16 08:56:55 -08:00
Tejun Heo	58a69cb47e	workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable' There are two spellings in use for 'freeze' + 'able' - 'freezable' and 'freezeable'. The former is the more prominent one. The latter is mostly used by workqueue and in a few other odd places. Unify the spelling to 'freezable'. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Dmitry Torokhov <dtor@mail.ru> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Alex Dubov <oakad@yahoo.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Steven Whitehouse <swhiteho@redhat.com>	2011-02-16 17:48:59 +01:00
Linus Torvalds	f60c153d50	Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux * 'for-2.6.38' of git://linux-nfs.org/~bfields/linux: nfsd: break lease on unlink due to rename nfsd4: acquire only one lease per file nfsd4: modify fi_delegations under recall_lock nfsd4: remove unused deleg dprintk's. nfsd4: split lease setting into separate function nfsd4: fix leak on allocation error nfsd4: add helper function for lease setup nfsd4: split up nfsd_break_deleg_cb NFSD: memory corruption due to writing beyond the stat array NFSD: use nfserr for status after decode_cb_op_status nfsd: don't leak dentry count on mnt_want_write failure	2011-02-15 12:06:38 -08:00
Linus Torvalds	055d219441	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: get rid of nameidata_dentry_drop_rcu() calling nameidata_drop_rcu() drop out of RCU in return_reval split do_revalidate() into RCU and non-RCU cases in do_lookup() split RCU and non-RCU cases of need_revalidate nothing in do_follow_link() is going to see RCU	2011-02-15 08:06:36 -08:00
Linus Torvalds	007a14af26	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: check return value of alloc_extent_map() Btrfs - Fix memory leak in btrfs_init_new_device() btrfs: prevent heap corruption in btrfs_ioctl_space_info() Btrfs: Fix balance panic Btrfs: don't release pages when we can't clear the uptodate bits Btrfs: fix page->private races	2011-02-15 08:00:35 -08:00
Martin Schwidefsky	261cd298a8	s390: remove task_show_regs task_show_regs used to be a debugging aid in the early bringup days of Linux on s390. /proc/<pid>/status is a world readable file, it is not a good idea to show the registers of a process. The only correct fix is to remove task_show_regs. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-02-15 07:34:16 -08:00
Al Viro	4e924a4f53	get rid of nameidata_dentry_drop_rcu() calling nameidata_drop_rcu() can't happen anymore and didn't work right anyway Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-02-15 02:26:54 -05:00
Al Viro	f60aef7ec6	drop out of RCU in return_reval ... thus killing the need to handle drop-from-RCU in d_revalidate() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-02-15 02:26:54 -05:00
Al Viro	f5e1c1c1af	split do_revalidate() into RCU and non-RCU cases fixing oopsen in lookup_one_len() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-02-15 02:26:54 -05:00
Al Viro	24643087e7	in do_lookup() split RCU and non-RCU cases of need_revalidate and use unlikely() instead of gotos, for fsck sake... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-02-15 02:26:54 -05:00
Al Viro	844a391799	nothing in do_follow_link() is going to see RCU Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-02-15 02:26:53 -05:00
Tsutomu Itoh	c26a920373	Btrfs: check return value of alloc_extent_map() I add the check on the return value of alloc_extent_map() to several places. In addition, alloc_extent_map() returns only the address or NULL. Therefore, check by IS_ERR() is unnecessary. So, I remove IS_ERR() checking. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-14 16:21:37 -05:00
Ilya Dryomov	67100f255d	Btrfs - Fix memory leak in btrfs_init_new_device() Memory allocated by calling kstrdup() should be freed. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-14 16:21:31 -05:00
Dan Rosenberg	51788b1bdd	btrfs: prevent heap corruption in btrfs_ioctl_space_info() Commit `bf5fc093c5` refactored btrfs_ioctl_space_info() and introduced several security issues. space_args.space_slots is an unsigned 64-bit type controlled by a possibly unprivileged caller. The comparison as a signed int type allows providing values that are treated as negative and cause the subsequent allocation size calculation to wrap, or be truncated to 0. By providing a size that's truncated to 0, kmalloc() will return ZERO_SIZE_PTR. It's also possible to provide a value smaller than the slot count. The subsequent loop ignores the allocation size when copying data in, resulting in a heap overflow or write to ZERO_SIZE_PTR. The fix changes the slot count type and comparison typecast to u64, which prevents truncation or signedness errors, and also ensures that we don't copy more data than we've allocated in the subsequent loop. Note that zero-size allocations are no longer possible since there is already an explicit check for space_args.space_slots being 0 and truncation of this value is no longer an issue. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Josef Bacik <josef@redhat.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-14 16:04:23 -05:00
Yan, Zheng	6848ad6461	Btrfs: Fix balance panic Mark the cloned backref_node as checked in clone_backref_node() Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-14 16:00:03 -05:00
Chris Mason	e3f24cc521	Btrfs: don't release pages when we can't clear the uptodate bits Btrfs tracks uptodate state in an rbtree as well as in the page bits. This is supposed to enable us to use block sizes other than the page size, but there are a few parts still missing before that completely works. But, our readpage routine trusts this additional range based tracking of uptodateness, much in the same way the buffer head up to date bits are trusted for the other filesystems. The problem is that sometimes we need to allocate memory in order to split records in the rbtree, even when we are just clearing bits. This can be difficult when our clearing function is called GFP_ATOMIC, which can happen in the releasepage path. So, what happens today looks like this: releasepage called with GFP_ATOMIC btrfs_releasepage calls clear_extent_bit clear_extent_bit fails to allocate ram, leaving the up to date bit set btrfs_releasepage returns success The end result is the page being gone, but btrfs thinking the range is up to date. Later on if someone tries to read that same page, the btrfs readpage code will return immediately thinking the page is already up to date. This commit fixes things to fail the releasepage when we can't clear the extent state bits. It covers both data pages and metadata tree blocks. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-14 13:04:01 -05:00
Chris Mason	eb14ab8ed2	Btrfs: fix page->private races There is a race where btrfs_releasepage can drop the page->private contents just as alloc_extent_buffer is setting up pages for metadata. Because of how the Btrfs page flags work, this results in us skipping the crc on the page during IO. This patch solves the race by waiting until after the extent buffer is inserted into the radix tree before it sets page private. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2011-02-14 13:03:52 -05:00
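
The truncation problem described in the btrfs_ioctl_space_info() entry above (51788b1bdd) can be illustrated with a small userspace sketch. This is not the kernel code: `struct space_slot`, `alloc_size_signed()` and `alloc_size_u64()` are hypothetical names made up for illustration, and the sketch only assumes what the commit message states, namely that a caller-controlled 64-bit slot count was compared as a signed int before sizing an allocation, and that the fix does the comparison as u64.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one per-slot record; not the kernel's struct. */
struct space_slot {
    uint64_t flags;
    uint64_t total_bytes;
    uint64_t used_bytes;
};

/*
 * Problem pattern: the 64-bit request is narrowed to int before the
 * comparison. The narrowing is implementation-defined in C; GCC and
 * Clang keep the low 32 bits, so 0x100000000 becomes 0 and the
 * computed allocation size becomes 0 as well.
 */
size_t alloc_size_signed(uint64_t requested, int available)
{
    int narrowed = (int)requested;
    int n = narrowed < available ? narrowed : available;
    return sizeof(struct space_slot) * (size_t)n;
}

/*
 * Pattern after the fix: compare as unsigned 64-bit values, so the
 * result can never exceed the number of slots actually available.
 */
size_t alloc_size_u64(uint64_t requested, uint64_t available)
{
    uint64_t n = requested < available ? requested : available;
    return sizeof(struct space_slot) * (size_t)n;
}
```

With four slots available, a request of `0x100000000` yields a zero-byte size from the signed variant on common compilers, while the 64-bit variant clamps it to four slots. The commit message additionally notes that the copy loop must respect the allocated size, which this sketch does not model.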


21530 Commits