2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-28 07:04:00 +08:00
Commit Graph

36035 Commits

Author SHA1 Message Date
Miao Xie
21c7e75654 Btrfs: reclaim the reserved metadata space at background
Before applying this patch, the task had to reclaim the metadata space
by itself if the metadata space was not enough. And When the task started
the space reclamation, all the other tasks which wanted to reserve the
metadata space were blocked. At some cases, they would be blocked for
a long time, it made the performance fluctuate wildly.

So we introduce the background metadata space reclamation, when the space
is about to be exhausted, we insert a reclaim work into the workqueue, the
worker of the workqueue helps us to reclaim the reserved space at the
background. By this way, the tasks needn't reclaim the space by themselves at
most cases, and even if the tasks have to reclaim the space or are blocked
for the space reclamation, they will get enough space more quickly.

Here is my test result(Tested by compilebench):
 Memory:	2GB
 CPU:		2Cores * 1CPU
 Partition:	40GB(SSD)

Test command:
 # compilebench -D <mnt> -m

Without this patch:
 intial create total runs 30 avg 54.36 MB/s (user 0.52s sys 2.44s)
 compile total runs 30 avg 123.72 MB/s (user 0.13s sys 1.17s)
 read compiled tree total runs 3 avg 81.15 MB/s (user 0.74s sys 4.89s)
 delete compiled tree total runs 30 avg 5.32 seconds (user 0.35s sys 4.37s)

With this patch:
 intial create total runs 30 avg 59.80 MB/s (user 0.52s sys 2.53s)
 compile total runs 30 avg 151.44 MB/s (user 0.13s sys 1.11s)
 read compiled tree total runs 3 avg 83.25 MB/s (user 0.76s sys 4.91s)
 delete compiled tree total runs 30 avg 5.29 seconds (user 0.34s sys 4.34s)

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:34 -07:00
Miao Xie
32d6b47fe6 Btrfs: output warning instead of error when loading free space cache failed
If we fail to load a free space cache, we can rebuild it from the extent tree,
so it is not a serious error, we should not output a error message that
would make the users uncomfortable. This patch uses warning message instead
of it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:33 -07:00
Qu Wenruo
5a1972bd9f btrfs: Add ctime/mtime update for btrfs device add/remove.
Btrfs will send uevent to udev inform the device change,
but ctime/mtime for the block device inode is not udpated, which cause
libblkid used by btrfs-progs unable to detect device change and use old
cache, causing 'btrfs dev scan; btrfs dev rmove; btrfs dev scan' give an
error message.

Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Cc: Karel Zak <kzak@redhat.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:33 -07:00
David Sterba
61155aa04e btrfs: assert that send is not in progres before root deletion
CC: Miao Xie <miaox@cn.fujitsu.com>
CC: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:32 -07:00
David Sterba
521e0546c9 btrfs: protect snapshots from deleting during send
The patch "Btrfs: fix protection between send and root deletion"
(18f687d538) does not actually prevent to delete the snapshot
and just takes care during background cleaning, but this seems rather
user unfriendly, this patch implements the idea presented in

http://www.spinics.net/lists/linux-btrfs/msg30813.html

- add an internal root_item flag to denote a dead root
- check if the send_in_progress is set and refuse to delete, otherwise
  set the flag and proceed
- check the flag in send similar to the btrfs_root_readonly checks, for
  all involved roots

The root lookup in send via btrfs_read_fs_root_no_name will check if the
root is really dead or not. If it is, ENOENT, aborted send. If it's
alive, it's protected by send_in_progress, send can continue.

CC: Miao Xie <miaox@cn.fujitsu.com>
CC: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:31 -07:00
Daeseok Youn
944a4515b2 btrfs: remove redundant null check in btrfs_dentry_release()
It doesn't need to check NULL for kfree()

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:31 -07:00
Filipe Manana
ef3b9af50b Btrfs: implement inode_operations callback tmpfile
This implements the tmpfile callback of struct inode_operations, introduced
in the linux kernel 3.11, and implemented already by some filesystems. This
callback is invoked by the VFS when the flag O_TMPFILE is passed to the open
system call.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
2014-06-09 17:20:30 -07:00
David Sterba
e4ef90ff61 btrfs: make FS_INFO ioctl available to anyone
This ioctl provides basic info about the filesystem that can be obtained
in other ways (eg. sysfs), there's no reason to restrict it to
CAP_SYSADMIN.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:29 -07:00
David Sterba
7d6213c5a7 btrfs: make DEV_INFO ioctl available to anyone
This ioctl provides basic info about the devices that can be obtained in
other ways (eg. sysfs), there's no reason to restrict it to
CAP_SYSADMIN.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:28 -07:00
David Sterba
df93589a17 btrfs: export more from FS_INFO to sysfs
Similar to the FS_INFO updates, export the basic filesystem info through
sysfs: node size, sector size and clone alignment.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:28 -07:00
David Sterba
80a773fbfc btrfs: retrieve more info from FS_INFO ioctl
Provide the basic information about filesystem through the ioctl:
* b-tree node size (same as leaf size)
* sector size
* expected alignment of CLONE_RANGE and EXTENT_SAME ioctl arguments

Backward compatibility: if the values are 0, kernel does not provide
this information, the applications should ignore them.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:27 -07:00
David Sterba
7d824b6f9c btrfs: balance filter: add limit of processed chunks
This started as debugging helper, to watch the effects of converting
between raid levels on multiple devices, but could be useful standalone.

In my case the usage filter was not finegrained enough and led to
converting too many chunks at once. Another example use is in connection
with drange+devid or vrange filters that allow to work with a specific
chunk or even with a chunk on a given device.

The limit filter applies last, the value of 0 means no limiting.

CC: Ilya Dryomov <idryomov@gmail.com>
CC: Hugo Mills <hugo@carfax.org.uk>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:26 -07:00
Filipe Manana
fc19c5e736 Btrfs: fix leaf corruption caused by ENOSPC while hole punching
While running a stress test with multiple threads writing to the same btrfs
file system, I ended up with a situation where a leaf was corrupted in that
it had 2 file extent item keys that had the same exact key. I was able to
detect this quickly thanks to the following patch which triggers an assertion
as soon as a leaf is marked dirty if there are duplicated keys or out of order
keys:

    Btrfs: check if items are ordered when a leaf is marked dirty
    (https://patchwork.kernel.org/patch/3955431/)

Basically while running the test, I got the following in dmesg:

    [28877.415877] WARNING: CPU: 2 PID: 10706 at fs/btrfs/file.c:553 btrfs_drop_extent_cache+0x435/0x440 [btrfs]()
    (...)
    [28877.415917] Call Trace:
    [28877.415922]  [<ffffffff816f1189>] dump_stack+0x4e/0x68
    [28877.415926]  [<ffffffff8104a32c>] warn_slowpath_common+0x8c/0xc0
    [28877.415929]  [<ffffffff8104a37a>] warn_slowpath_null+0x1a/0x20
    [28877.415944]  [<ffffffffa03775a5>] btrfs_drop_extent_cache+0x435/0x440 [btrfs]
    [28877.415949]  [<ffffffff8118e7be>] ? kmem_cache_alloc+0xfe/0x1c0
    [28877.415962]  [<ffffffffa03777d9>] fill_holes+0x229/0x3e0 [btrfs]
    [28877.415972]  [<ffffffffa0345865>] ? block_rsv_add_bytes+0x55/0x80 [btrfs]
    [28877.415984]  [<ffffffffa03792cb>] btrfs_fallocate+0xb6b/0xc20 [btrfs]
    (...)
    [29854.132560] BTRFS critical (device sdc): corrupt leaf, bad key order: block=955232256,root=1, slot=24
    [29854.132565] BTRFS info (device sdc): leaf 955232256 total ptrs 40 free space 778
    (...)
    [29854.132637] 	item 23 key (3486 108 667648) itemoff 2694 itemsize 53
    [29854.132638] 		extent data disk bytenr 14574411776 nr 286720
    [29854.132639] 		extent data offset 0 nr 286720 ram 286720
    [29854.132640] 	item 24 key (3486 108 954368) itemoff 2641 itemsize 53
    [29854.132641] 		extent data disk bytenr 0 nr 0
    [29854.132643] 		extent data offset 0 nr 0 ram 0
    [29854.132644] 	item 25 key (3486 108 954368) itemoff 2588 itemsize 53
    [29854.132645] 		extent data disk bytenr 8699670528 nr 77824
    [29854.132646] 		extent data offset 0 nr 77824 ram 77824
    [29854.132647] 	item 26 key (3486 108 1146880) itemoff 2535 itemsize 53
    [29854.132648] 		extent data disk bytenr 8699670528 nr 77824
    [29854.132649] 		extent data offset 0 nr 77824 ram 77824
    (...)
    [29854.132707] kernel BUG at fs/btrfs/ctree.h:3901!
    (...)
    [29854.132771] Call Trace:
    [29854.132779]  [<ffffffffa0342b5c>] setup_items_for_insert+0x2dc/0x400 [btrfs]
    [29854.132791]  [<ffffffffa0378537>] __btrfs_drop_extents+0xba7/0xdd0 [btrfs]
    [29854.132794]  [<ffffffff8109c0d6>] ? trace_hardirqs_on_caller+0x16/0x1d0
    [29854.132797]  [<ffffffff8109c29d>] ? trace_hardirqs_on+0xd/0x10
    [29854.132800]  [<ffffffff8118e7be>] ? kmem_cache_alloc+0xfe/0x1c0
    [29854.132810]  [<ffffffffa036783b>] insert_reserved_file_extent.constprop.66+0xab/0x310 [btrfs]
    [29854.132820]  [<ffffffffa036a6c6>] __btrfs_prealloc_file_range+0x116/0x340 [btrfs]
    [29854.132830]  [<ffffffffa0374d53>] btrfs_prealloc_file_range+0x23/0x30 [btrfs]
    (...)

So this is caused by getting an -ENOSPC error while punching a file hole, more
specifically, we get -ENOSPC error from __btrfs_drop_extents in the while loop
of file.c:btrfs_punch_hole() when it's unable to modify the btree to delete one
or more file extent items due to lack of enough free space. When this happens,
in btrfs_punch_hole(), we attempt to reclaim free space by switching our transaction
block reservation object to root->fs_info->trans_block_rsv, end our transaction and
start a new transaction basically - and, we keep increasing our current offset
(cur_offset) as long as it's smaller than the end of the target range (lockend) -
this makes use leave the loop with cur_offset == drop_end which in turn makes us
call fill_holes() for inserting a file extent item that represents a 0 bytes range
hole (and this insertion succeeds, as in the meanwhile more space became available).

This 0 bytes file hole extent item is a problem because any subsequent caller of
__btrfs_drop_extents (regular file writes, or fallocate calls for e.g.), with a
start file offset that is equal to the offset of the hole, will not remove this
extent item due to the following conditional in the while loop of
__btrfs_drop_extents:

    if (extent_end <= search_start) {
            path->slots[0]++;
            goto next_slot;
    }

This later makes the call to setup_items_for_insert() (at the very end of
__btrfs_drop_extents), insert a new file extent item with the same offset as
the 0 bytes file hole extent item that follows it. Needless is to say that this
causes chaos, either when reading the leaf from disk (btree_readpage_end_io_hook),
where we perform leaf sanity checks or in subsequent operations that manipulate
file extent items, as in the fallocate call as shown by the dmesg trace above.

Without my other patch to perform the leaf sanity checks once a leaf is marked
as dirty (if the integrity checker is enabled), it would have been much harder
to debug this issue.

This change might fix a few similar issues reported by users in the mailing
list regarding assertion failures in btrfs_set_item_key_safe calls performed
by __btrfs_drop_extents, such as the following report:

    http://comments.gmane.org/gmane.comp.file-systems.btrfs/32938

Asking fill_holes() to create a 0 bytes wide file hole item also produced the
first warning in the trace above, as we passed a range to btrfs_drop_extent_cache
that has an end smaller (by -1) than its start.

On 3.14 kernels this issue manifests itself through leaf corruption, as we get
duplicated file extent item keys in a leaf when calling setup_items_for_insert(),
but on older kernels, setup_items_for_insert() isn't called by __btrfs_drop_extents(),
instead we have callers of __btrfs_drop_extents(), namely the functions
inode.c:insert_inline_extent() and inode.c:insert_reserved_file_extent(), calling
btrfs_insert_empty_item() to insert the new file extent item, which would fail with
error -EEXIST, instead of inserting a duplicated key - which is still a serious
issue as it would make all similar file extent item replace operations keep
failing if they target the same file range.

Cc: stable@vger.kernel.org
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:26 -07:00
Liu Bo
d2cbf2a260 Btrfs: do not increment on bio_index one by one
'bio_index' is just a index, it's really not necessary to do increment
one by one.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:25 -07:00
Filipe Manana
a1a50f60a6 Btrfs: read inode size after acquiring the mutex when punching a hole
In a previous change, commit 12870f1c9b,
I accidentally moved the roundup of inode->i_size to outside of the
critical section delimited by the inode mutex, which is not atomic and
not correct since the size can be changed by other task before we acquire
the mutex. Therefore fix it.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:24 -07:00
Tobias Klauser
7fb18a0664 btrfs: Remove unnecessary check for NULL
iput() already checks for the inode being NULL, thus it's unnecessary to
check before calling.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:23 -07:00
Zach Brown
166ae5a418 btrfs: fix inline compressed read err corruption
uncompress_inline() is dropping the error from btrfs_decompress() after
testing it and zeroing the page that was supposed to hold decompressed
data.  This can silently turn compressed inline data in to zeros if
decompression fails due to corrupt compressed data or memory allocation
failure.

I verified this by manually forcing the error from btrfs_decompress()
for a silly named copy of od:

	if (!strcmp(current->comm, "failod"))
		ret = -ENOMEM;

  # od -x /mnt/btrfs/dir/80 | head -1
  0000000 3031 3038 310a 2d30 6f70 6e69 0a74 3031
  # echo 3 > /proc/sys/vm/drop_caches
  # cp $(which od) /tmp/failod
  # /tmp/failod -x /mnt/btrfs/dir/80 | head -1
  0000000 0000 0000 0000 0000 0000 0000 0000 0000

The fix is to pass the error to its caller.  Which still has a BUG_ON().
So we fix that too.

There seems to be no reason for the zeroing of the page on the error
from btrfs_decompress() but not from the allocation error a few lines
above.  So the page zeroing is removed.

Signed-off-by: Zach Brown <zab@redhat.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:23 -07:00
Zach Brown
774bcb35f0 btrfs: return ptr error from compression workspace
The btrfs compression wrappers translated errors from workspace
allocation to either -ENOMEM or -1.  The compression type workspace
allocators are already returning a ERR_PTR(-ENOMEM).  Just return that
and get rid of the magical -1.

This helps a future patch return errors from the compression wrappers.

Signed-off-by: Zach Brown <zab@redhat.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:22 -07:00
Zach Brown
60e1975acb btrfs: return errno instead of -1 from compression
The compression layer seems to have been built to return -1 and have
callers make up errors that make sense.  This isn't great because there
are different errors that originate down in the compression layer.

Let's return real negative errnos from the compression layer so that
callers can pass on the error without having to guess what happened.
ENOMEM for allocation failure, E2BIG when compression exceeds the
uncompressed input, and EIO for everything else.

This helps a future path return errors from btrfs_decompress().

Signed-off-by: Zach Brown <zab@redhat.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:21 -07:00
Stefan Behrens
98806b446d btrfs: check_int: propagate out-of-memory error upwards
This issue was not causing any harm but IMO (and in the opinion of the
static code checker) it is better to propagate this error status upwards.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:21 -07:00
Filipe Manana
61391d5622 Btrfs: fix hang on error (such as ENOSPC) when writing extent pages
When running low on available disk space and having several processes
doing buffered file IO, I got the following trace in dmesg:

[ 4202.720152] INFO: task kworker/u8:1:5450 blocked for more than 120 seconds.
[ 4202.720401]       Not tainted 3.13.0-fdm-btrfs-next-26+ #1
[ 4202.720596] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4202.720874] kworker/u8:1    D 0000000000000001     0  5450      2 0x00000000
[ 4202.720904] Workqueue: btrfs-flush_delalloc normal_work_helper [btrfs]
[ 4202.720908]  ffff8801f62ddc38 0000000000000082 ffff880203ac2490 00000000001d3f40
[ 4202.720913]  ffff8801f62ddfd8 00000000001d3f40 ffff8800c4f0c920 ffff880203ac2490
[ 4202.720918]  00000000001d4a40 ffff88020fe85a40 ffff88020fe85ab8 0000000000000001
[ 4202.720922] Call Trace:
[ 4202.720931]  [<ffffffff816a3cb9>] schedule+0x29/0x70
[ 4202.720950]  [<ffffffffa01ec48d>] btrfs_start_ordered_extent+0x6d/0x110 [btrfs]
[ 4202.720956]  [<ffffffff8108e620>] ? bit_waitqueue+0xc0/0xc0
[ 4202.720972]  [<ffffffffa01ec559>] btrfs_run_ordered_extent_work+0x29/0x40 [btrfs]
[ 4202.720988]  [<ffffffffa0201987>] normal_work_helper+0x137/0x2c0 [btrfs]
[ 4202.720994]  [<ffffffff810680e5>] process_one_work+0x1f5/0x530
(...)
[ 4202.721027] 2 locks held by kworker/u8:1/5450:
[ 4202.721028]  #0:  (%s-%s){++++..}, at: [<ffffffff81068083>] process_one_work+0x193/0x530
[ 4202.721037]  #1:  ((&work->normal_work)){+.+...}, at: [<ffffffff81068083>] process_one_work+0x193/0x530
[ 4202.721054] INFO: task btrfs:7891 blocked for more than 120 seconds.
[ 4202.721258]       Not tainted 3.13.0-fdm-btrfs-next-26+ #1
[ 4202.721444] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4202.721699] btrfs           D 0000000000000001     0  7891   7890 0x00000001
[ 4202.721704]  ffff88018c2119e8 0000000000000086 ffff8800a33d2490 00000000001d3f40
[ 4202.721710]  ffff88018c211fd8 00000000001d3f40 ffff8802144b0000 ffff8800a33d2490
[ 4202.721714]  ffff8800d8576640 ffff88020fe85bc0 ffff88020fe85bc8 7fffffffffffffff
[ 4202.721718] Call Trace:
[ 4202.721723]  [<ffffffff816a3cb9>] schedule+0x29/0x70
[ 4202.721727]  [<ffffffff816a2ebc>] schedule_timeout+0x1dc/0x270
[ 4202.721732]  [<ffffffff8109bd79>] ? mark_held_locks+0xb9/0x140
[ 4202.721736]  [<ffffffff816a90c0>] ? _raw_spin_unlock_irq+0x30/0x40
[ 4202.721740]  [<ffffffff8109bf0d>] ? trace_hardirqs_on_caller+0x10d/0x1d0
[ 4202.721744]  [<ffffffff816a488f>] wait_for_completion+0xdf/0x120
[ 4202.721749]  [<ffffffff8107fa90>] ? try_to_wake_up+0x310/0x310
[ 4202.721765]  [<ffffffffa01ebee4>] btrfs_wait_ordered_extents+0x1f4/0x280 [btrfs]
[ 4202.721781]  [<ffffffffa020526e>] btrfs_mksubvol.isra.62+0x30e/0x5a0 [btrfs]
[ 4202.721786]  [<ffffffff8108e620>] ? bit_waitqueue+0xc0/0xc0
[ 4202.721799]  [<ffffffffa02056a9>] btrfs_ioctl_snap_create_transid+0x1a9/0x1b0 [btrfs]
[ 4202.721813]  [<ffffffffa020583a>] btrfs_ioctl_snap_create_v2+0x10a/0x170 [btrfs]
(...)

It turns out that extent_io.c:__extent_writepage(), which ends up being called
through filemap_fdatawrite_range() in btrfs_start_ordered_extent(), was getting
-ENOSPC when calling the fill_delalloc callback. In this situation, it returned
without the writepage_end_io_hook callback (inode.c:btrfs_writepage_end_io_hook)
ever being called for the respective page, which prevents the ordered extent's
bytes_left count from ever reaching 0, and therefore a finish_ordered_fn work
is never queued into the endio_write_workers queue. This makes the task that
called btrfs_start_ordered_extent() hang forever on the wait queue of the ordered
extent.

This is fairly easy to reproduce using a small filesystem and fsstress on
a quad core vm:

    mkfs.btrfs -f -b `expr 2100 \* 1024 \* 1024` /dev/sdd
    mount /dev/sdd /mnt

    fsstress -p 6 -d /mnt -n 100000 -x \
        "btrfs subvolume snapshot -r /mnt /mnt/mysnap" \
	    -f allocsp=0 \
	    -f bulkstat=0 \
	    -f bulkstat1=0 \
	    -f chown=0 \
	    -f creat=1 \
	    -f dread=0 \
	    -f dwrite=0 \
	    -f fallocate=1 \
	    -f fdatasync=0 \
	    -f fiemap=0 \
	    -f freesp=0 \
	    -f fsync=0 \
	    -f getattr=0 \
	    -f getdents=0 \
	    -f link=0 \
	    -f mkdir=0 \
	    -f mknod=0 \
	    -f punch=1 \
	    -f read=0 \
	    -f readlink=0 \
	    -f rename=0 \
	    -f resvsp=0 \
	    -f rmdir=0 \
	    -f setxattr=0 \
	    -f stat=0 \
	    -f symlink=0 \
	    -f sync=0 \
	    -f truncate=1 \
	    -f unlink=0 \
	    -f unresvsp=0 \
	    -f write=4

So just ensure that if an error happens while writing the extent page
we call the writepage_end_io_hook callback. Also make it return the
error code and ensure the caller (extent_write_cache_pages) processes
all pages in the page vector even if an error happens only for some
of them, so that ordered extents end up released.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-09 17:20:20 -07:00
Linus Torvalds
c593e89787 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fix from Chris Mason:
 "I had this in my 3.16 merge window queue, but it is small and obvious
  enough for 3.15.  I cherry-picked and retested against current rc8"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: send, fix corrupted path strings for long paths
2014-06-07 15:12:18 -07:00
Naoya Horiguchi
d4c54919ed mm: add !pte_present() check on existing hugetlb_entry callbacks
The age table walker doesn't check non-present hugetlb entry in common
path, so hugetlb_entry() callbacks must check it.  The reason for this
behavior is that some callers want to handle it in its own way.

[ I think that reason is bogus, btw - it should just do what the regular
  code does, which is to call the "pte_hole()" function for such hugetlb
  entries  - Linus]

However, some callers don't check it now, which causes unpredictable
result, for example when we have a race between migrating hugepage and
reading /proc/pid/numa_maps.  This patch fixes it by adding !pte_present
checks on buggy callbacks.

This bug exists for years and got visible by introducing hugepage
migration.

ChangeLog v2:
- fix if condition (check !pte_present() instead of pte_present())

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Backported to 3.15.  Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org> ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 13:21:16 -07:00
Filipe Manana
01a9a8a9e2 Btrfs: send, fix corrupted path strings for long paths
If a path has more than 230 characters, we allocate a new buffer to
use for the path, but we were forgotting to copy the contents of the
previous buffer into the new one, which has random content from the
kmalloc call.

Test:

    mkfs.btrfs -f /dev/sdd
    mount /dev/sdd /mnt

    TEST_PATH="/mnt/fdmanana/.config/google-chrome-mysetup/Default/Pepper_Data/Shockwave_Flash/WritableRoot/#SharedObjects/JSHJ4ZKN/s.wsj.net/[[IMPORT]]/players.edgesuite.net/flash/plugins/osmf/advanced-streaming-plugin/v2.7/osmf1.6/Ak#"
    mkdir -p $TEST_PATH
    echo "hello world" > $TEST_PATH/amaiAdvancedStreamingPlugin.txt

    btrfs subvolume snapshot -r /mnt /mnt/mysnap1
    btrfs send /mnt/mysnap1 -f /tmp/1.snap

A test for xfstests follows.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Cc: Marc Merlin <marc@merlins.org>
Tested-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Chris Mason <clm@fb.com>
2014-06-06 12:00:46 -07:00
Jianyu Zhan
c9482a5bdc kernfs: move the last knowledge of sysfs out from kernfs
There is still one residue of sysfs remaining: the sb_magic
SYSFS_MAGIC. However this should be kernfs user specific,
so this patch moves it out. Kerrnfs user should specify their
magic number while mouting.

Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-03 08:11:18 -07:00
Linus Torvalds
9f12600fe4 dcache: add missing lockdep annotation
lock_parent() very much on purpose does nested locking of dentries, and
is careful to maintain the right order (lock parent first).  But because
it didn't annotate the nested locking order, lockdep thought it might be
a deadlock on d_lock, and complained.

Add the proper annotation for the inner locking of the child dentry to
make lockdep happy.

Introduced by commit 046b961b45 ("shrink_dentry_list(): take parent's
->d_lock earlier").

Reported-and-tested-by: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-31 09:13:21 -07:00
Al Viro
8cbf74da43 dentry_kill() doesn't need the second argument now
it's 1 in the only remaining caller.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-30 11:10:33 -04:00
Al Viro
b2b80195d8 dealing with the rest of shrink_dentry_list() livelock
We have the same problem with ->d_lock order in the inner loop, where
we are dropping references to ancestors.  Same solution, basically -
instead of using dentry_kill() we use lock_parent() (introduced in the
previous commit) to get that lock in a safe way, recheck ->d_count
(in case if lock_parent() has ended up dropping and retaking ->d_lock
and somebody managed to grab a reference during that window), trylock
the inode->i_lock and use __dentry_kill() to do the rest.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-30 11:10:33 -04:00
Al Viro
046b961b45 shrink_dentry_list(): take parent's ->d_lock earlier
The cause of livelocks there is that we are taking ->d_lock on
dentry and its parent in the wrong order, forcing us to use
trylock on the parent's one.  d_walk() takes them in the right
order, and unfortunately it's not hard to create a situation
when shrink_dentry_list() can't make progress since trylock
keeps failing, and shrink_dcache_parent() or check_submounts_and_drop()
keeps calling d_walk() disrupting the very shrink_dentry_list() it's
waiting for.

Solution is straightforward - if that trylock fails, let's unlock
the dentry itself and take locks in the right order.  We need to
stabilize ->d_parent without holding ->d_lock, but that's doable
using RCU.  And we'd better do that in the very beginning of the
loop in shrink_dentry_list(), since the checks on refcount, etc.
would need to be redone anyway.

That deals with a half of the problem - killing dentries on the
shrink list itself.  Another one (dropping their parents) is
in the next commit.

locking parent is interesting - it would be easy to do rcu_read_lock(),
lock whatever we think is a parent, lock dentry itself and check
if the parent is still the right one.  Except that we need to check
that *before* locking the dentry, or we are risking taking ->d_lock
out of order.  Fortunately, once the D1 is locked, we can check if
D2->d_parent is equal to D1 without the need to lock D2; D2->d_parent
can start or stop pointing to D1 only under D1->d_lock, so taking
D1->d_lock is enough.  In other words, the right solution is
rcu_read_lock/lock what looks like parent right now/check if it's
still our parent/rcu_read_unlock/lock the child.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-30 11:03:21 -04:00
Al Viro
ff2fde9929 expand dentry_kill(dentry, 0) in shrink_dentry_list()
Result will be massaged to saner shape in the next commits.  It is
ugly, no questions - the point of that one is to be a provably
equivalent transformation (and it might be worth splitting a bit
more).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-29 08:50:08 -04:00
Al Viro
e55fd01154 split dentry_kill()
... into trylocks and everything else.  The latter (actual killing)
is __dentry_kill().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-29 08:46:08 -04:00
Al Viro
64fd72e0a4 lift the "already marked killed" case into shrink_dentry_list()
It can happen only when dentry_kill() is called with unlock_on_failure
equal to 0 - other callers had dentry pinned until the moment they've
got ->d_lock and DCACHE_DENTRY_KILLED is set only after lockref_mark_dead().

IOW, only one of three call sites of dentry_kill() might end up reaching
that code.  Just move it there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-28 09:48:44 -04:00
Miklos Szeredi
b6dd6f4738 vfs: fix vmplice_to_user()
Commit 6130f5315e "switch vmsplice_to_user() to copy_page_to_iter()" in
v3.15-rc1 broke vmsplice(2).

This patch fixes two bugs:

 - count is not initialized to a proper value, which resulted in no data
   being copied

 - if rw_copy_check_uvector() returns negative then the iov might be leaked.

Tested OK.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-28 01:54:52 -04:00
Linus Torvalds
db1003f231 Merge branch 'afs' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS fixes and cleanups from David Howells:
 "Here are some patches to the AFS filesystem:

  1) Fix problems in the clean-up parts of the cache manager service
     handler.

  2) Split afs_end_call() introduced in (1) and replace some identical
     code elsewhere with a call to the first half of the split function.

  3) Fix an error introduced in the workqueue PREPARE_WORK() elimination
     commits.

  4) Clean up argument passing to functions called from the workqueue as
     there's now an insulating layer between them and the workqueue.
     This is possible from (3)"

* 'afs' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  AFS: Pass an afs_call* to call->async_workfn() instead of a work_struct*
  AFS: Fix kafs module unloading
  AFS: Part of afs_end_call() is identical to code elsewhere, so split it
  AFS: Fix cache manager service handlers
2014-05-25 12:40:36 -07:00
Linus Torvalds
80a1de29a5 Merge branch 'for-3.15' of git://linux-nfs.org/~bfields/linux
Pull two nfsd bugfixes from Bruce Fields:
 "Just two bugfixes, one for a merge-window-introduced ACL regression,
  the other for a longer-standing v4 state bug"

* 'for-3.15' of git://linux-nfs.org/~bfields/linux:
  nfsd4: warn on finding lockowner without stateid's
  nfsd4: remove lockowner when removing lock stateid
  nfsd4: fix corruption on setting an ACL.
2014-05-25 10:08:48 -07:00
Joseph Qi
66db6cfd49 ocfs2: fix double kmem_cache_destroy in dlm_init
In dlm_init, if create dlm_lockname_cache failed in
dlm_init_master_caches, it will destroy dlm_lockres_cache which created
before twice.  And this will cause system die when loading modules.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-05-23 09:37:30 -07:00
David Howells
656f88ddf1 AFS: Pass an afs_call* to call->async_workfn() instead of a work_struct*
call->async_workfn() can take an afs_call* arg rather than a work_struct* as
the functions assigned there are now called from afs_async_workfn() which has
to call container_of() anyway.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Nathaniel Wesley Filardo <nwf@cs.jhu.edu>
Reviewed-by: Tejun Heo <tj@kernel.org>
2014-05-23 13:05:22 +01:00
Nathaniel Wesley Filardo
150a6b4789 AFS: Fix kafs module unloading
At present, it is not possible to successfully unload the kafs module if there
are outstanding async outgoing calls (those made with afs_make_call()).  This
appears to be due to the changes introduced by:

	commit 059499453a
	Author: Tejun Heo <tj@kernel.org>
	Date:   Fri Mar 7 10:24:50 2014 -0500
	Subject: afs: don't use PREPARE_WORK

which didn't go far enough.  The problem is due to:

 (1) The aforementioned commit introduced a separate handler function pointer
     in the call, call->async_workfn, in addition to the original workqueue
     item, call->async_work, for asynchronous operations because workqueues
     subsystem cannot handle the workqueue item pointer being changed whilst
     the item is queued or being processed.

 (2) afs_async_workfn() was introduced in that commit to be the callback for
     call->async_work.  Its sole purpose is to run whatever call->async_workfn
     points to.

 (3) call->async_workfn is only used from afs_async_workfn(), which is only
     set on async_work by afs_collect_incoming_call() - ie. for incoming
     calls.

 (4) call->async_workfn is *not* set by afs_make_call() when outgoing calls are
     made, and call->async_work is set afs_process_async_call() - and not
     afs_async_workfn().

 (5) afs_process_async_call() now changes call->async_workfn rather than
     call->async_work to point to afs_delete_async_call() to clean up, but this
     is only effective for incoming calls because call->async_work does not
     point to afs_async_workfn() for outgoing calls.

 (6) Because, for incoming calls, call->async_work remains pointing to
     afs_process_async_call() this results in an infinite loop.

Instead, make the workqueue uniformly vector through call->async_workfn, via
afs_async_workfn() and simply initialise call->async_workfn to point to
afs_process_async_call() in afs_make_call().

Signed-off-by: Nathaniel Wesley Filardo <nwf@cs.jhu.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
2014-05-23 13:05:22 +01:00
Nathaniel Wesley Filardo
6cf12869f5 AFS: Part of afs_end_call() is identical to code elsewhere, so split it
Split afs_end_call() into two pieces, one of which is identical to code in
afs_process_async_call().  Replace the latter with a call to the first part of
afs_end_call().

Signed-off-by: Nathaniel Wesley Filardo <nwf@cs.jhu.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
2014-05-23 13:05:15 +01:00
Linus Torvalds
11da37b263 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull two btrfs fixes from Chris Mason:
 "This has two fixes that we've been testing for 3.16, but since both
  are safe and fix real bugs, it makes sense to send for 3.15 instead"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: send, fix incorrect ref access when using extrefs
  Btrfs: fix EIO on reading file after ioctl clone works on it
2014-05-22 05:40:13 +09:00
Linus Torvalds
5e9d9fc4ed xfs: fixes for 3.15-rc6
Code inspection of the XFS error number sign translations found a bunch of
 issues, including returning incorrectly signed errors for some data integrity
 operations. These leak to userspace and result in applications not getting the
 errors correctly reported. Hence they need fixing sooner rather than later.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJTdXBCAAoJEK3oKUf0dfoddHgP/11HEo2mAU4s3IZ0FXiWg7IX
 LLz5laDlK0hBTEzlE43Y3bhX5Euk9cMYYschXoX7o9gOBG5VmC4RF9oIlzbohu1D
 IlekaClr9UYiy7G6k3jLYFB8UDO4L88SM1pkJOus40VDD74fU2mYRrkFCnxWgGUz
 9dcQkCB3C75rkH7LT5QGr1qejhmvC8WG0yVnwQB97/wiHDOeFuLIGpJtq8pYabfH
 HVm5VoWcBerX5q6Zd/8hFRLARfMcQLpeotByLRT6jiJHz/gteVou8jJhgBOW1c1/
 Z/CnK7GlvnWUo06/8FRVoHXwuOL+iPa1kiJIGm6DaYEIfZcsif28w2IPZyPlNzzN
 vrR7Tdq6jSqpHo8JHGmBJDmS+RAdQtGEo/5pjqJAdhWOK4EW1fUxcrAH24A8ATLZ
 hb5aIozVAYhGLN8wtPushL7endzZ5qQJFCuGmBO0QRP+5Cbkq018tC/3K9NCPXmM
 MRTyiMs3ZxyYIcvgBo08eU6k419S9D/eZuHy+LU6ALWLf8+Km4aJyC6hKAQmQnzb
 pw/3tP0xbdUK83Xl8wHVGmNUlQgjB1ZhOLdF0xAc9MocRarPqbuvLKTIUHslE8uO
 1+sGIkKeiTzeOd0fJ+UGQC8cFxYbRyhg/fpg2feWF69Rn+hkpUTaSXivhCgAoDVs
 fQ1SB/n97rNi68ZJF6z5
 =5syB
 -----END PGP SIGNATURE-----

Merge tag 'xfs-for-linus-3.15-rc6' of git://oss.sgi.com/xfs/xfs

Pull xfs fixes from Dave Chinner:
 "Code inspection of the XFS error number sign translations found a
  bunch of issues, including returning incorrectly signed errors for
  some data integrity operations.

  These leak to userspace and result in applications not getting the
  errors correctly reported.  Hence they need fixing sooner rather than
  later.

  A couple of the bugs are in data integrity operations, a couple more
  are in the new COLLAPSE_RANGE code.  One of these came in through a
  recent ext4 merge and so I had to update the base tree to 3.15-rc5
  before fixing the issues"

* tag 'xfs-for-linus-3.15-rc6' of git://oss.sgi.com/xfs/xfs:
  xfs: list_lru_init returns a negative error
  xfs: negate xfs_icsb_init_counters error value
  xfs: negate mount workqueue init error value
  xfs: fix wrong err sign on xfs_set_acl()
  xfs: fix wrong errno from xfs_initxattrs
  xfs: correct error sign on COLLAPSE_RANGE errors
  xfs: xfs_commit_metadata returns wrong errno
  xfs: fix incorrect error sign in xfs_file_aio_read
  xfs: xfs_dir_fsync() returns positive errno
2014-05-22 05:36:07 +09:00
J. Bruce Fields
27b11428b7 nfsd4: warn on finding lockowner without stateid's
The current code assumes a one-to-one lockowner<->lock stateid
correspondance.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-21 11:11:21 -04:00
J. Bruce Fields
a1b8ff4c97 nfsd4: remove lockowner when removing lock stateid
The nfsv4 state code has always assumed a one-to-one correspondance
between lock stateid's and lockowners even if it appears not to in some
places.

We may actually change that, but for now when FREE_STATEID releases a
lock stateid it also needs to release the parent lockowner.

Symptoms were a subsequent LOCK crashing in find_lockowner_str when it
calls same_lockowner_ino on a lockowner that unexpectedly has an empty
so_stateids list.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-21 11:11:21 -04:00
David Howells
6c67c7c38c AFS: Fix cache manager service handlers
Fix the cache manager RPC service handlers.  The afs_send_empty_reply() and
afs_send_simple_reply() functions:

 (a) Kill the call and free up the buffers associated with it if they fail.

 (b) Return with call intact if it they succeed.

However, none of the callers actually check the result or clean up if
successful - and may use the now non-existent data if it fails.

This was detected by Dan Carpenter using a static checker:

	The patch 08e0e7c82e: "[AF_RXRPC]: Make the in-kernel AFS
	filesystem use AF_RXRPC." from Apr 26, 2007, leads to the following
	static checker warning:
	"fs/afs/cmservice.c:155 SRXAFSCB_CallBack()
		 warn: 'call' was already freed."

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2014-05-21 14:48:05 +01:00
Linus Torvalds
439c610992 Driver core fixes for 3.15-rc6
Here are two driver core (well, sysfs) fixes for 3.15-rc6 that resolve
 some reported issues and a regression from 3.13.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iEUEABECAAYFAlN8LzgACgkQMUfUDdst+ynJnQCeKQt7KdEBlHAKI5/iP2IQVNNx
 KG8AmMepPCjpp9/MbrFQnx3miGgNEug=
 =813a
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are two driver core (well, sysfs) fixes for 3.15-rc6 that resolve
  some reported issues and a regression from 3.13"

* tag 'driver-core-3.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  sysfs: make sure read buffer is zeroed
  kernfs, sysfs, cgroup: restrict extra perm check on open to sysfs
2014-05-21 18:59:25 +09:00
Filipe Manana
51a60253a5 Btrfs: send, fix incorrect ref access when using extrefs
When running send, if an inode only has extended reference items
associated to it and no regular references, send.c:get_first_ref()
was incorrectly assuming the reference it found was of type
BTRFS_INODE_REF_KEY due to use of the wrong key variable.
This caused weird behaviour when using the found item has a regular
reference, such as weird path string, and occasionally (when lucky)
a crash:

[  190.600652] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[  190.600994] Modules linked in: btrfs xor raid6_pq binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc psmouse serio_raw evbug pcspkr i2c_piix4 e1000 floppy
[  190.602565] CPU: 2 PID: 14520 Comm: btrfs Not tainted 3.13.0-fdm-btrfs-next-26+ #1
[  190.602728] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  190.602868] task: ffff8800d447c920 ti: ffff8801fa79e000 task.ti: ffff8801fa79e000
[  190.603030] RIP: 0010:[<ffffffff813266b4>]  [<ffffffff813266b4>] memcpy+0x54/0x110
[  190.603262] RSP: 0018:ffff8801fa79f880  EFLAGS: 00010202
[  190.603395] RAX: ffff8800d4326e3f RBX: 000000000000036a RCX: ffff880000000000
[  190.603553] RDX: 000000000000032a RSI: ffe708844042936a RDI: ffff8800d43271a9
[  190.603710] RBP: ffff8801fa79f8c8 R08: 00000000003a4ef0 R09: 0000000000000000
[  190.603867] R10: 793a4ef09f000000 R11: 9f0000000053726f R12: ffff8800d43271a9
[  190.604020] R13: 0000160000000000 R14: ffff8802110134f0 R15: 000000000000036a
[  190.604020] FS:  00007fb423d09b80(0000) GS:ffff880216200000(0000) knlGS:0000000000000000
[  190.604020] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  190.604020] CR2: 00007fb4229d4b78 CR3: 00000001f5d76000 CR4: 00000000000006e0
[  190.604020] Stack:
[  190.604020]  ffffffffa01f4d49 ffff8801fa79f8f0 00000000000009f9 ffff8801fa79f8c8
[  190.604020]  00000000000009f9 ffff880211013260 000000000000f971 ffff88021147dba8
[  190.604020]  00000000000009f9 ffff8801fa79f918 ffffffffa02367f5 ffff8801fa79f928
[  190.604020] Call Trace:
[  190.604020]  [<ffffffffa01f4d49>] ? read_extent_buffer+0xb9/0x120 [btrfs]
[  190.604020]  [<ffffffffa02367f5>] fs_path_add_from_extent_buffer+0x45/0x60 [btrfs]
[  190.604020]  [<ffffffffa0238806>] get_first_ref+0x1f6/0x210 [btrfs]
[  190.604020]  [<ffffffffa0238994>] __get_cur_name_and_parent+0x174/0x3a0 [btrfs]
[  190.604020]  [<ffffffff8118df3d>] ? kmem_cache_alloc_trace+0x11d/0x1e0
[  190.604020]  [<ffffffffa0236674>] ? fs_path_alloc+0x24/0x60 [btrfs]
[  190.604020]  [<ffffffffa0238c91>] get_cur_path+0xd1/0x240 [btrfs]
(...)

Steps to reproduce (either crash or some weirdness like an odd path string):

    mkfs.btrfs -f -O extref /dev/sdd
    mount /dev/sdd /mnt

    mkdir /mnt/testdir
    touch /mnt/testdir/foobar

    for i in `seq 1 2550`; do
        ln /mnt/testdir/foobar /mnt/testdir/foobar_link_`printf "%04d" $i`
    done

    ln /mnt/testdir/foobar /mnt/testdir/final_foobar_name

    rm -f /mnt/testdir/foobar
    for i in `seq 1 2550`; do
        rm -f /mnt/testdir/foobar_link_`printf "%04d" $i`
    done

    btrfs subvolume snapshot -r /mnt /mnt/mysnap
    btrfs send /mnt/mysnap -f /tmp/mysnap.send

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
2014-05-20 10:18:26 -07:00
Liu Bo
d3ecfcdf91 Btrfs: fix EIO on reading file after ioctl clone works on it
For inline data extent, we need to make its length aligned, otherwise,
we can get a phantom extent map which confuses readpages() to return -EIO.

This can be detected by xfstests/btrfs/035.

Reported-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
2014-05-20 10:17:48 -07:00
Linus Torvalds
41abc90228 Metag architecture and related fixes for v3.15
Mostly fixes for metag and parisc relating to upgrowing stacks.
 
 * Fix missing compiler barriers in metag memory barriers.
 * Fix BUG_ON on metag when RLIMIT_STACK hard limit is increased beyond
   safe value.
 * Make maximum stack size configurable. This reduces the default user
   stack size back to 80MB (especially on parisc after their removal of
   _STK_LIM_MAX override). This only affects metag and parisc.
 * Remove metag _STK_LIM_MAX override to match other arches and follow
   parisc, now that it is safe to do so (due to the BUG_ON fix mentioned
   above).
 * Finally now that both metag and parisc _STK_LIM_MAX overrides have
   been removed, it makes sense to remove _STK_LIM_MAX altogether.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJTdAc3AAoJEGwLaZPeOHZ6L2QP/ihJag44CyWKKpeu/5FUkjP6
 62wX4cYKCFR9pTkOZDViWs7c+xrmW6OtORfQKuXu1g68L3v2cwb0HmcvybQ75pIQ
 CbaN+d5OnGPjHGYCSVqQBKlJ0qbcgQfoNUuCVOZx9kZgnCYQhJlh4HYRwHdUv9WY
 1FA3wor/JTTAiKvPBv+/t4NzTpTafpSIhYLahjxZbtuU1WjEfmj8QgWQpzTEJSeZ
 AyNE/nDlcYcdq4lDxMz2pcQfmJ2MpE56wvXJ7IdXadLaLp4yzc+WTAvFzNJ1XnAN
 2IcyNBpgF/vMXCbErA9QQegYwKd9jpF0w3oQmNLkgr27Kv27iL2sLIEWVn3FAXCu
 p+I0ypMlkD/gSdofCUaWTiGGOQiKMqAWJMfjky8RjA7Qz5TdLCldpjjuZEMKl8hM
 SLjkmgZHMG2/rJ8MosOL+ARAXl88v25gfM6rNIPTtMzH72qevrHgjFPj6pWHejhE
 0E43yDS+zt215HrFCXYBhVbFY1NM7JeBS8NFd9Y/8LKTWc8QSi2h8Q1ZaobKJi4O
 0zlKxRRR4QmmtF7S5wL/qOQ0U95HBvYSx+Ssp3C0eh/PEkZYWm0jiXtaKBCYtnDx
 33wRutv+R9sSkKaiiURBh9/VPWFLQ1ak5z+ejqrv32+oBzt/zmxb7LgwsxdAbAms
 9r/8XaY3V+JBPw7UxfQN
 =aveq
 -----END PGP SIGNATURE-----

Merge tag 'metag-for-v3.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag

Pull Metag architecture and related fixes from James Hogan:
 "Mostly fixes for metag and parisc relating to upgrowing stacks.

   - Fix missing compiler barriers in metag memory barriers.
   - Fix BUG_ON on metag when RLIMIT_STACK hard limit is increased
     beyond safe value.
   - Make maximum stack size configurable.  This reduces the default
     user stack size back to 80MB (especially on parisc after their
     removal of _STK_LIM_MAX override).  This only affects metag and
     parisc.
   - Remove metag _STK_LIM_MAX override to match other arches and follow
     parisc, now that it is safe to do so (due to the BUG_ON fix
     mentioned above).
   - Finally now that both metag and parisc _STK_LIM_MAX overrides have
     been removed, it makes sense to remove _STK_LIM_MAX altogether"

* tag 'metag-for-v3.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
  asm-generic: remove _STK_LIM_MAX
  metag: Remove _STK_LIM_MAX override
  parisc,metag: Do not hardcode maximum userspace stack size
  metag: Reduce maximum stack size to 256MB
  metag: fix memory barriers
2014-05-20 14:30:34 +09:00
Tejun Heo
f5c16f29bf sysfs: make sure read buffer is zeroed
13c589d5b0 ("sysfs: use seq_file when reading regular files")
switched sysfs from custom read implementation to seq_file to enable
later transition to kernfs.  After the change, the buffer passed to
->show() is acquired through seq_get_buf(); unfortunately, this
introduces a subtle behavior change.  Before the commit, the buffer
passed to ->show() was always zero as it was allocated using
get_zeroed_page().  Because seq_file doesn't clear buffers on
allocation and neither does seq_get_buf(), after the commit, depending
on the behavior of ->show(), we may end up exposing uninitialized data
to userland thus possibly altering userland visible behavior and
leaking information.

Fix it by explicitly clearing the buffer.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ron <ron@debian.org>
Fixes: 13c589d5b0 ("sysfs: use seq_file when reading regular files")
Cc: stable <stable@vger.kernel.org> # 3.13+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-20 10:15:53 +09:00
J. Bruce Fields
5513a510fa nfsd4: fix corruption on setting an ACL.
As of 06f9cc12ca "nfsd4: don't create
unnecessary mask acl", any non-trivial ACL will be left with an
unitialized entry, and a trivial ACL may write one entry beyond what's
allocated.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-15 15:36:04 -04:00