We set cl_firststate when we first decide that a client will be
permitted to reclaim state on next boot. This happens:
- for new 4.0 clients, when they confirm their first open
- for returning 4.0 clients, when they reclaim their first open
- for 4.1+ clients, when they perform reclaim_complete
We also use cl_firststate to decide whether a reclaim_complete has
already been performed, in the 4.1+ case.
We were setting it on 4.1 open reclaims, which caused spurious
COMPLETE_ALREADY errors on RECLAIM_COMPLETE from an nfs4.1 client with
anything to reclaim.
Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Respect client request for not getting a delegation in NFSv4.1
Appropriately return delegation "type" NFS4_OPEN_DELEGATE_NONE_EXT
and WND4_NOT_WANTED reason.
[nfsd41: add missing break when encoding op_why_no_deleg]
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When I initially wrote it, I didn't understand how lists worked so I
wrote something that didn't use them. I think making a list of stateids
to test is a more straightforward implementation, especially compared to
especially compared to decoding stateids while simultaneously encoding
a reply to the client.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If you try to set grace_period or timeout via a module parameter
to lockd, and do this on a big-endian machine where
sizeof(int) != sizeof(unsigned long)
it won't work. This number given will be effectively shifted right
by the difference in those two sizes.
So cast kp->arg properly to get correct result.
Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Currently, it will not correctly ignore any nfsv4.1 signal flags
if the client sends them.
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This fixes an oops when a buggy client tries to use an initial seqid of
0 on a new slot, which we may misinterpret as a replay.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Combine two booleans into a single flag field, move the smaller fields
to the end.
(In practice this doesn't make the struct any smaller. But we'll be
adding another flag here soon.)
Remove some debugging code that doesn't look useful, while we're in the
neighborhood.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
From RFC 5661 2.10.6.1: "If the previous sequence ID was 0xFFFFFFFF,
then the next request for the slot MUST have the sequence ID set to
zero."
While we're there, delete some redundant comments.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The rpc buffers will be allocated out of low memory, so we should really
only be taking that into account.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Move calculation of the default into a helper function.
Get rid of an unused variable "err" while we're there.
Thanks to Mi Jinlong for catching an arithmetic error in a previous
version.
Cc: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We check for zero length strings in the caller now, so these aren't
needed.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
There are few important bug fixes for LogFS
Shortlog:
Joern Engel (5):
logfs: Prevent memory corruption
logfs: remove useless BUG_ON
logfs: Free areas before calling generic_shutdown_super()
logfs: Grow inode in delete path
Logfs: Allow NULL block_isbad() methods
Prasad Joshi (5):
logfs: update page reference count for pined pages
logfs: take write mutex lock during fsync and sync
logfs: set superblock shutdown flag after generic sb shutdown
logfs: Propagate page parameter to __logfs_write_inode
MAINTAINERS: Add Prasad Joshi in LogFS maintiners
Diffstat:
MAINTAINERS | 1 +
fs/logfs/dev_mtd.c | 26 +++++++++++-------------
fs/logfs/dir.c | 2 +-
fs/logfs/file.c | 2 +
fs/logfs/gc.c | 2 +-
fs/logfs/inode.c | 4 ++-
fs/logfs/journal.c | 1 -
fs/logfs/logfs.h | 5 +++-
fs/logfs/readwrite.c | 51 +++++++++++++++++++++++++++++++++----------------
fs/logfs/segment.c | 51 ++++++++++++++++++++++++++++++++++++++-----------
fs/logfs/super.c | 3 +-
11 files changed, 99 insertions(+), 49 deletions(-)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJPKByhAAoJEDFA/f+3K+ZNQSUP/3gACcIwcsl+FnXPWBtz9XIG
g0DjXoRDd/sR0u25nLgjCVdBJgx5FVEyA+PvLgvvUU2KCAsqI5F/EQ+fLJs21YEN
TzepBO5aHtFZbNEjo6WiXOlDbBePTtk44WrN6jqoCHM/aDeT4Wof3NZBmHWNN1PX
B2RtEZ0ypJ7/b1OY2LUNcQfTaJXNgVoP8Hkx4KGY5LUVxVrBXxvDTU7YbkS8a+ys
1Yje/EQ4XD4RyZB42TmFEuTenvGPRgMGVFdnkJKuON8EmJQ8Hc61jEf5d7Q8sWef
dH5F/ptoAaR9a9LbbO8LoYuBZ8MR8848NPsrNPpr/gWntj46Z79yII8Jr7YoSDyw
zq5G2dZbwlbVrtVWKGae47THkNB8bljR/g4cijvPAkvuIAku6mg+dgjVHAhZ/t+J
xu8+Gy2sWHUH2gmoSXuoNyppOvYpPIRd5RB16PizMH3bw+sMad2K8/rfOKnmF1/r
HTM2jZ5bDcHVDjSuVI6u2m/mQX+PmPXUTffreaFXuSI75YpT0dqN3nponTX4EgFI
Ad9ZBQvdg8w1LGDsNxIAaqrGx4Q87RxqfUV4W/wo6N8gKsp+I2y4GtYMeD/CEKyi
wncKg10YwoMXZj7cBAkWgPlgrOBYCPwYZc/1DVRHvqrHo/m13SJrWDKkNKVvoXzH
2y4Tfi5w1WDRUT7yeoyK
=TA1A
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream
There are few important bug fixes for LogFS
* tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream:
Logfs: Allow NULL block_isbad() methods
logfs: Grow inode in delete path
logfs: Free areas before calling generic_shutdown_super()
logfs: remove useless BUG_ON
MAINTAINERS: Add Prasad Joshi in LogFS maintiners
logfs: Propagate page parameter to __logfs_write_inode
logfs: set superblock shutdown flag after generic sb shutdown
logfs: take write mutex lock during fsync and sync
logfs: Prevent memory corruption
logfs: update page reference count for pined pages
Fix up conflict in fs/logfs/dev_mtd.c due to semantic change in what
"mtd->block_isbad" means in commit f2933e86ad: "Logfs: Allow NULL
block_isbad() methods" clashing with the abstraction changes in the
commits 7086c19d07: "mtd: introduce mtd_block_isbad interface" and
d58b27ed58: "logfs: do not use 'mtd->block_isbad' directly".
This resolution takes the semantics from commit f2933e86ad, and just
makes mtd_block_isbad() return zero (false) if the 'block_isbad'
function is NULL. But that also means that now "mtd_can_have_bb()"
always returns 0.
Now, "mtd_block_markbad()" will obviously return an error if the
low-level driver doesn't support bad blocks, so this is somewhat
non-symmetric, but it actually makes sense if a NULL "block_isbad"
function is considered to mean "I assume that all my blocks are always
good".
It contains the removal of the sysdev code, now that all users of it are
gone, as well as some sysfs bugfixes that have been reported by users.
There are also some documentation updates here as well.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iEYEABECAAYFAk8jKW4ACgkQMUfUDdst+ynAUwCfVWwHJxpb4DSSMVZhGOnHMQrL
ZjIAn00gPeSs5u8y1nPvFrFikbon4FDs
=bzVy
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.3-rc1-bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Here are some patches for the 3.3-rc1 tree.
It contains the removal of the sysdev code, now that all users of it are
gone, as well as some sysfs bugfixes that have been reported by users.
There are also some documentation updates here as well.
* tag 'driver-core-3.3-rc1-bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
sysfs: Complain bitterly about attempts to remove files from nonexistent directories.
stable: update documentation to ask for kernel version
base/core.c:fix typo in comment in function device_add
Documentation: devres: add allocation functions to list of supported calls
Documentation update for the driver model core
kernel-doc: fix new warnings in driver-core
kernel-doc: fix new warnings in debugfs
kernel-doc: fix new warnings in device.h
driver core: remove drivers/base/sys.c and include/linux/sysdev.h
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix reservations in btrfs_page_mkwrite
Btrfs: advance window_start if we're using a bitmap
btrfs: mask out gfp flags in releasepage
Btrfs: fix enospc error caused by wrong checks of the chunk
Btrfs: do not defrag a file partially
Btrfs: fix warning for 32-bit build of fs/btrfs/check-integrity.c
Btrfs: use cluster->window_start when allocating from a cluster bitmap
Btrfs: Check for NULL page in extent_range_uptodate
btrfs: Fix busyloops in transaction waiting code
Btrfs: make sure a bitmap has enough bytes
Btrfs: fix uninit warning in backref.c
Can be necessary if an inode gets deleted (through -ENOSPC) before being
written. Might be better to move this into logfs_write_rec(), but for
now go with the stupid&safe patch.
Signed-off-by: Joern Engel <joern@logfs.org>
This is a bad one. I wonder whether we were so far protected by
no_free_segments(sb) usually being smaller than LOGFS_NO_AREAS.
Found by Dan Carpenter <dan.carpenter@oracle.com> using smatch.
Signed-off-by: Joern Engel <joern@logfs.org>
Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
LogFS sets PG_private flag to indicate a pined page. We assumed that
marking a page as private is enough to ensure its existence. But
instead it is necessary to hold a reference count to the page.
The change resolves the following BUG
BUG: Bad page state in process flush-253:16 pfn:6a6d0
page flags: 0x100000000000808(uptodate|private)
Suggested-and-Acked-by: Joern Engel <joern@logfs.org>
Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
Josef fixed btrfs_page_mkwrite to properly release reserved
extents if there was an error. But if we fail to get a reservation
and we fail to dirty the inode (for ENOSPC reasons), we'll end up
trying to release a reservation we never had.
This makes sure we only release if we were able to reserve.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
If we span a long area in a bitmap we could end up taking a lot of time
searching to the next free area if we're searching from the original
window_start, so advance window_start in order to make sure we don't do any
superficial searching. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
btree_releasepage is a callback and can be passed unknown gfp flags and then
they may end up in kmem_cache_alloc called from alloc_extent_state, slab
allocator will BUG_ON when there is HIGHMEM or DMA32 flag set.
This may happen when btrfs is mounted from a loop device, which masks out
__GFP_IO flag. The check in try_release_extent_state
3399 if ((mask & GFP_NOFS) == GFP_NOFS)
3400 mask = GFP_NOFS;
will not work and passes unfiltered flags further resulting in crash at
mm/slab.c:2963
[<000000000024ae4c>] cache_alloc_refill+0x3b4/0x5c8
[<000000000024c810>] kmem_cache_alloc+0x204/0x294
[<00000000001fd3c2>] mempool_alloc+0x52/0x170
[<000003c000ced0b0>] alloc_extent_state+0x40/0xd4 [btrfs]
[<000003c000cee5ae>] __clear_extent_bit+0x38a/0x4cc [btrfs]
[<000003c000cee78c>] try_release_extent_state+0x9c/0xd4 [btrfs]
[<000003c000cc4c66>] btree_releasepage+0x7e/0xd0 [btrfs]
[<0000000000210d84>] shrink_page_list+0x6a0/0x724
[<0000000000211394>] shrink_inactive_list+0x230/0x578
[<0000000000211bb8>] shrink_list+0x6c/0x120
[<0000000000211e4e>] shrink_zone+0x1e2/0x228
[<0000000000211f24>] shrink_zones+0x90/0x254
[<0000000000213410>] do_try_to_free_pages+0xac/0x420
[<0000000000213ae0>] try_to_free_pages+0x13c/0x1b0
[<0000000000204e6c>] __alloc_pages_nodemask+0x5b4/0x9a8
[<00000000001fb04a>] grab_cache_page_write_begin+0x7e/0xe8
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
When we did sysbench test for inline files, enospc error happened easily though
there was lots of free disk space which could be allocated for new chunks.
Reproduce steps:
# mkfs.btrfs -b $((2 * 1024 * 1024 * 1024)) <test partition>
# mount <test partition> /mnt
# ulimit -n 102400
# cd /mnt
# sysbench --num-threads=1 --test=fileio --file-num=81920 \
> --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
> --file-test-mode=seqwr prepare
# sysbench --num-threads=1 --test=fileio --file-num=81920 \
> --file-total-size=80M --file-block-size=1K --file-io-mode=sync \
> --file-test-mode=seqwr run
<soon later, BUG_ON() was triggered by enospc error>
The reason of this bug is:
Now, we can reserve space which is larger than the free space in the chunks if
we have enough free disk space which can be used for new chunks. By this way,
the space allocator should allocate a new chunk by force if there is no free
space in the free space cache. But there are two wrong checks which break this
operation.
One is
if (ret == -ENOSPC && num_bytes > min_alloc_size)
in btrfs_reserve_extent(), it is wrong, we should try to allocate a new chunk
even we fail to allocate free space by minimum allocable size.
The other is
if (space_info->force_alloc)
force = space_info->force_alloc;
in do_chunk_alloc(). It makes the allocator ignore CHUNK_ALLOC_FORCE If someone
sets ->force_alloc to CHUNK_ALLOC_LIMITED, and makes the enospc error happen.
Fix these two wrong checks. Especially the second one, we fix it by changing
the value of CHUNK_ALLOC_LIMITED and CHUNK_ALLOC_FORCE, and make
CHUNK_ALLOC_FORCE greater than CHUNK_ALLOC_LIMITED since CHUNK_ALLOC_FORCE has
higher priority. And if the value which is passed in by the caller is greater
than ->force_alloc, use the passed value.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
xfstests 218 complains that btrfs defrags a file partially:
After: 1
Write backwards sync, but contiguous - should defrag to 1 extent
Before: 10
-After: 1
+After: 2
To fix this, we need to set max_to_defrag count properly.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
There have been 4 warnings on 32-bit build, they are herewith fixed.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
We specifically set window_start in the cluster struct to indicate where the
cluster starts in a bitmap, but we've been using min_start to indicate where
we're searching from. This is usually the start of the blockgroup, so
essentially means we're constantly searching from the start of any bitmap we
find, which completely negates all the trouble we go to in order to setup a
cluster. So start using window_start to make sure we actually use the area we
found. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
A user has encountered a NULL pointer kernel oops in btrfs when
encountering media errors. The problem has been identified
as an unhandled NULL pointer returned from find_get_page().
This modification simply checks for a NULL page, and returns
with an error if found (the extent_range_uptodate() function
returns 1 on errors).
After testing this patch, the user reported that the error with
the NULL pointer oops was solved. However, there is still a
remaining problem with a thread becoming stuck in
wait_on_page_locked(page) in the read_extent_buffer_pages(...)
function in extent_io.c
for (i = start_i; i < num_pages; i++) {
page = extent_buffer_page(eb, i);
wait_on_page_locked(page);
if (!PageUptodate(page))
ret = -EIO;
}
This patch leaves the issue with the locked page yet to be resolved.
Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
wait_log_commit() and wait_for_writer() were using slightly different
conditions for deciding whether they should call schedule() and whether they
should continue in the wait loop. Thus it could happen that we busylooped when
the first condition was not true while the second one was. That is burning CPU
cycles needlessly and is deadly on UP machines...
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
We have only been checking for min_bytes available in bitmap entries, but we
won't successfully setup a bitmap cluster unless it has at least bytes in the
bitmap, so in the common case min_bytes is 4k and we want something like 2MB, so
if there are a bunch of bitmap entries with less than 2mb's in them, we'll
search all them anyway, which is suboptimal. Fix this check. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Added initialization with the declaration of ret. It isn't set later on the
switch-default branch (which should never be taken).
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Quoth Ben Myers:
"Please pull in the following bugfix for xfs. We forgot to drop a lock on
error in xfs_readlink. It hasn't been through -next yet, but there is no
-next tree tomorrow. The fix is clear so I'm sending this request today."
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: Fix missing xfs_iunlock() on error recovery path in xfs_readlink()
The data encryption was moved from ecryptfs_write_end into
ecryptfs_writepage, this patch moves the corresponding function
comments to be consistent with the modification.
Signed-off-by: Li Wang <liwang@nudt.edu.cn>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Says Tyler:
"Tim's logging message update will be really helpful to users when
they're trying to locate a problematic file in the lower filesystem
with filename encryption enabled.
You'll recognize the fix from Li, as you commented on that.
You should also be familiar with my setattr/truncate improvements,
since you were the one that pointed them out to us (thanks again!).
Andrew noted the /dev/ecryptfs write count sanitization needed to be
improved, so I've got a fix in there for that along with some other
less important cleanups of the /dev/ecryptfs read/write code."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: Fix oops when printing debug info in extent crypto functions
eCryptfs: Remove unused ecryptfs_read()
eCryptfs: Check inode changes in setattr
eCryptfs: Make truncate path killable
eCryptfs: Infinite loop due to overflow in ecryptfs_write()
eCryptfs: Replace miscdev read/write magic numbers
eCryptfs: Report errors in writes to /dev/ecryptfs
eCryptfs: Sanitize write counts of /dev/ecryptfs
ecryptfs: Remove unnecessary variable initialization
ecryptfs: Improve metadata read failure logging
MAINTAINERS: Update eCryptfs maintainer address
If pages passed to the eCryptfs extent-based crypto functions are not
mapped and the module parameter ecryptfs_verbosity=1 was specified at
loading time, a NULL pointer dereference will occur.
Note that this wouldn't happen on a production system, as you wouldn't
pass ecryptfs_verbosity=1 on a production system. It leaks private
information to the system logs and is for debugging only.
The debugging info printed in these messages is no longer very useful
and rather than doing a kmap() in these debugging paths, it will be
better to simply remove the debugging paths completely.
https://launchpad.net/bugs/913651
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Daniel DeFreez
Cc: <stable@vger.kernel.org>
ecryptfs_read() has been ifdef'ed out for years now and it was
apparently unused before then. It is time to get rid of it for good.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>