If we have enough memory to allocate a new cap release message, do so, so
that we can send a partial release message immediately. This keeps us from
making the MDS wait when the cap release it needs is in a partially full
release message.
If we fail because of ENOMEM, oh well, they'll just have to wait a bit
longer.
Signed-off-by: Sage Weil <sage@newdream.net>
If we get an IMPORT that give us a cap, but we don't have the inode, queue
a release (and try to send it immediately) so that the MDS doesn't get
stuck waiting for us.
Signed-off-by: Sage Weil <sage@newdream.net>
bdi_seq is an atomic_long_t but we're using ATOMIC_INIT, which causes
build failures on ia64. This patch fixes it to use ATOMIC_LONG_INIT.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Sage Weil <sage@newdream.net>
If the client revokes a lease with a higher seq than what we have, keep
the mds's seq, so that it honors our release. Otherwise, we can hang
indefinitely.
Signed-off-by: Sage Weil <sage@newdream.net>
We were setting f_namelen in kstatfs to PATH_MAX instead of NAME_MAX.
That disagrees with ceph_lookup behavior (which checks against NAME_MAX),
and also makes the pjd posix test suite spit out ugly errors because with
can't clean up its temporary files.
Signed-off-by: Sage Weil <sage@newdream.net>
We misused list_move_tail() to order the dentry in d_subdirs.
This will screw up the d_subdirs order.
This bug can be reliably reproduced by:
1. mount ceph fs.
2. on ceph fs, git clone git://ceph.newdream.net/git/ceph.git
3. Run autogen.sh in ceph directory.
(Note: Errors only occur at the first time you run autogen.sh.)
Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: clean up on forwarded aborted mds request
ceph: fix leak of osd authorizer
ceph: close out mds, osd connections before stopping auth
ceph: make lease code DN specific
fs/ceph: Use ERR_CAST
ceph: renew auth tickets before they expire
ceph: do not resend mon requests on auth ticket renewal
ceph: removed duplicated #includes
ceph: avoid possible null dereference
ceph: make mds requests killable, not interruptible
sched: add wait_for_completion_killable_timeout
If an mds request is aborted (timeout, SIGKILL), it is left registered to
keep our state in sync with the mds. If we get a forward notification,
though, we know the request didn't succeed and we can unregister it
safely. We were trying to resend it, but then bailing out (and not
unregistering) in __do_request.
Signed-off-by: Sage Weil <sage@newdream.net>
The auth module (part of the mon_client) is needed to free any
ceph_authorizer(s) used by the mds and osd connections. Flush the msgr
workqueue before stopping monc to ensure that the destroy_authorizer
auth op is available when those connections are closed out.
Signed-off-by: Sage Weil <sage@newdream.net>
The lease code includes a mask in the CEPH_LOCK_* namespace, but that
namespace is changing, and only one mask (formerly _DN == 1) is used, so
hard code for that value for now.
If we ever extend this code to handle leases over different data types we
can extend it accordingly.
Signed-off-by: Sage Weil <sage@newdream.net>
Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes more
clear what is the purpose of the operation, which otherwise looks like a
no-op.
In the case of fs/ceph/inode.c, ERR_CAST is not needed, because the type of
the returned value is the same as the type of the enclosing function.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
type T;
T x;
identifier f;
@@
T f (...) { <+...
- ERR_PTR(PTR_ERR(x))
+ x
...+> }
@@
expression x;
@@
- ERR_PTR(PTR_ERR(x))
+ ERR_CAST(x)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Sage Weil <sage@newdream.net>
We were only requesting renewal after our tickets expire; do so before
that. Most of the low-level logic for this was already there; just use
it.
Signed-off-by: Sage Weil <sage@newdream.net>
We only want to send pending mon requests when we successfully
authenticate. If we are already authenticated, like when we renew our
ticket, there is no need to resend pending requests.
Signed-off-by: Sage Weil <sage@newdream.net>
fs/ceph/auth.c: linux/slab.h is included more than once.
fs/ceph/super.h: linux/slab.h is included more than once.
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: Sage Weil <sage@newdream.net>
ac->ops may be null; use protocol id in error message instead.
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
The underlying problem is that many mds requests can't be restarted. For
example, a restarted create() would return -EEXIST if the original request
succeeds. However, we do not want a hung MDS to hang the client too. So,
use the _killable wait_for_completion variants to abort on SIGKILL but
nothing else.
Signed-off-by: Sage Weil <sage@newdream.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (59 commits)
ceph: reuse mon subscribe message instead of allocated anew
ceph: avoid resending queued message to monitor
ceph: Storage class should be before const qualifier
ceph: all allocation functions should get gfp_mask
ceph: specify max_bytes on readdir replies
ceph: cleanup pool op strings
ceph: Use kzalloc
ceph: use common helper for aborted dir request invalidation
ceph: cope with out of order (unsafe after safe) mds reply
ceph: save peer feature bits in connection structure
ceph: resync headers with userland
ceph: use ceph. prefix for virtual xattrs
ceph: throw out dirty caps metadata, data on session teardown
ceph: attempt mds reconnect if mds closes our session
ceph: clean up send_mds_reconnect interface
ceph: wait for mds OPEN reply to indicate reconnect success
ceph: only send cap releases when mds is OPEN|HUNG
ceph: dicard cap releases on mds restart
ceph: make mon client statfs handling more generic
ceph: drop src address(es) from message header [new protocol feature]
...
Use the same message, allocated during startup. No need to reallocate a
new one each time around (and potentially ENOMEM).
Signed-off-by: Sage Weil <sage@newdream.net>
Now that the last user passing a NULL file pointer is gone we can remove
the redundant dentry argument and associated hacks inside vfs_fsynmc_range.
The next step will be removig the dentry argument from ->fsync, but given
the luck with the last round of method prototype changes I'd rather
defer this until after the main merge window.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The auth_reply handler will (re)send any pending requests. For the
initial mon authenticate phase, that's correct, but when a auth ticket
renewal races with an in-flight request, we may resend a request message
that is already in flight. Avoid this by revoking the message before
sending it.
We should also avoid resending requests at all during ticket renewal; that
will come soon.
Signed-off-by: Sage Weil <sage@newdream.net>
The C99 specification states in section 6.11.5:
The placement of a storage-class specifier other than at the beginning
of the declaration specifiers in a declaration is an obsolescent
feature.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Sage Weil <sage@newdream.net>
This is essential, as for the rados block device we'll need
to run in different contexts that would need flags that
are other than GFP_NOFS.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Specify max bytes in request to bound size of reply. Add associated
mount option with default value of 512 KB.
Signed-off-by: Sage Weil <sage@newdream.net>
Use kzalloc rather than the combination of kmalloc and memset.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression x,size,flags;
statement S;
@@
-x = kmalloc(size,flags);
+x = kzalloc(size,flags);
if (x == NULL) S
-memset(x, 0, size);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Sage Weil <sage@newdream.net>
We invalidate I_COMPLETE and dentry leases in two places: on aborted mds
request and on request replay. Use common helper to avoid duplicate code.
Signed-off-by: Sage Weil <sage@newdream.net>
The remove_session_caps() helper is called when an MDS closes out our
session (either normally, or as a result of a failed reconnect), and when
we tear down state for umount. If we remove the last cap, and there are
no cap migrations in progress, then there is little hope of us flushing
out that data to the mds (without heroic efforts to reconnect and flush).
So, to avoid leaving inodes pinned (due to dirty state) and crashing after
umount, throw out dirty caps state and unpin the inodes. Print a warning
to the console so we know something was lost.
NOTE: Although we drop wrbuffer refs, we don't actually mark pages clean;
maybe a truncate should be queued?
Signed-off-by: Sage Weil <sage@newdream.net>
Currently, if our session is closed (due to a timeout, or explicit close,
or whatever), we just sit there doing nothing unless/until the MDS
restarts, at which point we try to reconnect.
Change client to attempt an immediate reconnect if our session is closed.
Note that currently the MDS doesn't support this, and our attempt will
fail. We'll get a session CLOSE, our caps and dirty cap state will be
dropped, and the client will be free to attempt to reconnect. That's
clearly not as nice as a successful reconnect, but it at least allows us
to try to carry on, and in the future the MDS will support a reconnect
and we will fare better.
Signed-off-by: Sage Weil <sage@newdream.net>
Pass a ceph_mds_session, since the caller has it.
Remove the dead code for sending empty reconnects. It used to be used
when the MDS contacted _us_ to solicit a reconnect, and we could reply
saying "go away, I have no session." Now we only send reconnects based
on the mds map, and only when we do in fact have an open session.
Signed-off-by: Sage Weil <sage@newdream.net>
We used to infer reconnect success by watching the MDS state, essentially
assuming that hearing nothing meant things were ok. That wasn't
particularly reliable. Instead, the MDS replies with an explicit OPEN
message to indicate success.
Strictly speaking, this is a protocol change, but it is a backwards
compatible one that does not break new clients + old servers or old
clients + new servers. At least not yet.
Drop unused @all argument from kick_requests while we're at it.
Signed-off-by: Sage Weil <sage@newdream.net>
On OPENING we shouldn't have any caps (or releases).
On CLOSING, we should wait until we succeed (and throw it all out), or
don't (and are OPEN again).
On RECONNECTING we can wait until we are OPEN.
Signed-off-by: Sage Weil <sage@newdream.net>
If the MDS restarts, the expire caps state is no longer shared, and can be
thrown out. Caps state will be rebuilt on the MDS during the reconnect
process that follows. Zero out any release messages and adjust the
release counter accordingly.
Signed-off-by: Sage Weil <sage@newdream.net>
This is being done so that we could reuse the statfs
infrastructure with other requests that return values.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
The CEPH_FEATURE_NOSRCADDR protocol feature avoids putting the full source
address in each message header (twice). This patch switches the client to
the new scheme, and _requires_ this feature on the server. The server
will support both the old and new schemes. That means an old client will
work with a new server, but a new client will not work with an old server.
Signed-off-by: Sage Weil <sage@newdream.net>
The bdi_setup_and_register() helper doesn't help us since we bdi_init() in
create_client() and bdi_register() only when sget() succeeds.
Signed-off-by: Sage Weil <sage@newdream.net>
We want to assign an offset when the dentry goes from null to linked, which
is always done by splice_dentry(). Notably, we should NOT assign an
offset when a dentry is first created and is still null.
BUG if we try to splice a non-null dentry (we shouldn't).
Signed-off-by: Sage Weil <sage@newdream.net>