linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-26 20:44:32 +08:00

Author	SHA1	Message	Date
Chuck Lever	c8c23c423d	NSM: Release nsmhandle in nlm_destroy_host The nsm_handle's reference count is bumped in nlm_lookup_host(). It should be decremented in nlm_destroy_host() to make it easier to see the balance of these two operations. Move the nsm_release() call to fs/lockd/host.c. The h_nsmhandle pointer is set in nlm_lookup_host(), and never cleared. The nlm_destroy_host() function is never called for the same nlm_host twice, so h_nsmhandle won't ever be NULL when nsm_unmonitor() is called. All references to the nlm_host are gone before it is freed. We can skip making h_nsmhandle NULL just before the nlm_host is deallocated. It's also likely we can remove the h_nsmhandle NULL check in nlmsvc_is_client() as well, but we can do that later when rearchitect- ing the nlm_host cache. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-06 11:53:52 -05:00
Chuck Lever	1e49323c4a	NLM: Move the public declaration of nsm_monitor() to lockd.h Clean up. Make the nlm_host argument "const," and move the public declaration to lockd.h with other NSM public function (nsm_release, eg) and global variable declarations. Add a documenting comment. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-06 11:53:52 -05:00
Chuck Lever	5d254b1198	NSM: Make sure to return an error if the SM_MON call result is not zero The nsm_monitor() function reports an error and does not set sm_monitored if the SM_MON upcall reply has a non-zero result code, but nsm_monitor() does not return an error to its caller in this case. Since sm_monitored is not set, the upcall is retried when the next NLM request invokes nsm_monitor(). However, that may not come for a while. In the meantime, at least one NLM request will potentially proceed without the peer being monitored properly. Have nsm_monitor() return an error if the result code is non-zero. This will cause all NLM requests to fail immediately if the upcall completed successfully but rpc.statd returned an error. This may be inconvenient in some cases (for example if rpc.statd cannot complete a proper DNS reverse lookup of the hostname), but will make the reboot monitoring service more robust by forcing such issues to be corrected by an admin. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-06 11:53:51 -05:00
Chuck Lever	5bc74bef7c	NSM: Remove BUG_ON() in nsm_monitor() Clean up: Remove the BUG_ON() invocation in nsm_monitor(). It's not likely that nsm_monitor() is ever called with a NULL host pointer, and the code will die anyway if host is NULL. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-06 11:53:51 -05:00
Chuck Lever	501c1ed3fb	NLM: Remove redundant printk() in nlmclnt_lock() The nsm_monitor() function already generates a printk(KERN_NOTICE) if the SM_MON upcall fails, so the similar printk() in the nlmclnt_lock() function is redundant. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-06 11:53:51 -05:00
Chuck Lever	9fee49024e	NSM: Use sm_name instead of h_name in nsm_monitor() and nsm_unmonitor() Clean up: Use the sm_name field for reporting the hostname in nsm_monitor() and nsm_unmonitor(), just as the other functions in fs/lockd/mon.c do. The h_name field is just a copy of the sm_name pointer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-06 11:53:51 -05:00
Chuck Lever	29ed1407ed	NSM: Support IPv6 version of mon_name The "mon_name" argument of the NSMPROC_MON and NSMPROC_UNMON upcalls is a string that contains the hostname or IP address of the remote peer to be notified when this host has rebooted. The sm-notify command uses this identifier to contact the peer when we reboot, so it must be either a well-qualified DNS hostname or a presentation format IP address string. When the "nsm_use_hostnames" sysctl is set to zero, the kernel's NSM provides a presentation format IP address in the "mon_name" argument. Otherwise, the "caller_name" argument from NLM requests is used, which is usually just the DNS hostname of the peer. To support IPv6 addresses for the mon_name argument, we use the nsm_handle's address eye-catcher, which already contains an appropriate presentation format address string. Using the eye-catcher string obviates the need to use a large buffer on the stack to form the presentation address string for the upcall. This patch also addresses a subtle bug. An NSMPROC_MON request and the subsequent NSMPROC_UNMON request for the same peer are required to use the same value for the "mon_name" argument. Otherwise, rpc.statd's NSMPROC_UNMON processing cannot locate the database entry for that peer and remove it. If the setting of nsm_use_hostnames is changed between the time the kernel sends an NSMPROC_MON request and the time it sends the NSMPROC_UNMON request for the same peer, the "mon_name" argument for these two requests may not be the same. This is because the value of "mon_name" is currently chosen at the moment the call is made based on the setting of nsm_use_hostnames To ensure both requests pass identical contents in the "mon_name" argument, we now select which string to use for the argument in the nsm_monitor() function. A pointer to this string is saved in the nsm_handle so it can be used for a subsequent NSMPROC_UNMON upcall. NB: There are other potential problems, such as how nlm_host_rebooted() might behave if nsm_use_hostnames were changed while hosts are still being monitored. This patch does not attempt to address those problems. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-06 11:53:51 -05:00
Chuck Lever	5acf43155d	NSM: convert printk(KERN_DEBUG) to a dprintk() Clean up: make the printk(KERN_DEBUG) in nsm_mon_unmon() a dprintk, and add another dprintk to note if creating an RPC client for the upcall failed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-06 11:53:50 -05:00
Chuck Lever	a4846750f0	NSM: Use C99 structure initializer to initialize nsm_args Clean up: Use a C99 structure initializer instead of open-coding the initialization of nsm_args. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-06 11:53:50 -05:00
Chuck Lever	afb03699dc	NLM: Add helper to handle IPv4 addresses Clean up: introduce a helper function to generate IPv4 addresses using the same style as the IPv6 helper function we just added. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-06 11:53:49 -05:00
Chuck Lever	bc995801a0	NLM: Support IPv6 scope IDs in nlm_display_address() Scope ID support is needed since the kernel's NSM implementation is about to use these displayed addresses as a mon_name in some cases. When nsm_use_hostnames is zero, without scope ID support NSM will fail to handle peers that contact us via a link-local address. Link-local addresses do not work without an interface ID, which is stored in the sockaddr's sin6_scope_id field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-06 11:53:49 -05:00
Chuck Lever	6999fb4016	NLM: Remove AF_UNSPEC arm in nlm_display_address() AF_UNSPEC support is no longer needed in nlm_display_address() now that a presentation address is no longer generated for the h_srcaddr field. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-06 11:53:49 -05:00
Chuck Lever	1df40b609a	NLM: Remove address eye-catcher buffers from nlm_host The h_name field in struct nlm_host is a just copy of h_nsmhandle->sm_name. Likewise, the contents of the h_addrbuf field should be identical to the sm_addrbuf field. The h_srcaddrbuf field is used only in one place for debugging. We can live without this until we get %pI formatting for printk(). Currently these buffers are 48 bytes, but we need to support scope IDs in IPv6 presentation addresses, which means making the buffers even larger. Instead, let's find ways to eliminate them to save space. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-06 11:53:49 -05:00
Jeff Layton	c72a476b4b	lockd: set svc_serv->sv_maxconn to a more reasonable value (try #3 ) The default method for calculating the number of connections allowed per RPC service arbitrarily limits single-threaded services to 80 connections. This is too low for services like lockd and artificially limits the number of TCP clients that it can support. Have lockd set a default sv_maxconn value to 1024 (which is the typical default value for RLIMIT_NOFILE. Also add a module parameter to allow an admin to set this to an arbitrary value. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-06 11:53:48 -05:00
Krishna Kumar	2bd9e7b62e	nfsd: Fix leaked memory in nfs4_make_rec_clidname cksum.data is not freed up in one error case. Compile tested. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-06 11:53:47 -05:00
Krishna Kumar	9346eff0de	nfsd: Minor cleanup of find_stateid Minor cleanup/rewrite of find_stateid. Compile tested. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-06 11:53:45 -05:00
J. Bruce Fields	b3d47676d4	nfsd: update fh_verify description Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-01-06 11:53:45 -05:00
Frederik Schwarzer	0211a9c850	trivial: fix an -> a typos in documentation and comments It is always "an" if there is a vowel _spoken_ (not written). So it is: "an hour" (spoken vowel) but "a uniform" (spoken 'j') Signed-off-by: Frederik Schwarzer <schwarzerf@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-01-06 11:28:07 +01:00
Frederik Schwarzer	025dfdafe7	trivial: fix then -> than typos in comments and documentation - (better, more, bigger ...) then -> (...) than Signed-off-by: Frederik Schwarzer <schwarzerf@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-01-06 11:28:06 +01:00
Linus Torvalds	7d8a804c59	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: fs/dlm/ast.c: fix warning dlm: add new debugfs entry dlm: add time stamp of blocking callback dlm: change lock time stamping dlm: improve how bast mode handling dlm: remove extra blocking callback check dlm: replace schedule with cond_resched dlm: remove kmap/kunmap dlm: trivial annotation of be16 value dlm: fix up memory allocation flags	2009-01-05 19:02:09 -08:00
Linus Torvalds	c54febae99	Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (27 commits) GFS2: Use DEFINE_SPINLOCK GFS2: Fix use-after-free bug on umount (try #2) Revert "GFS2: Fix use-after-free bug on umount" GFS2: Streamline alloc calculations for writes GFS2: Send useful information with uevent messages GFS2: Fix use-after-free bug on umount GFS2: Remove ancient, unused code GFS2: Move four functions from super.c GFS2: Fix bug in gfs2_lock_fs_check_clean() GFS2: Send some sensible sysfs stuff GFS2: Kill two daemons with one patch GFS2: Move gfs2_recoverd into recovery.c GFS2: Fix "truncate in progress" hang GFS2: Clean up & move gfs2_quotad GFS2: Add more detail to debugfs glock dumps GFS2: Banish struct gfs2_rgrpd_host GFS2: Move rg_free from gfs2_rgrpd_host to gfs2_rgrpd GFS2: Move rg_igeneration into struct gfs2_rgrpd GFS2: Banish struct gfs2_dinode_host GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksize ...	2009-01-05 18:52:54 -08:00
Linus Torvalds	10cc04f5a0	Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (138 commits) ocfs2: Access the right buffer_head in ocfs2_merge_rec_left. ocfs2: use min_t in ocfs2_quota_read() ocfs2: remove unneeded lvb casts ocfs2: Add xattr support checking in init_security ocfs2: alloc xattr bucket in ocfs2_xattr_set_handle ocfs2: calculate and reserve credits for xattr value in mknod ocfs2/xattr: fix credits calculation during index create ocfs2/xattr: Always updating ctime during xattr set. ocfs2/xattr: Remove extend_trans call and add its credits from the beginning ocfs2/dlm: Fix race during lockres mastery ocfs2/dlm: Fix race in adding/removing lockres' to/from the tracking list ocfs2/dlm: Hold off sending lockres drop ref message while lockres is migrating ocfs2/dlm: Clean up errors in dlm_proxy_ast_handler() ocfs2/dlm: Fix a race between migrate request and exit domain ocfs2: One more hamming code optimization. ocfs2: Another hamming code optimization. ocfs2: Don't hand-code xor in ocfs2_hamming_encode(). ocfs2: Enable metadata checksums. ocfs2: Validate superblock with checksum and ecc. ocfs2: Checksum and ECC for directory blocks. ...	2009-01-05 18:32:43 -08:00
Linus Torvalds	520c853466	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: inotify: fix type errors in interfaces fix breakage in reiserfs_new_inode() fix the treatment of jfs special inodes vfs: remove duplicate code in get_fs_type() add a vfs_fsync helper sys_execve and sys_uselib do not call into fsnotify zero i_uid/i_gid on inode allocation inode->i_op is never NULL ntfs: don't NULL i_op isofs check for NULL ->i_op in root directory is dead code affs: do not zero ->i_op kill suid bit only for regular files vfs: lseek(fd, 0, SEEK_CUR) race condition	2009-01-05 18:32:06 -08:00
Michael Kerrisk	4ae8978cf9	inotify: fix type errors in interfaces The problems lie in the types used for some inotify interfaces, both at the kernel level and at the glibc level. This mail addresses the kernel problem. I will follow up with some suggestions for glibc changes. For the sys_inotify_rm_watch() interface, the type of the 'wd' argument is currently 'u32', it should be '__s32' . That is Robert's suggestion, and is consistent with the other declarations of watch descriptors in the kernel source, in particular, the inotify_event structure in include/linux/inotify.h: struct inotify_event { __s32 wd; /* watch descriptor / __u32 mask; / watch mask / __u32 cookie; / cookie to synchronize two events / __u32 len; / length (including nulls) of name / char name[0]; / stub for possible name */ }; The patch makes the changes needed for inotify_rm_watch(). Signed-off-by: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Robert Love <rlove@google.com> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-05 11:54:29 -05:00
Al Viro	2f1169e2dc	fix breakage in reiserfs_new_inode() now that we use ih.key earlier, we need to do all its setup early enough Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-05 11:54:29 -05:00
Al Viro	5b45d96bf9	fix the treatment of jfs special inodes We used to put them on a single list, without any locking. Racy. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-05 11:54:29 -05:00
Li Zefan	d8e9650dff	vfs: remove duplicate code in get_fs_type() save 14 bytes: text data bss dec hex filename 1354 32 4 1390 56e fs/filesystems.o.before text data bss dec hex filename 1340 32 4 1376 560 fs/filesystems.o Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-05 11:54:29 -05:00
Christoph Hellwig	4c728ef583	add a vfs_fsync helper Fsync currently has a fdatawrite/fdatawait pair around the method call, and a mutex_lock/unlock of the inode mutex. All callers of fsync have to duplicate this, but we have a few and most of them don't quite get it right. This patch adds a new vfs_fsync that takes care of this. It's a little more complicated as usual as ->fsync might get a NULL file pointer and just a dentry from nfsd, but otherwise gets afile and we want to take the mapping and file operations from it when it is there. Notes on the fsync callers: - ecryptfs wasn't calling filemap_fdatawrite / filemap_fdatawait on the lower file - coda wasn't calling filemap_fdatawrite / filemap_fdatawait on the host file, and returning 0 when ->fsync was missing - shm wasn't calling either filemap_fdatawrite / filemap_fdatawait nor taking i_mutex. Now given that shared memory doesn't have disk backing not doing anything in fsync seems fine and I left it out of the vfs_fsync conversion for now, but in that case we might just not pass it through to the lower file at all but just call the no-op simple_sync_file directly. [and now actually export vfs_fsync] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-05 11:54:28 -05:00
Eric Paris	6110e3abbf	sys_execve and sys_uselib do not call into fsnotify sys_execve and sys_uselib do not call into fsnotify so inotify does not get open events for these types of syscalls. This patch simply makes the requisite fsnotify calls. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-05 11:54:28 -05:00
Al Viro	56ff5efad9	zero i_uid/i_gid on inode allocation ... and don't bother in callers. Don't bother with zeroing i_blocks, while we are at it - it's already been zeroed. i_mode is not worth the effort; it has no common default value. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-05 11:54:28 -05:00
Al Viro	acfa4380ef	inode->i_op is never NULL We used to have rather schizophrenic set of checks for NULL ->i_op even though it had been eliminated years ago. You'd need to go out of your way to set it to NULL explicitly _and_ a bunch of code would die on such inodes anyway. After killing two remaining places that still did that bogosity, all that crap can go away. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-05 11:54:28 -05:00
Al Viro	9742df331d	ntfs: don't NULL i_op it's already set to empty table (and no, ntfs doesn't have any explicit checks for NULL ->i_op or NULL ->i_fop) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-05 11:54:27 -05:00
Al Viro	261964c60f	isofs check for NULL ->i_op in root directory is dead code for one thing it never happens, for another we check that inode is a directory right after that place anyway (and we'd already checked that reading it from disk has not failed). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-05 11:53:38 -05:00
Al Viro	c765d47903	affs: do not zero ->i_op it is already set to empty table and should never be NULL Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-05 11:53:07 -05:00
Alain Knaff	5b6f1eb97d	vfs: lseek(fd, 0, SEEK_CUR) race condition This patch fixes a race condition in lseek. While it is expected that unpredictable behaviour may result while repositioning the offset of a file descriptor concurrently with reading/writing to the same file descriptor, this should not happen when merely reading the file descriptor's offset. Unfortunately, the only portable way in Unix to read a file descriptor's offset is lseek(fd, 0, SEEK_CUR); however executing this concurrently with read/write may mess up the position. [with fixes from akpm] Signed-off-by: Alain Knaff <alain@knaff.lu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-01-05 11:53:07 -05:00
Tao Ma	9047beabb8	ocfs2: Access the right buffer_head in ocfs2_merge_rec_left. In commit "ocfs2: Use metadata-specific ocfs2_journal_access_*() functions", the wrong buffer_head is accessed. So change it to the right buffer_head. Signed-off-by: Tao Ma <tao.ma@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:37 -08:00
Mark Fasheh	dad7d975e4	ocfs2: use min_t in ocfs2_quota_read() This is preferred to min(). Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:37 -08:00
Mark Fasheh	a641dc2a5a	ocfs2: remove unneeded lvb casts dlmglue.c has lots of code which casts the return value of ocfs2_dlm_lvb(). This is pointless however, as ocfs2_dlm_lvb() returns void *. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:36 -08:00
Tiger Yang	38d59ef61c	ocfs2: Add xattr support checking in init_security We must check whether ocfs2 volume support xattr in init_security, if not support xattr and security is enable, would cause failure of mknod. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:36 -08:00
Tiger Yang	008aafaf0b	ocfs2: alloc xattr bucket in ocfs2_xattr_set_handle In extreme situation, may need xattr bucket for setting security entry and acl entries during mknod. This only happens when block size is too small. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:36 -08:00
Tiger Yang	0e445b6fe9	ocfs2: calculate and reserve credits for xattr value in mknod We extend the credits for xattr's large value in set_value_outside before, this can give rise to a credits issue when we set one security entry and two acl entries duing mknod. As we remove extend_trans form set_value_outside, we must calculate and reserve the credits for xattr's large value in mknod. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:36 -08:00
Tao Ma	90cb546cad	ocfs2/xattr: fix credits calculation during index create When creating a xattr index block, the old calculation forget to add credits for the meta change of the alloc file. So add more credits and more comments to explain it. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:36 -08:00
Tao Ma	4b3f6209bf	ocfs2/xattr: Always updating ctime during xattr set. In xattr set, we should always update ctime if the operation goes sucessfully. The old one mistakenly put it in ocfs2_xattr_set_entry which is only called when we set xattr in inode or xattr block. The side benefit is that it resolve the bug 1052 since in that scenario, ocfs2_calc_xattr_set_need only calc out the xattr set credits while ocfs2_xattr_set_entry update the inode also which isn't concerned with the process of xattr set. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:36 -08:00
Tao Ma	71d548a6af	ocfs2/xattr: Remove extend_trans call and add its credits from the beginning Actually, when setting a new xattr value, we know it from the very beginning, and it isn't like the extension of bucket in which case we can't figure it out. So remove ocfs2_extend_trans in that function and calculate it before the transaction. It also relieve acl operation from the worry about the side effect of ocfs2_extend_trans. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:36 -08:00
Sunil Mushran	7b791d6856	ocfs2/dlm: Fix race during lockres mastery dlm_get_lock_resource() is supposed to return a lock resource with a proper master. If multiple concurrent threads attempt to lookup the lockres for the same lockid while the lock mastery in underway, one or more threads are likely to return a lockres without a proper master. This patch makes the threads wait in dlm_get_lock_resource() while the mastery is underway, ensuring all threads return the lockres with a proper master. This issue is known to be limited to users using the flock() syscall. For all other fs operations, the ocfs2 dlmglue layer serializes the dlm op for each lockid. Users encountering this bug will see flock() return EINVAL and dmesg have the following error: ERROR: Dlm error "DLM_BADARGS" while calling dlmlock on resource <LOCKID>: bad api args Reported-by: Coly Li <coyli@suse.de> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:35 -08:00
Sunil Mushran	b0d4f817ba	ocfs2/dlm: Fix race in adding/removing lockres' to/from the tracking list This patch adds a new lock, dlm->tracking_lock, to protect adding/removing lockres' to/from the dlm->tracking_list. We were previously using dlm->spinlock for the same, but that proved inadequate as we could be freeing a lockres from a context that did not hold that lock. As the new lock only protects this list, we can explicitly take it when removing the lockres from the tracking list. This bug was exposed when testing multiple processes concurrently flock() the same file. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:35 -08:00
Sunil Mushran	d4f7e650e5	ocfs2/dlm: Hold off sending lockres drop ref message while lockres is migrating During lockres purge, o2dlm sends a drop reference message to the lockres master. This patch delays the message if the lockres is being migrated. Fixes oss bugzilla#1012 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1012 Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:35 -08:00
Sunil Mushran	57dff2676e	ocfs2/dlm: Clean up errors in dlm_proxy_ast_handler() Patch cleans printed errors in dlm_proxy_ast_handler(). The errors now includes the node number that sent the (b)ast. Also it reduces the number of endian swaps of the cookie. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:35 -08:00
Sunil Mushran	2b83256407	ocfs2/dlm: Fix a race between migrate request and exit domain Patch address a racing migrate request message and an exit domain message. Instead of blocking exit domains for the duration of the migrate, we ignore failure to deliver that message. This is because an exiting domain should not have any active locks and thus has no role to play in the migration. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:35 -08:00
Joel Becker	58896c4d0e	ocfs2: One more hamming code optimization. The previous optimization used a fast find-highest-bit-set operation to give us a good starting point in calc_code_bit(). This version lets the caller cache the previous code buffer bit offset. Thus, the next call always starts where the last one left off. This reduces the calculation another 39%, for a total 80% reduction from the original, naive implementation. At least, on my machine. This also brings the parity calculation to within an order of magnitude of the crc32 calculation. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:35 -08:00
Joel Becker	7bb458a585	ocfs2: Another hamming code optimization. In the calc_code_bit() function, we must find all powers of two beneath the code bit number, after it's shifted by those powers of two. This requires a loop to see where it ends up. We can optimize it by starting at its most significant bit. This shaves 32% off the time, for a total of 67.6% shaved off of the original, naive implementation. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:35 -08:00
Joel Becker	e798b3f8a9	ocfs2: Don't hand-code xor in ocfs2_hamming_encode(). When I wrote ocfs2_hamming_encode(), I was following documentation of the algorithm and didn't have quite the (possibly still imperfect) grasp of it I do now. As part of this, I literally hand-coded xor. I would test a bit, and then add that bit via xor to the parity word. I can, of course, just do a single xor of the parity word and the source word (the code buffer bit offset). This cuts CPU usage by 53% on a mostly populated buffer (an inode containing utmp.h inline). Joel Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:34 -08:00
Joel Becker	9d28cfb73f	ocfs2: Enable metadata checksums. Add OCFS2_FEATURE_INCOMPAT_META_ECC to the list of supported features. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:34 -08:00
Joel Becker	d030cc978e	ocfs2: Validate superblock with checksum and ecc. The superblock is read via a raw call. Validate it after we find it from its signature. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:34 -08:00
Joel Becker	c175a518b4	ocfs2: Checksum and ECC for directory blocks. Use the db_check field of ocfs2_dir_block_trailer to crc/ecc the dirblocks. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:34 -08:00
Mark Fasheh	87d35a74b1	ocfs2: Add directory block trailers. Future ocfs2 features metaecc and indexed directories need to store a little bit of data in each dirblock. For compatibility, we place this in a trailer at the end of the dirblock. The trailer plays itself as an empty dirent, so that if the features are turned off, it can be reused without requiring a tunefs scan. This code adds the trailer and validates it when the block is read in. [ Mark is the original author, but I reinserted this code before his dir index work. -- Joel ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:34 -08:00
Joel Becker	8400897249	ocfs2: Use proper journal_access function in xattr.c Change the rest of the naked ocfs2_journal_access() calls in fs/ocfs2/xattr.c to use the appropriate ocfs2_journal_access_*() call for their metadata type. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:34 -08:00
Joel Becker	4311901daa	ocfs2: Pass value buf to ocfs2_remove_value_outside(). ocfs2_remove_value_outside() needs to know the type of buffer it is looking at. Pass in an ocfs2_xattr_value_buf. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:33 -08:00
Joel Becker	512620f44d	ocfs2: Use ocfs2_xattr_value_buf in ocfs2_xattr_set_entry(). ocfs2_xattr_set_entry is the function that knows what type of block it is setting into. This is what we wanted from ocfs2_xattr_value_buf. Plus, moving the value buf up into ocfs2_xattr_set_entry() allows us to pass it into ocfs2_xattr_set_value_outside() and ocfs2_xattr_cleanup(). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:33 -08:00
Joel Becker	0c748e9532	ocfs2: Pass value buf to ocfs2_xattr_update_entry(). ocfs2_xattr_update_entry() updates the entry portion of an xattr buffer. This can be part of multiple metadata block types, so pass the buffer in via an ocfs2_xattr_value_buf. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:33 -08:00
Joel Becker	b3e5d37905	ocfs2: Pass ocfs2_xattr_value_buf into ocfs2_xattr_value_truncate(). The callers of ocfs2_xattr_value_truncate() now pass in ocfs2_xattr_value_bufs. These callers are the ones that calculated the xv location, so they are the right starting point. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:32 -08:00
Joel Becker	19b801f45f	ocfs2: Pull ocfs2_xattr_value_buf up into ocfs2_xattr_value_truncate(). Place an ocfs2_xattr_value_buf in ocfs2_xattr_value_truncate() and pass it down to ocfs2_xattr_shrink_size(). We can also pass it into ocfs2_xattr_extend_allocation(), replacing its ocfs2_xattr_value_buf. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:32 -08:00
Joel Becker	d72cc72d57	ocfs2: Pull ocfs2_xattr_value_buf up from __ocfs2_remove_xattr_range(). Place an ocfs2_xattr_value_buf in __ocfs2_xattr_shrink_size() and pass it down to __ocfs2_remove_xattr_range(). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:32 -08:00
Joel Becker	2a50a743bd	ocfs2: Create ocfs2_xattr_value_buf. When an ocfs2 extended attribute is large enough to require its own allocation tree, we root it with an ocfs2_xattr_value_root. However, these roots can be a part of inodes, xattr blocks, or xattr buckets. Thus, they need a different journal access function for each container. We wrap the bh, its journal access function, and the value root (xv) in a structure called ocfs2_xattr_valu_buf. This is a package that can be passed around. In this first pass, we simply pass it to the extent tree code. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:32 -08:00
Joel Becker	4d0e214ee8	ocfs2: Add ecc and checksums to ocfs2 xattr buckets. The xattr bucket can span multiple blocks on disk. We have wrappers for this structure in the code. We use the new multi-block ecc calls to calculate and validate the bucket. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:32 -08:00
Joel Becker	13723d00e3	ocfs2: Use metadata-specific ocfs2_journal_access_() functions. The per-metadata-type ocfs2_journal_access_() functions hook up jbd2 commit triggers and allow us to compute metadata ecc right before the buffers are written out. This commit provides ecc for inodes, extent blocks, group descriptors, and quota blocks. It is not safe to use extened attributes and metaecc at the same time yet. The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide the type of block at their root. Before, it didn't matter, but now the root block must use the appropriate ocfs2_journal_access_*() function. To keep this abstract, the structures now have a pointer to the matching journal_access function and a wrapper call to call it. A few places use naked ocfs2_write_block() calls instead of adding the blocks to the journal. We make sure to calculate their checksum and ecc before the write. Since we pass around the journal_access functions. Let's typedef them in ocfs2.h. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:32 -08:00
Joel Becker	ffdd7a5463	ocfs2: Wrap up the common use cases of ocfs2_new_path(). The majority of ocfs2_new_path() calls are: ocfs2_new_path(path_root_bh(otherpath), path_root_el(otherpath)); Let's call that ocfs2_new_path_from_path(). The rest do similar things from struct ocfs2_extent_tree. Let's call those ocfs2_new_path_from_et(). This will make the next change easier. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:31 -08:00
Joel Becker	50655ae9e9	ocfs2: Add journal_access functions with jbd2 triggers. We create wrappers for ocfs2_journal_access() that are specific to the type of metadata block. This allows us to associate jbd2 commit triggers with the block. The triggers will compute metadata ecc in a future commit. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:31 -08:00
Joel Becker	d6b32bbb3e	ocfs2: block read meta ecc. Add block check calls to the read_block validate functions. This is the almost all of the read-side checking of metaecc. xattr buckets are not checked yet. Writes are also unchecked, and so a read-write mount will quickly fail. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:31 -08:00
Joel Becker	684ef27837	ocfs2: Add a validation hook for quota block reads. Add a currently-returns-success hook for quota block reads. We'll be adding checks to this. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:31 -08:00
Joel Becker	70ad1ba7b4	ocfs2: Add the underlying blockcheck code. This is the code that computes crc32 and ecc for ocfs2 metadata blocks. There are high-level functions that check whether the filesystem has the ecc feature, mid-level functions that work on a single block or array of buffer_heads, and the low-level ecc hamming code that can handle multiple buffers like crc32_le(). It's not hooked up to the filesystem yet. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:31 -08:00
Joel Becker	ab552d5467	ocfs2: Add the on-disk structures for metadata checksums. Define struct ocfs2_block_check, an 8-byte structure containing a 32bit crc32_le and a 16bit hamming code ecc. This will be used for metadata checksums. Add the structure to free spaces in the various metadata structures. Add the OCFS2_FEATURE_INCOMPAT_META_ECC bit. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:31 -08:00
Joel Becker	e06c8227fd	jbd2: Add buffer triggers Filesystems often to do compute intensive operation on some metadata. If this operation is repeated many times, it can be very expensive. It would be much nicer if the operation could be performed once before a buffer goes to disk. This adds triggers to jbd2 buffer heads. Just before writing a metadata buffer to the journal, jbd2 will optionally call a commit trigger associated with the buffer. If the journal is aborted, an abort trigger will be called on any dirty buffers as they are dropped from pending transactions. ocfs2 will use this feature. Initially I tried to come up with a more generic trigger that could be used for non-buffer-related events like transaction completion. It doesn't tie nicely, because the information a buffer trigger needs (specific to a journal_head) isn't the same as what a transaction trigger needs (specific to a tranaction_t or perhaps journal_t). So I implemented a buffer set, with the understanding that journal/transaction wide triggers should be implemented separately. There is only one trigger set allowed per buffer. I can't think of any reason to attach more than one set. Contrast this with a journal or transaction in which multiple places may want to watch the entire transaction separately. The trigger sets are considered static allocation from the jbd2 perspective. ocfs2 will just have one trigger set per block type, setting the same set on every bh of the same type. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:30 -08:00
Tao Ma	754938c142	ocfs2/quota: Add QUOTA in mlog_attribute. A new mlog mask has to be added into mlog_attribute before it can be really used in mlog. ML_QUOTA is only added in masklog.h, so add it to the array to enable it. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:30 -08:00
Joel Becker	91f2033fa9	ocfs2: Pass xs->bucket into ocfs2_add_new_xattr_bucket(). Pass the actual target bucket for insert through to ocfs2_add_new_xattr_bucket(). Now growing a bucket has no buffer_head knowledge. ocfs2_add_new_xattr_bucket() leavs xs->bucket in the proper state for insert. However, it doesn't update the rest of the search fields in xs, so we still have to relse() and re-find. That's OK, because everything is cached. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:30 -08:00
Joel Becker	ed29c0ca14	ocfs2: Move buckets up into ocfs2_add_new_xattr_bucket(). Lift the buckets from ocfs2_add_new_xattr_cluster() up into ocfs2_add_new_xattr_bucket(). Now ocfs2_add_new_xattr_cluster() doesn't deal with buffer_heads. In fact, we no longer have to play get_bh() tricks at all. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:30 -08:00
Joel Becker	012ee91087	ocfs2: Move buckets up into ocfs2_add_new_xattr_cluster(). Lift the buckets from ocfs2_adjust_xattr_cross_cluster() up into ocfs2_add_new_xattr_cluster(). Now ocfs2_adjust_xattr_cross_cluster() doesn't deal with buffer_heads. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:30 -08:00
Joel Becker	41cb814866	ocfs2: Pass buckets into ocfs2_mv_xattr_bucket_cross_cluster(). Now that ocfs2_adjust_xattr_cross_cluster() has buckets, it can pass them into ocfs2_mv_xattr_bucket_cross_cluster(). It no longer has to care about buffer_heads. The manipulation of first_bh and header_bh moves up to ocfs2_adjust_xattr_cross_cluster(). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:30 -08:00
Joel Becker	92cf3adf48	ocfs2: Start using buckets in ocfs2_adjust_xattr_cross_cluster(). We want to be passing around buckets instead of buffer_heads. Let's get them into ocfs2_adjust_xattr_cross_cluster. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:30 -08:00
Joel Becker	c58b6032f9	ocfs2: Use ocfs2_mv_xattr_buckets() in ocfs2_mv_xattr_bucket_cross_cluster(). Now that ocfs2_mv_xattr_buckets() can move a partial cluster's worth of buckets, ocfs2_mv_xattr_bucket_cross_cluster() can use it. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:29 -08:00
Joel Becker	54ecb6b6df	ocfs2: ocfs2_mv_xattr_buckets() can handle a partial cluster now. If you look at ocfs2_mv_xattr_bucket_cross_cluster(), you'll notice that two-thirds of the code is almost identical to ocfs2_mv_xattr_buckets(). The only difference is that ocfs2_mv_xattr_buckets() moves a whole cluster's worth, while ocfs2_mv_xattr_bucket_cross_cluster() moves half the cluster. We change ocfs2_mv_xattr_buckets() to allow moving partial clusters. The original caller of ocfs2_mv_xattr_buckets() still moves the whole cluster's worth - it just passes a start_bucket of 0. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:29 -08:00
Joel Becker	874d65af1c	ocfs2: Rename ocfs2_cp_xattr_cluster() to ocfs2_mv_xattr_buckets(). ocfs2_cp_xattr_cluster() takes the last cluster of an xattr extent, copies its buckets to the front of a new extent, and then shrinks the bucket count of the original extent. So it's really moving the data, not copying it. While we're here, the function doesn't need a buffer_head for the old extent, just the block number. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:29 -08:00
Joel Becker	b5c03e7469	ocfs2: Use ocfs2_cp_xattr_bucket() in ocfs2_mv_xattr_bucket_cross_cluster(). The buffer copy loop of ocfs2_mv_xattr_bucket_cross_cluster() actually looks a lot like ocfs2_cp_xattr_bucket(). Let's just use that instead. We also use bucket operations to update the buckets at the start of each extent. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:27 -08:00
Joel Becker	2b656c1d6f	ocfs2: Explain t_is_new in ocfs2_cp_xattr_cluster(). I was unsure of the JOURNAL_ACCESS parameters in ocfs2_cp_xattr_cluster(). They're based on the function argument 't_is_new', but I couldn't quite figure out how t_is_new mapped to allocation. ocfs2_cp_xattr_cluster() actually overwrites the target, regardless of t_is_new. Well, I just figured it out. So I'm adding a big fat comment for those who come after me. ocfs2_divide_xattr_cluster() has the same behavior. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:27 -08:00
Joel Becker	15d609293d	ocfs2: Dirty the entire first bucket in ocfs2_cp_xattr_cluster(). ocfs2_cp_xattr_cluster() takes the last bucket of a full extent and copies it over to a new extent. It then updates the headers of both extents to reflect the new state. It is passed the first bh of the first bucket in order to update that first extent's bucket count. It reads and dirties the first bh of the new extent for the same reason. However, future code wants to always dirty the entire bucket when it is changed. So it is changed to read the entire bucket it is updating for both extents. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:27 -08:00
Joel Becker	92de109ade	ocfs2: Dirty the entire first bucket in ocfs2_extend_xattr_bucket() ocfs2_extend_xattr_bucket() takes an extent of buckets and shifts some of them down to make room for a new xattr. It is passed the first bh of the first bucket, because that is where we store the number of buckets in the extent. However, future code wants to always dirty the entire bucket when it is changed. So let's pass the entire bucket into this function, skip any block reads (we have them), and add the access/dirty logic. We also can skip passing in the target bucket bh - we only need its block number. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:26 -08:00
Tao Ma	88c3b0622a	ocfs2: Narrow the transaction for deleting xattrs from a bucket. We move the transaction into the loop because in ocfs2_remove_extent, we will double the credits in function ocfs2_extend_rotate_transaction. So if we have a large loop number, we will soon waste much the journal space. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:26 -08:00
Joel Becker	548b0f22bb	ocfs2: Dirty the entire bucket in ocfs2_bucket_value_truncate() ocfs2_bucket_value_truncate() currently takes the first bh of the bucket, and magically plays around with the value bh - even though the bucket structure in the calling function already has it. In addition, future code wants to always dirty the entire bucket when it is changed. So let's pass the entire bucket into this function, skip any block reads (we have them), and add the access/dirty logic. ocfs2_xattr_update_value_size() is no longer necessary, as it only did one thing other than journal access/dirty. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:26 -08:00
Tao Ma	df32b3343a	ocfs2/quota: sparse fixes for quota Fix 2 minor things in quota. They are both found by sparse check. 1. an endian bug in ocfs2_local_quota_add_chunk. 2. change olq_alloc_dquot to static. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:26 -08:00
Tao Ma	e35ff98f7c	ocfs2: fix indendation in ocfs2_dquot_drop_slow Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:26 -08:00
Jan Kara	a5b5ee3201	ext4: Add default allocation routines for quota structures Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:26 -08:00
Jan Kara	157091a2c3	ext3: Add default allocation routines for quota structures Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:25 -08:00
Jan Kara	4103003b3a	reiserfs: Add default allocation routines for quota structures Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:25 -08:00
Jan Kara	7d9056ba20	quota: Export dquot_alloc() and dquot_destroy() functions These are default functions for creating and destroying quota structures and they should be used from filesystems. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:25 -08:00
Jan Kara	9a2f3866c8	ocfs2: Fix build warnings (64-bit types vs long long) fs/ocfs2/quota_local.c: In function 'olq_set_dquot': fs/ocfs2/quota_local.c:844: warning: format '%lld' expects type 'long long int', but argument 7 has type '__le64' fs/ocfs2/quota_local.c:844: warning: format '%lld' expects type 'long long int', but argument 8 has type '__le64' fs/ocfs2/quota_local.c:844: warning: format '%lld' expects type 'long long int', but argument 7 has type '__le64' fs/ocfs2/quota_local.c:844: warning: format '%lld' expects type 'long long int', but argument 8 has type '__le64' fs/ocfs2/quota_local.c:844: warning: format '%lld' expects type 'long long int', but argument 7 has type '__le64' fs/ocfs2/quota_local.c:844: warning: format '%lld' expects type 'long long int', but argument 8 has type '__le64' fs/ocfs2/quota_global.c: In function '__ocfs2_sync_dquot': fs/ocfs2/quota_global.c:457: warning: format '%lld' expects type 'long long int', but argument 8 has type 's64' fs/ocfs2/quota_global.c:457: warning: format '%lld' expects type 'long long int', but argument 10 has type 's64' fs/ocfs2/quota_global.c:457: warning: format '%lld' expects type 'long long int', but argument 8 has type 's64' fs/ocfs2/quota_global.c:457: warning: format '%lld' expects type 'long long int', but argument 10 has type 's64' fs/ocfs2/quota_global.c:457: warning: format '%lld' expects type 'long long int', but argument 8 has type 's64' fs/ocfs2/quota_global.c:457: warning: format '%lld' expects type 'long long int', but argument 10 has type 's64' Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:25 -08:00
Jan Kara	53a3604610	ocfs2: Make ocfs2_get_quota_block() consistent with ocfs2_read_quota_block() Make function return error status and not buffer pointer so that it's consistent with ocfs2_read_quota_block(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:25 -08:00
Jan Kara	af09e51b68	ocfs2: Fix oops when extending quota files We have to mark buffer as uptodate before calling ocfs2_journal_access() and ocfs2_set_buffer_uptodate() does not do this for us. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:25 -08:00
Joel Becker	85eb8b73d6	ocfs2: Fix ocfs2_read_quota_block() error handling. ocfs2_bread() has become ocfs2_read_virt_blocks(), with a prototype to match ocfs2_read_blocks(). The quota code, converting from ocfs2_bread(), wraps the call to ocfs2_read_virt_blocks() in ocfs2_read_quota_block(). Unfortunately, the prototype of ocfs2_read_quota_block() matches the old prototype of ocfs2_bread(). The problem is that ocfs2_bread() returned the buffer head, and callers assumed that a NULL pointer was indicative of error. It wasn't. This is why ocfs2_bread() took an int*err argument as well. The new prototype of ocfs2_read_virt_blocks() avoids this error handling confusion. Let's change ocfs2_read_quota_block() to match. Signed-off-by: Joel Becker <joel.becker@oracle.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:24 -08:00
Jan Kara	57a09a7b3d	ocfs2: Add missing initialization Add missing variable initialization to ocfs2_dquot_drop_slow(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:24 -08:00
Mark Fasheh	b86c86fa1f	ocfs2: Use BH_JBDPrivateStart instead of BH_Unshadow This is safer. We no longer have to worry about tracking changes to jbd_state_bits. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:24 -08:00
Jan Kara	19ece546a4	ocfs2: Enable quota accounting on mount, disable on umount Enable quota usage tracking on mount and disable it on umount. Also add support for quota on and quota off quotactls and usrquota and grpquota mount options. Add quota features among supported ones. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:24 -08:00
Jan Kara	2205363dce	ocfs2: Implement quota recovery Implement functions for recovery after a crash. Functions just read local quota file and sync info to global quota file. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:24 -08:00
Mark Fasheh	171bf93ce1	ocfs2: Periodic quota syncing This patch creates a work queue for periodic syncing of locally cached quota information to the global quota files. We constantly queue a delayed work item, to get the periodic behavior. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Jan Kara <jack@suse.cz>	2009-01-05 08:40:24 -08:00
Jan Kara	a90714c150	ocfs2: Add quota calls for allocation and freeing of inodes and space Add quota calls for allocation and freeing of inodes and space, also update estimates on number of needed credits for a transaction. Move out inode allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called outside of a transaction. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:23 -08:00
Jan Kara	9e33d69f55	ocfs2: Implementation of local and global quota file handling For each quota type each node has local quota file. In this file it stores changes users have made to disk usage via this node. Once in a while this information is synced to global file (and thus with other nodes) so that limits enforcement at least aproximately works. Global quota files contain all the information about usage and limits. It's mostly handled by the generic VFS code (which implements a trie of structures inside a quota file). We only have to provide functions to convert structures from on-disk format to in-memory one. We also have to provide wrappers for various quota functions starting transactions and acquiring necessary cluster locks before the actual IO is really started. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:23 -08:00
Jan Kara	bbbd0eb34b	ocfs2: Mark system files as not subject to quota accounting Mark system files as not subject to quota accounting. This prevents possible recursions into quota code and thus deadlocks. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:23 -08:00
Jan Kara	1a224ad11e	ocfs2: Assign feature bits and system inodes to quota feature and quota files Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:23 -08:00
Jan Kara	90e86a63ea	ocfs2: Support nested transactions OCFS2 can easily support nested transactions. We just have to take care and not spoil statistics acquire semaphore unnecessarily. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:23 -08:00
Jan Kara	12c77527e4	quota: Implement function for scanning active dquots OCFS2 needs to scan all active dquots once in a while and sync quota information among cluster nodes. Provide a helper function for it so that it does not have to reimplement internally a list which VFS already has. Moreover this function is probably going to be useful for other clustered filesystems if they decide to use VFS quotas. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:23 -08:00
Jan Kara	3d9ea253a0	quota: Add helpers to allow ocfs2 specific quota initialization, freeing and recovery OCFS2 needs to peek whether quota structure is already in memory so that it can avoid expensive cluster locking in that case. Similarly when freeing dquots, it checks whether it is the last quota structure user or not. Finally, it needs to get reference to dquot structure for specified id and quota type when recovering quota file after crash. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:22 -08:00
Jan Kara	4d59bce4f9	quota: Keep which entries were set by SETQUOTA quotactl Quota in a clustered environment needs to synchronize quota information among cluster nodes. This means we have to occasionally update some information in dquot from disk / network. On the other hand we have to be careful not to overwrite changes administrator did via SETQUOTA. So indicate in dquot->dq_flags which entries have been set by SETQUOTA and quota format can clear these flags when it properly propagated the changes. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:22 -08:00
Jan Kara	db49d2df48	quota: Allow negative usage of space and inodes For clustered filesystems, it can happen that space / inode usage goes negative temporarily (because some node is allocating another node is freeing and they are not completely in sync). So let quota code allow this and change qsize_t so a signed type so that we don't underflow the variables. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:21 -08:00
Jan Kara	e3d4d56b97	quota: Convert union in mem_dqinfo to a pointer Coming quota support for OCFS2 is going to need quite a bit of additional per-sb quota information. Moreover having fs.h include all the types needed for this structure would be a pain in the a**. So remove the union from mem_dqinfo and add a private pointer for filesystem's use. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:21 -08:00
Jan Kara	1ccd14b9c2	quota: Split off quota tree handling into a separate file There is going to be a new version of quota format having 64-bit quota limits and a new quota format for OCFS2. They are both going to use the same tree structure as VFSv0 quota format. So split out tree handling into a separate file and make size of leaf blocks, amount of space usable in each block (needed for checksumming) and structures contained in them configurable so that the code can be shared. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:40:21 -08:00
Jan Kara	cf770c1371	quota: Move quotaio_v[12].h from include/linux/ to fs/ Since these include files are used only by implementation of quota formats, there's no need to have them in include/linux/. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:58 -08:00
Jan Kara	ca785ec66b	quota: Introduce DQUOT_QUOTA_SYS_FILE flag If filesystem can handle quota files as system files hidden from users, we can skip a lot of cache invalidation, syncing, inode flags setting etc. when turning quotas on, off and quota_sync. Allow filesystem to indicate that it is hiding quota files from users by DQUOT_QUOTA_SYS_FILE flag. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:57 -08:00
Jan Kara	6929f89124	reiserfs: Use sb_any_quota_loaded() instead of sb_any_quota_enabled(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:56 -08:00
Jan Kara	17bd13b31c	ext4: Use sb_any_quota_loaded() instead of sb_any_quota_enabled() Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:56 -08:00
Jan Kara	ee0d5ffe0d	ext3: Use sb_any_quota_loaded() instead of sb_any_quota_enabled() Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:56 -08:00
Jan Kara	f55abc0fb9	quota: Allow to separately enable quota accounting and enforcing limits Split DQUOT_USR_ENABLED (and DQUOT_GRP_ENABLED) into DQUOT_USR_USAGE_ENABLED and DQUOT_USR_LIMITS_ENABLED. This way we are able to separately enable / disable whether we should: 1) ignore quotas completely 2) just keep uptodate information about usage 3) actually enforce quota limits This is going to be useful when quota is treated as filesystem metadata - we then want to keep quota information uptodate all the time and just enable / disable limits enforcement. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:56 -08:00
Jan Kara	e4bc7b4b7f	quota: Make _SUSPENDED just a flag Upto now, DQUOT_USR_SUSPENDED behaved like a state - i.e., either quota was enabled or suspended or none. Now allowed states are 0, ENABLED, ENABLED \| SUSPENDED. This will be useful later when we implement separate enabling of quota usage tracking and limits enforcement because we need to keep track of a state which has been suspended. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:56 -08:00
Jan Kara	1497d3ad48	quota: Remove bogus 'optimization' in check_idq() and check_bdq() Checks like <= 0 for an unsigned type do not make much sence. The value could be only 0 and that does not happen often enough for the check to be worth it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:56 -08:00
Jan Kara	12095460f7	quota: Increase size of variables for limits and inode usage So far quota was fine with quota block limits and inode limits/numbers in a 32-bit type. Now with rapid increase in storage sizes there are coming requests to be able to handle quota limits above 4TB / more that 2^32 inodes. So bump up sizes of types in mem_dqblk structure to 64-bits to be able to handle this. Also update inode allocation / checking functions to use qsize_t and make global structure keep quota limits in bytes so that things are consistent. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:55 -08:00
Jan Kara	74f783af95	quota: Add callbacks for allocating and destroying dquot structures Some filesystems would like to keep private information together with each dquot. Add callbacks alloc_dquot and destroy_dquot allowing filesystem to allocate larger dquots from their private slab in a similar fashion we currently allocate inodes. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:55 -08:00
Tao Ma	9f868f16e4	ocfs2/xattr: Restore not_found in xis During an xattr set, when we move a xattr which was stored in inode to the outside bucket, we have to delete it and it will use the old value of xis->not_found. xis->not_found is removed by ocfs2_calc_xattr_set_need though, so we must restore it. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:55 -08:00
Tao Ma	97aff52ae1	ocfs2/xattr: Fix a bug in xattr allocation estimation When we extend one xattr's value to a large size, the old value size might be smaller than the size of a value root. In those cases, we still need to guess the metadata allocation. Reported-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:55 -08:00
Mark Fasheh	53ef99cad9	ocfs2: Remove JBD compatibility layer JBD2 is fully backwards compatible with JBD and it's been tested enough with Ocfs2 that we can clean this code up now. Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:55 -08:00
Joel Becker	511308d90b	ocfs2: Convert ocfs2_read_dir_block() to ocfs2_read_virt_blocks() Now that we've centralized the ocfs2_read_virt_blocks() code, let's use it in ocfs2_read_dir_block(). Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:55 -08:00
Joel Becker	a8549fb5ab	ocfs2: Wrap virtual block reads in ocfs2_read_virt_blocks() The ocfs2_read_dir_block() function really maps an inode's virtual blocks to physical ones before calling ocfs2_read_blocks(). Let's extract that to common code, because other places might want to do that. Other than the block number being virtual, ocfs2_read_virt_blocks() takes the same arguments as ocfs2_read_blocks(). It converts those virtual block numbers to physical before calling ocfs2_read_blocks() directly. If the blocks asked for are discontiguous, this can mean multiple calls to ocfs2_read_blocks(), but this is mostly hidden from the caller. Like ocfs2_read_blocks(), the caller can pass in an existing buffer_head. This is usually done to pick up some readahead I/O. ocfs2_read_virt_blocks() checks the buffer_head's block number against the extent map - it must match. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:54 -08:00
Joel Becker	970e4936d7	ocfs2: Validate metadata only when it's read from disk. Add an optional validation hook to ocfs2_read_blocks(). Now the validation function is only called when a block was actually read off of disk. It is not called when the buffer was in cache. We add a buffer state bit BH_NeedsValidate to flag these buffers. It must always be one higher than the last JBD2 buffer state bit. The dinode, dirblock, extent_block, and xattr_block validators are lifted to this scheme directly. The group_descriptor validator needs to be split into two pieces. The first part only needs the gd buffer and is passed to ocfs2_read_block(). The second part requires the dinode as well, and is called every time. It's only 3 compares, so it's tiny. This also allows us to clean up the non-fatal gd check used by resize.c. It now has no magic argument. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:53 -08:00
Joel Becker	4ae1d69bed	ocfs2: Wrap xattr block reads in a dedicated function We weren't consistently checking xattr blocks after we read them. Most places checked the signature, but none checked xb_blkno or xb_fs_signature. Create a toplevel ocfs2_read_xattr_block() that does the read and the validation. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:53 -08:00
Joel Becker	a22305cc69	ocfs2: Wrap dirblock reads in a dedicated function. We have ocfs2_bread() as a vestige of the original ext-based dir code. It's only used by directories, though. Turn it into ocfs2_read_dir_block(), with a prototype matching the other metadata read functions. It's set up to validate dirblocks when the time comes. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:53 -08:00
Joel Becker	5e96581a37	ocfs2: Wrap extent block reads in a dedicated function. We weren't consistently checking extent blocks after we read them. Most places checked the signature, but none checked h_blkno or h_fs_signature. Create a toplevel ocfs2_read_extent_block() that does the read and the validation. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:53 -08:00
Joel Becker	4203530613	ocfs2: Morph the haphazard OCFS2_IS_VALID_GROUP_DESC() checks. Random places in the code would check a group descriptor bh to see if it was valid. The previous commit unified descriptor block reads, validating all block reads in the same place. Thus, these checks are no longer necessary. Rather than eliminate them, however, we change them to BUG_ON() checks. This ensures the assumptions remain true. All of the code paths to these checks have been audited to ensure they come from a validated descriptor read. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:53 -08:00
Joel Becker	68f64d471b	ocfs2: Wrap group descriptor reads in a dedicated function. We have a clean call for validating group descriptors, but every place that wants the always does a read_block()+validate() call pair. Create a toplevel ocfs2_read_group_descriptor() that does the right thing. This allows us to leverage the single call point later for fancier handling. We also add validation of gd->bg_generation against the superblock and gd->bg_blkno against the block we thought we read. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:53 -08:00
Joel Becker	57e3e79711	ocfs2: Consolidate validation of group descriptors. Currently the validation of group descriptors is directly duplicated so that one version can error the filesystem and the other (resize) can just report the problem. Consolidate to one function that takes a boolean. Wrap that function with the old call for the old users. This is in preparation for lifting the read+validate step into a single function. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:53 -08:00
Joel Becker	10995aa245	ocfs2: Morph the haphazard OCFS2_IS_VALID_DINODE() checks. Random places in the code would check a dinode bh to see if it was valid. Not only did they do different levels of validation, they handled errors in different ways. The previous commit unified inode block reads, validating all block reads in the same place. Thus, these haphazard checks are no longer necessary. Rather than eliminate them, however, we change them to BUG_ON() checks. This ensures the assumptions remain true. All of the code paths to these checks have been audited to ensure they come from a validated inode read. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:52 -08:00
Joel Becker	b657c95c11	ocfs2: Wrap inode block reads in a dedicated function. The ocfs2 code currently reads inodes off disk with a simple ocfs2_read_block() call. Each place that does this has a different set of sanity checks it performs. Some check only the signature. A couple validate the block number (the block read vs di->i_blkno). A couple others check for VALID_FL. Only one place validates i_fs_generation. A couple check nothing. Even when an error is found, they don't all do the same thing. We wrap inode reading into ocfs2_read_inode_block(). This will validate all the above fields, going readonly if they are invalid (they never should be). ocfs2_read_inode_block_full() is provided for the places that want to pass read_block flags. Every caller is passing a struct inode with a valid ip_blkno, so we don't need a separate blkno argument either. We will remove the validation checks from the rest of the code in a later commit, as they are no longer necessary. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:52 -08:00
Tiger Yang	a68979b857	ocfs2: add mount option and Kconfig option for acl This patch adds the Kconfig option "CONFIG_OCFS2_FS_POSIX_ACL" and mount options "acl" to enable acls in Ocfs2. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:36:52 -08:00
Tiger Yang	89c38bd0ad	ocfs2: add ocfs2_init_acl in mknod We need to get the parent directories acls and let the new child inherit it. To this, we add additional calculations for data/metadata allocation. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:34:20 -08:00
Tiger Yang	060bc66dd5	ocfs2: add ocfs2_acl_chmod This function is used to update acl xattrs during file mode changes. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:34:20 -08:00
Tiger Yang	23fc2702be	ocfs2: add ocfs2_check_acl This function is used to enhance permission checking with POSIX ACLs. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:34:20 -08:00
Tiger Yang	929fb014e0	ocfs2: add POSIX ACL API This patch adds POSIX ACL(access control lists) APIs in ocfs2. We convert struct posix_acl to many ocfs2_acl_entry and regard them as an extended attribute entry. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:34:20 -08:00
Tiger Yang	4e3e9d027f	ocfs2: add ocfs2_xattr_get_nolock This function does the work of ocfs2_xattr_get under an open lock. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:34:20 -08:00
Tiger Yang	534eadddc1	ocfs2: add ocfs2_init_security in during file create Security attributes must be set when creating a new inode. We do this in three steps. - First, get security xattr's name and value by security_operation - Calculate and reserve the meta data and clusters needed by this security xattr before starting transaction - Finally, we set it before add_entry Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:34:20 -08:00
Tiger Yang	923f7f3102	ocfs2: add security xattr API This patch add security xattr set/get/list APIs to support security attributes in Ocfs2. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:34:20 -08:00
Tiger Yang	6c3faba442	ocfs2: add ocfs2_xattr_set_handle This function is used to set xattr's in a started transaction. It is only called during inode creation inode for initial security/acl xattrs of the new inode. These xattrs could be put into ibody or extent block, so xattr bucket would not be use in this case. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:34:19 -08:00
Tiger Yang	f5d362022a	ocfs2: move new inode allocation out of the transaction Move out inode allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called outside of a transaction. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:34:19 -08:00
Mark Fasheh	fecc01126d	ocfs2: turn __ocfs2_remove_inode_range() into ocfs2_remove_btree_range() This patch genericizes the high level handling of extent removal. ocfs2_remove_btree_range() is nearly identical to __ocfs2_remove_inode_range(), except that extent tree operations have been used where necessary. We update ocfs2_remove_inode_range() to use the generic helper. Now extent tree based structures have an easy way to truncate ranges. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>	2009-01-05 08:34:19 -08:00
Tao Ma	85db90e778	ocfs2/xattr: Merge xattr set transaction. In current ocfs2/xattr, the whole xattr set is divided into many steps are many transaction are used, this make the xattr set process isn't like a real transaction, so this patch try to merge all the transaction into one. Another benefit is that acl can use it easily now. I don't merge the transaction of deleting xattr when we remove an inode. The reason is that if we have a large number of xattrs and every xattrs has large values(large enough for outside storage), the whole transaction will be very huge and it looks like jbd can't handle it(I meet with a jbd complain once). And the old inode removal is also divided into many steps, so I'd like to leave as it is. Note: In xattr set, I try to avoid ocfs2_extend_trans since if the credits aren't enough for the extension, it will commit all the dirty blocks and create a new transaction which may lead to inconsistency in metadata. All ocfs2_extend_trans remained are safe now. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2009-01-05 08:34:19 -08:00

1 2 3 4 5 ...

11526 Commits