Commit Graph

507 Commits

Author SHA1 Message Date
Artur Świgoń
c42293a951 orangefs: Add octal zero prefix
This patch adds a missing zero to mode 755 specification required to
express it in octal numeral system.

Reported-by: Łukasz Wrochna <l.wrochna@samsung.com>
Signed-off-by: Artur Świgoń <a.swigon@partner.samsung.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-09-12 14:17:16 -04:00
Mauro Carvalho Chehab
25b532cec5 docs: fs: convert porting to ReST
This file has its own proper style, except that, after a while,
the coding style gets violated and whitespaces are placed on
different ways.

As Sphinx and ReST are very sentitive to whitespace differences,
I had to opt if each entry after required/mandatory/... fields
should start with zero spaces or with a tab. I opted to start them
all from the zero position, in order to avoid needing to break lines
with more than 80 columns, with would make harder for review.

Most of the other changes at porting.rst were made to use an unified
notation with works nice as a text file while also produce a good html
output after being parsed.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-07-31 13:31:10 -06:00
Mauro Carvalho Chehab
ec23eb54fb docs: fs: convert docs without extension to ReST
There are 3 remaining files without an extension inside the fs docs
dir.

Manually convert them to ReST.

In the case of the nfs/exporting.rst file, as the nfs docs
aren't ported yet, I opted to convert and add a :orphan: there,
with should be removed when it gets added into a nfs-specific
part of the fs documentation.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-07-31 13:31:05 -06:00
Linus Torvalds
0a8ad0ffa4 orangefs: This simple pull request is just a fix for an
Unused Value that colin.king@canonical.com sent me and a
 related fix I added.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJdLflkAAoJEM9EDqnrzg2+XEIQAImX96e5A/gDSAFhL23cIW73
 2mAeCjwvpHk9hmLESpek3NfKH9fBkKS07bv0ICr1P4vSW2KIecbyc6Megern8w6H
 WuY0WSELCDy207zqHf1BeGwEMC8Z/MadVSP/2wed7E0+AppcsPeGhe5yFlD8MKYq
 sFBoZ3tQ88iujNqlORlPy7bWJzsx3a9MYznDlbeIVY7gsgnFrgyHCoY+cpZTG9fr
 ZEQmyw2B7NWwzu438i3Qk+MDiztq4ldwhsO8HFt3+qTBtfUzmb7upWFMM1AUWQxd
 3XIYZiCXgPmfIsQQJE5HPeE10zrMNVRYu6CjeUZPl11iy+/OKcY/UVQCwNxbQFAc
 ZBFk6OIj9wZsMWAgnlp6zUUJ/s//OudnuSbbHlj3VDVEa+aZSJawgoIsauCyyZc/
 5/sDpKcjb5Z4O5Sd+BkT8Zg0iAszBx7aJSqj/ggPvqxt7UQ9dF97AgYdZYfav/br
 xL2O9KNfWGHj+qUV8qmQXVMJZWPrHZjQMPqA4Oi5CxmW+eF1kxr2mw3bU7suHXKm
 CA7fRECsiyo2ULIl3wT12cOrpuFRQegkUW6AuBq5Z2JNw+H15JCbPZVMllfj7xye
 TAxXXp9atcyRpCOrtB3dRO0itnGDrWx4WAxaKR127IeqKAiP2ZsPceu0mILamPno
 ekJLcwC8kwbUDrNHR170
 =PJuy
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.3-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs updates from Mike Marshall:
 "Two small fixes.

  This is just a fix for an unused value that Colin King sent me and a
  related fix I added"

* tag 'for-linus-5.3-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: eliminate needless variable assignments
  orangefs: remove redundant assignment to variable buffer_index
2019-07-16 15:15:29 -07:00
Linus Torvalds
5010fe9f09 New for 5.3:
- Standardize parameter checking for the SETFLAGS and FSSETXATTR ioctls
   (which were the file attribute setters for ext4 and xfs and have now
   been hoisted to the vfs)
 - Only allow the DAX flag to be set on files and directories.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl0aJgMACgkQ+H93GTRK
 tOuKkg//SJaxcB63uVPZk9hDraYTmyo9OXRRX6X9WwDKPTWwa88CUwS1ny1QF7Mt
 zMkgzG2/y2Rs9PQ0ARoPbh1hNb2CXnvA+xnzUEev1MW6UN/nTFMZEOPn2ZQ+DxQE
 gg/0U56kKgtjtXzBZVpTgHzSETivdXwHxFW3hiTtyRXg+4ulgDIZLOjN2wRB+Pdb
 X8ZmM6MqKOTbhQEXlw13TlCKBzoMjC1w4UU4rkZPjoSjAaUWiPfrk/XU7qgguf9p
 v1dbSN2dADQ19jzZ1dmggXnlJsRMZjk/ls5rxJlB5DHDbh6YgnA2TE+tYrtH28eB
 uyKfD+RQnMzRVdmH8PsMQRQQFXR2UYyprVP7a6wi6TkB+gytn7sR5uT4sbAhmhcF
 TiTYfYNRXzemHCewyOwOsUE/7oCeiJcdbqiPAHHD/jYLZfRjSXDcGzz3+7ZYZ3GO
 hRxUhpxHPbkmK4T2OxhzReCbRsLN/0BeEcDdLkNWmi2FTh3V1gYzMGkgI9wsVbsd
 pHjoGIHbMPWqktF/obuGq96WVfYBBaWJ6WNzQqKT4dQYAJBW2omxitXQHLpi6cjt
 hG5ncxa3cPpWx4t3Lx2hb0TPS7RyYvuoQIcS/Me2RWioxrwWrgnOqdHFfLEwWpfN
 jRowdWiGgOIsq8hMt7qycmGCXzbgsbaA/7oRqh8TiwM9taPOM4c=
 =uH2E
 -----END PGP SIGNATURE-----

Merge tag 'vfs-fix-ioctl-checking-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull common SETFLAGS/FSSETXATTR parameter checking from Darrick Wong:
 "Here's a patch series that sets up common parameter checking functions
  for the FS_IOC_SETFLAGS and FS_IOC_FSSETXATTR ioctl implementations.

  The goal here is to reduce the amount of behaviorial variance between
  the filesystems where those ioctls originated (ext2 and XFS,
  respectively) and everybody else.

   - Standardize parameter checking for the SETFLAGS and FSSETXATTR
     ioctls (which were the file attribute setters for ext4 and xfs and
     have now been hoisted to the vfs)

   - Only allow the DAX flag to be set on files and directories"

* tag 'vfs-fix-ioctl-checking-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  vfs: only allow FSSETXATTR to set DAX flag on files and dirs
  vfs: teach vfs_ioc_fssetxattr_check to check extent size hints
  vfs: teach vfs_ioc_fssetxattr_check to check project id info
  vfs: create a generic checking function for FS_IOC_FSSETXATTR
  vfs: create a generic checking and prep function for FS_IOC_SETFLAGS
2019-07-12 16:54:37 -07:00
Mike Marshall
e65682b559 orangefs: eliminate needless variable assignments
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-07-11 12:53:02 -04:00
Colin Ian King
f10789e4f6 orangefs: remove redundant assignment to variable buffer_index
The variable buffer_index is being initialized however this is never
read and later it is being reassigned to a new value. The initialization
is redundant and hence can be removed.

Addresses-Coverity: ("Unused Value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-07-11 12:52:37 -04:00
Greg Kroah-Hartman
0979cf95d2 orangefs: fix build warning from debugfs cleanup patch
Stephen writes:
	After merging the driver-core tree, today's linux-next build (x86_64
	allmodconfig) produced this warning:

	fs/orangefs/orangefs-debugfs.c: In function 'orangefs_debugfs_init':
	fs/orangefs/orangefs-debugfs.c:193:1: warning: label 'out' defined but not used [-Wunused-label]
	 out:
	 ^~~
	fs/orangefs/orangefs-debugfs.c: In function 'orangefs_kernel_debug_init':
	fs/orangefs/orangefs-debugfs.c:204:17: warning: unused variable 'ret' [-Wunused-variable]
	  struct dentry *ret;
	                 ^~~
Fix this up and change the return type of the function to void as it can
not fail, which cleans up some more code and variables as well.

Cc: Mike Marshall <hubcap@omnibond.com>
Cc: Martin Brandenburg <martin@omnibond.com>
Cc: devel@lists.orangefs.org
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: f095adba36 ("orangefs: no need to check return value of debugfs_create functions")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-04 10:30:33 +02:00
Greg Kroah-Hartman
f095adba36 orangefs: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Cc: Mike Marshall <hubcap@omnibond.com>
Cc: Martin Brandenburg <martin@omnibond.com>
Cc: devel@lists.orangefs.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20190612152204.GA17511@kroah.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-03 16:57:17 +02:00
Darrick J. Wong
5aca284210 vfs: create a generic checking and prep function for FS_IOC_SETFLAGS
Create a generic function to check incoming FS_IOC_SETFLAGS flag values
and later prepare the inode for updates so that we can standardize the
implementations that follow ext4's flag values.

Note that the efivarfs implementation no longer fails a no-op SETFLAGS
without CAP_LINUX_IMMUTABLE since that's the behavior in ext*.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: David Sterba <dsterba@suse.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2019-07-01 08:25:34 -07:00
Thomas Gleixner
ec8f24b7fa treewide: Add SPDX license identifier - Makefile/Kconfig
Add SPDX license identifiers to all Make/Kconfig files which:

 - Have no license information of any form

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 10:50:46 +02:00
Thomas Gleixner
09c434b8a0 treewide: Add SPDX license identifier for more missed files
Add SPDX license identifiers to all files which:

 - Have no license information of any form

 - Have MODULE_LICENCE("GPL*") inside which was used in the initial
   scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21 10:50:45 +02:00
Ira Weiny
73b0140bf0 mm/gup: change GUP fast to use flags rather than a write 'bool'
To facilitate additional options to get_user_pages_fast() change the
singular write parameter to be gup_flags.

This patch does not change any functionality.  New functionality will
follow in subsequent patches.

Some of the get_user_pages_fast() call sites were unchanged because they
already passed FOLL_WRITE or 0 for the write parameter.

NOTE: It was suggested to change the ordering of the get_user_pages_fast()
arguments to ensure that callers were converted.  This breaks the current
GUP call site convention of having the returned pages be the final
parameter.  So the suggestion was rejected.

Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marshall <hubcap@omnibond.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:46 -07:00
Linus Torvalds
8823880561 Orangefs: This pull request includes one fix and our "Orangefs through
the pagecache" patch series which greatly improves our small IO
 performance and helps us pass more xfstests than before.
 
 fix: orangefs: truncate before updating size
 
 Pagecache series: all the rest
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJc1CucAAoJEM9EDqnrzg2+2h4P/i2ktG4Zi3VDwCr1xea0bWSf
 kidYxzVPmGAmfC+UNMn/+X4v/R6rtHcpW/g7/0WLqhLUhRwjB1gzT6U9mHfu8BQV
 1I7tg6lQqcTi4gNkBMWI4MHKRyoQ5bAgY7jFxLjqx1zj5+qxrtPn4GUCUbDBDPpP
 Ip9RD/izEKeG77gdSUkCD72ojHl9lqmT1Qd3k3asmozzdWzd8mdXy8EQplECo7tZ
 1tLzV+BSER2BVQfmws2CCG00DLFzQRJkDYxm7I5l1xoQ/Ft8n6k/tXZ1QzkPREg2
 FhaoY5eCA/2MkB0VhUKj1Y+gXNZcbys11oxcqyZZ6p2kOu+1bY+evkzwDUzsIAg+
 nl1d3LoeEVKoxpa4O+vCHLF++HX8TuVHSaxvNaZPMm86gxowCh0aberyWcscGbQp
 m2M28WdjlzYC7uPSakoKVvF+ZBNKQP5sn6jbjYSOHAB9nFz6x5ILH3SMK8MXoZZe
 2Ejd4NBmtqekdLp2MNlAWHF1O9sTagH+OqUvbNRNjrQXM3hNSgeguGx1b0E+KlT2
 lwKORIVwZNZ/xuizdwM01kWRQObAmGrptlVbR4+HKqo99g9gGv6sdw//cHzo9/2S
 GV34cpS5VRSwHVwuzADD1A/qQ+k5xomihrpZwcqh+YOwQV16JnTHiSeTYZKVIEW8
 cBiZ2k5dbKm1QcLc9tJ3
 =uBS+
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.2-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs updates from Mike Marshall:
 "This includes one fix and our "Orangefs through the pagecache" patch
  series which greatly improves our small IO performance and helps us
  pass more xfstests than before.

  Fix:
   - orangefs: truncate before updating size

  Pagecache series:
   - all the rest"

* tag 'for-linus-5.2-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: (23 commits)
  orangefs: truncate before updating size
  orangefs: copy Orangefs-sized blocks into the pagecache if possible.
  orangefs: pass slot index back to readpage.
  orangefs: remember count when reading.
  orangefs: add orangefs_revalidate_mapping
  orangefs: implement writepages
  orangefs: write range tracking
  orangefs: avoid fsync service operation on flush
  orangefs: skip inode writeout if nothing to write
  orangefs: move do_readv_writev to direct_IO
  orangefs: do not return successful read when the client-core disappeared
  orangefs: implement writepage
  orangefs: migrate to generic_file_read_iter
  orangefs: service ops done for writeback are not killable
  orangefs: remove orangefs_readpages
  orangefs: reorganize setattr functions to track attribute changes
  orangefs: let setattr write to cached inode
  orangefs: set up and use backing_dev_info
  orangefs: hold i_lock during inode_getattr
  orangefs: update attributes rather than relying on server
  ...
2019-05-09 09:37:25 -07:00
Martin Brandenburg
33713cd09c orangefs: truncate before updating size
Otherwise we race with orangefs_writepage/orangefs_writepages
which and does not expect i_size < page_offset.

Fixes xfstests generic/129.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03 14:39:10 -04:00
Mike Marshall
dd59a6475c orangefs: copy Orangefs-sized blocks into the pagecache if possible.
->readpage looks in file->private_data to try and find out how the
userspace program set "count" in read(2) or with "dd bs=" or whatever.

->readpage uses "count" and inode->i_size to calculate how much
data Orangefs should deposit in the Orangefs shared buffer, and
remembers which slot the data is in.

After copying data from the Orangefs shared buffer slot into
"the page", readpage tries to increment through the pagecache index
and fill as many pages as it can from the extra data in the shared
buffer. Hopefully these extra pages will soon be needed by the vfs,
and they'll be in the pagecache already.

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
2019-05-03 14:32:39 -04:00
Mike Marshall
4077a0f25b orangefs: pass slot index back to readpage.
When userspace deposits more than a page of data into the shared buffer,
we'll need to know which slot it is in when we get back to readpage
so that we can try to use the extra data to fill some extra pages.

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
2019-05-03 14:32:39 -04:00
Mike Marshall
c2549f8c7a orangefs: remember count when reading.
Orangefs wins when it can do IO on large (up to four meg) blocks at a time,
and looses when it has to do tiny "small io" reads and writes. Accessing
Orangefs through the pagecache with the kernel module helps with small io,
both reading and writing, a great deal. Readpage generally tries to fetch a
page (four k) at a time. We'll let users use "count" (as in read(2) or
pread(2) for example) as a knob to control how much data they get from
Orangefs at a time and we'll try to use the data to fill extra
pagecache pages when we get to ->readpage, hopefully resulting in
fewer calls to readpage and Orangefs userspace.

We need a way to remember how they set count so that we can still have
it available when we get to ->readpage.

 - We'll use file->private_data to keep track of "count".
   We'll wrap generic_file_open with orangefs_file_open and
   initialize private_data to NULL there.

 - In ->read_iter we have access to both "count" and file, so
   we'll kmalloc some space onto file->private_data and store
   "count" there.

 - We'll kfree file->private_data each time we visit ->flush and
   reinitialize it to NULL.

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
2019-05-03 14:32:39 -04:00
Martin Brandenburg
8f04e1be78 orangefs: add orangefs_revalidate_mapping
This is modeled after NFS, except our method is different.  We use a
simple timer to determine whether to invalidate the page cache.  This
is bound to perform.

This addes a sysfs parameter cache_timeout_msecs which controls the time
between page cache invalidations.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03 14:32:39 -04:00
Martin Brandenburg
c472ebc255 orangefs: implement writepages
Go through pages and look for a consecutive writable region.  After
finding a number of consecutive writable pages or when finding that
the next page's dirty range is not contiguous and cannot be written
as one request, send the write to the server.

The number of pages is determined by the client-core's buffer size.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03 14:32:39 -04:00
Martin Brandenburg
52e2d0a380 orangefs: write range tracking
Attach the actual range of bytes written to plus the responsible uid/gid
to each dirty page.  This information must be sent to the server when
the page is written out.

Now write_begin, page_mkwrite, and invalidatepage keep up with this
information.  There are several conditions where they must write out the
page immediately to store the new range.  Two non-contiguous ranges
cannot be stored on a single page.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03 14:32:38 -04:00
Martin Brandenburg
90fc07065a orangefs: avoid fsync service operation on flush
Without this, an fsync call is sent to the server even if no data
changed.  This resulted in a rather severe (50%) performance regression
under certain metadata-heavy workloads.

In the past, everything was direct IO.  Nothing happend on a close call.
An explicit fsync call would send an fsync request to the server which
in turn fsynced the underlying file.

Now there are cached writes.  Then fsync began writing out dirty pages
in addition to making an fsync request to the server, and close began
calling fsync.

With this commit, close only writes out dirty pages, and does not make
the fsync request.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03 14:32:38 -04:00
Martin Brandenburg
8a88bbce6f orangefs: skip inode writeout if nothing to write
Would happen if an inode is dirty but whatever happened is not something
that can be written out to OrangeFS.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03 14:32:38 -04:00
Martin Brandenburg
3e9dfc6e1e orangefs: move do_readv_writev to direct_IO
direct_IO was the only caller and all direct_IO did was call it,
so there's no use in having the code spread out into so many functions.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03 14:32:38 -04:00
Martin Brandenburg
43f3457604 orangefs: do not return successful read when the client-core disappeared
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03 14:32:38 -04:00
Martin Brandenburg
85ac799cf9 orangefs: implement writepage
Now orangefs_inode_getattr fills from cache if an inode has dirty pages.

also if attr_valid and dirty pages and !flags, we spin on inode writeback
before returning if pages still dirty after: should it be other way

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03 14:32:38 -04:00
Martin Brandenburg
c453dcfc79 orangefs: migrate to generic_file_read_iter
Remove orangefs_inode_read.  It was used by readpage.  Calling
wait_for_direct_io directly serves the purpose just as well.  There is
now no check of the bufmap size in the readpage path.  There are already
other places the bufmap size is assumed to be greater than PAGE_SIZE.

Important to call truncate_inode_pages now in the write path so a
subsequent read sees the new data.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03 14:32:38 -04:00
Martin Brandenburg
0dcac0f781 orangefs: service ops done for writeback are not killable
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03 14:32:38 -04:00
Martin Brandenburg
a68d9c606a orangefs: remove orangefs_readpages
It's a copy of the loop which would run in read_pages from
mm/readahead.c.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03 14:32:38 -04:00
Martin Brandenburg
afd9fb2a31 orangefs: reorganize setattr functions to track attribute changes
OrangeFS accepts a mask indicating which attributes were changed.  The
kernel must not set any bits except those that were actually changed.
The kernel must set the uid/gid of the request to the actual uid/gid
responsible for the change.

Code path for notify_change initiated setattrs is

orangefs_setattr(dentry, iattr)
-> __orangefs_setattr(inode, iattr)

In kernel changes are initiated by calling __orangefs_setattr.

Code path for writeback is

orangefs_write_inode
-> orangefs_inode_setattr

attr_valid and attr_uid and attr_gid change together under i_lock.
I_DIRTY changes separately.

__orangefs_setattr
	lock
	if needs to be cleaned first, unlock and retry
	set attr_valid
	copy data in
	unlock
	mark_inode_dirty

orangefs_inode_setattr
	lock
	copy attributes out
	unlock
	clear getattr_time
	# __writeback_single_inode clears dirty

orangefs_inode_getattr
	# possible to get here with attr_valid set and not dirty
	lock
	if getattr_time ok or attr_valid set, unlock and return
	unlock
	do server operation
	# another thread may getattr or setattr, so check for that
	lock
	if getattr_time ok or attr_valid, unlock and return
	else, copy in
	update getattr_time
	unlock

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03 14:32:38 -04:00
Martin Brandenburg
df2d7337b5 orangefs: let setattr write to cached inode
This is a fairly big change, but ultimately it's not a lot of code.

Implement write_inode and then avoid the call to orangefs_inode_setattr
within orangefs_setattr.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03 14:32:38 -04:00
Martin Brandenburg
f2d34c738c orangefs: set up and use backing_dev_info
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03 14:32:38 -04:00
Martin Brandenburg
5e4f606e26 orangefs: hold i_lock during inode_getattr
This should be a no-op now.  When inode writeback works, this will
prevent a getattr from overwriting inode data while an inode is
transitioning to dirty.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03 14:32:38 -04:00
Martin Brandenburg
5e7f1d4338 orangefs: update attributes rather than relying on server
This should be a no-op now, but once inode writeback works, it'll be
necessary to have the correct attribute in the dirty inode.

Previously the attribute fetch timeout was marked invalid and the server
provided the updated attribute.  When the inode is dirty, the server
cannot be consulted since it does not yet know the pending setattr.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03 14:32:38 -04:00
Martin Brandenburg
8b60785c1d orangefs: simplify orangefs_inode_getattr interface
No need to store the received mask.  It is either STATX_BASIC_STATS or
STATX_BASIC_STATS & ~STATX_SIZE.  If STATX_SIZE is requested, the cache
is bypassed anyway, so the cached mask is unnecessary to decide whether
to do a real getattr.

This is a change.  Previously a getattr would want size and use the
cached size.  All of the in-kernel callers that wanted size did not want
a cached size.  Now a getattr cannot use the cached size if it wants
size at all.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03 14:32:38 -04:00
Martin Brandenburg
66d5477d70 orangefs: do not invalidate attributes on inode create
When an inode is created, we fetch attributes from the server.  There is
no need to turn around and invalidate them.

No need to initialize attributes after the getattr either.  Either it'll
be exactly the same, or it'll be something else and wrong.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03 14:32:37 -04:00
Martin Brandenburg
fc2e2e9c43 orangefs: implement xattr cache
This uses the same timeout as the getattr cache.  This substantially
increases performance when writing files with smaller buffer sizes.

When writing, the size is (often) changed, which causes a call to
notify_change which calls security_inode_need_killpriv which needs a
getxattr.  Caching it reduces traffic to the server.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-05-03 14:32:37 -04:00
Al Viro
f276ae0dd6 orangefs: make use of ->free_inode()
Acked-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01 22:43:27 -04:00
Linus Torvalds
5f739e4a49 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "Assorted fixes (really no common topic here)"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: Make __vfs_write() static
  vfs: fix preadv64v2 and pwritev64v2 compat syscalls with offset == -1
  pipe: stop using ->can_merge
  splice: don't merge into linked buffers
  fs: move generic stat response attr handling to vfs_getattr_nosec
  orangefs: don't reinitialize result_mask in ->getattr
  fs/devpts: always delete dcache dentry-s in dput()
2019-03-12 13:27:20 -07:00
Mike Marshall
6e356d4595 orangefs: remove two un-needed BUG_ONs...
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-02-20 15:12:52 -05:00
Christoph Hellwig
5678b5d6a8 orangefs: don't reinitialize result_mask in ->getattr
The caller already initializes it to the basic stats.  Just
clear not supported default bits where needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-02-01 01:55:45 -05:00
Nikolay Borisov
f86196ea87 fs: don't open code lru_to_page()
Multiple filesystems open code lru_to_page().  Rectify this by moving
the macro from mm_inline (which is specific to lru stuff) to the more
generic mm.h header and start using the macro where appropriate.

No functional changes.

Link: http://lkml.kernel.org/r/20181129104810.23361-1-nborisov@suse.com
Link: https://lkml.kernel.org/r/20181129075301.29087-1-nborisov@suse.com
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Pankaj gupta <pagupta@redhat.com>
Acked-by: "Yan, Zheng" <zyan@redhat.com>		[ceph]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:48 -08:00
Davidlohr Bueso
08d405c8b8 fs/: remove caller signal_pending branch predictions
This is already done for us internally by the signal machinery.

[akpm@linux-foundation.org: fix fs/buffer.c]
Link: http://lkml.kernel.org/r/20181116002713.8474-7-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:48 -08:00
Linus Torvalds
9931a07d51 Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull AFS updates from Al Viro:
 "AFS series, with some iov_iter bits included"

* 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
  missing bits of "iov_iter: Separate type from direction and use accessor functions"
  afs: Probe multiple fileservers simultaneously
  afs: Fix callback handling
  afs: Eliminate the address pointer from the address list cursor
  afs: Allow dumping of server cursor on operation failure
  afs: Implement YFS support in the fs client
  afs: Expand data structure fields to support YFS
  afs: Get the target vnode in afs_rmdir() and get a callback on it
  afs: Calc callback expiry in op reply delivery
  afs: Fix FS.FetchStatus delivery from updating wrong vnode
  afs: Implement the YFS cache manager service
  afs: Remove callback details from afs_callback_break struct
  afs: Commit the status on a new file/dir/symlink
  afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS
  afs: Don't invoke the server to read data beyond EOF
  afs: Add a couple of tracepoints to log I/O errors
  afs: Handle EIO from delivery function
  afs: Fix TTL on VL server and address lists
  afs: Implement VL server rotation
  afs: Improve FS server rotation error handling
  ...
2018-11-01 19:58:52 -07:00
David Howells
aa563d7bca iov_iter: Separate type from direction and use accessor functions
In the iov_iter struct, separate the iterator type from the iterator
direction and use accessor functions to access them in most places.

Convert a bunch of places to use switch-statements to access them rather
then chains of bitwise-AND statements.  This makes it easier to add further
iterator types.  Also, this can be more efficient as to implement a switch
of small contiguous integers, the compiler can use ~50% fewer compare
instructions than it has to use bitwise-and instructions.

Further, cease passing the iterator type into the iterator setup function.
The iterator function can set that itself.  Only the direction is required.

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24 00:41:07 +01:00
Mike Marshall
22fc9db296 orangefs: no need to check for service_operation returns > 0
service_operation returns > 0 is undefined.

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-10-18 14:05:46 -04:00
Mike Marshall
34e6148a2c orangefs: some error code paths missed kmem_cache_free
If a slab cache object is allocated, it needs to be freed eventually,
certainly before anyone unloads the module that allocated it.

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-10-18 13:58:25 -04:00
Mike Marshall
b5d72cdc53 orangefs: don't let orangefs_iget return NULL.
Suggested by Dan Carpenter.

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-10-18 13:52:23 -04:00
Mike Marshall
56249998b2 orangefs: don't let orangefs_new_inode return NULL
Suggested by Dan Carpenter

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-10-18 13:47:16 -04:00
Colin Ian King
2978d87347 orangefs: rate limit the client not running info message
Currently accessing various /sys/fs/orangefs files will spam the
kernel log with the following info message when the client is not
running:

[  491.489284] sysfs_service_op_show: Client not running :-5:

Rate limit this info message to make it less spammy.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-10-10 09:54:29 -04:00
Chengguang Xu
052d12766b orangefs: cache NULL when both default_acl and acl are NULL
default_acl and acl of newly created inode will be initiated
as ACL_NOT_CACHED in vfs function inode_init_always() and later
will be updated by calling xxx_init_acl() in specific filesystems.
Howerver, when default_acl and acl are NULL then they keep the value
of ACL_NOT_CACHED, this patch tries to cache NULL for acl/default_acl
in this case.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-10-10 09:54:25 -04:00
Colin Ian King
e1b437691a orangefs: remove redundant pointer orangefs_inode
Pointer orangefs_inode is being assigned but is never used hence it is
redundant and can be removed.

Cleans up clang warning:
warning: variable 'orangefs_inode' set but not used [-Wunused-but-set-variable]

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-08-14 12:07:14 -04:00
Souptick Joarder
8bf782f647 orangefs: Adding new return type vm_fault_t
Use new return type vm_fault_t for fault handler. For now,
this is just documenting that the function returns a VM_FAULT
value rather than an errno. Once all instances are converted,
vm_fault_t will become a distinct type.

See the following
commit 1c8f422059 ("mm: change return type to vm_fault_t")

Fixed checkpatch.pl warning.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-08-14 12:07:14 -04:00
Linus Torvalds
5e7b9212a4 Solve a series of broken links for files under Documentation:
- can.rst: fix a footnote reference;
 - crypto_engine.rst: Fix two parsing warnings;
 - Fix a lot of broken references to Documentation/*;
 - Improves the scripts/documentation-file-ref-check script,
   in order to help detecting/fixing broken references,
   preventing false-positives.
 
 After this patch series, only 33 broken references to doc files are
 detected by scripts/documentation-file-ref-check.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbJC2aAAoJEAhfPr2O5OEVPmMP/2rN5m9LZ048oRWlg4hCwo73
 4FpWqDg18hbWCMHXYHIN1UACIMUkIUfgLhF7WE3D/XqRMuxHoiE5u7DUdak7+VNt
 wunpksKFJbgyfFMHRvykHcZV+jQFVbM7eFvXVPIvoSaAeGH6zx4imHTyeDn3x/nL
 gdtBqM4bvEhmBjotBTRR4PB8+oPrT/HIT5npHepx3UnFFFAzDQGEZ/I67/el2G5C
 pVmYdBXvr7iqrvUs6FilHLTEfe1quCI4UaKNfLHKrxXrTkiJQFOwugYuobZfNmxT
 GwjWzfpNy9HMlKJFYipcByALxel1Mnpqz5mIxFQaCTygBuEsORCWzW5MoKIsIUJ0
 KOoG76v0rUyMvLBRvaoao3CHYHdzxhQbtVV9DjyDuDksa2G5IoCAF1t6DyIOitRw
 9plMnGckk+FJ/MXJKYWXHszFS8NhI0SF2zHe3s1DmRTD8P6oxkxvxBFz6iqqADmL
 W6XHd8CcqJItaS9ctPen91TFuysN1HFpdzLLY+xwWmmKOcWC/jFjhTm8pj7xLQHM
 5yuuEcefsajf+Xk4w2fSQmRfXnuq+oOlPuWpwSvEy+59cHGI0ms18P1nHy/yt3II
 CJywwdx6fjwDon57RFKH7kkGd7px317zMqWdIv9gUj/qZAy9gcdLdvEQLhx9u0aV
 4F+hLKFDFEpf58xqRT1R
 =/ozx
 -----END PGP SIGNATURE-----

Merge tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental

Pull documentation fixes from Mauro Carvalho Chehab:
 "This solves a series of broken links for files under Documentation,
  and improves a script meant to detect such broken links (see
  scripts/documentation-file-ref-check).

  The changes on this series are:

   - can.rst: fix a footnote reference;

   - crypto_engine.rst: Fix two parsing warnings;

   - Fix a lot of broken references to Documentation/*;

   - improve the scripts/documentation-file-ref-check script, in order
     to help detecting/fixing broken references, preventing
     false-positives.

  After this patch series, only 33 broken references to doc files are
  detected by scripts/documentation-file-ref-check"

* tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental: (26 commits)
  fix a series of Documentation/ broken file name references
  Documentation: rstFlatTable.py: fix a broken reference
  ABI: sysfs-devices-system-cpu: remove a broken reference
  devicetree: fix a series of wrong file references
  devicetree: fix name of pinctrl-bindings.txt
  devicetree: fix some bindings file names
  MAINTAINERS: fix location of DT npcm files
  MAINTAINERS: fix location of some display DT bindings
  kernel-parameters.txt: fix pointers to sound parameters
  bindings: nvmem/zii: Fix location of nvmem.txt
  docs: Fix more broken references
  scripts/documentation-file-ref-check: check tools/*/Documentation
  scripts/documentation-file-ref-check: get rid of false-positives
  scripts/documentation-file-ref-check: hint: dash or underline
  scripts/documentation-file-ref-check: add a fix logic for DT
  scripts/documentation-file-ref-check: accept more wildcards at filenames
  scripts/documentation-file-ref-check: fix help message
  media: max2175: fix location of driver's companion documentation
  media: v4l: fix broken video4linux docs locations
  media: dvb: point to the location of the old README.dvb-usb file
  ...
2018-06-17 05:25:18 +09:00
Linus Torvalds
29d6849d88 Merge branch 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull compat updates from Al Viro:
 "Some biarch patches - getting rid of assorted (mis)uses of
  compat_alloc_user_space().

  Not much in that area this cycle..."

* 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  orangefs: simplify compat ioctl handling
  signalfd: lift sigmask copyin and size checks to callers of do_signalfd4()
  vmsplice(): lift importing iovec into vmsplice(2) and compat counterpart
2018-06-16 16:21:50 +09:00
Mauro Carvalho Chehab
44348e8ac1 fix a series of Documentation/ broken file name references
As files move around, their previous links break. Fix the
references for them.

Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Jonathan Corbet <corbet@lwn.net>
2018-06-15 18:10:01 -03:00
Al Viro
430ff79170 orangefs: simplify compat ioctl handling
no need to mess with copy_in_user(), etc...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-15 00:23:55 -04:00
Linus Torvalds
7a932516f5 vfs/y2038: inode timestamps conversion to timespec64
This is a late set of changes from Deepa Dinamani doing an automated
 treewide conversion of the inode and iattr structures from 'timespec'
 to 'timespec64', to push the conversion from the VFS layer into the
 individual file systems.
 
 There were no conflicts between this and the contents of linux-next
 until just before the merge window, when we saw multiple problems:
 
 - A minor conflict with my own y2038 fixes, which I could address
   by adding another patch on top here.
 - One semantic conflict with late changes to the NFS tree. I addressed
   this by merging Deepa's original branch on top of the changes that
   now got merged into mainline and making sure the merge commit includes
   the necessary changes as produced by coccinelle.
 - A trivial conflict against the removal of staging/lustre.
 - Multiple conflicts against the VFS changes in the overlayfs tree.
   These are still part of linux-next, but apparently this is no longer
   intended for 4.18 [1], so I am ignoring that part.
 
 As Deepa writes:
 
   The series aims to switch vfs timestamps to use struct timespec64.
   Currently vfs uses struct timespec, which is not y2038 safe.
 
   The series involves the following:
   1. Add vfs helper functions for supporting struct timepec64 timestamps.
   2. Cast prints of vfs timestamps to avoid warnings after the switch.
   3. Simplify code using vfs timestamps so that the actual
      replacement becomes easy.
   4. Convert vfs timestamps to use struct timespec64 using a script.
      This is a flag day patch.
 
   Next steps:
   1. Convert APIs that can handle timespec64, instead of converting
      timestamps at the boundaries.
   2. Update internal data structures to avoid timestamp conversions.
 
 Thomas Gleixner adds:
 
   I think there is no point to drag that out for the next merge window.
   The whole thing needs to be done in one go for the core changes which
   means that you're going to play that catchup game forever. Let's get
   over with it towards the end of the merge window.
 
 [1] https://www.spinics.net/lists/linux-fsdevel/msg128294.html
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJbInZAAAoJEGCrR//JCVInReoQAIlVIIMt5ZX6wmaKbrjy9Itf
 MfgbFihQ/djLnuSPVQ3nztcxF0d66BKHZ9puVjz6+mIHqfDvJTRwZs9nU+sOF/T1
 g78fRkM1cxq6ZCkGYAbzyjyo5aC4PnSMP/NQLmwqvi0MXqqrbDoq5ZdP9DHJw39h
 L9lD8FM/P7T29Fgp9tq/pT5l9X8VU8+s5KQG1uhB5hii4VL6pD6JyLElDita7rg+
 Z7/V7jkxIGEUWF7vGaiR1QTFzEtpUA/exDf9cnsf51OGtK/LJfQ0oiZPPuq3oA/E
 LSbt8YQQObc+dvfnGxwgxEg1k5WP5ekj/Wdibv/+rQKgGyLOTz6Q4xK6r8F2ahxs
 nyZQBdXqHhJYyKr1H1reUH3mrSgQbE5U5R1i3My0xV2dSn+vtK5vgF21v2Ku3A1G
 wJratdtF/kVBzSEQUhsYTw14Un+xhBLRWzcq0cELonqxaKvRQK9r92KHLIWNE7/v
 c0TmhFbkZA+zR8HdsaL3iYf1+0W/eYy8PcvepyldKNeW2pVk3CyvdTfY2Z87G2XK
 tIkK+BUWbG3drEGG3hxZ3757Ln3a9qWyC5ruD3mBVkuug/wekbI8PykYJS7Mx4s/
 WNXl0dAL0Eeu1M8uEJejRAe1Q3eXoMWZbvCYZc+wAm92pATfHVcKwPOh8P7NHlfy
 A3HkjIBrKW5AgQDxfgvm
 =CZX2
 -----END PGP SIGNATURE-----

Merge tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground

Pull inode timestamps conversion to timespec64 from Arnd Bergmann:
 "This is a late set of changes from Deepa Dinamani doing an automated
  treewide conversion of the inode and iattr structures from 'timespec'
  to 'timespec64', to push the conversion from the VFS layer into the
  individual file systems.

  As Deepa writes:

   'The series aims to switch vfs timestamps to use struct timespec64.
    Currently vfs uses struct timespec, which is not y2038 safe.

    The series involves the following:
    1. Add vfs helper functions for supporting struct timepec64
       timestamps.
    2. Cast prints of vfs timestamps to avoid warnings after the switch.
    3. Simplify code using vfs timestamps so that the actual replacement
       becomes easy.
    4. Convert vfs timestamps to use struct timespec64 using a script.
       This is a flag day patch.

    Next steps:
    1. Convert APIs that can handle timespec64, instead of converting
       timestamps at the boundaries.
    2. Update internal data structures to avoid timestamp conversions'

  Thomas Gleixner adds:

   'I think there is no point to drag that out for the next merge
    window. The whole thing needs to be done in one go for the core
    changes which means that you're going to play that catchup game
    forever. Let's get over with it towards the end of the merge window'"

* tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
  pstore: Remove bogus format string definition
  vfs: change inode times to use struct timespec64
  pstore: Convert internal records to timespec64
  udf: Simplify calls to udf_disk_stamp_to_time
  fs: nfs: get rid of memcpys for inode times
  ceph: make inode time prints to be long long
  lustre: Use long long type to print inode time
  fs: add timespec64_truncate()
2018-06-15 07:31:07 +09:00
Arnd Bergmann
15eefe2a99 Merge branch 'vfs_timespec64' of https://github.com/deepa-hub/vfs into vfs-timespec64
Pull the timespec64 conversion from Deepa Dinamani:
 "The series aims to switch vfs timestamps to use
  struct timespec64. Currently vfs uses struct timespec,
  which is not y2038 safe.

  The flag patch applies cleanly. I've not seen the timestamps
  update logic change often. The series applies cleanly on 4.17-rc6
  and linux-next tip (top commit: next-20180517).

  I'm not sure how to merge this kind of a series with a flag patch.
  We are targeting 4.18 for this.
  Let me know if you have other suggestions.

  The series involves the following:
  1. Add vfs helper functions for supporting struct timepec64 timestamps.
  2. Cast prints of vfs timestamps to avoid warnings after the switch.
  3. Simplify code using vfs timestamps so that the actual
     replacement becomes easy.
  4. Convert vfs timestamps to use struct timespec64 using a script.
     This is a flag day patch.

  I've tried to keep the conversions with the script simple, to
  aid in the reviews. I've kept all the internal filesystem data
  structures and function signatures the same.

  Next steps:
  1. Convert APIs that can handle timespec64, instead of converting
     timestamps at the boundaries.
  2. Update internal data structures to avoid timestamp conversions."

I've pulled it into a branch based on top of the NFS changes that
are now in mainline, so I could resolve the non-obvious conflict
between the two while merging.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-06-14 14:54:00 +02:00
Linus Torvalds
091a0f2785 orangefs: fixes and cleanups
+ fix some sparse warnings
  + cleanup some code formatting
  + fix up some attribute/meta-data related code
 -----BEGIN PGP SIGNATURE-----
 
 iQIbBAABAgAGBQJbGUIEAAoJEM9EDqnrzg2+3ygP9RPTSd2Mh/vyzokxmBacT/iW
 CrJXdrJkG+Yb/adGuT2UAPIMg73aGLqAPMN8dHhfhzotdd1BgrhMK+zZECUClwOf
 dxQCA+RFn1YofXMojqWYAFtJ6ptJgMDJxje62hS+QCWzDz0IM1PYr2qCQPs1IRmb
 CWVuP1igwt82VIVLey8YEuI5yuOnFOSO+7rKi7TPsY991oPxxz2brmrMuiW2Dg0W
 6MCP19XUBpxdDJWtL2GWXAwgkP9JYkhokwkAloqWXoOpDUf0EniomuRMBLOtISoO
 tJfgnYIN+7sRTSbqsNtKQrm7f8/B01t+tGuzJYDGlUls02V3DkS9EjypvP+VZ6e9
 bsN3oQHD6xFx/7egxUVBRFD6Hw83j58DH1N+79whVrU3ospkj3z0QX1g6zqgY2ew
 mgileXBAJQD+SUr0F42QqeFVexjky9dKtDZLQe42jxLZXpPly2t737h4ZatdW3ro
 yKJLUzuHbLGQzgGJOSQsSfmwUkSeTilgCV6WFyJWinlxUWQEkp0F+1u5B4RX87BQ
 SHUz5O3pWDpz+kLtauzsVlXDbBQfZHimIbnKWHZoQ5dER/LQfWBfvOz+CT/5uBBg
 qhdPZ4wKLD4qvzByuZJKyYmPXpUoOWh/i5QcvJ8B8rNxCx8XEP/VoikjUqkCUajb
 4HFQwnZtnQOLfnufTTw=
 =vwvd
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.18-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs updates from Mike Marshall:
 "Fixes and cleanups:

   - fix some sparse warnings

   - cleanup some code formatting

   - fix up some attribute/meta-data related code"

* tag 'for-linus-4.18-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: use sparse annotations for holding locks across function calls.
  orangefs: make debug_help_fops static
  orangefs: remove unused function orangefs_get_bufmap_init
  orangefs: specify user pointers when using dev_map_desc and bufmap
  orangefs: formatting cleanups
  orangefs: set i_size on new symlink
  orangefs: report attributes_mask and attributes for statx
  orangefs: make struct orangefs_file_vm_ops static
  orangefs: revamp block sizes
2018-06-07 09:23:12 -07:00
Deepa Dinamani
95582b0083 vfs: change inode times to use struct timespec64
struct timespec is not y2038 safe. Transition vfs to use
y2038 safe struct timespec64 instead.

The change was made with the help of the following cocinelle
script. This catches about 80% of the changes.
All the header file and logic changes are included in the
first 5 rules. The rest are trivial substitutions.
I avoid changing any of the function signatures or any other
filesystem specific data structures to keep the patch simple
for review.

The script can be a little shorter by combining different cases.
But, this version was sufficient for my usecase.

virtual patch

@ depends on patch @
identifier now;
@@
- struct timespec
+ struct timespec64
  current_time ( ... )
  {
- struct timespec now = current_kernel_time();
+ struct timespec64 now = current_kernel_time64();
  ...
- return timespec_trunc(
+ return timespec64_trunc(
  ... );
  }

@ depends on patch @
identifier xtime;
@@
 struct \( iattr \| inode \| kstat \) {
 ...
-       struct timespec xtime;
+       struct timespec64 xtime;
 ...
 }

@ depends on patch @
identifier t;
@@
 struct inode_operations {
 ...
int (*update_time) (...,
-       struct timespec t,
+       struct timespec64 t,
...);
 ...
 }

@ depends on patch @
identifier t;
identifier fn_update_time =~ "update_time$";
@@
 fn_update_time (...,
- struct timespec *t,
+ struct timespec64 *t,
 ...) { ... }

@ depends on patch @
identifier t;
@@
lease_get_mtime( ... ,
- struct timespec *t
+ struct timespec64 *t
  ) { ... }

@te depends on patch forall@
identifier ts;
local idexpression struct inode *inode_node;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
identifier fn_update_time =~ "update_time$";
identifier fn;
expression e, E3;
local idexpression struct inode *node1;
local idexpression struct inode *node2;
local idexpression struct iattr *attr1;
local idexpression struct iattr *attr2;
local idexpression struct iattr attr;
identifier i_xtime1 =~ "^i_[acm]time$";
identifier i_xtime2 =~ "^i_[acm]time$";
identifier ia_xtime1 =~ "^ia_[acm]time$";
identifier ia_xtime2 =~ "^ia_[acm]time$";
@@
(
(
- struct timespec ts;
+ struct timespec64 ts;
|
- struct timespec ts = current_time(inode_node);
+ struct timespec64 ts = current_time(inode_node);
)

<+... when != ts
(
- timespec_equal(&inode_node->i_xtime, &ts)
+ timespec64_equal(&inode_node->i_xtime, &ts)
|
- timespec_equal(&ts, &inode_node->i_xtime)
+ timespec64_equal(&ts, &inode_node->i_xtime)
|
- timespec_compare(&inode_node->i_xtime, &ts)
+ timespec64_compare(&inode_node->i_xtime, &ts)
|
- timespec_compare(&ts, &inode_node->i_xtime)
+ timespec64_compare(&ts, &inode_node->i_xtime)
|
ts = current_time(e)
|
fn_update_time(..., &ts,...)
|
inode_node->i_xtime = ts
|
node1->i_xtime = ts
|
ts = inode_node->i_xtime
|
<+... attr1->ia_xtime ...+> = ts
|
ts = attr1->ia_xtime
|
ts.tv_sec
|
ts.tv_nsec
|
btrfs_set_stack_timespec_sec(..., ts.tv_sec)
|
btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
|
- ts = timespec64_to_timespec(
+ ts =
...
-)
|
- ts = ktime_to_timespec(
+ ts = ktime_to_timespec64(
...)
|
- ts = E3
+ ts = timespec_to_timespec64(E3)
|
- ktime_get_real_ts(&ts)
+ ktime_get_real_ts64(&ts)
|
fn(...,
- ts
+ timespec64_to_timespec(ts)
,...)
)
...+>
(
<... when != ts
- return ts;
+ return timespec64_to_timespec(ts);
...>
)
|
- timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
+ timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
|
- timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
+ timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
|
- timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
+ timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
|
node1->i_xtime1 =
- timespec_trunc(attr1->ia_xtime1,
+ timespec64_trunc(attr1->ia_xtime1,
...)
|
- attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
+ attr1->ia_xtime1 =  timespec64_trunc(attr2->ia_xtime2,
...)
|
- ktime_get_real_ts(&attr1->ia_xtime1)
+ ktime_get_real_ts64(&attr1->ia_xtime1)
|
- ktime_get_real_ts(&attr.ia_xtime1)
+ ktime_get_real_ts64(&attr.ia_xtime1)
)

@ depends on patch @
struct inode *node;
struct iattr *attr;
identifier fn;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
expression e;
@@
(
- fn(node->i_xtime);
+ fn(timespec64_to_timespec(node->i_xtime));
|
 fn(...,
- node->i_xtime);
+ timespec64_to_timespec(node->i_xtime));
|
- e = fn(attr->ia_xtime);
+ e = fn(timespec64_to_timespec(attr->ia_xtime));
)

@ depends on patch forall @
struct inode *node;
struct iattr *attr;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
identifier fn;
@@
{
+ struct timespec ts;
<+...
(
+ ts = timespec64_to_timespec(node->i_xtime);
fn (...,
- &node->i_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
fn (...,
- &attr->ia_xtime,
+ &ts,
...);
)
...+>
}

@ depends on patch forall @
struct inode *node;
struct iattr *attr;
struct kstat *stat;
identifier ia_xtime =~ "^ia_[acm]time$";
identifier i_xtime =~ "^i_[acm]time$";
identifier xtime =~ "^[acm]time$";
identifier fn, ret;
@@
{
+ struct timespec ts;
<+...
(
+ ts = timespec64_to_timespec(node->i_xtime);
ret = fn (...,
- &node->i_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(node->i_xtime);
ret = fn (...,
- &node->i_xtime);
+ &ts);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
ret = fn (...,
- &attr->ia_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
ret = fn (...,
- &attr->ia_xtime);
+ &ts);
|
+ ts = timespec64_to_timespec(stat->xtime);
ret = fn (...,
- &stat->xtime);
+ &ts);
)
...+>
}

@ depends on patch @
struct inode *node;
struct inode *node2;
identifier i_xtime1 =~ "^i_[acm]time$";
identifier i_xtime2 =~ "^i_[acm]time$";
identifier i_xtime3 =~ "^i_[acm]time$";
struct iattr *attrp;
struct iattr *attrp2;
struct iattr attr ;
identifier ia_xtime1 =~ "^ia_[acm]time$";
identifier ia_xtime2 =~ "^ia_[acm]time$";
struct kstat *stat;
struct kstat stat1;
struct timespec64 ts;
identifier xtime =~ "^[acmb]time$";
expression e;
@@
(
( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1  ;
|
 node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
|
 node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
|
 node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
|
 stat->xtime = node2->i_xtime1;
|
 stat1.xtime = node2->i_xtime1;
|
( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1  ;
|
( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
|
- e = node->i_xtime1;
+ e = timespec64_to_timespec( node->i_xtime1 );
|
- e = attrp->ia_xtime1;
+ e = timespec64_to_timespec( attrp->ia_xtime1 );
|
node->i_xtime1 = current_time(...);
|
 node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
- e;
+ timespec_to_timespec64(e);
|
 node->i_xtime1 = node->i_xtime3 =
- e;
+ timespec_to_timespec64(e);
|
- node->i_xtime1 = e;
+ node->i_xtime1 = timespec_to_timespec64(e);
)

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: <anton@tuxera.com>
Cc: <balbi@kernel.org>
Cc: <bfields@fieldses.org>
Cc: <darrick.wong@oracle.com>
Cc: <dhowells@redhat.com>
Cc: <dsterba@suse.com>
Cc: <dwmw2@infradead.org>
Cc: <hch@lst.de>
Cc: <hirofumi@mail.parknet.co.jp>
Cc: <hubcap@omnibond.com>
Cc: <jack@suse.com>
Cc: <jaegeuk@kernel.org>
Cc: <jaharkes@cs.cmu.edu>
Cc: <jslaby@suse.com>
Cc: <keescook@chromium.org>
Cc: <mark@fasheh.com>
Cc: <miklos@szeredi.hu>
Cc: <nico@linaro.org>
Cc: <reiserfs-devel@vger.kernel.org>
Cc: <richard@nod.at>
Cc: <sage@redhat.com>
Cc: <sfrench@samba.org>
Cc: <swhiteho@redhat.com>
Cc: <tj@kernel.org>
Cc: <trond.myklebust@primarydata.com>
Cc: <tytso@mit.edu>
Cc: <viro@zeniv.linux.org.uk>
2018-06-05 16:57:31 -07:00
Mike Marshall
b1116bc03c orangefs: use sparse annotations for holding locks across function calls.
Sparse complained and Al Viro knew what to do...

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-06-01 14:51:49 -04:00
Mike Marshall
3cf796afed orangefs: make debug_help_fops static
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-06-01 14:51:45 -04:00
Mike Marshall
8a6080f574 orangefs: remove unused function orangefs_get_bufmap_init
get_bufmap_init is used in the out-of-tree module, but was left in
the upstream version as an oversight. Tip-of-the-hat to sparse and Al Viro.

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-06-01 14:51:40 -04:00
Mike Marshall
817e9b4d9e orangefs: specify user pointers when using dev_map_desc and bufmap
Sparse lead me to the dev_map_desc one and Al Viro lead me to the bufmap
one.

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-06-01 14:51:36 -04:00
Mike Marshall
95f5f88f89 orangefs: formatting cleanups
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-06-01 14:51:30 -04:00
Martin Brandenburg
f6a4b4c9d0 orangefs: set i_size on new symlink
As long as a symlink inode remains in-core, the destination (and
therefore size) will not be re-fetched from the server, as it cannot
change.  The original implementation of the attribute cache assumed that
setting the expiry time in the past was sufficient to cause a re-fetch
of all attributes on the next getattr.  That does not work in this case.

The bug manifested itself as follows.  When the command sequence

touch foo; ln -s foo bar; ls -l bar

is run, the output was

lrwxrwxrwx. 1 fedora fedora 4906 Apr 24 19:10 bar -> foo

However, after a re-mount, ls -l bar produces

lrwxrwxrwx. 1 fedora fedora    3 Apr 24 19:10 bar -> foo

After this commit, even before a re-mount, the output is

lrwxrwxrwx. 1 fedora fedora    3 Apr 24 19:10 bar -> foo

Reported-by: Becky Ligon <ligon@clemson.edu>
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Fixes: 71680c18c8 ("orangefs: Cache getattr results.")
Cc: stable@vger.kernel.org
Cc: hubcap@omnibond.com
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-06-01 14:49:46 -04:00
Martin Brandenburg
7f54910fa8 orangefs: report attributes_mask and attributes for statx
OrangeFS formerly failed to set attributes_mask with the result that
software could not see immutable and append flags present in the
filesystem.

Reported-by: Becky Ligon <ligon@clemson.edu>
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Fixes: 68a24a6cc4 ("orangefs: implement statx")
Cc: stable@vger.kernel.org
Cc: hubcap@omnibond.com
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-06-01 14:49:36 -04:00
Colin Ian King
ec62e95ae3 orangefs: make struct orangefs_file_vm_ops static
The struct orangefs_file_vm_ops is local to the source and does not need
to be in global scope, so make it static.

Cleans up sparse warning:
fs/orangefs/file.c:547:35: warning: symbol 'orangefs_file_vm_ops' was not
declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-06-01 14:49:00 -04:00
Martin Brandenburg
9f8fd53cd0 orangefs: revamp block sizes
Now the superblock block size is PAGE_SIZE.  The inode block size is
PAGE_SIZE for directories and symlinks, but is the server-reported
block size for regular files.

The block size in the OrangeFS private inode is now deleted.  Stat
now reports PAGE_SIZE for directories and symlinks and the
server-reported block size for regular files.

The user-space visible change is that the block size for directores
and symlinks and the superblock is now PAGE_SIZE rather than the size of
the client-core shared memory buffers, which was typically four
megabytes.

Reported-by: Becky Ligon <ligon@clemson.edu>
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: hubcap@omnibond.com
Cc: walt@omnibond.com
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-06-01 14:48:31 -04:00
Al Viro
04bb1ba141 orangefs_lookup: simplify
d_splice_alias() can handle NULL and ERR_PTR() for inode just fine...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-22 14:27:58 -04:00
Al Viro
1e2e547a93 do d_instantiate/unlock_new_inode combinations safely
For anything NFS-exported we do _not_ want to unlock new inode
before it has grown an alias; original set of fixes got the
ordering right, but missed the nasty complication in case of
lockdep being enabled - unlock_new_inode() does
	lockdep_annotate_inode_mutex_key(inode)
which can only be done before anyone gets a chance to touch
->i_mutex.  Unfortunately, flipping the order and doing
unlock_new_inode() before d_instantiate() opens a window when
mkdir can race with open-by-fhandle on a guessed fhandle, leading
to multiple aliases for a directory inode and all the breakage
that follows from that.

	Correct solution: a new primitive (d_instantiate_new())
combining these two in the right order - lockdep annotate, then
d_instantiate(), then the rest of unlock_new_inode().  All
combinations of d_instantiate() with unlock_new_inode() should
be converted to that.

Cc: stable@kernel.org	# 2.6.29 and later
Tested-by: Mike Marshall <hubcap@omnibond.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-11 15:36:37 -04:00
Al Viro
659038428c orangefs_kill_sb(): deal with allocation failures
orangefs_fill_sb() might've failed to allocate ORANGEFS_SB(s); don't
oops in that case.

Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-15 23:49:12 -04:00
Linus Torvalds
8ea4a5d84e orangefs: fixes and cleanups
+ Documentation cleanups
  + removal of unused code
  + cause some structs to be static
  + implement Orangefs vm_operations fault callout
  + eliminate two single-use functions and put their cleaned up code in line.
  + replace a vmalloc/memset instance with vzalloc
  + fix a race condition bug in wait code.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJayk8pAAoJEM9EDqnrzg2+92IP/2AqMOUodqF+AfKrAebQIAEk
 DvQefRC2eQXN2Ak8KYSO7oMdqfY7xeVSaKllWsSHdnktI5L9l6AOBa7uYQYF4OQt
 sqT5Wj5sbnDS54gOfFt95sUHEQXs5wegTXBE15t4uxaLSpv22mPjzqlTzyZcK7IX
 nhSIxEi+utXOW1WodT2xLgS08ac693KXciNNN14NKi5DJ+h6a8Z2kZBq/6ssC1oV
 nW2tIfRVWNVk6cfJ+Wq12JksEZOPemzjv2zojv3PZwCcwObOyxvf16nxnLjdkJvw
 obYWmR8ZcF3TItBbkxOTfanr+UAhZ55YVYDpitJmV9fpPM9eoPAR0gNWBgq45kaS
 s27pvMYf5uvGRJHgEO6hBpmi1dExDDMLfH6Y4IN+zyHKjNfgvfQKwIkzrjlmG6q6
 AgO3mbbMJfid2x7lkShuP1uYqUfv03fo2xW3zxWjjxo3tEbADSgzJQ5kjPixDvcB
 0ru2SlRVqgYIFqBN8VgWkXKbfcpSEcDITmFEkBKg/SkA08ry4iLk744UqIPa5n76
 5DYB0u99nHPMf2BBVsz6L/mUfVuHBYUBx1iII95PqJhVlj2Yi0sIj3AHUzQ4neKS
 YWGMk4VQ+B5zCObSohCRv9l3ECUqugpiJzCuZjZZS5LUkj9RO5EJ1LDhOU/31xDL
 OUTgZl7f5NykxwIHh3bD
 =zJ1Z
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.17-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs updates from Mike Marshall:
 "Fixes and cleanups:

   - Documentation cleanups

   - removal of unused code

   - make some structs static

   - implement Orangefs vm_operations fault callout

   - eliminate two single-use functions and put their cleaned up code in
     line.

   - replace a vmalloc/memset instance with vzalloc

   - fix a race condition bug in wait code"

* tag 'for-linus-4.17-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  Orangefs: documentation updates
  orangefs: document package install and xfstests procedure
  orangefs: remove unused code
  orangefs: make several *_operations structs static
  orangefs: implement vm_ops->fault
  orangefs: open code short single-use functions
  orangefs: replace vmalloc and memset with vzalloc
  orangefs: bug fix for a race condition when getting a slot
2018-04-09 12:45:47 -07:00
Linus Torvalds
9022ca6b11 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "Assorted stuff, including Christoph's I_DIRTY patches"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: move I_DIRTY_INODE to fs.h
  ubifs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call
  ntfs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call
  gfs2: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) calls
  fs: fold open_check_o_direct into do_dentry_open
  vfs: Replace stray non-ASCII homoglyph characters with their ASCII equivalents
  vfs: make sure struct filename->iname is word-aligned
  get rid of pointless includes of fs_struct.h
  [poll] annotate SAA6588_CMD_POLL users
2018-04-06 11:07:08 -07:00
Martin Brandenburg
209469d978 orangefs: remove unused code
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-04-03 21:55:28 -04:00
Martin Brandenburg
bdd6f08358 orangefs: make several *_operations structs static
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-04-03 21:55:27 -04:00
Martin Brandenburg
a5135eeab2 orangefs: implement vm_ops->fault
Must retrieve size before running filemap_fault so the kernel has
an up-to-date size.

This should have been caught by xfstests generic/246, but it was masked
by orangefs_new_inode, which set i_size to PAGE_SIZE.  When nothing
caused a getattr prior to a pagefault, i_size was still PAGE_SIZE.
Since xfstests only read 10 bytes, it did not catch this bug.

When orangefs_new_inode was modified to perform a getattr instead,
i_size was set to zero, as it was a newly created file.  Then
orangefs_file_write_iter did NOT set i_size.  Instead it invalidated the
attribute cache, which should have caused the next caller to retrieve
i_size.  But the fault handler did not know it was supposed to retrieve
i_size.  So during xfstests, i_size was still zero, and filemap_fault
returned VM_FAULT_SIGBUS.

Fixes xfstests generic/452.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-04-03 21:55:27 -04:00
Martin Brandenburg
dbcb5e7fc4 orangefs: open code short single-use functions
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-04-02 08:10:17 -04:00
Colin Ian King
81e3d0253f orangefs: replace vmalloc and memset with vzalloc
Use vzalloc instead of the vmalloc, memset combo

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-04-02 08:10:17 -04:00
David Reynolds
c2676ef801 orangefs: bug fix for a race condition when getting a slot
When a slot becomes free, call wake_up_locked regardless of the number
of slots available.

Without this patch, wake_up_locked is only called when going from no
free slots to one. This means that there is a chance a waiting task
will not be woken up. In many cases, the system will bounce between 0
and 1 free slots, and the waiting tasks will be woken up. But if there
is still a waiting task and another slot becomes available before the
number of free slots reaches zero, that waiting task may never be woken
up since the number of free slots may never reach zero again.

The bug behavior is easy to reproduce with the following script,
where /mnt/orangefs is an OrangeFS file system.

for i in {1..100}; do
	for j in {1..20}; do
		dd if=/dev/zero of=/mnt/orangefs/tmp$j bs=32768 count=32 &
	done
	wait
done

Signed-off-by: David Reynolds <david@omnibond.com>
Reviewed-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-04-02 08:10:17 -04:00
Masanari Iida
bc8282a730 treewide: Fix typos in printk
This patch fixes spelling typos found in printk.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2018-03-27 09:51:22 +02:00
Al Viro
304ec482f5 get rid of pointless includes of fs_struct.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-02-22 14:28:50 -05:00
Linus Torvalds
a9a08845e9 vfs: do bulk POLL* -> EPOLL* replacement
This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
        L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
        for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
    done

with de-mangling cleanups yet to come.

NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do.  But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.

The next patch from Al will sort out the final differences, and we
should be all done.

Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-11 14:34:03 -08:00
Linus Torvalds
a0f79386a4 Mostly cleanups, but three bug fixes:
1. don't pass garbage return codes back up the call chain (Mike Marshall)
 
  2. fix stale inode test (Martin Brandenburg)
 
  3. fix off-by-one errors (Xiongfeng Wang)
 
 Also: add Martin as a reviewer in the Maintainers file.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaejneAAoJEM9EDqnrzg2+XhoQAIDF112mOwLwqDPmr4ty0g6/
 gBcoHOrRFlYWPlS5aubjoZ3jFX2fAeNuHzYS4LIuqVKUdsC+oTKQ2URJ7KKpvLiK
 6zOaz2Y4GLns2sa1ZUKli6nEBbPi6uwoF54FNbwt3b+97wpmJwlnXm9ztyt5REKA
 zOHvLgJAcfGNZEJ7gyB1zjwllu4JeD0A4MoN4vJCtkKLAaNClywu4+V0jwZB+SSN
 8QjDXNqkcD31ahWhQ/CaU4zXlxOOV+4ZR7/p5IKT693hEhV+ikTvmXy8g0+bksxj
 L+FHmQMTO+GqCS5FxuBQd3v1IP5FkoHEmAwvr3C5aMlRAaVJ9eVVIZaC9CpOJBRB
 S/CiaG2Mw8vx8VGOm8O93Z+xDi9tCYP8x4i7b5r62h0T9wSyHJSkSIUd6VIkCV9Q
 c92bX/N3wHBvCPT+RC898plni5HsFpzs3vSs8hiaAICgp64sC8pIqVlZOAdMtJd8
 RL4la/Fited/T+3BpaCTkmnvNk8Ktax7wHYsCt4gSyHN8WRvkzowgC5kV6S30Qlh
 zfoXG0K50FcU8T5r3i8slvUHmsiyYxYwJIk/z1iDgXI7y4IIR6FGDxQmw5TxgNS7
 +veTo6FCxon6QshtpAOeELCau7qNXhtlDdGqqm4+gDfMWoCn0Jem/LzdA2gPXCOr
 iCDwHLiu6WXt7ZHTrgln
 =xrih
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs updates from Mike Marshall:
 "Mostly cleanups, but three bug fixes:

   - don't pass garbage return codes back up the call chain (Mike
     Marshall)

   - fix stale inode test (Martin Brandenburg)

   - fix off-by-one errors (Xiongfeng Wang)

  Also add Martin as a reviewer in the Maintainers file"

* tag 'for-linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: reverse sense of is-inode-stale test in d_revalidate
  orangefs: simplify orangefs_inode_is_stale
  Orangefs: don't propogate whacky error codes
  orangefs: use correct string length
  orangefs: make orangefs_make_bad_inode static
  orangefs: remove ORANGEFS_KERNEL_DEBUG
  orangefs: remove gossip_ldebug and gossip_lerr
  orangefs: make orangefs_client_debug_init static
  MAINTAINERS: update orangefs list and add myself as reviewer
2018-02-08 12:20:41 -08:00
Martin Brandenburg
74e938c227 orangefs: reverse sense of is-inode-stale test in d_revalidate
If a dentry is deleted, then a dentry is recreated with the same handle
but a different type (i.e. it was a file and now it's a symlink), then
its a different inode.  The check was backwards, so d_revalidate would
not have noticed.

Due to the design of the OrangeFS server, this is rather unlikely.

It's also possible for the dentry to be deleted and recreated with the
same type.  This would be undetectable.  It's a bit of a ship of
Theseus.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-02-06 16:38:13 -05:00
Martin Brandenburg
480e5ae9b8 orangefs: simplify orangefs_inode_is_stale
Check whether this is a new inode at location of call.

Raises the question of what to do with an unknown inode type.  Old code
would've marked the inode bad and returned ESTALE.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-02-06 16:38:13 -05:00
Mike Marshall
cf546ab6b1 Orangefs: don't propogate whacky error codes
When we get an error return code from userspace (the client-core)
we check to make sure it is a valid code.

This patch maps the whacky return code to -EINVAL instead of
propagating garbage back up the call chain potentially resulting
in a hard-to-find train-wreck.

The client-core doesn't have any business returning whacky return
codes, but if it does, we don't want the kernel to crash as a result.

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-02-06 16:38:12 -05:00
Xiongfeng Wang
6bdfb48dae orangefs: use correct string length
gcc-8 reports

fs/orangefs/dcache.c: In function 'orangefs_d_revalidate':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified
bound 256 equals destination size [-Wstringop-truncation]

fs/orangefs/namei.c: In function 'orangefs_rename':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified
bound 256 equals destination size [-Wstringop-truncation]

fs/orangefs/super.c: In function 'orangefs_mount':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified
bound 256 equals destination size [-Wstringop-truncation]

We need one less byte or call strlcpy() to make it a nul-terminated
string.

Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-02-06 16:38:12 -05:00
Martin Brandenburg
4d0cac7e75 orangefs: make orangefs_make_bad_inode static
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-02-06 16:38:12 -05:00
Martin Brandenburg
538e304821 orangefs: remove ORANGEFS_KERNEL_DEBUG
It wasn't possible to enable it, and it would've had very little effect.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-02-06 16:38:12 -05:00
Martin Brandenburg
79d7cd611d orangefs: remove gossip_ldebug and gossip_lerr
gossip_ldebug is unused.

gossip_lerr is used in two places.  The messages are unique so line
numbers are unnecessary.

Also remove support for compiling gossip messages out.  It wasn't
possible to enable it anyway.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-02-06 16:38:12 -05:00
Martin Brandenburg
7a3bc1f019 orangefs: make orangefs_client_debug_init static
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-02-06 16:38:12 -05:00
Linus Torvalds
617aebe6a9 Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory
 available to be copied to/from userspace in the face of bugs. To further
 restrict what memory is available for copying, this creates a way to
 whitelist specific areas of a given slab cache object for copying to/from
 userspace, allowing much finer granularity of access control. Slab caches
 that are never exposed to userspace can declare no whitelist for their
 objects, thereby keeping them unavailable to userspace via dynamic copy
 operations. (Note, an implicit form of whitelisting is the use of constant
 sizes in usercopy operations and get_user()/put_user(); these bypass all
 hardened usercopy checks since these sizes cannot change at runtime.)
 
 This new check is WARN-by-default, so any mistakes can be found over the
 next several releases without breaking anyone's system.
 
 The series has roughly the following sections:
 - remove %p and improve reporting with offset
 - prepare infrastructure and whitelist kmalloc
 - update VFS subsystem with whitelists
 - update SCSI subsystem with whitelists
 - update network subsystem with whitelists
 - update process memory with whitelists
 - update per-architecture thread_struct with whitelists
 - update KVM with whitelists and fix ioctl bug
 - mark all other allocations as not whitelisted
 - update lkdtm for more sensible test overage
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJabvleAAoJEIly9N/cbcAmO1kQAJnjVPutnLSbnUteZxtsv7W4
 43Cggvokfxr6l08Yh3hUowNxZVKjhF9uwMVgRRg9Nl5WdYCN+vCQbHz+ZdzGJXKq
 cGqdKWgexMKX+aBdNDrK7BphUeD46sH7JWR+a/lDV/BgPxBCm9i5ZZCgXbPP89AZ
 NpLBji7gz49wMsnm/x135xtNlZ3dG0oKETzi7MiR+NtKtUGvoIszSKy5JdPZ4m8q
 9fnXmHqmwM6uQFuzDJPt1o+D1fusTuYnjI7EgyrJRRhQ+BB3qEFZApXnKNDRS9Dm
 uB7jtcwefJCjlZVCf2+PWTOEifH2WFZXLPFlC8f44jK6iRW2Nc+wVRisJ3vSNBG1
 gaRUe/FSge68eyfQj5OFiwM/2099MNkKdZ0fSOjEBeubQpiFChjgWgcOXa5Bhlrr
 C4CIhFV2qg/tOuHDAF+Q5S96oZkaTy5qcEEwhBSW15ySDUaRWFSrtboNt6ZVOhug
 d8JJvDCQWoNu1IQozcbv6xW/Rk7miy8c0INZ4q33YUvIZpH862+vgDWfTJ73Zy9H
 jR/8eG6t3kFHKS1vWdKZzOX1bEcnd02CGElFnFYUEewKoV7ZeeLsYX7zodyUAKyi
 Yp5CImsDbWWTsptBg6h9nt2TseXTxYCt2bbmpJcqzsqSCUwOQNQ4/YpuzLeG0ihc
 JgOmUnQNJWCTwUUw5AS1
 =tzmJ
 -----END PGP SIGNATURE-----

Merge tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardened usercopy whitelisting from Kees Cook:
 "Currently, hardened usercopy performs dynamic bounds checking on slab
  cache objects. This is good, but still leaves a lot of kernel memory
  available to be copied to/from userspace in the face of bugs.

  To further restrict what memory is available for copying, this creates
  a way to whitelist specific areas of a given slab cache object for
  copying to/from userspace, allowing much finer granularity of access
  control.

  Slab caches that are never exposed to userspace can declare no
  whitelist for their objects, thereby keeping them unavailable to
  userspace via dynamic copy operations. (Note, an implicit form of
  whitelisting is the use of constant sizes in usercopy operations and
  get_user()/put_user(); these bypass all hardened usercopy checks since
  these sizes cannot change at runtime.)

  This new check is WARN-by-default, so any mistakes can be found over
  the next several releases without breaking anyone's system.

  The series has roughly the following sections:
   - remove %p and improve reporting with offset
   - prepare infrastructure and whitelist kmalloc
   - update VFS subsystem with whitelists
   - update SCSI subsystem with whitelists
   - update network subsystem with whitelists
   - update process memory with whitelists
   - update per-architecture thread_struct with whitelists
   - update KVM with whitelists and fix ioctl bug
   - mark all other allocations as not whitelisted
   - update lkdtm for more sensible test overage"

* tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (38 commits)
  lkdtm: Update usercopy tests for whitelisting
  usercopy: Restrict non-usercopy caches to size 0
  kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
  kvm: whitelist struct kvm_vcpu_arch
  arm: Implement thread_struct whitelist for hardened usercopy
  arm64: Implement thread_struct whitelist for hardened usercopy
  x86: Implement thread_struct whitelist for hardened usercopy
  fork: Provide usercopy whitelisting for task_struct
  fork: Define usercopy region in thread_stack slab caches
  fork: Define usercopy region in mm_struct slab caches
  net: Restrict unwhitelisted proto caches to size 0
  sctp: Copy struct sctp_sock.autoclose to userspace using put_user()
  sctp: Define usercopy region in SCTP proto slab cache
  caif: Define usercopy region in caif proto slab cache
  ip: Define usercopy region in IP proto slab cache
  net: Define usercopy region in struct proto slab cache
  scsi: Define usercopy region in scsi_sense_cache slab cache
  cifs: Define usercopy region in cifs_request slab cache
  vxfs: Define usercopy region in vxfs_inode slab cache
  ufs: Define usercopy region in ufs_inode_cache slab cache
  ...
2018-02-03 16:25:42 -08:00
Linus Torvalds
168fe32a07 Merge branch 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull poll annotations from Al Viro:
 "This introduces a __bitwise type for POLL### bitmap, and propagates
  the annotations through the tree. Most of that stuff is as simple as
  'make ->poll() instances return __poll_t and do the same to local
  variables used to hold the future return value'.

  Some of the obvious brainos found in process are fixed (e.g. POLLIN
  misspelled as POLL_IN). At that point the amount of sparse warnings is
  low and most of them are for genuine bugs - e.g. ->poll() instance
  deciding to return -EINVAL instead of a bitmap. I hadn't touched those
  in this series - it's large enough as it is.

  Another problem it has caught was eventpoll() ABI mess; select.c and
  eventpoll.c assumed that corresponding POLL### and EPOLL### were
  equal. That's true for some, but not all of them - EPOLL### are
  arch-independent, but POLL### are not.

  The last commit in this series separates userland POLL### values from
  the (now arch-independent) kernel-side ones, converting between them
  in the few places where they are copied to/from userland. AFAICS, this
  is the least disruptive fix preserving poll(2) ABI and making epoll()
  work on all architectures.

  As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and
  it will trigger only on what would've triggered EPOLLWRBAND on other
  architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered
  at all on sparc. With this patch they should work consistently on all
  architectures"

* 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits)
  make kernel-side POLL... arch-independent
  eventpoll: no need to mask the result of epi_item_poll() again
  eventpoll: constify struct epoll_event pointers
  debugging printk in sg_poll() uses %x to print POLL... bitmap
  annotate poll(2) guts
  9p: untangle ->poll() mess
  ->si_band gets POLL... bitmap stored into a user-visible long field
  ring_buffer_poll_wait() return value used as return value of ->poll()
  the rest of drivers/*: annotate ->poll() instances
  media: annotate ->poll() instances
  fs: annotate ->poll() instances
  ipc, kernel, mm: annotate ->poll() instances
  net: annotate ->poll() instances
  apparmor: annotate ->poll() instances
  tomoyo: annotate ->poll() instances
  sound: annotate ->poll() instances
  acpi: annotate ->poll() instances
  crypto: annotate ->poll() instances
  block: annotate ->poll() instances
  x86: annotate ->poll() instances
  ...
2018-01-30 17:58:07 -08:00
Martin Brandenburg
6793f1c450 orangefs: fix deadlock; do not write i_size in read_iter
After do_readv_writev, the inode cache is invalidated anyway, so i_size
will never be read.  It will be fetched from the server which will also
know about updates from other machines.

Fixes deadlock on 32-bit SMP.

See https://marc.info/?l=linux-fsdevel&m=151268557427760&w=2

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mike Marshall <hubcap@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-25 17:26:24 -08:00
Martin Brandenburg
a0ec1ded22 orangefs: initialize op on loop restart in orangefs_devreq_read
In orangefs_devreq_read, there is a loop which picks an op off the list
of pending ops.  If the loop fails to find an op, there is nothing to
read, and it returns EAGAIN.  If the op has been given up on, the loop
is restarted via a goto.  The bug is that the variable which the found
op is written to is not reinitialized, so if there are no more eligible
ops on the list, the code runs again on the already handled op.

This is triggered by interrupting a process while the op is being copied
to the client-core.  It's a fairly small window, but it's there.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-22 13:51:14 -08:00
Martin Brandenburg
0afc0decf2 orangefs: use list_for_each_entry_safe in purge_waiting_ops
set_op_state_purged can delete the op.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-22 13:51:14 -08:00
David Windsor
6b330623e5 orangefs: Define usercopy region in orangefs_inode_cache slab cache
orangefs symlink pathnames, stored in struct orangefs_inode_s.link_target
and therefore contained in the orangefs_inode_cache, need to be copied
to/from userspace.

cache object allocation:
    fs/orangefs/super.c:
        orangefs_alloc_inode(...):
            ...
            orangefs_inode = kmem_cache_alloc(orangefs_inode_cache, ...);
            ...
            return &orangefs_inode->vfs_inode;

    fs/orangefs/orangefs-utils.c:
        exofs_symlink(...):
            ...
            inode->i_link = orangefs_inode->link_target;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined in vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
orangefs_inode_cache slab cache in which userspace copy operations are
allowed.

This region is known as the slab cache's usercopy region. Slab caches
can now check that each dynamically sized copy operation involving
cache-managed memory falls entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-01-15 12:07:55 -08:00
Al Viro
076ccb76e1 fs: annotate ->poll() instances
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-11-27 16:20:05 -05:00
Al Viro
e410c60360 orangefs: fix a braino in ->poll()
It's POLLIN, not POLL_IN...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-11-27 16:19:38 -05:00
Linus Torvalds
1751e8a6cb Rename superblock flags (MS_xyz -> SB_xyz)
This is a pure automated search-and-replace of the internal kernel
superblock flags.

The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.

Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.

The script to do this was:

    # places to look in; re security/*: it generally should *not* be
    # touched (that stuff parses mount(2) arguments directly), but
    # there are two places where we really deal with superblock flags.
    FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
            include/linux/fs.h include/uapi/linux/bfs_fs.h \
            security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
    # the list of MS_... constants
    SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
          DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
          POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
          I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
          ACTIVE NOUSER"

    SED_PROG=
    for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done

    # we want files that contain at least one of MS_...,
    # with fs/namespace.c and fs/pnode.c excluded.
    L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')

    for f in $L; do sed -i $f $SED_PROG; done

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-27 13:05:09 -08:00
Linus Torvalds
b620fd2df2 3 Cleanups: remove initialization of i_version - Jeff Layton
use ARRAY_SIZE - Jérémy Lefaure
             call op_release sooner when creating inodes - Martin Brandenburg
 
 1 Patch: stop setting atime on inode dirty - Martin Brandenburg
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaEyiIAAoJEM9EDqnrzg2+LvEP+QGkMxX7i0Y4KSbIIWPkE3Ec
 y5OrEV8NjBg3u9eINNIfym65blGiOKK++dltSm7UAM//QoctMpG+HAhUMsFQsf3H
 XBvosvdRwxd9n/vxkcA9KsICdRKDi//vBoAS9EiyQYZfn1spE4LZBs+uZxtkQpIY
 ofUOdGYDOsXE5Jb8oBz2PRS3nQWPsflIOs2y1oTiwAfjP6WIBq11tu7wdamUJ02A
 F7vvFTA5wbxuuZq9cLA52Ho7IVR09GiymSaDTbilPK3d73eaacVl/zlfYcdMVRJA
 YmsyXcgdpgLhgiKl4B969dWU5p2X7a3cbkexTbIU+iFXcq685OohLj/SacFYH1eA
 /eZibdz9UhO6rLGwR5YDQ50lMIzwPYxMM98f8E/jjfxdRFrG3Pu4A2yLjDtaJYZc
 ATJDVk491xnGOhYDARQ6Wt/Dy3Yj0TtPsJeXggR6NiXH4AgsjZxToD2QgHXBhynb
 2+dFadBb0erFMT1rB295thBGJWeD6kArIXwZS9alz83z/VH7O5rpjIx0I4Qj5NeP
 fZEYHf3E2+jFVQzqdw31fK6nTVsCN6/YhSwSYOGo+MAdvurCVxuFp0ulUM6FOCGR
 cfNYle/KrP3q1A3zzR4lpSDLXbXGKYbmImEYw4pobYH/vnjAtNVOpcEAMaxGyogm
 NUbQyGgcP9JIglkLSlQ5
 =nqnT
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.15-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs updates from Mike Marshall:
 "Fix:

   - stop setting atime on inode dirty (Martin Brandenburg)

  Cleanups:

   - remove initialization of i_version (Jeff Layton)

   - use ARRAY_SIZE (Jérémy Lefaure)

   - call op_release sooner when creating inodes (Mike MarshallMartin
     Brandenburg)"

* tag 'for-linus-4.15-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: call op_release sooner when creating inodes
  orangefs: stop setting atime on inode dirty
  orangefs: use ARRAY_SIZE
  orangefs: remove initialization of i_version
2017-11-21 05:40:48 -10:00
Linus Torvalds
16382e17c0 Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull iov_iter updates from Al Viro:

 - bio_{map,copy}_user_iov() series; those are cleanups - fixes from the
   same pile went into mainline (and stable) in late September.

 - fs/iomap.c iov_iter-related fixes

 - new primitive - iov_iter_for_each_range(), which applies a function
   to kernel-mapped segments of an iov_iter.

   Usable for kvec and bvec ones, the latter does kmap()/kunmap() around
   the callback. _Not_ usable for iovec- or pipe-backed iov_iter; the
   latter is not hard to fix if the need ever appears, the former is by
   design.

   Another related primitive will have to wait for the next cycle - it
   passes page + offset + size instead of pointer + size, and that one
   will be usable for everything _except_ kvec. Unfortunately, that one
   didn't get exposure in -next yet, so...

 - a bit more lustre iov_iter work, including a use case for
   iov_iter_for_each_range() (checksum calculation)

 - vhost/scsi leak fix in failure exit

 - misc cleanups and detritectomy...

* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (21 commits)
  iomap_dio_actor(): fix iov_iter bugs
  switch ksocknal_lib_recv_...() to use of iov_iter_for_each_range()
  lustre: switch struct ksock_conn to iov_iter
  vhost/scsi: switch to iov_iter_get_pages()
  fix a page leak in vhost_scsi_iov_to_sgl() error recovery
  new primitive: iov_iter_for_each_range()
  lnet_return_rx_credits_locked: don't abuse list_entry
  xen: don't open-code iov_iter_kvec()
  orangefs: remove detritus from struct orangefs_kiocb_s
  kill iov_shorten()
  bio_alloc_map_data(): do bmd->iter setup right there
  bio_copy_user_iov(): saner bio size calculation
  bio_map_user_iov(): get rid of copying iov_iter
  bio_copy_from_iter(): get rid of copying iov_iter
  move more stuff down into bio_copy_user_iov()
  blk_rq_map_user_iov(): move iov_iter_advance() down
  bio_map_user_iov(): get rid of the iov_for_each()
  bio_map_user_iov(): move alignment check into the main loop
  don't rely upon subsequent bio_add_pc_page() calls failing
  ... and with iov_iter_get_pages_alloc() it becomes even simpler
  ...
2017-11-17 12:08:18 -08:00
Martin Brandenburg
db0267e7af orangefs: call op_release sooner when creating inodes
Prevents holding an unnecessary op while the kernel processes another op
and yields the CPU.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-11-13 15:10:18 -05:00
Martin Brandenburg
a55f2d8615 orangefs: stop setting atime on inode dirty
The previous code path was to mark the inode dirty, let
orangefs_inode_dirty set a flag in our private inode, then later during
inode release call orangefs_flush_inode which notices the flag and
writes the atime out.

The code path worked almost identically for mtime, ctime, and mode
except that those flags are set explicitly and not as side effects of
dirty.

Now orangefs_flush_inode is removed.  Marking an inode dirty does not
imply an atime update.  Any place where flags were set before is now
an explicit call to orangefs_inode_setattr.  Since OrangeFS does not
utilize inode writeback, the attribute change should be written out
immediately.

Fixes generic/120.

In namei.c, there are several places where the directory mtime and ctime
are set, but only the mtime is sent to the server.  These don't seem
right, but I've left them as is for now.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-11-13 15:10:11 -05:00
Jérémy Lefaure
296200d3bb orangefs: use ARRAY_SIZE
Using the ARRAY_SIZE macro improves the readability of the code.

Found with Coccinelle with the following semantic patch:
@r depends on (org || report)@
type T;
T[] E;
position p;
@@
(
 (sizeof(E)@p /sizeof(*E))
|
 (sizeof(E)@p /sizeof(E[...]))
|
 (sizeof(E)@p /sizeof(T))
)

Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-11-13 15:09:58 -05:00
Jeff Layton
933f7ac1a1 orangefs: remove initialization of i_version
...as it's completely unused.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-11-13 15:09:33 -05:00
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Al Viro
6570f0dd60 orangefs: remove detritus from struct orangefs_kiocb_s
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-10-11 17:23:44 -04:00
Markus Elfring
0b08273c8a orangefs: Adjust three checks for null pointers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The script “checkpatch.pl” pointed information out like the following.

Comparison to NULL could be written !…

Thus fix affected source code places.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-09-14 14:58:31 -04:00
Markus Elfring
5e273a0e06 orangefs: Use kcalloc() in orangefs_prepare_cdm_array()
* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus use the corresponding function "kcalloc".

  This issue was detected by using the Coccinelle software.

* Replace the specification of a data structure by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-09-14 14:58:30 -04:00
Markus Elfring
07a258531c orangefs: Delete error messages for a failed memory allocation in five functions
Omit an extra message for a memory allocation failure in these functions.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-09-14 14:58:29 -04:00
Julia Lawall
1217444405 orangefs: constify xattr_handler structure
The xattr_handler structure is only stored in an array of const
structures.  Thus the xattr_handler structure itself can be
const.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-09-14 14:58:29 -04:00
Jeff Layton
49e5571324 orangefs: don't call filemap_write_and_wait from fsync
Orangefs doesn't do buffered writes yet, so there's no point in
initiating and waiting for writeback.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-09-14 14:58:28 -04:00
Dan Carpenter
5f13e58767 orangefs: off by ones in xattr size checks
A previous patch which claimed to remove off by ones actually introduced
them.

strlen() returns the length of the string not including the NUL
character.  We are using strcpy() to copy "name" into a buffer which is
ORANGEFS_MAX_XATTR_NAMELEN characters long.  We should make sure to
leave space for the NUL, otherwise we're writing one character beyond
the end of the buffer.

Fixes: e675c5ec51 ("orangefs: clean up oversize xattr validation")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-09-14 14:58:27 -04:00
Mike Marshall
4bef69000d orangefs: react properly to posix_acl_update_mode's aftermath.
posix_acl_update_mode checks to see if the permissions
described by the ACL can be encoded into the
object's mode. If so, it sets "acl" to NULL
and "mode" to the new desired value. Prior to this patch
we failed to actually propagate the new mode back to the
server.

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-09-14 14:54:38 -04:00
Jan Kara
b5accbb0df orangefs: Don't clear SGID when inheriting ACLs
When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.

Fix the problem by creating __orangefs_set_acl() function that does not
call posix_acl_update_mode() and use it when inheriting ACLs. That
prevents SGID bit clearing and the mode has been properly set by
posix_acl_create() anyway.

Fixes: 073931017b
CC: stable@vger.kernel.org
CC: Mike Marshall <hubcap@omnibond.com>
CC: pvfs2-developers@beowulf-underground.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-09-14 14:54:37 -04:00
Linus Torvalds
78dcf73421 Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ->s_options removal from Al Viro:
 "Preparations for fsmount/fsopen stuff (coming next cycle). Everything
  gets moved to explicit ->show_options(), killing ->s_options off +
  some cosmetic bits around fs/namespace.c and friends. Basically, the
  stuff needed to work with fsmount series with minimum of conflicts
  with other work.

  It's not strictly required for this merge window, but it would reduce
  the PITA during the coming cycle, so it would be nice to have those
  bits and pieces out of the way"

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  isofs: Fix isofs_show_options()
  VFS: Kill off s_options and helpers
  orangefs: Implement show_options
  9p: Implement show_options
  isofs: Implement show_options
  afs: Implement show_options
  affs: Implement show_options
  befs: Implement show_options
  spufs: Implement show_options
  bpf: Implement show_options
  ramfs: Implement show_options
  pstore: Implement show_options
  omfs: Implement show_options
  hugetlbfs: Implement show_options
  VFS: Don't use save/replace_mount_options if not using generic_show_options
  VFS: Provide empty name qstr
  VFS: Make get_filesystem() return the affected filesystem
  VFS: Clean up whitespace in fs/namespace.c and fs/super.c
  Provide a function to create a NUL-terminated string from unterminated data
2017-07-15 12:00:42 -07:00
David Howells
4dfdb71307 orangefs: Implement show_options
Implement the show_options superblock op for orangefs as part of a bid to
rid of s_options and generic_show_options() to make it easier to implement
a context-based mount where the mount options can be passed individually
over a file descriptor.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Mike Marshall <hubcap@omnibond.com>
cc: pvfs2-developers@beowulf-underground.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-11 06:09:21 -04:00
Ingo Molnar
2055da9738 sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming
So I've noticed a number of instances where it was not obvious from the
code whether ->task_list was for a wait-queue head or a wait-queue entry.

Furthermore, there's a number of wait-queue users where the lists are
not for 'tasks' but other entities (poll tables, etc.), in which case
the 'task_list' name is actively confusing.

To clear this all up, name the wait-queue head and entry list structure
fields unambiguously:

	struct wait_queue_head::task_list	=> ::head
	struct wait_queue_entry::task_list	=> ::entry

For example, this code:

	rqw->wait.task_list.next != &wait->task_list

... is was pretty unclear (to me) what it's doing, while now it's written this way:

	rqw->wait.head.next != &wait->entry

... which makes it pretty clear that we are iterating a list until we see the head.

Other examples are:

	list_for_each_entry_safe(pos, next, &x->task_list, task_list) {
	list_for_each_entry(wq, &fence->wait.task_list, task_list) {

... where it's unclear (to me) what we are iterating, and during review it's
hard to tell whether it's trying to walk a wait-queue entry (which would be
a bug), while now it's written as:

	list_for_each_entry_safe(pos, next, &x->head, entry) {
	list_for_each_entry(wq, &fence->wait.head, entry) {

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-20 12:19:14 +02:00
Ingo Molnar
ac6424b981 sched/wait: Rename wait_queue_t => wait_queue_entry_t
Rename:

	wait_queue_t		=>	wait_queue_entry_t

'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
which had to carry the name.

Start sorting this out by renaming it to 'wait_queue_entry_t'.

This also allows the real structure name 'struct __wait_queue' to
lose its double underscore and become 'struct wait_queue_entry',
which is the more canonical nomenclature for such data types.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-20 12:18:27 +02:00
Linus Torvalds
aeced66196 Some cleanups:
remove unused get_fsid_from_ino
   fix bounds check for listxattr
   clean up oversize xattr validation
   do not set getattr_time on orangefs_lookup
   return from orangefs_devreq_read quickly if possible
   do not wait for timeout if umounting
   handle zero size write in debugfs
 
 Bug fixes:
 
   do not check possibly stale size on truncate
   ensure the userspace component is unmounted if mount fails
   total reimplementation of dir.c
 
 New feature:
 
   implement statx
 
 The new implementation of dir.c is kind of a big deal, all new
 code. It has been posted to fs-devel during the previous rc period,
 we didn't get much review or feedback from there, but it has been reviewed
 very heavily here, so much so that we have two entire versions of the
 reimplementation. Not only does the new implementation fix some
 xfstests, but it passes all the new tests we made here that involve
 seeking and rewinding and giant directories and long file names.
 The new dir code has three patches itself:
 
   skip forward to the next directory entry if seek is short
   invalidate stored directory on seek
   count directory pieces correctly
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZDLasAAoJEM9EDqnrzg2+yR4P/jNsryNfQush5V/6EO+wpQ7p
 O0epuLG42QMN67wdsQDVOOzcRQq2IAoYrgupZfEvCVsoBiYxdCTTwhN/55UctMBA
 xnakv8BarrLd6pqSJOlQviP7ByXdEvy7dtYYuAEtdRnPtTZEmjDH0k9ME759+DVm
 pPQ6fanPzSZuG8fdjI4QrKiFfpE5slMeyMV9SmzIq81S1i+t2b9sDYKTiP3Jt14y
 KTweGdJXTRT+Piy27d80HN9ExlFXlcyru9GDWNhZi4EHlax7bq76Qwu1XKyaOg0h
 MN40+18k+Zqrpj1/tq4aj3YM0P3HjpRhtb5TqOC+QhZDIL1gJ8bv8rv61snWTak+
 6cXtwvIh7r4aEU+gkMLP29HXCVlGg3V4up+DdbHJVbIEXV8C5csJBP+sQUlU7A5D
 WoPmheV7CJ8nicwkxYm31dhdnW7mOwW/J4uUlM9w/yU/dVfoz1SK8AtKjy0xX87c
 Jpo7nuJEDprI+9neT0y5U+RHVqH08+cA5DCrdk0x8JaJIrjOZpvTROIPrtzlS7QL
 aTu+W/ISXtFwnM+ERmw8TKPD7TTUXypydYhzXe8V6itDpiNp1kQFGmLGzLhAMElH
 iGQkFatR6LSKh+DxUD3PREQGNyQCKpgPiqLoGYprzQ829tqLpThumfZic9lX1C/+
 we5VEpRbiz6BjN110DBJ
 =NGTt
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.12-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs updates from Mike Marshall:
 "Orangefs cleanups, fixes and statx support.

  Some cleanups:

   - remove unused get_fsid_from_ino
   - fix bounds check for listxattr
   - clean up oversize xattr validation
   - do not set getattr_time on orangefs_lookup
   - return from orangefs_devreq_read quickly if possible
   - do not wait for timeout if umounting
   - handle zero size write in debugfs

  Bug fixes:

   - do not check possibly stale size on truncate
   - ensure the userspace component is unmounted if mount fails
   - total reimplementation of dir.c

  New feature:

   - implement statx

  The new implementation of dir.c is kind of a big deal, all new code.
  It has been posted to fs-devel during the previous rc period, we
  didn't get much review or feedback from there, but it has been
  reviewed very heavily here, so much so that we have two entire
  versions of the reimplementation.

  Not only does the new implementation fix some xfstests, but it passes
  all the new tests we made here that involve seeking and rewinding and
  giant directories and long file names. The new dir code has three
  patches itself:

   - skip forward to the next directory entry if seek is short
   - invalidate stored directory on seek
   - count directory pieces correctly"

* tag 'for-linus-4.12-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
  orangefs: count directory pieces correctly
  orangefs: invalidate stored directory on seek
  orangefs: skip forward to the next directory entry if seek is short
  orangefs: handle zero size write in debugfs
  orangefs: do not wait for timeout if umounting
  orangefs: return from orangefs_devreq_read quickly if possible
  orangefs: ensure the userspace component is unmounted if mount fails
  orangefs: do not check possibly stale size on truncate
  orangefs: implement statx
  orangefs: remove ORANGEFS_READDIR macros
  orangefs: support very large directories
  orangefs: support llseek on directories
  orangefs: rewrite readdir to fix several bugs
  orangefs: do not set getattr_time on orangefs_lookup
  orangefs: clean up oversize xattr validation
  orangefs: fix bounds check for listxattr
  orangefs: remove unused get_fsid_from_ino
2017-05-05 13:36:10 -07:00
Martin Brandenburg
2f713b5c7d orangefs: count directory pieces correctly
A large directory full of differently sized file names triggered this.
Most directories, even very large directories with shorter names, would
be lucky enough to fit in one server response.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-05-04 14:38:24 -04:00
Martin Brandenburg
942835d68f orangefs: invalidate stored directory on seek
If an application seeks to a position before the point which has been
read, it must want updates which have been made to the directory.  So
delete the copy stored in the kernel so it will be fetched again.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-05-04 14:38:15 -04:00
Martin Brandenburg
bf15ba7c1f orangefs: skip forward to the next directory entry if seek is short
If userspace seeks to a position in the stream which is not correct, it
would have returned EIO because the data in the buffer at that offset
would be incorrect.  This and the userspace daemon returning a corrupt
directory are indistinguishable.

Now if the data does not look right, skip forward to the next chunk and
try again.  The motivation is that if the directory changes, an
application may seek to a position that was valid and no longer is valid.

It is not yet possible for a directory to change.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-05-04 14:38:10 -04:00
Dan Carpenter
907bfcd8d8 orangefs: handle zero size write in debugfs
If we write zero bytes to this debugfs file, then it will cause an
underflow when we do copy_from_user(buf, ubuf, count - 1).  Debugfs can
normally only be written to by root so the impact of this is low.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:01 -04:00
Martin Brandenburg
b5a9d61eeb orangefs: do not wait for timeout if umounting
When the computer is turned off, all the processes are killed and then
all the filesystems are umounted.  OrangeFS should not wait for the
userspace daemon to come back in that case.

This only works for plain umount(2).  To actually take advantage of this
interactively, `umount -f' is needed; otherwise umount will issue a
statfs first, which will wait for the userspace daemon to come back.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:01 -04:00
Martin Brandenburg
b7a57ccab8 orangefs: return from orangefs_devreq_read quickly if possible
It is not necessary to take the lock and search through the request list
if the list is empty.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
9d286b0d82 orangefs: ensure the userspace component is unmounted if mount fails
If the mount is aborted after userspace has been asked to mount,
userspace must be told to unmount.

Ordinarily orangefs_kill_sb does the unmount.  However it cannot be
called if the superblock has not been set up.  This is a very narrow
window.

The NULL fs_id is not unmounted.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
53950ef541 orangefs: do not check possibly stale size on truncate
Let the server figure this out because our size might be out of date or
not present.

The bug was that

	xfs_io -f -t -c "pread -v 0 100" /mnt/foo
	echo "Test" > /mnt/foo
	xfs_io -f -t -c "pread -v 0 100" /mnt/foo

fails because the second truncate did not happen if nothing had
requested the size after the write in echo.  Thus i_size was zero (not
present) and the orangefs_setattr though i_size was zero and there was
nothing to do.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
68a24a6cc4 orangefs: implement statx
Fortunately OrangeFS has had a getattr request mask for a long time.

The server basically has two difficulty levels for attributes.  Fetching
any attribute except size requires communicating with the metadata
server for that handle.  Since all the attributes are right there, it
makes sense to return them all.  Fetching the size requires
communicating with every I/O server (that the file is distributed
across).  Therefore if asked for anything except size, get everything
except size, and if asked for size, get everything.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
7b796ae370 orangefs: remove ORANGEFS_READDIR macros
They are clones of the ORANGEFS_ITERATE macros in use elsewhere.  Delete
ORANGEFS_ITERATE_NEXT which is a hack previously used by readdir.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
480e3e532e orangefs: support very large directories
This works by maintaining a linked list of pages which the directory
has been read into rather than one giant fixed-size buffer.

This replaces code which limits the total directory size to the total
amount that could be returned in one server request.  Since filenames
are usually considerably shorter than the maximum, the old code could
usually handle several server requests before running out of space.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
72f66b8329 orangefs: support llseek on directories
This and the previous commit fix xfstests generic/257.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
382f4581e6 orangefs: rewrite readdir to fix several bugs
In the past, readdir assumed that the user buffer will be large enough
that all entries from the server will fit.  If this was not true,
entries would be skipped.

Since it works now, request 512 entries rather than 96 per server
operation.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
17930b252c orangefs: do not set getattr_time on orangefs_lookup
Since orangefs_lookup calls orangefs_iget which calls
orangefs_inode_getattr, getattr_time will get set.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
e675c5ec51 orangefs: clean up oversize xattr validation
Also don't check flags as this has been validated by the VFS already.

Fix an off-by-one error in the max size checking.

Stop logging just because userspace wants to write attributes which do
not fit.

This and the previous commit fix xfstests generic/020.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
a956af337b orangefs: fix bounds check for listxattr
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
418ce3eb66 orangefs: remove unused get_fsid_from_ino
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Al Viro
c63ed807d1 orangefs: use iov_iter_revert()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-21 13:57:32 -04:00
Al Viro
890559e34e orangefs_bufmap_copy_from_iovec(): fix EFAULT handling
short copy here should mean instant EFAULT, not "move to the
next page and hope it fails there, this time with nothing
copied"

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-17 14:23:20 -04:00
Martin Brandenburg
1ec1688c53 orangefs: free superblock when mount fails
Otherwise lockdep says:

[ 1337.483798] ================================================
[ 1337.483999] [ BUG: lock held when returning to user space! ]
[ 1337.484252] 4.11.0-rc6 #19 Not tainted
[ 1337.484423] ------------------------------------------------
[ 1337.484626] mount/14766 is leaving the kernel with locks still held!
[ 1337.484841] 1 lock held by mount/14766:
[ 1337.485017]  #0:  (&type->s_umount_key#33/1){+.+.+.}, at: [<ffffffff8124171f>] sget_userns+0x2af/0x520

Caught by xfstests generic/413 which tried to mount with the unsupported
mount option dax.  Then xfstests generic/422 ran sync which deadlocks.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Acked-by: Mike Marshall <hubcap@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-15 09:39:31 -07:00
Martin Brandenburg
cefdc26e86 orangefs: move features validation to fix filesystem hang
Without this fix (and another to the userspace component itself
described later), the kernel will be unable to process any OrangeFS
requests after the userspace component is restarted (due to a crash or
at the administrator's behest).

The bug here is that inside orangefs_remount, the orangefs_request_mutex
is locked.  When the userspace component restarts while the filesystem
is mounted, it sends a ORANGEFS_DEV_REMOUNT_ALL ioctl to the device,
which causes the kernel to send it a few requests aimed at synchronizing
the state between the two.  While this is happening the
orangefs_request_mutex is locked to prevent any other requests going
through.

This is only half of the bugfix.  The other half is in the userspace
component which outright ignores(!) requests made before it considers
the filesystem remounted, which is after the ioctl returns.  Of course
the ioctl doesn't return until after the userspace component responds to
the request it ignores.  The userspace component has been changed to
allow ORANGEFS_VFS_OP_FEATURES regardless of the mount status.

Mike Marshall says:
 "I've tested this patch against the fixed userspace part. This patch is
  real important, I hope it can make it into 4.11...

  Here's what happens when the userspace daemon is restarted, without
  the patch:

    =============================================
    [ INFO: possible recursive locking detected ]
    [   4.10.0-00007-ge98bdb3 #1 Not tainted    ]
    ---------------------------------------------
    pvfs2-client-co/29032 is trying to acquire lock:
     (orangefs_request_mutex){+.+.+.}, at: service_operation+0x3c7/0x7b0 [orangefs]
                  but task is already holding lock:
     (orangefs_request_mutex){+.+.+.}, at: dispatch_ioctl_command+0x1bf/0x330 [orangefs]

    CPU: 0 PID: 29032 Comm: pvfs2-client-co Not tainted 4.10.0-00007-ge98bdb3 #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
    Call Trace:
     __lock_acquire+0x7eb/0x1290
     lock_acquire+0xe8/0x1d0
     mutex_lock_killable_nested+0x6f/0x6e0
     service_operation+0x3c7/0x7b0 [orangefs]
     orangefs_remount+0xea/0x150 [orangefs]
     dispatch_ioctl_command+0x227/0x330 [orangefs]
     orangefs_devreq_ioctl+0x29/0x70 [orangefs]
     do_vfs_ioctl+0xa3/0x6e0
     SyS_ioctl+0x79/0x90"

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Acked-by: Mike Marshall <hubcap@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-07 13:41:22 -07:00
Linus Torvalds
590dce2d49 Merge branch 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs 'statx()' update from Al Viro.

This adds the new extended stat() interface that internally subsumes our
previous stat interfaces, and allows user mode to specify in more detail
what kind of information it wants.

It also allows for some explicit synchronization information to be
passed to the filesystem, which can be relevant for network filesystems:
is the cached value ok, or do you need open/close consistency, or what?

From David Howells.

Andreas Dilger points out that the first version of the extended statx
interface was posted June 29, 2010:

    https://www.spinics.net/lists/linux-fsdevel/msg33831.html

* 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  statx: Add a system call to make enhanced file info available
2017-03-03 11:38:56 -08:00
Linus Torvalds
1827adb11a Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull sched.h split-up from Ingo Molnar:
 "The point of these changes is to significantly reduce the
  <linux/sched.h> header footprint, to speed up the kernel build and to
  have a cleaner header structure.

  After these changes the new <linux/sched.h>'s typical preprocessed
  size goes down from a previous ~0.68 MB (~22K lines) to ~0.45 MB (~15K
  lines), which is around 40% faster to build on typical configs.

  Not much changed from the last version (-v2) posted three weeks ago: I
  eliminated quirks, backmerged fixes plus I rebased it to an upstream
  SHA1 from yesterday that includes most changes queued up in -next plus
  all sched.h changes that were pending from Andrew.

  I've re-tested the series both on x86 and on cross-arch defconfigs,
  and did a bisectability test at a number of random points.

  I tried to test as many build configurations as possible, but some
  build breakage is probably still left - but it should be mostly
  limited to architectures that have no cross-compiler binaries
  available on kernel.org, and non-default configurations"

* 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (146 commits)
  sched/headers: Clean up <linux/sched.h>
  sched/headers: Remove #ifdefs from <linux/sched.h>
  sched/headers: Remove the <linux/topology.h> include from <linux/sched.h>
  sched/headers, hrtimer: Remove the <linux/wait.h> include from <linux/hrtimer.h>
  sched/headers, x86/apic: Remove the <linux/pm.h> header inclusion from <asm/apic.h>
  sched/headers, timers: Remove the <linux/sysctl.h> include from <linux/timer.h>
  sched/headers: Remove <linux/magic.h> from <linux/sched/task_stack.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/init.h>
  sched/core: Remove unused prefetch_stack()
  sched/headers: Remove <linux/rculist.h> from <linux/sched.h>
  sched/headers: Remove the 'init_pid_ns' prototype from <linux/sched.h>
  sched/headers: Remove <linux/signal.h> from <linux/sched.h>
  sched/headers: Remove <linux/rwsem.h> from <linux/sched.h>
  sched/headers: Remove the runqueue_is_locked() prototype
  sched/headers: Remove <linux/sched.h> from <linux/sched/hotplug.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/debug.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/nohz.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/stat.h>
  sched/headers: Remove the <linux/gfp.h> include from <linux/sched.h>
  sched/headers: Remove <linux/rtmutex.h> from <linux/sched.h>
  ...
2017-03-03 10:16:38 -08:00
David Howells
a528d35e8b statx: Add a system call to make enhanced file info available
Add a system call to make extended file information available, including
file creation and some attribute flags where available through the
underlying filesystem.

The getattr inode operation is altered to take two additional arguments: a
u32 request_mask and an unsigned int flags that indicate the
synchronisation mode.  This change is propagated to the vfs_getattr*()
function.

Functions like vfs_stat() are now inline wrappers around new functions
vfs_statx() and vfs_statx_fd() to reduce stack usage.

========
OVERVIEW
========

The idea was initially proposed as a set of xattrs that could be retrieved
with getxattr(), but the general preference proved to be for a new syscall
with an extended stat structure.

A number of requests were gathered for features to be included.  The
following have been included:

 (1) Make the fields a consistent size on all arches and make them large.

 (2) Spare space, request flags and information flags are provided for
     future expansion.

 (3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an
     __s64).

 (4) Creation time: The SMB protocol carries the creation time, which could
     be exported by Samba, which will in turn help CIFS make use of
     FS-Cache as that can be used for coherency data (stx_btime).

     This is also specified in NFSv4 as a recommended attribute and could
     be exported by NFSD [Steve French].

 (5) Lightweight stat: Ask for just those details of interest, and allow a
     netfs (such as NFS) to approximate anything not of interest, possibly
     without going to the server [Trond Myklebust, Ulrich Drepper, Andreas
     Dilger] (AT_STATX_DONT_SYNC).

 (6) Heavyweight stat: Force a netfs to go to the server, even if it thinks
     its cached attributes are up to date [Trond Myklebust]
     (AT_STATX_FORCE_SYNC).

And the following have been left out for future extension:

 (7) Data version number: Could be used by userspace NFS servers [Aneesh
     Kumar].

     Can also be used to modify fill_post_wcc() in NFSD which retrieves
     i_version directly, but has just called vfs_getattr().  It could get
     it from the kstat struct if it used vfs_xgetattr() instead.

     (There's disagreement on the exact semantics of a single field, since
     not all filesystems do this the same way).

 (8) BSD stat compatibility: Including more fields from the BSD stat such
     as creation time (st_btime) and inode generation number (st_gen)
     [Jeremy Allison, Bernd Schubert].

 (9) Inode generation number: Useful for FUSE and userspace NFS servers
     [Bernd Schubert].

     (This was asked for but later deemed unnecessary with the
     open-by-handle capability available and caused disagreement as to
     whether it's a security hole or not).

(10) Extra coherency data may be useful in making backups [Andreas Dilger].

     (No particular data were offered, but things like last backup
     timestamp, the data version number and the DOS archive bit would come
     into this category).

(11) Allow the filesystem to indicate what it can/cannot provide: A
     filesystem can now say it doesn't support a standard stat feature if
     that isn't available, so if, for instance, inode numbers or UIDs don't
     exist or are fabricated locally...

     (This requires a separate system call - I have an fsinfo() call idea
     for this).

(12) Store a 16-byte volume ID in the superblock that can be returned in
     struct xstat [Steve French].

     (Deferred to fsinfo).

(13) Include granularity fields in the time data to indicate the
     granularity of each of the times (NFSv4 time_delta) [Steve French].

     (Deferred to fsinfo).

(14) FS_IOC_GETFLAGS value.  These could be translated to BSD's st_flags.
     Note that the Linux IOC flags are a mess and filesystems such as Ext4
     define flags that aren't in linux/fs.h, so translation in the kernel
     may be a necessity (or, possibly, we provide the filesystem type too).

     (Some attributes are made available in stx_attributes, but the general
     feeling was that the IOC flags were to ext[234]-specific and shouldn't
     be exposed through statx this way).

(15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
     Michael Kerrisk].

     (Deferred, probably to fsinfo.  Finding out if there's an ACL or
     seclabal might require extra filesystem operations).

(16) Femtosecond-resolution timestamps [Dave Chinner].

     (A __reserved field has been left in the statx_timestamp struct for
     this - if there proves to be a need).

(17) A set multiple attributes syscall to go with this.

===============
NEW SYSTEM CALL
===============

The new system call is:

	int ret = statx(int dfd,
			const char *filename,
			unsigned int flags,
			unsigned int mask,
			struct statx *buffer);

The dfd, filename and flags parameters indicate the file to query, in a
similar way to fstatat().  There is no equivalent of lstat() as that can be
emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags.  There is
also no equivalent of fstat() as that can be emulated by passing a NULL
filename to statx() with the fd of interest in dfd.

Whether or not statx() synchronises the attributes with the backing store
can be controlled by OR'ing a value into the flags argument (this typically
only affects network filesystems):

 (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this
     respect.

 (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise
     its attributes with the server - which might require data writeback to
     occur to get the timestamps correct.

 (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a
     network filesystem.  The resulting values should be considered
     approximate.

mask is a bitmask indicating the fields in struct statx that are of
interest to the caller.  The user should set this to STATX_BASIC_STATS to
get the basic set returned by stat().  It should be noted that asking for
more information may entail extra I/O operations.

buffer points to the destination for the data.  This must be 256 bytes in
size.

======================
MAIN ATTRIBUTES RECORD
======================

The following structures are defined in which to return the main attribute
set:

	struct statx_timestamp {
		__s64	tv_sec;
		__s32	tv_nsec;
		__s32	__reserved;
	};

	struct statx {
		__u32	stx_mask;
		__u32	stx_blksize;
		__u64	stx_attributes;
		__u32	stx_nlink;
		__u32	stx_uid;
		__u32	stx_gid;
		__u16	stx_mode;
		__u16	__spare0[1];
		__u64	stx_ino;
		__u64	stx_size;
		__u64	stx_blocks;
		__u64	__spare1[1];
		struct statx_timestamp	stx_atime;
		struct statx_timestamp	stx_btime;
		struct statx_timestamp	stx_ctime;
		struct statx_timestamp	stx_mtime;
		__u32	stx_rdev_major;
		__u32	stx_rdev_minor;
		__u32	stx_dev_major;
		__u32	stx_dev_minor;
		__u64	__spare2[14];
	};

The defined bits in request_mask and stx_mask are:

	STATX_TYPE		Want/got stx_mode & S_IFMT
	STATX_MODE		Want/got stx_mode & ~S_IFMT
	STATX_NLINK		Want/got stx_nlink
	STATX_UID		Want/got stx_uid
	STATX_GID		Want/got stx_gid
	STATX_ATIME		Want/got stx_atime{,_ns}
	STATX_MTIME		Want/got stx_mtime{,_ns}
	STATX_CTIME		Want/got stx_ctime{,_ns}
	STATX_INO		Want/got stx_ino
	STATX_SIZE		Want/got stx_size
	STATX_BLOCKS		Want/got stx_blocks
	STATX_BASIC_STATS	[The stuff in the normal stat struct]
	STATX_BTIME		Want/got stx_btime{,_ns}
	STATX_ALL		[All currently available stuff]

stx_btime is the file creation time, stx_mask is a bitmask indicating the
data provided and __spares*[] are where as-yet undefined fields can be
placed.

Time fields are structures with separate seconds and nanoseconds fields
plus a reserved field in case we want to add even finer resolution.  Note
that times will be negative if before 1970; in such a case, the nanosecond
fields will also be negative if not zero.

The bits defined in the stx_attributes field convey information about a
file, how it is accessed, where it is and what it does.  The following
attributes map to FS_*_FL flags and are the same numerical value:

	STATX_ATTR_COMPRESSED		File is compressed by the fs
	STATX_ATTR_IMMUTABLE		File is marked immutable
	STATX_ATTR_APPEND		File is append-only
	STATX_ATTR_NODUMP		File is not to be dumped
	STATX_ATTR_ENCRYPTED		File requires key to decrypt in fs

Within the kernel, the supported flags are listed by:

	KSTAT_ATTR_FS_IOC_FLAGS

[Are any other IOC flags of sufficient general interest to be exposed
through this interface?]

New flags include:

	STATX_ATTR_AUTOMOUNT		Object is an automount trigger

These are for the use of GUI tools that might want to mark files specially,
depending on what they are.

Fields in struct statx come in a number of classes:

 (0) stx_dev_*, stx_blksize.

     These are local system information and are always available.

 (1) stx_mode, stx_nlinks, stx_uid, stx_gid, stx_[amc]time, stx_ino,
     stx_size, stx_blocks.

     These will be returned whether the caller asks for them or not.  The
     corresponding bits in stx_mask will be set to indicate whether they
     actually have valid values.

     If the caller didn't ask for them, then they may be approximated.  For
     example, NFS won't waste any time updating them from the server,
     unless as a byproduct of updating something requested.

     If the values don't actually exist for the underlying object (such as
     UID or GID on a DOS file), then the bit won't be set in the stx_mask,
     even if the caller asked for the value.  In such a case, the returned
     value will be a fabrication.

     Note that there are instances where the type might not be valid, for
     instance Windows reparse points.

 (2) stx_rdev_*.

     This will be set only if stx_mode indicates we're looking at a
     blockdev or a chardev, otherwise will be 0.

 (3) stx_btime.

     Similar to (1), except this will be set to 0 if it doesn't exist.

=======
TESTING
=======

The following test program can be used to test the statx system call:

	samples/statx/test-statx.c

Just compile and run, passing it paths to the files you want to examine.
The file is built automatically if CONFIG_SAMPLES is enabled.

Here's some example output.  Firstly, an NFS directory that crosses to
another FSID.  Note that the AUTOMOUNT attribute is set because transiting
this directory will cause d_automount to be invoked by the VFS.

	[root@andromeda ~]# /tmp/test-statx -A /warthog/data
	statx(/warthog/data) = 0
	results=7ff
	  Size: 4096            Blocks: 8          IO Block: 1048576  directory
	Device: 00:26           Inode: 1703937     Links: 125
	Access: (3777/drwxrwxrwx)  Uid:     0   Gid:  4041
	Access: 2016-11-24 09:02:12.219699527+0000
	Modify: 2016-11-17 10:44:36.225653653+0000
	Change: 2016-11-17 10:44:36.225653653+0000
	Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------)

Secondly, the result of automounting on that directory.

	[root@andromeda ~]# /tmp/test-statx /warthog/data
	statx(/warthog/data) = 0
	results=7ff
	  Size: 4096            Blocks: 8          IO Block: 1048576  directory
	Device: 00:27           Inode: 2           Links: 125
	Access: (3777/drwxrwxrwx)  Uid:     0   Gid:  4041
	Access: 2016-11-24 09:02:12.219699527+0000
	Modify: 2016-11-17 10:44:36.225653653+0000
	Change: 2016-11-17 10:44:36.225653653+0000

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-02 20:51:15 -05:00
Linus Torvalds
94e877d0fb Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile two from Al Viro:

 - orangefs fix

 - series of fs/namei.c cleanups from me

 - VFS stuff coming from overlayfs tree

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  orangefs: Use RCU for destroy_inode
  vfs: use helper for calling f_op->fsync()
  mm: use helper for calling f_op->mmap()
  vfs: use helpers for calling f_op->{read,write}_iter()
  vfs: pass type instead of fn to do_{loop,iter}_readv_writev()
  vfs: extract common parts of {compat_,}do_readv_writev()
  vfs: wrap write f_ops with file_{start,end}_write()
  vfs: deny copy_file_range() for non regular files
  vfs: deny fallocate() on directory
  vfs: create vfs helper vfs_tmpfile()
  namei.c: split unlazy_walk()
  namei.c: fold the check for DCACHE_OP_REVALIDATE into d_revalidate()
  lookup_fast(): clean up the logics around the fallback to non-rcu mode
  namei: fold unlazy_link() into its sole caller
2017-03-02 15:20:00 -08:00
Peter Zijlstra
0695d7dc1d orangefs: Use RCU for destroy_inode
freeing of inodes must be RCU-delayed on all filesystems

Cc: stable@vger.kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-02 06:40:36 -05:00
Ingo Molnar
174cd4b1e5 sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>
Fix up affected files that include this signal functionality via sched.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:32 +01:00