Commit Graph

49 Commits

Author SHA1 Message Date
Vincenzo Frascino
3568b88944 arm64: compat: Fix syscall number of compat_clock_getres
The syscall number of compat_clock_getres was erroneously set to 247
(__NR_io_cancel!) instead of 264. This causes the vDSO fallback of
clock_getres() to land on the wrong syscall for compat tasks.

Fix the numbering.

Cc: <stable@vger.kernel.org>
Fixes: 53c489e1df ("arm64: compat: Add missing syscall numbers")
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
2020-03-19 19:23:46 +00:00
Linus Torvalds
83fa805bcb threads-v5.6
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXjFo8wAKCRCRxhvAZXjc
 omaGAQDVwCHQekqxp2eC8EJH4Pkt+Bn1BLrA25stlTo93YBPHgEAsPVUCRNcrZAl
 VncYmxCfpt3Yu0S/MTVXu5xrRiIXPQk=
 =uqTN
 -----END PGP SIGNATURE-----

Merge tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull thread management updates from Christian Brauner:
 "Sargun Dhillon over the last cycle has worked on the pidfd_getfd()
  syscall.

  This syscall allows for the retrieval of file descriptors of a process
  based on its pidfd. A task needs to have ptrace_may_access()
  permissions with PTRACE_MODE_ATTACH_REALCREDS (suggested by Oleg and
  Andy) on the target.

  One of the main use-cases is in combination with seccomp's user
  notification feature. As a reminder, seccomp's user notification
  feature was made available in v5.0. It allows a task to retrieve a
  file descriptor for its seccomp filter. The file descriptor is usually
  handed of to a more privileged supervising process. The supervisor can
  then listen for syscall events caught by the seccomp filter of the
  supervisee and perform actions in lieu of the supervisee, usually
  emulating syscalls. pidfd_getfd() is needed to expand its uses.

  There are currently two major users that wait on pidfd_getfd() and one
  future user:

   - Netflix, Sargun said, is working on a service mesh where users
     should be able to connect to a dns-based VIP. When a user connects
     to e.g. 1.2.3.4:80 that runs e.g. service "foo" they will be
     redirected to an envoy process. This service mesh uses seccomp user
     notifications and pidfd to intercept all connect calls and instead
     of connecting them to 1.2.3.4:80 connects them to e.g.
     127.0.0.1:8080.

   - LXD uses the seccomp notifier heavily to intercept and emulate
     mknod() and mount() syscalls for unprivileged containers/processes.
     With pidfd_getfd() more uses-cases e.g. bridging socket connections
     will be possible.

   - The patchset has also seen some interest from the browser corner.
     Right now, Firefox is using a SECCOMP_RET_TRAP sandbox managed by a
     broker process. In the future glibc will start blocking all signals
     during dlopen() rendering this type of sandbox impossible. Hence,
     in the future Firefox will switch to a seccomp-user-nofication
     based sandbox which also makes use of file descriptor retrieval.
     The thread for this can be found at
     https://sourceware.org/ml/libc-alpha/2019-12/msg00079.html

  With pidfd_getfd() it is e.g. possible to bridge socket connections
  for the supervisee (binding to a privileged port) and taking actions
  on file descriptors on behalf of the supervisee in general.

  Sargun's first version was using an ioctl on pidfds but various people
  pushed for it to be a proper syscall which he duely implemented as
  well over various review cycles. Selftests are of course included.
  I've also added instructions how to deal with merge conflicts below.

  There's also a small fix coming from the kernel mentee project to
  correctly annotate struct sighand_struct with __rcu to fix various
  sparse warnings. We've received a few more such fixes and even though
  they are mostly trivial I've decided to postpone them until after -rc1
  since they came in rather late and I don't want to risk introducing
  build warnings.

  Finally, there's a new prctl() command PR_{G,S}ET_IO_FLUSHER which is
  needed to avoid allocation recursions triggerable by storage drivers
  that have userspace parts that run in the IO path (e.g. dm-multipath,
  iscsi, etc). These allocation recursions deadlock the device.

  The new prctl() allows such privileged userspace components to avoid
  allocation recursions by setting the PF_MEMALLOC_NOIO and
  PF_LESS_THROTTLE flags. The patch carries the necessary acks from the
  relevant maintainers and is routed here as part of prctl()
  thread-management."

* tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim
  sched.h: Annotate sighand_struct with __rcu
  test: Add test for pidfd getfd
  arch: wire up pidfd_getfd syscall
  pid: Implement pidfd_getfd syscall
  vfs, fdtable: Add fget_task helper
2020-01-29 19:38:34 -08:00
Linus Torvalds
6aee4badd8 Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull openat2 support from Al Viro:
 "This is the openat2() series from Aleksa Sarai.

  I'm afraid that the rest of namei stuff will have to wait - it got
  zero review the last time I'd posted #work.namei, and there had been a
  leak in the posted series I'd caught only last weekend. I was going to
  repost it on Monday, but the window opened and the odds of getting any
  review during that... Oh, well.

  Anyway, openat2 part should be ready; that _did_ get sane amount of
  review and public testing, so here it comes"

From Aleksa's description of the series:
 "For a very long time, extending openat(2) with new features has been
  incredibly frustrating. This stems from the fact that openat(2) is
  possibly the most famous counter-example to the mantra "don't silently
  accept garbage from userspace" -- it doesn't check whether unknown
  flags are present[1].

  This means that (generally) the addition of new flags to openat(2) has
  been fraught with backwards-compatibility issues (O_TMPFILE has to be
  defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
  kernels gave errors, since it's insecure to silently ignore the
  flag[2]). All new security-related flags therefore have a tough road
  to being added to openat(2).

  Furthermore, the need for some sort of control over VFS's path
  resolution (to avoid malicious paths resulting in inadvertent
  breakouts) has been a very long-standing desire of many userspace
  applications.

  This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset
  (which was a variant of David Drysdale's O_BENEATH patchset[4] which
  was a spin-off of the Capsicum project[5]) with a few additions and
  changes made based on the previous discussion within [6] as well as
  others I felt were useful.

  In line with the conclusions of the original discussion of
  AT_NO_JUMPS, the flag has been split up into separate flags. However,
  instead of being an openat(2) flag it is provided through a new
  syscall openat2(2) which provides several other improvements to the
  openat(2) interface (see the patch description for more details). The
  following new LOOKUP_* flags are added:

  LOOKUP_NO_XDEV:

     Blocks all mountpoint crossings (upwards, downwards, or through
     absolute links). Absolute pathnames alone in openat(2) do not
     trigger this. Magic-link traversal which implies a vfsmount jump is
     also blocked (though magic-link jumps on the same vfsmount are
     permitted).

  LOOKUP_NO_MAGICLINKS:

     Blocks resolution through /proc/$pid/fd-style links. This is done
     by blocking the usage of nd_jump_link() during resolution in a
     filesystem. The term "magic-links" is used to match with the only
     reference to these links in Documentation/, but I'm happy to change
     the name.

     It should be noted that this is different to the scope of
     ~LOOKUP_FOLLOW in that it applies to all path components. However,
     you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it
     will *not* fail (assuming that no parent component was a
     magic-link), and you will have an fd for the magic-link.

     In order to correctly detect magic-links, the introduction of a new
     LOOKUP_MAGICLINK_JUMPED state flag was required.

  LOOKUP_BENEATH:

     Disallows escapes to outside the starting dirfd's
     tree, using techniques such as ".." or absolute links. Absolute
     paths in openat(2) are also disallowed.

     Conceptually this flag is to ensure you "stay below" a certain
     point in the filesystem tree -- but this requires some additional
     to protect against various races that would allow escape using
     "..".

     Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it
     can trivially beam you around the filesystem (breaking the
     protection). In future, there might be similar safety checks done
     as in LOOKUP_IN_ROOT, but that requires more discussion.

  In addition, two new flags are added that expand on the above ideas:

  LOOKUP_NO_SYMLINKS:

     Does what it says on the tin. No symlink resolution is allowed at
     all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this
     can still be used with NOFOLLOW to open an fd for the symlink as
     long as no parent path had a symlink component.

  LOOKUP_IN_ROOT:

     This is an extension of LOOKUP_BENEATH that, rather than blocking
     attempts to move past the root, forces all such movements to be
     scoped to the starting point. This provides chroot(2)-like
     protection but without the cost of a chroot(2) for each filesystem
     operation, as well as being safe against race attacks that
     chroot(2) is not.

     If a race is detected (as with LOOKUP_BENEATH) then an error is
     generated, and similar to LOOKUP_BENEATH it is not permitted to
     cross magic-links with LOOKUP_IN_ROOT.

     The primary need for this is from container runtimes, which
     currently need to do symlink scoping in userspace[7] when opening
     paths in a potentially malicious container.

     There is a long list of CVEs that could have bene mitigated by
     having RESOLVE_THIS_ROOT (such as CVE-2017-1002101,
     CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a
     few).

  In order to make all of the above more usable, I'm working on
  libpathrs[8] which is a C-friendly library for safe path resolution.
  It features a userspace-emulated backend if the kernel doesn't support
  openat2(2). Hopefully we can get userspace to switch to using it, and
  thus get openat2(2) support for free once it's ready.

  Future work would include implementing things like
  RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow
  programs to be sure they don't hit DoSes though stale NFS handles)"

* 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Documentation: path-lookup: include new LOOKUP flags
  selftests: add openat2(2) selftests
  open: introduce openat2(2) syscall
  namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
  namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
  namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
  namei: LOOKUP_NO_XDEV: block mountpoint crossing
  namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
  namei: LOOKUP_NO_SYMLINKS: block symlink resolution
  namei: allow set_root() to produce errors
  namei: allow nd_jump_link() to produce errors
  nsfs: clean-up ns_get_path() signature to return int
  namei: only return -ECHILD from follow_dotdot_rcu()
2020-01-29 11:20:24 -08:00
Aleksa Sarai
fddb5d430a open: introduce openat2(2) syscall
/* Background. */
For a very long time, extending openat(2) with new features has been
incredibly frustrating. This stems from the fact that openat(2) is
possibly the most famous counter-example to the mantra "don't silently
accept garbage from userspace" -- it doesn't check whether unknown flags
are present[1].

This means that (generally) the addition of new flags to openat(2) has
been fraught with backwards-compatibility issues (O_TMPFILE has to be
defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
kernels gave errors, since it's insecure to silently ignore the
flag[2]). All new security-related flags therefore have a tough road to
being added to openat(2).

Userspace also has a hard time figuring out whether a particular flag is
supported on a particular kernel. While it is now possible with
contemporary kernels (thanks to [3]), older kernels will expose unknown
flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during
openat(2) time matches modern syscall designs and is far more
fool-proof.

In addition, the newly-added path resolution restriction LOOKUP flags
(which we would like to expose to user-space) don't feel related to the
pre-existing O_* flag set -- they affect all components of path lookup.
We'd therefore like to add a new flag argument.

Adding a new syscall allows us to finally fix the flag-ignoring problem,
and we can make it extensible enough so that we will hopefully never
need an openat3(2).

/* Syscall Prototype. */
  /*
   * open_how is an extensible structure (similar in interface to
   * clone3(2) or sched_setattr(2)). The size parameter must be set to
   * sizeof(struct open_how), to allow for future extensions. All future
   * extensions will be appended to open_how, with their zero value
   * acting as a no-op default.
   */
  struct open_how { /* ... */ };

  int openat2(int dfd, const char *pathname,
              struct open_how *how, size_t size);

/* Description. */
The initial version of 'struct open_how' contains the following fields:

  flags
    Used to specify openat(2)-style flags. However, any unknown flag
    bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR)
    will result in -EINVAL. In addition, this field is 64-bits wide to
    allow for more O_ flags than currently permitted with openat(2).

  mode
    The file mode for O_CREAT or O_TMPFILE.

    Must be set to zero if flags does not contain O_CREAT or O_TMPFILE.

  resolve
    Restrict path resolution (in contrast to O_* flags they affect all
    path components). The current set of flags are as follows (at the
    moment, all of the RESOLVE_ flags are implemented as just passing
    the corresponding LOOKUP_ flag).

    RESOLVE_NO_XDEV       => LOOKUP_NO_XDEV
    RESOLVE_NO_SYMLINKS   => LOOKUP_NO_SYMLINKS
    RESOLVE_NO_MAGICLINKS => LOOKUP_NO_MAGICLINKS
    RESOLVE_BENEATH       => LOOKUP_BENEATH
    RESOLVE_IN_ROOT       => LOOKUP_IN_ROOT

open_how does not contain an embedded size field, because it is of
little benefit (userspace can figure out the kernel open_how size at
runtime fairly easily without it). It also only contains u64s (even
though ->mode arguably should be a u16) to avoid having padding fields
which are never used in the future.

Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE
is no longer permitted for openat(2). As far as I can tell, this has
always been a bug and appears to not be used by userspace (and I've not
seen any problems on my machines by disallowing it). If it turns out
this breaks something, we can special-case it and only permit it for
openat(2) but not openat2(2).

After input from Florian Weimer, the new open_how and flag definitions
are inside a separate header from uapi/linux/fcntl.h, to avoid problems
that glibc has with importing that header.

/* Testing. */
In a follow-up patch there are over 200 selftests which ensure that this
syscall has the correct semantics and will correctly handle several
attack scenarios.

In addition, I've written a userspace library[4] which provides
convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary
because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care
must be taken when using RESOLVE_IN_ROOT'd file descriptors with other
syscalls). During the development of this patch, I've run numerous
verification tests using libpathrs (showing that the API is reasonably
usable by userspace).

/* Future Work. */
Additional RESOLVE_ flags have been suggested during the review period.
These can be easily implemented separately (such as blocking auto-mount
during resolution).

Furthermore, there are some other proposed changes to the openat(2)
interface (the most obvious example is magic-link hardening[5]) which
would be a good opportunity to add a way for userspace to restrict how
O_PATH file descriptors can be re-opened.

Another possible avenue of future work would be some kind of
CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace
which openat2(2) flags and fields are supported by the current kernel
(to avoid userspace having to go through several guesses to figure it
out).

[1]: https://lwn.net/Articles/588444/
[2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com
[3]: commit 629e014bb8 ("fs: completely ignore unknown open flags")
[4]: https://sourceware.org/bugzilla/show_bug.cgi?id=17523
[5]: https://lore.kernel.org/lkml/20190930183316.10190-2-cyphar@cyphar.com/
[6]: https://youtu.be/ggD-eb3yPVs

Suggested-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-18 09:19:18 -05:00
Sargun Dhillon
9a2cef09c8
arch: wire up pidfd_getfd syscall
This wires up the pidfd_getfd syscall for all architectures.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20200107175927.4558-4-sargun@sargun.me
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-01-13 21:49:47 +01:00
Amanieu d'Antras
3e3c8ca5a3
arm64: Move __ARCH_WANT_SYS_CLONE3 definition to uapi headers
Previously this was only defined in the internal headers which
resulted in __NR_clone3 not being defined in the user headers.

Signed-off-by: Amanieu d'Antras <amanieu@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: <stable@vger.kernel.org> # 5.3.x
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20200102172413.654385-2-amanieu@gmail.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-01-07 13:30:49 +01:00
Linus Torvalds
8f6ccf6159 clone3-v5.3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXSMhhgAKCRCRxhvAZXjc
 or7kAP9VzDcQaK/WoDd2ezh2C7Wh5hNy9z/qJVCa6Tb+N+g1UgEAxbhFUg55uGOA
 JNf7fGar5JF5hBMIXR+NqOi1/sb4swg=
 =ELWo
 -----END PGP SIGNATURE-----

Merge tag 'clone3-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull clone3 system call from Christian Brauner:
 "This adds the clone3 syscall which is an extensible successor to clone
  after we snagged the last flag with CLONE_PIDFD during the 5.2 merge
  window for clone(). It cleanly supports all of the flags from clone()
  and thus all legacy workloads.

  There are few user visible differences between clone3 and clone.
  First, CLONE_DETACHED will cause EINVAL with clone3 so we can reuse
  this flag. Second, the CSIGNAL flag is deprecated and will cause
  EINVAL to be reported. It is superseeded by a dedicated "exit_signal"
  argument in struct clone_args thus freeing up even more flags. And
  third, clone3 gives CLONE_PIDFD a dedicated return argument in struct
  clone_args instead of abusing CLONE_PARENT_SETTID's parent_tidptr
  argument.

  The clone3 uapi is designed to be easy to handle on 32- and 64 bit:

    /* uapi */
    struct clone_args {
            __aligned_u64 flags;
            __aligned_u64 pidfd;
            __aligned_u64 child_tid;
            __aligned_u64 parent_tid;
            __aligned_u64 exit_signal;
            __aligned_u64 stack;
            __aligned_u64 stack_size;
            __aligned_u64 tls;
    };

  and a separate kernel struct is used that uses proper kernel typing:

    /* kernel internal */
    struct kernel_clone_args {
            u64 flags;
            int __user *pidfd;
            int __user *child_tid;
            int __user *parent_tid;
            int exit_signal;
            unsigned long stack;
            unsigned long stack_size;
            unsigned long tls;
    };

  The system call comes with a size argument which enables the kernel to
  detect what version of clone_args userspace is passing in. clone3
  validates that any additional bytes a given kernel does not know about
  are set to zero and that the size never exceeds a page.

  A nice feature is that this patchset allowed us to cleanup and
  simplify various core kernel codepaths in kernel/fork.c by making the
  internal _do_fork() function take struct kernel_clone_args even for
  legacy clone().

  This patch also unblocks the time namespace patchset which wants to
  introduce a new CLONE_TIMENS flag.

  Note, that clone3 has only been wired up for x86{_32,64}, arm{64}, and
  xtensa. These were the architectures that did not require special
  massaging.

  Other architectures treat fork-like system calls individually and
  after some back and forth neither Arnd nor I felt confident that we
  dared to add clone3 unconditionally to all architectures. We agreed to
  leave this up to individual architecture maintainers. This is why
  there's an additional patch that introduces __ARCH_WANT_SYS_CLONE3
  which any architecture can set once it has implemented support for
  clone3. The patch also adds a cond_syscall(clone3) for architectures
  such as nios2 or h8300 that generate their syscall table by simply
  including asm-generic/unistd.h. The hope is to get rid of
  __ARCH_WANT_SYS_CLONE3 and cond_syscall() rather soon"

* tag 'clone3-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  arch: handle arches who do not yet define clone3
  arch: wire-up clone3() syscall
  fork: add clone3
2019-07-11 10:09:44 -07:00
Linus Torvalds
5450e8a316 pidfd-updates-v5.3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXSMhUgAKCRCRxhvAZXjc
 okkiAQC3Hlg/O2JoIb4PqgEvBkpHSdVxyuWagn0ksjACW9ANKQEAl5OadMhvOq16
 UHGhKlpE/M8HflknIffoEGlIAWHrdwU=
 =7kP5
 -----END PGP SIGNATURE-----

Merge tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull pidfd updates from Christian Brauner:
 "This adds two main features.

   - First, it adds polling support for pidfds. This allows process
     managers to know when a (non-parent) process dies in a race-free
     way.

     The notification mechanism used follows the same logic that is
     currently used when the parent of a task is notified of a child's
     death. With this patchset it is possible to put pidfds in an
     {e}poll loop and get reliable notifications for process (i.e.
     thread-group) exit.

   - The second feature compliments the first one by making it possible
     to retrieve pollable pidfds for processes that were not created
     using CLONE_PIDFD.

     A lot of processes get created with traditional PID-based calls
     such as fork() or clone() (without CLONE_PIDFD). For these
     processes a caller can currently not create a pollable pidfd. This
     is a problem for Android's low memory killer (LMK) and service
     managers such as systemd.

  Both patchsets are accompanied by selftests.

  It's perhaps worth noting that the work done so far and the work done
  in this branch for pidfd_open() and polling support do already see
  some adoption:

   - Android is in the process of backporting this work to all their LTS
     kernels [1]

   - Service managers make use of pidfd_send_signal but will need to
     wait until we enable waiting on pidfds for full adoption.

   - And projects I maintain make use of both pidfd_send_signal and
     CLONE_PIDFD [2] and will use polling support and pidfd_open() too"

[1] https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.9+backport%22
    https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.14+backport%22
    https://android-review.googlesource.com/q/topic:%22pidfd+polling+support+4.19+backport%22

[2] aab6e3eb73/src/lxc/start.c (L1753)

* tag 'pidfd-updates-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  tests: add pidfd_open() tests
  arch: wire-up pidfd_open()
  pid: add pidfd_open()
  pidfd: add polling selftests
  pidfd: add polling support
2019-07-10 22:17:21 -07:00
Christian Brauner
7615d9e178
arch: wire-up pidfd_open()
This wires up the pidfd_open() syscall into all arches at once.

Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jann Horn <jannh@google.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-api@vger.kernel.org
Cc: linux-alpha@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-mips@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linux-arch@vger.kernel.org
Cc: x86@kernel.org
2019-06-28 12:17:55 +02:00
Vincenzo Frascino
53c489e1df arm64: compat: Add missing syscall numbers
vDSO requires gettimeofday() and clock_gettime() syscalls to implement the
fallback mechanism.

Add the missing syscall numbers to unistd.h for arm64.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shijith Thotton <sthotton@marvell.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Mark Salyzyn <salyzyn@android.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Huw Davies <huw@codeweavers.com>
Link: https://lkml.kernel.org/r/20190621095252.32307-7-vincenzo.frascino@arm.com
2019-06-22 21:21:07 +02:00
Christian Brauner
d68dbb0c9a
arch: handle arches who do not yet define clone3
This cleanly handles arches who do not yet define clone3.

clone3() was initially placed under __ARCH_WANT_SYS_CLONE under the
assumption that this would cleanly handle all architectures. It does
not.
Architectures such as nios2 or h8300 simply take the asm-generic syscall
definitions and generate their syscall table from it. Since they don't
define __ARCH_WANT_SYS_CLONE the build would fail complaining about
sys_clone3 missing. The reason this doesn't happen for legacy clone is
that nios2 and h8300 provide assembly stubs for sys_clone. This seems to
be done for architectural reasons.

The build failures for nios2 and h8300 were caught int -next luckily.
The solution is to define __ARCH_WANT_SYS_CLONE3 that architectures can
add. Additionally, we need a cond_syscall(clone3) for architectures such
as nios2 or h8300 that generate their syscall table in the way I
explained above.

Fixes: 8f3220a806 ("arch: wire-up clone3() syscall")
Signed-off-by: Christian Brauner <christian@brauner.io>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Adrian Reber <adrian@lisas.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: x86@kernel.org
2019-06-21 01:54:53 +02:00
Thomas Gleixner
caab277b1d treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation this program is
  distributed in the hope that it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details you should have received a copy of the gnu general
  public license along with this program if not see http www gnu org
  licenses

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 503 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Enrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19 17:09:07 +02:00
Christian Brauner
8f3220a806
arch: wire-up clone3() syscall
Wire up the clone3() call on all arches that don't require hand-rolled
assembly.

Some of the arches look like they need special assembly massaging and it is
probably smarter if the appropriate arch maintainers would do the actual
wiring. Arches that are wired-up are:
- x86{_32,64}
- arm{64}
- xtensa

Signed-off-by: Christian Brauner <christian@brauner.io>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Adrian Reber <adrian@lisas.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: linux-api@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: x86@kernel.org
2019-06-09 09:29:46 +02:00
David Howells
d8076bdb56 uapi: Wire up the mount API syscalls on non-x86 arches [ver #2]
Wire up the mount API syscalls on non-x86 arches.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-16 12:23:45 -04:00
Arnd Bergmann
39036cd272 arch: add pidfd and io_uring syscalls everywhere
Add the io_uring and pidfd_send_signal system calls to all architectures.

These system calls are designed to handle both native and compat tasks,
so all entries are the same across architectures, only arm-compat and
the generic tale still use an old format.

Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> (s390)
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-04-15 16:31:17 +02:00
Arnd Bergmann
48166e6ea4 y2038: add 64-bit time_t syscalls to all 32-bit architectures
This adds 21 new system calls on each ABI that has 32-bit time_t
today. All of these have the exact same semantics as their existing
counterparts, and the new ones all have macro names that end in 'time64'
for clarification.

This gets us to the point of being able to safely use a C library
that has 64-bit time_t in user space. There are still a couple of
loose ends to tie up in various areas of the code, but this is the
big one, and should be entirely uncontroversial at this point.

In particular, there are four system calls (getitimer, setitimer,
waitid, and getrusage) that don't have a 64-bit counterpart yet,
but these can all be safely implemented in the C library by wrapping
around the existing system calls because the 32-bit time_t they
pass only counts elapsed time, not time since the epoch. They
will be dealt with later.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
2019-02-07 00:13:28 +01:00
Arnd Bergmann
4ab65ba7a5 ARM: add kexec_file_load system call number
A couple of architectures including arm64 already implement the
kexec_file_load system call, on many others we have assigned a system
call number for it, but not implemented it yet.

Adding the number in arch/arm/ lets us use the system call on arm64
systems in compat mode, and also reduces the number of differences
between architectures. If we want to implement kexec_file_load on ARM
in the future, the number assignment means that kexec tools can already
be built with the now current set of kernel headers.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-01-25 17:22:50 +01:00
Arnd Bergmann
78594b9599 ARM: add migrate_pages() system call
The migrate_pages system call has an assigned number on all architectures
except ARM. When it got added initially in commit d80ade7b32 ("ARM:
Fix warning: #warning syscall migrate_pages not implemented"), it was
intentionally left out based on the observation that there are no 32-bit
ARM NUMA systems.

However, there are now arm64 NUMA machines that can in theory run 32-bit
kernels (actually enabling NUMA there would require additional work)
as well as 32-bit user space on 64-bit kernels, so that argument is no
longer very strong.

Assigning the number lets us use the system call on 64-bit kernels as well
as providing a more consistent set of syscalls across architectures.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
2019-01-25 17:22:43 +01:00
Will Deacon
7e0b44e870 arm64: compat: Hook up io_pgetevents() for 32-bit tasks
Commit 73aeb2cbcd ("ARM: 8787/1: wire up io_pgetevents syscall")
hooked up the io_pgetevents() system call for 32-bit ARM, so we can
do the same for the compat wrapper on arm64.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-04 14:18:01 +00:00
Will Deacon
169113ece0 arm64: compat: Avoid sending SIGILL for unallocated syscall numbers
The ARM Linux kernel handles the EABI syscall numbers as follows:

  0           - NR_SYSCALLS-1	: Invoke syscall via syscall table
  NR_SYSCALLS - 0xeffff		: -ENOSYS (to be allocated in future)
  0xf0000     - 0xf07ff		: Private syscall or -ENOSYS if not allocated
  > 0xf07ff			: SIGILL

Our compat code gets this wrong and ends up sending SIGILL in response
to all syscalls greater than NR_SYSCALLS which have a value greater
than 0x7ff in the bottom 16 bits.

Fix this by defining the end of the ARM private syscall region and
checking the syscall number against that directly. Update the comment
while we're at it.

Cc: <stable@vger.kernel.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Reported-by: Pi-Hsun Shih <pihsun@chromium.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-01-04 14:18:01 +00:00
Arnd Bergmann
4faea239e5 y2038: utimes: Rework #ifdef guards for compat syscalls
After changing over to 64-bit time_t syscalls, many architectures will
want compat_sys_utimensat() but not respective handlers for utime(),
utimes() and futimesat(). This adds a new __ARCH_WANT_SYS_UTIME32 to
complement __ARCH_WANT_SYS_UTIME. For now, all 64-bit architectures that
support CONFIG_COMPAT set it, but future 64-bit architectures will not
(tile would not have needed it either, but got removed).

As older 32-bit architectures get converted to using CONFIG_64BIT_TIME,
they will have to use __ARCH_WANT_SYS_UTIME32 instead of
__ARCH_WANT_SYS_UTIME. Architectures using the generic syscall ABI don't
need either of them as they never had a utime syscall.

Since the compat_utimbuf structure is now required outside of
CONFIG_COMPAT, I'm moving it into compat_time.h.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
changed from last version:
- renamed __ARCH_WANT_COMPAT_SYS_UTIME to __ARCH_WANT_SYS_UTIME32
2018-08-29 15:42:23 +02:00
Arnd Bergmann
caf6f9c8a3 asm-generic: Remove unneeded __ARCH_WANT_SYS_LLSEEK macro
The sys_llseek sytem call is needed on all 32-bit architectures and
none of the 64-bit ones, so we can remove the __ARCH_WANT_SYS_LLSEEK guard
and simplify the include/asm-generic/unistd.h header further.

Since 32-bit tasks can run either natively or in compat mode on 64-bit
architectures, we have to check for both !CONFIG_64BIT and CONFIG_COMPAT.

There are a few 64-bit architectures that also reference sys_llseek
in their 64-bit ABI (e.g. sparc), but I verified that those all
select CONFIG_COMPAT, so the #if check is still correct here. It's
a bit odd to include it in the syscall table though, as it's the
same as sys_lseek() on 64-bit, but with strange calling conventions.

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-08-29 15:42:21 +02:00
Will Deacon
409d5db498 arm64: rseq: Implement backend rseq calls and select HAVE_RSEQ
Implement calls to rseq_signal_deliver, rseq_handle_notify_resume
and rseq_syscall so that we can select HAVE_RSEQ on arm64.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-07-11 13:29:34 +01:00
Al Viro
2611dc1939 Remove compat_sys_getdents64()
Unlike normal compat syscall variants, it is needed only for
biarch architectures that have different alignement requirements for
u64 in 32bit and 64bit ABI *and* have __put_user() that won't handle
a store of 64bit value at 32bit-aligned address.  We used to have one
such (ia64), but its biarch support has been gone since 2010 (after
being broken in 2008, which went unnoticed since nobody had been using
it).

It had escaped removal at the same time only because back in 2004
a patch that switched several syscalls on amd64 from private wrappers to
generic compat ones had switched to use of compat_sys_getdents64(), which
hadn't needed (or used) a compat wrapper on amd64.

Let's bury it - it's at least 7 years overdue.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-17 12:52:22 -04:00
Will Deacon
713cc9df64 arm64: compat: Update compat syscalls
Hook up three pkey syscalls (which we don't implement) and the new statx
syscall, as has been done for arch/arm/.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-03-21 18:10:40 +00:00
Will Deacon
10fdf8513f arm64: unistd32.h: wire up missing syscalls for compat tasks
We're missing entries for mlock2, copy_file_range, preadv2 and pwritev2
in our compat syscall table, so hook them up. Only the last two need
compat wrappers.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-01 18:48:20 +01:00
Will Deacon
eb93ce2cb7 arm64: compat: wire up new syscalls
Commit 208473c1f3 ("ARM: wire up new syscalls") hooked up the new
userfaultfd and membarrier syscalls for ARM, so do the same for our
compat syscall table in arm64.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-10-14 13:51:41 +01:00
Linus Torvalds
6b00f7efb5 arm64 updates for 3.20:
- reimplementation of the virtual remapping of UEFI Runtime Services in
   a way that is stable across kexec
 - emulation of the "setend" instruction for 32-bit tasks (user
   endianness switching trapped in the kernel, SCTLR_EL1.E0E bit set
   accordingly)
 - compat_sys_call_table implemented in C (from asm) and made it a
   constant array together with sys_call_table
 - export CPU cache information via /sys (like other architectures)
 - DMA API implementation clean-up in preparation for IOMMU support
 - macros clean-up for KVM
 - dropped some unnecessary cache+tlb maintenance
 - CONFIG_ARM64_CPU_SUSPEND clean-up
 - defconfig update (CPU_IDLE)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU25v3AAoJEGvWsS0AyF7xYjcP/j8ESvs+z0BPgeJ6XREfOnCh
 cp+w/1rJ5BafJ5RRkibrciwTNOIJS4FGMivWyURtoh430lS0Rh7fxZ3Ouna3xjrT
 Nf7AxenWoA8Lo6wHh+FlNUeGk3iWfX6WwA2tYrbKudK+LBJ1wHjwpE7cWQO0FgwJ
 aFDahu+QD5/u45p/VcVctMtiEDvOxBdO8gfat6r+YkLm7pbRxQkZnpA/JE4Gps1p
 Td5jvMNH9pXI5pffSbeR9Q+vs/r0yqKLXQg01Eb2bZgGDgwf9yzADrHuaKamZt35
 X5flmLiTGC6swJCJvUkZC1Nuue33bXcvW5+vgvar+MNGyXsxv+B/wARLqGhiWhQZ
 nLGwFpuNu6wdY9tGHb/XR8khcewkw1/lRH1hHKhchrmRyUqHvXcPgC5tamjLrY8C
 BV3BAeQvRho8OKwWUmbXIlyON1vPux6CJdj4D/A5NL+qph2WHeVWJCXg6nVFx0Wc
 Eb3bXbI4QRwTFL7pGRF8RyZJBAQtgYhQMKWMW2GHgUgn+r1EixG73BZoSwvpHrrw
 FOR9AVNfVBqmNON8xiIb3DN4EViq76EF0jrsZh5I9EoWS2w5qtk60kJQgXE+M4EE
 vOlmh3dhEVfCN2SxOn0bgoQmTulyjqGauTSSJKQbIBuinPFveukrJfGNFIWt0SZs
 f38FBMo6sgU4VG85B+Fr
 =X5x/
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "arm64 updates for 3.20:

   - reimplementation of the virtual remapping of UEFI Runtime Services
     in a way that is stable across kexec
   - emulation of the "setend" instruction for 32-bit tasks (user
     endianness switching trapped in the kernel, SCTLR_EL1.E0E bit set
     accordingly)
   - compat_sys_call_table implemented in C (from asm) and made it a
     constant array together with sys_call_table
   - export CPU cache information via /sys (like other architectures)
   - DMA API implementation clean-up in preparation for IOMMU support
   - macros clean-up for KVM
   - dropped some unnecessary cache+tlb maintenance
   - CONFIG_ARM64_CPU_SUSPEND clean-up
   - defconfig update (CPU_IDLE)

  The EFI changes going via the arm64 tree have been acked by Matt
  Fleming.  There is also a patch adding sys_*stat64 prototypes to
  include/linux/syscalls.h, acked by Andrew Morton"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (47 commits)
  arm64: compat: Remove incorrect comment in compat_siginfo
  arm64: Fix section mismatch on alloc_init_p[mu]d()
  arm64: Avoid breakage caused by .altmacro in fpsimd save/restore macros
  arm64: mm: use *_sect to check for section maps
  arm64: drop unnecessary cache+tlb maintenance
  arm64:mm: free the useless initial page table
  arm64: Enable CPU_IDLE in defconfig
  arm64: kernel: remove ARM64_CPU_SUSPEND config option
  arm64: make sys_call_table const
  arm64: Remove asm/syscalls.h
  arm64: Implement the compat_sys_call_table in C
  syscalls: Declare sys_*stat64 prototypes if __ARCH_WANT_(COMPAT_)STAT64
  compat: Declare compat_sys_sigpending and compat_sys_sigprocmask prototypes
  arm64: uapi: expose our struct ucontext to the uapi headers
  smp, ARM64: Kill SMP single function call interrupt
  arm64: Emulate SETEND for AArch32 tasks
  arm64: Consolidate hotplug notifier for instruction emulation
  arm64: Track system support for mixed endian EL0
  arm64: implement generic IOMMU configuration
  arm64: Combine coherent and non-coherent swiotlb dma_ops
  ...
2015-02-11 18:03:54 -08:00
Catalin Marinas
0156411b18 arm64: Implement the compat_sys_call_table in C
Unlike the sys_call_table[], the compat one was implemented in sys32.S
making it impossible to notice discrepancies between the number of
compat syscalls and the __NR_compat_syscalls macro, the latter having to
be defined in asm/unistd.h as including asm/unistd32.h would cause
conflicts on __NR_* definitions. With this patch, incorrect
__NR_compat_syscalls values will result in a build-time error.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
2015-01-27 09:38:07 +00:00
Will Deacon
cd25b85ba6 arm64: compat: wire up compat_sys_execveat
With 841ee23025 ("ARM: wire up execveat syscall"), arch/arm/ has grown
support for the execveat system call.

This patch wires up the compat variant for arm64.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-01-12 16:46:21 +00:00
Mark Rutland
0f9132ceab arm64: Correct __NR_compat_syscalls for bpf
Commit 97b56be103 (arm64: compat: Enable bpf syscall) made the
usual mistake of forgetting to update __NR_compat_syscalls. Due to this,
when el0_sync_compat calls el0_svc_naked, the test against sc_nr
(__NR_compat_syscalls) will fail, and we'll call ni_sys, returning
-ENOSYS to userspace.

This patch bumps __NR_compat_syscalls appropriately, enabling the use of
the bpf syscall from compat tasks.

Due to the reorganisation of unistd{,32}.h as part of commit
f3e5c847ec (arm64: Add __NR_* definitions for compat syscalls) it
is not currently possible to include both headers and sanity-check the
value of __NR_compat_syscalls at build-time to prevent this from
happening again. Additional rework is required to make such niceties a
possibility.

Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-01-07 11:40:58 +00:00
AKASHI Takahiro
a1ae65b219 arm64: add seccomp support
secure_computing() is called first in syscall_trace_enter() so that
a system call will be aborted quickly without doing succeeding syscall
tracing if seccomp rules want to deny that system call.

On compat task, syscall numbers for system calls allowed in seccomp mode 1
are different from those on normal tasks, and so _NR_seccomp_xxx_32's need
to be redefined.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2014-11-28 10:24:59 +00:00
Will Deacon
a97a42c476 arm64: compat: wire up memfd_create and getrandom syscalls for aarch32
arch/arm/ just grew support for the new memfd_create and getrandom
syscalls, so add them to our compat layer too.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2014-08-18 19:47:04 +01:00
Catalin Marinas
f3e5c847ec arm64: Add __NR_* definitions for compat syscalls
This patch adds __NR_* definitions to asm/unistd32.h, moves the
__NR_compat_* definitions to asm/unistd.h and removes all the explicit
unistd32.h includes apart from the one building the compat syscall
table. The aim is to have the compat __NR_* definitions available but
without colliding with the native syscall definitions (required by
lib/compat_audit.c to avoid duplicating the audit header files between
native and compat).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-07-10 11:02:40 +01:00
AKASHI Takahiro
055b1212d1 arm64: ftrace: Add system call tracepoint
This patch allows system call entry or exit to be traced as ftrace events,
ie. sys_enter_*/sys_exit_*, if CONFIG_FTRACE_SYSCALLS is enabled.
Those events appear and can be controlled under
    ${sysfs}/tracing/events/syscalls/

Please note that we can't trace compat system calls here because
AArch32 mode does not share the same syscall table with AArch64.
Just define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS in order to avoid unexpected
results (bogus syscalls reported or even hang-up).

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2014-05-29 09:08:33 +01:00
Heiko Carstens
0473c9b5f0 compat: let architectures define __ARCH_WANT_COMPAT_SYS_GETDENTS64
For architecture dependent compat syscalls in common code an architecture
must define something like __ARCH_WANT_<WHATEVER> if it wants to use the
code.
This however is not true for compat_sys_getdents64 for which architectures
must define __ARCH_OMIT_COMPAT_SYS_GETDENTS64 if they do not want the code.

This leads to the situation where all architectures, except mips, get the
compat code but only x86_64, arm64 and the generic syscall architectures
actually use it.

So invert the logic, so that architectures actively must do something to
get the compat code.

This way a couple of architectures get rid of otherwise dead code.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2014-03-04 09:05:33 +01:00
Al Viro
d64008a8f3 burying unused conditionals
__ARCH_WANT_SYS_RT_SIGACTION,
__ARCH_WANT_SYS_RT_SIGSUSPEND,
__ARCH_WANT_COMPAT_SYS_RT_SIGSUSPEND,
__ARCH_WANT_COMPAT_SYS_SCHED_RR_GET_INTERVAL - not used anymore
CONFIG_GENERIC_{SIGALTSTACK,COMPAT_RT_SIG{ACTION,QUEUEINFO,PENDING,PROCMASK}} -
can be assumed always set.
2013-02-14 09:21:15 -05:00
Linus Torvalds
54d46ea993 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull signal handling cleanups from Al Viro:
 "sigaltstack infrastructure + conversion for x86, alpha and um,
  COMPAT_SYSCALL_DEFINE infrastructure.

  Note that there are several conflicts between "unify
  SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline;
  resolution is trivial - just remove definitions of SS_ONSTACK and
  SS_DISABLED from arch/*/uapi/asm/signal.h; they are all identical and
  include/uapi/linux/signal.h contains the unified variant."

Fixed up conflicts as per Al.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  alpha: switch to generic sigaltstack
  new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those
  generic compat_sys_sigaltstack()
  introduce generic sys_sigaltstack(), switch x86 and um to it
  new helper: compat_user_stack_pointer()
  new helper: restore_altstack()
  unify SS_ONSTACK/SS_DISABLE definitions
  new helper: current_user_stack_pointer()
  missing user_stack_pointer() instances
  Bury the conditionals from kernel_thread/kernel_execve series
  COMPAT_SYSCALL_DEFINE: infrastructure
2012-12-20 18:05:28 -08:00
Al Viro
ae903caae2 Bury the conditionals from kernel_thread/kernel_execve series
All architectures have
	CONFIG_GENERIC_KERNEL_THREAD
	CONFIG_GENERIC_KERNEL_EXECVE
	__ARCH_WANT_SYS_EXECVE
None of them have __ARCH_WANT_KERNEL_EXECVE and there are only two callers
of kernel_execve() (which is a trivial wrapper for do_execve() now) left.
Kill the conditionals and make both callers use do_execve().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-19 18:07:38 -05:00
Catalin Marinas
0ad50c3896 compat: generic compat_sys_sched_rr_get_interval() implementation
This function is used by sparc, powerpc tile and arm64 for compat support.
 The patch adds a generic implementation with a wrapper for PowerPC to do
the u32->int sign extension.

The reason for a single patch covering powerpc, tile, sparc and arm64 is
to keep it bisectable, otherwise kernel building may fail with mismatched
function declarations.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>  [for tile]
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17 17:15:18 -08:00
Al Viro
9ac0800213 arm64: sanitize copy_thread(), switch to generic fork/vfork/clone
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 22:36:33 -05:00
Al Viro
2bf81c8af9 Merge branch 'arch-microblaze' into no-rebases 2012-11-16 22:28:43 -05:00
Will Deacon
6212a51224 arm64: compat: select CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION
Commit c1d7e01d78 ("ipc: use Kconfig options for
__ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION") replaced the
__ARCH_WANT_COMPAT_IPC_PARSE_VERSION token with a corresponding Kconfig
option instead.

This patch updates arm64 to use the latter, rather than #define an
unused token.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2012-11-08 16:06:20 +00:00
Catalin Marinas
6a872777ff arm64: Use generic sys_execve() implementation
This patch converts the arm64 port to use the generic sys_execve()
implementation removing the arm64-specific (compat_)sys_execve_wrapper()
functions.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2012-10-17 14:41:51 +01:00
David Howells
4262a72762 UAPI: (Scripted) Disintegrate arch/arm64/include/asm
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
2012-10-11 11:05:13 +01:00
Catalin Marinas
f3d447a97f arm64: Do not include asm/unistd32.h in asm/unistd.h
This patch only includes asm/unistd32.h where necessary and removes its
inclusion in the asm/unistd.h file. The __SYSCALL_COMPAT guard is
dropped.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
2012-10-11 10:39:08 +01:00
David Howells
1c1e436269 UAPI: Split compound conditionals containing __KERNEL__ in Arm64
Split compound conditionals containing __KERNEL__ in Arm64 where possible to
make it easier for the UAPI disintegration scripts to handle them.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
2012-10-04 12:10:18 +01:00
David Howells
890139529d UAPI: Fix the guards on various asm/unistd.h files
asm-generic/unistd.h and a number of asm/unistd.h files have been given
reinclusion guards that allow the guard to be overridden if __SYSCALL is
defined.  Unfortunately, these files define __SYSCALL and don't undefine it
when they've finished with it, thus rendering the guard ineffective.

The reason for this override is to allow the file to be #included multiple
times with different settings on __SYSCALL for purposes like generating syscall
tables.

The following guards are problematic:

arch/arm64/include/asm/unistd.h:#if !defined(__ASM_UNISTD_H) || defined(__SYSCALL)
arch/arm64/include/asm/unistd32.h:#if !defined(__ASM_UNISTD32_H) || defined(__SYSCALL)
arch/c6x/include/asm/unistd.h:#if !defined(_ASM_C6X_UNISTD_H) || defined(__SYSCALL)
arch/hexagon/include/asm/unistd.h:#if !defined(_ASM_HEXAGON_UNISTD_H) || defined(__SYSCALL)
arch/openrisc/include/asm/unistd.h:#if !defined(__ASM_OPENRISC_UNISTD_H) || defined(__SYSCALL)
arch/score/include/asm/unistd.h:#if !defined(_ASM_SCORE_UNISTD_H) || defined(__SYSCALL)
arch/tile/include/asm/unistd.h:#if !defined(_ASM_TILE_UNISTD_H) || defined(__SYSCALL)
arch/unicore32/include/asm/unistd.h:#if !defined(__UNICORE_UNISTD_H__) || defined(__SYSCALL)
include/asm-generic/unistd.h:#if !defined(_ASM_GENERIC_UNISTD_H) || defined(__SYSCALL)

On the assumption that the guards' ineffectiveness has passed unnoticed, just
remove these guards entirely.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
2012-10-04 12:10:18 +01:00
Catalin Marinas
7992d60dc4 arm64: System calls handling
This patch adds support for system calls coming from 64-bit
applications. It uses the asm-generic/unistd.h definitions with the
canonical set of system calls. The private system calls are only used
for 32-bit (compat) applications as 64-bit ones can set the TLS and
flush the caches entirely from user space.

The sys_call_table is just an array defined in a C file and it contains
pointers to the syscall functions. The array is 4KB aligned to allow the
use of the ADRP instruction (longer range ADR) in entry.S.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
2012-09-17 13:42:08 +01:00