Add ftrace.instance.*.alloc_snapshot option.
This option has been described in Documentation/trace/boottime-trace.rst
but not implemented yet.
ftrace.[instance.INSTANCE.]alloc_snapshot
Allocate snapshot buffer.
The difference from kernel.alloc_snapshot is that the kernel.alloc_snapshot
will allocate the buffer only for the main instance, but this can allocate
buffer for any new instances.
Link: https://lkml.kernel.org/r/160234368948.400560.15313384470765915015.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Below race can come, if trace_open and resize of
cpu buffer is running parallely on different cpus
CPUX CPUY
ring_buffer_resize
atomic_read(&buffer->resize_disabled)
tracing_open
tracing_reset_online_cpus
ring_buffer_reset_cpu
rb_reset_cpu
rb_update_pages
remove/insert pages
resetting pointer
This race can cause data abort or some times infinte loop in
rb_remove_pages and rb_insert_pages while checking pages
for sanity.
Take buffer lock to fix this.
Link: https://lkml.kernel.org/r/1601976833-24377-1-git-send-email-gkohli@codeaurora.org
Cc: stable@vger.kernel.org
Fixes: b23d7a5f4a ("ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU")
Signed-off-by: Gaurav Kohli <gkohli@codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
A cut and paste error had the check to use __get_str() test "is_dynamic"
twice, instead of checking "is_string && is_dynamic".
Link: https://lore.kernel.org/r/d34dccd5-96ba-a2d9-46ea-de8807525deb@canonical.com
Reported-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The variable 'len' has been assigned a value but is not used after that.
So, remove the assignement.
Link: https://lkml.kernel.org/r/20200930184303.22896-1-sudipm.mukherjee@gmail.com
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Based on the following two reasones, we could simplify the calculation:
- If the number after roundup count is not power of 2, we would
definitely have more than 1 empty page with a higher order.
- get_count_order() just return current order, so one lower order
could meet the requirement.
The calculation could be simplified by lower one order level when pages
are not power of 2.
Link: https://lkml.kernel.org/r/20200831031104.23322-5-richard.weiyang@linux.alibaba.com
Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
No need to add a check to subtract the number of bits if bits is zero after
fls(). Just divide the size by two before calling it. This does give the
same answer for size of 0 and 1, but that's fine.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Currently, sythetic events only support static string fields such as:
# echo 'test_latency u64 lat; char somename[32]' > /sys/kernel/debug/tracing/synthetic_events
Which is fine, but wastes a lot of space in the event.
It also prevents the most commonly-defined strings in the existing
trace events e.g. those defined using __string(), from being passed to
synthetic events via the trace() action.
With this change, synthetic events with dynamic fields can be defined:
# echo 'test_latency u64 lat; char somename[]' > /sys/kernel/debug/tracing/synthetic_events
And the trace() action can be used to generate events using either
dynamic or static strings:
# echo 'hist:keys=name:lat=common_timestamp.usecs-$ts0:onmatch(sys.event).test_latency($lat,name)' > /sys/kernel/debug/tracing/events
The synthetic event dynamic strings are implemented in the same way as
the existing __data_loc strings and appear as such in the format file.
[ <rostedt@goodmis.org>: added __set_synth_event_print_fmt() changes:
I added the following to make it work with trace-cmd. Dynamic strings
must have __get_str() for events in the print_fmt otherwise it can't be
parsed correctly. ]
Link: https://lore.kernel.org/r/cover.1601588066.git.zanussi@kernel.org
Link: https://lkml.kernel.org/r/3ed35b6d0e390f5b94cb4a9ba1cc18f5982ab277.1601848695.git.zanussi@kernel.org
Tested-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
String variables created as field variables and save variables are
already handled properly by having their values copied when set. The
same isn't done for normal variables, but needs to be - simply saving
a pointer to a string contained in an old event isn't sufficient,
since that event's data may quickly become overwritten and therefore a
string pointer to it could yield garbage.
This change uses the same mechanism as field variables and simply
appends the new strings to the existing per-element field_var_str[]
array allocated for that purpose.
Link: https://lkml.kernel.org/r/1c1a03798b02e67307412a0c719d1bfb69b13007.1601848695.git.zanussi@kernel.org
Fixes: 02205a6752 (tracing: Add support for 'field variables')
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
synth_field_size() returns either a positive size or an error (zero or
a negative value). However, the existing code assumes the only error
value is 0. It doesn't handle negative error codes, as it assigns
directly to field->size (a size_t; unsigned), thereby interpreting the
error code as a valid size instead.
Do the test before assignment to field->size.
[ axelrasmussen@google.com: changelog addition, first paragraph above ]
Link: https://lkml.kernel.org/r/9b6946d9776b2eeb43227678158196de1c3c6e1d.1601848695.git.zanussi@kernel.org
Fixes: 4b147936fa (tracing: Add support for 'synthetic' events)
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
32 is too small for this value, and anyway it makes more sense to use
MAX_FILTER_STR_VAL, as this is also the value used for variable-length
__strings.
Link: https://lkml.kernel.org/r/6adfd1668ac1fd8670bd58206944a762061a5559.1601848695.git.zanussi@kernel.org
Tested-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Initialize per-instance event list in early boot time (before
initializing instance directory on tracefs). This fixes boot-time
tracing to correctly handle the boot-time per-instance settings.
Link: https://lkml.kernel.org/r/160096560826.182763.17110991546046128881.stgit@devnote2
Fixes: 4114fbfd02 ("tracing: Enable creating new instance early boot")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Initialize boot-time tracing in core_initcall_sync instead of
fs_initcall, and initialize required tracers (kprobes and synth)
in core_initcall. This will allow the boot-time tracing to trace
__init code from the beginning of postcore_initcall stage.
Link: https://lkml.kernel.org/r/159974155727.478751.7486926132902849578.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Enable creating new trace_array instance in early boot stage.
If the instances directory is not created, postpone it until
the tracefs is initialized.
Link: https://lkml.kernel.org/r/159974154763.478751.6289753509587233103.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Split the event fields initialization from creating new event directory.
This allows the boot-time tracing to define dynamic events before
initializing events directory on tracefs.
Link: https://lkml.kernel.org/r/159974153790.478751.3475515065034825374.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Define event fields at early stage so that boot-time tracing can
access the event fields (like per-event filter setting).
Link: https://lkml.kernel.org/r/159974152862.478751.2023768466808361350.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Init kprobes feature in early_initcall as same as jump_label and
dynamic_debug does, so that we can use kprobes events in earlier
boot stage.
Link: https://lkml.kernel.org/r/159974151897.478751.8342374158615496628.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Support perf-style return probe ("SYMBOL%return") for kprobe events.
This will allow boot-time tracing user to define a return probe event.
Link: https://lkml.kernel.org/r/159972813535.428528.4437029657208468954.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Add per-instance tracing_on option, which will be useful with
traceon/traceoff event trigger actions. For example, if we
disable tracing_on by default and set traceon and traceoff on
a pair of events, we can trace functions between the pair of
events.
Link: https://lkml.kernel.org/r/159972811538.428528.2561315102284268611.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
It seems that alloc_retstack_tasklist() can also take a lockless
approach for scanning the tasklist, instead of using the big global
tasklist_lock. For this we also kill another deprecated and rcu-unsafe
tsk->thread_group user replacing it with for_each_process_thread(),
maintaining semantics.
Here tasklist_lock does not protect anything other than the list
against concurrent fork/exit. And considering that the whole thing
is capped by FTRACE_RETSTACK_ALLOC_SIZE (32), it should not be a
problem to have a pontentially stale, yet stable, list. The task cannot
go away either, so we don't risk racing with ftrace_graph_exit_task()
which clears the retstack.
The tsk->ret_stack management is not protected by tasklist_lock, being
serialized with the corresponding publish/subscribe barriers against
concurrent ftrace_push_return_trace(). In addition this plays nicer
with cachelines by avoiding two atomic ops in the uncontended case.
Link: https://lkml.kernel.org/r/20200907013326.9870-1-dave@stgolabs.net
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The "tr" is a stack variable so setting it to NULL before a return is
a no-op. Delete the assignment.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
It is advised to use module_name() macro instead of dereferencing mod->name
directly. This makes sense for consistencys sake and also it prevents a
hard dependency to CONFIG_MODULES.
Link: https://lkml.kernel.org/r/20200818050857.117998-1-jarkko.sakkinen@linux.intel.com
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-mm@kvack.org
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>,
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Current tracing_init_dentry() return a d_entry pointer, while is not
necessary. This function returns NULL on success or error on failure,
which means there is no valid d_entry pointer return.
Let's return 0 on success and negative value for error.
Link: https://lkml.kernel.org/r/20200712011036.70948-5-richard.weiyang@linux.alibaba.com
Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Currently we have following call flow:
tracer_init_tracefs()
tracing_init_dentry()
event_trace_init()
tracing_init_dentry()
This shows tracing_init_dentry() is called twice in this flow and this
is not necessary.
Let's remove the second one when it is for sure be properly initialized.
Link: https://lkml.kernel.org/r/20200712011036.70948-4-richard.weiyang@linux.alibaba.com
Signed-off-by: Wei Yang <richard.weiyang@linux.alibaba.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Since kprobe_event= cmdline option allows user to put kprobes on the
functions in initmem, kprobe has to make such probes gone after boot.
Currently the probes on the init functions in modules will be handled
by module callback, but the kernel init text isn't handled.
Without this, kprobes may access non-exist text area to disable or
remove it.
Link: https://lkml.kernel.org/r/159972810544.428528.1839307531600646955.stgit@devnote2
Fixes: 970988e19e ("tracing/kprobe: Add kprobe_event= boot parameter")
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
clang static analyzer reports this problem
trace_events_hist.c:3824:3: warning: Attempt to free
released memory
kfree(hist_data->attrs->var_defs.name[i]);
In parse_var_defs() if there is a problem allocating
var_defs.expr, the earlier var_defs.name is freed.
This free is duplicated by free_var_defs() which frees
the rest of the list.
Because free_var_defs() has to run anyway, remove the
second free fom parse_var_defs().
Link: https://lkml.kernel.org/r/20200907135845.15804-1-trix@redhat.com
Cc: stable@vger.kernel.org
Fixes: 30350d65ac ("tracing: Add variable support to hist triggers")
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Commit 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
signature of ftrace_enable_sysctl to match ctl_table.proc_handler which
fixes the following sparse warning:
kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces)
kernel/trace/ftrace.c:7544:43: expected void *
kernel/trace/ftrace.c:7544:43: got void [noderef] __user *buffer
Link: https://lkml.kernel.org/r/20200907093207.13540-1-tklauser@distanz.ch
Fixes: 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
For 64bit CONFIG_BASE_SMALL=0 systems PID_MAX_LIMIT is set by default to
4194304. During boot the kernel sets a new value based on number of CPUs
but no lower than 32768. It is 1024 per CPU so with 128 CPUs the default
becomes 131072 which needs six digits.
This value can be increased during run time but must not exceed the
initial upper limit.
Systemd sometime after v241 sets it to the upper limit during boot. The
result is that when the pid exceeds five digits, the trace output is a
little hard to read because it is no longer properly padded (same like
on big iron with 98+ CPUs).
Increase the pid padding to seven digits.
Link: https://lkml.kernel.org/r/20200904082331.dcdkrr3bkn3e4qlg@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Add synchronize_rcu() after list_del_rcu() in
ftrace_remove_trampoline_from_kallsyms() to protect readers of
ftrace_ops_trampoline_list (in ftrace_get_trampoline_kallsym)
which is used when kallsyms is read.
Link: https://lkml.kernel.org/r/20200901091617.31837-1-adrian.hunter@intel.com
Fixes: fc0ea795f5 ("ftrace: Add symbols for ftrace trampolines")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Commit fc0ea795f5 ("ftrace: Add symbols for ftrace trampolines")
missed to remove ops from new ftrace_ops_trampoline_list in
ftrace_startup() if ftrace_hash_ipmodify_enable() fails there. It may
lead to BUG if such ops come from a module which may be removed.
Moreover, the trampoline itself is not freed in this case.
Fix it by calling ftrace_trampoline_free() during the rollback.
Link: https://lkml.kernel.org/r/20200831122631.28057-1-mbenes@suse.cz
Fixes: fc0ea795f5 ("ftrace: Add symbols for ftrace trampolines")
Fixes: f8b8be8a31 ("ftrace, kprobes: Support IPMODIFY flag to find IP modify conflict")
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
- Fix memory resource leak of user_notif under TSYNC race (Tycho Andersen)
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl9cE6IWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJuHfD/9CrUBQl1A4ZuvRjJEiP9V/7g/B
JKDoU+VY3j4B7adFypol2atXmrpcFRUY8FfZYLY4lJtl30YUTC5mxTeQpXjH71p2
PVSHUc1eKGFgThgcGaGs8qRGDctvLJTX9KnRRfYX6UGo5fsbyJBTDJMWZ00+87Ia
3cgCo60Q/107KiDDfb4D8rROG9uKkTaa+icZPjCzGAOlBOZhWX2y5ViT0KvEre/r
ObaCHAs4JIIyqTTrPUTLeOqjzIjp0yYZ/FmyJOQZ8cSA1HezbxHU9kgi6d69QaZB
natXjarHmU5/eUBjbQ95jH324qamoLq++ch/sL4NiitjboAmAxZrIZ80Ir4qOrcU
6ddTr0jhzKsfGzibZKI6g3fYCJJ38DJl/JaiADeySovdEaf7h3cs85WjXK2nVuZR
uKI5heaK/4tumIBqTBSo4cU7Bk9hSOXtoAUloiIem/jXZYS4Atl5WbXynAI4fM3b
FO1PwKm3LBX5Ua1cjOHRydFZ1qZB90TvzoylLWXOSJ+ThmKOWfxtk98G6C7l/AY5
18FjYjQxn8NT1AFBoRyFB+0Jf0KPrkqr0un1BdWt+B8hNMovEn7PHvAFJ1tJOQic
8TnbGtDYO58kkMsdSSFATwquzo31yu1epXXUtviR/cJVanY/dhGuCtgamXwrUhVa
ElFPQaO0W5DgBAxXUA==
=I7rD
-----END PGP SIGNATURE-----
Merge tag 'seccomp-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp fixes from Kees Cook:
"This fixes a rare race condition in seccomp when using TSYNC and
USER_NOTIF together where a memory allocation would not get freed
(found by syzkaller, fixed by Tycho).
Additionally updates Tycho's MAINTAINERS and .mailmap entries for his
new address"
* tag 'seccomp-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: don't leave dangling ->notif if file allocation fails
mailmap, MAINTAINERS: move to tycho.pizza
seccomp: don't leak memory when filter install races
Using gcov to collect coverage data for kernels compiled with GCC 10.1
causes random malfunctions and kernel crashes. This is the result of a
changed GCOV_COUNTERS value in GCC 10.1 that causes a mismatch between
the layout of the gcov_info structure created by GCC profiling code and
the related structure used by the kernel.
Fix this by updating the in-kernel GCOV_COUNTERS value. Also re-enable
config GCOV_KERNEL for use with GCC 10.
Reported-by: Colin Ian King <colin.king@canonical.com>
Reported-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Tested-and-Acked-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull crypto fix from Herbert Xu:
"This fixes a regression in padata"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
padata: fix possible padata_works_lock deadlock
Christian and Kees both pointed out that this is a bit sloppy to open-code
both places, and Christian points out that we leave a dangling pointer to
->notif if file allocation fails. Since we check ->notif for null in order
to determine if it's ok to install a filter, this means people won't be
able to install a filter if the file allocation fails for some reason, even
if they subsequently should be able to.
To fix this, let's hoist this free+null into its own little helper and use
it.
Reported-by: Kees Cook <keescook@chromium.org>
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20200902140953.1201956-1-tycho@tycho.pizza
Signed-off-by: Kees Cook <keescook@chromium.org>
In seccomp_set_mode_filter() with TSYNC | NEW_LISTENER, we first initialize
the listener fd, then check to see if we can actually use it later in
seccomp_may_assign_mode(), which can fail if anyone else in our thread
group has installed a filter and caused some divergence. If we can't, we
partially clean up the newly allocated file: we put the fd, put the file,
but don't actually clean up the *memory* that was allocated at
filter->notif. Let's clean that up too.
To accomplish this, let's hoist the actual "detach a notifier from a
filter" code to its own helper out of seccomp_notify_release(), so that in
case anyone adds stuff to init_listener(), they only have to add the
cleanup code in one spot. This does a bit of extra locking and such on the
failure path when the filter is not attached, but it's a slow failure path
anyway.
Fixes: 51891498f2 ("seccomp: allow TSYNC and USER_NOTIF together")
Reported-by: syzbot+3ad9614a12f80994c32e@syzkaller.appspotmail.com
Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20200902014017.934315-1-tycho@tycho.pizza
Signed-off-by: Kees Cook <keescook@chromium.org>
Merge misc fixes from Andrew Morton:
"19 patches.
Subsystems affected by this patch series: MAINTAINERS, ipc, fork,
checkpatch, lib, and mm (memcg, slub, pagemap, madvise, migration,
hugetlb)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
include/linux/log2.h: add missing () around n in roundup_pow_of_two()
mm/khugepaged.c: fix khugepaged's request size in collapse_file
mm/hugetlb: fix a race between hugetlb sysctl handlers
mm/hugetlb: try preferred node first when alloc gigantic page from cma
mm/migrate: preserve soft dirty in remove_migration_pte()
mm/migrate: remove unnecessary is_zone_device_page() check
mm/rmap: fixup copying of soft dirty and uffd ptes
mm/migrate: fixup setting UFFD_WP flag
mm: madvise: fix vma user-after-free
checkpatch: fix the usage of capture group ( ... )
fork: adjust sysctl_max_threads definition to match prototype
ipc: adjust proc_ipc_sem_dointvec definition to match prototype
mm: track page table modifications in __apply_to_page_range()
MAINTAINERS: IA64: mark Status as Odd Fixes only
MAINTAINERS: add LLVM maintainers
MAINTAINERS: update Cavium/Marvell entries
mm: slub: fix conversion of freelist_corrupted()
mm: memcg: fix memcg reclaim soft lockup
memcg: fix use-after-free in uncharge_batch
Commit 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
definition of sysctl_max_threads to match its prototype in
linux/sysctl.h which fixes the following sparse error/warning:
kernel/fork.c:3050:47: warning: incorrect type in argument 3 (different address spaces)
kernel/fork.c:3050:47: expected void *
kernel/fork.c:3050:47: got void [noderef] __user *buffer
kernel/fork.c:3036:5: error: symbol 'sysctl_max_threads' redeclared with different type (incompatible argument 3 (different address spaces)):
kernel/fork.c:3036:5: int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )
kernel/fork.c: note: in included file (through include/linux/key.h, include/linux/cred.h, include/linux/sched/signal.h, include/linux/sched/cputime.h):
include/linux/sysctl.h:242:5: note: previously declared as:
include/linux/sysctl.h:242:5: int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )
Fixes: 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/20200825093647.24263-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy reported that the syscall treacing for 32bit fast syscall fails:
# ./tools/testing/selftests/x86/ptrace_syscall_32
...
[RUN] SYSEMU
[FAIL] Initial args are wrong (nr=224, args=10 11 12 13 14 4289172732)
...
[RUN] SYSCALL
[FAIL] Initial args are wrong (nr=29, args=0 0 0 0 0 4289172732)
The eason is that the conversion to generic entry code moved the retrieval
of the sixth argument (EBP) after the point where the syscall entry work
runs, i.e. ptrace, seccomp, audit...
Unbreak it by providing a split up version of syscall_enter_from_user_mode().
- syscall_enter_from_user_mode_prepare() establishes state and enables
interrupts
- syscall_enter_from_user_mode_work() runs the entry work
Replace the call to syscall_enter_from_user_mode() in the 32bit fast
syscall C-entry with the split functions and stick the EBP retrieval
between them.
Fixes: 27d6b4d14f ("x86/entry: Use generic syscall entry function")
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/87k0xdjbtt.fsf@nanos.tec.linutronix.de