mirror of
https://mirrors.bfsu.edu.cn/git/linux.git
synced 2024-12-19 00:54:41 +08:00
2b001b9407
41797 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Andrii Nakryiko
|
e7d85427ef |
bpf: Validate BPF object in BPF_OBJ_PIN before calling LSM
Do a sanity check whether provided file-to-be-pinned is actually a BPF object (prog, map, btf) before calling security_path_mknod LSM hook. If it's not, LSM hook doesn't have to be triggered, as the operation has no chance of succeeding anyways. Suggested-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/bpf/20230522232917.2454595-2-andrii@kernel.org |
||
Aditi Ghag
|
e924e80ee6 |
bpf: Add kfunc filter function to 'struct btf_kfunc_id_set'
This commit adds the ability to filter kfuncs to certain BPF program types. This is required to limit bpf_sock_destroy kfunc implemented in follow-up commits to programs with attach type 'BPF_TRACE_ITER'. The commit adds a callback filter to 'struct btf_kfunc_id_set'. The filter has access to the `bpf_prog` construct including its properties such as `expected_attached_type`. Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com> Link: https://lore.kernel.org/r/20230519225157.760788-7-aditi.ghag@isovalent.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> |
||
Yafang Shao
|
e859e42951 |
bpf: Show target_{obj,btf}_id in tracing link fdinfo
The target_btf_id can help us understand which kernel function is linked by a tracing prog. The target_btf_id and target_obj_id have already been exposed to userspace, so we just need to show them. The result as follows, $ cat /proc/10673/fdinfo/10 pos: 0 flags: 02000000 mnt_id: 15 ino: 2094 link_type: tracing link_id: 2 prog_tag: a04f5eef06a7f555 prog_id: 13 attach_type: 24 target_obj_id: 1 target_btf_id: 13964 Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230517103126.68372-2-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Andrii Nakryiko
|
cff36398bd |
bpf: drop unnecessary user-triggerable WARN_ONCE in verifierl log
It's trivial for user to trigger "verifier log line truncated" warning, as verifier has a fixed-sized buffer of 1024 bytes (as of now), and there are at least two pieces of user-provided information that can be output through this buffer, and both can be arbitrarily sized by user: - BTF names; - BTF.ext source code lines strings. Verifier log buffer should be properly sized for typical verifier state output. But it's sort-of expected that this buffer won't be long enough in some circumstances. So let's drop the check. In any case code will work correctly, at worst truncating a part of a single line output. Reported-by: syzbot+8b2a08dfbd25fd933d75@syzkaller.appspotmail.com Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230516180409.3549088-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Jakub Kicinski
|
a0e35a648f |
bpf-next-for-netdev
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZGKqEAAKCRDbK58LschI g6LYAQDp1jAszCOkmJ8VUA0ZyC5NAFDv+7y9Nd1toYWYX1btzAEAkf8+5qBJ1qmI P5M0hjMTbH4MID9Aql10ZbMHheyOBAo= =NUQM -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-05-16 We've added 57 non-merge commits during the last 19 day(s) which contain a total of 63 files changed, 3293 insertions(+), 690 deletions(-). The main changes are: 1) Add precision propagation to verifier for subprogs and callbacks, from Andrii Nakryiko. 2) Improve BPF's {g,s}setsockopt() handling with wrong option lengths, from Stanislav Fomichev. 3) Utilize pahole v1.25 for the kernel's BTF generation to filter out inconsistent function prototypes, from Alan Maguire. 4) Various dyn-pointer verifier improvements to relax restrictions, from Daniel Rosenberg. 5) Add a new bpf_task_under_cgroup() kfunc for designated task, from Feng Zhou. 6) Unblock tests for arm64 BPF CI after ftrace supporting direct call, from Florent Revest. 7) Add XDP hint kfunc metadata for RX hash/timestamp for igc, from Jesper Dangaard Brouer. 8) Add several new dyn-pointer kfuncs to ease their usability, from Joanne Koong. 9) Add in-depth LRU internals description and dot function graph, from Joe Stringer. 10) Fix KCSAN report on bpf_lru_list when accessing node->ref, from Martin KaFai Lau. 11) Only dump unprivileged_bpf_disabled log warning upon write, from Kui-Feng Lee. 12) Extend test_progs to directly passing allow/denylist file, from Stephen Veiss. 13) Fix BPF trampoline memleak upon failure attaching to fentry, from Yafang Shao. 14) Fix emitting struct bpf_tcp_sock type in vmlinux BTF, from Yonghong Song. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (57 commits) bpf: Fix memleak due to fentry attach failure bpf: Remove bpf trampoline selector bpf, arm64: Support struct arguments in the BPF trampoline bpftool: JIT limited misreported as negative value on aarch64 bpf: fix calculation of subseq_idx during precision backtracking bpf: Remove anonymous union in bpf_kfunc_call_arg_meta bpf: Document EFAULT changes for sockopt selftests/bpf: Correctly handle optlen > 4096 selftests/bpf: Update EFAULT {g,s}etsockopt selftests bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen libbpf: fix offsetof() and container_of() to work with CO-RE bpf: Address KCSAN report on bpf_lru_list bpf: Add --skip_encoding_btf_inconsistent_proto, --btf_gen_optimized to pahole flags for v1.25 selftests/bpf: Accept mem from dynptr in helper funcs bpf: verifier: Accept dynptr mem as mem in helpers selftests/bpf: Check overflow in optional buffer selftests/bpf: Test allowing NULL buffer in dynptr slice bpf: Allow NULL buffers in bpf_dynptr_slice(_rw) selftests/bpf: Add testcase for bpf_task_under_cgroup bpf: Add bpf_task_under_cgroup() kfunc ... ==================== Link: https://lore.kernel.org/r/20230515225603.27027-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
Yafang Shao
|
108598c39e |
bpf: Fix memleak due to fentry attach failure
If it fails to attach fentry, the allocated bpf trampoline image will be
left in the system. That can be verified by checking /proc/kallsyms.
This meamleak can be verified by a simple bpf program as follows:
SEC("fentry/trap_init")
int fentry_run()
{
return 0;
}
It will fail to attach trap_init because this function is freed after
kernel init, and then we can find the trampoline image is left in the
system by checking /proc/kallsyms.
$ tail /proc/kallsyms
ffffffffc0613000 t bpf_trampoline_6442453466_1 [bpf]
ffffffffc06c3000 t bpf_trampoline_6442453466_1 [bpf]
$ bpftool btf dump file /sys/kernel/btf/vmlinux | grep "FUNC 'trap_init'"
[2522] FUNC 'trap_init' type_id=119 linkage=static
$ echo $((6442453466 & 0x7fffffff))
2522
Note that there are two left bpf trampoline images, that is because the
libbpf will fallback to raw tracepoint if -EINVAL is returned.
Fixes:
|
||
Yafang Shao
|
47e79cbeea |
bpf: Remove bpf trampoline selector
After commit |
||
Andrii Nakryiko
|
d84b1a6708 |
bpf: fix calculation of subseq_idx during precision backtracking
Subsequent instruction index (subseq_idx) is an index of an instruction
that was verified/executed by verifier after the currently processed
instruction. It is maintained during precision backtracking processing
and is used to detect various subprog calling conditions.
This patch fixes the bug with incorrectly resetting subseq_idx to -1
when going from child state to parent state during backtracking. If we
don't maintain correct subseq_idx we can misidentify subprog calls
leading to precision tracking bugs.
One such case was triggered by test_global_funcs/global_func9 test where
global subprog call happened to be the very last instruction in parent
state, leading to subseq_idx==-1, triggering WARN_ONCE:
[ 36.045754] verifier backtracking bug
[ 36.045764] WARNING: CPU: 13 PID: 2073 at kernel/bpf/verifier.c:3503 __mark_chain_precision+0xcc6/0xde0
[ 36.046819] Modules linked in: aesni_intel(E) crypto_simd(E) cryptd(E) kvm_intel(E) kvm(E) irqbypass(E) i2c_piix4(E) serio_raw(E) i2c_core(E) crc32c_intel)
[ 36.048040] CPU: 13 PID: 2073 Comm: test_progs Tainted: G W OE 6.3.0-07976-g4d585f48ee6b-dirty #972
[ 36.048783] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[ 36.049648] RIP: 0010:__mark_chain_precision+0xcc6/0xde0
[ 36.050038] Code: 3d 82 c6 05 bb 35 32 02 01 e8 66 21 ec ff 0f 0b b8 f2 ff ff ff e9 30 f5 ff ff 48 c7 c7 f3 61 3d 82 4c 89 0c 24 e8 4a 21 ec ff <0f> 0b 4c0
With the fix precision tracking across multiple states works correctly now:
mark_precise: frame0: last_idx 45 first_idx 38 subseq_idx -1
mark_precise: frame0: regs=r8 stack= before 44: (61) r7 = *(u32 *)(r10 -4)
mark_precise: frame0: regs=r8 stack= before 43: (85) call pc+41
mark_precise: frame0: regs=r8 stack= before 42: (07) r1 += -48
mark_precise: frame0: regs=r8 stack= before 41: (bf) r1 = r10
mark_precise: frame0: regs=r8 stack= before 40: (63) *(u32 *)(r10 -48) = r1
mark_precise: frame0: regs=r8 stack= before 39: (b4) w1 = 0
mark_precise: frame0: regs=r8 stack= before 38: (85) call pc+38
mark_precise: frame0: parent state regs=r8 stack=: R0_w=scalar() R1_w=map_value(off=4,ks=4,vs=8,imm=0) R6=1 R7_w=scalar() R8_r=P0 R10=fpm
mark_precise: frame0: last_idx 36 first_idx 28 subseq_idx 38
mark_precise: frame0: regs=r8 stack= before 36: (18) r1 = 0xffff888104f2ed14
mark_precise: frame0: regs=r8 stack= before 35: (85) call pc+33
mark_precise: frame0: regs=r8 stack= before 33: (18) r1 = 0xffff888104f2ed10
mark_precise: frame0: regs=r8 stack= before 32: (85) call pc+36
mark_precise: frame0: regs=r8 stack= before 31: (07) r1 += -4
mark_precise: frame0: regs=r8 stack= before 30: (bf) r1 = r10
mark_precise: frame0: regs=r8 stack= before 29: (63) *(u32 *)(r10 -4) = r7
mark_precise: frame0: regs=r8 stack= before 28: (4c) w7 |= w0
mark_precise: frame0: parent state regs=r8 stack=: R0_rw=scalar() R6=1 R7_rw=scalar() R8_rw=P0 R10=fp0 fp-48_r=mmmmmmmm
mark_precise: frame0: last_idx 27 first_idx 16 subseq_idx 28
mark_precise: frame0: regs=r8 stack= before 27: (85) call pc+31
mark_precise: frame0: regs=r8 stack= before 26: (b7) r1 = 0
mark_precise: frame0: regs=r8 stack= before 25: (b7) r8 = 0
Note how subseq_idx starts out as -1, then is preserved as 38 and then 28 as we
go up the parent state chain.
Reported-by: Alexei Starovoitov <ast@kernel.org>
Fixes:
|
||
Dave Marchevsky
|
4d585f48ee |
bpf: Remove anonymous union in bpf_kfunc_call_arg_meta
For kfuncs like bpf_obj_drop and bpf_refcount_acquire - which take
user-defined types as input - the verifier needs to track the specific
type passed in when checking a particular kfunc call. This requires
tracking (btf, btf_id) tuple. In commit
|
||
Stanislav Fomichev
|
29ebbba7d4 |
bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen
With the way the hooks implemented right now, we have a special
condition: optval larger than PAGE_SIZE will expose only first 4k into
BPF; any modifications to the optval are ignored. If the BPF program
doesn't handle this condition by resetting optlen to 0,
the userspace will get EFAULT.
The intention of the EFAULT was to make it apparent to the
developers that the program is doing something wrong.
However, this inadvertently might affect production workloads
with the BPF programs that are not too careful (i.e., returning EFAULT
for perfectly valid setsockopt/getsockopt calls).
Let's try to minimize the chance of BPF program screwing up userspace
by ignoring the output of those BPF programs (instead of returning
EFAULT to the userspace). pr_info_once those cases to
the dmesg to help with figuring out what's going wrong.
Fixes:
|
||
Martin KaFai Lau
|
ee9fd0ac30 |
bpf: Address KCSAN report on bpf_lru_list
KCSAN reported a data-race when accessing node->ref. Although node->ref does not have to be accurate, take this chance to use a more common READ_ONCE() and WRITE_ONCE() pattern instead of data_race(). There is an existing bpf_lru_node_is_ref() and bpf_lru_node_set_ref(). This patch also adds bpf_lru_node_clear_ref() to do the WRITE_ONCE(node->ref, 0) also. ================================================================== BUG: KCSAN: data-race in __bpf_lru_list_rotate / __htab_lru_percpu_map_update_elem write to 0xffff888137038deb of 1 bytes by task 11240 on cpu 1: __bpf_lru_node_move kernel/bpf/bpf_lru_list.c:113 [inline] __bpf_lru_list_rotate_active kernel/bpf/bpf_lru_list.c:149 [inline] __bpf_lru_list_rotate+0x1bf/0x750 kernel/bpf/bpf_lru_list.c:240 bpf_lru_list_pop_free_to_local kernel/bpf/bpf_lru_list.c:329 [inline] bpf_common_lru_pop_free kernel/bpf/bpf_lru_list.c:447 [inline] bpf_lru_pop_free+0x638/0xe20 kernel/bpf/bpf_lru_list.c:499 prealloc_lru_pop kernel/bpf/hashtab.c:290 [inline] __htab_lru_percpu_map_update_elem+0xe7/0x820 kernel/bpf/hashtab.c:1316 bpf_percpu_hash_update+0x5e/0x90 kernel/bpf/hashtab.c:2313 bpf_map_update_value+0x2a9/0x370 kernel/bpf/syscall.c:200 generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1687 bpf_map_do_batch+0x2d9/0x3d0 kernel/bpf/syscall.c:4534 __sys_bpf+0x338/0x810 __do_sys_bpf kernel/bpf/syscall.c:5096 [inline] __se_sys_bpf kernel/bpf/syscall.c:5094 [inline] __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5094 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffff888137038deb of 1 bytes by task 11241 on cpu 0: bpf_lru_node_set_ref kernel/bpf/bpf_lru_list.h:70 [inline] __htab_lru_percpu_map_update_elem+0x2f1/0x820 kernel/bpf/hashtab.c:1332 bpf_percpu_hash_update+0x5e/0x90 kernel/bpf/hashtab.c:2313 bpf_map_update_value+0x2a9/0x370 kernel/bpf/syscall.c:200 generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1687 bpf_map_do_batch+0x2d9/0x3d0 kernel/bpf/syscall.c:4534 __sys_bpf+0x338/0x810 __do_sys_bpf kernel/bpf/syscall.c:5096 [inline] __se_sys_bpf kernel/bpf/syscall.c:5094 [inline] __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5094 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x01 -> 0x00 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 11241 Comm: syz-executor.3 Not tainted 6.3.0-rc7-syzkaller-00136-g6a66fdd29ea1 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/30/2023 ================================================================== Reported-by: syzbot+ebe648a84e8784763f82@syzkaller.appspotmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20230511043748.1384166-1-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Daniel Rosenberg
|
2012c867c8 |
bpf: verifier: Accept dynptr mem as mem in helpers
This allows using memory retrieved from dynptrs with helper functions that accept ARG_PTR_TO_MEM. For instance, results from bpf_dynptr_data can be passed along to bpf_strncmp. Signed-off-by: Daniel Rosenberg <drosen@google.com> Link: https://lore.kernel.org/r/20230506013134.2492210-5-drosen@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Daniel Rosenberg
|
3bda08b636 |
bpf: Allow NULL buffers in bpf_dynptr_slice(_rw)
bpf_dynptr_slice(_rw) uses a user provided buffer if it can not provide a pointer to a block of contiguous memory. This buffer is unused in the case of local dynptrs, and may be unused in other cases as well. There is no need to require the buffer, as the kfunc can just return NULL if it was needed and not provided. This adds another kfunc annotation, __opt, which combines with __sz and __szk to allow the buffer associated with the size to be NULL. If the buffer is NULL, the verifier does not check that the buffer is of sufficient size. Signed-off-by: Daniel Rosenberg <drosen@google.com> Link: https://lore.kernel.org/r/20230506013134.2492210-2-drosen@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Feng Zhou
|
b5ad4cdc46 |
bpf: Add bpf_task_under_cgroup() kfunc
Add a kfunc that's similar to the bpf_current_task_under_cgroup. The difference is that it is a designated task. When hook sched related functions, sometimes it is necessary to specify a task instead of the current task. Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20230506031545.35991-2-zhoufeng.zf@bytedance.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Linus Torvalds
|
e919a3f705 |
Minor tracing updates:
- Make buffer_percent read/write. The buffer_percent file is how users can state how long to block on the tracing buffer depending on how much is in the buffer. When it hits the "buffer_percent" it will wake the task waiting on the buffer. For some reason it was set to read-only. This was not noticed because testing was done as root without SELinux, but with SELinux it will prevent even root to write to it without having CAP_DAC_OVERRIDE. - The "touched_functions" was added this merge window, but one of the reasons for adding it was not implemented. That was to show what functions were not only touched, but had either a direct trampoline attached to it, or a kprobe or live kernel patching that can "hijack" the function to run a different function. The point is to know if there's functions in the kernel that may not be behaving as the kernel code shows. This can be used for debugging. TODO: Add this information to kernel oops too. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZFUcrxQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qgOoAP0U2R6+jvA2ehQFb0UTCH9wEu2uEELA g2CkdPNdn6wJjAD+O1+v5nVkqSpsArjHOhv5OGYrgh+VSXK3Z8EpQ9vUVgg= =nfoh -----END PGP SIGNATURE----- Merge tag 'trace-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull more tracing updates from Steven Rostedt: - Make buffer_percent read/write. The buffer_percent file is how users can state how long to block on the tracing buffer depending on how much is in the buffer. When it hits the "buffer_percent" it will wake the task waiting on the buffer. For some reason it was set to read-only. This was not noticed because testing was done as root without SELinux, but with SELinux it will prevent even root to write to it without having CAP_DAC_OVERRIDE. - The "touched_functions" was added this merge window, but one of the reasons for adding it was not implemented. That was to show what functions were not only touched, but had either a direct trampoline attached to it, or a kprobe or live kernel patching that can "hijack" the function to run a different function. The point is to know if there's functions in the kernel that may not be behaving as the kernel code shows. This can be used for debugging. TODO: Add this information to kernel oops too. * tag 'trace-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ftrace: Add MODIFIED flag to show if IPMODIFY or direct was attached tracing: Fix permissions for the buffer_percent file |
||
Linus Torvalds
|
b115d85a95 |
Locking changes in v6.4:
- Introduce local{,64}_try_cmpxchg() - a slightly more optimal primitive, which will be used in perf events ring-buffer code. - Simplify/modify rwsems on PREEMPT_RT, to address writer starvation. - Misc cleanups/fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRUvUoRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1hlIhAArP33rTKi+HAndQ3UHW3XtmHRxEEQTfiE wvIoN89h58QW4DGMeAV4ltafbIPQAkI233Aogwz903L0qbDV0Ro4OU3XJembRuWl LeOADKwYyypXdOa8XICuY9aIP7e1/h0DF3ySs7inLcwK9JCyAIxnsVHYej+hsRXA kZoXN98T3TR1C0V9UQy4SU3HI1lC3tsG3R9Ti9TnYUg3ygVXhRE9lOQ4kv9lFPVz BNuj2Blj7KNiVaY9kehrhO54THI7NmsCVZO44Rcl48I0KAcFulAmFcNlE7GnR8Nj thj38pU6XAFVHXG8MYjgE+Al+PnK48NtJxexCtHyGvGG4D2aLzRMnkolxAUCcVuK G+UBsQm3ybjYgHgt1zuN6ehcpT+5tULkDH8JA7vrgZYaVgxHzsUaHgYfCCWKnmUY mPR6aImEmYZwZVNLskhe0HT4mq244bp+VnWlnJ6LZK7t/itenvDhqnj7KTi4Bfej lTHplOTitV/8uCEW8V4pX+YTEenVsIQmTc/G3iIabXP/6HzLffA3q4vyW6vKIErE pqrpuFA0Z4GB+pU0mJXt7+I7zscDVthwI055jDyQBjA7IcdVGm2MjQ6xcNRW5FYN UynvaEMocue4ZO4WdFsd1ZBUd9VfoNzGQspBw46DhCL1MEQBYv36SKQNjej/9aRr ilVwqnOWI2s= =mM0A -----END PGP SIGNATURE----- Merge tag 'locking-core-2023-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - Introduce local{,64}_try_cmpxchg() - a slightly more optimal primitive, which will be used in perf events ring-buffer code - Simplify/modify rwsems on PREEMPT_RT, to address writer starvation - Misc cleanups/fixes * tag 'locking-core-2023-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/atomic: Correct (cmp)xchg() instrumentation locking/x86: Define arch_try_cmpxchg_local() locking/arch: Wire up local_try_cmpxchg() locking/generic: Wire up local{,64}_try_cmpxchg() locking/atomic: Add generic try_cmpxchg{,64}_local() support locking/rwbase: Mitigate indefinite writer starvation locking/arch: Rename all internal __xchg() names to __arch_xchg() |
||
Steven Rostedt (Google)
|
6ce2c04fcb |
ftrace: Add MODIFIED flag to show if IPMODIFY or direct was attached
If a function had ever had IPMODIFY or DIRECT attached to it, where this is how live kernel patching and BPF overrides work, mark them and display an "M" in the enabled_functions and touched_functions files. This can be used for debugging. If a function had been modified and later there's a bug in the code related to that function, this can be used to know if the cause is possibly from a live kernel patch or a BPF program that changed the behavior of the code. Also update the documentation on the enabled_functions and touched_functions output, as it was missing direct callers and CALL_OPS. And include this new modify attribute. Link: https://lore.kernel.org/linux-trace-kernel/20230502213233.004e3ae4@gandalf.local.home Cc: Mark Rutland <mark.rutland@arm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
Andrii Nakryiko
|
fde2a3882b |
bpf: support precision propagation in the presence of subprogs
Add support precision backtracking in the presence of subprogram frames in jump history. This means supporting a few different kinds of subprogram invocation situations, all requiring a slightly different handling in precision backtracking handling logic: - static subprogram calls; - global subprogram calls; - callback-calling helpers/kfuncs. For each of those we need to handle a few precision propagation cases: - what to do with precision of subprog returns (r0); - what to do with precision of input arguments; - for all of them callee-saved registers in caller function should be propagated ignoring subprog/callback part of jump history. N.B. Async callback-calling helpers (currently only bpf_timer_set_callback()) are transparent to all this because they set a separate async callback environment and thus callback's history is not shared with main program's history. So as far as all the changes in this commit goes, such helper is just a regular helper. Let's look at all these situation in more details. Let's start with static subprogram being called, using an exxerpt of a simple main program and its static subprog, indenting subprog's frame slightly to make everything clear. frame 0 frame 1 precision set ======= ======= ============= 9: r6 = 456; 10: r1 = 123; fr0: r6 11: call pc+10; fr0: r1, r6 22: r0 = r1; fr0: r6; fr1: r1 23: exit fr0: r6; fr1: r0 12: r1 = <map_pointer> fr0: r0, r6 13: r1 += r0; fr0: r0, r6 14: r1 += r6; fr0: r6 15: exit As can be seen above main function is passing 123 as single argument to an identity (`return x;`) subprog. Returned value is used to adjust map pointer offset, which forces r0 to be marked as precise. Then instruction #14 does the same for callee-saved r6, which will have to be backtracked all the way to instruction #9. For brevity, precision sets for instruction #13 and #14 are combined in the diagram above. First, for subprog calls, r0 returned from subprog (in frame 0) has to go into subprog's frame 1, and should be cleared from frame 0. So we go back into subprog's frame knowing we need to mark r0 precise. We then see that insn #22 sets r0 from r1, so now we care about marking r1 precise. When we pop up from subprog's frame back into caller at insn #11 we keep r1, as it's an argument-passing register, so we eventually find `10: r1 = 123;` and satify precision propagation chain for insn #13. This example demonstrates two sets of rules: - r0 returned after subprog call has to be moved into subprog's r0 set; - *static* subprog arguments (r1-r5) are moved back to caller precision set. Let's look at what happens with callee-saved precision propagation. Insn #14 mark r6 as precise. When we get into subprog's frame, we keep r6 in frame 0's precision set *only*. Subprog itself has its own set of independent r6-r10 registers and is not affected. When we eventually made our way out of subprog frame we keep r6 in precision set until we reach `9: r6 = 456;`, satisfying propagation. r6-r10 propagation is perhaps the simplest aspect, it always stays in its original frame. That's pretty much all we have to do to support precision propagation across *static subprog* invocation. Let's look at what happens when we have global subprog invocation. frame 0 frame 1 precision set ======= ======= ============= 9: r6 = 456; 10: r1 = 123; fr0: r6 11: call pc+10; # global subprog fr0: r6 12: r1 = <map_pointer> fr0: r0, r6 13: r1 += r0; fr0: r0, r6 14: r1 += r6; fr0: r6; 15: exit Starting from insn #13, r0 has to be precise. We backtrack all the way to insn #11 (call pc+10) and see that subprog is global, so was already validated in isolation. As opposed to static subprog, global subprog always returns unknown scalar r0, so that satisfies precision propagation and we drop r0 from precision set. We are done for insns #13. Now for insn #14. r6 is in precision set, we backtrack to `call pc+10;`. Here we need to recognize that this is effectively both exit and entry to global subprog, which means we stay in caller's frame. So we carry on with r6 still in precision set, until we satisfy it at insn #9. The only hard part with global subprogs is just knowing when it's a global func. Lastly, callback-calling helpers and kfuncs do simulate subprog calls, so jump history will have subprog instructions in between caller program's instructions, but the rules of propagating r0 and r1-r5 differ, because we don't actually directly call callback. We actually call helper/kfunc, which at runtime will call subprog, so the only difference between normal helper/kfunc handling is that we need to make sure to skip callback simulatinog part of jump history. Let's look at an example to make this clearer. frame 0 frame 1 precision set ======= ======= ============= 8: r6 = 456; 9: r1 = 123; fr0: r6 10: r2 = &callback; fr0: r6 11: call bpf_loop; fr0: r6 22: r0 = r1; fr0: r6 fr1: 23: exit fr0: r6 fr1: 12: r1 = <map_pointer> fr0: r0, r6 13: r1 += r0; fr0: r0, r6 14: r1 += r6; fr0: r6; 15: exit Again, insn #13 forces r0 to be precise. As soon as we get to `23: exit` we see that this isn't actually a static subprog call (it's `call bpf_loop;` helper call instead). So we clear r0 from precision set. For callee-saved register, there is no difference: it stays in frame 0's precision set, we go through insn #22 and #23, ignoring them until we get back to caller frame 0, eventually satisfying precision backtrack logic at insn #8 (`r6 = 456;`). Assuming callback needed to set r0 as precise at insn #23, we'd backtrack to insn #22, switching from r0 to r1, and then at the point when we pop back to frame 0 at insn #11, we'll clear r1-r5 from precision set, as we don't really do a subprog call directly, so there is no input argument precision propagation. That's pretty much it. With these changes, it seems like the only still unsupported situation for precision backpropagation is the case when program is accessing stack through registers other than r10. This is still left as unsupported (though rare) case for now. As for results. For selftests, few positive changes for bigger programs, cls_redirect in dynptr variant benefitting the most: [vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results.csv ~/subprog-precise-after-results.csv -f @veristat.cfg -e file,prog,insns -f 'insns_diff!=0' File Program Insns (A) Insns (B) Insns (DIFF) ---------------------------------------- ------------- --------- --------- ---------------- pyperf600_bpf_loop.bpf.linked1.o on_event 2060 2002 -58 (-2.82%) test_cls_redirect_dynptr.bpf.linked1.o cls_redirect 15660 2914 -12746 (-81.39%) test_cls_redirect_subprogs.bpf.linked1.o cls_redirect 61620 59088 -2532 (-4.11%) xdp_synproxy_kern.bpf.linked1.o syncookie_tc 109980 86278 -23702 (-21.55%) xdp_synproxy_kern.bpf.linked1.o syncookie_xdp 97716 85147 -12569 (-12.86%) Cilium progress don't really regress. They don't use subprogs and are mostly unaffected, but some other fixes and improvements could have changed something. This doesn't appear to be the case: [vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-cilium.csv ~/subprog-precise-after-results-cilium.csv -e file,prog,insns -f 'insns_diff!=0' File Program Insns (A) Insns (B) Insns (DIFF) ------------- ------------------------------ --------- --------- ------------ bpf_host.o tail_nodeport_nat_ingress_ipv6 4983 5003 +20 (+0.40%) bpf_lxc.o tail_nodeport_nat_ingress_ipv6 4983 5003 +20 (+0.40%) bpf_overlay.o tail_nodeport_nat_ingress_ipv6 4983 5003 +20 (+0.40%) bpf_xdp.o tail_handle_nat_fwd_ipv6 12475 12504 +29 (+0.23%) bpf_xdp.o tail_nodeport_nat_ingress_ipv6 6363 6371 +8 (+0.13%) Looking at (somewhat anonymized) Meta production programs, we see mostly insignificant variation in number of instructions, with one program (syar_bind6_protect6) benefitting the most at -17%. [vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-fbcode.csv ~/subprog-precise-after-results-fbcode.csv -e prog,insns -f 'insns_diff!=0' Program Insns (A) Insns (B) Insns (DIFF) ------------------------ --------- --------- ---------------- on_request_context_event 597 585 -12 (-2.01%) read_async_py_stack 43789 43657 -132 (-0.30%) read_sync_py_stack 35041 37599 +2558 (+7.30%) rrm_usdt 946 940 -6 (-0.63%) sysarmor_inet6_bind 28863 28249 -614 (-2.13%) sysarmor_inet_bind 28845 28240 -605 (-2.10%) syar_bind4_protect4 154145 147640 -6505 (-4.22%) syar_bind6_protect6 165242 137088 -28154 (-17.04%) syar_task_exit_setgid 21289 19720 -1569 (-7.37%) syar_task_exit_setuid 21290 19721 -1569 (-7.37%) do_uprobe 19967 19413 -554 (-2.77%) tw_twfw_ingress 215877 204833 -11044 (-5.12%) tw_twfw_tc_in 215877 204833 -11044 (-5.12%) But checking duration (wall clock) differences, that is the actual time taken by verifier to validate programs, we see a sometimes dramatic improvements, all the way to about 16x improvements: [vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-meta.csv ~/subprog-precise-after-results-meta.csv -e prog,duration -s duration_diff^ | head -n20 Program Duration (us) (A) Duration (us) (B) Duration (us) (DIFF) ---------------------------------------- ----------------- ----------------- -------------------- tw_twfw_ingress 4488374 272836 -4215538 (-93.92%) tw_twfw_tc_in 4339111 268175 -4070936 (-93.82%) tw_twfw_egress 3521816 270751 -3251065 (-92.31%) tw_twfw_tc_eg 3472878 284294 -3188584 (-91.81%) balancer_ingress 343119 291391 -51728 (-15.08%) syar_bind6_protect6 78992 64782 -14210 (-17.99%) ttls_tc_ingress 11739 8176 -3563 (-30.35%) kprobe__security_inode_link 13864 11341 -2523 (-18.20%) read_sync_py_stack 21927 19442 -2485 (-11.33%) read_async_py_stack 30444 28136 -2308 (-7.58%) syar_task_exit_setuid 10256 8440 -1816 (-17.71%) Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230505043317.3629845-9-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Andrii Nakryiko
|
c50c0b57a5 |
bpf: fix mark_all_scalars_precise use in mark_chain_precision
When precision backtracking bails out due to some unsupported sequence of instructions (e.g., stack access through register other than r10), we need to mark all SCALAR registers as precise to be safe. Currently, though, we mark SCALARs precise only starting from the state we detected unsupported condition, which could be one of the parent states of the actual current state. This will leave some registers potentially not marked as precise, even though they should. So make sure we start marking scalars as precise from current state (env->cur_state). Further, we don't currently detect a situation when we end up with some stack slots marked as needing precision, but we ran out of available states to find the instructions that populate those stack slots. This is akin the `i >= func->allocated_stack / BPF_REG_SIZE` check and should be handled similarly by falling back to marking all SCALARs precise. Add this check when we run out of states. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230505043317.3629845-8-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Andrii Nakryiko
|
f655badf2a |
bpf: fix propagate_precision() logic for inner frames
Fix propagate_precision() logic to perform propagation of all necessary
registers and stack slots across all active frames *in one batch step*.
Doing this for each register/slot in each individual frame is wasteful,
but the main problem is that backtracking of instruction in any frame
except the deepest one just doesn't work. This is due to backtracking
logic relying on jump history, and available jump history always starts
(or ends, depending how you view it) in current frame. So, if
prog A (frame #0) called subprog B (frame #1) and we need to propagate
precision of, say, register R6 (callee-saved) within frame #0, we
actually don't even know where jump history that corresponds to prog
A even starts. We'd need to skip subprog part of jump history first to
be able to do this.
Luckily, with struct backtrack_state and __mark_chain_precision()
handling bitmasks tracking/propagation across all active frames at the
same time (added in previous patch), propagate_precision() can be both
fixed and sped up by setting all the necessary bits across all frames
and then performing one __mark_chain_precision() pass. This makes it
unnecessary to skip subprog parts of jump history.
We also improve logging along the way, to clearly specify which
registers' and slots' precision markings are propagated within which
frame. Each frame will have dedicated line and all registers and stack
slots from that frame will be reported in format similar to precision
backtrack regs/stack logging. E.g.:
frame 1: propagating r1,r2,r3,fp-8,fp-16
frame 0: propagating r3,r9,fp-120
Fixes:
|
||
Andrii Nakryiko
|
1ef22b6865 |
bpf: maintain bitmasks across all active frames in __mark_chain_precision
Teach __mark_chain_precision logic to maintain register/stack masks across all active frames when going from child state to parent state. Currently this should be mostly no-op, as precision backtracking usually bails out when encountering subprog entry/exit. It's not very apparent from the diff due to increased indentation, but the logic remains the same, except everything is done on specific `fr` frame index. Calls to bt_clear_reg() and bt_clear_slot() are replaced with frame-specific bt_clear_frame_reg() and bt_clear_frame_slot(), where frame index is passed explicitly, instead of using current frame number. We also adjust logging to emit affected frame number. And we also add better logging of human-readable register and stack slot masks, similar to previous patch. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230505043317.3629845-6-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Andrii Nakryiko
|
d9439c21a9 |
bpf: improve precision backtrack logging
Add helper to format register and stack masks in more human-readable format. Adjust logging a bit during backtrack propagation and especially during forcing precision fallback logic to make it clearer what's going on (with log_level=2, of course), and also start reporting affected frame depth. This is in preparation for having more than one active frame later when precision propagation between subprog calls is added. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230505043317.3629845-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Andrii Nakryiko
|
407958a0e9 |
bpf: encapsulate precision backtracking bookkeeping
Add struct backtrack_state and straightforward API around it to keep track of register and stack masks used and maintained during precision backtracking process. Having this logic separately allow to keep high-level backtracking algorithm cleaner, but also it sets us up to cleanly keep track of register and stack masks per frame, allowing (with some further logic adjustments) to perform precision backpropagation across multiple frames (i.e., subprog calls). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230505043317.3629845-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Andrii Nakryiko
|
e0bf462276 |
bpf: mark relevant stack slots scratched for register read instructions
When handling instructions that read register slots, mark relevant stack slots as scratched so that verifier log would contain those slots' states, in addition to currently emitted registers with stack slot offsets. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230505043317.3629845-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> |
||
Linus Torvalds
|
a1fd058b07 |
Five hotfixes. Three are cc:stable, two for this -rc cycle.
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZFLuDAAKCRDdBJ7gKXxA jk4KAP9ceSzcPrMejKeeWrkj0PoQzy8FMp3VhG9yaXkWPSNHUgD9EUG8J/lQftsH t39eKmn6FDuY2cLpFS8HCrlain9JcAE= =pn8p -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2023-05-03-16-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hitfixes from Andrew Morton: "Five hotfixes. Three are cc:stable, two for this -rc cycle" * tag 'mm-hotfixes-stable-2023-05-03-16-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: change per-VMA lock statistics to be disabled by default MAINTAINERS: update Michal Simek's email mm/mempolicy: correctly update prev when policy is equal on mbind relayfs: fix out-of-bounds access in relay_file_read kasan: hw_tags: avoid invalid virt_to_page() |
||
Linus Torvalds
|
15fb96a35d |
- Some DAMON cleanups from Kefeng Wang
- Some KSM work from David Hildenbrand, to make the PR_SET_MEMORY_MERGE ioctl's behavior more similar to KSM's behavior. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZFLsxAAKCRDdBJ7gKXxA jl8yAQCqjstPsOULf9QN0z4bGAUhY+Wj4ERz1jbKSIuhFCJWiQEAgQvgRXObKjmi OtUB0Ek4CMDCQzbyIQ1Bhp3kxi6+Jgs= =AbyC -----END PGP SIGNATURE----- Merge tag 'mm-stable-2023-05-03-16-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more MM updates from Andrew Morton: - Some DAMON cleanups from Kefeng Wang - Some KSM work from David Hildenbrand, to make the PR_SET_MEMORY_MERGE ioctl's behavior more similar to KSM's behavior. [ Andrew called these "final", but I suspect we'll have a series fixing up the fact that the last commit in the dmapools series in the previous pull seems to have unintentionally just reverted all the other commits in the same series.. - Linus ] * tag 'mm-stable-2023-05-03-16-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: hwpoison: coredump: support recovery from dump_user_range() mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page() mm/ksm: move disabling KSM from s390/gmap code to KSM code selftests/ksm: ksm_functional_tests: add prctl unmerge test mm/ksm: unmerge and clear VM_MERGEABLE when setting PR_SET_MEMORY_MERGE=0 mm/damon/paddr: fix missing folio_sz update in damon_pa_young() mm/damon/paddr: minor refactor of damon_pa_mark_accessed_or_deactivate() mm/damon/paddr: minor refactor of damon_pa_pageout() |
||
Linus Torvalds
|
b408242872 |
modules-6.4-rc1 v2
There is only one fix by Arnd far for modules pending which came in after the first pull request. The issue was found as part of some late compile tests with 0-day. I take it 0-day does some secondary late builds with after some initial ones. -----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmRStMcSHG1jZ3JvZkBr ZXJuZWwub3JnAAoJEM4jHQowkoinIx0QAJaNXQNpbyIfLxtc1ILXMFH+o6H6M//D 7FQVyZbz+Xc3dQAYsi7Ux/AhTZKEz1L6j1cGxPBVEHGiaVb4RDzVmdPk/kQpTjnl OdIdIPlMdLh1cuXl/sm1j5OW6gu9wxL13qxNuVfu/ADN09xRupuyruiXeA/8N2ca kaXgufMMipLx7NisecYJ21CFQeyVjxEkSvhzL1UBJLm6D+fS+0iWiL6V5Nc0hxpH RqZYZIK+KpBoTIYZbbR3+Gerev6gjbARh3/SY8WlfbQyKWG7eOULRBO8Urcs+x/a Kf3XAVma24tHF4M5vu9qW98w/ghNr7ytyI47o8XA+HfxA6BkKxPsWBumvOs0S5pW fT5YZ96oz85IfXipWy45xM+oZpcTxsnD7K6IYexDp7FO6458OkZazHED5djTboer e77GLkdSc+7gMZ2AB0EVSKb9iTrpsQV8pQgrzP0qj7Z99/2q9Rlsi1//3SKBNOAK mhQSbZ6m0rfdbA+wCS5efeA1roTZvHXJldHnsYyBzwcs7h5jLupqbKLiMKxpxxwk z1kdBcQa5jc3KlRGUkIXktay0eTLwGfIIA24p60Wi6cALHr9oISeLVZECxBz/0az NR6UUYrXSxvzU7dpyLPc+iOC8fbDdk50z02pASg9qjoc7yBJSkM+N/09AvBrQTyD 0Wm3C0aw7XYh =JUoF -----END PGP SIGNATURE----- Merge tag 'modules-6.4-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull modules fix from Luis Chamberlain: "One fix by Arnd far for modules which came in after the first pull request. The issue was found as part of some late compile tests with 0-day. I take it 0-day does some secondary late builds with after some initial ones" * tag 'modules-6.4-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: module: include internal.h in module/dups.c |
||
Linus Torvalds
|
049a18f232 |
sysctl-6.4-rc1-v2
As mentioned on my first pull request for sysctl-next, for v6.4-rc1 we're very close to being able to deprecating register_sysctl_paths(). I was going to assess the situation after the first week of the merge window. That time is now and things are looking good. We only have one stragglers on the patch which had already an ACK for so I'm picking this up here now and the last patch is the one that uses an axe. Some careful eyeballing would be appreciated by others. If this doesn't get properly reviewed I can also just hold off on this in my tree for the next merge window. Either way is fine by me. I have boot tested the last patch and 0-day build completed successfully. -----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmRSsn0SHG1jZ3JvZkBr ZXJuZWwub3JnAAoJEM4jHQowkoinzzMQAK6ddUwQM32z6E2SY/Ku6ZDQJhVKxE+Y +HvghMqaGzr2eawEaASZzV6p//Q1aH4c2yaChyENa/O82QBXhbc2RBvdiAzQeJZx cUQ4C6Lc+BlpoB24Nes69F9j1LAEI5YXKMK911DKDu7LNNS7Ytxt1IOfM2RpyqRV 6+9vOvAqCSh9EEjZeZDrMlsYhBA+t3YIkU6JFMX7Upc2P7m//57inLsZyUZBqnou t9sfC0d1lDTZXZ0vSIk534VhoxXe1MkYERKkAciEprxbdNnqcsi4WMXKdXG6Mcpy O1ZuUXqndAfhTSHLkqNidtuDP29TTvcdz5tDfwmaJ3JUTt0cDvlC2T7J9WyXDfCZ XsR8Ik0/vEH/j9rVabF9fQ8DeTSLe9AgpaItHd6/LWI8UESs5k/wYi9O+7lhCf2p JZpXl3G1itKA9ABMD1GUEtC5hfWTUxkTEgPkXbqFuKtCl0mI8lD3FPFRbuhYNLa8 7R/6SN9h6/43C9Ffp2bY3c/gKQj51QlvGOSctahvdYSFkG8KXKhEnsDu0V6eLM9G QYrhvht8o9jbuKJKtEno9fjTlClVvXp5vARQQyy9OHyTuhU/Y8q2lH2BCZtFYZrM cpIFdfqB18tZmo7QfNHZPLfws2j3MqsbXFG8Q23BB7cJp1P5QdLJjmjnqbMq8xmk 3kdRMZ2Xkfcx =8VOt -----END PGP SIGNATURE----- Merge tag 'sysctl-6.4-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull more sysctl updates from Luis Chamberlain: "As mentioned on my first pull request for sysctl-next, for v6.4-rc1 we're very close to being able to deprecating register_sysctl_paths(). I was going to assess the situation after the first week of the merge window. That time is now and things are looking good. We only have one which had already an ACK for so I'm picking this up here now and the last patch is the one that uses an axe. I have boot tested the last patch and 0-day build completed successfully" * tag 'sysctl-6.4-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: sysctl: remove register_sysctl_paths() kernel: pid_namespace: simplify sysctls with register_sysctl() |
||
Linus Torvalds
|
fa31fc82fb |
More power management updates for 6.4-rc1
- Make test_resume work again after the changes that made hibernation open the snapshot device in exclusive mode (Chen Yu). - Clean up code in several places in intel_idle (Artem Bityutskiy). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmRSbLgSHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRxbvUP/3WeXEfulhN2FRuqmfqLtWNrpMwsoFpd 9BQRokAQdKL3V2Q+YgdEH22+cB2LNAo5ty1+SHXjzsiExxBYWbKd6/AwOwwmFVPq I8uHS5pSqYQrZLI7eA1ZhFiTotePOpLyHpsHO3lezxXMlvEw30tY8g09WTH1F2Cz uzgTB1NTHRZaHWObrjvPkq3IERAbtF1xAQVPMtyWzs7IoCOlLxsKtHpfLBwGFYpZ 1U7dbAFoGuQYjCUE9i1wbQdee9elRhPDJ6TGCx8rqtRRybxPZOdz1M947K/N5Q23 OtK1HHfTIFHoi36sjwfEdZC89RGr7CI3hc5CAgexxwtsCw4gpIgvFoNXjCKW1q0J +C5ztCntxTGzWi37pnriV3I0NlTjTdEAoS2VDNRGlj0Vvrrx5S5H35qjoqD66N3a +zJ9eEYl5WGJlmZWxUvycJmS0PdPp755tmxrBRWqm7nr+oVJWKY8j2OKjlADCgoG k74zf1dp4zZJHIml1QpaRp0EsueiHs66Xu52VoFKyrAp0+ytYtnC1/SeKoYF4jDg PoTJmIT5ve8Nq8vwYbrg/z497J3bKHfbf1LPxDPNHVB6gx4Nv254qUNaM8oyic9j aHNwna6IAl+BshickaR0lUccJLnBAgPWyxnwFlfHaDbAGVtLNLSVFYlwr7xiUvlo t9eB4FZNGKMU =CwBd -----END PGP SIGNATURE----- Merge tag 'pm-6.4-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These fix a hibernation test mode regression and clean up the intel_idle driver. Specifics: - Make test_resume work again after the changes that made hibernation open the snapshot device in exclusive mode (Chen Yu) - Clean up code in several places in intel_idle (Artem Bityutskiy)" * tag 'pm-6.4-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: intel_idle: mark few variables as __read_mostly intel_idle: do not sprinkle module parameter definitions around intel_idle: fix confusing message intel_idle: improve C-state flags handling robustness intel_idle: further intel_idle_init_cstates_icpu() cleanup intel_idle: clean up intel_idle_init_cstates_icpu() intel_idle: use pr_info() instead of printk() PM: hibernate: Do not get block device exclusively in test_resume mode PM: hibernate: Turn snapshot_test into global variable |
||
Ondrej Mosnacek
|
4f94559f40 |
tracing: Fix permissions for the buffer_percent file
This file defines both read and write operations, yet it is being
created as read-only. This means that it can't be written to without the
CAP_DAC_OVERRIDE capability. Fix the permissions to allow root to write
to it without the need to override DAC perms.
Link: https://lore.kernel.org/linux-trace-kernel/20230503140114.3280002-1-omosnace@redhat.com
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes:
|
||
Arnd Bergmann
|
0b891c83d8 |
module: include internal.h in module/dups.c
Two newly introduced functions are declared in a header that is not
included before the definition, causing a warning with sparse or
'make W=1':
kernel/module/dups.c:118:6: error: no previous prototype for 'kmod_dup_request_exists_wait' [-Werror=missing-prototypes]
118 | bool kmod_dup_request_exists_wait(char *module_name, bool wait, int *dup_ret)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/module/dups.c:220:6: error: no previous prototype for 'kmod_dup_request_announce' [-Werror=missing-prototypes]
220 | void kmod_dup_request_announce(char *module_name, int ret)
| ^~~~~~~~~~~~~~~~~~~~~~~~~
Add an explicit include to ensure the prototypes match.
Fixes:
|
||
Luis Chamberlain
|
9e7c73c0b9 |
kernel: pid_namespace: simplify sysctls with register_sysctl()
register_sysctl_paths() is only required if your child (directories) have entries and pid_namespace does not. So use register_sysctl_init() instead where we don't care about the return value and use register_sysctl() where we do. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Jeff Xu <jeffxu@google.com> Link: https://lore.kernel.org/r/20230302202826.776286-9-mcgrof@kernel.org |
||
Zhang Zhengming
|
43ec16f145 |
relayfs: fix out-of-bounds access in relay_file_read
There is a crash in relay_file_read, as the var from
point to the end of last subbuf.
The oops looks something like:
pc : __arch_copy_to_user+0x180/0x310
lr : relay_file_read+0x20c/0x2c8
Call trace:
__arch_copy_to_user+0x180/0x310
full_proxy_read+0x68/0x98
vfs_read+0xb0/0x1d0
ksys_read+0x6c/0xf0
__arm64_sys_read+0x20/0x28
el0_svc_common.constprop.3+0x84/0x108
do_el0_svc+0x74/0x90
el0_svc+0x1c/0x28
el0_sync_handler+0x88/0xb0
el0_sync+0x148/0x180
We get the condition by analyzing the vmcore:
1). The last produced byte and last consumed byte
both at the end of the last subbuf
2). A softirq calls function(e.g __blk_add_trace)
to write relay buffer occurs when an program is calling
relay_file_read_avail().
relay_file_read
relay_file_read_avail
relay_file_read_consume(buf, 0, 0);
//interrupted by softirq who will write subbuf
....
return 1;
//read_start point to the end of the last subbuf
read_start = relay_file_read_start_pos
//avail is equal to subsize
avail = relay_file_read_subbuf_avail
//from points to an invalid memory address
from = buf->start + read_start
//system is crashed
copy_to_user(buffer, from, avail)
Link: https://lkml.kernel.org/r/20230419040203.37676-1-zhang.zhengming@h3c.com
Fixes:
|
||
David Hildenbrand
|
24139c07f4 |
mm/ksm: unmerge and clear VM_MERGEABLE when setting PR_SET_MEMORY_MERGE=0
Patch series "mm/ksm: improve PR_SET_MEMORY_MERGE=0 handling and cleanup disabling KSM", v2. (1) Make PR_SET_MEMORY_MERGE=0 unmerge pages like setting MADV_UNMERGEABLE does, (2) add a selftest for it and (3) factor out disabling of KSM from s390/gmap code. This patch (of 3): Let's unmerge any KSM pages when setting PR_SET_MEMORY_MERGE=0, and clear the VM_MERGEABLE flag from all VMAs -- just like KSM would. Of course, only do that if we previously set PR_SET_MEMORY_MERGE=1. Link: https://lkml.kernel.org/r/20230422205420.30372-1-david@redhat.com Link: https://lkml.kernel.org/r/20230422205420.30372-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Stefan Roesch <shr@devkernel.io> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Rik van Riel <riel@surriel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
Kui-Feng Lee
|
fedf99200a |
bpf: Print a warning only if writing to unprivileged_bpf_disabled.
Only print the warning message if you are writing to "/proc/sys/kernel/unprivileged_bpf_disabled". The kernel may print an annoying warning when you read "/proc/sys/kernel/unprivileged_bpf_disabled" saying WARNING: Unprivileged eBPF is enabled with eIBRS on, data leaks possible via Spectre v2 BHB attacks! However, this message is only meaningful when the feature is disabled or enabled. Signed-off-by: Kui-Feng Lee <kuifeng@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20230502181418.308479-1-kuifeng@meta.com |
||
Linus Torvalds
|
58390c8ce1 |
IOMMU Updates for Linux 6.4
Including: - Convert to platform remove callback returning void - Extend changing default domain to normal group - Intel VT-d updates: - Remove VT-d virtual command interface and IOASID - Allow the VT-d driver to support non-PRI IOPF - Remove PASID supervisor request support - Various small and misc cleanups - ARM SMMU updates: - Device-tree binding updates: * Allow Qualcomm GPU SMMUs to accept relevant clock properties * Document Qualcomm 8550 SoC as implementing an MMU-500 * Favour new "qcom,smmu-500" binding for Adreno SMMUs - Fix S2CR quirk detection on non-architectural Qualcomm SMMU implementations - Acknowledge SMMUv3 PRI queue overflow when consuming events - Document (in a comment) why ATS is disabled for bypass streams - AMD IOMMU updates: - 5-level page-table support - NUMA awareness for memory allocations - Unisoc driver: Support for reattaching an existing domain - Rockchip driver: Add missing set_platform_dma_ops callback - Mediatek driver: Adjust the dma-ranges - Various other small fixes and cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmRONeAACgkQK/BELZcB GuPmpw/8C9ruxQ0JU5rcDBXQGvos4gMmxlbELMrBpbbiTtdb35xchpKfdhnECGIF k2SrrcF40R/S82SyzNU/eZtGKirtcXvGFraUFgu/QdCcnnqpRHs+IJMXX2NJP+it +0wO1uiInt3CN1ERcR4F31cDKiWjDG8bvQVE5LIyiy4KrIU5ld2G91Fkaa0R13Au 6H+/wKkcUC6OyaGE6wPx474xBkapT20vj5AIQuAWisXJJR0wbBon1sUTo/IRKsU+ IkNxH0W+1PNImJ+crAdf/nkOlyqoChY4ww6cm07LrOsBLIsX5bCqXfL4HvKthElD MEgk2SN5kfjfR5Vf29W4hZVM1CT8VbhO41I7OzaZ6X6RU2PXoldPKlgKtZGeSKn1 9bcMpSgB0BtbttvBevSkxTo5KHFozXS2DG3DFoMB3yFMme8Th0LrhBZ9oB7NIPNw ntMo4K75vviC6Vvzjy4Anj/+y+Zm3W6wDDP7F12O6WZLkK5s4hrSsHUm/MQnnKQP muJlG870RnSl73xUQZe3cuBxktXuJ3EHqqYIPE0npzvauu8hhWcis3opf2Y+U2s8 aBCCIgp5kTKqjHLh2e4lNCKZf1/b/dhxRcRBQhpAIb8YsjMlIJyM+G8Jz6K6gBga 5Ld+68UQ3oHJwoLV1HCFN8jbpQ9KZn1s9+h3yrYjRAcLNiFb3nU= =OvTo -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: - Convert to platform remove callback returning void - Extend changing default domain to normal group - Intel VT-d updates: - Remove VT-d virtual command interface and IOASID - Allow the VT-d driver to support non-PRI IOPF - Remove PASID supervisor request support - Various small and misc cleanups - ARM SMMU updates: - Device-tree binding updates: * Allow Qualcomm GPU SMMUs to accept relevant clock properties * Document Qualcomm 8550 SoC as implementing an MMU-500 * Favour new "qcom,smmu-500" binding for Adreno SMMUs - Fix S2CR quirk detection on non-architectural Qualcomm SMMU implementations - Acknowledge SMMUv3 PRI queue overflow when consuming events - Document (in a comment) why ATS is disabled for bypass streams - AMD IOMMU updates: - 5-level page-table support - NUMA awareness for memory allocations - Unisoc driver: Support for reattaching an existing domain - Rockchip driver: Add missing set_platform_dma_ops callback - Mediatek driver: Adjust the dma-ranges - Various other small fixes and cleanups * tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (82 commits) iommu: Remove iommu_group_get_by_id() iommu: Make iommu_release_device() static iommu/vt-d: Remove BUG_ON in dmar_insert_dev_scope() iommu/vt-d: Remove a useless BUG_ON(dev->is_virtfn) iommu/vt-d: Remove BUG_ON in map/unmap() iommu/vt-d: Remove BUG_ON when domain->pgd is NULL iommu/vt-d: Remove BUG_ON in handling iotlb cache invalidation iommu/vt-d: Remove BUG_ON on checking valid pfn range iommu/vt-d: Make size of operands same in bitwise operations iommu/vt-d: Remove PASID supervisor request support iommu/vt-d: Use non-privileged mode for all PASIDs iommu/vt-d: Remove extern from function prototypes iommu/vt-d: Do not use GFP_ATOMIC when not needed iommu/vt-d: Remove unnecessary checks in iopf disabling path iommu/vt-d: Move PRI handling to IOPF feature path iommu/vt-d: Move pfsid and ats_qdep calculation to device probe path iommu/vt-d: Move iopf code from SVA to IOPF enabling path iommu/vt-d: Allow SVA with device-specific IOPF dmaengine: idxd: Add enable/disable device IOPF feature arm64: dts: mt8186: Add dma-ranges for the parent "soc" node ... |
||
Linus Torvalds
|
10de638d8e |
s390 updates for the 6.4 merge window
- Add support for stackleak feature. Also allow specifying architecture-specific stackleak poison function to enable faster implementation. On s390, the mvc-based implementation helps decrease typical overhead from a factor of 3 to just 25% - Convert all assembler files to use SYM* style macros, deprecating the ENTRY() macro and other annotations. Select ARCH_USE_SYM_ANNOTATIONS - Improve KASLR to also randomize module and special amode31 code base load addresses - Rework decompressor memory tracking to support memory holes and improve error handling - Add support for protected virtualization AP binding - Add support for set_direct_map() calls - Implement set_memory_rox() and noexec module_alloc() - Remove obsolete overriding of mem*() functions for KASAN - Rework kexec/kdump to avoid using nodat_stack to call purgatory - Convert the rest of the s390 code to use flexible-array member instead of a zero-length array - Clean up uaccess inline asm - Enable ARCH_HAS_MEMBARRIER_SYNC_CORE - Convert to using CONFIG_FUNCTION_ALIGNMENT and enable DEBUG_FORCE_FUNCTION_ALIGN_64B - Resolve last_break in userspace fault reports - Simplify one-level sysctl registration - Clean up branch prediction handling - Rework CPU counter facility to retrieve available counter sets just once - Other various small fixes and improvements all over the code -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmRM8pwACgkQjYWKoQLX FBjV1AgAlvAhu1XkwOdwqdT4GqE8pcN4XXzydog1MYihrSO2PdgWAxpEW7o2QURN W+3xa6RIqt7nX2YBiwTanMZ12TYaFY7noGl3eUpD/NhueprweVirVl7VZUEuRoW/ j0mbx77xsVzLfuDFxkpVwE6/j+tTO78kLyjUHwcN9rFVUaL7/orJneDJf+V8fZG0 sHLOv0aljF7Jr2IIkw82lCmW/vdk7k0dACWMXK2kj1H3dIK34B9X4AdKDDf/WKXk /OSElBeZ93tSGEfNDRIda6iR52xocROaRnQAaDtargKFl9VO0/dN9ADxO+SLNHjN pFE/9VD6xT/xo4IuZZh/Z3TcYfiLvA== =Geqx -----END PGP SIGNATURE----- Merge tag 's390-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Add support for stackleak feature. Also allow specifying architecture-specific stackleak poison function to enable faster implementation. On s390, the mvc-based implementation helps decrease typical overhead from a factor of 3 to just 25% - Convert all assembler files to use SYM* style macros, deprecating the ENTRY() macro and other annotations. Select ARCH_USE_SYM_ANNOTATIONS - Improve KASLR to also randomize module and special amode31 code base load addresses - Rework decompressor memory tracking to support memory holes and improve error handling - Add support for protected virtualization AP binding - Add support for set_direct_map() calls - Implement set_memory_rox() and noexec module_alloc() - Remove obsolete overriding of mem*() functions for KASAN - Rework kexec/kdump to avoid using nodat_stack to call purgatory - Convert the rest of the s390 code to use flexible-array member instead of a zero-length array - Clean up uaccess inline asm - Enable ARCH_HAS_MEMBARRIER_SYNC_CORE - Convert to using CONFIG_FUNCTION_ALIGNMENT and enable DEBUG_FORCE_FUNCTION_ALIGN_64B - Resolve last_break in userspace fault reports - Simplify one-level sysctl registration - Clean up branch prediction handling - Rework CPU counter facility to retrieve available counter sets just once - Other various small fixes and improvements all over the code * tag 's390-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (118 commits) s390/stackleak: provide fast __stackleak_poison() implementation stackleak: allow to specify arch specific stackleak poison function s390: select ARCH_USE_SYM_ANNOTATIONS s390/mm: use VM_FLUSH_RESET_PERMS in module_alloc() s390: wire up memfd_secret system call s390/mm: enable ARCH_HAS_SET_DIRECT_MAP s390/mm: use BIT macro to generate SET_MEMORY bit masks s390/relocate_kernel: adjust indentation s390/relocate_kernel: use SYM* macros instead of ENTRY(), etc. s390/entry: use SYM* macros instead of ENTRY(), etc. s390/purgatory: use SYM* macros instead of ENTRY(), etc. s390/kprobes: use SYM* macros instead of ENTRY(), etc. s390/reipl: use SYM* macros instead of ENTRY(), etc. s390/head64: use SYM* macros instead of ENTRY(), etc. s390/earlypgm: use SYM* macros instead of ENTRY(), etc. s390/mcount: use SYM* macros instead of ENTRY(), etc. s390/crc32le: use SYM* macros instead of ENTRY(), etc. s390/crc32be: use SYM* macros instead of ENTRY(), etc. s390/crypto,chacha: use SYM* macros instead of ENTRY(), etc. s390/amode31: use SYM* macros instead of ENTRY(), etc. ... |
||
Linus Torvalds
|
b28e6315a0 |
dma-mapping updates for Linux 6.4
- fix a PageHighMem check in dma-coherent initialization (Doug Berger) - clean up the coherency defaul initialiation (Jiaxun Yang) - add cacheline to user/kernel dma-debug space dump messages (Desnes Nunes, Geert Uytterhoeve) - swiotlb statistics improvements (Michael Kelley) - misc cleanups (Petr Tesarik) -----BEGIN PGP SIGNATURE----- iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmRLYsoLHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYP4+RAAwpIqI198CrPxodCuBdwetuxznwncdwFvU3W+NQLF cC5gDeUB2ZZevVh3moKITV7gXHrbTJF7jQs9jpWV0QEA5APzu0WDf3Y0m4sXPVpn E9jS3jGJyntZ9rIMzHFs/lguI37xzT1YRAHAYgoZ84b7K/9g94NgEE2HecfNKVqZ D6PN0UJcA4KQo+5UJ7MWiQxWM3QAwVfSKsP1mXv51tiRGo4UUzNW77Ej2nKRJjhK wDNiZ+08khfeS2BuF9J2ebAzpgma5EgweH2z7zmx8Ch5t4Cx6hVAQ4Z6axbZMGjP HxXPw5rIwZTnQYoaGU86BrxrFH2j2bb963kWoDzliH+4PQrJ/iIEpkF7vu5Y2oWr WtXdOo6CsdQh1rT1UWA87ZYDtkWgj3/ITv5xJrXf8VyD9WHHSPst616XHLzBLGzo Hc+lAPhnVm59XZhQbVgXZy37Eqa9qHEG6GIRUkwD13nttSSfLfizO0IlXlH+awQV 2A+TjbAt2lneUaRzMPfxG/yFt3rPqbBfSWj3o2ClPPn9sKksKxj7IjNW0v81Ztq/ H6UmYRuq+wlQJzlwiF8+6SzoBXObztrmtIa2ipiM5k+xePG1jsPGFLm98UMlPcxN 5IMz78DQ/hE3K3fKRt6clImd98xq5R0H9iUQPor2I7C/67fpTjThDRdHDUina1tk Oxo= =vAit -----END PGP SIGNATURE----- Merge tag 'dma-mapping-6.4-2023-04-28' of git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping updates from Christoph Hellwig: - fix a PageHighMem check in dma-coherent initialization (Doug Berger) - clean up the coherency defaul initialiation (Jiaxun Yang) - add cacheline to user/kernel dma-debug space dump messages (Desnes Nunes, Geert Uytterhoeve) - swiotlb statistics improvements (Michael Kelley) - misc cleanups (Petr Tesarik) * tag 'dma-mapping-6.4-2023-04-28' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: Omit total_used and used_hiwater if !CONFIG_DEBUG_FS swiotlb: track and report io_tlb_used high water marks in debugfs swiotlb: fix debugfs reporting of reserved memory pools swiotlb: relocate PageHighMem test away from rmem_swiotlb_setup of: address: always use dma_default_coherent for default coherency dma-mapping: provide CONFIG_ARCH_DMA_DEFAULT_COHERENT dma-mapping: provide a fallback dma_default_coherent dma-debug: Use %pa to format phys_addr_t dma-debug: add cacheline to user/kernel space dump messages dma-debug: small dma_debug_entry's comment and variable name updates dma-direct: cleanup parameters to dma_direct_optimal_gfp_mask |
||
Linus Torvalds
|
7d8d20191c |
Timekeeping and clocksource/event driver updates the second batch:
- A trivial documentation fix in the timekeeping core - A really boring set of small fixes, enhancements and cleanups in the drivers code. No new clocksource/clockevent drivers for a change. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRLuTsTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoQ+vEACSlqE5SN+6SxNQOwWcou79d1loB0Lk 3kSlFvRH9CdPDdW5a0Qnr3YJx4mFXrN9mMdFsywhl5NGrZQcH3nGPEYN74B3ynhP WpE5PSDJDVOA9F/yK6kmf5xX39RPh0aVy+C6ShaHD/anqwX2mTlXVBAg/3nOGeNy iHNYHzP4AtQfE+EtgbEPEZaOUpzmGL/dZb1HAzJaFU1QBmsrXWHLs4xqGUR0A36+ 1I0TGK53WVSXHvEVciTx4lH7mHR1xzR3LvnotdET6rRsqLREptosqA4nBRqYZLGK uF+jNxVE/0OwVzge5gPvwL3YSAjiln9cZjhA/q7z3L/pdoj/kR3hXv4XyXGrLPN6 L371RA/RLtjkrBb/rHcB/VNADBmtwLQjo7gJJ3UMzIuuvnkokzQrl3fxTxJjmegK ypR8dpMUaO5vlwIGqwSuQyKxkNEeuNzm2fv84IpZJNSKoQj5nGHPmk+0u6FLhJeG sqvIfDfuH/+Hc8fxbG5BKBu5lNvmCD4MZ3xxf3Wv80fykJBX6dvJs30B/iuJFQXr VylbUbxddCNjdHGtByswY5tLGfpWuou0g2XWqtsEB5P0aLs54R0gaoDeTPuBTzJW Io4tHnvRu7nZCSncxzHUuUfnve0WjMDBgJeSfa2Rx4Qz8M7G5l3XQLO4n+iFGzI5 gdYnrztBLSegww== =LWO6 -----END PGP SIGNATURE----- Merge tag 'timers-core-2023-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more timer updates from Thomas Gleixner: "Timekeeping and clocksource/event driver updates the second batch: - A trivial documentation fix in the timekeeping core - A really boring set of small fixes, enhancements and cleanups in the drivers code. No new clocksource/clockevent drivers for a change" * tag 'timers-core-2023-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Fix references to nonexistent ktime_get_fast_ns() dt-bindings: timer: rockchip: Add rk3588 compatible dt-bindings: timer: rockchip: Drop superfluous rk3288 compatible clocksource/drivers/ti: Use of_property_read_bool() for boolean properties clocksource/drivers/timer-ti-dm: Fix finding alwon timer clocksource/drivers/davinci: Fix memory leak in davinci_timer_register when init fails clocksource/drivers/stm32-lp: Drop of_match_ptr for ID table clocksource/drivers/timer-ti-dm: Convert to platform remove callback returning void clocksource/drivers/timer-tegra186: Convert to platform remove callback returning void clocksource/drivers/timer-ti-dm: Improve error message in .remove clocksource/drivers/timer-stm32-lp: Mark driver as non-removable clocksource/drivers/sh_mtu2: Mark driver as non-removable clocksource/drivers/timer-ti-dm: Use of_address_to_resource() clocksource/drivers/timer-imx-gpt: Remove non-DT function clocksource/drivers/timer-mediatek: Split out CPUXGPT timers clocksource/drivers/exynos_mct: Explicitly return 0 for shared timer |
||
Linus Torvalds
|
86e98ed15b |
cgroup changes for v6.4-rc1
* cpuset changes including the fix for an incorrect interaction with CPU hotplug and an optimization. * Other doc and cosmetic changes. -----BEGIN PGP SIGNATURE----- iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZErfng4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGVVtAQCDycK4VSgc4nsFPG1vh1Oy1A723ciEUwAbKmV/ F1n7xwEA68FiDvE29LpMJJuYP9HnX0A5zRMyNnb52kN9jmgcEQI= =ALol -----END PGP SIGNATURE----- Merge tag 'cgroup-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - cpuset changes including the fix for an incorrect interaction with CPU hotplug and an optimization - Other doc and cosmetic changes * tag 'cgroup-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: docs: cgroup-v1/cpusets: update libcgroup project link cgroup/cpuset: Minor updates to test_cpuset_prs.sh cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset are updated cgroup/cpuset: Skip task update if hotplug doesn't affect current cpuset cpuset: Clean up cpuset_node_allowed cgroup: bpf: use cgroup_lock()/cgroup_unlock() wrappers |
||
Linus Torvalds
|
cd546fa325 |
workqueue changes for v6.4-rc1
Mostly changes from Petr to improve warning and error reporting. Workqueue now reports more of the relevant failures with better context which should help debugging. -----BEGIN PGP SIGNATURE----- iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZErbOg4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGfipAP9BXrUBI99uZL7XjSEe6rLWa0STODyWh67ikR+0 nUO22QEA+ttt5ecLZyopYB18Le06VBBW7J5cs0Cg83bCBTGb4wE= =gHbc -----END PGP SIGNATURE----- Merge tag 'wq-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "Mostly changes from Petr to improve warning and error reporting. Workqueue now reports more of the relevant failures with better context which should help debugging" * tag 'wq-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Introduce show_freezable_workqueues workqueue: Print backtraces from CPUs with hung CPU bound workqueues workqueue: Warn when a rescuer could not be created workqueue: Interrupted create_worker() is not a repeated event workqueue: Warn when a new worker could not be created workqueue: Fix hung time report of worker pools workqueue: Simplify a pr_warn() call in wq_select_unbound_cpu() MAINTAINERS: Add workqueue_internal.h to the WORKQUEUE entry |
||
Sebastian Andrzej Siewior
|
286deb7ec0 |
locking/rwbase: Mitigate indefinite writer starvation
On PREEMPT_RT, rw_semaphore and rwlock_t locks are unfair to writers. Readers can indefinitely acquire the lock unless the writer fully acquired the lock, which might never happen if there is always a reader in the critical section owning the lock. Mel Gorman reported that since LTP-20220121 the dio_truncate test case went from having 1 reader to having 16 readers and that number of readers is sufficient to prevent the down_write ever succeeding while readers exist. Eventually the test is killed after 30 minutes as a failure. Mel proposed a timeout to limit how long a writer can be blocked until the reader is forced into the slowpath. Thomas argued that there is no added value by providing this timeout. From a PREEMPT_RT point of view, there are no critical rw_semaphore or rwlock_t locks left where the reader must be preferred. Mitigate indefinite writer starvation by forcing the READER into the slowpath once the WRITER attempts to acquire the lock. Reported-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Mel Gorman <mgorman@techsingularity.net> Link: https://lore.kernel.org/877cwbq4cq.ffs@tglx Link: https://lore.kernel.org/r/20230321161140.HMcQEhHb@linutronix.de Cc: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
5ea8abf589 |
tracing/tools: Updates for 6.4
- Add auto-analysis only option to rtla/timerlat Add an --aa-only option to the tooling to perform only the auto analysis and not to parse and format the data. - Other minor fixes and clean ups -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZEr6eRQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qpmpAQD5arr/Y++metYGug0qtAaRHEw/7XR4 xWDepF32eAdZDAEAtx69nu+t9q3Z5/CY+OdSmniRUjo6sDYTnAw8ok8U7wI= =Yln0 -----END PGP SIGNATURE----- Merge tag 'trace-tools-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing tools updates from Steven Rostedt: - Add auto-analysis only option to rtla/timerlat Add an --aa-only option to the tooling to perform only the auto analysis and not to parse and format the data. - Other minor fixes and clean ups * tag 'trace-tools-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rtla/timerlat: Fix "Previous IRQ" auto analysis' line rtla/timerlat: Add auto-analysis only option rv: Remove redundant assignment to variable retval rv: Fix addition on an uninitialized variable 'run' rtla: Add .gitignore file |
||
Linus Torvalds
|
d579c468d7 |
tracing updates for 6.4:
- User events are finally ready! After lots of collaboration between various parties, we finally locked down on a stable interface for user events that can also work with user space only tracing. This is implemented by telling the kernel (or user space library, but that part is user space only and not part of this patch set), where the variable is that the application uses to know if something is listening to the trace. There's also an interface to tell the kernel about these events, which will show up in the /sys/kernel/tracing/events/user_events/ directory, where it can be enabled. When it's enabled, the kernel will update the variable, to tell the application to start writing to the kernel. See https://lwn.net/Articles/927595/ - Cleaned up the direct trampolines code to simplify arm64 addition of direct trampolines. Direct trampolines use the ftrace interface but instead of jumping to the ftrace trampoline, applications (mostly BPF) can register their own trampoline for performance reasons. - Some updates to the fprobe infrastructure. fprobes are more efficient than kprobes, as it does not need to save all the registers that kprobes on ftrace do. More work needs to be done before the fprobes will be exposed as dynamic events. - More updates to references to the obsolete path of /sys/kernel/debug/tracing for the new /sys/kernel/tracing path. - Add a seq_buf_do_printk() helper to seq_bufs, to print a large buffer line by line instead of all at once. There's users in production kernels that have a large data dump that originally used printk() directly, but the data dump was larger than what printk() allowed as a single print. Using seq_buf() to do the printing fixes that. - Add /sys/kernel/tracing/touched_functions that shows all functions that was every traced by ftrace or a direct trampoline. This is used for debugging issues where a traced function could have caused a crash by a bpf program or live patching. - Add a "fields" option that is similar to "raw" but outputs the fields of the events. It's easier to read by humans. - Some minor fixes and clean ups. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZEr36xQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6quZHAQCzuqnn2S8DsPd3Sy1vKIYaj0uajW5D Kz1oUJH4F0H7kgEA8XwXkdtfKpOXWc/ZH4LWfL7Orx2wJZJQMV9dVqEPDAE= =w0Z1 -----END PGP SIGNATURE----- Merge tag 'trace-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - User events are finally ready! After lots of collaboration between various parties, we finally locked down on a stable interface for user events that can also work with user space only tracing. This is implemented by telling the kernel (or user space library, but that part is user space only and not part of this patch set), where the variable is that the application uses to know if something is listening to the trace. There's also an interface to tell the kernel about these events, which will show up in the /sys/kernel/tracing/events/user_events/ directory, where it can be enabled. When it's enabled, the kernel will update the variable, to tell the application to start writing to the kernel. See https://lwn.net/Articles/927595/ - Cleaned up the direct trampolines code to simplify arm64 addition of direct trampolines. Direct trampolines use the ftrace interface but instead of jumping to the ftrace trampoline, applications (mostly BPF) can register their own trampoline for performance reasons. - Some updates to the fprobe infrastructure. fprobes are more efficient than kprobes, as it does not need to save all the registers that kprobes on ftrace do. More work needs to be done before the fprobes will be exposed as dynamic events. - More updates to references to the obsolete path of /sys/kernel/debug/tracing for the new /sys/kernel/tracing path. - Add a seq_buf_do_printk() helper to seq_bufs, to print a large buffer line by line instead of all at once. There are users in production kernels that have a large data dump that originally used printk() directly, but the data dump was larger than what printk() allowed as a single print. Using seq_buf() to do the printing fixes that. - Add /sys/kernel/tracing/touched_functions that shows all functions that was every traced by ftrace or a direct trampoline. This is used for debugging issues where a traced function could have caused a crash by a bpf program or live patching. - Add a "fields" option that is similar to "raw" but outputs the fields of the events. It's easier to read by humans. - Some minor fixes and clean ups. * tag 'trace-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (41 commits) ring-buffer: Sync IRQ works before buffer destruction tracing: Add missing spaces in trace_print_hex_seq() ring-buffer: Ensure proper resetting of atomic variables in ring_buffer_reset_online_cpus recordmcount: Fix memory leaks in the uwrite function tracing/user_events: Limit max fault-in attempts tracing/user_events: Prevent same address and bit per process tracing/user_events: Ensure bit is cleared on unregister tracing/user_events: Ensure write index cannot be negative seq_buf: Add seq_buf_do_printk() helper tracing: Fix print_fields() for __dyn_loc/__rel_loc tracing/user_events: Set event filter_type from type ring-buffer: Clearly check null ptr returned by rb_set_head_page() tracing: Unbreak user events tracing/user_events: Use print_format_fields() for trace output tracing/user_events: Align structs with tabs for readability tracing/user_events: Limit global user_event count tracing/user_events: Charge event allocs to cgroups tracing/user_events: Update documentation for ABI tracing/user_events: Use write ABI in example tracing/user_events: Add ABI self-test ... |
||
Linus Torvalds
|
f20730efbd |
SMP cross-CPU function-call updates for v6.4:
- Remove diagnostics and adjust config for CSD lock diagnostics - Add a generic IPI-sending tracepoint, as currently there's no easy way to instrument IPI origins: it's arch dependent and for some major architectures it's not even consistently available. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK438RHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1jJ5Q/5AZ0HGpyqwdFK8GmGznyu5qjP5HwV9pPq gZQScqSy4tZEeza4TFMi83CoXSg9uJ7GlYJqqQMKm78LGEPomnZtXXC7oWvTA9M5 M/jAvzytmvZloSCXV6kK7jzSejMHhag97J/BjTYhZYQpJ9T+hNC87XO6J6COsKr9 lPIYqkFrIkQNr6B0U11AQfFejRYP1ics2fnbnZL86G/zZAc6x8EveM3KgSer2iHl KbrO+xcYyGY8Ef9P2F72HhEGFfM3WslpT1yzqR3sm4Y+fuMG0oW3qOQuMJx0ZhxT AloterY0uo6gJwI0P9k/K4klWgz81Tf/zLb0eBAtY2uJV9Fo3YhPHuZC7jGPGAy3 JusW2yNYqc8erHVEMAKDUsl/1KN4TE2uKlkZy98wno+KOoMufK5MA2e2kPPqXvUi Jk9RvFolnWUsexaPmCftti0OCv3YFiviVAJ/t0pchfmvvJA2da0VC9hzmEXpLJVF 25nBTV/1uAOrWvOpCyo3ElrC2CkQVkFmK5rXMDdvf6ib0Nid4vFcCkCSLVfu+ePB 11mi7QYro+CcnOug1K+yKogUDmsZgV/u1kUwgQzTIpZ05Kkb49gUiXw9L2RGcBJh yoDoiI66KPR7PWQ2qBdQoXug4zfEEtWG0O9HNLB0FFRC3hu7I+HHyiUkBWs9jasK PA5+V7HcQRk= =Wp7f -----END PGP SIGNATURE----- Merge tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP cross-CPU function-call updates from Ingo Molnar: - Remove diagnostics and adjust config for CSD lock diagnostics - Add a generic IPI-sending tracepoint, as currently there's no easy way to instrument IPI origins: it's arch dependent and for some major architectures it's not even consistently available. * tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: trace,smp: Trace all smp_function_call*() invocations trace: Add trace_ipi_send_cpu() sched, smp: Trace smp callback causing an IPI smp: reword smp call IPI comment treewide: Trace IPIs sent via smp_send_reschedule() irq_work: Trace self-IPIs sent via arch_irq_work_raise() smp: Trace IPIs sent via arch_send_call_function_ipi_mask() sched, smp: Trace IPIs sent via send_call_function_single_ipi() trace: Add trace_ipi_send_cpumask() kernel/smp: Make csdlock_debug= resettable locking/csd_lock: Remove per-CPU data indirection from CSD lock debugging locking/csd_lock: Remove added data from CSD lock debugging locking/csd_lock: Add Kconfig option for csd_debug default |
||
Linus Torvalds
|
586b222d74 |
Scheduler changes for v6.4:
- Allow unprivileged PSI poll()ing - Fix performance regression introduced by mm_cid - Improve livepatch stalls by adding livepatch task switching to cond_resched(), this resolves livepatching busy-loop stalls with certain CPU-bound kthreads. - Improve sched_move_task() performance on autogroup configs. - On core-scheduling CPUs, avoid selecting throttled tasks to run - Misc cleanups, fixes and improvements. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK39cRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1hXPhAAk2WqOV2cW4BjSCHjWWE05IfTb0HMn8si mFGBAnr1GIkJRvICAusAwDU3FcmP5mWyXA+LK110d3x4fKJP15vCD5ru5lHnBfX7 fSD+Ml8uM4Xlp8iUoQspilbQwmWkQSwhudbDs3Nj7XGUzJCvNgm1sM3xPRDlqSJ5 6zumfVOPTfzSGcZY3a8sMuJnCepZHLRR6NkLzo/DuI1NMy2Jw1dK43dh77AO1mBF M53PF2IQgm6Wu/67p2k5eDq4c0AKL4PyIb4dRTGOPyljWMf41n28jwMv1tjlvu+Y uT0JD8MJSrFiylyT41x7Asr7orAGXj3cPhShK5R0vrutx/SbqBiaaE1MO9U3aC3B 7xVXEORHWD6KIDqTvzmWGrMBkIdyWB6CLk6EJKr3MqM9hUtP2ift7bkAgIad9h+4 G9DdVePGoCyh/TQtJ9EPIULAYeu9mmDZe8rTQ8C5MCSg//05/CTMgBbb0NiFWhnd 0JQl1B0nNUA87whVUxK8Hfu4DLh7m9jrzgQr9Ww8/FwQ6tQHBOKWgDdbv45ckkaG cJIQt/+vLilddazc8u8E+BGaD5w2uIYF0uL7kvG6Q5oARX06AZ5dj1m06vhZe/Ym laOVZEpJsbQnxviY6jwj1n+CSB9aK7feiQfDePBPbpJGGUHyZoKrnLN6wmW2se+H VCHtdgsEl5I= =Hgci -----END PGP SIGNATURE----- Merge tag 'sched-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Allow unprivileged PSI poll()ing - Fix performance regression introduced by mm_cid - Improve livepatch stalls by adding livepatch task switching to cond_resched(). This resolves livepatching busy-loop stalls with certain CPU-bound kthreads - Improve sched_move_task() performance on autogroup configs - On core-scheduling CPUs, avoid selecting throttled tasks to run - Misc cleanups, fixes and improvements * tag 'sched-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/clock: Fix local_clock() before sched_clock_init() sched/rt: Fix bad task migration for rt tasks sched: Fix performance regression introduced by mm_cid sched/core: Make sched_dynamic_mutex static sched/psi: Allow unprivileged polling of N*2s period sched/psi: Extract update_triggers side effect sched/psi: Rename existing poll members in preparation sched/psi: Rearrange polling code in preparation sched/fair: Fix inaccurate tally of ttwu_move_affine vhost: Fix livepatch timeouts in vhost_worker() livepatch,sched: Add livepatch task switching to cond_resched() livepatch: Skip task_call_func() for current task livepatch: Convert stack entries array to percpu sched: Interleave cfs bandwidth timers for improved single thread performance at low utilization sched/core: Reduce cost of sched_move_task when config autogroup sched/core: Avoid selecting the task that is throttled to run when core-sched enable sched/topology: Make sched_energy_mutex,update static |
||
Linus Torvalds
|
7c339778f9 |
Perf changes for v6.4:
- Add Intel Granite Rapids support - Add uncore events for Intel SPR IMC PMU - Fix perf IRQ throttling bug Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK3GoRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1hqghAAwZyNLY8oAu/izkp5DKML7okLa48nh1+K JWsM4GT16ldx6MBqxX4V7tiYfAeo6ydCkd/LbxPklAk4Fmcwt5HWotOEGHab7N7B iTOix481TIa+E7ERuDLU5vS0chtdE5CrVfmBwtkI4WEv0c8vBwQHvRzy6RaoGG+2 XSqwFhkDG7l6N6rIz/V8zJKFo0hNPou0bllYJMM+A+BRdOthKhgXNjM6yuuKXgst TWxByDCoAG59uWvzq3WmKRLk/bJ4VC2lRDFpo01xtPvn1hA/Bs8MDF1940NG8uEh u7aTI1ZyaJJnnxaTk98r/aZIMYlHHWIIKnymtVFT52ZLeKARDt6MmX5ttec16Cd1 a5IXuiWPkiSfGlmN7Q4sK89eg606WQ/i+eIKfNMG0ZruvKFiT0uD+gNUlQvE/4fk 6OgJot/iF2B5+UWcBmEOMYV4jagSWLFkIQ3vGVT6E3Zp8gEa1/R3ek1aSbQeb8Gi kvMjcwthUNOgGIXrFKbdDTN/3seoNmrgIP2wetUzXhVaAOIYz6mpxIcXLc+vNiDh soTRtkLgNTkrVQ8AKp7snKNRLjazL6CqUd+RILH4tM+bshFnyeH6K4Eck+Fo//Dn XtEHry6nJ0ZPhsJPRwFHOoYCAuDQdy5DG7QQ07qXYW9DyJV7ZmefDJYOEjgcip6G f3/sUe7ekRA= =41Sa -----END PGP SIGNATURE----- Merge tag 'perf-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: - Add Intel Granite Rapids support - Add uncore events for Intel SPR IMC PMU - Fix perf IRQ throttling bug * tag 'perf-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Add events for Intel SPR IMC PMU perf/core: Fix hardlockup failure caused by perf throttle perf/x86/cstate: Add Granite Rapids support perf/x86/msr: Add Granite Rapids perf/x86/intel: Add Granite Rapids |
||
Linus Torvalds
|
2aff7c706c |
Objtool changes for v6.4:
- Mark arch_cpu_idle_dead() __noreturn, make all architectures & drivers that did this inconsistently follow this new, common convention, and fix all the fallout that objtool can now detect statically. - Fix/improve the ORC unwinder becoming unreliable due to UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK and UNWIND_HINT_UNDEFINED to resolve it. - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code. - Generate ORC data for __pfx code - Add more __noreturn annotations to various kernel startup/shutdown/panic functions. - Misc improvements & fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK1x0RHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1ghxQ/+IkCynMYtdF5OG9YwbcGJqsPSfOPMEcEM pUSFYg+gGPBDT/fJfcVSqvUtdnWbLC2kXt9yiswXz3X3J2nmNkBk5YKQftsNDcul TmKeqIIAK51XTncpegKH0EGnOX63oZ9Vxa8CTPdDlb+YF23Km2FoudGRI9F5qbUd LoraXqGYeiaeySkGyWmZVl6Uc8dIxnMkTN3H/oI9aB6TOrsi059hAtFcSaFfyemP c4LqXXCH7k2baiQt+qaLZ8cuZVG/+K5r2N2cmjO5kmJc6ynIaFnfMe4XxZLjp5LT /PulYI15bXkvSARKx5CRh/CDHMOx5Blw+ASO0RhWbdy0WH4ZhhcaVF5AeIpPW86a 1LBcz97rMp72WmvKgrJeVO1r9+ll4SI6/YKGJRsxsCMdP3hgFpqntXyVjTFNdTM1 0gH6H5v55x06vJHvhtTk8SR3PfMTEM2fRU5jXEOrGowoGifx+wNUwORiwj6LE3KQ SKUdT19RNzoW3VkFxhgk65ThK1S7YsJUKRoac3YdhttpqqqtFV//erenrZoR4k/p vzvKy68EQ7RCNyD5wNWNFe0YjeJl5G8gQ8bUm4Xmab7djjgz+pn4WpQB8yYKJLAo x9dqQ+6eUbw3Hcgk6qQ9E+r/svbulnAL0AeALAWK/91DwnZ2mCzKroFkLN7napKi fRho4CqzrtM= =NwEV -----END PGP SIGNATURE----- Merge tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: - Mark arch_cpu_idle_dead() __noreturn, make all architectures & drivers that did this inconsistently follow this new, common convention, and fix all the fallout that objtool can now detect statically - Fix/improve the ORC unwinder becoming unreliable due to UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK and UNWIND_HINT_UNDEFINED to resolve it - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code - Generate ORC data for __pfx code - Add more __noreturn annotations to various kernel startup/shutdown and panic functions - Misc improvements & fixes * tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) x86/hyperv: Mark hv_ghcb_terminate() as noreturn scsi: message: fusion: Mark mpt_halt_firmware() __noreturn x86/cpu: Mark {hlt,resume}_play_dead() __noreturn btrfs: Mark btrfs_assertfail() __noreturn objtool: Include weak functions in global_noreturns check cpu: Mark nmi_panic_self_stop() __noreturn cpu: Mark panic_smp_self_stop() __noreturn arm64/cpu: Mark cpu_park_loop() and friends __noreturn x86/head: Mark *_start_kernel() __noreturn init: Mark start_kernel() __noreturn init: Mark [arch_call_]rest_init() __noreturn objtool: Generate ORC data for __pfx code x86/linkage: Fix padding for typed functions objtool: Separate prefix code from stack validation code objtool: Remove superfluous dead_end_function() check objtool: Add symbol iteration helpers objtool: Add WARN_INSN() scripts/objdump-func: Support multiple functions context_tracking: Fix KCSAN noinstr violation objtool: Add stackleak instrumentation to uaccess safe list ... |
||
Linus Torvalds
|
33afd4b763 |
Mainly singleton patches all over the place. Series of note are:
- updates to scripts/gdb from Glenn Washburn - kexec cleanups from Bjorn Helgaas -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr+6wAKCRDdBJ7gKXxA jn4NAP4u/hj/kR2dxYehcVLuQqJspCRZZBZlAReFJyHNQO6voAEAk0NN9rtG2+/E r0G29CJhK+YL0W6mOs8O1yo9J1rZnAM= =2CUV -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2023-04-27-16-01' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Mainly singleton patches all over the place. Series of note are: - updates to scripts/gdb from Glenn Washburn - kexec cleanups from Bjorn Helgaas" * tag 'mm-nonmm-stable-2023-04-27-16-01' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (50 commits) mailmap: add entries for Paul Mackerras libgcc: add forward declarations for generic library routines mailmap: add entry for Oleksandr ocfs2: reduce ioctl stack usage fs/proc: add Kthread flag to /proc/$pid/status ia64: fix an addr to taddr in huge_pte_offset() checkpatch: introduce proper bindings license check epoll: rename global epmutex scripts/gdb: add GDB convenience functions $lx_dentry_name() and $lx_i_dentry() scripts/gdb: create linux/vfs.py for VFS related GDB helpers uapi/linux/const.h: prefer ISO-friendly __typeof__ delayacct: track delays from IRQ/SOFTIRQ scripts/gdb: timerlist: convert int chunks to str scripts/gdb: print interrupts scripts/gdb: raise error with reduced debugging information scripts/gdb: add a Radix Tree Parser lib/rbtree: use '+' instead of '|' for setting color. proc/stat: remove arch_idle_time() checkpatch: check for misuse of the link tags checkpatch: allow Closes tags with links ... |
||
Linus Torvalds
|
7fa8a8ee94 |
- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
switching from a user process to a kernel thread. - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav. - zsmalloc performance improvements from Sergey Senozhatsky. - Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables. - VFS rationalizations from Christoph Hellwig: - removal of most of the callers of write_one_page(). - make __filemap_get_folio()'s return value more useful - Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use `mount -o noswap'. - Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits. - Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n). - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd, permitting userspace to wr-protect anon memory unpopulated ptes. - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning. - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte. - Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness. - Cleanups to do_fault_around() by Lorenzo Stoakes. - Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c. - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more. - Lorenzo has also coverted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases. - Lorenzo has also contributed further cleanups of vma_merge(). - Chaitanya Prakash provides some fixes to the mmap selftesting code. - Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults. - Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking. - Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads. - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic. - Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used. - Yosry Ahmed has improved the performance of memcg statistics flushing. - David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem. - Christoph Hellwig has provided some cleanup work to zram's IO-related code paths. - David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing. - Pankaj Raghav has made page_endio() unneeded and has removed it. - Peter Xu contributed some rationalizations of the userfaultfd selftests. - Yosry Ahmed has fixed an issue around memcg's page recalim accounting. - Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code. - Longlong Xia has improved the way in which KSM handles hwpoisoned pages. - Peter Xu fixes a few issues with uffd-wp at fork() time. - Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96 eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY= =J+Dj -----END PGP SIGNATURE----- Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of switching from a user process to a kernel thread. - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav. - zsmalloc performance improvements from Sergey Senozhatsky. - Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables. - VFS rationalizations from Christoph Hellwig: - removal of most of the callers of write_one_page() - make __filemap_get_folio()'s return value more useful - Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use `mount -o noswap'. - Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits. - Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n). - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd, permitting userspace to wr-protect anon memory unpopulated ptes. - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning. - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte. - Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness. - Cleanups to do_fault_around() by Lorenzo Stoakes. - Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c. - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more. - Lorenzo has also coverted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases. - Lorenzo has also contributed further cleanups of vma_merge(). - Chaitanya Prakash provides some fixes to the mmap selftesting code. - Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults. - Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking. - Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads. - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic. - Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used. - Yosry Ahmed has improved the performance of memcg statistics flushing. - David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem. - Christoph Hellwig has provided some cleanup work to zram's IO-related code paths. - David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing. - Pankaj Raghav has made page_endio() unneeded and has removed it. - Peter Xu contributed some rationalizations of the userfaultfd selftests. - Yosry Ahmed has fixed an issue around memcg's page recalim accounting. - Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code. - Longlong Xia has improved the way in which KSM handles hwpoisoned pages. - Peter Xu fixes a few issues with uffd-wp at fork() time. - Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis. * tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm,unmap: avoid flushing TLB in batch if PTE is inaccessible shmem: restrict noswap option to initial user namespace mm/khugepaged: fix conflicting mods to collapse_file() sparse: remove unnecessary 0 values from rc mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area() hugetlb: pte_alloc_huge() to replace huge pte_alloc_map() maple_tree: fix allocation in mas_sparse_area() mm: do not increment pgfault stats when page fault handler retries zsmalloc: allow only one active pool compaction context selftests/mm: add new selftests for KSM mm: add new KSM process and sysfs knobs mm: add new api to enable ksm per process mm: shrinkers: fix debugfs file permissions mm: don't check VMA write permissions if the PTE/PMD indicates write permissions migrate_pages_batch: fix statistics for longterm pin retry userfaultfd: use helper function range_in_vma() lib/show_mem.c: use for_each_populated_zone() simplify code mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list() fs/buffer: convert create_page_buffers to folio_create_buffers fs/buffer: add folio_create_empty_buffers helper ... |