write_bm_pid_to_resctrl() uses resctrl_val to check the test name, which
is not a good interface for generic resctrl FS functions to provide.
Tests define mongrp when needed. Remove the test name check in
write_bm_pid_to_resctrl() to only rely on the mongrp parameter being
non-NULL.
Remove write_bm_pid_to_resctrl() resctrl_val parameter and resctrl_val
member from the struct resctrl_val_param that are not used anymore.
Similarly, remove the test name constants that are no longer used.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The CMT selftest instantiates a monitor group to read LLC occupancy.
Since the test also creates a control group, it is unnecessary to
create another one for monitoring because control groups already
provide monitoring too.
Remove the unnecessary monitor group from the CMT selftest.
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Nothing in the MBA test uses mongrp even though it has been defined ever
since the introduction of the MBA test in commit 01fee6b4d1
("selftests/resctrl: Add MBA test").
Remove the mongrp from MBA test.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The struct resctrl_val_param has control and monitor groups as char
arrays but they are not supposed to be mutated within resctrl_val().
Convert the ctrlgrp and mongrp char array within resctrl_val_param to
plain const char pointers and adjust the strlen() based checks to
check NULL instead.
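As a rough sketch of the conversion (not the selftest's actual code), the
groups become const char pointers and "is a group set?" turns into a NULL
check. The struct name, function name and path layout below are
illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-in for struct resctrl_val_param. */
struct val_param_sketch {
	const char *ctrlgrp;	/* NULL when no control group is used */
	const char *mongrp;	/* NULL when no monitor group is used */
};

/*
 * Format the group directory into buf and return snprintf()'s result.
 * The path layout is an illustrative assumption.
 */
static int group_path_sketch(const struct val_param_sketch *p,
			     char *buf, size_t len)
{
	assert(p && buf);

	/* Previously a strlen() check on a char array; now a NULL check. */
	if (p->ctrlgrp && p->mongrp)
		return snprintf(buf, len, "/sys/fs/resctrl/%s/mon_groups/%s",
				p->ctrlgrp, p->mongrp);
	if (p->ctrlgrp)
		return snprintf(buf, len, "/sys/fs/resctrl/%s", p->ctrlgrp);
	if (p->mongrp)
		return snprintf(buf, len, "/sys/fs/resctrl/mon_groups/%s",
				p->mongrp);
	return snprintf(buf, len, "/sys/fs/resctrl");
}
```

Because the members are const pointers, a test can point them at string
literals without copying, and the callee cannot modify them by accident.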
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Control group, monitor group and resctrl_val are not mutated and
should not be mutated within resctrlfs.c functions.
Mark this by using const char * for the arguments.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
bw_report is only needed for selecting the correct value from the
values IMC measured. It is a member in the resctrl_val_param struct and
is always set to "reads". The value is then checked in resctrl_val()
using validate_bw_report_request() that besides validating the input,
assumes it can mutate the string which is questionable programming
practice.
Simplify handling bw_report:
- Convert validate_bw_report_request() into get_bw_report_type() that
inputs and returns const char *. Use NULL to indicate error.
- Validate the report types inside measure_mem_bw(), not in
resctrl_val().
- Pass bw_report to measure_mem_bw() from ->measure() hook because
resctrl_val() no longer needs bw_report for anything.
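A minimal sketch of the first bullet, assuming a lookup that never writes
to its input and signals failure with NULL. The function name carries a
_sketch suffix and the accepted names other than "reads" are assumptions,
not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Map a requested report name to a constant string, or NULL if the
 * request is not recognized. The input is never modified.
 */
static const char *get_bw_report_type_sketch(const char *bw_report)
{
	assert(bw_report != NULL);

	if (strcmp(bw_report, "reads") == 0)
		return "reads";
	if (strcmp(bw_report, "writes") == 0)	/* assumed name */
		return "writes";
	if (strcmp(bw_report, "total") == 0)	/* assumed name */
		return "total";

	return NULL;	/* unknown report type */
}
```

A caller would check the return value for NULL before starting any
measurement and bail out early otherwise.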
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The struct resctrl_val_param is there to customize behavior inside
resctrl_val(), but it is currently not used to its full extent and
resctrl_val() still does a number of strcmp()s on the test name.
Create ->init() hook into the struct resctrl_val_param to cleanly
do per test initialization.
Also remove unused branches that set up paths and the related #defines
for the CMT test.
While touching kerneldoc, make the adjacent line consistent with the
newly added form (callback vs call back).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The measurement done in resctrl_val() varies depending on test type.
How to measure is decided by comparing strings against the test name,
which is quite inflexible.
Add ->measure() callback into the struct resctrl_val_param to allow
each test to provide necessary code as a function which simplifies what
resctrl_val() has to do.
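The idea can be sketched as a function pointer stored in the parameter
struct, so the common code dispatches without looking at the test name.
All names and the callback signature here are hypothetical; the real
struct has more members and a different prototype.

```c
#include <assert.h>
#include <stddef.h>

struct test_param_sketch;

/* Hypothetical callback type: fill *result, return 0 on success. */
typedef int (*measure_fn_sketch)(const struct test_param_sketch *param,
				 long *result);

struct test_param_sketch {
	const char *ctrlgrp;
	measure_fn_sketch measure;	/* per-test measurement hook */
};

/* Example hook: a test that always reports the same value. */
static int measure_constant_sketch(const struct test_param_sketch *param,
				   long *result)
{
	(void)param;
	*result = 42;
	return 0;
}

/* Common code: no strcmp() on the test name, just call the hook. */
static int run_val_sketch(const struct test_param_sketch *param,
			  long *result)
{
	assert(param && result);

	if (!param->measure)
		return -1;
	return param->measure(param, result);
}
```

Each test then only has to fill in its own hook when it sets up the
parameter struct.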
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
initialize_mem_bw_resctrl() and set_mbm_path() contain a complicated set
of conditions, each yielding a different file to be opened to measure
memory bandwidth through resctrl FS. In practice, only two of them are
used. For the MBA test, ctrlgrp is always provided, and for the MBM test
both ctrlgrp and mongrp are set.
The file used differs between the MBA and MBM tests. However, the MBM
test unnecessarily creates a monitor group because resctrl FS already
provides a monitoring interface underneath any ctrlgrp too, which is
what the MBA selftest uses.
Consolidate memory bandwidth file used to the one used by the MBA
selftest. Remove all unused branches opening other files to simplify
the code.
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
measure_vals() is an awfully generic name, so rename it to
measure_mem_bw() to better describe what it does, and document the
function parameters.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
'bm_pid' and 'ppid' are global variables. As they are used by different
processes and in signal handler, they cannot be entirely converted into
local variables.
The scope of those variables can still be reduced into resctrl_val.c
only. As PARENT_EXIT() macro is using 'ppid', make it a function in
resctrl_val.c and pass ppid to it as an argument because it is easier
to understand than using the global variable directly.
Pass 'bm_pid' into measure_vals() instead of relying on the global
variable which helps to make the call signatures of measure_vals() and
measure_llc_resctrl() more similar to each other.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
A few functions receive PIDs through int arguments. PID variables
should be of type pid_t, not int.
Convert pid arguments from int to pid_t.
Before printing PID, match the type to %d by casting to int which is
enough for Linux (standard would allow using a longer integer type but
generalizing for that would complicate the code unnecessarily, the
selftest code does not need to be portable).
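A small sketch of the printing convention described above. The helper
name is made up for illustration; the cast is the only point of interest.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Format a PID as decimal text; returns snprintf()'s result. */
static int format_pid_sketch(char *buf, size_t len, pid_t pid)
{
	assert(buf != NULL);

	/* pid_t fits in int on Linux, so the cast matches %d exactly. */
	return snprintf(buf, len, "%d", (int)pid);
}
```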
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Both initialize_mem_bw_resctrl() and initialize_llc_occu_resctrl() that
are called from resctrl_val() need to determine domain ID to construct
resctrl fs related paths. Both functions do it by taking CPU ID which
neither needs for any other purpose than determining the domain ID.
Consolidate determining the domain ID into resctrl_val() and pass the
domain ID instead of CPU ID to initialize_mem_bw_resctrl() and
initialize_llc_occu_resctrl().
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Resctrl selftests refer to "bandwidth" currently in two other forms in
the code ("B/W" and "band width").
Use "bandwidth" consistently everywhere. While at it, fix also one
"over flow" -> "overflow" on a line that is touched by the change.
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
For MBM/MBA tests, measure_vals() calls get_mem_bw_imc() that performs
the measurement over a duration of sleep(1) call. The memory bandwidth
numbers from IMC are derived over this duration. The resctrl FS derived
memory bandwidth, however, is calculated inside measure_vals() and only
takes delta between the previous value and the current one which
besides the actual test, also samples inter-test noise.
Rework the logic in measure_vals() and get_mem_bw_imc() such that the
resctrl FS memory bandwidth section covers much shorter duration
closely matching that of the IMC perf counters to improve measurement
accuracy.
For the second read after rewind() to return a fresh value, the
newline also has to be consumed by the fscanf().
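The read pattern can be sketched as below. The helper name is
hypothetical, and trying it on an ordinary file only shows the call
sequence; it does not reproduce how a resctrl FS file behaves.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Re-read a single counter from an already open stream.
 * The "\n" in the format makes fscanf() consume the trailing newline
 * together with the number.
 */
static int read_counter_sketch(FILE *fp, unsigned long *val)
{
	assert(fp && val);

	rewind(fp);
	if (fscanf(fp, "%lu\n", val) != 1)
		return -1;

	return 0;
}
```

Taking one reading just before and one just after the measured interval
keeps the delta limited to that interval.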
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The imc perf fd close() calls are missing from all error paths. In
addition, get_mem_bw_imc() handles fds in a for loop but close() is
based on two fixed indexes READ and WRITE.
Open code inner for loops to READ+WRITE entries for clarity and add a
function to close() IMC fds properly in all cases.
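One way to picture such a helper, assuming every fd slot is initialized
to -1 so it can be called from any error path. The struct layout, index
names and function name are hypothetical and do not match the selftest's
real data structures.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical indexes; the selftest has its own READ/WRITE defines. */
enum { IMC_READ, IMC_WRITE, IMC_NR_TYPES };

struct imc_fds_sketch {
	int fd[IMC_NR_TYPES];	/* -1 means "not open" */
};

/* Close whichever fds are open; returns how many were closed. */
static int imc_close_fds_sketch(struct imc_fds_sketch *imc, int nr_imcs)
{
	int i, closed = 0;

	assert(imc || nr_imcs == 0);

	for (i = 0; i < nr_imcs; i++) {
		/* READ and WRITE entries are handled explicitly. */
		if (imc[i].fd[IMC_READ] != -1) {
			close(imc[i].fd[IMC_READ]);
			imc[i].fd[IMC_READ] = -1;
			closed++;
		}
		if (imc[i].fd[IMC_WRITE] != -1) {
			close(imc[i].fd[IMC_WRITE]);
			imc[i].fd[IMC_WRITE] = -1;
			closed++;
		}
	}

	return closed;
}
```

Resetting each slot to -1 makes the helper safe to call more than once.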
Fixes: 7f4d257e3a ("selftests/resctrl: Add callback to start a benchmark")
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
There are extra spaces in the middle of a #define. Delete them to make
the code more readable.
Signed-off-by: aigourensheng <shechenglong001@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
gcc defaults to silence (off) for the following warnings, but clang
defaults to the opposite. The warnings are not useful for the kernel
itself, which is why they have remained disabled in gcc for the main
kernel build. And it is only due to including kernel data structures in
the selftests, that we get the warnings from clang.
-Waddress-of-packed-member
-Wgnu-variable-sized-type-not-at-end
In other words, the warnings are not unique to the selftests: there is
nothing that the selftests' code does that triggers these warnings,
other than the act of including the kernel's data structures. Therefore,
silence them for the clang builds as well.
This eliminates warnings for the net/ and user_events/ kselftest
subsystems, in these files:
./net/af_unix/scm_rights.c
./net/timestamping.c
./net/ipsec.c
./user_events/perf_test.c
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Merge tag 'net-6.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf and netfilter.
Current release - regressions:
- core: fix rc7's __skb_datagram_iter() regression
Current release - new code bugs:
- eth: bnxt: fix crashes when reducing ring count with active RSS
contexts
Previous releases - regressions:
- sched: fix UAF when resolving a clash
- skmsg: skip zero length skb in sk_msg_recvmsg2
- sunrpc: fix kernel free on connection failure in
xs_tcp_setup_socket
- tcp: avoid too many retransmit packets
- tcp: fix incorrect undo caused by DSACK of TLP retransmit
- udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port().
- eth: ks8851: fix deadlock with the SPI chip variant
- eth: i40e: fix XDP program unloading while removing the driver
Previous releases - always broken:
- bpf:
- fix too early release of tcx_entry
- fail bpf_timer_cancel when callback is being cancelled
- bpf: fix order of args in call to bpf_map_kvcalloc
- netfilter: nf_tables: prefer nft_chain_validate
- ppp: reject claimed-as-LCP but actually malformed packets
- wireguard: avoid unaligned 64-bit memory accesses"
* tag 'net-6.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (33 commits)
net, sunrpc: Remap EPERM in case of connection failure in xs_tcp_setup_socket
net/sched: Fix UAF when resolving a clash
net: ks8851: Fix potential TX stall after interface reopen
udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port().
netfilter: nf_tables: prefer nft_chain_validate
netfilter: nfnetlink_queue: drop bogus WARN_ON
ethtool: netlink: do not return SQI value if link is down
ppp: reject claimed-as-LCP but actually malformed packets
selftests/bpf: Add timer lockup selftest
net: ethernet: mtk-star-emac: set mac_managed_pm when probing
e1000e: fix force smbus during suspend flow
tcp: avoid too many retransmit packets
bpf: Defer work in bpf_timer_cancel_and_free
bpf: Fail bpf_timer_cancel when callback is being cancelled
bpf: fix order of args in call to bpf_map_kvcalloc
net: ethernet: lantiq_etop: fix double free in detach
i40e: Fix XDP program unloading while removing the driver
net: fix rc7's __skb_datagram_iter()
net: ks8851: Fix deadlock with the SPI chip variant
octeontx2-af: Fix incorrect value output on error path in rvu_check_rsrc_availability()
...
Add a selftest that tries to trigger a situation where two timer callbacks
are attempting to cancel each other's timer. By running them continuously,
we hit a condition where both run in parallel and cancel each other.
Without the fix in the previous patch, this would cause a lockup as
hrtimer_cancel on either side will wait for forward progress from the
callback.
Ensure that this situation leads to an EDEADLK error.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240711052709.2148616-1-memxor@gmail.com
The CXL driver was recently updated to return EBUSY rather than
ENXIO when the device reports that an injection request exceeds
the device's limit. That change to EBUSY allows debug users to
differentiate between limit reached and inject failures for any
other reason.
Change cxl-test to also return EBUSY and tidy up the dev_dbg()
messaging to emit the correct limit.
Reminder: the cxl-test per device injection limit is a configurable
attribute: /sys/bus/platform/drivers/cxl_mock_mem/poison_inject_max
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Tested-by: Xingtao Yao <yaoxt.fnst@fujitsu.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://patch.msgid.link/ba1b80e1658b644d85d0d5e2287112d00a48b9cf.1720316188.git.alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
In tools/ directory, function bitmap_clear() is currently only used in
object file tools/testing/radix-tree/xarray.o.
But instead of keeping a bitmap.c with only bitmap_clear() definition in
radix-tree's own directory, it would be more proper to put it in common
directory lib/.
Sync the kernel definition and link some related libs, no functional
change is expected.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Matthew Wilcox <willy@infradead.org>
CC: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
If bpf_object__load() fails in test_xdp_adjust_frags_tail_grow(), "obj"
opened before this should be closed. So use "goto out" to close it instead
of using "return" here.
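The cleanup pattern can be sketched with stand-in helpers. None of the
functions below are libbpf APIs; they only mimic an object that has to be
closed on every path after it was opened.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object type and helpers standing in for the real ones. */
struct obj_sketch { int loaded; };

static int live_objects_sketch;	/* counts objects not yet closed */

static struct obj_sketch *obj_open_sketch(void)
{
	struct obj_sketch *obj = calloc(1, sizeof(*obj));

	if (obj)
		live_objects_sketch++;
	return obj;
}

static int obj_load_sketch(struct obj_sketch *obj, int fail)
{
	assert(obj != NULL);
	if (fail)
		return -1;
	obj->loaded = 1;
	return 0;
}

static void obj_close_sketch(struct obj_sketch *obj)
{
	if (!obj)
		return;
	free(obj);
	live_objects_sketch--;
}

static int run_test_sketch(int fail_load)
{
	struct obj_sketch *obj;
	int err = -1;

	obj = obj_open_sketch();
	if (!obj)
		return -1;	/* nothing to clean up yet */

	if (obj_load_sketch(obj, fail_load))
		goto out;	/* a bare return here would leak obj */

	err = 0;
out:
	obj_close_sketch(obj);
	return err;
}
```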
Fixes: 110221081a ("bpf: selftests: update xdp_adjust_tail selftest to include xdp frags")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/f282a1ed2d0e3fb38cceefec8e81cabb69cab260.1720615848.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
When running the bpf_tcp_ca selftests (./test_progs -t bpf_tcp_ca) on a
LoongArch platform, some "Segmentation fault" errors occur:
'''
test_dctcp:PASS:bpf_dctcp__open_and_load 0 nsec
test_dctcp:FAIL:bpf_map__attach_struct_ops unexpected error: -524
#29/1 bpf_tcp_ca/dctcp:FAIL
test_cubic:PASS:bpf_cubic__open_and_load 0 nsec
test_cubic:FAIL:bpf_map__attach_struct_ops unexpected error: -524
#29/2 bpf_tcp_ca/cubic:FAIL
test_dctcp_fallback:PASS:dctcp_skel 0 nsec
test_dctcp_fallback:PASS:bpf_dctcp__load 0 nsec
test_dctcp_fallback:FAIL:dctcp link unexpected error: -524
#29/4 bpf_tcp_ca/dctcp_fallback:FAIL
test_write_sk_pacing:PASS:open_and_load 0 nsec
test_write_sk_pacing:FAIL:attach_struct_ops unexpected error: -524
#29/6 bpf_tcp_ca/write_sk_pacing:FAIL
test_update_ca:PASS:open 0 nsec
test_update_ca:FAIL:attach_struct_ops unexpected error: -524
settcpca:FAIL:setsockopt unexpected setsockopt: \
actual -1 == expected -1
(network_helpers.c:99: errno: No such file or directory) \
Failed to call post_socket_cb
start_test:FAIL:start_server_str unexpected start_server_str: \
actual -1 == expected -1
test_update_ca:FAIL:ca1_ca1_cnt unexpected ca1_ca1_cnt: \
actual 0 <= expected 0
#29/9 bpf_tcp_ca/update_ca:FAIL
#29 bpf_tcp_ca:FAIL
Caught signal #11!
Stack trace:
./test_progs(crash_handler+0x28)[0x5555567ed91c]
linux-vdso.so.1(__vdso_rt_sigreturn+0x0)[0x7ffffee408b0]
./test_progs(bpf_link__update_map+0x80)[0x555556824a78]
./test_progs(+0x94d68)[0x5555564c4d68]
./test_progs(test_bpf_tcp_ca+0xe8)[0x5555564c6a88]
./test_progs(+0x3bde54)[0x5555567ede54]
./test_progs(main+0x61c)[0x5555567efd54]
/usr/lib64/libc.so.6(+0x22208)[0x7ffff2aaa208]
/usr/lib64/libc.so.6(__libc_start_main+0xac)[0x7ffff2aaa30c]
./test_progs(_start+0x48)[0x55555646bca8]
Segmentation fault
'''
This is because the BPF trampoline is not implemented on LoongArch yet,
so the "link" returned by bpf_map__attach_struct_ops() is NULL.
test_progs crashes when this NULL link is passed to
bpf_link__update_map(). This patch adds NULL checks for all links in
bpf_tcp_ca to fix these errors. If "link" is NULL, goto the newly added
label "out" to destroy the skel.
v2:
- use "goto out" instead of "return" as Eduard suggested.
Fixes: 06da9f3bd6 ("selftests/bpf: Test switching TCP Congestion Control algorithms.")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/b4c841492bd4ed97964e4e61e92827ce51bf1dc9.1720615848.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
CONFIG_MEMCG_KMEM used to be a user-visible option for whether slab
tracking is enabled. It has been default-enabled and equivalent to
CONFIG_MEMCG for almost a decade. We've only grown more kernel memory
accounting sites since, and there is no imaginable cgroup usecase going
forward that wants to track user pages but not the multitude of
user-drivable kernel allocations.
Link: https://lkml.kernel.org/r/20240701153148.452230-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Centralize the _GNU_SOURCE definition to CFLAGS in lib.mk. Remove
redundant defines from Makefiles that import lib.mk. Convert any usage of
"#define _GNU_SOURCE 1" to "#define _GNU_SOURCE".
This uses the form "-D_GNU_SOURCE=", which is equivalent to
"#define _GNU_SOURCE".
Otherwise using "-D_GNU_SOURCE" is equivalent to "-D_GNU_SOURCE=1" and
"#define _GNU_SOURCE 1", which is less commonly seen in source code and
would require many changes in selftests to avoid redefinition warnings.
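A small sketch of why the empty form avoids redefinition warnings: an
identical redefinition of a macro is permitted, and the empty definition
expands to nothing. The helper macros and function name are made up for
illustration.

```c
/* Equivalent to building with -D_GNU_SOURCE= (empty replacement list). */
#define _GNU_SOURCE
/* An identical redefinition is permitted, so no diagnostic is expected. */
#define _GNU_SOURCE

#include <string.h>

#define STR_SKETCH_(x) #x
#define STR_SKETCH(x) STR_SKETCH_(x)

/* Length of the macro's expansion: 0 for the empty form, 1 for "1". */
static size_t gnu_source_expansion_len_sketch(void)
{
	return strlen(STR_SKETCH(_GNU_SOURCE));
}
```

Had the command line used plain -D_GNU_SOURCE, the macro would expand to
1 and every source-level "#define _GNU_SOURCE" would be a mismatching
redefinition.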
Link: https://lkml.kernel.org/r/20240625223454.1586259-2-edliaw@google.com
Signed-off-by: Edward Liaw <edliaw@google.com>
Suggested-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: André Almeida <andrealmeid@igalia.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <kees@kernel.org>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Both Ryan and Chris have been utilizing the small test program to aid in
debugging and identifying issues with swap entry allocation. While a real
or intricate workload might be more suitable for assessing the correctness
and effectiveness of the swap allocation policy, a small test program
presents a simpler means of understanding the problem and initially
verifying the improvements being made.
Let's endeavor to integrate it into tools/mm. Although it presently only
accommodates 64KB and 4KB, I'm optimistic that we can expand its
capabilities to support multiple sizes and simulate more complex systems
in the future as required.
Basically, we have
1. Use MADV_PAGEOUT for rapid swap-out, putting the swap allocation
code under high exercise in a short time.
2. Use MADV_DONTNEED to simulate the behavior of libc and Java heap in
freeing memory, as well as for munmap, app exits, or OOM killer
scenarios. This ensures new mTHP is always generated, released or
swapped out, similar to the behavior on a PC or Android phone where
many applications are frequently started and terminated.
3. Swap in with or without the "-a" option to observe how fragments
due to swap-in and the incoming swap-in of large folios will impact
swap-out fallback.
Due to 2, we ensure a certain proportion of mTHP. Similarly, because of
3, we maintain a certain proportion of small folios, as we don't support
large folios swap-in, meaning any swap-in will immediately result in small
folios. Therefore, with both 2 and 3, we automatically achieve a system
containing both mTHP and small folios. Additionally, 1 provides the
ability to continuously swap them out.
We can also use "-s" to add a dedicated small folios memory area.
[akpm@linux-foundation.org: thp_swap_allocator_test.c needs mman.h, per Kairui Song]
Link: https://lkml.kernel.org/r/20240622071231.576056-2-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: Chris Li <chrisl@kernel.org>
Tested-by: Chris Li <chrisl@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This patch uses public helper connect_fd_to_fd() exported in
network_helpers.h instead of using getsockname() + connect() in
run_lookup_prog() in prog_tests/sk_lookup.c. This can simplify
the code.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/7077c277cde5a1864cdc244727162fb75c8bb9c5.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This patch uses public helper start_server_addr() in udp_recv_send()
in prog_tests/sk_lookup.c to simplify the code.
And use ASSERT_OK_FD() to check fd returned by start_server_addr().
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/f11cabfef4a2170ecb66a1e8e2e72116d8f621b3.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This patch uses public helper start_server_str() to simplify make_server()
in prog_tests/sk_lookup.c.
Add a callback setsockopts() to do all sockopts, set it to post_socket_cb
pointer of struct network_helper_opts. And add a new struct cb_opts to save
the data needed to pass to the callback. Then pass this network_helper_opts
to start_server_str().
Also use ASSERT_OK_FD() to check fd returned by start_server_str().
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/5981539f5591d2c4998c962ef2bf45f34c940548.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
In the error path when update_lookup_map() fails in drop_on_reuseport in
prog_tests/sk_lookup.c, "server1", the fd of server 1, should be closed.
This patch fixes this by jumping to the "close_srv1" label instead of
"detach" to close "server1" in this case.
Fixes: 0ab5539f85 ("selftests/bpf: Tests for BPF_SK_LOOKUP attach point")
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/86aed33b4b0ea3f04497c757845cff7e8e621a2d.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Add a new dedicated ASSERT macro ASSERT_OK_FD to test whether a socket
FD is valid or not. It can be used to replace macros ASSERT_GT(fd, 0, ""),
ASSERT_NEQ(fd, -1, "") or statements (fd < 0), (fd != -1).
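The condition being tested can be sketched as a plain predicate. This is
a simplified stand-in with a hypothetical name, not the macro added by
the patch. Note that a "greater than 0" style check would reject fd 0,
which is a valid descriptor.

```c
#include <stdbool.h>

/* A file descriptor is valid when it is non-negative. */
static bool fd_is_valid_sketch(int fd)
{
	return fd >= 0;
}
```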
Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/ded75be86ac630a3a5099739431854c1ec33f0ea.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Some callers expect __start_server() helper to pass their own "backlog"
value to listen() instead of the default of 1. So this patch adds struct
member "backlog" for network_helper_opts to allow callers to set "backlog"
value via start_server_str() helper.
listen(fd, 0 /* backlog */) can be used to enforce syncookie. Meaning
backlog 0 is a legit value.
Using 0 as a default and changing it to 1 here is fine. It makes the test
program easier to write for the common case. Enforcing syncookie mode by
using backlog 0 is a niche use case but it should at least have a way for
the caller to do that. Thus, a negative backlog value is used here for the
syncookie use case. Please see the comment in network_helpers.h for
the details.
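The mapping described above can be sketched as a small function. The name
is hypothetical and this is only a reading of the description, not the
helper's actual code.

```c
/* Map the option value to the backlog that would be passed to listen(). */
static int effective_backlog_sketch(int backlog)
{
	if (backlog < 0)
		return 0;	/* negative: caller wants backlog 0 (syncookie) */
	if (backlog == 0)
		return 1;	/* unset: keep the previous default of 1 */
	return backlog;		/* explicit positive value is used as-is */
}
```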
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/1660229659b66eaad07aa2126e9c9fe217eba0dd.1720515893.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
In many cases, kernel netfilter functionality is built as modules.
If CONFIG_NF_FLOW_TABLE=m in particular, progs/xdp_flowtable.c
(and hence selftests) will fail to compile, so add a ___local
version of "struct flow_ports".
Fixes: c77e572d3a ("selftests/bpf: Add selftest for bpf_xdp_flow_lookup kfunc")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/r/20240710150051.192598-1-alan.maguire@oracle.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When clone3() was introduced, it was not obvious how each architecture
deals with setting up the stack and keeping the register contents in
a fork()-like system call, so this was left for the architecture
maintainers to implement, with __ARCH_WANT_SYS_CLONE3 defined by those
that already implement it.
Five years later, we still have a few architectures left that are missing
clone3(), and the macro keeps getting in the way as it's fundamentally
different from all the other __ARCH_WANT_SYS_* macros that are meant
to provide backwards-compatibility with applications using older
syscalls that are no longer provided by default.
Address this by reversing the polarity of the macro, adding an
__ARCH_BROKEN_SYS_CLONE3 macro to all architectures that don't
already provide the syscall, and remove __ARCH_WANT_SYS_CLONE3
from all the other ones.
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Currently, BPF kfuncs which accept trusted pointer arguments
i.e. those flagged as KF_TRUSTED_ARGS, KF_RCU, or KF_RELEASE, all
require an original/unmodified trusted pointer argument to be supplied
to them. By original/unmodified, it means that the backing register
holding the trusted pointer argument that is to be supplied to the BPF
kfunc must have its fixed offset set to zero, or else the BPF verifier
will outright reject the BPF program load. However, this zero fixed
offset constraint that is currently enforced by the BPF verifier onto
BPF kfuncs specifically flagged to accept KF_TRUSTED_ARGS or KF_RCU
trusted pointer arguments is rather unnecessary, and can limit their
usability in practice. Specifically, it completely eliminates the
possibility of constructing a derived trusted pointer from an original
trusted pointer. To put it simply, a derived pointer is a pointer
which points to one of the nested member fields of the object being
pointed to by the original trusted pointer.
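In plain userspace C terms (not BPF, and with made-up struct names), a
derived pointer is simply the address of a nested member, which sits at a
non-zero fixed offset from the original pointer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types used only to illustrate the offset. */
struct inner_sketch {
	int value;
};

struct outer_sketch {
	long id;
	struct inner_sketch nested;
};

/* Byte distance of the derived pointer from the original pointer. */
static ptrdiff_t derived_offset_sketch(const struct outer_sketch *orig)
{
	const struct inner_sketch *derived;

	assert(orig != NULL);
	derived = &orig->nested;

	return (const char *)derived - (const char *)orig;
}
```

It is this non-zero fixed offset that the verifier previously refused
for trusted pointer arguments.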
This patch relaxes the zero fixed offset constraint that is enforced
upon BPF kfuncs which specifically accept KF_TRUSTED_ARGS or KF_RCU
arguments. Although the zero fixed offset constraint technically also
applies to BPF kfuncs accepting KF_RELEASE arguments, relaxing this
constraint for such BPF kfuncs has subtle and unwanted
side-effects. This was discovered by experimenting a little further
with an initial version of this patch series [0]. The primary issue
with relaxing the zero fixed offset constraint on BPF kfuncs accepting
KF_RELEASE arguments is that it would open up the opportunity for
BPF programs to supply both trusted pointers and derived trusted
pointers to them. For KF_RELEASE BPF kfuncs specifically, this could
be problematic as resources associated with the backing pointer could
be released by the backing BPF kfunc and cause instabilities for the
rest of the kernel.
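As a rough illustration of the rule described above, the following toy model
(not verifier code) accepts a non-zero fixed offset for KF_TRUSTED_ARGS and
KF_RCU arguments but still insists on a zero offset for KF_RELEASE. The flag
values, the Reg class and fixed_off_ok() are hypothetical names made up for
this sketch, and treating any non-negative offset as acceptable is an
assumption.

```python
from dataclasses import dataclass

# Flag names mirror the kfunc flags named above; the values are arbitrary.
KF_TRUSTED_ARGS = 1 << 0
KF_RCU = 1 << 1
KF_RELEASE = 1 << 2


@dataclass
class Reg:
    """Toy stand-in for a register holding a trusted pointer."""
    fixed_off: int  # byte offset added to the original pointer


def fixed_off_ok(kfunc_flags: int, reg: Reg) -> bool:
    """Return True if this toy model accepts the pointer argument."""
    if kfunc_flags & KF_RELEASE:
        # Release kfuncs must be handed the original, unmodified pointer.
        return reg.fixed_off == 0
    if kfunc_flags & (KF_TRUSTED_ARGS | KF_RCU):
        # Relaxed rule (assumed here): an offset into a nested member is fine.
        return reg.fixed_off >= 0
    # Anything else keeps the strict rule in this sketch.
    return reg.fixed_off == 0


original = Reg(fixed_off=0)
derived = Reg(fixed_off=16)   # e.g. pointer to a nested member
print(fixed_off_ok(KF_TRUSTED_ARGS, derived))
print(fixed_off_ok(KF_RELEASE, derived))
```

The KF_RELEASE check comes first so that a kfunc carrying both a release flag
and a trusted-args flag stays on the strict path.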
With this new fixed offset semantic in-place for BPF kfuncs accepting
KF_TRUSTED_ARGS and KF_RCU arguments, we now have more flexibility
when it comes to the BPF kfuncs that we're able to introduce moving
forward.
Early discussions covering the possibility of relaxing the zero fixed
offset constraint can be found using the link below. This will provide
more context on where all this has stemmed from [1].
Notably, pre-existing tests have been updated such that they provide
coverage for the updated zero fixed offset
functionality. Specifically, the nested offset test was converted from
a negative to positive test as it was already designed to assert zero
fixed offset semantics of a KF_TRUSTED_ARGS BPF kfunc.
[0] https://lore.kernel.org/bpf/ZnA9ndnXKtHOuYMe@google.com/
[1] https://lore.kernel.org/bpf/ZhkbrM55MKQ0KeIV@google.com/
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20240709210939.1544011-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Improve how we handle old BPF skeletons when it comes to BPF map
auto-attachment. Emit one warn-level message per each struct_ops map
that could have been auto-attached, if user provided recent enough BPF
skeleton version. Don't spam log if there are no relevant struct_ops
maps, though.
This should help users realize that they probably need to regenerate BPF
skeleton header with more recent bpftool/libbpf-cargo (or whatever other
means of BPF skeleton generation).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240708204540.4188946-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
BPF skeleton was designed from day one to be extensible. Generated BPF
skeleton code specifies actual sizes of map/prog/variable skeletons for
that reason and libbpf is supposed to work with newer/older versions
correctly.
Unfortunately, it was missed that we implicitly embed hard-coded most
up-to-date (according to libbpf's version of libbpf.h header used to
compile BPF skeleton header) sizes of those structs, which can differ
from the actual sizes at runtime when libbpf is used as a shared
library.
We have a few places where we just index an array of maps/progs/vars, which
implicitly uses these potentially invalid sizes of structs.
This patch aims to fix this problem going forward. Once this lands,
we'll backport these changes in Github repo to create patched releases
for older libbpfs.
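The indexing problem can be sketched outside of libbpf. In the example below
a producer writes entries using a newer, larger layout and records the entry
size, while a consumer built against an older layout either honours that
recorded size or steps by its own compiled-in size. The layouts, field values
and function names are hypothetical and only illustrate the stride mismatch.

```python
import struct

OLD_FMT = "<QQ"    # hypothetical older entry: two 8-byte fields
NEW_FMT = "<QQQ"   # hypothetical newer entry: one extra 8-byte field


def read_first_fields(buf: bytes, count: int, entry_size: int) -> list:
    """Read the first field of each entry, stepping by entry_size bytes."""
    return [struct.unpack_from("<Q", buf, i * entry_size)[0]
            for i in range(count)]


# The producer was built against the newer, larger layout.
entries = b"".join(struct.pack(NEW_FMT, i + 1, 0xAA, 0xBB) for i in range(3))
producer_size = struct.calcsize(NEW_FMT)   # size recorded next to the data
consumer_size = struct.calcsize(OLD_FMT)   # size compiled into the reader

good = read_first_fields(entries, 3, producer_size)
bad = read_first_fields(entries, 3, consumer_size)
print(good, bad)
```

Only the first entry survives when the consumer uses its own size, because
every later index lands inside a neighbouring entry.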
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Fixes: d66562fba1 ("libbpf: Add BPF object skeleton support")
Fixes: 430025e5dc ("libbpf: Add subskeleton scaffolding")
Fixes: 08ac454e25 ("libbpf: Auto-attach struct_ops BPF maps in BPF skeleton")
Co-developed-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240708204540.4188946-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Old versions of libbpf don't handle varying sizes of bpf_map_skeleton
struct correctly. As such, BPF skeleton generated by newest bpftool
might not be compatible with older libbpf (though only when libbpf is
used as a shared library), even though it, by design, should.
Going forward libbpf will be fixed, plus we'll release bug fixed
versions of relevant old libbpfs, but meanwhile try to mitigate from
bpftool side by conservatively assuming older and smaller definition of
bpf_map_skeleton, if possible. Meaning, if there are no struct_ops maps.
If there are struct_ops, then presumably user would like to have
auto-attaching logic and struct_ops map link placeholders, so use the
full bpf_map_skeleton definition in that case.
Acked-by: Quentin Monnet <qmo@kernel.org>
Co-developed-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240708204540.4188946-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Some workloads may want to rehash the flows in response to an imbalance.
Most effective way to do that is changing the RSS key. Check that changing
the key does not cause link flaps or traffic disruption.
Disrupting traffic for key update is not incorrect, but makes the key
update unusable for rehashing under load.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240708213627.226025-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some devices dynamically increase and decrease the size of the RSS
indirection table based on the number of enabled queues.
When that happens driver must maintain the balance of entries
(preferably duplicating the smaller table).
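A small sketch of why duplicating the smaller table keeps the balance: each
queue's share of table entries is unchanged after the table grows. The
function names are hypothetical, and the sketch assumes the new size is a
whole multiple of the old one.

```python
from collections import Counter


def grow_indir_table(table: list, new_size: int) -> list:
    """Grow an indirection table by repeating the existing entries."""
    if new_size % len(table) != 0:
        raise ValueError("new size must be a multiple of the old size")
    return table * (new_size // len(table))


def queue_shares(table: list) -> dict:
    """Fraction of table entries that point at each queue."""
    counts = Counter(table)
    return {queue: n / len(table) for queue, n in counts.items()}


small = [0, 1, 2, 0, 1, 2, 0, 1]   # 8 entries spread over 3 queues
large = grow_indir_table(small, 16)
print(queue_shares(small) == queue_shares(large))
```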
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240708213627.226025-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
By default main RSS table should change to include all queues.
When user sets a specific RSS config the driver should preserve it,
even when queue count changes. Driver should refuse to deactivate
queues used in the user-set RSS config.
For additional contexts driver should still refuse to deactivate
queues in use. Whether the contexts should get resized like
context 0 when queue count increases is a bit unclear. I anticipate
most drivers today don't do that. Since main use case for additional
contexts is to set the indir table - it doesn't seem worthwhile to
care about behavior of the default table too much. Don't test that.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240708213627.226025-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wrap up sending traffic and checking in which queues it landed
in a helper.
The method used for testing is to send a lot of iperf traffic
and check which queues received the most packets. Those should
be the queues where we expect iperf to land - either because we
installed a filter for the port iperf uses, or we didn't and
expect it to use context 0.
Contexts get disjoint queue sets, but the main context (AKA context 0)
may receive some background traffic (noise).
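The check described above can be sketched as follows. This is not the helper
added by the patch; rx_delta(), landed_on() and the 0.9 threshold are
hypothetical, and the per-queue counters are made-up numbers. The idea is to
take per-queue packet counters before and after the traffic run and require
that most of the increase landed on the expected queues, tolerating some
noise elsewhere.

```python
def rx_delta(before: list, after: list) -> list:
    """Per-queue increase in received packets."""
    return [a - b for a, b in zip(after, before)]


def landed_on(before: list, after: list, target: set,
              min_share: float = 0.9) -> bool:
    """True if at least min_share of the new packets hit the target queues."""
    delta = rx_delta(before, after)
    total = sum(delta)
    if total == 0:
        return False   # no traffic was observed at all
    directed = sum(delta[q] for q in target)
    return directed / total >= min_share


before = [100, 100, 100, 100]
after = [150, 100, 10100, 10100]   # queue 0 saw a little background noise
print(landed_on(before, after, {2, 3}))
print(landed_on(before, after, {0, 1}))
```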
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240708213627.226025-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The basic test may fail without resetting the RSS indir table.
Use the .exec() method to run cleanup early since we re-test
with traffic that returning to default state works.
While at it reformat the doc a tiny bit.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240708213627.226025-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The ageing time used by the test is too short for debug kernels and
results in entries being aged out prematurely [1].
Fix by increasing the ageing time.
The same change was done for the VLAN-aware version of the test in
commit dfbab74044 ("selftests: forwarding: Make vxlan-bridge-1q pass
on debug kernels").
[1]
# ./vxlan_bridge_1d.sh
[...]
# TEST: VXLAN: flood before learning [ OK ]
# TEST: VXLAN: show learned FDB entry [ OK ]
# TEST: VXLAN: learned FDB entry [FAIL]
# veth3: Expected to capture 0 packets, got 4.
# RTNETLINK answers: No such file or directory
# TEST: VXLAN: deletion of learned FDB entry [ OK ]
# TEST: VXLAN: Ageing of learned FDB entry [FAIL]
# veth3: Expected to capture 0 packets, got 2.
[...]
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240707095458.2870260-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lu Baolu says:
====================
This series implements the functionality of delivering IO page faults to
user space through the IOMMUFD framework. One feasible use case is the
nested translation. Nested translation is a hardware feature that supports
two-stage translation tables for IOMMU. The second-stage translation table
is managed by the host VMM, while the first-stage translation table is
owned by user space. This allows user space to control the IOMMU mappings
for its devices.
When an IO page fault occurs on the first-stage translation table, the
IOMMU hardware can deliver the page fault to user space through the
IOMMUFD framework. User space can then handle the page fault and respond
to the device top-down through the IOMMUFD. This allows user space to
implement its own IO page fault handling policies.
A user space application that is capable of handling IO page faults should
allocate a fault object and bind it to any domain whose faults it
is willing to handle. On a successful return
of fault object allocation, the user can retrieve and respond to page
faults by reading or writing to the file descriptor (FD) returned.
The iommu selftest framework has been updated to test the IO page fault
delivery and response functionality.
====================
* iommufd_pri:
iommufd/selftest: Add coverage for IOPF test
iommufd/selftest: Add IOPF support for mock device
iommufd: Associate fault object with iommufd_hw_pgtable
iommufd: Fault-capable hwpt attach/detach/replace
iommufd: Add iommufd fault object
iommufd: Add fault and response message definitions
iommu: Extend domain attach group with handle support
iommu: Add attach handle to struct iopf_group
iommu: Remove sva handle list
iommu: Introduce domain attachment handle
Link: https://lore.kernel.org/all/20240702063444.105814-1-baolu.lu@linux.intel.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Extend the selftest tool to add coverage of testing IOPF handling. This
would include the following tests:
- Allocating and destroying an iommufd fault object.
- Allocating and destroying an IOPF-capable HWPT.
- Attaching/detaching/replacing an IOPF-capable HWPT on a device.
- Triggering an IOPF on the mock device.
- Retrieving and responding to the IOPF through the file interface.
Link: https://lore.kernel.org/r/20240702063444.105814-11-baolu.lu@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This kselftest fixes update for Linux 6.10 consists of fixes to clang
build failures to timerns, vDSO tests and fixes to vDSO makefile.
Merge tag 'linux_kselftest-fixes-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan
"Fixes to clang build failures to timerns, vDSO tests and fixes to vDSO
makefile"
* tag 'linux_kselftest-fixes-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/vDSO: remove duplicate compiler invocations from Makefile
selftests/vDSO: remove partially duplicated "all:" target in Makefile
selftests/vDSO: fix clang build errors and warnings
selftest/timerns: fix clang build failures for abs() calls
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-07-08
The following pull-request contains BPF updates for your *net-next* tree.
We've added 102 non-merge commits during the last 28 day(s) which contain
a total of 127 files changed, 4606 insertions(+), 980 deletions(-).
The main changes are:
1) Support resilient split BTF which cuts down on duplication and makes BTF
as compact as possible wrt BTF from modules, from Alan Maguire & Eduard Zingerman.
2) Add support for dumping kfunc prototypes from BTF which enables both detecting
as well as dumping compilable prototypes for kfuncs, from Daniel Xu.
3) Batch of s390x BPF JIT improvements to add support for BPF arena and to implement
support for BPF exceptions, from Ilya Leoshkevich.
4) Batch of riscv64 BPF JIT improvements in particular to add 12-argument support
for BPF trampolines and to utilize bpf_prog_pack for the latter, from Pu Lehui.
5) Extend BPF test infrastructure to add a CHECKSUM_COMPLETE validation option
for skbs and add coverage along with it, from Vadim Fedorenko.
6) Inline bpf_get_current_task/_btf() helpers in the arm64 BPF JIT which gives
a small 1% performance improvement in micro-benchmarks, from Puranjay Mohan.
7) Extend the BPF verifier to track the delta between linked registers in order
to better deal with recent LLVM code optimizations, from Alexei Starovoitov.
8) Fix bpf_wq_set_callback_impl() kfunc signature where the third argument should
have been a pointer to the map value, from Benjamin Tissoires.
9) Extend BPF selftests to add regular expression support for test output matching
and adjust some of the selftest when compiled under gcc, from Cupertino Miranda.
10) Simplify task_file_seq_get_next() and remove an unnecessary loop which always
iterates exactly once anyway, from Dan Carpenter.
11) Add the capability to offload the netfilter flowtable in XDP layer through
kfuncs, from Florian Westphal & Lorenzo Bianconi.
12) Various cleanups in networking helpers in BPF selftests to shave off a few
lines of open-coded functions on client/server handling, from Geliang Tang.
13) Properly propagate prog->aux->tail_call_reachable out of BPF verifier, so
that x86 JIT does not need to implement detection, from Leon Hwang.
14) Fix BPF verifier to add a missing check_func_arg_reg_off() to prevent an
out-of-bounds memory access for dynpointers, from Matt Bobrowski.
15) Fix bpf_session_cookie() kfunc to return __u64 instead of long pointer as
it might lead to problems on 32-bit archs, from Jiri Olsa.
16) Enhance traffic validation and dynamic batch size support in xsk selftests,
from Tushar Vyavahare.
bpf-next-for-netdev
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (102 commits)
selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep
selftests/bpf: amend for wrong bpf_wq_set_callback_impl signature
bpf: helpers: fix bpf_wq_set_callback_impl signature
libbpf: Add NULL checks to bpf_object__{prev_map,next_map}
selftests/bpf: Remove exceptions tests from DENYLIST.s390x
s390/bpf: Implement exceptions
s390/bpf: Change seen_reg to a mask
bpf: Remove unnecessary loop in task_file_seq_get_next()
riscv, bpf: Optimize stack usage of trampoline
bpf, devmap: Add .map_alloc_check
selftests/bpf: Remove arena tests from DENYLIST.s390x
selftests/bpf: Add UAF tests for arena atomics
selftests/bpf: Introduce __arena_global
s390/bpf: Support arena atomics
s390/bpf: Enable arena
s390/bpf: Support address space cast instruction
s390/bpf: Support BPF_PROBE_MEM32
s390/bpf: Land on the next JITed instruction after exception
s390/bpf: Introduce pre- and post- probe functions
s390/bpf: Get rid of get_probe_mem_regno()
...
====================
Link: https://patch.msgid.link/20240708221438.10974-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
"After" was missing an "r", nothing to see here.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
We had few lines about the feature, but without any complete examples.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
We had an extra "+" at the beginning of some lines, which looked like a
poorly formatted patch.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
User can now read perf counters using "--add perf/<device>/<event>".
Other details work similarly to how --add works with MSRs.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
These three counters now are treated similar to other perf counters
groups. This simplifies and gets rid of a lot of special cases for APERF
and MPERF.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
These address the performance issues reported by Matt, Namhyung and
Linus. Recently perf changed the processing of comm strings and DSOs to
use sorted arrays, but it had to sort the array whenever it added a new
entry. This caused a performance issue, and the fix is to enhance the
sorting by finding the insertion point in the sorted array and shifting
the righthand side using memmove().
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Merge tag 'perf-tools-fixes-for-v6.10-2024-07-08' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Namhyung Kim:
"Fix performance issue for v6.10
These address the performance issues reported by Matt, Namhyung and
Linus. Recently perf changed the processing of the comm string and DSO
using sorted arrays but this caused it to sort the array whenever
adding a new entry.
This caused a performance issue and the fix is to enhance the sorting
by finding the insertion point in the sorted array and to shift
righthand side using memmove()"
* tag 'perf-tools-fixes-for-v6.10-2024-07-08' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf dsos: When adding a dso into sorted dsos maintain the sort order
perf comm str: Avoid sort during insert
fexit_sleep test runs successfully now on the BPF CI so remove it
from the deny list. ftrace direct calls were blocking tracing programs
on arm64, but this has been resolved by now. For more details see also
discussion in [*].
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240705145009.32340-1-puranjay@kernel.org [*]
See the previous patch: the API was wrong, we were provided the pointer
to the value, not the actual struct bpf_wq *.
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Link: https://lore.kernel.org/r/20240708-fix-wq-v2-2-667e5c9fbd99@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In the current state, an erroneous call to
bpf_object__find_map_by_name(NULL, ...) leads to a segmentation
fault through the following call chain:
bpf_object__find_map_by_name(obj = NULL, ...)
-> bpf_object__for_each_map(pos, obj = NULL)
-> bpf_object__next_map((obj = NULL), NULL)
-> return (obj = NULL)->maps
While calling bpf_object__find_map_by_name with obj = NULL is
obviously incorrect, this should not lead to a segmentation
fault but rather be handled gracefully.
As __bpf_map__iter already handles this situation correctly, we
can delegate the check for the regular case there and only add
a check in case the prev or next parameter is NULL.
Signed-off-by: Andreas Ziegler <ziegler.andreas@siemens.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240703083436.505124-1-ziegler.andreas@siemens.com
This cpupower second update for Linux 6.11-rc1 consists of
-- fix to install cpupower library in standard library install
location - /usr/lib
-- disable direct build of cpupower bench as it can only be
built from the cpupower main makefile.
Merge tag 'linux-cpupower-6.11-rc1-2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/shuah/linux into pm-tools
Merge more cpupower utility changes for 6.11-rc1 from Shuah Khan:
"This cpupower second update for Linux 6.11-rc1 consists of
-- fix to install cpupower library in standard library install
location - /usr/lib
-- disable direct build of cpupower bench as it can only be
built from the cpupower main makefile."
* tag 'linux-cpupower-6.11-rc1-2' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
cpupower: fix lib default installation path
cpupower: Disable direct build of the 'bench' subproject
dsos__add would add at the end of the dso array possibly requiring a
later find to re-sort the array. Patterns of find then add were
becoming O(n*log n) due to the sorts. Change the add routine to be
O(n) rather than O(1) but to maintain the sorted-ness of the dsos
array so that later finds don't need the O(n*log n) sort.
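The trade-off can be sketched with a plain sorted list. This is only an
analogy for the C change: add_sorted() and find() are hypothetical names, and
the tail shift that list.insert() performs here plays the role of the
element move in the array.

```python
import bisect


def add_sorted(names: list, name: str) -> None:
    """Insert name so that names stays sorted, without re-sorting."""
    pos = bisect.bisect_left(names, name)   # O(log n) search for the slot
    names.insert(pos, name)                 # O(n) shift of the tail


def find(names: list, name: str) -> bool:
    """Binary search that relies on names already being sorted."""
    pos = bisect.bisect_left(names, name)
    return pos < len(names) and names[pos] == name


dsos = []
for name in ["libc.so", "a.out", "vmlinux"]:
    add_sorted(dsos, name)
    # A find right after an add never needs to sort first.
    assert find(dsos, name)
print(dsos)
```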
Fixes: 3f4ac23a99 ("perf dsos: Switch backing storage to array from rbtree/list")
Reported-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Steinar Gunderson <sesse@google.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Matt Fleming <matt@readmodwrite.com>
Link: https://lore.kernel.org/r/20240703172117.810918-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The array is sorted, so just move the elements and insert in order.
Fixes: 13ca628716 ("perf comm: Add reference count checking to 'struct comm_str'")
Reported-by: Matt Fleming <matt@readmodwrite.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Matt Fleming <matt@readmodwrite.com>
Cc: Steinar Gunderson <sesse@google.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20240703172117.810918-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
This version addresses one issue:
- Fix updating TRL MSR after SST-TF is disabled in auto mode.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
When SST-TF is disabled in auto mode, the performance is getting
limited.
This is caused by wrong programming of Turbo Ratio Limit (TRL) MSR.
This MSR always accepts the frequency ratio in 100 MHz unit. When the
TPMI is sending TRL in 1 MHz unit, change to 100 MHz, before updating
TRL MSR.
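The unit conversion amounts to a division by 100. A minimal sketch, assuming
the caller already knows whether the values came in 1 MHz units;
trl_for_msr() and its in_mhz parameter are hypothetical and do not mirror
the tool's actual code:

```python
def trl_for_msr(trl_values: list, in_mhz: bool) -> list:
    """Return turbo ratio limits in the 100 MHz units the MSR expects."""
    if in_mhz:
        # 1 MHz units -> 100 MHz ratio, e.g. 3800 MHz becomes ratio 38.
        return [value // 100 for value in trl_values]
    return list(trl_values)


print(trl_for_msr([3800, 3600, 3400], in_mhz=True))
```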
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
- Fix unnecessary copy to 0 when kernel is booted at address 0.
- Fix usercopy crash when dumping dtl via debugfs.
- Avoid possible crash when PCI hotplug races with error handling.
- Fix kexec crash caused by scv being disabled before other CPUs call-in.
- Fix powerpc selftests build with USERCFLAGS set.
Thanks to: Anjali K, Ganesh Goudar, Gautam Menghani, Jinglin Wen, Nicholas
Piggin, Sourabh Jain, Srikar Dronamraju, Vishal Chourasia.
Merge tag 'powerpc-6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix unnecessary copy to 0 when kernel is booted at address 0
- Fix usercopy crash when dumping dtl via debugfs
- Avoid possible crash when PCI hotplug races with error handling
- Fix kexec crash caused by scv being disabled before other CPUs
call-in
- Fix powerpc selftests build with USERCFLAGS set
Thanks to Anjali K, Ganesh Goudar, Gautam Menghani, Jinglin Wen,
Nicholas Piggin, Sourabh Jain, Srikar Dronamraju, and Vishal Chourasia.
* tag 'powerpc-6.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
selftests/powerpc: Fix build with USERCFLAGS set
powerpc/pseries: Fix scv instruction crash with kexec
powerpc/eeh: avoid possible crash when edev->pdev changes
powerpc/pseries: Whitelist dtl slub object for copying to userspace
powerpc/64s: Fix unnecessary copy to 0 when kernel is booted at address 0
Currently building the powerpc selftests with USERCFLAGS set to anything
causes the build to break:
$ make -C tools/testing/selftests/powerpc V=1 USERCFLAGS=-Wno-error
...
gcc -Wno-error cache_shape.c ...
cache_shape.c:18:10: fatal error: utils.h: No such file or directory
18 | #include "utils.h"
| ^~~~~~~~~
compilation terminated.
This happens because the USERCFLAGS are added to CFLAGS in lib.mk, which
causes the check of CFLAGS in powerpc/flags.mk to skip setting CFLAGS at
all, resulting in none of the usual CFLAGS being passed. That can
be seen in the output above, the only flag passed to the compiler is
-Wno-error.
Fix it by dropping the conditional setting of CFLAGS in flags.mk.
Instead always set CFLAGS, but also append USERCFLAGS if they are set.
Note that appending to CFLAGS (with +=) wouldn't work, because flags.mk
is included by multiple Makefiles (to support partial builds), causing
CFLAGS to be appended to multiple times. Additionally that would place
the USERCFLAGS prior to the standard CFLAGS, meaning the USERCFLAGS
couldn't override the standard flags. Being able to override the
standard flags is desirable, for example for adding -Wno-error.
With the fix in place, the CFLAGS are set correctly, including the
USERCFLAGS:
$ make -C tools/testing/selftests/powerpc V=1 USERCFLAGS=-Wno-error
...
gcc -std=gnu99 -O2 -Wall -Werror -DGIT_VERSION='"v6.10-rc2-7-gdea17e7e56c3"'
-I/home/michael/linux/tools/testing/selftests/powerpc/include -Wno-error
cache_shape.c ...
Fixes: 5553a79387 ("selftests/powerpc: Add flags.mk to support pmu buildable")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240706120833.909853-1-mpe@ellerman.id.au
Add a test to verify sampling packets via psample works.
In order to do that, create a subcommand in ovs-dpctl.py to listen
on the psample multicast group and print samples.
Reviewed-by: Aaron Conole <aconole@redhat.com>
Tested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240704085710.353845-11-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The trunc action could be decoded but not parsed. Add
support for parsing the action string.
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240704085710.353845-10-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The userspace action lacks parsing support plus it contains a bug in the
name of one of its attributes.
This patch makes userspace action work.
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240704085710.353845-9-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add sample and psample action support to ovs-dpctl.py.
Refactor common attribute parsing logic into an external function.
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240704085710.353845-8-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The Makefile open-codes compiler invocations that ../lib.mk already
provides.
Avoid this by using a Make feature that allows setting per-target
variables, which in this case are: CFLAGS and LDFLAGS. This approach
generates the exact same compiler invocations as before, but removes all
of the code duplication, along with the quirky mangled variable names.
So now the Makefile is smaller, less unusual, and easier to read.
The new dependencies are listed after including lib.mk, in order to
let lib.mk provide the first target ("all:"), and are grouped together
with their respective source file dependencies, for visual clarity.
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
There were a couple of errors here:
1. TEST_GEN_PROGS was incorrectly prepending $(OUTPUT) to each program
to be built. However, lib.mk already does that because it assumes "bare"
program names are passed in, so this ended up creating
$(OUTPUT)/$(OUTPUT)/file.c, which of course won't work as intended.
2. lib.mk was included before TEST_GEN_PROGS was set, which led to
lib.mk's "all:" target not seeing anything to rebuild.
So nothing worked, which caused the author to force things by creating
an "all:" target locally--while still including ../lib.mk.
Fix all of this by including ../lib.mk at the right place, and removing
the $(OUTPUT) prefix to the programs to be built, and removing the
duplicate "all:" target.
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
When building with clang, via:
make LLVM=1 -C tools/testing/selftests
...there are several warnings, and an error. This fixes all of those and
allows these tests to run and pass.
1. Fix linker error (undefined reference to memcpy) by providing a local
version of memcpy.
2. clang complains about using this form:
if (g = h & 0xf0000000)
...so factor out the assignment into a separate step.
3. The code is passing a signed const char* to elf_hash(), which expects
a const unsigned char *. There are several callers, so fix this at
the source by allowing the function to accept a signed argument, and
then converting to unsigned operations, once inside the function.
4. clang doesn't have __attribute__((externally_visible)) and generates
a warning to that effect. Fortunately, gcc 12 and gcc 13 do not seem
to require that attribute in order to build, run and pass tests here,
so remove it.
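As an illustration of items 2 and 3, here is a minimal sketch of the
classic ELF symbol hash written in the style described above. It is not
copied from the selftest; the local variable names are assumptions.

```c
/*
 * Sketch of the classic ELF symbol hash. The parameter is a plain
 * (possibly signed) const char *, and the conversion to unsigned
 * happens once, inside the function.
 */
unsigned long elf_hash(const char *name)
{
	unsigned long h = 0, g;
	const unsigned char *uch_name = (const unsigned char *)name;

	while (*uch_name) {
		h = (h << 4) + *uch_name++;
		/* Assignment is a separate step, not part of the if(). */
		g = h & 0xf0000000;
		if (g)
			h ^= g >> 24;
		h &= ~g;
	}
	return h;
}
```

Callers can keep passing ordinary string literals, for example
elf_hash("__vdso_gettimeofday"), without a cast at each call site.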
Reviewed-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Edward Liaw <edliaw@google.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
* A fix for the CMODX example in the recently added icache flushing
prctl().
* A fix to the perf driver to avoid corrupting event data on counter
overflows when external overflow handlers are in use.
* A fix to clear all hardware performance monitor events on boot, to
avoid dangling events from firmware or previously booted kernels
triggering spuriously.
* A fix to the perf event probing logic to avoid erroneously reporting
the presence of unimplemented counters. This also prevents some
implemented counters from being reported.
* A build fix for the vector sigreturn selftest on clang.
* A fix to ftrace, which now requires the previously optional index
argument to ftrace_graph_ret_addr().
* A fix to avoid deadlocking if kexec crash handling triggers in an
interrupt context.
Merge tag 'riscv-for-linus-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for the CMODX example in the recently added icache flushing
prctl()
- A fix to the perf driver to avoid corrupting event data on counter
overflows when external overflow handlers are in use
- A fix to clear all hardware performance monitor events on boot, to
avoid dangling events from firmware or previously booted kernels
triggering spuriously
- A fix to the perf event probing logic to avoid erroneously reporting
the presence of unimplemented counters. This also prevents some
implemented counters from being reported
- A build fix for the vector sigreturn selftest on clang
- A fix to ftrace, which now requires the previously optional index
argument to ftrace_graph_ret_addr()
- A fix to avoid deadlocking if kexec crash handling triggers in an
interrupt context
* tag 'riscv-for-linus-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: kexec: Avoid deadlock in kexec crash path
riscv: stacktrace: fix usage of ftrace_graph_ret_addr()
riscv: selftests: Fix vsetivli args for clang
perf: RISC-V: Check standard event availability
drivers/perf: riscv: Reset the counter to hpmevent mapping while starting cpus
drivers/perf: riscv: Do not update the event data if uptodate
documentation: Fix riscv cmodx example
When building with clang, via:
make LLVM=1 -C tools/testing/selftests
...clang warns about mismatches between the expected and required
integer length being supplied to abs(3).
Fix this by using the correct variant of abs(3): labs(3) or llabs(3), in
these cases.
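A minimal sketch of the idea, using hypothetical helper names that do
not come from the selftests: abs(3) takes an int, so wider operands need
labs(3) or llabs(3) to avoid silent truncation.

```c
#include <stdlib.h>

/* For long operands, labs() is the matching variant. */
long delta_long(long a, long b)
{
	return labs(a - b);
}

/* For long long operands, llabs() is the matching variant. */
long long delta_ll(long long a, long long b)
{
	return llabs(a - b);
}
```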
Reviewed-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Commit 8043832e2a ("memblock: use numa_valid_node() helper to check
for invalid node ID") introduced a new helper, numa_valid_node(), which is
not defined in memblock tests.
Let's add it in the corresponding header file.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Mike Rapoport (IBM) <rppt@kernel.org>
Link: https://lore.kernel.org/r/20240624015432.31134-1-richard.weiyang@gmail.com
Signed-off-by: Mike Rapoport <rppt@kernel.org>
Add regression and new tests for the case where a hugepage has correctable
memory errors, covering how userspace wants to deal with it:
* if enable_soft_offline=1, mapped hugepage is soft offlined
* if enable_soft_offline=0, mapped hugepage is intact
The free hugepage case is not explicitly covered by the tests.
A hugepage with corrected memory errors is emulated with
MADV_SOFT_OFFLINE.
[jiaqiyan@google.com: v7]
Link: https://lkml.kernel.org/r/20240628205958.2845610-4-jiaqiyan@google.com
Link: https://lkml.kernel.org/r/20240626050818.2277273-4-jiaqiyan@google.com
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Acked-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Frank van der Linden <fvdl@google.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If CONFIG_PTE_MARKER_UFFD_WP is disabled, then we turn off three features
in userfaultfd_api (UFFD_FEATURE_WP_HUGETLBFS_SHMEM,
UFFD_FEATURE_WP_UNPOPULATED, and UFFD_FEATURE_WP_ASYNC).
Currently this test always calls uffdio_register with the flag
UFFDIO_REGISTER_MODE_WP. However, the kernel ensures in vma_can_userfault
that if the feature UFFD_FEATURE_WP_HUGETLBFS_SHMEM is disabled,
VM_UFFD_WP is only allowed on anonymous vmas, meaning our call to
uffdio_register will fail.
We still want to be able to run the test even if we have
CONFIG_PTE_MARKER_UFFD_WP disabled, so check to see if the feature
UFFD_FEATURE_WP_HUGETLBFS_SHMEM has been turned off in the test and if so,
avoid calling uffdio_register with the flag
UFFDIO_REGISTER_MODE_WP.
Link: https://lkml.kernel.org/r/20240626130513.120193-3-audra@redhat.com
Signed-off-by: Audra Mitchell <audra@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Rafael Aquini <raquini@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Now that we have updated userfaultfd_api to correctly return EINVAL when a
feature is requested but not available, let's fix the uffd-stress test to
only set the UFFD_FEATURE_WP_UNPOPULATED feature when the config is set.
In addition, still run the test if CONFIG_PTE_MARKER_UFFD_WP is not
set, just don't use the corresponding UFFD_FEATURE_WP_UNPOPULATED feature.
Link: https://lkml.kernel.org/r/20240626130513.120193-2-audra@redhat.com
Signed-off-by: Audra Mitchell <audra@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rafael Aquini <raquini@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
There's one fix for power management with Intel's e1000e here,
Thorsten tells us there's another problem that started in v6.9.
We're trying to wrap that up but I don't think it's blocking.
Current release - new code bugs:
- wifi: mac80211: disable softirqs for queued frame handling
- af_unix: fix uninit-value in __unix_walk_scc(), with the new garbage
collection algo
Previous releases - regressions:
- Bluetooth:
- qca: fix BT enable failure for QCA6390 after warm reboot
- add quirk to ignore reserved PHY bits in LE Extended Adv Report,
abused by some Broadcom controllers found on Apple machines
- wifi: wilc1000: fix ies_len type in connect path
Previous releases - always broken:
- tcp: fix DSACK undo in fast recovery to call tcp_try_to_open(),
avoid premature timeouts
- net: make sure skb_datagram_iter maps fragments page by page,
in case we somehow get compound highmem mixed in
- eth: bnx2x: fix multiple UBSAN array-index-out-of-bounds when
more queues are used
Misc:
- MAINTAINERS: Remembering Larry Finger
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge tag 'net-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, wireless and netfilter.
There's one fix for power management with Intel's e1000e here,
Thorsten tells us there's another problem that started in v6.9. We're
trying to wrap that up but I don't think it's blocking.
Current release - new code bugs:
- wifi: mac80211: disable softirqs for queued frame handling
- af_unix: fix uninit-value in __unix_walk_scc(), with the new
garbage collection algo
Previous releases - regressions:
- Bluetooth:
- qca: fix BT enable failure for QCA6390 after warm reboot
- add quirk to ignore reserved PHY bits in LE Extended Adv Report,
abused by some Broadcom controllers found on Apple machines
- wifi: wilc1000: fix ies_len type in connect path
Previous releases - always broken:
- tcp: fix DSACK undo in fast recovery to call tcp_try_to_open(),
avoid premature timeouts
- net: make sure skb_datagram_iter maps fragments page by page, in
case we somehow get compound highmem mixed in
- eth: bnx2x: fix multiple UBSAN array-index-out-of-bounds when more
queues are used
Misc:
- MAINTAINERS: Remembering Larry Finger"
* tag 'net-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
bnxt_en: Fix the resource check condition for RSS contexts
mlxsw: core_linecards: Fix double memory deallocation in case of invalid INI file
inet_diag: Initialize pad field in struct inet_diag_req_v2
tcp: Don't flag tcp_sk(sk)->rx_opt.saw_unknown for TCP AO.
selftests: make order checking verbose in msg_zerocopy selftest
selftests: fix OOM in msg_zerocopy selftest
ice: use proper macro for testing bit
ice: Reject pin requests with unsupported flags
ice: Don't process extts if PTP is disabled
ice: Fix improper extts handling
selftest: af_unix: Add test case for backtrack after finalising SCC.
af_unix: Fix uninit-value in __unix_walk_scc()
bonding: Fix out-of-bounds read in bond_option_arp_ip_targets_set()
net: rswitch: Avoid use-after-free in rswitch_poll()
netfilter: nf_tables: unconditionally flush pending work before notifier
wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference
wifi: iwlwifi: mvm: avoid link lookup in statistics
wifi: iwlwifi: mvm: don't wake up rx_sync_waitq upon RFKILL
wifi: iwlwifi: properly set WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK
wifi: wilc1000: fix ies_len type in connect path
...
Merge tag 'kselftest-fix-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull Kselftest fix from Mickaël Salaün:
"Fix Kselftests timeout.
We can't use CLONE_VFORK, since that blocks the parent - and thus the
timeout handling - until the child exits or execve's.
Go back to using plain fork()"
* tag 'kselftest-fix-2024-07-04' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
selftests/harness: Fix tests timeout and race condition
parallel_test() function in vringh_test needs to verify
the creation of the guest/host pipe.
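A minimal sketch of such a check, assuming two pipes (one per
direction). The helper name and the array names are illustrative and are
not taken from vringh_test.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: create both pipes and report failure instead
 * of continuing with invalid file descriptors.
 * Returns 0 on success, -1 on failure.
 */
int open_pipe_pair(int to_guest[2], int to_host[2])
{
	if (pipe(to_guest) < 0) {
		perror("pipe(to_guest)");
		return -1;
	}
	if (pipe(to_host) < 0) {
		perror("pipe(to_host)");
		/* Do not leak the first pipe on the error path. */
		close(to_guest[0]);
		close(to_guest[1]);
		return -1;
	}
	return 0;
}
```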
Signed-off-by: Yunseong Kim <yskelg@gmail.com>
Message-Id: <20240624174905.27980-2-yskelg@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We find that when lock debugging is on, notifications may not come in
order. Thus, gate the order-checking output behind cfg_verbose, to
avoid too much output in this case.
Fixes: 07b65c5b31 ("test: add msg_zerocopy test")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240701225349.3395580-3-zijianzhang@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In selftests/net/msg_zerocopy.c, a while loop keeps calling sendmsg on a
socket with the MSG_ZEROCOPY flag, and it only receives the notifications
once the socket is no longer writable. Typically, it starts receiving
after around 30+ sendmsgs. However, since the introduction of commit
dfa2f04833 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is
always writable and never gets a chance to receive notifications.
The selftest always exits with OUT_OF_MEMORY because the memory used by
opt_skb exceeds net.core.optmem_max. Meanwhile, net.core.optmem_max could
be set to a different value to trigger OOM on older kernels too.
Thus, we introduce "cfg_notification_limit" to force the sender to receive
notifications after some number of sendmsgs.
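The effect of such a limit can be modelled without sockets. The sketch
below is a simplified, hypothetical model and is not the selftest code:
the struct, its fields, and both functions are assumptions, with "limit"
standing in for cfg_notification_limit.

```c
/* Hypothetical model of a zerocopy sender with a notification limit. */
struct zc_sender {
	unsigned int limit;       /* stands in for cfg_notification_limit */
	unsigned int outstanding; /* sends whose notifications are unread */
	unsigned int max_seen;    /* high-water mark of outstanding */
};

/* Stand-in for reading completions from the socket error queue. */
static void drain_notifications(struct zc_sender *s)
{
	s->outstanding = 0;
}

/* Account for one sendmsg() and drain once the limit is reached. */
void send_one(struct zc_sender *s)
{
	s->outstanding++;
	if (s->outstanding > s->max_seen)
		s->max_seen = s->outstanding;
	if (s->limit && s->outstanding >= s->limit)
		drain_notifications(s);
}
```

With a non-zero limit, the number of unread notifications stays bounded
by the limit even if the socket never stops being writable.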
Fixes: 07b65c5b31 ("test: add msg_zerocopy test")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240701225349.3395580-2-zijianzhang@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzkaller reported a KMSAN splat in __unix_walk_scc() while backtracking
edge_stack after finalising SCC.
Let's add a test case exercising the path.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Link: https://patch.msgid.link/20240702160428.10153-2-syoshida@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A user could update the max_nr_regions parameter while DAMON is running
to a value that is smaller than the current number of regions that DAMON
is seeing. Such an update could be done to reduce the monitoring overhead.
In that case, DAMON should merge regions more aggressively than in the
normal situation to ensure the new limit is successfully applied.
Implement a kselftest to ensure that.
Link: https://lkml.kernel.org/r/20240625180538.73134-9-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Users can update DAMON parameters while it is running, using the 'commit'
DAMON sysfs interface command. For testing the feature in future tests,
implement a function for doing that on the test-purpose DAMON sysfs
interface wrapper Python module.
Link: https://lkml.kernel.org/r/20240625180538.73134-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Implement a kselftest for DAMON's {min,max}_nr_regions parameters. The
test ensures both the minimum and the maximum number of regions limits are
respected even if the workload's real number of regions is less than the
minimum or larger than the maximum limits.
Link: https://lkml.kernel.org/r/20240625180538.73134-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Implement DAMON stop function on the test-purpose DAMON sysfs interface
wrapper Python module, _damon_sysfs.py. This feature will be used by
future DAMON tests that need to start/stop DAMON multiple times.
Link: https://lkml.kernel.org/r/20240625180538.73134-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Implement a test for DAMOS tried regions command of DAMON sysfs interface.
It ensures the expected number of monitoring regions are created using an
artificial memory access pattern generator program.
Link: https://lkml.kernel.org/r/20240625180538.73134-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
To test the schemes_tried_regions feature, we need a program that has a
specific number of regions with different access patterns. The existing
artificial access pattern generator, 'access_memory', cannot be used for
this purpose, since it accesses only one region at a given time. Extending
it could be an option, but since the purpose and the implementation are
pretty simple, implementing another one from scratch is better.
Implement another artificial memory access program that allocates a
user-defined number and size of regions and accesses the even-numbered
regions.
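The core of such a program might look like the sketch below. The
function names are illustrative assumptions, not the names used in the
selftest, and the real program may structure its loop differently.

```c
#include <stdlib.h>
#include <string.h>

/* Allocate nr_regions zeroed buffers of sz_region bytes each. */
char **alloc_regions(int nr_regions, size_t sz_region)
{
	char **regions = calloc(nr_regions, sizeof(*regions));
	int i;

	if (!regions)
		return NULL;
	for (i = 0; i < nr_regions; i++) {
		regions[i] = calloc(1, sz_region);
		if (!regions[i]) {
			/* Undo the partial allocation. */
			while (i--)
				free(regions[i]);
			free(regions);
			return NULL;
		}
	}
	return regions;
}

/* Write to every even-numbered region; return how many were touched. */
int access_even_regions(char **regions, int nr_regions, size_t sz_region)
{
	int i, nr_accessed = 0;

	for (i = 0; i < nr_regions; i += 2) {
		memset(regions[i], i, sz_region);
		nr_accessed++;
	}
	return nr_accessed;
}
```

Calling access_even_regions() repeatedly keeps the even-numbered regions
hot while the odd-numbered ones stay idle, which gives the monitor two
distinct access patterns to tell apart.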
Link: https://lkml.kernel.org/r/20240625180538.73134-4-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Implement schemes_update_tried_regions DAMON sysfs command on
_damon_sysfs.py, to use on implementations of future tests for the
feature.
Link: https://lkml.kernel.org/r/20240625180538.73134-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "selftests/damon: test DAMOS tried regions and
{min,max}_nr_regions".
This patch series fixes a minor issue in a program for DAMON selftests,
and implements new functionality selftests for DAMOS tried regions and
{min,max}_nr_regions. The test for max_nr_regions also tests the recovery
from online tuning-caused limit violation, which was fixed by a previous
patch [1] titled "mm/damon/core: merge regions aggressively when
max_nr_regions is unmet".
The first patch fixes a minor problem in the artificial memory access
pattern generator for tests. The following 3 patches (2-4) implement the
schemes tried regions test. Then a couple of patches (5-6) implementing a
static setup based {min,max}_nr_regions functionality test follow. The
final two patches (7-8) implement the dynamic max_nr_regions update test.
[1] https://lore.kernel.org/20240624210650.53960C2BBFC@smtp.kernel.org
This patch (of 8):
'access_memory' is an artificial memory access pattern generator for DAMON
tests. It creates and accesses memory regions whose number and size the
user specifies via the command line. However, the part of the program
that actually accesses the memory ignores the user-specified size of each
region. Instead, it uses a hard-coded value, 10 MiB. Fix it to use the
user-defined size.
Note that all existing 'access_memory' users set the region size to
10 MiB. Hence no real problem has happened so far.
Link: https://lkml.kernel.org/r/20240625180538.73134-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20240625180538.73134-2-sj@kernel.org
Fixes: b5906f5f73 ("selftests/damon: add a test for update_schemes_tried_regions sysfs command")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Like for KASAN, it's useful to temporarily disable KMSAN checks around,
e.g., redzone accesses. Introduce kmsan_disable_current() and
kmsan_enable_current(), which are similar to their KASAN counterparts.
Make them reentrant in order to handle memory allocations in interrupt
context. Repurpose the allow_reporting field for this.
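Why a counter makes the disable/enable pair reentrant can be shown with
a small user-space model. This is only an illustration: the struct and
function names below are hypothetical and do not reflect the kernel's
actual data structures or implementation.

```c
/* Hypothetical per-context state: 0 means checks are active. */
struct checker_ctx {
	int depth;
};

/* Each disable call nests one level deeper. */
void checks_disable(struct checker_ctx *ctx)
{
	ctx->depth++;
}

/* Each enable call undoes exactly one disable call. */
void checks_enable(struct checker_ctx *ctx)
{
	ctx->depth--;
}

/* Checks run only when every disable has been balanced by an enable. */
int checks_enabled(const struct checker_ctx *ctx)
{
	return ctx->depth == 0;
}
```

With a plain boolean, an inner enable (for example from an allocation in
interrupt context) would switch checks back on while the outer section
still expects them to be off. The counter avoids that.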
Link: https://lkml.kernel.org/r/20240621113706.315500-12-iii@linux.ibm.com
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <kasan-dev@googlegroups.com>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This continues the work on getting the selftests to build without
requiring people to first run "make headers" [1].
Now that the system call numbers are in the correct, checked-in locations
in the kernel tree (./tools/include/uapi/asm/unistd*.h), make sure that
the mm selftests include that file (indirectly).
Doing so provides guaranteed definitions at build time, so remove all of
the checks for "ifdef __NR_xxx" in the mm selftests, because they will
always be true (defined).
[1] commit e076eaca59 ("selftests: break the dependency upon local
header files")
Link: https://lkml.kernel.org/r/20240618022422.804305-7-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Andrei Vagin <avagin@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
thuge-gen.c defines SHM_HUGE_* macros that are provided by the uapi since
4.14. These macros get redefined when compiling with Android's bionic
because its sys/shm.h will import the uapi definitions.
However if linux/shm.h is included, with glibc, sys/shm.h will clash on
some struct definitions:
/usr/include/linux/shm.h:26:8: error: redefinition of ‘struct shmid_ds’
26 | struct shmid_ds {
| ^~~~~~~~
In file included from /usr/include/x86_64-linux-gnu/bits/shm.h:45,
from /usr/include/x86_64-linux-gnu/sys/shm.h:30:
/usr/include/x86_64-linux-gnu/bits/types/struct_shmid_ds.h:24:8: note: originally defined here
24 | struct shmid_ds
| ^~~~~~~~
For now, guard the SHM_HUGE_* defines with ifndef to prevent redefinition
warnings on Android bionic.
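The guard pattern can be sketched as follows. The numeric values match
the uapi encoding (a log2 page size shifted by 26); the exact set of
macros that thuge-gen.c guards is not shown here and may differ.

```c
#include <sys/shm.h>

/* Define each macro only if the libc or uapi headers did not. */
#ifndef SHM_HUGE_SHIFT
#define SHM_HUGE_SHIFT	26
#endif
#ifndef SHM_HUGE_MASK
#define SHM_HUGE_MASK	0x3f
#endif
#ifndef SHM_HUGE_2MB
#define SHM_HUGE_2MB	(21 << SHM_HUGE_SHIFT)
#endif
#ifndef SHM_HUGE_1GB
#define SHM_HUGE_1GB	(30 << SHM_HUGE_SHIFT)
#endif
```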
Link: https://lkml.kernel.org/r/20240605223637.1374969-3-edliaw@google.com
Signed-off-by: Edward Liaw <edliaw@google.com>
Reviewed-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
thuge-gen defines MAP_HUGE_* macros that are provided by linux/mman.h
since 4.15. Remove the macros and include linux/mman.h instead.
Link: https://lkml.kernel.org/r/20240605223637.1374969-2-edliaw@google.com
Signed-off-by: Edward Liaw <edliaw@google.com>
Reviewed-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Bill Wendling <morbo@google.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
create_pagecache_thp_and_fd() in split_huge_page_test.c used the variable
dummy to perform mmap read.
However, this test was skipped even on XFS, which has large folio support.
The issue was that the compiler (gcc 13.2.0) was optimizing out the dummy
variable, and therefore not creating a huge page in the page cache.
Use the asm volatile() trick to force the compiler not to optimize out the
loop where we read from the mmaped addr. This is similar to what is being
done in other tests (cow.c, etc).
As the variable is now used in the asm statement, remove the unused
attribute.
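A minimal sketch of the trick, assuming a GCC-compatible compiler. The
function name and the read counter are illustrative additions and are
not part of the test; pagesize must be non-zero.

```c
#include <stddef.h>

/*
 * Read one byte per page and keep the compiler from discarding the
 * reads. Returns the number of reads performed.
 */
size_t read_every_page(const char *addr, size_t len, size_t pagesize)
{
	char dummy = 0;
	size_t i, nr_reads = 0;

	for (i = 0; i < len; i += pagesize) {
		dummy += addr[i];
		/*
		 * Empty asm with dummy as a read-write operand: the
		 * compiler must assume the value is used and changed
		 * here, so the load above cannot be optimized away.
		 */
		__asm__ volatile("" : "+r" (dummy));
		nr_reads++;
	}
	return nr_reads;
}
```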
Link: https://lkml.kernel.org/r/20240606203619.677276-1-kernel@pankajraghav.com
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Pankaj Raghav <p.raghav@samsung.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
On Ubuntu 23.04, the ksm and mdwe selftests/mm build fails because a few
items that are found in prctl.h are missing. Here is an excerpt of the
build failures:
ksm_tests.c:252:13: error: use of undeclared identifier 'PR_SET_MEMORY_MERGE'
...
mdwe_test.c:26:18: error: use of undeclared identifier 'PR_SET_MDWE'
mdwe_test.c:38:18: error: use of undeclared identifier 'PR_GET_MDWE'
Fix these errors by adding a new tools/include/uapi/linux/prctl.h . This
file was created by running "make headers", and then copying a snapshot
over from ./usr/include/linux/prctl.h, as per the approach we settled on
in [1].
[1] commit e076eaca59 ("selftests: break the dependency upon local
header files")
Link: https://lkml.kernel.org/r/20240618022422.804305-6-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrei Vagin <avagin@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
On Ubuntu 23.04, on a clean git tree, the selftests/mm build fails due to
10 or 20 missing items, all of which are found in fs.h, which is created
via "make headers". However, as per [1], the idea is to stop requiring
"make headers", and instead, take a snapshot of the files and check them in.
Here are a few of the build errors:
vm_util.c:34:21: error: variable has incomplete type 'struct pm_scan_arg'
struct pm_scan_arg arg;
...
vm_util.c:45:28: error: use of undeclared identifier 'PAGE_IS_WPALLOWED'
...
vm_util.c:55:21: error: variable has incomplete type 'struct page_region'
...
vm_util.c:105:20: error: use of undeclared identifier 'PAGE_IS_SOFT_DIRTY'
To fix this, add fs.h, taken from a snapshot of ./usr/include/linux/fs.h
after running "make headers".
[1] commit e076eaca59 ("selftests: break the dependency upon local
header files")
Link: https://lkml.kernel.org/r/20240618022422.804305-5-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrei Vagin <avagin@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Now that the test macros are factored out into their final location, and
simplified, it's time to rename TEST_END_CHECK to something that
represents its new functionality: REPORT_TEST_PASS.
Link: https://lkml.kernel.org/r/20240618022422.804305-4-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jeff Xu <jeffxu@chromium.org>
Tested-by: Jeff Xu <jeffxu@chromium.org>
Cc: Andrei Vagin <avagin@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Clean up and move some copy-pasted items into a new mseal_helpers.h.
1. The test macros can be made safer and simpler, by observing that
they are invariably called when about to return. This means that the
macros do not need an intrusive label to goto; they can simply return.
2. PKEY* items. We cannot, unfortunately, use pkey-helpers.h. The
best we can do is to factor out these few items into mseal_helpers.h.
3. These tests still need their own definition of u64, so also move
that to the header file.
4. Be sure to include the new mseal_helpers.h in the Makefile
dependencies.
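Item 1 can be illustrated with the sketch below. The macro names
CHECK_OR_RETURN and REPORT_PASS and the counters are hypothetical and
chosen for this example; they are not the macros defined in
mseal_helpers.h.

```c
#include <stdio.h>

static int nr_pass, nr_fail;

/* On failure, report and return from the calling test function. */
#define CHECK_OR_RETURN(cond)						\
	do {								\
		if (!(cond)) {						\
			printf("not ok: %s:%d\n", __func__, __LINE__);	\
			nr_fail++;					\
			return;						\
		}							\
	} while (0)

/* Reached only if no earlier check returned. */
#define REPORT_PASS()							\
	do {								\
		printf("ok: %s\n", __func__);				\
		nr_pass++;						\
	} while (0)

void test_positive(int value)
{
	CHECK_OR_RETURN(value > 0);
	REPORT_PASS();
}
```

Because the failing check returns directly, the test function needs no
label at its end and no goto, which is what keeps the macros
non-intrusive.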
[jhubbard@nvidia.com: include the new mseal_helpers.h in Makefile dependencies]
Link: https://lkml.kernel.org/r/01685978-f6b1-4c24-8397-22cd3c24b91a@nvidia.com
Link: https://lkml.kernel.org/r/20240618022422.804305-3-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Andrei Vagin <avagin@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "cleanups, fixes, and progress towards avoiding "make
headers"", v3.
Eventually, once the build succeeds on a sufficiently old distro, the idea
is to delete $(KHDR_INCLUDES) from the selftests/mm build, and then after
that, from selftests/lib.mk and all of the other selftest builds.
For now, this series merely achieves a clean build of selftests/mm on a
not-so-old distro: Ubuntu 23.04. In other words, after this series is
applied, it is possible to delete $(KHDR_INCLUDES) from
selftests/mm/Makefile and the build will still succeed.
1. Add tools/uapi/asm/unistd_[32|x32|64].h files, which include
definitions of __NR_mseal, and include them (indirectly) from the files
that use __NR_mseal. The new files are copied from ./usr/include/asm,
which is how we have agreed to do this sort of thing, see [1].
2. Add fs.h, similarly created: it was copied directly from a snapshot
of ./usr/include/linux/fs.h after running "make headers".
3. Add a few selected prctl.h values that the ksm and mdwe tests require.
4. Factor out some common code from mseal_test.c and seal_elf.c, into
a new mseal_helpers.h file.
5. Remove local __NR_* definitions and checks.
[1] commit e076eaca59 ("selftests: break the dependency upon local
header files")
This patch (of 6):
The selftests/mm build isn't exactly "broken", according to the current
documentation, which still claims that one must run "make headers",
before building the kselftests. However, according to the new plan to
get rid of that requirement [1], they are future-broken: attempting to
build selftests/mm *without* first running "make headers" will fail due
to not finding __NR_mseal.
Therefore, include asm-generic/unistd.h, which has all of the system
call numbers that are needed, abstracted across the various CPU arches.
Some explanation in support of this "asm-generic" approach:
For most user space programs, the header file inclusion behaves as per
this microblaze example, which comes from David Hildenbrand (thanks!):
arch/microblaze/include/asm/unistd.h
-> #include <uapi/asm/unistd.h>
arch/microblaze/include/uapi/asm/unistd.h
-> #include <asm/unistd_32.h>
-> Generated during "make headers"
usr/include/asm/unistd_32.h is generated via
arch/microblaze/kernel/syscalls/Makefile with the syshdr command.
So we never end up including asm-generic/unistd.h directly on
microblaze... [2]
However, those programs are installed on a single computer that has a
single set of asm and kernel headers installed.
In contrast, the kselftests are quite special, because they must
provide a set of user space programs that:
a) Mostly avoid using the installed (distro) system header files.
b) Build (and run) on all supported CPU architectures
c) Occasionally use symbols that are so new that they have not
yet been included in the distro's header files.
Doing (a) creates a new problem: how to get a set of cross-platform
headers that works in all cases.
Fortunately, asm-generic headers solve that one. Which is why we need
to use them here--at least, for particularly difficult headers such as
unistd.h.
The reason this hasn't really come up yet, is that until now, the
kselftests requirement (which I'm trying to eventually remove) was that
"make headers" must first be run. That allowed the selftests to get a
snapshot of sufficiently new header files that looked just like (and
conflicted with) the installed system headers.
And as an aside, this is also an improvement over past practices of
simply open-coding in a single (not per-arch) definition of a new
symbol, directly into the selftest code.
[1] commit e076eaca59 ("selftests: break the dependency upon local
header files")
[2] https://lore.kernel.org/all/0b152bea-ccb6-403e-9c57-08ed5e828135@redhat.com/
Link: https://lkml.kernel.org/r/20240618022422.804305-1-jhubbard@nvidia.com
Link: https://lkml.kernel.org/r/20240618022422.804305-2-jhubbard@nvidia.com
Fixes: 4926c7a52d ("selftest mm/mseal memory sealing")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Andrei Vagin <avagin@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit 1b151e2435 ("block: Remove special-casing of compound pages")
caused a change in behaviour when releasing the pages if the buffer does
not start at the beginning of the page. This was because the calculation
of the number of pages to release was incorrect. This was fixed by commit
38b43539d6 ("block: Fix page refcounts for unaligned buffers in
__bio_release_pages()").
We pin the user buffer during direct I/O writes. If this buffer is a
hugepage, bio_release_page() will unpin it and decrement all references
and pin counts at ->bi_end_io. However, if any references to the hugepage
remain post-I/O, the hugepage will not be freed upon unmap, leading to a
memory leak.
This patch verifies that a hugepage, used as a user buffer for DIO
operations, is correctly freed upon unmapping, regardless of whether the
offsets are aligned or unaligned w.r.t page boundary.
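The pass/fail criterion described above can be sketched as follows. This is a minimal Python illustration, not the selftest itself: the helper names are hypothetical, and it assumes the free huge page count is taken from the HugePages_Free field of /proc/meminfo-style text.

```python
def parse_free_hugepages(meminfo_text):
    """Return the HugePages_Free value from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("HugePages_Free:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("HugePages_Free not found")


def hugepages_freed(before_text, after_text):
    """True when the free count after munmap matches the count before allocation."""
    return parse_free_hugepages(after_text) == parse_free_hugepages(before_text)
```

A leaked reference shows up as a lower count after munmap (7 before, 6 after in the failing run below), so the comparison returns False.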
Test Result Fail Scenario (Without the fix)
--------------------------------------------------------
[]# ./hugetlb_dio
TAP version 13
1..4
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 1 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 2 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 3 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 6
not ok 4 : Huge pages not freed!
Totals: pass:3 fail:1 xfail:0 xpass:0 skip:0 error:0
Test Result PASS Scenario (With the fix)
---------------------------------------------------------
[]#./hugetlb_dio
TAP version 13
1..4
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 1 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 2 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 3 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 4 : Huge pages freed successfully !
Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
[donettom@linux.ibm.com: address review comments from Muhammad]
Link: https://lkml.kernel.org/r/20240604132801.23377-1-donettom@linux.ibm.com
[donettom@linux.ibm.com: add this test to run_vmtests.sh]
Link: https://lkml.kernel.org/r/20240607182000.6494-1-donettom@linux.ibm.com
Link: https://lkml.kernel.org/r/20240523063905.3173-1-donettom@linux.ibm.com
Fixes: 38b43539d6 ("block: Fix page refcounts for unaligned buffers in __bio_release_pages()")
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Co-developed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tony Battersby <tonyb@cybernetics.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Post FEAT_LPA2, the Aarch64 Linux kernel extends higher address support to
4K and 16K translation granules. To support testing this out, we need to
do away with static initialization of page size, while still maintaining
the nice array of testcases; this can be achieved by initializing and
populating the array as a stack variable, and filling in the page size and
hugepage size at runtime.
Link: https://lkml.kernel.org/r/20240522070435.773918-3-dev.jain@arm.com
Signed-off-by: Dev Jain <dev.jain@arm.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Restructure va_high_addr_switch".
The va_high_addr_switch memory selftest tests out some corner cases
related to allocation and page/hugepage faulting around the switch
boundary. Currently, the page size and hugepage size have been statically
defined. Post FEAT_LPA2, the Aarch64 Linux kernel adds support for 4k and
16k translation granules on higher addresses; we restructure the test to
support the same. In addition, we avoid invocation of the binary twice,
in the shell script, to reduce test noise.
This patch (of 2):
When invoking the binary with "--run-hugetlb" flag, the testcases
involving the base page are anyways going to be run. Therefore, remove
duplication by invoking the binary only once.
Link: https://lkml.kernel.org/r/20240522070435.773918-1-dev.jain@arm.com
Link: https://lkml.kernel.org/r/20240522070435.773918-2-dev.jain@arm.com
Signed-off-by: Dev Jain <dev.jain@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The openvswitch selftest is difficult to debug for anyone that isn't
directly familiar with the openvswitch module and the specifics of the
test cases. Many times when something fails, the debug log will be
sparsely populated and it takes some time to understand where a failure
occurred.
Increase the amount of details logged to the debug log by trapping all
'info' logs, and all 'ovs_sbx' commands.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240702132830.213384-4-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Previously, the openvswitch.sh test suites would not attempt to autoload
the openvswitch module. The idea was that a user who is manually running
tests might not even have the OVS module loaded or configured for their
own development. However, if the kernel module is configured, and the
module can be autoloaded then we should just attempt to load it and run
the tests. This is especially true in the CI environments, where the CI
tests should be able to rely on auto loading to get the test suite running.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240702132830.213384-3-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We found that since some tests rely on the TCP SYN timeouts to cause flow
misses, the default test suite timeout of 45 seconds is quick to be
exceeded. Bump the timeout to 15 minutes.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240702132830.213384-2-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As predicted by David, running the test on a machine with a single
interface is a bit unreliable. We try to send 20k packets with
iperf and expect fewer than 10k packets on the default context.
The test isn't very quick, iperf will usually send 100k packets
by the time we stop it. So we're off by 5x on the number of iperf
packets but still expect default context to only get the hardcoded
10k. The intent is to make sure we get noticeably less traffic
on the default context. Use half of the resulting iperf traffic
instead of the hard coded 10k.
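The change in the threshold can be illustrated with a small Python sketch. The function names and the constant are hypothetical and only model the comparison described above; they are not taken from the test source.

```python
HARDCODED_LIMIT = 10_000  # the old fixed bound on default-context packets


def default_ctx_ok_old(default_ctx_pkts):
    """Old check: compare against a fixed 10k regardless of traffic sent."""
    return default_ctx_pkts < HARDCODED_LIMIT


def default_ctx_ok_new(iperf_pkts, default_ctx_pkts):
    """New check: the default context must see under half of the iperf traffic."""
    return default_ctx_pkts < iperf_pkts / 2
```

With 100k iperf packets and 15k of them landing on the default context, the old check fails even though the default context received noticeably less traffic, while the new check passes.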
Link: https://patch.msgid.link/20240702233728.4183387-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
dsos__add would add at the end of the dso array possibly requiring a
later find to re-sort the array. Patterns of find then add were
becoming O(n*log n) due to the sorts. Change the add routine to be
O(n) rather than O(1) but to maintain the sorted-ness of the dsos
array so that later finds don't need the O(n*log n) sort.
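The trade-off can be sketched in Python. This is only an illustration of sorted insertion under the assumption that entries compare like plain strings; the real dsos array uses its own comparison, and the function names here are hypothetical.

```python
import bisect


def add_sorted(names, name):
    """Insert name so that names stays sorted: O(log n) search, O(n) shift."""
    bisect.insort(names, name)


def find_sorted(names, name):
    """Binary search; valid only because add_sorted keeps the list ordered."""
    i = bisect.bisect_left(names, name)
    return i < len(names) and names[i] == name
```

Because every add leaves the array ordered, a find never has to trigger a re-sort, so a find-then-add pattern costs O(n) per step instead of O(n*log n).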
Fixes: 3f4ac23a99 ("perf dsos: Switch backing storage to array from rbtree/list")
Reported-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Steinar Gunderson <sesse@google.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Matt Fleming <matt@readmodwrite.com>
Link: https://lore.kernel.org/r/20240703172117.810918-3-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The array is sorted, so just move the elements and insert in order.
Fixes: 13ca628716 ("perf comm: Add reference count checking to 'struct comm_str'")
Reported-by: Matt Fleming <matt@readmodwrite.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Matt Fleming <matt@readmodwrite.com>
Cc: Steinar Gunderson <sesse@google.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20240703172117.810918-2-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Clang does not support implicit LMUL in the vset* instruction sequences.
Introduce an explicit LMUL in the vsetivli instruction.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Fixes: 9d5328eeb1 ("riscv: selftests: Add signal handling vector tests")
Link: https://lore.kernel.org/r/20240702-fix_sigreturn_test-v1-1-485f88a80612@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The encoding of an x86 instruction can include a ModR/M and a SIB
(Scale-Index-Base) byte to describe the addressing mode of the
instruction.
objtool processes all addressing mode with a SIB base of 5 as having
%rbp as the base register. However, a SIB base of 5 means that the
effective address has either no base (if ModR/M mod is zero) or %rbp
as the base (if ModR/M mod is 1 or 2). This can cause objtool to confuse
an absolute address access with a stack operation.
For example, objtool will see the following instruction:
4c 8b 24 25 e0 ff ff mov 0xffffffffffffffe0,%r12
as a stack operation (i.e. similar to: mov -0x20(%rbp), %r12).
[Note that this kind of weird absolute address access is added by the
compiler when using KASAN.]
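The decoding rule can be sketched in Python. This is a simplified illustration rather than objtool's decoder: the function name is hypothetical, only SIB base field 5 is modelled, and REX.B is assumed to be clear.

```python
def sib_base(modrm, sib):
    """Return the base register for a ModR/M + SIB pair with SIB base 5.

    Returns None when the effective address has no base register.
    """
    mod = modrm >> 6
    rm = modrm & 0x7
    if mod == 3 or rm != 4:
        raise ValueError("no SIB byte follows this ModR/M byte")
    base = sib & 0x7
    if base != 5:
        raise ValueError("only SIB base 5 is modelled here")
    # mod == 0: disp32 with no base register; mod == 1 or 2: %rbp + disp
    return None if mod == 0 else "rbp"
```

For the instruction above, the ModR/M byte is 0x24 (mod 0, rm 4) and the SIB byte is 0x25 (base 5), so the sketch returns None: the access is absolute and not relative to %rbp.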
If this perceived stack operation happens to reference the location
where %r12 was pushed on the stack then the objtool validation will
think that %r12 is being restored and this can cause a stack state
mismatch.
This kind of behavior was seen in xfs code, after a minor change (convert
kmem_alloc() to kmalloc()):
>> fs/xfs/xfs.o: warning: objtool: xfs_da_grow_inode_int+0x6c1: stack state mismatch: reg1[12]=-2-48 reg2[12]=-1+0
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202402220435.MGN0EV6l-lkp@intel.com/
Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lore.kernel.org/r/20240620144747.2524805-1-alexandre.chartre@oracle.com
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
The help message mentions the main options as "actions", which is
different from the optional "options". But the check error messages
use "option" or "command" to refer to actions.
Make the error messages consistent with help.
Signed-off-by: Siddh Raman Pant <siddh.raman.pant@oracle.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Support building the C YNL userspace library into one big static file.
We can then link selftests against it for easy to use C netlink
interface.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20240628003253.1694510-14-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In the past, the exclude_guest setting has had no effect on Intel PT
tracing, but that may not be the case in the future.
Set the flag correctly based upon whether KVM is using Intel PT
"Host/Guest" mode, which is determined by the kvm_intel module
parameter pt_mode:
pt_mode=0 System-wide mode : host and guest output to host buffer
pt_mode=1 Host/Guest mode : host/guest output to host/guest
buffers respectively
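One way to read the mapping above is sketched below in Python. The function name is hypothetical, and two points are assumptions rather than statements from this message: that guest tracing is excluded exactly when pt_mode is 1 (Host/Guest mode), and that a missing parameter is treated as pt_mode=0.

```python
def exclude_guest_for(pt_mode_text):
    """Derive exclude_guest from the text of the kvm_intel pt_mode parameter.

    pt_mode_text is None when the parameter cannot be read; that case is
    treated as system-wide mode (an assumption of this sketch).
    """
    pt_mode = 0 if pt_mode_text is None else int(pt_mode_text.strip())
    # Host/Guest mode: guest output goes to guest buffers, so the host
    # trace does not contain it.
    return pt_mode == 1
```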
Fixes: 6e86bfdc4a ("perf intel-pt: Support decoding of guest kernel")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20240625104532.11990-3-adrian.hunter@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
aux_watermark is a u32. For a 64-bit size, cap the aux_watermark
calculation at UINT_MAX instead of truncating it to 32-bits.
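The difference between truncating and capping can be shown with a short Python sketch. The function names are hypothetical and the input value is an arbitrary example, not the actual watermark calculation.

```python
UINT_MAX = 0xFFFFFFFF


def watermark_truncated(value):
    """What storing a 64-bit value in a u32 does: keep only the low 32 bits."""
    return value & UINT_MAX


def watermark_capped(value):
    """Saturate at UINT_MAX instead of wrapping around."""
    return min(value, UINT_MAX)
```

For a value of 2**32 + 4096, truncation yields 4096, a far smaller watermark than intended, whereas capping yields UINT_MAX.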
Fixes: 874fc35cdd ("perf intel-pt: Use aux_watermark")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Link: https://lore.kernel.org/r/20240625104532.11990-2-adrian.hunter@intel.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Invoking the tool built with the default settings fails:
$ cpupower
cpupower: error while loading shared libraries: libcpupower.so.1: cannot
open shared object file: No such file or directory
The issue is that the Makefile puts the library in the "/usr/lib64" dir on
a 64-bit machine. This is wrong. According to the "Filesystem Hierarchy
Standard" specification:
https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard
https://refspecs.linuxfoundation.org/FHS_3.0/fhs-3.0.pdf
"/usr/lib<qual>" dirs are intended for alternative-format libraries
(e.g., "/usr/lib32" for 32-bit libraries on a 64-bit machine (optional)).
The utility is built for the current machine and doesn't handle
'CROSS_COMPILE' and 'ARCH' env variables. It also doesn't change bit
depth. So the result is always the same - binary for x86_64
architecture. Therefore the library should be put in the '/usr/lib'
dir regardless of the build options.
This is the case for all the distros that comply with the
"Filesystem Hierarchy Standard 3.0" by the Linux Foundation. Most distros
comply with it. For example, one can check this by examining the
"/usr/lib64" dir on Debian-based distros and finding that it contains only
"/usr/lib64/ld-linux-x86-64.so.2", and that "/usr/lib" contains
both 32-bit and 64-bit code:
find /usr/lib -name "*.so*" -type f | xargs file | grep 32-bit
find /usr/lib -name "*.so*" -type f | xargs file | grep 64-bit
Fix the issue by changing library destination dir to "/usr/lib".
Signed-off-by: Roman Storozhenko <romeusmeister@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
This kselftest fixes update for Linux 6.10-rc7 consists of one single
patch to fix the non-contiguous CBM resctrl:
- AMD supports non-contiguous CBM but does not report it via CPUID. This
test should not use CPUID on AMD to detect non-contiguous CBM support.
Fix the problem so the test uses CPUID to discover non-contiguous CBM
support only on Intel.
Merge tag 'linux_kselftest-fixes-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"One single patch to fix the non-contiguous CBM resctrl:
- AMD supports non-contiguous CBM but does not report it via CPUID.
This test should not use CPUID on AMD to detect non-contiguous CBM
support. Fix the problem so the test uses CPUID to discover
non-contiguous CBM support only on Intel"
* tag 'linux_kselftest-fixes-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/resctrl: Fix non-contiguous CBM for AMD
cxl_event_common was an unfortunate naming choice and caused confusion with
the existing Common Event Record. Furthermore, its fields didn't map all
the common information between DRAM and General Media Events.
Remove cxl_event_common and introduce cxl_event_media_hdr to record common
information between DRAM and General Media events.
cxl_event_media_hdr, which is embedded in both cxl_event_gen_media and
cxl_event_dram, leverages the commonalities between the two events to
simplify their respective handling.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240607144423.48681-1-fabio.m.de.francesco@linux.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Merge fixes and updates in v6.10 into perf-tools-next to resolve changes
in synthesizing the LOST_SAMPLES records and build fixes.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Check that __sync_*() functions don't cause kernel panics when handling
freed arena pages.
x86_64 does not support some arena atomics yet, and aarch64 may or may
not support them, based on the availability of LSE atomics at run time.
Do not enable this test for these architectures for simplicity.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240701234304.14336-12-iii@linux.ibm.com
While clang uses __attribute__((address_space(1))) both for defining
arena pointers and arena globals, GCC requires different syntax for
both. While __arena covers the first use case, introduce __arena_global
to cover the second one.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240701234304.14336-11-iii@linux.ibm.com
As Quentin said [0], BPF map pinning will fail if the pinmaps path is not
under the bpffs, like:
libbpf: specified path /home/ubuntu/test/sock_ops_map is not on BPF FS
Error: failed to pin all maps
[0] https://github.com/libbpf/bpftool/issues/146
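The condition behind the libbpf error can be sketched in Python. This only illustrates the "is this path on a BPF filesystem" check by matching the path against /proc/mounts-style text; it is not how bpftool or libbpf implement it, and the function names are hypothetical.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount that contains path."""
    path = os.path.normpath(path)
    best_len, best_type = -1, None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        if mount_point == "/":
            matches = True
        else:
            matches = path == mount_point or path.startswith(mount_point + "/")
        # Prefer the longest mount point; later entries win ties.
        if matches and len(mount_point) >= best_len:
            best_len, best_type = len(mount_point), fs_type
    return best_type


def is_on_bpffs(path, mounts_text):
    """True when path lives under a mount of type bpf."""
    return fs_type_for(path, mounts_text) == "bpf"
```

With a bpffs mounted at /sys/fs/bpf, a pin path such as /home/ubuntu/test/sock_ops_map resolves to the root filesystem and fails the check, which matches the error shown above.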
Fixes: 3767a94b32 ("bpftool: add pinmaps argument to the load/loadall")
Signed-off-by: Tao Chen <chen.dylane@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Quentin Monnet <qmo@kernel.org>
Reviewed-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20240702131150.15622-1-chen.dylane@gmail.com
Add selftests for both atomic replace and non atomic replace
livepatches. The result is as follows,
TEST: sysfs test ... ok
TEST: sysfs test object/patched ... ok
TEST: sysfs test replace enabled ... ok
TEST: sysfs test replace disabled ... ok
Suggested-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Tested-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lore.kernel.org/r/20240625151123.2750-3-laoar.shao@gmail.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Add testcase where 7th argument is struct for architectures with 8 argument
registers, and increase the complexity of the struct.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240702121944.1091530-4-pulehui@huaweicloud.com
Factor out many args tests from tracing_struct and rename some function names
to make more sense. Meanwhile, remove unnecessary skeleton detach operation
as it will be covered by skeleton destroy operation.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240702121944.1091530-3-pulehui@huaweicloud.com
Introduce dynamic adjustment capabilities for fill_size and comp_size
parameters to support larger batch sizes beyond the previous 2K limit.
Update HW_SW_MAX_RING_SIZE test cases to evaluate AF_XDP's robustness by
pushing hardware and software ring sizes to their limits. This test
ensures AF_XDP's reliability amidst potential producer/consumer throttling
due to maximum ring utilization.
Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20240702055916.48071-3-tushar.vyavahare@intel.com
Previously, HW_SW_MIN_RING_SIZE and HW_SW_MAX_RING_SIZE test cases were
not validating Tx/Rx traffic at all due to early return after changing HW
ring size in testapp_validate_traffic().
Fix the flow by checking return value of set_ring_size() and act upon it
rather than terminating the test case there.
Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20240702055916.48071-2-tushar.vyavahare@intel.com
- Fix no cxl_nvd during pmem region auto-assemble
- Avoid NULL pointer dereference in region lookup
- Add missing checks to interleave capability
- Add cxl kdoc fix to address document compilation error
Merge tag 'cxl-fixes-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fixes from Dave Jiang:
- Fix no cxl_nvd during pmem region auto-assemble
- Avoid NULL pointer dereference in region lookup
- Add missing checks to interleave capability
- Add cxl kdoc fix to address document compilation error
* tag 'cxl-fixes-6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl: documentation: add missing files to cxl driver-api
cxl/region: check interleave capability
cxl/region: Avoid null pointer dereference in region lookup
cxl/mem: Fix no cxl_nvd during pmem region auto-assembling
Coverity points out that after calling btf__new_empty_split() the wrong
value is checked for error.
Fixes: 58e185a0dc ("libbpf: Add btf__distill_base() creating split BTF with distilled base BTF")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240629100058.2866763-1-alan.maguire@oracle.com
In the same way as commit ae7487d112 ("selftests/hid: ensure we can
compile the tests on kernels pre-6.3"), we should expose struct hid_bpf_ops
when it's not available in vmlinux.h.
So hide any existing struct hid_bpf_ops, include vmlinux.h, and
re-export struct hid_bpf_ops.
Fixes: d7696738d6 ("selftests/hid: convert the hid_bpf selftests with struct_ops")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202406270328.bscLN1IF-lkp@intel.com/
Link: https://patch.msgid.link/20240701-fix-cki-v2-1-20564e2e1393@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Building the sigaltstack test with GCC on 64-bit powerpc errors with:
gcc -Wall sas.c -o /home/michael/linux/.build/kselftest/sigaltstack/sas
In file included from sas.c:23:
current_stack_pointer.h:22:2: error: #error "implement current_stack_pointer equivalent"
22 | #error "implement current_stack_pointer equivalent"
| ^~~~~
sas.c: In function ‘my_usr1’:
sas.c:50:13: error: ‘sp’ undeclared (first use in this function); did you mean ‘p’?
50 | if (sp < (unsigned long)sstack ||
| ^~
This happens because GCC doesn't define __ppc__ for 64-bit builds, only
32-bit builds. Instead use __powerpc__ to detect powerpc builds, which
is defined by clang and GCC for 64-bit and 32-bit builds.
Fixes: 05107edc91 ("selftests: sigaltstack: fix -Wuninitialized")
Cc: stable@vger.kernel.org # v6.3+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240520062647.688667-1-mpe@ellerman.id.au
objtool complains:
arch/x86/kvm/kvm.o: warning: objtool: .altinstr_replacement+0xc5: call without frame pointer save/setup
vmlinux.o: warning: objtool: .altinstr_replacement+0x2eb: call without frame pointer save/setup
Make sure %rSP is an output operand to the respective asm() statements.
The test_cc() hunk and ALT_OUTPUT_SP() courtesy of peterz. Also from him
add some helpful debugging info to the documentation.
Now on to the explanations:
tl;dr: The alternatives macros are pretty fragile.
If I do ALT_OUTPUT_SP(output) in order to be able to package in a %rsp
reference for objtool so that a stack frame gets properly generated, the
inline asm input operand with positional argument 0 in clear_page():
"0" (page)
gets "renumbered" due to the added
: "+r" (current_stack_pointer), "=D" (page)
and then gcc says:
./arch/x86/include/asm/page_64.h:53:9: error: inconsistent operand constraints in an ‘asm’
The fix is to use an explicit "D" constraint which points to a singleton
register class (gcc terminology) which ends up doing what is expected
here: the page pointer - input and output - should be in the same %rdi
register.
Other register classes have more than one register in them - example:
"r" and "=r" or "A":
‘A’
The ‘a’ and ‘d’ registers. This class is used for
instructions that return double word results in the ‘ax:dx’
register pair. Single word values will be allocated either in
‘ax’ or ‘dx’.
so using "D" and "=D" just works in this particular case.
And yes, one would say, sure, why don't you do "+D" but then:
: "+r" (current_stack_pointer), "+D" (page)
: [old] "i" (clear_page_orig), [new1] "i" (clear_page_rep), [new2] "i" (clear_page_erms),
: "cc", "memory", "rax", "rcx")
now find the Waldo^Wcomma which throws a wrench into all this.
Because that silly macro has an "input..." consume-all last macro arg
and in it, one is supposed to supply input *and* clobbers, leading to
silly syntax snafus.
Yap, they need to be cleaned up, one fine day...
Closes: https://lore.kernel.org/oe-kbuild-all/202406141648.jO9qNGLa-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240625112056.GDZnqoGDXgYuWBDUwu@fat_crate.local
This is a sloppy logic analyzer using GPIOs. It comes with a script to
isolate a CPU for polling. While this is definitely not a production
level analyzer, it can be a helpful first view when remote debugging.
Read the documentation for details.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20240620094159.6785-2-wsa+renesas@sang-engineering.com
[Bartosz: moved the Kconfig entry into a different category]
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Merge tag 'nf-next-24-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next into main
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for net-next:
Patches #1 to #11 shrink memory consumption for transaction objects:
struct nft_trans_chain { /* size: 120 (-32), cachelines: 2, members: 10 */
struct nft_trans_elem { /* size: 72 (-40), cachelines: 2, members: 4 */
struct nft_trans_flowtable { /* size: 80 (-48), cachelines: 2, members: 5 */
struct nft_trans_obj { /* size: 72 (-40), cachelines: 2, members: 4 */
struct nft_trans_rule { /* size: 80 (-32), cachelines: 2, members: 6 */
struct nft_trans_set { /* size: 96 (-24), cachelines: 2, members: 8 */
struct nft_trans_table { /* size: 56 (-40), cachelines: 1, members: 2 */
struct nft_trans_elem can now be allocated from kmalloc-96 instead of
kmalloc-128 slab.
Series from Florian Westphal. For the record, I have mangled patch #1
to add nft_trans_container_*() and use it for every transaction object.
I have also added BUILD_BUG_ON to ensure struct nft_trans always comes
at the beginning of the container transaction object. And a few minor
cleanups; any new bugs are my own.
Patch #12 simplifies the check for SCTP GSO in IPVS, from Ismael Luceno.
Patch #13 nf_conncount key length remains in the u32 bound, from Yunjian Wang.
Patch #14 removes unnecessary check for CTA_TIMEOUT_L3PROTO when setting
default conntrack timeouts via nfnetlink_cttimeout API, from
Lin Ma.
Patch #15 updates NFT_SECMARK_CTX_MAXLEN to 4096, SELinux could use
larger secctx names than the existing 256 bytes length.
Patch #16 adds a selftest to exercise nfnetlink_queue listeners leaving
nfnetlink_queue, from Florian Westphal.
Patch #17 increases hitcount from 255 to 65535 in xt_recent, from Phil Sutter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a protocol spec for tcp_metrics, so that it's accessible via YNL.
Useful at the very least for testing fixes.
In this episode of "10,000 ways to complicate netlink" the metric
nest has defines which are off by 1. iproute2 does:
struct rtattr *m[TCP_METRIC_MAX + 1 + 1];
parse_rtattr_nested(m, TCP_METRIC_MAX + 1, a);
for (i = 0; i < TCP_METRIC_MAX + 1; i++) {
// ...
attr = m[i + 1];
This is too weird to support in YNL, add a new set of defines
with _correct_ values to the official kernel header.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nolibc gained an implementation of strerror() recently.
Use it and drop the ifdeffery.
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
strerror() is commonly used.
For example in kselftest which currently needs to do an #ifdef NOLIBC to
handle the lack of strerror().
Keep it simple and reuse the output format of perror() for strerror().
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Some tests only make sense on nolibc. To avoid gaps in the test numbers
due to inline "#ifdef NOLIBC", add a condition to formally skip these
tests.
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
The implementation always works on uintmax_t values.
This is inefficient when only 32 bits are needed.
However, across all functions this only happens for strtol() on 32-bit
platforms.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20240425-nolibc-strtol-v1-2-bfeef7846902@weissschuh.net
run-tests.sh hides the output from the compiler unless the compilation
fails. To recognize newly introduced warnings use -Werror by default.
Also add a switch to disable -Werror in case the warnings are expected.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20240423-nolibc-werror-v1-1-e6f0bd66eb45@weissschuh.net
On musl calls to brk() and sbrk() always fail with ENOMEM.
Detect this and skip the tests on musl.
Tested on glibc 2.39 and musl 1.2.5 in addition to nolibc.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Willy Tarreau <w@1wt.eu>
Link: https://lore.kernel.org/r/20240424-nolibc-musl-brk-v1-1-b49882dd9a93@weissschuh.net
Userspace builds of the radix-tree testing suite fail because of the patch
"KUnit: add missing MODULE_DESCRIPTION() macros for lib/test_*.ko". Add the
proper defines to tools/testing/radix-tree/idr-test.c so
MODULE_DESCRIPTION has a definition. This allows the build to succeed.
Link: https://lkml.kernel.org/r/20240626232100.306130-1-sidhartha.kumar@oracle.com
Fixes: f069e33daf ("KUnit: add missing MODULE_DESCRIPTION() macros for lib/test_*.ko")
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Jeff Johnson <quic_jjohnson@quicinc.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This implements what I was describing in [1]. When writing a test, the
author can schedule cleanup / undo actions right after the creation
completes, e.g.:
cmd("touch /tmp/file")
defer(cmd, "rm /tmp/file")
defer() takes the function name as first argument, and the rest are
arguments for that function. defer()red functions are called in
inverse order after the test exits. It's also possible to capture them
and execute earlier (in which case they get automatically de-queued).
undo = defer(cmd, "rm /tmp/file")
# ... some unsafe code ...
undo.exec()
As a nice safety measure, all exceptions from defer()ed calls are captured,
printed, and ignored (they do make the test fail, however).
This addresses the common problem of exceptions in cleanup paths
often being unhandled, leading to potential leaks.
There is a global action queue, flushed by ksft_run(). We could support
function level defers too, I guess, but there's no immediate need.
Link: https://lore.kernel.org/all/877cedb2ki.fsf@nvidia.com/ # [1]
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20240627185502.3069139-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Exception handlers print the result and use continue
to skip the non-exception result printing. This makes
inserting common post-test code hard. Refactor to
avoid the continues and have only one ktap_result() call.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20240627185502.3069139-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Extend the existing test to exercise UDP GSO egress through devices with
various offload capabilities, including lack of checksum offload, which is
the default case for TUN/TAP devices.
Test against a dummy device because it is simpler to set up than TUN/TAP.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240626-linux-udpgso-v2-2-422dfcbd6b48@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test if KVM emulates the APIC bus clock at the expected frequency when
userspace configures the frequency via KVM_CAP_X86_APIC_BUS_CYCLES_NS.
Set APIC timer's initial count to the maximum value and busy wait for 100
msec (largely arbitrary) using the TSC. Read the APIC timer's "current
count" to calculate the actual APIC bus clock frequency based on TSC
frequency.
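The calculation described above can be sketched roughly as below. This is an
illustration under assumptions, not the selftest's code: apic_bus_hz() and
its parameters are hypothetical names, the divide parameter stands in for
the APIC timer's divide configuration, and the numbers are made up.

```python
def apic_bus_hz(initial_count, current_count, tsc_start, tsc_end, tsc_khz, divide=1):
    """Estimate the APIC bus clock from a TSC-timed busy wait (sketch)."""
    # The APIC timer counts down, so elapsed ticks = initial - current.
    apic_ticks = (initial_count - current_count) * divide
    tsc_cycles = tsc_end - tsc_start
    tsc_hz = tsc_khz * 1000
    # bus_hz = apic_ticks / elapsed_seconds, kept in integer arithmetic.
    return apic_ticks * tsc_hz // tsc_cycles


# Made-up example: 2 GHz TSC, 100 msec busy wait, 40 ns per bus cycle.
cycles_ns = 40
expected_hz = 10**9 // cycles_ns
measured_hz = apic_bus_hz(
    initial_count=0xFFFFFFFF,
    current_count=0xFFFFFFFF - 2_500_000,
    tsc_start=0,
    tsc_end=200_000_000,
    tsc_khz=2_000_000,
)
```

A real test would compare measured_hz against expected_hz with some
tolerance, since the busy wait and the counter reads are not instantaneous.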
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/2fccf35715b5ba8aec5e5708d86ad7015b8d74e6.1718214999.git.reinette.chatre@intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Execution of the 'make' command in the 'bench' subfolder causes the
following error:
$ make O=cpupower/build/ DESTDIR=cpupower/install/ -j8
" CC " cpupower/build//main.o
" CC " cpupower/build//parse.o
/bin/sh: 1: " CC "cpupower/build//system.o
CC : not found
make: *** [Makefile:21: cpupower/build//main.o] Error 127
make: *** Waiting for unfinished jobs....
/bin/sh: 1: CC : not found
/bin/sh: 1: CC : not found
make: *** [Makefile:21: cpupower/build//parse.o] Error 127
make: *** [Makefile:21: cpupower/build//system.o] Error 127
The makefile uses variables defined in the main project makefile and it
is not intended to run standalone. The reason is that 'bench' subproject
depends on the 'libcpupower' library, see the 'compile-bench' target in
the main makefile.
Add a check that prevents standalone execution of the 'bench' makefile.
Signed-off-by: Roman Storozhenko <romeusmeister@gmail.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Currently, the -r/--repeat option accepts values from 0 and complains
for -1. The help section specifies:
-r, --repeat <n> repeat the workload replay N times (-1: infinite)
The -r -1 option raises an error because replay_repeat is defined as
an unsigned int.
In the current implementation, the workload is repeated n times when
-r <n> is used, except when n is 0.
When -r is set to 0, the workload is also repeated once. This happens
because when -r=0, the run_one_test function is not called. (Note that
mutex unlocking, which is essential for child threads spawned to emulate
the workload, happens in run_one_test.) However, mutex unlocking is
still performed in the destroy_tasks function. Thus, -r=0 results in the
workload running once coincidentally.
To clarify and maintain the existing logic for -r >= 1 (which runs the
workload the specified number of times) and to fix the issue with infinite
runs, make -r=0 perform an infinite run.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20240628071821.15264-1-vineethr@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Add udelay() for x86 tests to allow busy waiting in the guest for a
specific duration, and to match ARM and RISC-V's udelay() in the hopes
of eventually making udelay() available on all architectures.
Get the guest's TSC frequency using KVM_GET_TSC_KHZ and expose it to all
VMs via a new global, guest_tsc_khz. Assert that KVM_GET_TSC_KHZ returns
a valid frequency, instead of simply skipping tests, which would require
detecting which tests actually need/want udelay(). KVM hasn't returned an
error for KVM_GET_TSC_KHZ since commit cc578287e3 ("KVM: Infrastructure
for software and hardware based TSC rate scaling"), which predates KVM
selftests by 6+ years (KVM_GET_TSC_KHZ itself predates KVM selftest by 7+
years).
Note, if the GUEST_ASSERT() in udelay() somehow fires and the test doesn't
check for guest asserts, then the test will fail with a very cryptic
message. But fixing that, e.g. by automatically handling guest asserts,
is a much larger task, and practically speaking the odds of a test running
afoul of this wart are infinitesimally small.
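As a rough sketch of how a TSC-based busy wait can be derived from a
frequency in kHz (an illustration only; udelay_cycles(), the read_tsc
callback and the fake counter are hypothetical, and the actual x86 helper
may compute this differently):

```python
import itertools


def udelay_cycles(usec, tsc_khz):
    # tsc_khz is cycles per millisecond, so divide by 1000 for microseconds.
    return usec * tsc_khz // 1000


def udelay(usec, tsc_khz, read_tsc):
    # Mirrors the idea of asserting a valid frequency instead of skipping.
    assert tsc_khz > 0, "TSC frequency must be known"
    target = udelay_cycles(usec, tsc_khz)
    start = read_tsc()
    while read_tsc() - start < target:
        pass


# Fake TSC that advances 500 cycles per read, for demonstration only.
fake_ticks = itertools.count(0, 500)
udelay(1, 2_000_000, lambda: next(fake_ticks))
```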
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/r/5aa86285d1c1d7fe1960e3fe490f4b22273977e6.1718214999.git.reinette.chatre@intel.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
It didn't use the field separator passed with the -x option when printing
the metric headers; it always put "," between the fields.
Before:
$ sudo ./perf stat -a -x : --per-core -M tma_core_bound --metric-only true
core,cpus,% tma_core_bound: <<<--- here: "core,cpus," but ":" expected
S0-D0-C0:2:10.5:
S0-D0-C1:2:14.8:
S0-D0-C2:2:9.9:
S0-D0-C3:2:13.2:
After:
$ sudo ./perf stat -a -x : --per-core -M tma_core_bound --metric-only true
core:cpus:% tma_core_bound:
S0-D0-C0:2:10.5:
S0-D0-C1:2:15.0:
S0-D0-C2:2:16.5:
S0-D0-C3:2:12.5:
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20240628000604.1296808-2-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The new --per-cluster option was added recently but it forgot to update
the aggr_header fields which are used for --metric-only option. And it
resulted in a segfault due to NULL string in fputs().
Fixes: cbc917a1b0 ("perf stat: Support per-cluster aggregation")
Reviewed-by: Yicong Yang <yangyicong@hisilicon.com>
Tested-by: Yicong Yang <yangyicong@hisilicon.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20240628000604.1296808-1-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Ensure that a dead thread leader doesn't prevent installing new filters
with SECCOMP_FILTER_FLAG_TSYNC from other threads.
Signed-off-by: Andrei Vagin <avagin@google.com>
Link: https://lore.kernel.org/r/20240628021014.231976-5-avagin@google.com
Reviewed-by: Tycho Andersen <tandersen@netflix.com>
Signed-off-by: Kees Cook <kees@kernel.org>
Add a new test case to check that SECCOMP_IOCTL_NOTIF_RECV returns when all
tasks have gone.
Signed-off-by: Andrei Vagin <avagin@google.com>
Link: https://lore.kernel.org/r/20240628021014.231976-4-avagin@google.com
Reviewed-by: Tycho Andersen <tandersen@netflix.com>
Signed-off-by: Kees Cook <kees@kernel.org>
commit a9af47e382 ("iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP")
added tests covering edge cases in the boundaries of the iova bitmap. However,
it sized the buffers in terms of PAGE_SIZE (4K) as opposed to the
MOCK_PAGE_SIZE (2K) that is used in iommufd mock selftests. This meant that
it wasn't correctly exercising everything, specifically the u32 and 4K bitmap
test cases. Fix the selftests' buffer sizes to be based on the mock page size.
Link: https://lore.kernel.org/r/20240627110105.62325-5-joao.m.martins@oracle.com
Reported-by: Kevin Tian <kevin.tian@intel.com>
Closes: https://lore.kernel.org/linux-iommu/96efb6cf-a41c-420f-9673-2f0b682cac8c@oracle.com/
Fixes: a9af47e382 ("iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Matt Ochs <mochs@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add more tests for bitmaps smaller than or equal to a u8, though skip the
tests if the IOVA buffer size is smaller than the mock page size.
Link: https://lore.kernel.org/r/20240627110105.62325-4-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Matt Ochs <mochs@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
With 64k base pages, the first 128k iova length test requires less than a
byte for a bitmap, exposing a bug in the tests that assume that bitmaps are
at least a byte.
Rather than dealing with bytes, have _test_mock_dirty_bitmaps() pass the
number of bits. The caller functions are adjusted to use bits as well,
converting to bytes when clearing, allocating and freeing the bitmap.
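The arithmetic behind the bug can be sketched as follows. The helper names
are hypothetical and the page size is treated as a plain parameter (the
mock page size detail is ignored), so this only illustrates why counting
in bytes breaks down for small bitmaps:

```python
def bitmap_nbits(iova_length, page_size):
    # One dirty bit per page covered by the IOVA range.
    return iova_length // page_size


def bitmap_nbytes(nbits):
    # Round up so that a bitmap of fewer than 8 bits still gets one byte.
    return (nbits + 7) // 8


KIB = 1024
nbits_64k = bitmap_nbits(128 * KIB, 64 * KIB)   # 128k range, 64k pages
truncated_bytes = nbits_64k // 8                # byte-based sizing rounds down
rounded_bytes = bitmap_nbytes(nbits_64k)        # bit-based sizing rounds up
```

Passing the bit count around and rounding up only at allocation time
avoids the zero-sized bitmap that the truncating division produces.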
Link: https://lore.kernel.org/r/20240627110105.62325-2-joao.m.martins@oracle.com
Reported-by: Matt Ochs <mochs@nvidia.com>
Fixes: a9af47e382 ("iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Matt Ochs <mochs@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Merge tag 'v6.10-rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat fixes from Len Brown:
"Fix three recent minor turbostat regressions"
* tag 'v6.10-rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: Add local build_bug.h header for snapshot target
tools/power turbostat: Fix unc freq columns not showing with '-q' or '-l'
tools/power turbostat: option '-n' is ambiguous
If a userspace program exits while the queue it's subscribed to has packets,
those need to be discarded.
commit dc21c6cc3d ("netfilter: nfnetlink_queue: acquire rcu_read_lock()
in instance_destroy_rcu()") fixed a (harmless) rcu splat that could be
triggered in this case.
Add a test case to cover this.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Currently the PMU counters test does a single CLFLUSH{,OPT} on the loop's
code, but due to speculative execution this might not cause LLC misses
within the measured section.
Instead of doing a single flush before the loop, do a cache flush on each
iteration of the loop to confuse the prediction and ensure that at least
one cache miss occurs within the measured section.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
[sean: keep MFENCE, massage changelog]
Link: https://lore.kernel.org/r/20240628005558.3835480-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tweak the macros in the PMU counters test to prepare for moving the
CLFLUSH+MFENCE instructions into the loop body, to fix an issue where
a single CLFLUSH doesn't guarantee an LLC miss.
Link: https://lore.kernel.org/r/20240628005558.3835480-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
We cannot use CLONE_VFORK because we also need to wait for the timeout
signal.
Restore the test timeout by using the original fork() call in both
__run_test() and __TEST_F_IMPL(). Also fix a race condition when waiting
for the test child process.
Because test metadata are shared between test processes, only the
parent process must set the test PID (child). Otherwise, t->pid may be
set to zero, leading to inconsistent error cases:
# RUN layout1.rule_on_mountpoint ...
# rule_on_mountpoint: Test ended in some other way [127]
# OK layout1.rule_on_mountpoint
ok 20 layout1.rule_on_mountpoint
As safeguards, initialize the "status" variable with a valid exit code,
and handle unknown test exits as errors.
The use of fork() introduces a new race condition in landlock/fs_test.c
which seems to be specific to hostfs bind mounts, but I haven't found
the root cause and it's difficult to trigger. I'll try to fix it with
another patch.
Cc: Christian Brauner <brauner@kernel.org>
Cc: Günther Noack <gnoack@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Will Drewry <wad@chromium.org>
Cc: stable@vger.kernel.org
Closes: https://lore.kernel.org/r/9341d4db-5e21-418c-bf9e-9ae2da7877e1@sirena.org.uk
Fixes: a86f18903d ("selftests/harness: Fix interleaved scheduling leading to race conditions")
Fixes: 24cf65a622 ("selftests/harness: Share _metadata between forked processes")
Link: https://lore.kernel.org/r/20240621180605.834676-1-mic@digikod.net
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
The direct-call syscall dispatch function doesn't know that the exit()
and exit_group() syscall handlers don't return, so the call sites aren't
optimized accordingly.
Fix that by marking the exit syscall declarations __noreturn.
Fixes the following warnings:
vmlinux.o: warning: objtool: x64_sys_call+0x2804: __x64_sys_exit() is missing a __noreturn annotation
vmlinux.o: warning: objtool: ia32_sys_call+0x29b6: __ia32_sys_exit_group() is missing a __noreturn annotation
Fixes: 1e3ad78334 ("x86/syscall: Don't force use of indirect calls for system calls")
Closes: https://lkml.kernel.org/lkml/6dba9b32-db2c-4e6d-9500-7a08852f17a3@paulmck-laptop
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/5d8882bc077d8eadcc7fd1740b56dfb781f12288.1719381528.git.jpoimboe@kernel.org
This test is unusual in that overriding TESTS does not change the tests to
be run. Split the individual tests into several functions and invoke them
through tests_run() as appropriate.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nothing calls these.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These functions are not used anymore.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The selftest does not use functions from mirror_gre_lib, ditch the import.
It does not use arping either, so drop the require_command as well.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the previous patch, the function test_span_failable() is always
called with should_fail=1. Drop the argument and streamline the code.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mirroring tests are currently run in a skip_hw and optionally a skip_sw
mode. The former tests the SW datapath, the latter the HW datapath, if
available. In order to be able to test SW datapath on HW loopbacks, traps
are installed on ingress to get traffic from the HW datapath to the SW one.
This adds unnecessary complexity when it would be much simpler to just
use a veth-based topology to test the SW datapath. Thus drop all the code
that supports this dual testing.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mirroring selftests work by sending ICMP traffic between two hosts.
Along the way, this traffic is mirrored to a gretap netdevice, and counter
taps are then installed strategically along the path of the mirrored
traffic to verify the mirroring took place.
The problem with this is that besides mirroring the primary traffic, any
other service traffic is mirrored as well. At the same time, because the
tests need to work in HW-offloaded scenarios, the ability of the device to
do arbitrary packet inspection should not be taken for granted. Most tests
therefore simply use matchall, one uses flower to match on IP address.
As a result, the selftests are noisy, because besides the primary ICMP
traffic, any amount of other service traffic is mirrored as well.
mirror_test() accommodated this noisiness by giving the counters an
allowance of several packets. But in the previous patch, where possible,
counter taps were changed to match only on an exact ICMP message. At least
in those cases, we can demand an exact number of packets to match.
Where the tap is installed on a connective netdevice, the exact matching is
not practical (though with u32, anything is possible). In those places,
there should still be some leeway -- and probably bigger than before,
because experience shows that these tests are very noisy.
To that end, change mirror_test() so that it can be either called with an
exact number to expect, or with an expression. Where leeway is needed,
adjust callers to pass a ">= 10" instead of mere 10.
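The "exact number or expression" convention can be sketched as below. The
selftest itself is a shell script, so this is only an illustration of the
idea in another language; count_matches() and its operator table are
hypothetical and not how mirror_test() is implemented.

```python
import operator

# Comparison operators accepted in an expectation such as ">= 10".
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "==": operator.eq,
}


def count_matches(observed, expect):
    """Return True if the observed packet count satisfies the expectation."""
    parts = str(expect).split()
    if len(parts) == 1:
        # A bare number demands an exact match.
        return observed == int(parts[0])
    op, value = parts
    return _OPS[op](observed, int(value))
```

With this shape, callers on a quiet tap pass 10 and get strict checking,
while callers on a noisy tap pass ">= 10" and keep their leeway.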
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mirroring selftests work by sending ICMP traffic between two hosts.
Along the way, this traffic is mirrored to a gretap netdevice, and counter
taps are then installed strategically along the path of the mirrored
traffic to verify the mirroring took place.
The problem with this is that besides mirroring the primary traffic, any
other service traffic is mirrored as well. At the same time, because the
tests need to work in HW-offloaded scenarios, the ability of the device to
do arbitrary packet inspection should not be taken for granted. Most tests
therefore simply use matchall, one uses flower to match on IP address.
As a result, the selftests are noisy, because besides the primary ICMP
traffic, any amount of other service traffic is mirrored as well.
However, often the counter tap is installed at the remote end of the gretap
tunnel. Since this is a SW-datapath scenario anyway, we can make the filter
arbitrarily accurate.
Thus in this patch, add parameters forward_type and backward_type to
several mirroring test helpers, as some other helpers already have. Then
change do_test_span_dir_ips() so that instead of installing one generic tap
and using it for tests in both directions, it installs a tap for each
direction separately, matching on the ICMP type given by these parameters.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The test works by sending packets through a tunnel, whence they are
forwarded to a LAG. One of the LAG children is removed from the LAG prior
to the exercise, and the test then counts how many packets pass through the
other one. The issue with this is that it counts all packets, not just the
encapsulated ones.
So instead add a second gretap endpoint to receive the sent packets, and
check reception counters there.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The argument $dir has a fallback value of "ingress". Move the fallback from
the usage site to the argument definition block to make the fact clearer.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The argument is not used by these functions except to propagate it for
ultimately no purpose.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some functions, argument-forwarding through "$@" without listing the
individual arguments explicitly is fundamental to the operation of a
function. E.g. xfail_on_veth() should be able to run various tests in the
fail-to-xfail regime, and usage of "$@" is appropriate as an abstraction
mechanism. For functions such as simple_if_init(), $@ is a handy way to
pass an array.
In other functions, it's merely a mechanism to save some typing, which
however ends up obscuring the real arguments and makes life hard for those
that end up reading the code.
This patch adds some of the implicit function arguments and correspondingly
expands $@'s. In several cases this will come in handy as following patches
adjust the parameter lists.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CMIS compliant modules such as QSFP-DD might be running a firmware that
can be updated in a vendor-neutral way by exchanging messages between
the host and the module as described in section 7.3.1 of revision 5.2 of
the CMIS standard.
Add a pair of new ethtool messages that allow:
* User space to trigger firmware update of transceiver modules
* The kernel to notify user space about the progress of the process
The user interface is designed to be asynchronous in order to avoid
RTNL being held for too long and to allow several modules to be
updated simultaneously. The interface is designed with CMIS compliant
modules in mind, but kept generic enough to accommodate future use
cases, if these arise.
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
It makes it harder to shoot yourself in the foot, by using an
additional __must_be_array() check.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Reuse the enum. It means the same thing in both cases.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The fd_perf field used to be part of the union, but was later moved out of
it, because we test it with fd_perf != -1 to determine if any perf counter
is opened, making the union unused.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
New CPU #defines encode vendor and family as well as model.
N.B. Copied VFM_*() defines here from <asm/cpu_device_id.h> to avoid
an application picking a second internal kernel header file.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Commit 78464d7681 ("tools/power turbostat: Add columns for clustered
uncore frequency") introduced 'probe_intel_uncore_frequency_cluster()'
in a way which prevents printing uncore frequency columns if either of
the '-q' or '-l' options are used. Systems which do not have multiple
uncore frequencies per package are unaffected by this regression.
Fix the function so that uncore frequency columns are shown when either
the '-l' or '-q' option is used by checking if 'quiet' is true after
adding counters for the uncore frequency columns.
Fixes: 78464d7681 ("tools/power turbostat: Add columns for clustered uncore frequency")
Signed-off-by: Adam Hawley <adam.james.hawley@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
In some cases specifying the '-n' command line argument will cause
turbostat to fail. For instance 'turbostat -n 1' works fine; however,
'turbostat -n 1 -d' will fail. This is the result of the first call
to getopt_long_only() where "MP" is specified as the optstring. This can
be easily fixed by changing the optstring from "MP" to "MPn:" to remove
ambiguity between the arguments.
tools/power turbostat: option '-n' is ambiguous; possibilities: '-num_iterations' '-no-msr' '-no-perf'
Fixes: a0e86c90b8 ("tools/power turbostat: Add --no-perf option")
Signed-off-by: David Arcari <darcari@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Arm PMUs have a suffix, either a single decimal (armv8_pmuv3_0) or 3 hex
digits (armv8_cortex_a53), which Perf assumes are both strippable
suffixes for the purposes of deduplication. S390 "cpum_cf" is a
similarly suffixed core PMU, but its suffix is only two characters, so it
is not treated as strippable because the rules are a minimum of 3 hex
characters or 1 decimal character.
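The suffix rule stated above can be approximated with a regular expression.
This is a sketch of the rule as worded here (an underscore followed by at
least 1 decimal digit or at least 3 hex digits), not Perf's actual
implementation; strip_pmu_suffix() is a hypothetical name.

```python
import re

# Trailing "_<decimal digits>" or "_<3 or more hex digits>".
_SUFFIX = re.compile(r"_(?:[0-9]+|[0-9a-fA-F]{3,})$")


def strip_pmu_suffix(name):
    """Return the PMU name with a strippable suffix removed, if any."""
    return _SUFFIX.sub("", name)


stripped = {
    name: strip_pmu_suffix(name)
    for name in ("armv8_pmuv3_0", "armv8_cortex_a53", "cpum_cf")
}
```

Under this rule "cpum_cf" is left alone because "cf" is only two hex
characters, while both Arm names lose their suffix.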
There are two paths involved in listing PMU events:
* HW/cache event printing assumes core PMUs don't have suffixes so
doesn't try to strip.
* Sysfs PMU events share the printing function with uncore PMUs which
strips.
This results in slightly inconsistent Perf list behavior if a core PMU
has a suffix:
# perf list
...
armv8_pmuv3_0/branch-load-misses/
armv8_pmuv3/l3d_cache_wb/ [Kernel PMU event]
...
Fix it by partially reverting back to the old list behavior where
stripping was only done for uncore PMUs. For example commit 8d9f5146f5
("perf pmus: Sort pmus by name then suffix") mentions that only PMUs
starting 'uncore_' are considered to have a potential suffix. This
change doesn't go back that far, but does only strip PMUs that are
!is_core. This keeps the desirable behavior where the many possibly
duplicated uncore PMUs aren't repeated, but it doesn't break listing for
core PMUs.
Searching for a PMU continues to use the new stripped comparison
functions, meaning that it's still possible to request an event by
specifying the common part of a PMU name, or even open events on
multiple similarly named PMUs. For example:
# perf stat -e armv8_cortex/inst_retired/
5777173628 armv8_cortex_a53/inst_retired/ (99.93%)
7469626951 armv8_cortex_a57/inst_retired/ (49.88%)
Fixes: 3241d46f5f ("perf pmus: Sort/merge/aggregate PMUs like mrvl_ddr_pmu")
Suggested-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: robin.murphy@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240626145448.896746-3-james.clark@arm.com
Commit b2b9d3a3f0 ("perf pmu: Support wildcards on pmu name in dynamic
pmu events") gives the following example for wildcarding a subset of
PMUs:
E.g., in a system with the following dynamic pmus:
mypmu_0
mypmu_1
mypmu_2
mypmu_4
perf stat -e mypmu_[01]/<config>/
Since commit f91fa2ae63 ("perf pmu: Refactor perf_pmu__match()"), only
"*" has been supported, removing the ability to subset PMUs, even though
parse-events.l still supports ? and [] characters.
Fix it by using fnmatch() when any glob character is detected and add a
test which covers that and other scenarios of
perf_pmu__match_ignoring_suffix().
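The idea of switching to glob matching only when a glob character is present can be sketched in Python with the standard fnmatch module. This is not the Perf implementation: `pmu_matches` is a hypothetical name, and the non-glob fallback here is a plain prefix comparison chosen for brevity rather than the suffix-ignoring comparison Perf uses.

```python
from fnmatch import fnmatchcase

GLOB_CHARS = "*?["


def pmu_matches(pmu_name, pattern):
    """Match a PMU name against a pattern that may contain globs."""
    if any(ch in pattern for ch in GLOB_CHARS):
        # Glob characters present: defer to shell-style matching.
        return fnmatchcase(pmu_name, pattern)
    # Assumption: without glob characters, fall back to a prefix compare.
    return pmu_name.startswith(pattern)


pmus = ["mypmu_0", "mypmu_1", "mypmu_2", "mypmu_4"]
selected = [p for p in pmus if pmu_matches(p, "mypmu_[01]")]
print(selected)
```

With the four PMUs from the example above, the pattern "mypmu_[01]" selects only mypmu_0 and mypmu_1.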
Fixes: f91fa2ae63 ("perf pmu: Refactor perf_pmu__match()")
Signed-off-by: James Clark <james.clark@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: robin.murphy@arm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240626145448.896746-2-james.clark@arm.com
It's possible to save pipe output of perf record into a file.
$ perf record -o- ... > pipe.data
You can then use the data just like normal perf data.
$ perf report -i pipe.data
In that case, perf tools will treat the input as a pipe, but it can get
the total size of the input. This means it can show the progress bar
unlike the normal pipe input (which doesn't know the total size in
advance).
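One way a tool could tell that its input is a regular file with a known size, as opposed to a real pipe, is sketched below in Python. This is an assumption about the general approach, not a description of the perf code; `input_total_size` is a hypothetical helper.

```python
import os
import stat


def input_total_size(fd):
    """Return the size in bytes if `fd` is a regular file, else None."""
    st = os.fstat(fd)
    if stat.S_ISREG(st.st_mode):
        # A regular file has a known total size, so progress can be shown.
        return st.st_size
    # Pipes, sockets and terminals do not know their total size up front.
    return None
```

A caller could enable a progress bar only when the returned value is not None.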
While at it, fix the string in __perf_session__process_dir_events().
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240627181916.1202110-1-namhyung@kernel.org
The pmtu testing will require that the OVS module is installed,
so do that.
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20240625172245.233874-8-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current pmtu test infrastructure requires an installed copy of the
ovs-vswitchd userspace. This means that any automated or constrained
environments may not have the requisite tools to run the tests. However,
the pmtu tests don't require any special classifier processing. Indeed
they are only using the vswitchd in the most basic mode - as a NORMAL
switch.
However, the ovs-dpctl kernel utility can now program all the needed basic
flows to allow traffic to traverse the tunnels and provide support for at
least testing some basic pmtu scenarios. More complicated flow pipelines
can be added to the internal ovs test infrastructure, but that is work for
the future. For now, enable the most common cases - wide mega flows with
no other prerequisites.
Enhance the pmtu testing to try testing using the internal utility, first.
As a fallback, if the internal utility isn't running, then try with the
ovs-vswitchd userspace tools.
Additionally, make sure that when the pyroute2 package is not available
the ovs-dpctl utility will error out to properly signal an error has
occurred and skip using the internal utility.
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240625172245.233874-7-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The current iteration of IPv6 support requires explicit fields to be set
and does not properly support actual IPv6 addresses.
With this change, make it so that the ipv6() bare option is usable to
create wildcarded flows to match broad swaths of ipv6 traffic.
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20240625172245.233874-6-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This will be used when setting details about the tunnel to use as
transport. The ODP format for tunnel() differs from the one used by the
vswitchd userspace: the 'key' flag is not actually a flag field, so we
don't support it in the same way that the vswitchd userspace supports
displaying it.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240625172245.233874-5-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
These will be used in upcoming commits to set specific attributes for
interacting with tunnels. Since set() will use the key parsing routine, we
also make sure to prepend it with an open paren, for the action parsing to
properly understand it.
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20240625172245.233874-4-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Until recently, the ovs-dpctl utility was used with a limited actions set
and didn't need to have support for multiple similar actions. However,
when adding support for tunnels, it will be important to support multiple
set() actions in a single flow. When printing these actions, the existing
code will be unable to print all of the sets - it will only print the
first.
Refactor this code to be easier to read and support multiple actions of the
same type in an action list.
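The difference between printing only the first action of a type and printing every action can be sketched as follows. The data layout and the name `format_actions` are hypothetical and are not taken from the ovs-dpctl utility; the action names and values are placeholders.

```python
def format_actions(actions):
    """Format every (name, value) pair, keeping order and duplicates."""
    return ",".join(f"{name}({value})" for name, value in actions)


# Hypothetical action list with two actions of the same type.
actions = [("set", "a"), ("set", "b"), ("output", "1")]

# A lookup keyed by action name can only ever find one "set" action,
# whereas iterating the list keeps both.
formatted = format_actions(actions)
print(formatted)
```

Iterating the list rather than looking actions up by type is what lets repeated actions survive formatting in this sketch.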
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20240625172245.233874-3-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The OVS module can operate in conjunction with various types of
tunnel ports. These are created as either explicit tunnel vport
types, OR by creating a tunnel interface which acts as an anchor
for the lightweight tunnel support.
This patch adds the ability to add tunnel ports to an OVS
datapath for testing various scenarios with tunnel ports. With
this addition, the vswitch "plumbing" will at least be able to
push packets around using the tunnel vports. Future patches
will add support for setting required tunnel metadata for lwts
in the datapath. The end goal will be to push packets via these
tunnels, and will be used in an upcoming commit for testing the
path MTU.
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20240625172245.233874-2-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use display hints for formatting scalar attrs. This is particularly
useful for formatting IPv4 addresses, which are typically carried as u32.
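A minimal sketch of hint-driven formatting, using Python's standard ipaddress module, is shown below. The function name `format_scalar` and the hint string "ipv4" are assumptions made for illustration, and the sketch assumes the u32 has already been decoded as a big-endian (network byte order) integer.

```python
import ipaddress


def format_scalar(value, display_hint=None):
    """Render an integer attribute according to an optional hint."""
    if display_hint == "ipv4":
        # Assumption: `value` is the address as a big-endian u32.
        return str(ipaddress.IPv4Address(value))
    # No hint: print the plain number.
    return str(value)


print(format_scalar(0xC0A80001, "ipv4"))
print(format_scalar(0xC0A80001))
```

Without the hint the same value is shown as a bare integer, which is far harder to read as an address.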
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20240626201234.2572964-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This cpupower update for Linux 6.11-rc1 consists of cleanups to man
pages, README files, and enhancements to add help to Makefile.
Merge tag 'linux-cpupower-6.11-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/shuah/linux
Merge cpupower utility updates for 6.11 from Shuah Khan:
"This cpupower update for Linux 6.11-rc1 consists of cleanups to man
pages, README files, and enhancements to add help to Makefile."
* tag 'linux-cpupower-6.11-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
cpupower: Change the var type of the 'monitor' subcommand display mode
cpupower: Remove absent 'v' parameter from monitor man page
cpupower: Improve cpupower build process description
cpupower: Add 'help' target to the main Makefile
cpupower: Replace a dead reference link with working ones
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
e3f02f32a0 ("ionic: fix kernel panic due to multi-buffer handling")
d9c0420999 ("ionic: Mark error paths in the data path as unlikely")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Current release - regressions:
- core: add softirq safety to netdev_rename_lock
- tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO
- batman-adv: fix RCU race at module unload time
Current release - new code bugs:
Previous releases - regressions:
- openvswitch: get related ct labels from its master if it is not confirmed
- eth: bonding: fix incorrect software timestamping report
- eth: mlxsw: fix memory corruptions on spectrum-4 systems
- eth: ionic: use dev_consume_skb_any outside of napi
Previous releases - always broken:
- netfilter: fully validate NFT_DATA_VALUE on store to data registers
- unix: several fixes for OoB data
- tcp: fix race for duplicate reqsk on identical SYN
- bpf:
- fix may_goto with negative offset.
- fix the corner case with may_goto and jump to the 1st insn.
- fix overrunning reservations in ringbuf
- can:
- j1939: recover socket queue on CAN bus error during BAM transmission
- mcp251xfd: fix infinite loop when xmit fails
- dsa: microchip: monitor potential faults in half-duplex mode
- eth: vxlan: pull inner IP header in vxlan_xmit_one()
- eth: ionic: fix kernel panic due to multi-buffer handling
Misc:
- selftest: unix tests refactor and a lot of new cases added
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Merge tag 'net-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can, bpf and netfilter.
There are a bunch of regressions addressed here, but hopefully nothing
spectacular. We are still waiting for the driver fix from Intel, mentioned
by Jakub in the previous networking pull.
Current release - regressions:
- core: add softirq safety to netdev_rename_lock
- tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed
TFO
- batman-adv: fix RCU race at module unload time
Previous releases - regressions:
- openvswitch: get related ct labels from its master if it is not
confirmed
- eth: bonding: fix incorrect software timestamping report
- eth: mlxsw: fix memory corruptions on spectrum-4 systems
- eth: ionic: use dev_consume_skb_any outside of napi
Previous releases - always broken:
- netfilter: fully validate NFT_DATA_VALUE on store to data registers
- unix: several fixes for OoB data
- tcp: fix race for duplicate reqsk on identical SYN
- bpf:
- fix may_goto with negative offset
- fix the corner case with may_goto and jump to the 1st insn
- fix overrunning reservations in ringbuf
- can:
- j1939: recover socket queue on CAN bus error during BAM
transmission
- mcp251xfd: fix infinite loop when xmit fails
- dsa: microchip: monitor potential faults in half-duplex mode
- eth: vxlan: pull inner IP header in vxlan_xmit_one()
- eth: ionic: fix kernel panic due to multi-buffer handling
Misc:
- selftest: unix tests refactor and a lot of new cases added"
* tag 'net-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (61 commits)
net: mana: Fix possible double free in error handling path
selftest: af_unix: Check SIOCATMARK after every send()/recv() in msg_oob.c.
af_unix: Fix wrong ioctl(SIOCATMARK) when consumed OOB skb is at the head.
selftest: af_unix: Check EPOLLPRI after every send()/recv() in msg_oob.c
selftest: af_unix: Check SIGURG after every send() in msg_oob.c
selftest: af_unix: Add SO_OOBINLINE test cases in msg_oob.c
af_unix: Don't stop recv() at consumed ex-OOB skb.
selftest: af_unix: Add non-TCP-compliant test cases in msg_oob.c.
af_unix: Don't stop recv(MSG_DONTWAIT) if consumed OOB skb is at the head.
af_unix: Stop recv(MSG_PEEK) at consumed OOB skb.
selftest: af_unix: Add msg_oob.c.
selftest: af_unix: Remove test_unix_oob.c.
tracing/net_sched: NULL pointer dereference in perf_trace_qdisc_reset()
netfilter: nf_tables: fully validate NFT_DATA_VALUE on store to data registers
net: usb: qmi_wwan: add Telit FN912 compositions
tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO
ionic: use dev_consume_skb_any outside of napi
net: dsa: microchip: fix wrong register write when masking interrupt
Fix race for duplicate reqsk on identical SYN
ibmvnic: Add tx check to prevent skb leak
...
Print the guest's random seed during VM creation if and only if the seed
has changed since the seed was last printed. The vast majority of tests,
if not all tests at this point, set the seed during test initialization
and never change the seed, i.e. printing it every time a VM is created is
useless noise.
Snapshot and print the seed during early selftest init to play nice with
tests that use the kselftests harness, at the cost of printing an unused
seed for tests that change the seed during test-specific initialization,
e.g. dirty_log_perf_test. The kselftests harness runs each testcase in a
separate process that is forked from the original process before creating
each testcase's VM, i.e. waiting until first VM creation will result in
the seed being printed by each testcase despite it never changing. And
long term, the hope/goal is that setting the seed will be handled by the
core framework, i.e. that the dirty_log_perf_test wart will naturally go
away.
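The "print only if the seed changed since it was last printed" rule can be sketched in a few lines of Python. This is not the selftest's C code; the class name `SeedReporter` and the message format are illustrative assumptions.

```python
class SeedReporter:
    """Print a seed only when it differs from the last printed one."""

    def __init__(self):
        self._last_printed = None

    def maybe_print(self, seed):
        if seed == self._last_printed:
            # Unchanged since the last print: stay quiet.
            return False
        print(f"Random seed: 0x{seed:x}")
        self._last_printed = seed
        return True


reporter = SeedReporter()
reporter.maybe_print(0x2A)   # printed
reporter.maybe_print(0x2A)   # suppressed, seed unchanged
reporter.maybe_print(0x07)   # printed again, seed changed
```

In a forking harness the snapshot would have to be taken before the fork for the children to inherit it, which is the reason given above for printing during early init.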
Reported-by: Yi Lai <yi1.lai@intel.com>
Reported-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Link: https://lore.kernel.org/r/20240627021756.144815-2-dapeng1.mi@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
To catch regressions, let's check ioctl(SIOCATMARK) after every
send() and recv() call.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Even if OOB data is recv()ed, ioctl(SIOCATMARK) must return 1 when the
OOB skb is at the head of the receive queue and no new OOB data is queued.
Without fix:
# RUN msg_oob.no_peek.oob ...
# msg_oob.c:305:oob:Expected answ[0] (0) == oob_head (1)
# oob: Test terminated by assertion
# FAIL msg_oob.no_peek.oob
not ok 2 msg_oob.no_peek.oob
With fix:
# RUN msg_oob.no_peek.oob ...
# OK msg_oob.no_peek.oob
ok 2 msg_oob.no_peek.oob
Fixes: 314001f0bf ("af_unix: Add OOB support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When OOB data is in recvq, we can detect it with epoll by checking
EPOLLPRI.
This patch adds checks for EPOLLPRI after every send() and recv() in
all test cases.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When data is sent with MSG_OOB, SIGURG is sent to a process if the
receiver socket has set its owner to the process by ioctl(FIOSETOWN)
or fcntl(F_SETOWN).
This patch adds SIGURG check after every send(MSG_OOB) call.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When SO_OOBINLINE is enabled on a socket, OOB data can be recv()ed
without the MSG_OOB flag, and ioctl(SIOCATMARK) behaves differently.
This patch adds some test cases for SO_OOBINLINE.
Note the new test cases found two bugs in TCP.
1) After reading OOB data with non-inline mode, we can re-read
the data by setting SO_OOBINLINE.
# RUN msg_oob.no_peek.inline_oob_ahead_break ...
# msg_oob.c:146:inline_oob_ahead_break:AF_UNIX :world
# msg_oob.c:147:inline_oob_ahead_break:TCP :oworld
# OK msg_oob.no_peek.inline_oob_ahead_break
ok 14 msg_oob.no_peek.inline_oob_ahead_break
2) The head OOB data is dropped when SO_OOBINLINE is disabled
and new OOB data is queued.
# RUN msg_oob.no_peek.inline_ex_oob_drop ...
# msg_oob.c:171:inline_ex_oob_drop:AF_UNIX :x
# msg_oob.c:172:inline_ex_oob_drop:TCP :y
# msg_oob.c:146:inline_ex_oob_drop:AF_UNIX :y
# msg_oob.c:147:inline_ex_oob_drop:TCP :Resource temporarily unavailable
# OK msg_oob.no_peek.inline_ex_oob_drop
ok 17 msg_oob.no_peek.inline_ex_oob_drop
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Currently, recv() is stopped at a consumed OOB skb even if a new
OOB skb is queued and we can ignore the old OOB skb.
>>> from socket import *
>>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
>>> c1.send(b'hellowor', MSG_OOB)
8
>>> c2.recv(1, MSG_OOB) # consumed OOB data stays in the middle of recvq.
b'r'
>>> c1.send(b'ld', MSG_OOB)
2
>>> c2.recv(10) # recv() stops at the old consumed OOB
b'hellowo' # should be 'hellowol'
manage_oob() should not stop recv() at the old consumed OOB skb if
new OOB data is queued.
Note that TCP behaviour is apparently wrong in this test case because
we can recv() the same OOB data twice.
Without fix:
# RUN msg_oob.no_peek.ex_oob_ahead_break ...
# msg_oob.c:138:ex_oob_ahead_break:AF_UNIX :hellowo
# msg_oob.c:139:ex_oob_ahead_break:Expected:hellowol
# msg_oob.c:141:ex_oob_ahead_break:Expected ret[0] (7) == expected_len (8)
# ex_oob_ahead_break: Test terminated by assertion
# FAIL msg_oob.no_peek.ex_oob_ahead_break
not ok 11 msg_oob.no_peek.ex_oob_ahead_break
With fix:
# RUN msg_oob.no_peek.ex_oob_ahead_break ...
# msg_oob.c:146:ex_oob_ahead_break:AF_UNIX :hellowol
# msg_oob.c:147:ex_oob_ahead_break:TCP :helloworl
# OK msg_oob.no_peek.ex_oob_ahead_break
ok 11 msg_oob.no_peek.ex_oob_ahead_break
Fixes: 314001f0bf ("af_unix: Add OOB support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
While testing, I found some weird behaviour on the TCP side as well.
For example, TCP drops the preceding OOB data when queueing new
OOB data if the old OOB data is at the head of recvq.
# RUN msg_oob.no_peek.ex_oob_drop ...
# msg_oob.c:146:ex_oob_drop:AF_UNIX :x
# msg_oob.c:147:ex_oob_drop:TCP :Resource temporarily unavailable
# msg_oob.c:146:ex_oob_drop:AF_UNIX :y
# msg_oob.c:147:ex_oob_drop:TCP :Invalid argument
# OK msg_oob.no_peek.ex_oob_drop
ok 9 msg_oob.no_peek.ex_oob_drop
# RUN msg_oob.no_peek.ex_oob_drop_2 ...
# msg_oob.c:146:ex_oob_drop_2:AF_UNIX :x
# msg_oob.c:147:ex_oob_drop_2:TCP :Resource temporarily unavailable
# OK msg_oob.no_peek.ex_oob_drop_2
ok 10 msg_oob.no_peek.ex_oob_drop_2
This patch allows AF_UNIX's MSG_OOB implementation to produce different
results from TCP when operations are guarded with tcp_incompliant{}.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Let's say a socket send()s "hello" with MSG_OOB and "world" without flags,
>>> from socket import *
>>> c1, c2 = socketpair(AF_UNIX)
>>> c1.send(b'hello', MSG_OOB)
5
>>> c1.send(b'world')
5
and its peer recv()s "hell" and "o".
>>> c2.recv(10)
b'hell'
>>> c2.recv(1, MSG_OOB)
b'o'
Now the consumed OOB skb stays at the head of recvq to return a correct
value for ioctl(SIOCATMARK), which is broken now and fixed by a later
patch.
Then, if peer issues recv() with MSG_DONTWAIT, manage_oob() returns NULL,
so recv() ends up with -EAGAIN.
>>> c2.setblocking(False) # This causes -EAGAIN even with available data
>>> c2.recv(5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
BlockingIOError: [Errno 11] Resource temporarily unavailable
However, next recv() will return the following available data, "world".
>>> c2.recv(5)
b'world'
When the consumed OOB skb is at the head of the queue, we need to fetch
the next skb to fix the weird behaviour.
Note that the issue does not happen without MSG_DONTWAIT because we can
retry after manage_oob().
This patch also adds a test case that covers the issue.
Without fix:
# RUN msg_oob.no_peek.ex_oob_break ...
# msg_oob.c:134:ex_oob_break:AF_UNIX :Resource temporarily unavailable
# msg_oob.c:135:ex_oob_break:Expected:ld
# msg_oob.c:137:ex_oob_break:Expected ret[0] (-1) == expected_len (2)
# ex_oob_break: Test terminated by assertion
# FAIL msg_oob.no_peek.ex_oob_break
not ok 8 msg_oob.no_peek.ex_oob_break
With fix:
# RUN msg_oob.no_peek.ex_oob_break ...
# OK msg_oob.no_peek.ex_oob_break
ok 8 msg_oob.no_peek.ex_oob_break
Fixes: 314001f0bf ("af_unix: Add OOB support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
After consuming OOB data, recv() reading the preceding data must break at
the OOB skb regardless of MSG_PEEK.
Currently, MSG_PEEK does not stop recv() for AF_UNIX, and the behaviour is
not compliant with TCP.
>>> from socket import *
>>> c1, c2 = socketpair(AF_UNIX)
>>> c1.send(b'hello', MSG_OOB)
5
>>> c1.send(b'world')
5
>>> c2.recv(1, MSG_OOB)
b'o'
>>> c2.recv(9, MSG_PEEK) # This should return b'hell'
b'hellworld' # even with enough buffer.
Let's fix it by returning NULL for consumed skb and unlinking it only if
MSG_PEEK is not specified.
This patch also adds test cases that add recv(MSG_PEEK) before each recv().
Without fix:
# RUN msg_oob.peek.oob_ahead_break ...
# msg_oob.c:134:oob_ahead_break:AF_UNIX :hellworld
# msg_oob.c:135:oob_ahead_break:Expected:hell
# msg_oob.c:137:oob_ahead_break:Expected ret[0] (9) == expected_len (4)
# oob_ahead_break: Test terminated by assertion
# FAIL msg_oob.peek.oob_ahead_break
not ok 13 msg_oob.peek.oob_ahead_break
With fix:
# RUN msg_oob.peek.oob_ahead_break ...
# OK msg_oob.peek.oob_ahead_break
ok 13 msg_oob.peek.oob_ahead_break
Fixes: 314001f0bf ("af_unix: Add OOB support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
AF_UNIX's MSG_OOB functionality lacked thorough testing, and we found
some bizarre behaviour.
The new selftest validates every MSG_OOB operation against TCP as a
reference implementation.
This patch adds only a few tests with basic send() and recv() that
do not fail.
The following patches will add more test cases for SO_OOBINLINE, SIGURG,
EPOLLPRI, and SIOCATMARK.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
test_unix_oob.c does not fully cover AF_UNIX's MSG_OOB functionality,
thus there are discrepancies with TCP behaviour.
Also, the test uses fork() to create message producer, and it's not
easy to understand and add more test cases.
Let's remove test_unix_oob.c and rewrite a new test.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Similar to test_multiply_events_wq: we receive one event and inject a
new one. But given that this time we are already in the event hook, we
can use hid_bpf_try_input_report() directly as this function will not
sleep.
Note that the injected event gets processed before the original one this
way.
Link: https://patch.msgid.link/20240626-hid_hw_req_bpf-v2-12-cfd60fb6c79f@kernel.org
Acked-by: Jiri Kosina <jkosina@suse.com>
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Now that bpf_wq is available, we can write a test with it. Having
hid_bpf_input_report() waiting for the device means that we can
directly call it, and we get that event when the device is ready.
Link: https://patch.msgid.link/20240626-hid_hw_req_bpf-v2-10-cfd60fb6c79f@kernel.org
Acked-by: Jiri Kosina <jkosina@suse.com>
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
We add 3 new tests:
- first, we make sure we can prevent the output_report from happening
- second, we make sure that we can detect that a given hidraw client
was actually doing the request, and for that client only, call
hid_bpf_hw_output_report() ourselves, returning a custom value
- last, we ensure that we can not loop between hooks for
hid_hw_output_report() and manual calls to hid_bpf_hw_output_report()
from that same hook
Link: https://patch.msgid.link/20240626-hid_hw_req_bpf-v2-8-cfd60fb6c79f@kernel.org
Acked-by: Jiri Kosina <jkosina@suse.com>
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
We add 3 new tests:
- first, we make sure we can prevent the raw_request from happening
- second, we make sure that we can detect that a given hidraw client
was actually doing the request, and for that client only, call
hid_bpf_hw_request() ourselves, returning a custom value
- last, we ensure that we can not loop between hooks for
hid_hw_raw_request() and manual calls to hid_bpf_hw_request() from that
hook
Link: https://patch.msgid.link/20240626-hid_hw_req_bpf-v2-6-cfd60fb6c79f@kernel.org
Acked-by: Jiri Kosina <jkosina@suse.com>
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Inside of test_pcm_time() arguments are printed via printf
but '%d' is used to print @flags (of type unsigned int).
Use '%u' instead, just like we do everywhere else.
Signed-off-by: Zhu Jun <zhujun2@cmss.chinamobile.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://patch.msgid.link/20240626084859.4350-1-zhujun2@cmss.chinamobile.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Add tests focusing on indirection table configuration and
creating extra RSS contexts in drivers which support it.
$ export NETIF=eth0 REMOTE_...
$ ./drivers/net/hw/rss_ctx.py
KTAP version 1
1..8
ok 1 rss_ctx.test_rss_key_indir
ok 2 rss_ctx.test_rss_context
ok 3 rss_ctx.test_rss_context4
# Increasing queue count 44 -> 66
# Failed to create context 32, trying to test what we got
ok 4 rss_ctx.test_rss_context32 # SKIP Tested only 31 contexts, wanted 32
ok 5 rss_ctx.test_rss_context_overlap
ok 6 rss_ctx.test_rss_context_overlap2
# .. sprays traffic like a headless chicken ..
not ok 7 rss_ctx.test_rss_context_out_of_order
ok 8 rss_ctx.test_rss_context4_create_with_cfg
# Totals: pass:6 fail:1 xfail:0 xpass:0 skip:1 error:0
Note that rss_ctx.test_rss_context_out_of_order fails with the device
I tested with, but it seems to be a device / driver bug.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240626012456.2326192-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Teach the load generator how to wait for at least a given number
of packets to be received. This will be useful for filtering
where we'll want to send a non-trivial number of packets and
make sure they landed in the right queues.
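Waiting for a minimum packet count usually comes down to polling a counter until a deadline. The sketch below shows that pattern in Python; it is not the selftest library's code, and `wait_for_packets` and its `read_count` callback are hypothetical names.

```python
import time


def wait_for_packets(read_count, target, timeout=5.0, interval=0.01):
    """Poll `read_count()` until it reaches `target` or time runs out.

    `read_count` is a caller-supplied callable returning the number of
    packets received so far (an assumption of this sketch).
    """
    deadline = time.monotonic() + timeout
    while True:
        if read_count() >= target:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Returning a boolean rather than raising lets the caller decide whether a shortfall is a test failure or a reason to skip.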
Reviewed-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240626012456.2326192-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some devices DMA stats to the host periodically. Add a helper
which can wait for that to happen, based on frequency reported
by the driver in ethtool.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240626012456.2326192-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We use random ports for communication. As Willem predicted,
this leads to occasional failures. Check whether a port is
already in use by opening a socket and binding to that port.
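The bind-based check can be sketched with Python's standard socket module. This is a minimal illustration, not the selftest helper itself; the function name `port_is_free`, the use of TCP over IPv4, and the loopback default are assumptions.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to `port`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Bind failed, e.g. the address is already in use.
            return False
    return True
```

The check is inherently racy: another process may take the port between the check and the real use, so it reduces collisions rather than eliminating them.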
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240626012456.2326192-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ARRAY_SIZE is used in multiple places; move its definition to the
bpf_misc.h header.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20240626134719.3893748-1-jolsa@kernel.org
When building with clang for ARCH=i386, the following errors are
observed:
CC kernel/bpf/btf_relocate.o
./tools/lib/bpf/btf_relocate.c:206:23: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
206 | info[id].needs_size = true;
| ^ ~
./tools/lib/bpf/btf_relocate.c:256:25: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
256 | base_info.needs_size = true;
| ^ ~
2 errors generated.
The problem is we use 1-bit, 31-bit bitfields in a signed int.
Changing to
bool needs_size: 1;
unsigned int size:31;
...resolves the error and pahole reports that 4 bytes are used
for the underlying representation:
$ pahole btf_name_info tools/lib/bpf/btf_relocate.o
struct btf_name_info {
const char * name; /* 0 8 */
unsigned int needs_size:1; /* 8: 0 4 */
unsigned int size:31; /* 8: 1 4 */
__u32 id; /* 12 4 */
/* size: 16, cachelines: 1, members: 4 */
/* last cacheline: 16 bytes */
};
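Why a signed one-bit field turns 1 into -1 can be shown with plain two's complement arithmetic. The Python sketch below models the bit patterns only; the helper names are hypothetical and it does not use any C compiler behaviour beyond the standard interpretation of signed and unsigned fields.

```python
def as_signed_bitfield(value, width):
    """Interpret the low `width` bits of `value` as two's complement."""
    value &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    # If the top bit of the field is set, the value is negative.
    return value - (1 << width) if value & sign_bit else value


def as_unsigned_bitfield(value, width):
    """Interpret the low `width` bits of `value` as an unsigned number."""
    return value & ((1 << width) - 1)


print(as_signed_bitfield(1, 1))    # the single bit is the sign bit
print(as_unsigned_bitfield(1, 1))  # the single bit is the value
```

A one-bit signed field can only hold 0 and -1, which is what the clang diagnostic above is pointing out.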
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240624192903.854261-1-alan.maguire@oracle.com
Add new negative selftests which are intended to cover the
out-of-bounds memory access that could be performed on a
CONST_PTR_TO_DYNPTR within functions taking an ARG_PTR_TO_DYNPTR |
MEM_RDONLY as an argument, and acceptance of invalid register types
i.e. PTR_TO_BTF_ID within functions taking an ARG_PTR_TO_DYNPTR |
MEM_RDONLY.
Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/r/20240625062857.92760-2-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The non-contiguous CBM test fails on AMD with:
Starting L3_NONCONT_CAT test ...
Mounting resctrl to "/sys/fs/resctrl"
CPUID output doesn't match 'sparse_masks' file content!
not ok 5 L3_NONCONT_CAT: test
AMD always supports non-contiguous CBM but does not report it via CPUID.
Fix the non-contiguous CBM test to use CPUID to discover non-contiguous
CBM support only on Intel.
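The vendor-dependent check described above can be sketched as follows. This is an illustration under stated assumptions, not the selftest code: the enum and function names are hypothetical, the caller is assumed to have already read ECX from the CPUID leaf that enumerates cache allocation, and bit 3 of that register is assumed to be the non-contiguous CBM flag.

```c
#include <stdbool.h>

/* Hypothetical vendor identifiers, for illustration only. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD };

/*
 * Decide whether non-contiguous CBMs are expected to be supported.
 * cpuid_ecx is assumed to hold ECX from the cache allocation CPUID leaf.
 */
static bool expect_noncont_cbm(enum cpu_vendor vendor, unsigned int cpuid_ecx)
{
	/* AMD always supports non-contiguous CBM and does not report it via CPUID. */
	if (vendor == VENDOR_AMD)
		return true;

	/* On Intel, support has to be discovered (assumed: ECX bit 3). */
	return (cpuid_ecx >> 3) & 1;
}
```

The result would then be compared against the content of the 'sparse_masks' file, so that on AMD a cleared CPUID bit no longer produces a mismatch.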
Fixes: ae638551ab ("selftests/resctrl: Add non-contiguous CBMs CAT test")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The test compares the cycles event counts of two separate runs of a
perf benchmark, one recorded with the --bpf-counters option and one
without it, and it has been failing for some time. The two counts are
expected to be within a certain range of each other: the allowed
difference was first set to 10% and later raised to 20%. However, the
test case keeps failing on certain architectures, as recording the
provided benchmark can produce completely different counts depending
on the current load of the system.
Sampling two separate runs on intel-eaglestream-spr-13 of "perf stat
--no-big-num -e cycles -- perf bench sched messaging -g 1 -l 100 -t":
Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t':
396782898 cycles
0.010051983 seconds time elapsed
0.008664000 seconds user
0.097058000 seconds sys
Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t':
1431133032 cycles
0.021803714 seconds time elapsed
0.023377000 seconds user
0.349918000 seconds sys
The counts range from roughly 400 million to 1400 million.
Instead of recording cycles, use the instructions event, which provides
more stable values. At the same time, change the tested workload to one
of the testing workloads provided by perf that is not based on the
scheduler, which would add another dependency on the current load.
Sampling the instructions event with the new workload provides much
more stable results on intel-eaglestream-spr-13 with "perf stat
--no-big-num -e instructions -- perf test -w brstack":
Performance counter stats for 'perf test -w brstack':
64584494 instructions
0.009173945 seconds time elapsed
0.007262000 seconds user
0.002071000 seconds sys
Performance counter stats for 'perf test -w brstack':
64672669 instructions
0.008888135 seconds time elapsed
0.005018000 seconds user
0.004018000 seconds sys
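To put the figures above in terms of the 20% tolerance, here is a small sketch of a relative-difference check. It is not the test's implementation: the function name is hypothetical, and measuring the difference relative to the larger of the two counts is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the two counts differ by at most pct percent of the
 * larger count. Integer math only; the counts quoted above fit easily.
 */
static bool counts_within_pct(uint64_t a, uint64_t b, uint64_t pct)
{
	uint64_t hi = a > b ? a : b;
	uint64_t lo = a > b ? b : a;

	return (hi - lo) * 100 <= pct * hi;
}
```

Under this definition, the two cycles counts (396782898 and 1431133032) are far outside a 20% tolerance, while the two instructions counts (64584494 and 64672669) differ by well under 1%.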
Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: mpetlan@redhat.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240625092001.10909-1-vmolnaro@redhat.com