This test measures the performance effects of KVM's access tracking.
Access tracking is driven by the MMU notifiers test_young, clear_young,
and clear_flush_young. These notifiers do not have a direct userspace
API, however the clear_young notifier can be triggered by marking a
pages as idle in /sys/kernel/mm/page_idle/bitmap. This test leverages
that mechanism to enable access tracking on guest memory.
To measure performance this test runs a VM with a configurable number of
vCPUs that each touch every page in disjoint regions of memory.
Performance is measured in the time it takes all vCPUs to finish
touching their predefined region.
Example invocation:
$ ./access_tracking_perf_test -v 8
Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages
guest physical test memory offset: 0xffdfffff000
Populating memory : 1.337752570s
Writing to populated memory : 0.010177640s
Reading from populated memory : 0.009548239s
Mark memory idle : 23.973131748s
Writing to idle memory : 0.063584496s
Mark memory idle : 24.924652964s
Reading from idle memory : 0.062042814s
Breaking down the results:
* "Populating memory": The time it takes for all vCPUs to perform the
first write to every page in their region.
* "Writing to populated memory" / "Reading from populated memory": The
time it takes for all vCPUs to write and read to every page in their
region after it has been populated. This serves as a control for the
later results.
* "Mark memory idle": The time it takes for every vCPU to mark every
page in their region as idle through page_idle.
* "Writing to idle memory" / "Reading from idle memory": The time it
takes for all vCPUs to write and read to every page in their region
after it has been marked idle.
This test should be portable across architectures but it is only enabled
for x86_64 since that's all I have tested.
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210713220957.3493520-7-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There is a missing break statement which causes a fallthrough to the
next statement where optarg will be null and a segmentation fault will
be generated.
Fixes: 9e965bb75a ("KVM: selftests: Add backing src parameter to dirty_log_perf_test")
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Message-Id: <20210713220957.3493520-6-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This test passes pointers obtained from anon_allocate_area to the
userfaultfd and mremap APIs. This causes a problem if the system
allocator returns tagged pointers because with the tagged address ABI
the kernel rejects tagged addresses passed to these APIs, which would
end up causing the test to fail. To make this test compatible with such
system allocators, stop using the system allocator to allocate memory in
anon_allocate_area, and instead just use mmap.
Link: https://lkml.kernel.org/r/20210714195437.118982-3-pcc@google.com
Link: https://linux-review.googlesource.com/id/Icac91064fcd923f77a83e8e133f8631c5b8fc241
Fixes: c47174fc36 ("userfaultfd: selftest")
Co-developed-by: Lokesh Gidra <lokeshgidra@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Alistair Delva <adelva@google.com>
Cc: William McVicker <willmcvicker@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mitch Phillips <mitchp@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: <stable@vger.kernel.org> [5.4]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a list of vmtest script dependencies to make it easier for new
contributors to get going.
Signed-off-by: Evgeniy Litvinenko <evgeniyl@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210723223645.907802-1-evgeniyl@fb.com
This patch adds tests for the batching and bpf_(get|set)sockopt in
bpf tcp iter.
It first creates:
a) 1 non SO_REUSEPORT listener in lhash2.
b) 256 passive and active fds connected to the listener in (a).
c) 256 SO_REUSEPORT listeners in one of the lhash2 bucket.
The test sets all listeners and connections to bpf_cubic before
running the bpf iter.
The bpf iter then calls setsockopt(TCP_CONGESTION) to switch
each listener and connection from bpf_cubic to bpf_dctcp.
The bpf iter has a random_retry mode such that it can return EAGAIN
to the usespace in the middle of a batch.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210701200625.1036874-1-kafai@fb.com
Previously, the newly introduced test case in test_map_in_map(), which
checks whether the inner map is destroyed after unsuccessful creation of
the outer map, logged the following harmless and expected error:
libbpf: map 'mim': failed to create: Invalid argument(-22) libbpf:
failed to load object './test_map_in_map_invalid.o'
To avoid any possible confusion, mute the logging during loading of the
prog.
Fixes: 08f71a1e39 ("selftests/bpf: Check inner map deletion")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210721140941.563175-1-m@lambda.lt
Pull networking fixes from David Miller:
1) Fix type of bind option flag in af_xdp, from Baruch Siach.
2) Fix use after free in bpf_xdp_link_release(), from Xuan Zhao.
3) PM refcnt imbakance in r8152, from Takashi Iwai.
4) Sign extension ug in liquidio, from Colin Ian King.
5) Mising range check in s390 bpf jit, from Colin Ian King.
6) Uninit value in caif_seqpkt_sendmsg(), from Ziyong Xuan.
7) Fix skb page recycling race, from Ilias Apalodimas.
8) Fix memory leak in tcindex_partial_destroy_work, from Pave Skripkin.
9) netrom timer sk refcnt issues, from Nguyen Dinh Phi.
10) Fix data races aroun tcp's tfo_active_disable_stamp, from Eric
Dumazet.
11) act_skbmod should only operate on ethernet packets, from Peilin Ye.
12) Fix slab out-of-bpunds in fib6_nh_flush_exceptions(),, from Psolo
Abeni.
13) Fix sparx5 dependencies, from Yajun Deng.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (74 commits)
dpaa2-switch: seed the buffer pool after allocating the swp
net: sched: cls_api: Fix the the wrong parameter
net: sparx5: fix unmet dependencies warning
net: dsa: tag_ksz: dont let the hardware process the layer 4 checksum
net: dsa: ensure linearized SKBs in case of tail taggers
ravb: Remove extra TAB
ravb: Fix a typo in comment
net: dsa: sja1105: make VID 4095 a bridge VLAN too
tcp: disable TFO blackhole logic by default
sctp: do not update transport pathmtu if SPP_PMTUD_ENABLE is not set
net: ixp46x: fix ptp build failure
ibmvnic: Remove the proper scrq flush
selftests: net: add ESP-in-UDP PMTU test
udp: check encap socket in __udp_lib_err
sctp: update active_key for asoc when old key is being replaced
r8169: Avoid duplicate sysfs entry creation error
ixgbe: Fix packet corruption due to missing DMA sync
Revert "qed: fix possible unpaired spin_{un}lock_bh in _qed_mcp_cmd_and_union()"
ipv6: fix another slab-out-of-bounds in fib6_nh_flush_exceptions
fsl/fman: Add fibre support
...
The case of ESP in UDP encapsulation was not covered before. Add
cases of local changes of MTU and difference on routed path.
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
This test evaluates the IOAM insertion for IPv6 by checking the IOAM data
integrity on the receiver.
The topology is formed by 3 nodes: Alpha (sender), Beta (router in-between)
and Gamma (receiver). An IOAM domain is configured from Alpha to Gamma only,
which means not on the reverse path. When Gamma is the destination, Alpha
adds an IOAM option (Pre-allocated Trace) inside a Hop-by-hop and fills the
trace with its own IOAM data. Beta and Gamma also fill the trace. The IOAM
data integrity is checked on Gamma, by comparing with the pre-defined IOAM
configuration (see below).
+-------------------+ +-------------------+
| | | |
| alpha netns | | gamma netns |
| | | |
| +-------------+ | | +-------------+ |
| | veth0 | | | | veth0 | |
| | db01::2/64 | | | | db02::2/64 | |
| +-------------+ | | +-------------+ |
| . | | . |
+-------------------+ +-------------------+
. .
. .
. .
+----------------------------------------------------+
| . . |
| +-------------+ +-------------+ |
| | veth0 | | veth1 | |
| | db01::1/64 | ................ | db02::1/64 | |
| +-------------+ +-------------+ |
| |
| beta netns |
| |
+--------------------------+-------------------------+
~~~~~~~~~~~~~~~~~~~~~~
| IOAM configuration |
~~~~~~~~~~~~~~~~~~~~~~
Alpha
+-----------------------------------------------------------+
| Type | Value |
+-----------------------------------------------------------+
| Node ID | 1 |
+-----------------------------------------------------------+
| Node Wide ID | 11111111 |
+-----------------------------------------------------------+
| Ingress ID | 0xffff (default value) |
+-----------------------------------------------------------+
| Ingress Wide ID | 0xffffffff (default value) |
+-----------------------------------------------------------+
| Egress ID | 101 |
+-----------------------------------------------------------+
| Egress Wide ID | 101101 |
+-----------------------------------------------------------+
| Namespace Data | 0xdeadbee0 |
+-----------------------------------------------------------+
| Namespace Wide Data | 0xcafec0caf00dc0de |
+-----------------------------------------------------------+
| Schema ID | 777 |
+-----------------------------------------------------------+
| Schema Data | something that will be 4n-aligned |
+-----------------------------------------------------------+
Note: When Gamma is the destination, Alpha adds an IOAM Pre-allocated Trace
option inside a Hop-by-hop, where 164 bytes are pre-allocated for the
trace, with 123 as the IOAM-Namespace and with 0xfff00200 as the trace
type (= all available options at this time). As a result, and based on
IOAM configurations here, only both Alpha and Beta should be capable of
inserting their IOAM data while Gamma won't have enough space and will
set the overflow bit.
Beta
+-----------------------------------------------------------+
| Type | Value |
+-----------------------------------------------------------+
| Node ID | 2 |
+-----------------------------------------------------------+
| Node Wide ID | 22222222 |
+-----------------------------------------------------------+
| Ingress ID | 201 |
+-----------------------------------------------------------+
| Ingress Wide ID | 201201 |
+-----------------------------------------------------------+
| Egress ID | 202 |
+-----------------------------------------------------------+
| Egress Wide ID | 202202 |
+-----------------------------------------------------------+
| Namespace Data | 0xdeadbee1 |
+-----------------------------------------------------------+
| Namespace Wide Data | 0xcafec0caf11dc0de |
+-----------------------------------------------------------+
| Schema ID | 0xffffff (= None) |
+-----------------------------------------------------------+
| Schema Data | |
+-----------------------------------------------------------+
Gamma
+-----------------------------------------------------------+
| Type | Value |
+-----------------------------------------------------------+
| Node ID | 3 |
+-----------------------------------------------------------+
| Node Wide ID | 33333333 |
+-----------------------------------------------------------+
| Ingress ID | 301 |
+-----------------------------------------------------------+
| Ingress Wide ID | 301301 |
+-----------------------------------------------------------+
| Egress ID | 0xffff (default value) |
+-----------------------------------------------------------+
| Egress Wide ID | 0xffffffff (default value) |
+-----------------------------------------------------------+
| Namespace Data | 0xdeadbee2 |
+-----------------------------------------------------------+
| Namespace Wide Data | 0xcafec0caf22dc0de |
+-----------------------------------------------------------+
| Schema ID | 0xffffff (= None) |
+-----------------------------------------------------------+
| Schema Data | |
+-----------------------------------------------------------+
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set hthresh, dump it again and verify thresh.lbits && thresh.rbits.
They are passed as attributes of xfrm_spdattr_type_t, different from
other message attributes that use xfrm_attr_type_t.
Also, test attribute that is bigger than XFRMA_SPD_MAX, currently it
should be silently ignored.
Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
When run test_tc_tunnel.sh, it complains following error
ipip
encap 192.168.1.1 to 192.168.1.2, type ipip, mac none len 100
test basic connectivity
nc: cannot use -p and -l
nc man page has:
-l Listen for an incoming connection rather than initiating
a connection to a remote host.Cannot be used together with
any of the options -psxz. Additionally, any timeouts specified
with the -w option are ignored.
Correct nc in server_listen().
Signed-off-by: Vincent Li <vincent.mc.li@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210719223022.66681-1-vincent.mc.li@gmail.com
UDP socket support was added recently so testing UDP insert failure is no
longer correct and causes test_maps failure. The fix is easy though, we
simply need to test that UDP is correctly added instead of blocked.
Fixes: 122e6c79ef ("sock_map: Update sock type checks for UDP")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210720184832.452430-1-john.fastabend@gmail.com
Simple functional test for the newly exposed features.
Also add an optional stress test for the channel number
update under flood.
RFC v1 -> RFC v2:
- add the stress test
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a test case to check whether an unsuccessful creation of an outer
map of a BTF-defined map-in-map destroys the inner map.
As bpf_object__create_map() is a static function, we cannot just call it
from the test case and then check whether a map accessible via
map->inner_map_fd has been closed. Instead, we iterate over all maps and
check whether the map "$MAP_NAME.inner" does not exist.
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210719173838.423148-3-m@lambda.lt
- Fix MTE shared page detection
- Fix selftest use of obsolete pthread_yield() in favour of sched_yield()
- Enable selftest's use of PMU registers when asked to
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmD0BgcPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDSWwP/1dmthmV6qh7eqlXquxq9GRFOZoFBdY+48R0
8X7eZkM+TxIFd36gOfur9v8kh3KLCZmTi4/f/0hsWpcE36SYVv8hfNBck6wRWoln
Yh4mYqW23l8nME2wwzRkOE1uZ6o3RRKSWROoMultPYrL1XdVTv+sCUmIqY/aa4vu
dgrZaFBC5sxzW6Rsiw0H7NswC/RezjIooY9qimeQpSNMOM0585FCjVTtU9/0+S4V
UpUPe/Y9g74N98NAtbJ9Wy5Q1I4xsQeYBejzgLyVYvTVnV7/6ITpQOi/EjgtvTln
j7LgN72H9BBQv7wEmnQs4AmCdPqppb0PfHKwpUQTEgK271SvROf9ssmSaaAS5rRx
5d3XCr+D93JViP6W/KUoA8Mye5en1gikfnrQeL5eLPpryOVIFsX1zeA7wOnYeRGh
r7QAboC1ScSKlTZiXaeSvxI1xC4FVljUeTNGuWJIofkyo744KX5tFktZefr0i3LL
/n/oE/aErXDRVfinAFxSRE4NUZON6vc6JnUBGFbs/pCJKj60XjVBx6IhnPnHBlv2
pMS5REZ6+LqovXoaib/BI/4LfLqk32u/fPVTEJTqs9H+brBVD7+lujJozcZ8Wes8
/fC7hZfNnvfbRgH/BdEetxVqbB68FNV3DXZ3V3MjnP+G6mI/ugLCuQ22LN3PmH9w
0BiZilau
=Ln0f
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-fixes-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.14, take #1
- Fix MTE shared page detection
- Fix selftest use of obsolete pthread_yield() in favour of sched_yield()
- Enable selftest's use of PMU registers when asked to
This Kselftest fixes update for Linux 5.14-rc2 consists of fix
to memory-hotplug hot-remove test to stop spamming logs with
dump_page() entries and slowing the system down to a crawl.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmDx/hcACgkQCwJExA0N
QxzwIw/+OIg2i+vYwHRaGHw2kIhz+1Sfja+sJ5KCldxxtCFI+alQnkrZwcvjjR0N
OsqQXIfp2BH2YrtMGA4EO3B7ZtgejQ90cFLhFan4VwZC7IVZc8VjTrxIiruUpq9P
XV5TtGDtGSPgMUlqS67nh2fl35GdCminADSkqJ3UETZfvyE2vPw+UEYbij+IzE9n
Ndwcaqwqcf8jFl1dkK2UhTto7Xtjrmf+AOuEZzHFXldaK9tLU0aUz17efwAgxZYx
l1Bldi8f8cqfyXLoW459pV5f13r5hurc7goQHwNft5zayB1AKXJmwIy1CCUsEAQC
P6UpU3lwBumJRZd/sDaSxAhrUk7gRITPJcfej+2tuIMdo+vIqrtl82pLOVtitJux
6INcfGMtZ6LxMWJbF35evhVOmYcpUXu3Uh6xkwjFYoqdw7+y0b+6aAFaveLAKBdE
ne/dN6T2Yo81C94Bst1XPD1z65lpNO4a/xWmt25vQuA2TEHMfMZ+Wg7Ew9zglZSo
sy6fpZnHmSiwFaowP6tfYnFrT2OD6iEaLre58chPNHr3e2alrJWm3bW59TbemSbu
ZWZMTC+NFNLVOVEUDVKXMkW5lXoPpn+NJGX50IEhMltPYVvSCVk74pEUySbzN7eo
mG+pb4Xud9JvnOj/+yxeqw7jkQWY8hknelF/kmwkcMW97HttW80=
=zpTx
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-fixes-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fix from Shuah Khan:
"A fix to memory-hotplug hot-remove test to stop spamming logs with
dump_page() entries and slowing the system down to a crawl"
* tag 'linux-kselftest-fixes-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: memory-hotplug: avoid spamming logs with dump_page(), ratio limit hot-remove error test
Test various type data dumping operations by comparing expected
format with the dumped string; an snprintf-style printf function
is used to record the string dumped. Also verify overflow handling
where the data passed does not cover the full size of a type,
such as would occur if a tracer has a portion of the 8k
"struct task_struct".
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1626362126-27775-4-git-send-email-alan.maguire@oracle.com
Add several test cases for checking update_alu_sanitation_state() under
multiple paths:
# ./test_verifier
[...]
#1061/u map access: known scalar += value_ptr unknown vs const OK
#1061/p map access: known scalar += value_ptr unknown vs const OK
#1062/u map access: known scalar += value_ptr const vs unknown OK
#1062/p map access: known scalar += value_ptr const vs unknown OK
#1063/u map access: known scalar += value_ptr const vs const (ne) OK
#1063/p map access: known scalar += value_ptr const vs const (ne) OK
#1064/u map access: known scalar += value_ptr const vs const (eq) OK
#1064/p map access: known scalar += value_ptr const vs const (eq) OK
#1065/u map access: known scalar += value_ptr unknown vs unknown (eq) OK
#1065/p map access: known scalar += value_ptr unknown vs unknown (eq) OK
#1066/u map access: known scalar += value_ptr unknown vs unknown (lt) OK
#1066/p map access: known scalar += value_ptr unknown vs unknown (lt) OK
#1067/u map access: known scalar += value_ptr unknown vs unknown (gt) OK
#1067/p map access: known scalar += value_ptr unknown vs unknown (gt) OK
[...]
Summary: 1762 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2021-07-15
The following pull-request contains BPF updates for your *net-next* tree.
We've added 45 non-merge commits during the last 15 day(s) which contain
a total of 52 files changed, 3122 insertions(+), 384 deletions(-).
The main changes are:
1) Introduce bpf timers, from Alexei.
2) Add sockmap support for unix datagram socket, from Cong.
3) Fix potential memleak and UAF in the verifier, from He.
4) Add bpf_get_func_ip helper, from Jiri.
5) Improvements to generic XDP mode, from Kumar.
6) Support for passing xdp_md to XDP programs in bpf_prog_run, from Zvi.
===================
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP and other connection oriented sockets have accept()
for each incoming connection on the server side, hence
they can just insert those fd's from accept() to sockmap,
which are of course established.
Now with datagram sockets begin to support sockmap and
redirection, the restriction is no longer applicable to
them, as they have no accept(). So we have to lift this
restriction for them. This is fine, because inside
bpf_sk_redirect_map() we still have another socket status
check, sock_map_redirect_allowed(), as a guard.
This also means they do not have to be removed from
sockmap when disconnecting.
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210704190252.11866-3-xiyou.wangcong@gmail.com
Adding test for bpf_get_func_ip in kprobe+ofset probe.
Because of the offset value it's arch specific, enabling
the new test only for x86_64 architecture.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210714094400.396467-9-jolsa@kernel.org
Check that map-in-map supports bpf timers.
Check that indirect "recursion" of timer callbacks works:
timer_cb1() { bpf_timer_set_callback(timer_cb2); }
timer_cb2() { bpf_timer_set_callback(timer_cb1); }
Check that
bpf_map_release
htab_free_prealloced_timers
bpf_timer_cancel_and_free
hrtimer_cancel
works while timer cb is running.
"while true; do ./test_progs -t timer_mim; done"
is a great stress test. It caught missing timer cancel in htab->extra_elems.
timer_mim_reject.c is a negative test that checks
that timer<->map mismatch is prevented.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210715005417.78572-12-alexei.starovoitov@gmail.com
Add bpf_timer test that creates timers in preallocated and
non-preallocated hash, in array and in lru maps.
Let array timer expire once and then re-arm it for 35 seconds.
Arm lru timer into the same callback.
Then arm and re-arm hash timers 10 times each.
At the last invocation of prealloc hash timer cancel the array timer.
Force timer free via LRU eviction and direct bpf_map_delete_elem.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210715005417.78572-11-alexei.starovoitov@gmail.com
The variable buf is unused since commit 005edd1656 ("selftests/bpf:
convert bpf tunnel test to BPF_ADJ_ROOM_MAC"). Remove it to fix the
following warning:
test_tc_tunnel.c:531:7: warning: unused variable 'buf' [-Wunused-variable]
Fixes: 005edd1656 ("selftests/bpf: convert bpf tunnel test to BPF_ADJ_ROOM_MAC")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210713102719.8890-1-tklauser@distanz.ch
* Fixes for host SMIs on AMD
* Fixes for guest SMIs on AMD
* Fixes for selftests on s390 and ARM
* Fix memory leak
* Enforce no-instrumentation area on vmentry when hardware
breakpoints are in use.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmDwRi4UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOt4AgAl6xEkMwDC74d/QFIOA7s2GD3ugfa
z5XqGN1qz/nmEMnuIg6/tjTXDPmn/dfLMqy8RGZfyUv6xbgPcv/7JuFMRILvwGTb
SbOVrGnR/QOhMdlfWH34qDkXeEsthTXSgQgVm/iiED0TttvQYVcZ/E9mgzaWQXor
T1yTug2uAUXJ1EBxY0ZBo2kbh+BvvdmhEF0pksZOuwqZdH3zn3QCXwAwkL/OtUYE
M6nNn3j1LU38C4OK1niXOZZVOuMIdk/l7LyFpjUQTFlIqitQAPtBE5MD+K+A9oC2
Yocxyj2tId1e6o8bLic/oN8/LpdORTvA/wDMj5M1DcMzvxQuQIpGYkcVGg==
=gjVA
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
- Allow again loading KVM on 32-bit non-PAE builds
- Fixes for host SMIs on AMD
- Fixes for guest SMIs on AMD
- Fixes for selftests on s390 and ARM
- Fix memory leak
- Enforce no-instrumentation area on vmentry when hardware breakpoints
are in use.
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
KVM: selftests: smm_test: Test SMM enter from L2
KVM: nSVM: Restore nested control upon leaving SMM
KVM: nSVM: Fix L1 state corruption upon return from SMM
KVM: nSVM: Introduce svm_copy_vmrun_state()
KVM: nSVM: Check that VM_HSAVE_PA MSR was set before VMRUN
KVM: nSVM: Check the value written to MSR_VM_HSAVE_PA
KVM: SVM: Fix sev_pin_memory() error checks in SEV migration utilities
KVM: SVM: Return -EFAULT if copy_to_user() for SEV mig packet header fails
KVM: SVM: add module param to control the #SMI interception
KVM: SVM: remove INIT intercept handler
KVM: SVM: #SMI interception must not skip the instruction
KVM: VMX: Remove vmx_msr_index from vmx.h
KVM: X86: Disable hardware breakpoints unconditionally before kvm_x86->run()
KVM: selftests: Address extra memslot parameters in vm_vaddr_alloc
kvm: debugfs: fix memory leak in kvm_create_vm_debugfs
KVM: x86/pmu: Clear anythread deprecated bit when 0xa leaf is unsupported on the SVM
KVM: mmio: Fix use-after-free Read in kvm_vm_ioctl_unregister_coalesced_mmio
KVM: SVM: Revert clearing of C-bit on GPA in #NPF handler
KVM: x86/mmu: Do not apply HPA (memory encryption) mask to GPAs
KVM: x86: Use kernel's x86_phys_bits to handle reduced MAXPHYADDR
...
Two additional tests are added:
- SMM triggered from L2 does not currupt L1 host state.
- Save/restore during SMM triggered from L2 does not corrupt guest/host
state.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210628104425.391276-7-vkuznets@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit a75a895e64 ("KVM: selftests: Unconditionally use memslot 0 for
vaddr allocations") removed the memslot parameters from vm_vaddr_alloc.
It addressed all callers except one under lib/aarch64/, due to a race
with commit e3db7579ef ("KVM: selftests: Add exception handling
support for aarch64")
Fix the vm_vaddr_alloc call in lib/aarch64/processor.c.
Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Message-Id: <20210702201042.4036162-1-ricarkol@google.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Current release - regressions:
- sock: fix parameter order in sock_setsockopt()
Current release - new code bugs:
- netfilter: nft_last:
- fix incorrect arithmetic when restoring last used
- honor NFTA_LAST_SET on restoration
Previous releases - regressions:
- udp: properly flush normal packet at GRO time
- sfc: ensure correct number of XDP queues; don't allow enabling the
feature if there isn't sufficient resources to Tx from any CPU
- dsa: sja1105: fix address learning getting disabled on the CPU port
- mptcp: addresses a rmem accounting issue that could keep packets
in subflow receive buffers longer than necessary, delaying
MPTCP-level ACKs
- ip_tunnel: fix mtu calculation for ETHER tunnel devices
- do not reuse skbs allocated from skbuff_fclone_cache in the napi
skb cache, we'd try to return them to the wrong slab cache
- tcp: consistently disable header prediction for mptcp
Previous releases - always broken:
- bpf: fix subprog poke descriptor tracking use-after-free
- ipv6:
- allocate enough headroom in ip6_finish_output2() in case
iptables TEE is used
- tcp: drop silly ICMPv6 packet too big messages to avoid
expensive and pointless lookups (which may serve as a DDOS
vector)
- make sure fwmark is copied in SYNACK packets
- fix 'disable_policy' for forwarded packets (align with IPv4)
- netfilter: conntrack: do not renew entry stuck in tcp SYN_SENT state
- netfilter: conntrack: do not mark RST in the reply direction coming
after SYN packet for an out-of-sync entry
- mptcp: cleanly handle error conditions with MP_JOIN and syncookies
- mptcp: fix double free when rejecting a join due to port mismatch
- validate lwtstate->data before returning from skb_tunnel_info()
- tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path
- mt76: mt7921: continue to probe driver when fw already downloaded
- bonding: fix multiple issues with offloading IPsec to (thru?) bond
- stmmac: ptp: fix issues around Qbv support and setting time back
- bcmgenet: always clear wake-up based on energy detection
Misc:
- sctp: move 198 addresses from unusable to private scope
- ptp: support virtual clocks and timestamping
- openvswitch: optimize operation for key comparison
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmDu3mMACgkQMUZtbf5S
Irsjxg//UwcPJMYFmXV+fGkEsWYe1Kf29FcUDEeANFtbltfAcIfZ0GoTbSDRnrVb
HcYAKcm4XRx5bWWdQrQsQq/yiLbnS/rSLc7VRB+uRHWRKl3eYcaUB2rnCXsxrjGw
wQJgOmztDCJS4BIky24iQpF/8lg7p/Gj2Ih532gh93XiYo612FrEJKkYb2/OQfYX
GkbnZ0kL2Y1SV+bhy6aT5azvhHKM4/3eA4fHeJ2p8e2gOZ5ni0vpX0xEzdzKOCd0
vwR/Wu3h/+2QuFYVcSsVguuM++JXACG8MAS/Tof78dtNM4a3kQxzqeh5Bv6IkfTu
rokENLq4pjNRy+nBAOeQZj8Jd0K0kkf/PN9WMdGQtplMoFhjjV25R6PeRrV9wwPo
peozIz2MuQo7Kfof1D+44h2foyLfdC28/Z0CvRbDpr5EHOfYynvBbrnhzIGdQp6V
xgftKTOdgz2Djgg8HiblZund1FA44OYerddVAASrIsnSFnIz1VLVQIsfV+GLBwwc
FawrIZ6WfIjzRSrDGOvDsbAQI47T/1jbaPJeK6XgjWkQmjEd6UtRWRZLYCxemQEw
4HP3sWC96BOehuD8ylipVE1oFqrxCiOB/fZxezXqjo8dSX3NLdak4cCHTHoW5SuZ
eEAxQRaBliKd+P7hoy9cZ57CAu3zUa8kijfM5QRlCAHF+zSxaPs=
=QFnb
-----END PGP SIGNATURE-----
Merge tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski.
"Including fixes from bpf and netfilter.
Current release - regressions:
- sock: fix parameter order in sock_setsockopt()
Current release - new code bugs:
- netfilter: nft_last:
- fix incorrect arithmetic when restoring last used
- honor NFTA_LAST_SET on restoration
Previous releases - regressions:
- udp: properly flush normal packet at GRO time
- sfc: ensure correct number of XDP queues; don't allow enabling the
feature if there isn't sufficient resources to Tx from any CPU
- dsa: sja1105: fix address learning getting disabled on the CPU port
- mptcp: addresses a rmem accounting issue that could keep packets in
subflow receive buffers longer than necessary, delaying MPTCP-level
ACKs
- ip_tunnel: fix mtu calculation for ETHER tunnel devices
- do not reuse skbs allocated from skbuff_fclone_cache in the napi
skb cache, we'd try to return them to the wrong slab cache
- tcp: consistently disable header prediction for mptcp
Previous releases - always broken:
- bpf: fix subprog poke descriptor tracking use-after-free
- ipv6:
- allocate enough headroom in ip6_finish_output2() in case
iptables TEE is used
- tcp: drop silly ICMPv6 packet too big messages to avoid
expensive and pointless lookups (which may serve as a DDOS
vector)
- make sure fwmark is copied in SYNACK packets
- fix 'disable_policy' for forwarded packets (align with IPv4)
- netfilter: conntrack:
- do not renew entry stuck in tcp SYN_SENT state
- do not mark RST in the reply direction coming after SYN packet
for an out-of-sync entry
- mptcp: cleanly handle error conditions with MP_JOIN and syncookies
- mptcp: fix double free when rejecting a join due to port mismatch
- validate lwtstate->data before returning from skb_tunnel_info()
- tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path
- mt76: mt7921: continue to probe driver when fw already downloaded
- bonding: fix multiple issues with offloading IPsec to (thru?) bond
- stmmac: ptp: fix issues around Qbv support and setting time back
- bcmgenet: always clear wake-up based on energy detection
Misc:
- sctp: move 198 addresses from unusable to private scope
- ptp: support virtual clocks and timestamping
- openvswitch: optimize operation for key comparison"
* tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits)
net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave()
sfc: add logs explaining XDP_TX/REDIRECT is not available
sfc: ensure correct number of XDP queues
sfc: fix lack of XDP TX queues - error XDP TX failed (-22)
net: fddi: fix UAF in fza_probe
net: dsa: sja1105: fix address learning getting disabled on the CPU port
net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload
net: Use nlmsg_unicast() instead of netlink_unicast()
octeontx2-pf: Fix uninitialized boolean variable pps
ipv6: allocate enough headroom in ip6_finish_output2()
net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific
net: bridge: multicast: fix MRD advertisement router port marking race
net: bridge: multicast: fix PIM hello router port marking race
net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340
dsa: fix for_each_child.cocci warnings
virtio_net: check virtqueue_add_sgs() return value
mptcp: properly account bulk freed memory
selftests: mptcp: fix case multiple subflows limited by server
mptcp: avoid processing packet if a subflow reset
mptcp: fix syncookie process if mptcp can not_accept new subflow
...
Commit b78f4a5966 ("KVM: selftests: Rename vm_handle_exception")
raced with a couple of new x86 tests, missing two vm_handle_exception
to vm_install_exception_handler conversions.
Help the two broken tests to catch up with the new world.
Cc: Andrew Jones <drjones@redhat.com>
CC: Ricardo Koller <ricarkol@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Message-Id: <20210701071928.2971053-1-maz@kernel.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We reworked get-reg-list to make it easier to enable optional register
sublists by parametrizing their vcpu feature flags as well as making
other generalizations. That was all to make sure we enable the PMU
registers when we want to test them. Somehow we forgot to actually
include the PMU feature flag in the PMU sublist description though!
Do that now.
Fixes: 313673bad8 ("KVM: arm64: selftests: get-reg-list: Split base and pmu registers")
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210713203742.29680-3-drjones@redhat.com
With later GCC we get
steal_time.c: In function ‘main’:
steal_time.c:323:25: warning: ‘pthread_yield’ is deprecated: pthread_yield is deprecated, use sched_yield instead [-Wdeprecated-declarations]
Let's follow the instructions and use sched_yield instead.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210713203742.29680-2-drjones@redhat.com
While the offline memory test obey ratio limit, the same test with
error injection does not and tries to offline all the hotpluggable
memory, spamming system logs with hundreds of thousands of dump_page()
entries, slowing system down (to the point the test itself timesout and
gets terminated) and excessive fs occupation:
...
[ 9784.393354] page:c00c0000007d1b40 refcount:3 mapcount:0 mapping:c0000001fc03e950 index:0xe7b
[ 9784.393355] def_blk_aops
[ 9784.393356] flags: 0x3ffff800002062(referenced|active|workingset|private)
[ 9784.393358] raw: 003ffff800002062 c0000001b9343a68 c0000001b9343a68 c0000001fc03e950
[ 9784.393359] raw: 0000000000000e7b c000000006607b18 00000003ffffffff c00000000490d000
[ 9784.393359] page dumped because: migration failure
[ 9784.393360] page->mem_cgroup:c00000000490d000
[ 9784.393416] migrating pfn 1f46d failed ret:1
...
$ grep "page dumped because: migration failure" /var/log/kern.log | wc -l
2405558
$ ls -la /var/log/kern.log
-rw-r----- 1 syslog adm 2256109539 Jun 30 14:19 /var/log/kern.log
Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
After patch "mptcp: fix syncookie process if mptcp can not_accept new
subflow", if subflow is limited, MP_JOIN SYN is dropped, and no SYN/ACK
will be replied.
So in case "multiple subflows limited by server", the expected SYN/ACK
number should be 1.
Fixes: 00587187ad ("selftests: mptcp: add test cases for mptcp join tests with syn cookies")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2021-07-09
The following pull-request contains BPF updates for your *net* tree.
We've added 9 non-merge commits during the last 9 day(s) which contain
a total of 13 files changed, 118 insertions(+), 62 deletions(-).
The main changes are:
1) Fix runqslower task->state access from BPF, from SanjayKumar Jeyakumar.
2) Fix subprog poke descriptor tracking use-after-free, from John Fastabend.
3) Fix sparse complaint from prior devmap RCU conversion, from Toke Høiland-Jørgensen.
4) Fix missing va_end in bpftool JIT json dump's error path, from Gu Shengxian.
5) Fix tools/bpf install target from missing runqslower install, from Wei Li.
6) Fix xdpsock BPF sample to unload program on shared umem option, from Wang Hai.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
- Fixed a bug that broke the .sym-offset modifier and added a test to make
sure nothing breaks it again.
- Replace a list_del/list_add() with a list_move()
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYOhMlxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6quYRAQCbciaL9gQwk6lS4Rn3NkC/i1xsxv0A
q56XDPK3qyuHWQD/d6gcJ5n+Bl0OoNaDyG4UqnSGucZctYQeY4LQMzd4PQA=
=wPyY
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix and cleanup from Steven Rostedt:
"Tracing fix for histograms and a clean up in ftrace:
- Fixed a bug that broke the .sym-offset modifier and added a test to
make sure nothing breaks it again.
- Replace a list_del/list_add() with a list_move()"
* tag 'trace-v5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Use list_move instead of list_del/list_add
tracing/selftests: Add tests to test histogram sym and sym-offset modifiers
tracing/histograms: Fix parsing of "sym-offset" modifier
This adds some extra noise to the tailcall_bpf2bpf4 tests that will cause
verify to patch insns. This then moves around subprog start/end insn
index and poke descriptor insn index to ensure that verify and JIT will
continue to track these correctly.
If done correctly verifier should pass this program same as before and
JIT should emit tail call logic.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210707223848.14580-3-john.fastabend@gmail.com
With a large mmap map size, we can overlap with the text area and using
MAP_FIXED results in unmapping that area. Switch to MAP_FIXED_NOREPLACE
and handle the EEXIST error.
Link: https://lkml.kernel.org/r/20210616045239.370802-3-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mrermap fixes", v2.
This patch (of 6):
Instead of hardcoding 4K page size fetch it using sysconf(). For the
performance measurements test still assume 2M and 1G are hugepage sizes.
Link: https://lkml.kernel.org/r/20210616045239.370802-1-aneesh.kumar@linux.ibm.com
Link: https://lkml.kernel.org/r/20210616045239.370802-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The test verifies that file descriptor created with memfd_secret does not
allow read/write operations, that secret memory mappings respect
RLIMIT_MEMLOCK and that remote accesses with process_vm_read() and
ptrace() to the secret memory fail.
Link: https://lkml.kernel.org/r/20210518072034.31572-8-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tycho Andersen <tycho@tycho.ws>
Cc: Will Deacon <will@kernel.org>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a test to the tracing selftests that will catch if the .sym or
.sym-offset modifiers break in the future.
Link: https://lkml.kernel.org/r/20210707121451.101a1002@oasis.local.home
Acked-by: Tom Zanussi <zanussi@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Support for cpumap and devmap entry progs in previous commits means the
test needs to be updated for the new semantics. Also take this
opportunity to convert it from CHECK macros to the new ASSERT macros.
Since xdp_cpumap_attach has no subtest, put the sole test inside the
test_xdp_cpumap_attach function.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210702111825.491065-6-memxor@gmail.com
Add a test for using xdp_md as a context to BPF_PROG_TEST_RUN for XDP
programs.
The test uses a BPF program that takes in a return value from XDP
meta data, then reduces the size of the XDP meta data by 4 bytes.
Test cases validate the possible failure cases for passing in invalid
xdp_md contexts, that the return value is successfully passed
in, and that the adjusted meta data is successfully copied out.
Co-developed-by: Cody Haas <chaas@riotgames.com>
Co-developed-by: Lisa Watanabe <lwatanabe@riotgames.com>
Signed-off-by: Cody Haas <chaas@riotgames.com>
Signed-off-by: Lisa Watanabe <lwatanabe@riotgames.com>
Signed-off-by: Zvi Effron <zeffron@riotgames.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210707221657.3985075-5-zeffron@riotgames.com
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Do not refresh timeout in SYN_SENT for syn retransmissions.
Add selftest for unreplied TCP connection, from Florian Westphal.
2) Fix null dereference from error path with hardware offload
in nftables.
3) Remove useless nf_ct_gre_keymap_flush() from netns exit path,
from Vasily Averin.
4) Missing rcu read-lock side in ctnetlink helper info dump,
also from Vasily.
5) Do not mark RST in the reply direction coming after SYN packet
for an out-of-sync entry, from Ali Abdallah and Florian Westphal.
6) Add tcp_ignore_invalid_rst sysctl to allow to disable out of
segment RSTs, from Ali.
7) KCSAN fix for nf_conntrack_all_lock(), from Manfred Spraul.
8) Honor NFTA_LAST_SET in nft_last.
9) Fix incorrect arithmetics when restore last_jiffies in nft_last.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
After redirecting, it's already a new path. So the old PMTU info should
be cleared. The IPv6 test "mtu exception plus redirect" should only
has redirect info without old PMTU.
The IPv4 test can not be changed because of legacy.
Fixes: ec81053528 ("selftests: Add redirect tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the kernel doesn't enable option CONFIG_IPV6_SUBTREES, the RTA_SRC
info will not be exported to userspace in rt6_fill_node(). And ip cmd will
not print "from ::" to the route output. So remove this check.
Fixes: ec81053528 ("selftests: Add redirect tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Prevent sigaltstack out of bounds writes. The kernel unconditionally
writes the FPU state to the alternate stack without checking whether
the stack is large enough to accomodate it.
Check the alternate stack size before doing so and in case it's too
small force a SIGSEGV instead of silently corrupting user space data.
- MINSIGSTKZ and SIGSTKSZ are constants in signal.h and have never been
updated despite the fact that the FPU state which is stored on the
signal stack has grown over time which causes trouble in the field
when AVX512 is available on a CPU. The kernel does not expose the
minimum requirements for the alternate stack size depending on the
available and enabled CPU features.
ARM already added an aux vector AT_MINSIGSTKSZ for the same reason.
Add it to x86 as well
- A major cleanup of the x86 FPU code. The recent discoveries of XSTATE
related issues unearthed quite some inconsistencies, duplicated code
and other issues.
The fine granular overhaul addresses this, makes the code more robust
and maintainable, which allows to integrate upcoming XSTATE related
features in sane ways.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmDlcpETHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoeP5D/4i+AgYYeiMLgGb+NS7iaKPfoWo6LIz
y3qdTSA0DQaIYbYivWwRO/g0GYdDMXDWeZalFi7eGnVI8O3eOog+22Zrf/y0UINB
KJHdYd4ApWHhs401022y5hexrWQvnV8w1yQCuj/zLm6eC+AVhdwt2AY+IBoRrdUj
wqY97B/4rJNsBvvqTDn9EeDrJA2y0y0Suc7AhIp2BGMI+dpIdxys8RJDamXNWyDL
gJf0YRgUoiIn3AHKb+fgv60AoxfC175NSg/5/y/scFNXqVlW0Up4YCb7pqG9o2Ga
f3XvtWfbw1N5PmUYjFkALwEkzGUbM3v0RA3xLY2j2WlWm9fBPPy59dt+i/h/VKyA
GrA7i7lcIqX8dfVH6XkrReZBkRDSB6t9SZTvV54jAz5fcIZO2Rg++UFUvI/R6GKK
XCcxukYaArwo+IG62iqDszS3gfLGhcor/cviOeULRC5zMUIO4Jah+IhDnifmShtC
M5s9QzrwIRD/XMewGRQmvkiN4kBfE7jFoBQr1J9leCXJKrM+2JQmMzVInuubTQIq
SdlKOaAIn7xtekz+6XdFG9Gmhck0PCLMJMOLNvQkKWI3KqGLRZ+dAWKK0vsCizAx
0BA7ZeB9w9lFT+D8mQCX77JvW9+VNwyfwIOLIrJRHk3VqVpS5qvoiFTLGJJBdZx4
/TbbRZu7nXDN2w==
=Mq1m
-----END PGP SIGNATURE-----
Merge tag 'x86-fpu-2021-07-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Thomas Gleixner:
"Fixes and improvements for FPU handling on x86:
- Prevent sigaltstack out of bounds writes.
The kernel unconditionally writes the FPU state to the alternate
stack without checking whether the stack is large enough to
accomodate it.
Check the alternate stack size before doing so and in case it's too
small force a SIGSEGV instead of silently corrupting user space
data.
- MINSIGSTKZ and SIGSTKSZ are constants in signal.h and have never
been updated despite the fact that the FPU state which is stored on
the signal stack has grown over time which causes trouble in the
field when AVX512 is available on a CPU. The kernel does not expose
the minimum requirements for the alternate stack size depending on
the available and enabled CPU features.
ARM already added an aux vector AT_MINSIGSTKSZ for the same reason.
Add it to x86 as well.
- A major cleanup of the x86 FPU code. The recent discoveries of
XSTATE related issues unearthed quite some inconsistencies,
duplicated code and other issues.
The fine granular overhaul addresses this, makes the code more
robust and maintainable, which allows to integrate upcoming XSTATE
related features in sane ways"
* tag 'x86-fpu-2021-07-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
x86/fpu/xstate: Clear xstate header in copy_xstate_to_uabi_buf() again
x86/fpu/signal: Let xrstor handle the features to init
x86/fpu/signal: Handle #PF in the direct restore path
x86/fpu: Return proper error codes from user access functions
x86/fpu/signal: Split out the direct restore code
x86/fpu/signal: Sanitize copy_user_to_fpregs_zeroing()
x86/fpu/signal: Sanitize the xstate check on sigframe
x86/fpu/signal: Remove the legacy alignment check
x86/fpu/signal: Move initial checks into fpu__restore_sig()
x86/fpu: Mark init_fpstate __ro_after_init
x86/pkru: Remove xstate fiddling from write_pkru()
x86/fpu: Don't store PKRU in xstate in fpu_reset_fpstate()
x86/fpu: Remove PKRU handling from switch_fpu_finish()
x86/fpu: Mask PKRU from kernel XRSTOR[S] operations
x86/fpu: Hook up PKRU into ptrace()
x86/fpu: Add PKRU storage outside of task XSAVE buffer
x86/fpu: Dont restore PKRU in fpregs_restore_userspace()
x86/fpu: Rename xfeatures_mask_user() to xfeatures_mask_uabi()
x86/fpu: Move FXSAVE_LEAK quirk info __copy_kernel_to_fpregs()
x86/fpu: Rename __fpregs_load_activate() to fpregs_restore_userregs()
...
Unless the user sets overcommit_memory or has plenty of swap, the latest
changes to the testcase will result in ENOMEM failures for hosts with
less than 64GB RAM. As we do not use much of the allocated memory, we
can use MAP_NORESERVE to avoid this error.
Cc: Zenghui Yu <yuzenghui@huawei.com>
Cc: vkuznets@redhat.com
Cc: wanghaibin.wang@huawei.com
Cc: stable@vger.kernel.org
Fixes: 309505dd56 ("KVM: selftests: Fix mapping length truncation in m{,un}map()")
Tested-by: Zenghui Yu <yuzenghui@huawei.com>
Link: https://lore.kernel.org/kvm/20210701160425.33666-1-borntraeger@de.ibm.com/
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Older machines like z196 and zEC12 do only support 44 bits of physical
addresses. Make this the default and check via IBC if we are on a later
machine. We then add P47V64 as an additional model.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Link: https://lore.kernel.org/kvm/20210701153853.33063-1-borntraeger@de.ibm.com/
Fixes: 1bc603af73 ("KVM: selftests: introduce P47V64 for s390x")
Here is the big set of char / misc and other driver subsystem updates
for 5.14-rc1. Included in here are:
- habanna driver updates
- fsl-mc driver updates
- comedi driver updates
- fpga driver updates
- extcon driver updates
- interconnect driver updates
- mei driver updates
- nvmem driver updates
- phy driver updates
- pnp driver updates
- soundwire driver updates
- lots of other tiny driver updates for char and misc drivers
This is looking more and more like the "various driver subsystems mushed
together" tree...
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYOM8jQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymECgCg0yL+8WxDKO5Gg5llM5PshvLB1rQAn0y5pDgg
nw78LV3HQ0U7qaZBtI91
=x+AR
-----END PGP SIGNATURE-----
Merge tag 'char-misc-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc driver updates from Greg KH:
"Here is the big set of char / misc and other driver subsystem updates
for 5.14-rc1. Included in here are:
- habanalabs driver updates
- fsl-mc driver updates
- comedi driver updates
- fpga driver updates
- extcon driver updates
- interconnect driver updates
- mei driver updates
- nvmem driver updates
- phy driver updates
- pnp driver updates
- soundwire driver updates
- lots of other tiny driver updates for char and misc drivers
This is looking more and more like the "various driver subsystems
mushed together" tree...
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (292 commits)
mcb: Use DEFINE_RES_MEM() helper macro and fix the end address
PNP: moved EXPORT_SYMBOL so that it immediately followed its function/variable
bus: mhi: pci-generic: Add missing 'pci_disable_pcie_error_reporting()' calls
bus: mhi: Wait for M2 state during system resume
bus: mhi: core: Fix power down latency
intel_th: Wait until port is in reset before programming it
intel_th: msu: Make contiguous buffers uncached
intel_th: Remove an unused exit point from intel_th_remove()
stm class: Spelling fix
nitro_enclaves: Set Bus Master for the NE PCI device
misc: ibmasm: Modify matricies to matrices
misc: vmw_vmci: return the correct errno code
siox: Simplify error handling via dev_err_probe()
fpga: machxo2-spi: Address warning about unused variable
lkdtm/heap: Add init_on_alloc tests
selftests/lkdtm: Enable various testable CONFIGs
lkdtm: Add CONFIG hints in errors where possible
lkdtm: Enable DOUBLE_FAULT on all architectures
lkdtm/heap: Add vmalloc linear overflow test
lkdtm/bugs: XFAIL UNALIGNED_LOAD_STORE_WRITE
...
Pull RCU updates from Paul McKenney:
- Bitmap parsing support for "all" as an alias for all bits
- Documentation updates
- Miscellaneous fixes, including some that overlap into mm and lockdep
- kvfree_rcu() updates
- mem_dump_obj() updates, with acks from one of the slab-allocator
maintainers
- RCU NOCB CPU updates, including limited deoffloading
- SRCU updates
- Tasks-RCU updates
- Torture-test updates
* 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (78 commits)
tasks-rcu: Make show_rcu_tasks_gp_kthreads() be static inline
rcu-tasks: Make ksoftirqd provide RCU Tasks quiescent states
rcu: Add missing __releases() annotation
rcu: Remove obsolete rcu_read_unlock() deadlock commentary
rcu: Improve comments describing RCU read-side critical sections
rcu: Create an unrcu_pointer() to remove __rcu from a pointer
srcu: Early test SRCU polling start
rcu: Fix various typos in comments
rcu/nocb: Unify timers
rcu/nocb: Prepare for fine-grained deferred wakeup
rcu/nocb: Only cancel nocb timer if not polling
rcu/nocb: Delete bypass_timer upon nocb_gp wakeup
rcu/nocb: Cancel nocb_timer upon nocb_gp wakeup
rcu/nocb: Allow de-offloading rdp leader
rcu/nocb: Directly call __wake_nocb_gp() from bypass timer
rcu: Don't penalize priority boosting when there is nothing to boost
rcu: Point to documentation of ordering guarantees
rcu: Make rcu_gp_cleanup() be noinline for tracing
rcu: Restrict RCU_STRICT_GRACE_PERIOD to at most four CPUs
rcu: Make show_rcu_gp_kthreads() dump rcu_node structures blocking GP
...
This Kselftest update for Linux 5.14-rc1 consists of fixes to
existing tests and framework:
-- migrate sgx test to kselftest harness
-- add new test cases to sgx test
-- ftrace test fix event-no-pid on 1-core machine
-- splice test adjust for handler fallback removal
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmDfQGUACgkQCwJExA0N
Qxw6hw/+OEBEs+eHlDbbxHb9fBEg8kWls0AJKLAD2DEJOt1eARzyVa4J1MUpS6yK
jJi/k0wKvqhMbdQgEEL3oSVvr9JgmOell0OkLzK9tx4HgEou89GnuO+wVRAeG87o
QGDVWOa76GZwC47rvDDt9i9io075O1fol9HPgPZIdyVFcFgu45PVmRoSK4vaKUpm
m5VtEznCxcL60vD+u2XhbBO2QMUi4OoBrNdJpnkBx/U6iCv5ZQbpN3OoDgdhltXC
raQChLmy9bi2uYXskklcY1xGftyUiiWwfaiz7w6a65C2DedsUoD1lwhnaLlsJmSh
KTOx9r5xF5bgRsdbWefIc7yRk9nVNczVSRs+x8aGlXT3p+e9jwaSXWXvCaZowYOV
M+R0nCgmpOoGl5jhgeeu+P+JyC7aRRtMhVTGOiuUzyAPZbaIur1JVjoHpRHCVZle
XjkCbq+VZ7Qb6pZ5sM+ve20GIw+J8/pDc4qtpttqb+hLtwe5rJj+Ydw2GQ/yZ30c
uFwfdvGQMVMHjvPaL8IQb6WE3tiBhvJiUYQaRdR83HaWFstI14fFHgm7Zed9I19b
TAgoDZ4P08cROtTbopbjRX1B3cnx/sUrFjwW8cTstIkedzo3rHbdf4UvV/h96O+S
BjrZfsbEQBaCLW8KNu9wO/id+sPXeYwuW4HCkE0R0H1ZJI4Fgio=
=6gOE
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-next-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest update from Shuah Khan:
"Fixes to existing tests and framework:
- migrate sgx test to kselftest harness
- add new test cases to sgx test
- ftrace test fix event-no-pid on 1-core machine
- splice test adjust for handler fallback removal"
* tag 'linux-kselftest-next-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/sgx: remove checks for file execute permissions
selftests/ftrace: fix event-no-pid on 1-core machine
selftests/sgx: Refine the test enclave to have storage
selftests/sgx: Add EXPECT_EEXIT() macro
selftests/sgx: Dump enclave memory map
selftests/sgx: Migrate to kselftest harness
selftests/sgx: Rename 'eenter' and 'sgx_call_vdso'
selftests: timers: rtcpie: skip test if default RTC device does not exist
selftests: lib.mk: Also install "config" and "settings"
selftests: splice: Adjust for handler fallback removal
selftests/tls: Add {} to avoid static checker warning
selftests/resctrl: Fix incorrect parsing of option "-t"
- A big series refactoring parts of our KVM code, and converting some to C.
- Support for ARCH_HAS_SET_MEMORY, and ARCH_HAS_STRICT_MODULE_RWX on some CPUs.
- Support for the Microwatt soft-core.
- Optimisations to our interrupt return path on 64-bit.
- Support for userspace access to the NX GZIP accelerator on PowerVM on Power10.
- Enable KUAP and KUEP by default on 32-bit Book3S CPUs.
- Other smaller features, fixes & cleanups.
Thanks to: Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Baokun Li,
Benjamin Herrenschmidt, Bharata B Rao, Christophe Leroy, Daniel Axtens, Daniel Henrique
Barboza, Finn Thain, Geoff Levand, Haren Myneni, Jason Wang, Jiapeng Chong, Joel Stanley,
Jordan Niethe, Kajol Jain, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas
Piggin, Nick Desaulniers, Paul Mackerras, Russell Currey, Sathvika Vasireddy, Shaokun
Zhang, Stephen Rothwell, Sudeep Holla, Suraj Jitindar Singh, Tom Rix, Vaibhav Jain,
YueHaibing, Zhang Jianhua, Zhen Lei.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmDfFS4THG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgFxHEAC88NJ+Gz87LiTQFt6QjhziBaJUd+sY
uqADPRROr4P50O8PjYZbMi2qbXzOlLkZO4wJWX7jpZ1F9KmbPNqY2shD8h4ahyge
F/uqzBW1FXBJfnDEKdU2MzalkeTP+dwxLZyouUamjDCGNLFjOV4x/Fft5otOdXjO
k9uO6yoGyOkWYzjC+Y/irNPlIDDByB/+bD92Cb52Y2mXMDDEnx4JzbtkeJW+8udT
Sjn3bWzeL+dz5GehjMKwK4+SptNiyQGOgM8FwtnKUMvgzxv04DqCGjr9YC12L2Z7
VoFZc4GzVgtf8DZg4fJ3KG5aG2nH3Tui7jc9lUckdrxixDAZw5wSG7CQ39gFb/5+
7A4fEJk4Z3h5llibwxAZrC7wV8ZDDXn8oRFzRcOJjfxYaD+ohOOyWHIebwkdiXYx
nfYI7sBcScDLXeBvHtDra2GJpbFSpVL3S/QNhhi1vKVNrFSyAgbAybcVL2xPLZ6+
8Mh7A8xt+hf2bo9AXuYJDo9mwXWfg1093d0kT+AslcRhZioBk18c2AiZLIz0FzuL
Ua/e5FPb99x9LSdcZHvaAXBoHT2iTgDyCyDa3gkIesyuRX6ggHoFcVQuvdDcbJ9d
H8LK+Tahy1Y+E5b6KdtU8mDEGE+QG+CWLnwQ6YSCaL/MYgaFzNa32Jdj1fmztSBC
cttP43kHZ7ljTw==
=zo4d
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- A big series refactoring parts of our KVM code, and converting some
to C.
- Support for ARCH_HAS_SET_MEMORY, and ARCH_HAS_STRICT_MODULE_RWX on
some CPUs.
- Support for the Microwatt soft-core.
- Optimisations to our interrupt return path on 64-bit.
- Support for userspace access to the NX GZIP accelerator on PowerVM on
Power10.
- Enable KUAP and KUEP by default on 32-bit Book3S CPUs.
- Other smaller features, fixes & cleanups.
Thanks to: Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann, Athira
Rajeev, Baokun Li, Benjamin Herrenschmidt, Bharata B Rao, Christophe
Leroy, Daniel Axtens, Daniel Henrique Barboza, Finn Thain, Geoff Levand,
Haren Myneni, Jason Wang, Jiapeng Chong, Joel Stanley, Jordan Niethe,
Kajol Jain, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas
Piggin, Nick Desaulniers, Paul Mackerras, Russell Currey, Sathvika
Vasireddy, Shaokun Zhang, Stephen Rothwell, Sudeep Holla, Suraj Jitindar
Singh, Tom Rix, Vaibhav Jain, YueHaibing, Zhang Jianhua, and Zhen Lei.
* tag 'powerpc-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (218 commits)
powerpc: Only build restart_table.c for 64s
powerpc/64s: move ret_from_fork etc above __end_soft_masked
powerpc/64s/interrupt: clean up interrupt return labels
powerpc/64/interrupt: add missing kprobe annotations on interrupt exit symbols
powerpc/64: enable MSR[EE] in irq replay pt_regs
powerpc/64s/interrupt: preserve regs->softe for NMI interrupts
powerpc/64s: add a table of implicit soft-masked addresses
powerpc/64e: remove implicit soft-masking and interrupt exit restart logic
powerpc/64e: fix CONFIG_RELOCATABLE build warnings
powerpc/64s: fix hash page fault interrupt handler
powerpc/4xx: Fix setup_kuep() on SMP
powerpc/32s: Fix setup_{kuap/kuep}() on SMP
powerpc/interrupt: Use names in check_return_regs_valid()
powerpc/interrupt: Also use exit_must_hard_disable() on PPC32
powerpc/sysfs: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZE
powerpc/ptrace: Refactor regs_set_return_{msr/ip}
powerpc/ptrace: Move set_return_regs_changed() before regs_set_return_{msr/ip}
powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()
powerpc/pseries/vas: Include irqdomain.h
powerpc: mark local variables around longjmp as volatile
...
Merge more updates from Andrew Morton:
"190 patches.
Subsystems affected by this patch series: mm (hugetlb, userfaultfd,
vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock,
migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap,
zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc,
core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs,
signals, exec, kcov, selftests, compress/decompress, and ipc"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits)
ipc/util.c: use binary search for max_idx
ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock
ipc: use kmalloc for msg_queue and shmid_kernel
ipc sem: use kvmalloc for sem_undo allocation
lib/decompressors: remove set but not used variabled 'level'
selftests/vm/pkeys: exercise x86 XSAVE init state
selftests/vm/pkeys: refill shadow register after implicit kernel write
selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
kcov: add __no_sanitize_coverage to fix noinstr for all architectures
exec: remove checks in __register_bimfmt()
x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned
hfsplus: report create_date to kstat.btime
hfsplus: remove unnecessary oom message
nilfs2: remove redundant continue statement in a while-loop
kprobes: remove duplicated strong free_insn_page in x86 and s390
init: print out unknown kernel parameters
checkpatch: do not complain about positive return values starting with EPOLL
checkpatch: improve the indented label test
checkpatch: scripts/spdxcheck.py now requires python3
...
Pull cgroup updates from Tejun Heo:
- cgroup.kill is added which implements atomic killing of the whole
subtree.
Down the line, this should be able to replace the multiple userland
implementations of "keep killing till empty".
- PSI can now be turned off at boot time to avoid overhead for
configurations which don't care about PSI.
* 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: make per-cgroup pressure stall tracking configurable
cgroup: Fix kernel-doc
cgroup: inline cgroup_task_freeze()
tests/cgroup: test cgroup.kill
tests/cgroup: move cg_wait_for(), cg_prepare_for_wait()
tests/cgroup: use cgroup.kill in cg_killall()
docs/cgroup: add entry for cgroup.kill
cgroup: introduce cgroup.kill
TCP connections in UNREPLIED state (only SYN seen) can be kept alive
indefinitely, as each SYN re-sets the timeout.
This means that even if a peer has closed its socket the entry
never times out.
This also prevents re-evaluation of configured NAT rules.
Add a test case that sets SYN timeout to 10 seconds, then check
that the nat redirection added later eventually takes effect.
This is based off a repro script from Antonio Ojea.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
On x86, there is a set of instructions used to save and restore register
state collectively known as the XSAVE architecture. There are about a
dozen different features managed with XSAVE. The protection keys
register, PKRU, is one of those features.
The hardware optimizes XSAVE by tracking when the state has not changed
from its initial (init) state. In this case, it can avoid the cost of
writing state to memory (it would usually just be a bunch of 0's).
When the pkey register is 0x0 the hardware optionally choose to track the
register as being in the init state (optimize away the writes). AMD CPUs
do this more aggressively compared to Intel.
On x86, PKRU is rarely in its (very permissive) init state. Instead, the
value defaults to something very restrictive. It is not surprising that
bugs have popped up in the rare cases when PKRU reaches its init state.
Add a protection key selftest which gets the protection keys register into
its init state in a way that should work on Intel and AMD. Then, do a
bunch of pkey register reads to watch for inadvertent changes.
This adds "-mxsave" to CFLAGS for all the x86 vm selftests in order to
allow use of the XSAVE instruction __builtin functions. This will make
the builtins available on all of the vm selftests, but is expected to be
harmless.
Link: https://lkml.kernel.org/r/20210611164202.1849B712@viggo.jf.intel.com
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The pkey test code keeps a "shadow" of the pkey register around. This
ensures that any bugs which might write to the register can be caught more
quickly.
Generally, userspace has a good idea when the kernel is going to write to
the register. For instance, alloc_pkey() is passed a permission mask.
The caller of alloc_pkey() can update the shadow based on the return value
and the mask.
But, the kernel can also modify the pkey register in a more sneaky way.
For mprotect(PROT_EXEC) mappings, the kernel will allocate a pkey and
write the pkey register to create an execute-only mapping. The kernel
never tells userspace what key it uses for this.
This can cause the test to fail with messages like:
protection_keys_64.2: pkey-helpers.h:132: _read_pkey_reg: Assertion `pkey_reg == shadow_pkey_reg' failed.
because the shadow was not updated with the new kernel-set value.
Forcibly update the shadow value immediately after an mprotect().
Link: https://lkml.kernel.org/r/20210611164200.EF76AB73@viggo.jf.intel.com
Fixes: 6af17cf89e ("x86/pkeys/selftests: Add PROT_EXEC test")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The alloc_pkey() sefltest function wraps the sys_pkey_alloc() system call.
On success, it updates its "shadow" register value because
sys_pkey_alloc() updates the real register.
But, the success check is wrong. pkey_alloc() considers any non-zero
return code to indicate success where the pkey register will be modified.
This fails to take negative return codes into account.
Consider only a positive return value as a successful call.
Link: https://lkml.kernel.org/r/20210611164157.87AB4246@viggo.jf.intel.com
Fixes: 5f23f6d082 ("x86/pkeys: Add self-tests")
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "selftests/vm/pkeys: Bug fixes and a new test".
There has been a lot of activity on the x86 front around the XSAVE
architecture which is used to context-switch processor state (among other
things). In addition, AMD has recently joined the protection keys club by
adding processor support for PKU.
The AMD implementation helped uncover a kernel bug around the PKRU "init
state", which actually applied to Intel's implementation but was just
harder to hit. This series adds a test which is expected to help find
this class of bug both on AMD and Intel. All the work around pkeys on x86
also uncovered a few bugs in the selftest.
This patch (of 4):
The "random" pkey allocation code currently does the good old:
srand((unsigned int)time(NULL));
*But*, it unfortunately does this on every random pkey allocation.
There may be thousands of these a second. time() has a one second
resolution. So, each time alloc_random_pkey() is called, the PRNG is
*RESET* to time(). This is nasty. Normally, if you do:
srand(<ANYTHING>);
foo = rand();
bar = rand();
You'll be quite guaranteed that 'foo' and 'bar' are different. But, if
you do:
srand(1);
foo = rand();
srand(1);
bar = rand();
You are quite guaranteed that 'foo' and 'bar' are the *SAME*. The recent
"fix" effectively forced the test case to use the same "random" pkey for
the whole test, unless the test run crossed a second boundary.
Only run srand() once at program startup.
This explains some very odd and persistent test failures I've been seeing.
Link: https://lkml.kernel.org/r/20210611164153.91B76FB8@viggo.jf.intel.com
Link: https://lkml.kernel.org/r/20210611164155.192D00FF@viggo.jf.intel.com
Fixes: 6e373263ce ("selftests/vm/pkeys: fix alloc_random_pkey() to make it really random")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Let's add a simple test for MADV_POPULATE_READ and MADV_POPULATE_WRITE,
verifying some error handling, that population works, and that softdirty
tracking works as expected. For now, limit the test to private anonymous
memory.
Link: https://lkml.kernel.org/r/20210419135443.12822-6-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Enable test_uffdio_minor for test_type == TEST_SHMEM, and modify the test
slightly to pass in / check for the right feature flags.
Link: https://lkml.kernel.org/r/20210503180737.2487560-11-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, the context (fds, mmap-ed areas, etc.) are global. Each test
mutates this state in some way, in some cases really "clobbering it"
(e.g., the events test mremap-ing area_dst over the top of area_src, or
the minor faults tests overwriting the count_verify values in the test
areas). We run the tests in a particular order, each test is careful to
make the right assumptions about its starting state, etc.
But, this is fragile. It's better for a test's success or failure to not
depend on what some other prior test case did to the global state.
To that end, clear and reinitialize the test context at the start of each
test case, so whatever prior test cases did doesn't affect future tests.
This is particularly relevant to this series because the events test's
mremap of area_dst screws up assumptions the minor fault test was relying
on. This wasn't a problem for hugetlb, as we don't mremap in that case.
[peterx@redhat.com: fix conflict between this patch and the uffd pagemap series]
Link: https://lkml.kernel.org/r/YKQqKrl+/cQ1utrb@t490s
Link: https://lkml.kernel.org/r/20210503180737.2487560-10-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Previously, we just allocated two shm areas: area_src and area_dst. With
this commit, change this so we also allocate area_src_alias, and
area_dst_alias.
area_*_alias and area_* (respectively) point to the same underlying
physical pages, but are different VMAs. In a future commit in this
series, we'll leverage this setup to exercise minor fault handling support
for shmem, just like we do in the hugetlb_shared test.
Link: https://lkml.kernel.org/r/20210503180737.2487560-9-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a preparatory commit. In the future, we want to be able to setup
alias mappings for area_src and area_dst in the shmem test, like we do in
the hugetlb_shared test. With a VMA obtained via mmap(MAP_ANONYMOUS |
MAP_SHARED), it isn't clear how to do this.
So, mmap() with an fd, so we can create alias mappings. Use memfd_create
instead of actually passing in a tmpfs path like hugetlb does, since it's
more convenient / simpler to run, and works just as well.
Future commits will:
1. Setup the alias mappings.
2. Extend our tests to actually take advantage of this, to test new
userfaultfd behavior being introduced in this series.
Also, a small fix in the area we're changing: when the hugetlb setup fails
in main(), pass in the right argv[] so we actually print out the hugetlb
file path.
Link: https://lkml.kernel.org/r/20210503180737.2487560-8-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add one anonymous specific test to start using pagemap. With pagemap
support, we can directly read the uffd-wp bit from pgtable without
triggering any fault, so it's easier to do sanity checks in unit tests.
Meanwhile this test also leverages the newly introduced MADV_PAGEOUT
madvise function to test swap ptes with uffd-wp bit set, and across
fork()s.
Link: https://lkml.kernel.org/r/20210428225030.9708-7-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce err()/_err() and replace all the different ways to fail the
program, mostly "fprintf" and "perror" with tons of exit() calls. Always
stop the test program at any failure.
Link: https://lkml.kernel.org/r/20210412232753.1012412-6-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
WP and MINOR modes are conditionally enabled on specific memory types.
This patch avoids dumping tons of zeros for those cases when the modes are
not supported at all.
Link: https://lkml.kernel.org/r/20210412232753.1012412-5-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It tries to check against all zeros and looped for quite a few times.
However after that we'll verify the same page with count_verify, while
count_verify can never be zero. So it means if it's a zero page we'll
detect it anyways with below code.
There's yet another place we conditionally check the fault flag - just do
it unconditionally.
Link: https://lkml.kernel.org/r/20210412232753.1012412-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There seems to have no guarantee that time() will return the same for the
two calls even if there's no delay, e.g. when a fault is accidentally
crossing the changing of a second. Meanwhile, this message is also not
helping that much since delay could happen with a lot of reasons, e.g.,
schedule latency of resolving thread. It may not mean an issue with uffd.
Neither do I saw this error triggered either in the past runs. Even if it
triggers, it'll be drown in all the rest of test logs. Remove it.
Link: https://lkml.kernel.org/r/20210412232753.1012412-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "userfaultfd/selftests: A few cleanups", v2.
I wanted to cleanup userfaultfd.c fault handling for a long time. If it's
not cleaned, when the new code grows the file it'll also grow the size
that needs to be cleaned... This is my attempt to cleanup the userfaultfd
selftest on fault handling, to use an err() macro instead of either
fprintf() or perror() then another exit() call.
The huge cleanup is done in the last patch. The first 4 patches are some
other standalone cleanups for the same file, so I put them together.
This patch (of 5):
Userfaultfd selftest does not need to handle kernel initiated fault. Set
user mode so it can be run even if unprivileged_userfaultfd=0 (which is
the default).
Link: https://lkml.kernel.org/r/20210412232753.1012412-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The debug_cow attribute had been removed since commit 4958e4d86e ("mm:
thp: remove debug_cow switch"), so remove it in selftest code too,
otherwise the khugepaged test will fail.
Link: https://lkml.kernel.org/r/20210430051117.400189-1-sunnanyong@huawei.com
Fixes: 4958e4d86e ("mm: thp: remove debug_cow switch")
Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Core:
- BPF:
- add syscall program type and libbpf support for generating
instructions and bindings for in-kernel BPF loaders (BPF loaders
for BPF), this is a stepping stone for signed BPF programs
- infrastructure to migrate TCP child sockets from one listener
to another in the same reuseport group/map to improve flexibility
of service hand-off/restart
- add broadcast support to XDP redirect
- allow bypass of the lockless qdisc to improving performance
(for pktgen: +23% with one thread, +44% with 2 threads)
- add a simpler version of "DO_ONCE()" which does not require
jump labels, intended for slow-path usage
- virtio/vsock: introduce SOCK_SEQPACKET support
- add getsocketopt to retrieve netns cookie
- ip: treat lowest address of a IPv4 subnet as ordinary unicast address
allowing reclaiming of precious IPv4 addresses
- ipv6: use prandom_u32() for ID generation
- ip: add support for more flexible field selection for hashing
across multi-path routes (w/ offload to mlxsw)
- icmp: add support for extended RFC 8335 PROBE (ping)
- seg6: add support for SRv6 End.DT46 behavior
- mptcp:
- DSS checksum support (RFC 8684) to detect middlebox meddling
- support Connection-time 'C' flag
- time stamping support
- sctp: packetization Layer Path MTU Discovery (RFC 8899)
- xfrm: speed up state addition with seq set
- WiFi:
- hidden AP discovery on 6 GHz and other HE 6 GHz improvements
- aggregation handling improvements for some drivers
- minstrel improvements for no-ack frames
- deferred rate control for TXQs to improve reaction times
- switch from round robin to virtual time-based airtime scheduler
- add trace points:
- tcp checksum errors
- openvswitch - action execution, upcalls
- socket errors via sk_error_report
Device APIs:
- devlink: add rate API for hierarchical control of max egress rate
of virtual devices (VFs, SFs etc.)
- don't require RCU read lock to be held around BPF hooks
in NAPI context
- page_pool: generic buffer recycling
New hardware/drivers:
- mobile:
- iosm: PCIe Driver for Intel M.2 Modem
- support for Qualcomm MSM8998 (ipa)
- WiFi: Qualcomm QCN9074 and WCN6855 PCI devices
- sparx5: Microchip SparX-5 family of Enterprise Ethernet switches
- Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)
- NXP SJA1110 Automotive Ethernet 10-port switch
- Qualcomm QCA8327 switch support (qca8k)
- Mikrotik 10/25G NIC (atl1c)
Driver changes:
- ACPI support for some MDIO, MAC and PHY devices from Marvell and NXP
(our first foray into MAC/PHY description via ACPI)
- HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx
- Mellanox/Nvidia NIC (mlx5)
- NIC VF offload of L2 bridging
- support IRQ distribution to Sub-functions
- Marvell (prestera):
- add flower and match all
- devlink trap
- link aggregation
- Netronome (nfp): connection tracking offload
- Intel 1GE (igc): add AF_XDP support
- Marvell DPU (octeontx2): ingress ratelimit offload
- Google vNIC (gve): new ring/descriptor format support
- Qualcomm mobile (rmnet & ipa): inline checksum offload support
- MediaTek WiFi (mt76)
- mt7915 MSI support
- mt7915 Tx status reporting
- mt7915 thermal sensors support
- mt7921 decapsulation offload
- mt7921 enable runtime pm and deep sleep
- Realtek WiFi (rtw88)
- beacon filter support
- Tx antenna path diversity support
- firmware crash information via devcoredump
- Qualcomm 60GHz WiFi (wcn36xx)
- Wake-on-WLAN support with magic packets and GTK rekeying
- Micrel PHY (ksz886x/ksz8081): add cable test support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmDb+fUACgkQMUZtbf5S
Irs2Jg//aqN0Q8CgIvYCVhPxQw1tY7pTAbgyqgBZ01vwjyvtIOgJiWzSfFEU84mX
M8fcpFX5eTKrOyJ9S6UFfQ/JG114n3hjAxFFT4Hxk2gC1Tg0vHuFQTDHcUl28bUE
mTm61e1YpdorILnv2k5JVQ/wu0vs5QKDrjcYcrcPnh+j93wvnPOgAfDBV95nZzjS
OTt4q2fR8GzLcSYWWsclMbDNkzyTG50RW/0Yd6aGjr5QGvXfrMeXfUJNz533PMf/
w5lNyjRKv+x9mdTZJzU0+msNUrZgUdRz7W8Ey8lD3hJZRE+D6/uU7FtsE8Mi3+uc
HWxeZUyzA3YF1MfVl/eesbxyPT7S/OkLzk4O5B35FbqP0YltaP+bOjq1/nM3ce1/
io9Dx9pIl/2JANUgRCAtLi8Z2dkvRoqTaBxZ/nPudCCljFwDwl6joTMJ7Ow22i5Y
5aIkcXFmZq4LbJDiHvbTlqT7yiuaEvu2UK/23bSIg/K3nF4eAmkY9Y1EgiMf60OF
78Ttw0wk2tUegwaS5MZnCniKBKDyl9gM2F6rbZ/IxQRR2LTXFc1B6gC+ynUxgXfh
Ub8O++6qGYGYZ0XvQH4pzco79p3qQWBTK5beIp2eu6BOAjBVIXq4AibUfoQLACsu
hX7jMPYd0kc3WFgUnKgQP8EnjFSwbf4XiaE7fIXvWBY8hzCw2h4=
=LvtX
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- BPF:
- add syscall program type and libbpf support for generating
instructions and bindings for in-kernel BPF loaders (BPF loaders
for BPF), this is a stepping stone for signed BPF programs
- infrastructure to migrate TCP child sockets from one listener to
another in the same reuseport group/map to improve flexibility
of service hand-off/restart
- add broadcast support to XDP redirect
- allow bypass of the lockless qdisc to improving performance (for
pktgen: +23% with one thread, +44% with 2 threads)
- add a simpler version of "DO_ONCE()" which does not require jump
labels, intended for slow-path usage
- virtio/vsock: introduce SOCK_SEQPACKET support
- add getsocketopt to retrieve netns cookie
- ip: treat lowest address of a IPv4 subnet as ordinary unicast
address allowing reclaiming of precious IPv4 addresses
- ipv6: use prandom_u32() for ID generation
- ip: add support for more flexible field selection for hashing
across multi-path routes (w/ offload to mlxsw)
- icmp: add support for extended RFC 8335 PROBE (ping)
- seg6: add support for SRv6 End.DT46 behavior
- mptcp:
- DSS checksum support (RFC 8684) to detect middlebox meddling
- support Connection-time 'C' flag
- time stamping support
- sctp: packetization Layer Path MTU Discovery (RFC 8899)
- xfrm: speed up state addition with seq set
- WiFi:
- hidden AP discovery on 6 GHz and other HE 6 GHz improvements
- aggregation handling improvements for some drivers
- minstrel improvements for no-ack frames
- deferred rate control for TXQs to improve reaction times
- switch from round robin to virtual time-based airtime scheduler
- add trace points:
- tcp checksum errors
- openvswitch - action execution, upcalls
- socket errors via sk_error_report
Device APIs:
- devlink: add rate API for hierarchical control of max egress rate
of virtual devices (VFs, SFs etc.)
- don't require RCU read lock to be held around BPF hooks in NAPI
context
- page_pool: generic buffer recycling
New hardware/drivers:
- mobile:
- iosm: PCIe Driver for Intel M.2 Modem
- support for Qualcomm MSM8998 (ipa)
- WiFi: Qualcomm QCN9074 and WCN6855 PCI devices
- sparx5: Microchip SparX-5 family of Enterprise Ethernet switches
- Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)
- NXP SJA1110 Automotive Ethernet 10-port switch
- Qualcomm QCA8327 switch support (qca8k)
- Mikrotik 10/25G NIC (atl1c)
Driver changes:
- ACPI support for some MDIO, MAC and PHY devices from Marvell and
NXP (our first foray into MAC/PHY description via ACPI)
- HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx
- Mellanox/Nvidia NIC (mlx5)
- NIC VF offload of L2 bridging
- support IRQ distribution to Sub-functions
- Marvell (prestera):
- add flower and match all
- devlink trap
- link aggregation
- Netronome (nfp): connection tracking offload
- Intel 1GE (igc): add AF_XDP support
- Marvell DPU (octeontx2): ingress ratelimit offload
- Google vNIC (gve): new ring/descriptor format support
- Qualcomm mobile (rmnet & ipa): inline checksum offload support
- MediaTek WiFi (mt76)
- mt7915 MSI support
- mt7915 Tx status reporting
- mt7915 thermal sensors support
- mt7921 decapsulation offload
- mt7921 enable runtime pm and deep sleep
- Realtek WiFi (rtw88)
- beacon filter support
- Tx antenna path diversity support
- firmware crash information via devcoredump
- Qualcomm WiFi (wcn36xx)
- Wake-on-WLAN support with magic packets and GTK rekeying
- Micrel PHY (ksz886x/ksz8081): add cable test support"
* tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits)
tcp: change ICSK_CA_PRIV_SIZE definition
tcp_yeah: check struct yeah size at compile time
gve: DQO: Fix off by one in gve_rx_dqo()
stmmac: intel: set PCI_D3hot in suspend
stmmac: intel: Enable PHY WOL option in EHL
net: stmmac: option to enable PHY WOL with PMT enabled
net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del}
net: use netdev_info in ndo_dflt_fdb_{add,del}
ptp: Set lookup cookie when creating a PTP PPS source.
net: sock: add trace for socket errors
net: sock: introduce sk_error_report
net: dsa: replay the local bridge FDB entries pointing to the bridge dev too
net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev
net: dsa: include fdb entries pointing to bridge in the host fdb list
net: dsa: include bridge addresses which are local in the host fdb list
net: dsa: sync static FDB entries on foreign interfaces to hardware
net: dsa: install the host MDB and FDB entries in the master's RX filter
net: dsa: reference count the FDB addresses at the cross-chip notifier level
net: dsa: introduce a separate cross-chip notifier type for host FDBs
net: dsa: reference count the MDB entries at the cross-chip notifier level
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYNnKewAKCRCRxhvAZXjc
oo/DAQCgKsDJTSht/QXuA0bMqdsQW27AWFfKacbk5lY4EjXz1gD/ZsYU2Si1fgkB
7mEl32JsfgcIBv0VdIulAh2F29Fa0A0=
=/b8l
-----END PGP SIGNATURE-----
Merge tag 'fs.openat2.unknown_flags.v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull openat2 fixes from Christian Brauner:
- Remove the unused VALID_UPGRADE_FLAGS define we carried from an
extension to openat2() that we haven't merged. Aleksa might be
getting back to it at some point but just not right now.
- openat2() used to accidently ignore unknown flag values in the upper
32 bits.
The new openat2() syscall verifies that no unknown O-flag values are
set and returns an error to userspace if they are while the older
open syscalls like open() and openat() simply ignore unknown flag
values:
#define O_FLAG_CURRENTLY_INVALID (1 << 31)
struct open_how how = {
.flags = O_RDONLY | O_FLAG_CURRENTLY_INVALID,
.resolve = 0,
};
/* fails */
fd = openat2(-EBADF, "/dev/null", &how, sizeof(how));
/* succeeds */
fd = openat(-EBADF, "/dev/null", O_RDONLY | O_FLAG_CURRENTLY_INVALID);
However, openat2() silently truncates the upper 32 bits meaning:
#define O_FLAG_CURRENTLY_INVALID_LOWER32 (1 << 31)
#define O_FLAG_CURRENTLY_INVALID_UPPER32 (1 << 40)
struct open_how how_lowe32 = {
.flags = O_RDONLY | O_FLAG_CURRENTLY_INVALID_LOWER32,
};
struct open_how how_upper32 = {
.flags = O_RDONLY | O_FLAG_CURRENTLY_INVALID_UPPER32,
};
/* fails */
fd = openat2(-EBADF, "/dev/null", &how_lower32, sizeof(how_lower32));
/* succeeds */
fd = openat2(-EBADF, "/dev/null", &how_upper32, sizeof(how_upper32));
Fix this by preventing the immediate truncation in build_open_flags()
and add a compile-time check to catch when we add flags in the upper
32 bit range.
* tag 'fs.openat2.unknown_flags.v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
test: add openat2() test for invalid upper 32 bit flag value
open: don't silently ignore unknown O-flags in openat2()
fcntl: remove unused VALID_UPGRADE_FLAGS
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYNnKNAAKCRCRxhvAZXjc
ok/HAQDz3FK3/yeqxH6OLyCedUcD+YBFPPzrfqX+3y6q3z5tGgD9GAGxFXWcMFA2
/cbfmizwh1eJ3WMnbHUp7x6ogpQhWwQ=
=PpNs
-----END PGP SIGNATURE-----
Merge tag 'fs.mount_setattr.nosymfollow.v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull mount_setattr updates from Christian Brauner:
"A few releases ago the old mount API gained support for a mount
options which prevents following symlinks on a given mount. This adds
support for it in the new mount api through the MOUNT_ATTR_NOSYMFOLLOW
flag via mount_setattr() and fsmount(). With mount_setattr() that flag
can even be applied recursively.
There's an additional ack from Ross Zwisler who originally authored
the nosymfollow patch. As I've already had the patches in my for-next
I didn't add his ack explicitly"
* tag 'fs.mount_setattr.nosymfollow.v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
tests: test MOUNT_ATTR_NOSYMFOLLOW with mount_setattr()
mount: Support "nosymfollow" in new mount api
Merge misc updates from Andrew Morton:
"191 patches.
Subsystems affected by this patch series: kthread, ia64, scripts,
ntfs, squashfs, ocfs2, kernel/watchdog, and mm (gup, pagealloc, slab,
slub, kmemleak, dax, debug, pagecache, gup, swap, memcg, pagemap,
mprotect, bootmem, dma, tracing, vmalloc, kasan, initialization,
pagealloc, and memory-failure)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (191 commits)
mm,hwpoison: make get_hwpoison_page() call get_any_page()
mm,hwpoison: send SIGBUS with error virutal address
mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes
mm/page_alloc: allow high-order pages to be stored on the per-cpu lists
mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM
mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA
docs: remove description of DISCONTIGMEM
arch, mm: remove stale mentions of DISCONIGMEM
mm: remove CONFIG_DISCONTIGMEM
m68k: remove support for DISCONTIGMEM
arc: remove support for DISCONTIGMEM
arc: update comment about HIGHMEM implementation
alpha: remove DISCONTIGMEM and NUMA
mm/page_alloc: move free_the_page
mm/page_alloc: fix counting of managed_pages
mm/page_alloc: improve memmap_pages dbg msg
mm: drop SECTION_SHIFT in code comments
mm/page_alloc: introduce vm.percpu_pagelist_high_fraction
mm/page_alloc: limit the number of pages on PCP lists when reclaim is active
mm/page_alloc: scale the number of pages that are batch freed
...
Trivial conflict in net/netfilter/nf_tables_api.c.
Duplicate fix in tools/testing/selftests/net/devlink_port_split.py
- take the net-next version.
skmsg, and L4 bpf - keep the bpf code but remove the flags
and err params.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Consolidate the macros for .byte ... opcode sequences
- Deduplicate register offset defines in include files
- Simplify the ia32,x32 compat handling of the related syscall tables to
get rid of #ifdeffery.
- Clear all EFLAGS which are not required for syscall handling
- Consolidate the syscall tables and switch the generation over to the
generic shell script and remove the CFLAGS tweaks which are not longer
required.
- Use 'int' type for system call numbers to match the generic code.
- Add more selftests for syscalls
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmDbKzMTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoae8D/9+pksdf8lE5dRLtngSeTDLiyIV+qq4
vSks7XfrTTAhOV2nRwtIulc2CO6H7jcvn6ehmiC/X0Tn9JK5brwSJJYryNEjA3cp
3p9jPrB1w1SDhx35JzILN4DDaJfI3jobLSLDq0KQzuEL0+c0R4l3WBplpCzbLjqj
NaFQgslf8RSnjha9NLTKzlzSaNNNo9Ioo6DyrsBDEdcRBtAPlFfdVtT3oJE73ANH
dK5POoVWysmAnDAwEW17j9bBJLtxeWsrhM9CrtqvcKr3HhK9WjWUFAr+diQf5GKf
BAD2A+5y8wZQXvFOuC9WZxfQwUFSLExt8BfcXblOUbf2CdlvoYVzOlvI141kA++4
q4wQ1vl6MbLCp6wLysc3bnwKUEmnf2E4Iyj5JR2aFrw096pAoZ3ZbAQi7s3Vhb16
aSbGxIw3rHRuB0f8VmOA0iEHiXlkRmE/K+nH1/uDTUZLaDpktPvpKQJsp0+9qXFk
eVtEw4bVKJ7q5ozjMzpm9aPxPp1v8MGxUOJOy80W7Ti+vBp2KmMKc1gy8QsYrTvW
Vzvpp3U+/WFh2X7AG0zlP/JEnOuJmMwMK5QhzMC2rEbaHJ66ht7SABvtSbOHHw5Z
zugxTE0lx3n7izCxW1RLEu//xtWY0FbU2L5oE2Ace27myUPeBQCDJzynUn93dMM9
9nq2TtgTCF6XvA==
=+sb9
-----END PGP SIGNATURE-----
Merge tag 'x86-entry-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 entry code related updates from Thomas Gleixner:
- Consolidate the macros for .byte ... opcode sequences
- Deduplicate register offset defines in include files
- Simplify the ia32,x32 compat handling of the related syscall tables
to get rid of #ifdeffery.
- Clear all EFLAGS which are not required for syscall handling
- Consolidate the syscall tables and switch the generation over to the
generic shell script and remove the CFLAGS tweaks which are not
longer required.
- Use 'int' type for system call numbers to match the generic code.
- Add more selftests for syscalls
* tag 'x86-entry-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/syscalls: Don't adjust CFLAGS for syscall tables
x86/syscalls: Remove -Wno-override-init for syscall tables
x86/uml/syscalls: Remove array index from syscall initializers
x86/syscalls: Clear 'offset' and 'prefix' in case they are set in env
x86/entry: Use int everywhere for system call numbers
x86/entry: Treat out of range and gap system calls the same
x86/entry/64: Sign-extend system calls on entry to int
selftests/x86/syscall: Add tests under ptrace to syscall_numbering_64
selftests/x86/syscall: Simplify message reporting in syscall_numbering
selftests/x86/syscall: Update and extend syscall_numbering_64
x86/syscalls: Switch to generic syscallhdr.sh
x86/syscalls: Use __NR_syscalls instead of __NR_syscall_max
x86/unistd: Define X32_NR_syscalls only for 64-bit kernel
x86/syscalls: Stop filling syscall arrays with *_sys_ni_syscall
x86/syscalls: Switch to generic syscalltbl.sh
x86/entry/x32: Rename __x32_compat_sys_* to __x64_compat_sys_*
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmDa2mEACgkQUqAMR0iA
lPKBiBAAgvhNnaRVR6/GBVrv5jYM8obJM7PHPxp8dh+ZRb1mDyL1ZDU2r7lmQjMD
ORBN5eK6pXk/gVabXR5lY0B7vQ8phJmYO98Lk2E3n9ZTgMkTHQ5WOHzBpt93gd/y
l9m00ZD0YcHrkmM1fq73FuZVEMzPk85cjTe8n6JeHJgSAdoOY/rl61Cn57ZHFIa4
DkpdNGkJaf77UIWOc8NLAXOdSD9NxSGycHXpU0q8QO9UFq+Le2qN4OPj3S1CNCO2
ciy+VcW1VQ/BesPPlBIk3ImHWPS4ty3n4EYFzNm+saElIaWxyhNBXAvcBAK/x9LK
3QibfBFwbS3sllhnk96Z24UaWWMg2AygbV2aqd3xMLpW3XD6q/MVnWGHfayhnmYj
aNcWpldIjwDH4iZJ5vnD4KewQpYp+Jc5Hqv6UyIf1O8nEvvQubrDXjSDLLcbZFI1
m2cs9DTc5ezyX/DifBsViDbw8hPjJg7QAbRjVk1EfVQrH090mRQoSoQQI4QtuMEi
pPiTALNG1HRKIoYrKxQMB43JvZ1zjaSbtNbW4JJ9Bm3kxFZ/Oa8NXzE5BOjeykZm
bCePtc018GZaGNW0z/Zr460c/Tuaj8fZSzUOj9Xnw5Hv4G3W5+5EqDy7sU/GTPjL
It9rAZYo+cM9vp1BD2343YPZgnChWHaW0BF/WDqFAhLd9av/WKI=
=Oa1c
-----END PGP SIGNATURE-----
Merge tag 'printk-for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Add %pt[RT]s modifier to vsprintf(). It overrides ISO 8601 separator
by using ' ' (space). It produces "YYYY-mm-dd HH:MM:SS" instead of
"YYYY-mm-ddTHH:MM:SS".
- Correctly parse long row of numbers by sscanf() when using the field
width. Add extensive sscanf() selftest.
- Generalize re-entrant CPU lock that has already been used to
serialize dump_stack() output. It is part of the ongoing printk
rework. It will allow to remove the obsoleted printk_safe buffers and
introduce atomic consoles.
- Some code clean up and sparse warning fixes.
* tag 'printk-for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: fix cpu lock ordering
lib/dump_stack: move cpu lock to printk.c
printk: Remove trailing semicolon in macros
random32: Fix implicit truncation warning in prandom_seed_state()
lib: test_scanf: Remove pointless use of type_min() with unsigned types
selftests: lib: Add wrapper script for test_scanf
lib: test_scanf: Add tests for sscanf number conversion
lib: vsprintf: Fix handling of number field widths in vsscanf
lib: vsprintf: scanf: Negative number must have field width > 1
usb: host: xhci-tegra: Switch to use %ptTs
nilfs2: Switch to use %ptTs
kdb: Switch to use %ptTs
lib/vsprintf: Allow to override ISO 8601 date and time separator
Patch series "mm/gup: Fix pin page write cache bouncing on has_pinned", v2.
This series contains 3 patches, the 1st one enables threading for
gup_benchmark in the kselftest. The latter two patches are collected from
Andrea's local branch which can fix write cache bouncing issue with
pinning fast-gup.
To be explicit on the latter two patches:
- the 2nd patch fixes the perf degrade when introducing has_pinned, then
- the last patch tries to remove the has_pinned with a bit in mm->flags
For patch 3: originally I think we had a plan to reuse has_pinned into a
counter very soon, however that's not happening at least until today, so
maybe it proves that we can remove it until we really want such a counter
for whatever reason. As the commit message stated, it saves 4 bytes for
each mm without observable regressions.
Regarding testing: we can reference to the commit message of patch 2 for
some detailed testing with will-is-scale. Meanwhile I did patch 1 just
because then we can even easily verify the patchset using the existing
kselftest facilities or even regress test it in the future with the repo
if we want.
Below numbers are extra verification tests that I did besides commit
message of patch 2 using the new gup_benchmark and 256 cpus. Below test
is done on 40 cpus host with Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz,
and I can get similar result (of course the write cache bouncing get
severe with even more cores).
After patch 1 applied (only test patch, so using old kernel):
$ sudo chrt -f 1 ./gup_test -a -m 512 -j 40
PIN_FAST_BENCHMARK: Time: get:459632 put:5990 us
PIN_FAST_BENCHMARK: Time: get:461967 put:5840 us
PIN_FAST_BENCHMARK: Time: get:464521 put:6140 us
PIN_FAST_BENCHMARK: Time: get:465176 put:7100 us
PIN_FAST_BENCHMARK: Time: get:465960 put:6733 us
PIN_FAST_BENCHMARK: Time: get:465324 put:6781 us
PIN_FAST_BENCHMARK: Time: get:466018 put:7130 us
PIN_FAST_BENCHMARK: Time: get:466362 put:7118 us
PIN_FAST_BENCHMARK: Time: get:465118 put:6975 us
PIN_FAST_BENCHMARK: Time: get:466422 put:6602 us
PIN_FAST_BENCHMARK: Time: get:465791 put:6818 us
PIN_FAST_BENCHMARK: Time: get:467091 put:6298 us
PIN_FAST_BENCHMARK: Time: get:467694 put:5432 us
PIN_FAST_BENCHMARK: Time: get:469575 put:5581 us
PIN_FAST_BENCHMARK: Time: get:468124 put:6055 us
PIN_FAST_BENCHMARK: Time: get:468877 put:6720 us
PIN_FAST_BENCHMARK: Time: get:467212 put:4961 us
PIN_FAST_BENCHMARK: Time: get:467834 put:6697 us
PIN_FAST_BENCHMARK: Time: get:470778 put:6398 us
PIN_FAST_BENCHMARK: Time: get:469788 put:6310 us
PIN_FAST_BENCHMARK: Time: get:488277 put:7113 us
PIN_FAST_BENCHMARK: Time: get:486613 put:7085 us
PIN_FAST_BENCHMARK: Time: get:486940 put:7202 us
PIN_FAST_BENCHMARK: Time: get:488728 put:7101 us
PIN_FAST_BENCHMARK: Time: get:487570 put:7327 us
PIN_FAST_BENCHMARK: Time: get:489260 put:7027 us
PIN_FAST_BENCHMARK: Time: get:488846 put:6866 us
PIN_FAST_BENCHMARK: Time: get:488521 put:6745 us
PIN_FAST_BENCHMARK: Time: get:489950 put:6459 us
PIN_FAST_BENCHMARK: Time: get:489777 put:6617 us
PIN_FAST_BENCHMARK: Time: get:488224 put:6591 us
PIN_FAST_BENCHMARK: Time: get:488644 put:6477 us
PIN_FAST_BENCHMARK: Time: get:488754 put:6711 us
PIN_FAST_BENCHMARK: Time: get:488875 put:6743 us
PIN_FAST_BENCHMARK: Time: get:489290 put:6657 us
PIN_FAST_BENCHMARK: Time: get:490264 put:6684 us
PIN_FAST_BENCHMARK: Time: get:489631 put:6737 us
PIN_FAST_BENCHMARK: Time: get:488434 put:6655 us
PIN_FAST_BENCHMARK: Time: get:492213 put:6297 us
PIN_FAST_BENCHMARK: Time: get:491124 put:6173 us
After the whole series applied (new fixed kernel):
$ sudo chrt -f 1 ./gup_test -a -m 512 -j 40
PIN_FAST_BENCHMARK: Time: get:82038 put:7041 us
PIN_FAST_BENCHMARK: Time: get:82144 put:6817 us
PIN_FAST_BENCHMARK: Time: get:83417 put:6674 us
PIN_FAST_BENCHMARK: Time: get:82540 put:6594 us
PIN_FAST_BENCHMARK: Time: get:83214 put:6681 us
PIN_FAST_BENCHMARK: Time: get:83444 put:6889 us
PIN_FAST_BENCHMARK: Time: get:83194 put:7499 us
PIN_FAST_BENCHMARK: Time: get:84876 put:7369 us
PIN_FAST_BENCHMARK: Time: get:86092 put:10289 us
PIN_FAST_BENCHMARK: Time: get:86153 put:10415 us
PIN_FAST_BENCHMARK: Time: get:85026 put:7751 us
PIN_FAST_BENCHMARK: Time: get:85458 put:7944 us
PIN_FAST_BENCHMARK: Time: get:85735 put:8154 us
PIN_FAST_BENCHMARK: Time: get:85851 put:8299 us
PIN_FAST_BENCHMARK: Time: get:86323 put:9617 us
PIN_FAST_BENCHMARK: Time: get:86288 put:10496 us
PIN_FAST_BENCHMARK: Time: get:87697 put:9346 us
PIN_FAST_BENCHMARK: Time: get:87980 put:8382 us
PIN_FAST_BENCHMARK: Time: get:88719 put:8400 us
PIN_FAST_BENCHMARK: Time: get:87616 put:8588 us
PIN_FAST_BENCHMARK: Time: get:86730 put:9563 us
PIN_FAST_BENCHMARK: Time: get:88167 put:8673 us
PIN_FAST_BENCHMARK: Time: get:86844 put:9777 us
PIN_FAST_BENCHMARK: Time: get:88068 put:11774 us
PIN_FAST_BENCHMARK: Time: get:86170 put:15676 us
PIN_FAST_BENCHMARK: Time: get:87967 put:12827 us
PIN_FAST_BENCHMARK: Time: get:95773 put:7652 us
PIN_FAST_BENCHMARK: Time: get:87734 put:13650 us
PIN_FAST_BENCHMARK: Time: get:89833 put:14237 us
PIN_FAST_BENCHMARK: Time: get:96186 put:8029 us
PIN_FAST_BENCHMARK: Time: get:95532 put:8886 us
PIN_FAST_BENCHMARK: Time: get:95351 put:5826 us
PIN_FAST_BENCHMARK: Time: get:96401 put:8407 us
PIN_FAST_BENCHMARK: Time: get:96473 put:8287 us
PIN_FAST_BENCHMARK: Time: get:97177 put:8430 us
PIN_FAST_BENCHMARK: Time: get:98120 put:5263 us
PIN_FAST_BENCHMARK: Time: get:96271 put:7757 us
PIN_FAST_BENCHMARK: Time: get:99628 put:10467 us
PIN_FAST_BENCHMARK: Time: get:99344 put:10045 us
PIN_FAST_BENCHMARK: Time: get:94212 put:15485 us
Summary:
Old kernel: 477729.97 (+-3.79%)
New kernel: 89144.65 (+-11.76%)
This patch (of 3):
Add a new parameter "-j N" to support concurrent gup test.
Link: https://lkml.kernel.org/r/20210507150553.208763-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20210507150553.208763-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull user namespace rlimit handling update from Eric Biederman:
"This is the work mainly by Alexey Gladkov to limit rlimits to the
rlimits of the user that created a user namespace, and to allow users
to have stricter limits on the resources created within a user
namespace."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
cred: add missing return error code when set_cred_ucounts() failed
ucounts: Silence warning in dec_rlimit_ucounts
ucounts: Set ucount_max to the largest positive value the type can hold
kselftests: Add test to check for rlimit changes in different user namespaces
Reimplement RLIMIT_MEMLOCK on top of ucounts
Reimplement RLIMIT_SIGPENDING on top of ucounts
Reimplement RLIMIT_MSGQUEUE on top of ucounts
Reimplement RLIMIT_NPROC on top of ucounts
Use atomic_t for ucounts reference counting
Add a reference to ucounts for each cred
Increase size of ucounts to atomic_long_t
And thus avoid a Python stacktrace:
~/linux/tools/testing/selftests/net$ ./devlink_port_split.py
Traceback (most recent call last):
File "/home/linux/tools/testing/selftests/net/./devlink_port_split.py",
line 277, in <module> main()
File "/home/linux/tools/testing/selftests/net/./devlink_port_split.py",
line 242, in main
dev = list(devs.keys())[0]
IndexError: list index out of range
Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Add MTE support in guests, complete with tag save/restore interface
- Reduce the impact of CMOs by moving them in the page-table code
- Allow device block mappings at stage-2
- Reduce the footprint of the vmemmap in protected mode
- Support the vGIC on dumb systems such as the Apple M1
- Add selftest infrastructure to support multiple configuration
and apply that to PMU/non-PMU setups
- Add selftests for the debug architecture
- The usual crop of PMU fixes
PPC:
- Support for the H_RPT_INVALIDATE hypercall
- Conversion of Book3S entry/exit to C
- Bug fixes
S390:
- new HW facilities for guests
- make inline assembly more robust with KASAN and co
x86:
- Allow userspace to handle emulation errors (unknown instructions)
- Lazy allocation of the rmap (host physical -> guest physical address)
- Support for virtualizing TSC scaling on VMX machines
- Optimizations to avoid shattering huge pages at the beginning of live migration
- Support for initializing the PDPTRs without loading them from memory
- Many TLB flushing cleanups
- Refuse to load if two-stage paging is available but NX is not (this has
been a requirement in practice for over a year)
- A large series that separates the MMU mode (WP/SMAP/SMEP etc.) from
CR0/CR4/EFER, using the MMU mode everywhere once it is computed
from the CPU registers
- Use PM notifier to notify the guest about host suspend or hibernate
- Support for passing arguments to Hyper-V hypercalls using XMM registers
- Support for Hyper-V TLB flush hypercalls and enlightened MSR bitmap on
AMD processors
- Hide Hyper-V hypercalls that are not included in the guest CPUID
- Fixes for live migration of virtual machines that use the Hyper-V
"enlightened VMCS" optimization of nested virtualization
- Bugfixes (not many)
Generic:
- Support for retrieving statistics without debugfs
- Cleanups for the KVM selftests API
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmDV9UYUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOIRgf/XX8fKLh24RnTOs2ldIu2AfRGVrT4
QMrr8MxhmtukBAszk2xKvBt8/6gkUjdaIC3xqEnVjxaDaUvZaEtP7CQlF5JV45rn
iv1zyxUKucXrnIOr+gCioIT7qBlh207zV35ArKioP9Y83cWx9uAs22pfr6g+7RxO
h8bJZlJbSG6IGr3voANCIb9UyjU1V/l8iEHqRwhmr/A5rARPfD7g8lfMEQeGkzX6
+/UydX2fumB3tl8e2iMQj6vLVdSOsCkehvpHK+Z33EpkKhan7GwZ2sZ05WmXV/nY
QLAYfD10KegoNWl5Ay4GTp4hEAIYVrRJCLC+wnLdc0U8udbfCuTC31LK4w==
=NcRh
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"This covers all architectures (except MIPS) so I don't expect any
other feature pull requests this merge window.
ARM:
- Add MTE support in guests, complete with tag save/restore interface
- Reduce the impact of CMOs by moving them in the page-table code
- Allow device block mappings at stage-2
- Reduce the footprint of the vmemmap in protected mode
- Support the vGIC on dumb systems such as the Apple M1
- Add selftest infrastructure to support multiple configuration and
apply that to PMU/non-PMU setups
- Add selftests for the debug architecture
- The usual crop of PMU fixes
PPC:
- Support for the H_RPT_INVALIDATE hypercall
- Conversion of Book3S entry/exit to C
- Bug fixes
S390:
- new HW facilities for guests
- make inline assembly more robust with KASAN and co
x86:
- Allow userspace to handle emulation errors (unknown instructions)
- Lazy allocation of the rmap (host physical -> guest physical
address)
- Support for virtualizing TSC scaling on VMX machines
- Optimizations to avoid shattering huge pages at the beginning of
live migration
- Support for initializing the PDPTRs without loading them from
memory
- Many TLB flushing cleanups
- Refuse to load if two-stage paging is available but NX is not (this
has been a requirement in practice for over a year)
- A large series that separates the MMU mode (WP/SMAP/SMEP etc.) from
CR0/CR4/EFER, using the MMU mode everywhere once it is computed
from the CPU registers
- Use PM notifier to notify the guest about host suspend or hibernate
- Support for passing arguments to Hyper-V hypercalls using XMM
registers
- Support for Hyper-V TLB flush hypercalls and enlightened MSR bitmap
on AMD processors
- Hide Hyper-V hypercalls that are not included in the guest CPUID
- Fixes for live migration of virtual machines that use the Hyper-V
"enlightened VMCS" optimization of nested virtualization
- Bugfixes (not many)
Generic:
- Support for retrieving statistics without debugfs
- Cleanups for the KVM selftests API"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (314 commits)
KVM: x86: rename apic_access_page_done to apic_access_memslot_enabled
kvm: x86: disable the narrow guest module parameter on unload
selftests: kvm: Allows userspace to handle emulation errors.
kvm: x86: Allow userspace to handle emulation errors
KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP is on
KVM: x86/mmu: Get CR4.SMEP from MMU, not vCPU, in shadow page fault
KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault
KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT
KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic
KVM: x86: Enhance comments for MMU roles and nested transition trickiness
KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE
KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU
KVM: x86/mmu: Use MMU's role to determine PTTYPE
KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers
KVM: x86/mmu: Add a helper to calculate root from role_regs
KVM: x86/mmu: Add helper to update paging metadata
KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0
KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls
KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper
KVM: x86/mmu: Get nested MMU's root level from the MMU's role
...
Daniel Borkmann says:
====================
pull-request: bpf-next 2021-06-28
The following pull-request contains BPF updates for your *net-next* tree.
We've added 37 non-merge commits during the last 12 day(s) which contain
a total of 56 files changed, 394 insertions(+), 380 deletions(-).
The main changes are:
1) XDP driver RCU cleanups, from Toke Høiland-Jørgensen and Paul E. McKenney.
2) Fix bpf_skb_change_proto() IPv4/v6 GSO handling, from Maciej Żenczykowski.
3) Fix false positive kmemleak report for BPF ringbuf alloc, from Rustam Kovhaev.
4) Fix x86 JIT's extable offset calculation for PROBE_LDX NULL, from Ravi Bangoria.
5) Enable libbpf fallback probing with tracing under RHEL7, from Jonathan Edwards.
6) Clean up x86 JIT to remove unused cnt tracking from EMIT macro, from Jiri Olsa.
7) Netlink cleanups for libbpf to please Coverity, from Kumar Kartikeya Dwivedi.
8) Allow to retrieve ancestor cgroup id in tracing programs, from Namhyung Kim.
9) Fix lirc BPF program query to use user-provided prog_cnt, from Sean Young.
10) Add initial libbpf doc including generated kdoc for its API, from Grant Seltzer.
11) Make xdp_rxq_info_unreg_mem_model() more robust, from Jakub Kicinski.
12) Fix up bpfilter startup log-level to info level, from Gary Lin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>