The bpf iterator for map elements are implemented.
The bpf program will receive four parameters:
bpf_iter_meta *meta: the meta data
bpf_map *map: the bpf_map whose elements are traversed
void *key: the key of one element
void *value: the value of the same element
Here, meta and map pointers are always valid, and
key has register type PTR_TO_RDONLY_BUF_OR_NULL and
value has register type PTR_TO_RDWR_BUF_OR_NULL.
The kernel will track the access range of key and value
during verification time. Later, these values will be compared
against the values in the actual map to ensure all accesses
are within range.
A new field iter_seq_info is added to bpf_map_ops which
is used to add map type specific information, i.e., seq_ops,
init/fini seq_file func and seq_file private data size.
Subsequent patches will have actual implementation
for bpf_map_ops->iter_seq_info.
In user space, BPF_ITER_LINK_MAP_FD needs to be
specified in prog attr->link_create.flags, which indicates
that attr->link_create.target_fd is a map_fd.
The reason for such an explicit flag is for possible
future cases where one bpf iterator may allow more than
one possible customization, e.g., pid and cgroup id for
task_file.
Current kernel internal implementation only allows
the target to register at most one required bpf_iter_link_info.
To support the above case, optional bpf_iter_link_info's
are needed, the target can be extended to register such link
infos, and user provided link_info needs to match one of
target supported ones.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184112.590360-1-yhs@fb.com
Readonly and readwrite buffer register states
are introduced. Totally four states,
PTR_TO_RDONLY_BUF[_OR_NULL] and PTR_TO_RDWR_BUF[_OR_NULL]
are supported. As suggested by their respective
names, PTR_TO_RDONLY_BUF[_OR_NULL] are for
readonly buffers and PTR_TO_RDWR_BUF[_OR_NULL]
for read/write buffers.
These new register states will be used
by later bpf map element iterator.
New register states share some similarity to
PTR_TO_TP_BUFFER as it will calculate accessed buffer
size during verification time. The accessed buffer
size will be later compared to other metrics during
later attach/link_create time.
Similar to reg_state PTR_TO_BTF_ID_OR_NULL in bpf
iterator programs, PTR_TO_RDONLY_BUF_OR_NULL or
PTR_TO_RDWR_BUF_OR_NULL reg_types can be set at
prog->aux->bpf_ctx_arg_aux, and bpf verifier will
retrieve the values during btf_ctx_access().
Later bpf map element iterator implementation
will show how such information will be assigned
during target registeration time.
The verifier is also enhanced such that PTR_TO_RDONLY_BUF
can be passed to ARG_PTR_TO_MEM[_OR_NULL] helper argument, and
PTR_TO_RDWR_BUF can be passed to ARG_PTR_TO_MEM[_OR_NULL] or
ARG_PTR_TO_UNINIT_MEM.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184111.590274-1-yhs@fb.com
This patch refactored target bpf_iter_init_seq_priv_t callback
function to accept additional information. This will be needed
in later patches for map element targets since a particular
map should be passed to traverse elements for that particular
map. In the future, other information may be passed to target
as well, e.g., pid, cgroup id, etc. to customize the iterator.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184110.590156-1-yhs@fb.com
There is no functionality change for this patch.
Struct bpf_iter_reg is used to register a bpf_iter target,
which includes information for both prog_load, link_create
and seq_file creation.
This patch puts fields related seq_file creation into
a different structure. This will be useful for map
elements iterator where one iterator covers different
map types and different map types may have different
seq_ops, init/fini private_data function and
private_data size.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200723184109.590030-1-yhs@fb.com
It's mostly a copy paste of commit 6086d29def ("bpf: Add bpf_map iterator")
that is use to implement bpf_seq_file opreations to traverse all bpf programs.
v1->v2: Tweak to use build time btf_id
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Currently, the pos pointer in bpf iterator map/task/task_file
seq_ops->start() is always incremented.
This is incorrect. It should be increased only if
*pos is 0 (for SEQ_START_TOKEN) since these start()
function actually returns the first real object.
If *pos is not 0, it merely found the object
based on the state in seq->private, and not really
advancing the *pos. This patch fixed this issue
by only incrementing *pos if it is 0.
Note that the old *pos calculation, although not
correct, does not affect correctness of bpf_iter
as bpf_iter seq_file->read() does not support llseek.
This patch also renamed "mid" in bpf_map iterator
seq_file private data to "map_id" for better clarity.
Fixes: 6086d29def ("bpf: Add bpf_map iterator")
Fixes: eaaacd2391 ("bpf: Add task and task/file iterator targets")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722195156.4029817-1-yhs@fb.com
Cover the case when BPF socket lookup returns a socket that belongs to a
reuseport group, and the reuseport group contains connected UDP sockets.
Ensure that the presence of connected UDP sockets in reuseport group does
not affect the socket lookup result. Socket selected by reuseport should
always be used as result in such case.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Link: https://lore.kernel.org/bpf/20200722161720.940831-3-jakub@cloudflare.com
When BPF socket lookup prog selects a socket that belongs to a reuseport
group, and the reuseport group has connected sockets in it, the socket
selected by reuseport will be discarded, and socket returned by BPF socket
lookup will be used instead.
Modify this behavior so that the socket selected by reuseport running after
BPF socket lookup always gets used. Ignore the fact that the reuseport
group might have connections because it is only relevant when scoring
sockets during regular hashtable-based lookup.
Fixes: 72f7e9440e ("udp: Run SK_LOOKUP BPF program on socket lookup")
Fixes: 6d4201b138 ("udp6: Run SK_LOOKUP BPF program on socket lookup")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Link: https://lore.kernel.org/bpf/20200722161720.940831-2-jakub@cloudflare.com
The UDP reuseport conflict was a little bit tricky.
The net-next code, via bpf-next, extracted the reuseport handling
into a helper so that the BPF sk lookup code could invoke it.
At the same time, the logic for reuseport handling of unconnected
sockets changed via commit efc6b6f6c3
which changed the logic to carry on the reuseport result into the
rest of the lookup loop if we do not return immediately.
This requires moving the reuseport_has_conns() logic into the callers.
While we are here, get rid of inline directives as they do not belong
in foo.c files.
The other changes were cases of more straightforward overlapping
modifications.
Signed-off-by: David S. Miller <davem@davemloft.net>
I have a few more fixes this week:
* A fix to avoid using SBI calls during kasan initialization, as the SBI calls
themselves have not been probed yet.
* Three fixes related to systems with multiple memory regions.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAl8cebwTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiV4UEACunm3LQGjjZUbHX/1/X2iAdNBcnbiZ
SFSXaKuAiDzSgT0//UMloo33eywo3afGCTOCysi3zrwNU+Hbj5l9byTMnvXy9hGv
QCFf2aEma9Ebk9ImdLDjkHQy2e1SiQRHQiKQcGTtP4LN470rrysh0a8qLLVAGucJ
j5fDPcy9smRw4x0lGHqLXDmVIm8nQNJAj6iKVQYzA2mSJC2+RoKcPcunnXvoOUwl
q/lcp41BavZyYsDIoAblRCA1/sfFL9BfocL4fUBFh4HUp/qcqwsGPv+U5QoEzkQ8
9LLLuXnHXgU5BTv4zVCUqXjMza6fgZj+qMpxXLQig0DRBetuQY2gpCdt2JAkcXTj
XvkHavxkA6206myl9xY6OJueC/25I6dp0eDENBpUNET8fqbeFBhIUCuNtAzT+98J
OcMImyFQgoesGU+6dkdHr6tSJHDcI3PzXlVUNH3hkIQq5Ny10g+jjAInZo6iXfpW
yBu+AzvippbD6C8/L+ntP94y9F/Me5rZKWMBhT2gTLm6JLrtC/Wl+ol2fZF8gmIU
wdT6TUykDY6u6bdqPSRKISzy0HfuREQNJU3qqoI496BMWQ1DaQjHB1482fdtdELI
KUFkM6u9iudWu5Dap45rSCjeZu01cLqnQqO5M3LYUrCy0D02j5SiGfUwRYo+eD0S
me68CtmbUPsH7Q==
=FG/F
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-5.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux into master
Pull RISC-V fixes from Palmer Dabbelt:
"A few more fixes this week:
- A fix to avoid using SBI calls during kasan initialization, as the
SBI calls themselves have not been probed yet.
- Three fixes related to systems with multiple memory regions"
* tag 'riscv-for-linus-5.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Parse all memory blocks to remove unusable memory
RISC-V: Do not rely on initrd_start/end computed during early dt parsing
RISC-V: Set maximum number of mapped pages correctly
riscv: kasan: use local_tlb_flush_all() to avoid uninitialized __sbi_rfence
- Fix a section end page alignment assumption that was causing crashes
- Fix ORC unwinding on freshly forked tasks which haven't executed yet
and which have empty user task stacks
- Fix the debug.exception-trace=1 sysctl dumping of user stacks, which
was broken by recent maccess changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8cGlwRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hVGxAAhW5LoNLv6BsMtKrFhBJRzPhoO8YcB8ep
Z/MaaiEcS8kcSVx1nQrNNhqzwIGBlexHTyoddYQr/LeM3ahOoXc4+jZI7Es6WKdc
Rhofalp0pryJrqj52orAVAUeHikmiqmadANKKmUNtXcjWWoxexgab28gLwJPPA55
SKw1/tqQ6JTf/M6tLnalib/VcigN7F/r797LgqLan9pSkjMGj6f1zMFmqTsNDi05
Si5gqRv2GlAraelRQU7emMZGu1bXhfJvGix7BB1/u6nRimY15vIFtccgt4akJmFN
tBUwNHFeLrYsoX1txjMTtGzASBqtjaD6A1K2geqmuZ1lRL1YuGnUl7n4ccgAvQdJ
eY6iiLiXDcjZLZXYe5tfdV/4p3HZ3MYm/IRkaw2KfiGAIv0Gii/U0mShO+lX1PAZ
BdelgJv2fcOxDiNe0RM4EniBXQUcqDIOpQ1fPryYsFLeeA1LCK2/b8WLOPHuK8YG
oxAgU9vV7OA7nfBdxIPszvX1jXOiQPglsW2gfxlKbGAA9j3uKDGhGnmFFYatYUMF
OEOf/5iuw3MZE/+gAG685ksk4ibgW4Kv6ahAEdzs5hh37uM8SiC3oWVWQf2YWGHG
ITswqKY2oUqghWa3ftXEtPLKstsYbRG7KCRfBXh6AN2ay7JrDJg6sItCL4UI5BjK
dkDJ4pKLXug=
=oamN
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2020-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into master
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- Fix a section end page alignment assumption that was causing
crashes
- Fix ORC unwinding on freshly forked tasks which haven't executed
yet and which have empty user task stacks
- Fix the debug.exception-trace=1 sysctl dumping of user stacks,
which was broken by recent maccess changes"
* tag 'x86-urgent-2020-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/dumpstack: Dump user space code correctly again
x86/stacktrace: Fix reliable check for empty user task stacks
x86/unwind/orc: Fix ORC for newly forked tasks
x86, vmlinux.lds: Page-align end of ..page_aligned sections
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8cDeIRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1gVeA/8Cs67mJ9j3oto917z1GLElBVy7A+od/Pn
TqWEtwmonJA/ZM/TvuvfFxHFg6kgPlDx0x5pHYUows/8DGsftdveDQw4bGhrCN0i
Xghcn4gB+jdOaykpLQacEukYt/nPU0IOf79XqySiu774lWHvzwQI+2MIb7xBHBMO
npUXFzgeSlgjFkMm1lVa9QVPwVc1zQ6gKPJdLpDlrgW3cCrenM7QCAg3NG3Gx5TT
OXK4QZEvIuym+HWwbZRA2mOUSjm8U1Z1o95jiWUro7o5Xe+Agqi5zsLJc7lxX57+
LYwrXNx/zPZ+oBlzZhZZrAxn1bJP7DVS13ZCEL9Miy0Uslr7QBPyoGHiV4P8tL3v
cOn7kY+P7YT96Fx3ahkK3pJN+nUZIAsRrwZbgOP4bZHKdVi3zqUEilSdZNcXBZ9y
nj9mY/2pX9SOMIvoaosku2hkKs5EKIkh3qzhvbfA8fXsQC3lttrPGRW7hgPmxm7W
Fa+6liOi+t+d0rGgmi1fhHSn/L2gzmsLy5R1l6XyK15nDSALd4TrdtDrlJ8vbbl5
oRAN9mt9qPqNCQQKjcnAIXuPeJp2Pt6KphlPApHVa3J/eVzug1dFWyjia/e+js8W
Gejz1TV6SDdU137TmC4E6suAZ5nRSEBwiE+MiNnnG0DIvz7cp5FSIWCXOgt5WPl5
BXS4CSwnF24=
=7P/h
-----END PGP SIGNATURE-----
Merge tag 'timers-urgent-2020-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into master
Pull timer fix from Ingo Molnar:
"Fix a suspend/resume regression (crash) on TI AM3/AM4 SoC's"
* tag 'timers-urgent-2020-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/timer-ti-dm: Fix suspend and resume for am3 and am4
debug check for a hard to debug case of bogus wakeup function flags.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8cDFkRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1h2Hw//WW67DawJjUVoz7Hwjgf02ECgY1XspmoW
5oJjtXFU6pvOThL4U8qcb1HGOtWV/MqFajv25RX4lIyaxBo8AD5mazVQCwOc3s2C
nR9ZM2kYRgSKbrSp2EOoGYCTw76do9y8beahZ5btvSkjNzD1YgDJwwcuGeMZwp3w
cFg8GeeUbLRFcsVVMx+apirg2KR9rUp1jqjVdjsln8C/VngjWPYFdktgAglcXfry
5Op2SOkYUw4gw/vX8jm6YyyxtqZHbH5lsDN7zk12W3Xwqh9tH2ounk+ehzFlT+cY
HD8PvB39cT4UYtF7R0R7W6cYuLJkbF6XHpRhqBvwd4M7u6yzABp5S2W9ZfZPjmQR
nNTV2MBK9LUWTtMhXby7w7ooooSFqRydEUpEfTrmD8xYmMGYMmNkjpgErXKtIyk+
a124ouvP8pwmI0WxA9w8rPKAJnuN0TN60RvtEE4RpxvaL4Os79QcTObhwuxdarxm
8nJvQRqts0O3+i9dWF8hS9ZFWkQZfX9GfiEK6zR2qaqjxtVwvagc58XLBcRBd9pn
CI8JvJ1pZ8AFC6dy/szZ/6awrEJzynfeInYXnVX8UQ7Cn+bBQdTAlgXopLY58ybY
RKTIq4/Ov9D7g3RggoK+aEjryxJflN0xaz9Dus8fvveIZuFHtsra0u7FM3ZHz38x
rmP1A6Dc3Yc=
=XPX3
-----END PGP SIGNATURE-----
Merge tag 'sched-urgent-2020-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into master
Pull scheduler fixes from Ingo Molnar:
"Fix a race introduced by the recent loadavg race fix, plus add a debug
check for a hard to debug case of bogus wakeup function flags"
* tag 'sched-urgent-2020-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Warn if garbage is passed to default_wake_function()
sched: Fix race against ptrace_freeze_trace()
- Fix the layering violation in the use of the EFI runtime services
availability mask in users of the 'efivars' abstraction
- Revert build fix for GCC v4.8 which is no longer supported
- Clean up some x86 EFI stub details, some of which are borderline bugs
that copy around garbage into padding fields - let's fix these
out of caution.
- Fix build issues while working on RISC-V support
- Avoid --whole-archive when linking the stub on arm64
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8cCIkRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hGsxAAsPwL5ghYykViUQId0OjNmI7eRgISKRso
3n3BaFmgYMp9eq1gndzPp7A5Ty0NPsr8FiwuZ7setk9YoLuTUT1MHVtAzd6xkxlR
838CwTDvW5HvB69uxPQDHA/1mcH1smH4Iew/J7QXP+o6Zrg+BWOjKNtTiKFwawNC
m4Tkdvvq52wzykqbeuRhXxLetKFOH//R1V0s4M6nNySuY6gQGJQ2LPyaiMN6eq1V
LE8+wGcNlIRUOeC8RkEA7CE9g92jkGZ07uJDA09OP0J5WBNWLcxMJM2mBZ1Ho6uc
sNNOeTy76sjuXQvUBWCnjBr3/qqXnVGSuIq8NS1hS4L7HagdQ/peqFJRzvWBHT90
b9wUZv09ioFm9lb6/P6NL16sn/WPknCD7umxpfp5HKrlL2p7puvsvDXtuWgyhhPG
M+X9ZX1iaA54cA9bU6cXFzrNMw/DnYjHFsECF915EXjItNZToGs0rd1lf7ArgH2P
+3HgvLods73ufObKH2pUfY7EU2Ly1oJsNpK3RmpoOUzehW+++S80KdytkaAB10kT
dKp9LlTj6gC+lnC45J9NtpHlCodz/Rc0lBHQpxIqlk/p9grUuH4zn714Ii/FfNQg
i29GX9cCdgv+4KzmJCNTqyZvGFZ5m3K8f41x7iD+/ygqPLsP9PfV5RGRGau+FfL8
ezInhKdITqM=
=n3nF
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent-2020-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into master
Pull EFI fixes from Ingo Molnar:
"Various EFI fixes:
- Fix the layering violation in the use of the EFI runtime services
availability mask in users of the 'efivars' abstraction
- Revert build fix for GCC v4.8 which is no longer supported
- Clean up some x86 EFI stub details, some of which are borderline
bugs that copy around garbage into padding fields - let's fix these
out of caution.
- Fix build issues while working on RISC-V support
- Avoid --whole-archive when linking the stub on arm64"
* tag 'efi-urgent-2020-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: Revert "efi/x86: Fix build with gcc 4"
efi/efivars: Expose RT service availability via efivars abstraction
efi/libstub: Move the function prototypes to header file
efi/libstub: Fix gcc error around __umoddi3 for 32 bit builds
efi/libstub/arm64: link stub lib.a conditionally
efi/x86: Only copy upto the end of setup_header
efi/x86: Remove unused variables
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl8bmiQACgkQiiy9cAdy
T1EfFwv+IeD4kNshoBF5NT//+7j+9n+R8z74v5bJUEmoabZMuCaWuYl/uUcBqIBH
l29Xd60aF8V2jd3xIt/SEPNFOAvvBSQIjn8X52Lkquo+QzgiOElSUL8aNZe17T9L
N1S+nhwk2udlveBtWZRxHT50WNdKQCGKEwUDn9ouUbuKxLrcwWc1gRCxkegmJXey
NRaynzdCRYFx7uLdctvesTDB4P9MJn+5vX7L1BdI7MxXzsSEI1yHpM5OBL9e13ki
swZ7P16qyCA6TK8EYuYLCeXKEK0X+wGeu/y8JZ7RjO9ozqrvECvguumNbcUCs+zf
vLWQkaeC2M1L5L4WV71sY5Pi7CUGw9WGaU/FBJUK65+hYqUWETt351BwbIRwVlrx
mX7C7Dh50AM84/54u169Sk1p/S8vjdvkmx8YyRWK+t7kmZx6TP7XKXvYN1vEHS12
Y5ceRKmNHLdRqS3nY6FK7TKdgVWK2hgNODDdVbUZ+iPoJztSmH7j2RkTzxJz6qBi
vzBAPJiG
=Cvv4
-----END PGP SIGNATURE-----
Merge tag '5.8-rc6-cifs-fix' of git://git.samba.org/sfrench/cifs-2.6 into master
Pull cifs fix from Steve French:
"A fix for a recently discovered regression in rename to older servers
caused by a recent patch"
* tag '5.8-rc6-cifs-fix' of git://git.samba.org/sfrench/cifs-2.6:
Revert "cifs: Fix the target file was deleted when rename failed."
Pull networking fixes from David Miller:
1) Fix RCU locaking in iwlwifi, from Johannes Berg.
2) mt76 can access uninitialized NAPI struct, from Felix Fietkau.
3) Fix race in updating pause settings in bnxt_en, from Vasundhara
Volam.
4) Propagate error return properly during unbind failures in ax88172a,
from George Kennedy.
5) Fix memleak in adf7242_probe, from Liu Jian.
6) smc_drv_probe() can leak, from Wang Hai.
7) Don't muck with the carrier state if register_netdevice() fails in
the bonding driver, from Taehee Yoo.
8) Fix memleak in dpaa_eth_probe, from Liu Jian.
9) Need to check skb_put_padto() return value in hsr_fill_tag(), from
Murali Karicheri.
10) Don't lose ionic RSS hash settings across FW update, from Shannon
Nelson.
11) Fix clobbered SKB control block in act_ct, from Wen Xu.
12) Missing newlink in "tx_timeout" sysfs output, from Xiongfeng Wang.
13) IS_UDPLITE cleanup a long time ago, incorrectly handled
transformations involving UDPLITE_RECV_CC. From Miaohe Lin.
14) Unbalanced locking in netdevsim, from Taehee Yoo.
15) Suppress false-positive error messages in qed driver, from Alexander
Lobakin.
16) Out of bounds read in ax25_connect and ax25_sendmsg, from Peilin Ye.
17) Missing SKB release in cxgb4's uld_send(), from Navid Emamdoost.
18) Uninitialized value in geneve_changelink(), from Cong Wang.
19) Fix deadlock in xen-netfront, from Andera Righi.
19) flush_backlog() frees skbs with IRQs disabled, so should use
dev_kfree_skb_irq() instead of kfree_skb(). From Subash Abhinov
Kasiviswanathan.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits)
drivers/net/wan: lapb: Corrected the usage of skb_cow
dev: Defer free of skbs in flush_backlog
qrtr: orphan socket in qrtr_release()
xen-netfront: fix potential deadlock in xennet_remove()
flow_offload: Move rhashtable inclusion to the source file
geneve: fix an uninitialized value in geneve_changelink()
bonding: check return value of register_netdevice() in bond_newlink()
tcp: allow at most one TLP probe per flight
AX.25: Prevent integer overflows in connect and sendmsg
cxgb4: add missing release on skb in uld_send()
net: atlantic: fix PTP on AQC10X
AX.25: Prevent out-of-bounds read in ax25_sendmsg()
sctp: shrink stream outq when fails to do addstream reconf
sctp: shrink stream outq only when new outcnt < old outcnt
AX.25: Fix out-of-bounds read in ax25_connect()
enetc: Remove the mdio bus on PF probe bailout
net: ethernet: ti: add NETIF_F_HW_TC hw feature flag for taprio offload
net: ethernet: ave: Fix error returns in ave_init
drivers/net/wan/x25_asy: Fix to make it work
ipvs: fix the connection sync failed in some cases
...
Currently, maximum physical memory allowed is equal to -PAGE_OFFSET.
That's why we remove any memory blocks spanning beyond that size. However,
it is done only for memblock containing linux kernel which will not work
if there are multiple memblocks.
Process all memory blocks to figure out how much memory needs to be removed
and remove at the end instead of updating the memblock list in place.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
This patch fixed 2 issues with the usage of skb_cow in LAPB drivers
"lapbether" and "hdlc_x25":
1) After skb_cow fails, kfree_skb should be called to drop a reference
to the skb. But in both drivers, kfree_skb is not called.
2) skb_cow should be called before skb_push so that is can ensure the
safety of skb_push. But in "lapbether", it is incorrectly called after
skb_push.
More details about these 2 issues:
1) The behavior of calling kfree_skb on failure is also the behavior of
netif_rx, which is called by this function with "return netif_rx(skb);".
So this function should follow this behavior, too.
2) In "lapbether", skb_cow is called after skb_push. This results in 2
logical issues:
a) skb_push is not protected by skb_cow;
b) An extra headroom of 1 byte is ensured after skb_push. This extra
headroom has no use in this function. It also has no use in the
upper-layer function that this function passes the skb to
(x25_lapb_receive_frame in net/x25/x25_dev.c).
So logically skb_cow should instead be called before skb_push.
Cc: Eric Dumazet <edumazet@google.com>
Cc: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Packham says:
====================
net: dsa: mv88e6xxx: port mtu support
This series connects up the mv88e6xxx switches to the dsa infrastructure for
configuring the port MTU. The first patch is also a bug fix which might be a
candiatate for stable.
I've rebased this series on top of net-next/master to pick up Andrew's change
for the gigabit switches. Patch 1 and 2 are unchanged (aside from adding
Andrew's Reviewed-by). Patch 3 is reworked to make use of the existing mtu
support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the chips in the mv88e6xxx family don't support jumbo
configuration per port. But they do have a chip-wide max frame size that
can be used. Use this to approximate the behaviour of configuring a port
based MTU.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MV88E6190 and MV88E6190X both support per port jumbo configuration
just like the other GE switches. Install the appropriate ops.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MV88E6097 chip does not support configuring jumbo frames. Prior to
commit 5f4366660d only the 6352, 6351, 6165 and 6320 chips configured
jumbo mode. The refactor accidentally added the function for the 6097.
Remove the erroneous function pointer assignment.
Fixes: 5f4366660d ("net: dsa: mv88e6xxx: Refactor setting of jumbo frames")
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
IRQs are disabled when freeing skbs in input queue.
Use the IRQ safe variant to free skbs here.
Fixes: 145dd5f9c8 ("net: flush the softnet backlog in process context")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, maximum number of mapper pages are set to the pfn calculated
from the memblock size of the memblock containing kernel. This will work
until that memblock spans the entire memory. However, it will be set to
a wrong value if there are multiple memblocks defined in kernel
(e.g. with efi runtime services).
Set the the maximum value to the pfn calculated from dram size.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAl8beM8UHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vygXhAAwYMQgoLF+HZDVGWK6q/YmN+Q0ygb
tXtnKt/ehf+zrvWiWZxmRVPZ71MjR9Ut6kXQ+MWlADkLkBeIxsCMj4N8ZjGSp/Gh
UMfFwVYoUhJG3AM8uko6hCugsYsxagFA5zr53crHIE9AHCp/2Ca4lq1VrBB5k5OQ
v4bghphxtPZXtUelHHcbmylU8EmnPG79XzL1VYqgeOd98nSrn34c68Ib7tezLF3I
zgRraYghlyI6TGvbA55vdouMs+z+eYYN2hIu6FEXDx2lm81IvFYD5LdXwfb78Ns3
4+BcyuqHcsyvHuxD77Y/JrlmLdFopFRyhexfc887znqD2nXc0kAAtLXKM+d+/bXz
vV8qtsDVWhYXS6cwvDuYSjsa1DiaWyJNQ/4tIuu5hYofXfi7ltyATkVrAjE0QPaC
QpWnMVKecXSjd//VyTSvAAgO3Bk27uhXfTy3n7lpCTcpzMJj04t6go6u8/SOLtz9
ECWs4WWQVoi7zX65chcXtwce2+plQA6Pu7duzNLo5hP5KIIH4Ij0I19w1yGZueUm
V5CIRAsVXPfQPJ45nhKwHldHJpZ/uXM97nYH5F9H14j9VXrrMJWptzyZiH53ZKvE
9GLBfHGcyKBCHcOhJtEJqgf7gkrxGpfSVcacYmPClB8SOsTd67mzIVRdH3xS7DUk
j8VlWiNB266bHaY=
=hzUW
-----END PGP SIGNATURE-----
Merge tag 'pci-v5.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into master
Pull PCI fixes from Bjorn Helgaas:
- Reject invalid IRQ 0 command line argument for virtio_mmio because
IRQ 0 now generates warnings (Bjorn Helgaas)
- Revert "PCI/PM: Assume ports without DLL Link Active train links in
100 ms", which broke nouveau (Bjorn Helgaas)
* tag 'pci-v5.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Revert "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"
virtio-mmio: Reject invalid IRQ 0 command line argument
We have to detach sock from socket in qrtr_release(),
otherwise skb->sk may still reference to this socket
when the skb is released in tun->queue, particularly
sk->sk_wq still points to &sock->wq, which leads to
a UAF.
Reported-and-tested-by: syzbot+6720d64f31c081c2f708@syzkaller.appspotmail.com
Fixes: 28fb4e59a4 ("net: qrtr: Expose tunneling endpoint to user space")
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove casting the values returned by memory allocation function.
Coccinelle emits WARNING:
./drivers/net/ethernet/hisilicon/hix5hd2_gmac.c:1027:9-23: WARNING:
casting value returned by memory allocation function to (struct sg_desc *) is useless.
This issue was detected by using the Coccinelle software.
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Second set of fixes for v5.8, and hopefully also the last. Three
important regressions fixed.
ath9k
* fix a regression which broke support for all ath9k usb devices
ath10k
* fix a regression which broke support for all QCA4019 AHB devices
iwlwifi
* fix a regression which broke support for some Killer Wireless-AC 1550 cards
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJfGukwAAoJEG4XJFUm622bIdwIAKYV3lplb6mXyDdF1b3Yoxar
0IDUglXlJN+VIgvUQG4wOvxgDw7jp6q95BRi58VjafIFm+orjILvYFmwThLPYfIS
dA0ML9+zGnRMC+SIjZktXKG9lridQXyqexWHCFF9dT147GsF8+LIbfqT6denCLf7
vP26RSMAMK5Ed01V9qb5rtitaVbr+Bx+c7adV8tLH/gkWeTEp4CaDhNOkMJ0uRCV
QPZ8duF1Hz/QVMn9PAA8xGTH4bxIfSfWVJfSjG1uq349CZ+MjHleDeYWKh524ThV
49YTabR2DIKOWSgNzG7jsitz7K/420Y/NpuNeGWO+7czgYe2uEIrGDZkYPUPoS4=
=PTR1
-----END PGP SIGNATURE-----
Merge tag 'wireless-drivers-2020-07-24' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for v5.8
Second set of fixes for v5.8, and hopefully also the last. Three
important regressions fixed.
ath9k
* fix a regression which broke support for all ath9k usb devices
ath10k
* fix a regression which broke support for all QCA4019 AHB devices
iwlwifi
* fix a regression which broke support for some Killer Wireless-AC 1550 cards
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Parkin says:
====================
l2tp: avoid multiple assignment, remove BUG_ON
l2tp hasn't been kept up to date with the static analysis checks offered
by checkpatch.pl.
This patchset builds on the series: "l2tp: cleanup checkpatch.pl
warnings" and "l2tp: further checkpatch.pl cleanups" to resolve some of
the remaining checkpatch warnings in l2tp.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp_session_free called BUG_ON if the tunnel magic feather value wasn't
correct. The intent of this was to catch lifetime bugs; for example
early tunnel free due to incorrect use of reference counts.
Since the tunnel magic feather being wrong indicates either early free
or structure corruption, we can avoid doing more damage by simply
leaving the tunnel structure alone. If the tunnel refcount isn't
dropped when it should be, the tunnel instance will remain in the
kernel, resulting in the tunnel structure and socket leaking.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp_session_free is only called by l2tp_session_dec_refcount when the
reference count reaches zero, so it's of limited value to validate the
reference count value in l2tp_session_free itself.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp_session_queue_purge is used during session shutdown to drop any
skbs queued for reordering purposes according to L2TP dataplane rules.
The BUG_ON in this function checks the session magic feather in an
attempt to catch lifetime bugs.
Rather than crashing the kernel with a BUG_ON, we can simply WARN_ON and
refuse to do anything more -- in the worst case this could result in a
leak. However this is highly unlikely given that the session purge only
occurs from codepaths which have obtained the session by means of a lookup
via. the parent tunnel and which check the session "dead" flag to
protect against shutdown races.
While we're here, have l2tp_session_queue_purge return void rather than
an integer, since neither of the callsites checked the return value.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
checkpatch advises that WARN_ON and recovery code are preferred over
BUG_ON which crashes the kernel.
l2tp_ppp has a BUG_ON check of struct seq_file's private pointer in
pppol2tp_seq_start prior to accessing data through that pointer.
Rather than crashing, we can simply bail out early and return NULL in
order to terminate the seq file processing in much the same way as we do
when reaching the end of tunnel/session instances to render.
Retain a WARN_ON to help trace possible bugs in this area.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
checkpatch advises that WARN_ON and recovery code are preferred over
BUG_ON which crashes the kernel.
l2tp_ppp.c's BUG_ON checks of the l2tp session structure's "magic" field
occur in code paths where it's reasonably easy to recover:
* In the case of pppol2tp_sock_to_session, we can return NULL and the
caller will bail out appropriately. There is no change required to
any of the callsites of this function since they already handle
pppol2tp_sock_to_session returning NULL.
* In the case of pppol2tp_session_destruct we can just avoid
decrementing the reference count on the suspect session structure.
In the worst case scenario this results in a memory leak, which is
preferable to a crash.
Convert these uses of BUG_ON to WARN_ON accordingly.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp_tunnel_closeall is only called from l2tp_core.c, and it's easy
to statically analyse the code path calling it to validate that it
should never be passed a NULL tunnel pointer.
Having a BUG_ON checking the tunnel pointer triggers a checkpatch
warning. Since the BUG_ON is of no value, remove it to avoid the
warning.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp_session_queue_purge is only called from l2tp_core.c, and it's easy
to statically analyse the code paths calling it to validate that it
should never be passed a NULL session pointer.
Having a BUG_ON checking the session pointer triggers a checkpatch
warning. Since the BUG_ON is of no value, remove it to avoid the
warning.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp_dfs_seq_start had a BUG_ON to catch a possible programming error in
l2tp_dfs_seq_open.
Since we can easily bail out of l2tp_dfs_seq_start, prefer to do that
and flag the error with a WARN_ON rather than crashing the kernel.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
checkpatch warns about multiple assignments.
Update l2tp accordingly.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn says:
====================
icmp6: support rfc 4884
Extend the feature merged earlier this week for IPv4 to IPv6.
I expected this to be a single patch, but patch 1 seemed better to be
stand-alone
patch 1: small fix in length calculation
patch 2: factor out ipv4-specific
patch 3: add ipv6
changes v1->v2: add missing static keyword in patch 3
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the rfc 4884 read interface introduced for ipv4 in
commit eba75c587e ("icmp: support rfc 4884") to ipv6.
Add socket option SOL_IPV6/IPV6_RECVERR_RFC4884.
Changes v1->v2:
- make ipv6_icmp_error_rfc4884 static (file scope)
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RFC 4884 spec is largely the same between IPv4 and IPv6.
Factor out the IPv4 specific parts in preparation for IPv6 support:
- icmp types supported
- icmp header size, and thus offset to original datagram start
- datagram length field offset in icmp(6)hdr.
- datagram length field word size: 4B for IPv4, 8B for IPv6.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Only accept packets with original datagram len field >= header len.
The extension header must start after the original datagram headers.
The embedded datagram len field is compared against the 128B minimum
stipulated by RFC 4884. It is unlikely that headers extend beyond
this. But as we know the exact header length, check explicitly.
2) Remove the check that datagram length must be <= 576B.
This is a send constraint. There is no value in testing this on rx.
Within private networks it may be known safe to send larger packets.
Process these packets.
This test was also too lax. It compared original datagram length
rather than entire icmp packet length. The stand-alone fix would be:
- if (hlen + skb->len > 576)
+ if (-skb_network_offset(skb) + skb->len > 576)
Fixes: eba75c587e ("icmp: support rfc 4884")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable status is being initialized with a value that is never read
and it is being updated later with a new value. The initialization is
redundant and can be removed. Also put the variable declarations into
reverse christmas tree order.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a potential race in xennet_remove(); this is what the driver is
doing upon unregistering a network device:
1. state = read bus state
2. if state is not "Closed":
3. request to set state to "Closing"
4. wait for state to be set to "Closing"
5. request to set state to "Closed"
6. wait for state to be set to "Closed"
If the state changes to "Closed" immediately after step 1 we are stuck
forever in step 4, because the state will never go back from "Closed" to
"Closing".
Make sure to check also for state == "Closed" in step 4 to prevent the
deadlock.
Also add a 5 sec timeout any time we wait for the bus state to change,
to avoid getting stuck forever in wait_event().
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The previous patch introduced a deadlock, this patch fixes it by making
sure the work is canceled without holding the global ovs lock. This is
done by moving the reorder processing one layer up to the netns level.
Fixes: eac87c413b ("net: openvswitch: reorder masks array based on usage")
Reported-by: syzbot+2c4ff3614695f75ce26c@syzkaller.appspotmail.com
Reported-by: syzbot+bad6507e5db05017b008@syzkaller.appspotmail.com
Reviewed-by: Paolo <pabeni@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This sockopt accepts two kinds of parameters, using struct
sctp_sack_info and struct sctp_assoc_value. The mentioned commit didn't
notice an implicit cast from the smaller (latter) struct to the bigger
one (former) when copying the data from the user space, which now leads
to an attempt to write beyond the buffer (because it assumes the storing
buffer is bigger than the parameter itself).
Fix it by allocating a sctp_sack_info on stack and filling it out based
on the small struct for the compat case.
Changelog stole from an earlier patch from Marcelo Ricardo Leitner.
Fixes: ebb25defdc ("sctp: pass a kernel pointer to sctp_setsockopt_delayed_ack")
Reported-by: syzbot+0e4699d000d8b874d8dc@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2020-07-23
This series contains updates to ice driver only.
Jake refactors ice_discover_caps() to reduce the number of AdminQ calls
made. Splits ice_parse_caps() to separate functions to update function
and device capabilities separately to allow for updating outside of
initialization.
Akeem adds power management support.
Paul G refactors FC and FEC code to aid in restoring of PHY settings
on media insertion. Implements lenient mode and link override support.
Adds link debug info and formats existing debug info to be more
readable. Adds support to check and report additional autoneg
capabilities. Implements the capability to detect media cage in order to
differentiate AUI types as Direct Attach or backplane.
Bruce implements Total Port Shutdown for devices that support it.
Lev renames low_power_ctrl field to lower_power_ctrl_an to be more
descriptive of the field.
Doug reports AOC types as media type fiber.
Paul S adds code to handle 1G SGMII PHY type.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>