Commit Graph

1168638 Commits

Heiner Kallweit
59ee97c0c1 r8169: enable cfg9346 config register access in atomic context
For disabling ASPM during NAPI poll we'll have to unlock access
to the config registers in atomic context. Other code paths that
run with config register access unlocked are partly longer-running
and can sleep. Add a usage counter to enable parallel execution of
code parts requiring unlocked config registers.
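
A minimal sketch of the usage-counter idea, assuming hypothetical field,
register and helper names rather than the driver's actual ones:

  /* counted unlock: parallel users don't re-lock the config registers early,
   * and spin_lock_irqsave() makes this safe to call from atomic context */
  static void cfg_regs_unlock(struct rtl_priv *tp)
  {
          unsigned long flags;

          spin_lock_irqsave(&tp->cfg_lock, flags);
          if (!tp->cfg_unlock_cnt++)
                  writeb(CFG9346_UNLOCK, tp->mmio_addr + CFG9346);
          spin_unlock_irqrestore(&tp->cfg_lock, flags);
  }

  static void cfg_regs_lock(struct rtl_priv *tp)
  {
          unsigned long flags;

          spin_lock_irqsave(&tp->cfg_lock, flags);
          if (!--tp->cfg_unlock_cnt)
                  writeb(CFG9346_LOCK, tp->mmio_addr + CFG9346);
          spin_unlock_irqrestore(&tp->cfg_lock, flags);
  }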

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-08 09:30:41 +00:00
Heiner Kallweit
6bc6c4e689 r8169: use spinlock to protect access to registers Config2 and Config5
For disabling ASPM during NAPI poll we'll have to access both registers
in atomic context. Use a spinlock to protect access.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-08 09:30:41 +00:00
Heiner Kallweit
91c8643578 r8169: use spinlock to protect mac ocp register access
For disabling ASPM during NAPI poll we'll have to access mac ocp
registers in atomic context. This could result in races because
a mac ocp read consists of a write to register OCPDR, followed
by a read from the same register. Therefore add a spinlock to
protect access to mac ocp registers.
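
Roughly, the read sequence that needs protection looks like this (a sketch
with made-up helper/field names; the register-select encoding is an
assumption, not the driver's exact code):

  static u16 mac_ocp_read(struct rtl_priv *tp, u32 reg)
  {
          unsigned long flags;
          u32 val;

          spin_lock_irqsave(&tp->mac_ocp_lock, flags);
          writel(reg << 15, tp->mmio_addr + OCPDR);   /* select the mac ocp register */
          val = readl(tp->mmio_addr + OCPDR);         /* read its value back */
          spin_unlock_irqrestore(&tp->mac_ocp_lock, flags);

          return val & 0xffff;
  }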

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-08 09:30:41 +00:00
Vadim Fedorenko
8ca5a5790b net-timestamp: extend SOF_TIMESTAMPING_OPT_ID to HW timestamps
When the feature was added it was enabled for SW timestamps only, but
with current hardware the same out-of-order timestamps can be seen.
Let's extend the feature to cover all types of timestamps.

Signed-off-by: Vadim Fedorenko <vadfed@meta.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-08 09:27:14 +00:00
Gustavo A. R. Silva
2549347972 netxen_nic: Replace fake flex-array with flexible-array member
Zero-length arrays as fake flexible arrays are deprecated and we are
moving towards adopting C99 flexible-array members instead.

Transform zero-length array into flexible-array member in struct
nx_cardrsp_rx_ctx_t.

Address the following warnings found with GCC-13 and
-fstrict-flex-arrays=3 enabled:
drivers/net/ethernet/qlogic/netxen/netxen_nic_ctx.c:361:26: warning: array subscript <unknown> is outside array bounds of ‘char[0]’ [-Warray-bounds=]
drivers/net/ethernet/qlogic/netxen/netxen_nic_ctx.c:372:25: warning: array subscript <unknown> is outside array bounds of ‘char[0]’ [-Warray-bounds=]

This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [1].
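
A generic before/after illustration of the transformation (the struct and
field names here are made up; the real change is to struct nx_cardrsp_rx_ctx_t):

  /* before: zero-length array used as a fake flexible array */
  struct card_rsp {
          __le32 ring_count;
          char   data[0];
  };

  /* after: C99 flexible-array member, visible to -fstrict-flex-arrays=3
   * and to the FORTIFY_SOURCE bounds checks on memcpy() */
  struct card_rsp {
          __le32 ring_count;
          char   data[];
  };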

Link: https://github.com/KSPP/linux/issues/21
Link: https://github.com/KSPP/linux/issues/265
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [1]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/ZAZ57I6WdQEwWh7v@work
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-08 00:16:58 -08:00
Heiner Kallweit
4310e2f420 net: phy: smsc: simplify lan95xx_config_aneg_ext
lan95xx_config_aneg_ext() can be simplified by using phy_set_bits().
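
phy_set_bits() folds an open-coded read-modify-write into one call; a hedged
sketch of the kind of simplification (register and bit names are illustrative,
not necessarily the driver's):

  /* before: manual read-modify-write */
  int val = phy_read(phydev, MY_SPECIAL_REG);

  if (val < 0)
          return val;
  return phy_write(phydev, MY_SPECIAL_REG, val | MY_OVERRIDE_BIT);

  /* after: let phylib do it */
  return phy_set_bits(phydev, MY_SPECIAL_REG, MY_OVERRIDE_BIT);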

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/3da785c7-3ef8-b5d3-89a0-340f550be3c2@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-07 23:57:32 -08:00
Eric Dumazet
40bbae583e net: remove enum skb_free_reason
enum skb_drop_reason is more generic, we can adopt it instead.

Provide dev_kfree_skb_irq_reason() and dev_kfree_skb_any_reason().

This means drivers can use more precise drop reasons if they want to.
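
A hedged usage sketch of the new helpers (the call sites and the chosen drop
reasons are illustrative):

  /* free from hard-IRQ context with a precise drop reason */
  dev_kfree_skb_irq_reason(skb, SKB_DROP_REASON_NOMEM);

  /* free from any context; NOT_SPECIFIED keeps the old, generic accounting */
  dev_kfree_skb_any_reason(skb, SKB_DROP_REASON_NOT_SPECIFIED);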

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Link: https://lore.kernel.org/r/20230306204313.10492-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-07 23:57:19 -08:00
Heiner Kallweit
0194b64578 net: phy: improve phy_read_poll_timeout
cond is sometimes (val & MASK), which may result in a false positive
if val is a negative errno. We shouldn't evaluate cond if val < 0.
This has no functional impact here, but it's not nice.
Therefore switch the order of the checks.
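
The gist of the reordering inside the polling macro, sketched (simplified,
not the exact macro body):

  val = phy_read(phydev, regnum);
  /* bail out on a read error first ... */
  if (val < 0)
          break;
  /* ... and only then evaluate cond, so a negative errno can never
   * satisfy a mask test like (val & MASK) by accident */
  if (cond)
          break;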

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/6d8274ac-4344-23b4-d9a3-cad4c39517d4@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-07 18:19:09 -08:00
Andrii Nakryiko
d1d51a62d0 Merge branch 'libbpf: usdt arm arg parsing support'
Puranjay Mohan says:

====================

This series adds support for the ARM architecture to libbpf USDT. This
involves implementing the parse_usdt_arg() function for ARM.

It was seen that the last part of parse_usdt_arg() is repeated verbatim for
all architectures, so the first patch in this series refactors these
functions and moves the post-processing to parse_usdt_spec().

Changes in V2[1] to V3:

- Use a tabular approach to find register offsets.
- Add the patch for refactoring parse_usdt_arg()
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-03-07 15:35:56 -08:00
Puranjay Mohan
720d93b60a libbpf: USDT arm arg parsing support
Parsing of USDT arguments is architecture-specific; on arm it is
relatively easy since registers used are r[0-10], fp, ip, sp, lr,
pc. Format is slightly different compared to aarch64; forms are

- "size @ [ reg, #offset ]" for dereferences, for example
  "-8 @ [ sp, #76 ]" ; " -4 @ [ sp ]"
- "size @ reg" for register values; for example
  "-4@r0"
- "size @ #value" for raw values; for example
  "-8@#1"

Add support for parsing USDT arguments for ARM architecture.
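
A standalone, simplified sketch of how such specs can be parsed (this is not
libbpf's parse_usdt_arg(); format strings and return codes are illustrative):

  #include <stdio.h>

  /* parse "-8@[sp, #76]", "-4@[sp]", "-4@r0" or "-8@#1" style arg specs */
  static int parse_arm_usdt_arg(const char *spec, int *arg_sz,
                                char reg[16], long *off, long *imm)
  {
          if (sscanf(spec, " %d @ [ %15[a-z0-9] , #%ld ]", arg_sz, reg, off) == 3)
                  return 0;                       /* register dereference + offset */
          if (sscanf(spec, " %d @ [ %15[a-z0-9] ]", arg_sz, reg) == 2) {
                  *off = 0;                       /* register dereference, no offset */
                  return 0;
          }
          if (sscanf(spec, " %d @ #%ld", arg_sz, imm) == 2)
                  return 1;                       /* constant value */
          if (sscanf(spec, " %d @ %15[a-z0-9]", arg_sz, reg) == 2)
                  return 2;                       /* register value */
          return -1;                              /* unrecognized spec */
  }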

To test the above changes QEMU's virt[1] board with cortex-a15
CPU was used. libbpf-bootstrap's usdt example[2] was modified to attach
to a test program with DTRACE_PROBE1/2/3/4... probes to test different
combinations.

[1] https://www.qemu.org/docs/master/system/arm/virt.html
[2] https://github.com/libbpf/libbpf-bootstrap/blob/master/examples/c/usdt.bpf.c

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230307120440.25941-3-puranjay12@gmail.com
2023-03-07 15:35:53 -08:00
Puranjay Mohan
98e678e9bc libbpf: Refactor parse_usdt_arg() to re-use code
The parse_usdt_arg() function is defined differently for each
architecture but the last part of the function is repeated
verbatim for each architecture.

Refactor parse_usdt_arg() to fill the arg_sz and then do the repeated
post-processing in parse_usdt_spec().
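
The shared tail that moves into parse_usdt_spec() is essentially the size and
signedness handling; a hedged sketch of that logic (field names follow
libbpf's usdt arg spec but treat them as assumptions):

  /* arg_sz < 0 denotes a signed argument of |arg_sz| bytes */
  arg->arg_signed = arg_sz < 0;
  if (arg_sz < 0)
          arg_sz = -arg_sz;

  switch (arg_sz) {
  case 1: case 2: case 4: case 8:
          arg->arg_bitshift = 64 - arg_sz * 8;
          break;
  default:
          return -EINVAL;
  }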

Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230307120440.25941-2-puranjay12@gmail.com
2023-03-07 15:35:05 -08:00
Daniel Müller
3ecde2182a libbpf: Fix theoretical u32 underflow in find_cd() function
Coverity reported a potential underflow of the offset variable used in
the find_cd() function. Switch to using a signed 64 bit integer for the
representation of offset to make sure we can never underflow.
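
The pattern of the fix, in a generic form (variable names are illustrative,
not the exact find_cd() body):

  /* with u32, "end - min_size" can wrap to a huge value when
   * min_size > end; a signed 64-bit offset keeps the bound check sane */
  int64_t offset = (int64_t)end - min_size;

  if (offset < 0)
          return NULL;    /* archive too small, nothing to scan */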

Fixes: 1eebcb6063 ("libbpf: Implement basic zip archive parsing support")
Signed-off-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230307215504.837321-1-deso@posteo.net
2023-03-07 15:30:47 -08:00
Alexei Starovoitov
a73dc912aa Merge branch 'bpf: bpf memory usage'
Yafang Shao says:

====================

Currently we can't get bpf memory usage reliably either from memcg or
from bpftool.

In memcg, there's no 'bpf' item in memory.stat, only 'kernel', 'sock',
'vmalloc' and 'percpu', which may be related to bpf memory. With these
items we still can't get the bpf memory usage, because bpf memory usage
may be far less than the kmem in a memcg; for example, the dentry cache
may consume lots of kmem.

bpftool now shows the bpf memory footprint, which is different from bpf
memory usage. The difference can be quite large in some cases, for example,

- non-preallocated bpf map
  The non-preallocated bpf map memory usage is dynamically changed. The
  allocated elements count can be from 0 to the max entries. But the
  memory footprint in bpftool only shows a fixed number.

- bpf metadata consumes more memory than bpf elements
  In some corner cases, the bpf metadata can consume a lot more memory
  than the bpf elements themselves. For example, it can happen when the
  element size is quite small.

- some maps don't have key, value or max_entries
  For example, the key_size and value_size of ringbuf are 0, so its
  memlock is always 0.

We need a way to show the bpf memory usage, especially as there will be
more and more bpf programs running in production environments and thus
the bpf memory usage is not trivial.

This patchset introduces a new map ops callback, ->map_mem_usage, to
calculate the memory usage. Note that we don't intend to make the memory
usage 100% accurate; rather, our goal is to make sure there is only a
small difference between what bpftool reports and the real memory. That
small difference can be ignored compared to the total usage, and is
enough to monitor the bpf memory usage. For example, the user can rely on
this value to monitor the trend of bpf memory usage, compare the
difference in bpf memory usage between different bpf program versions,
figure out which maps consume large amounts of memory, etc.

This patchset implements the bpf memory usage for all maps, and yet there's
still work to do. We don't want to introduce runtime overhead in the
element update and delete path, but we have to do it for some
non-preallocated maps,
- devmap, xskmap
  When we update or delete an element, it will allocate or free memory.
  In order to track this dynamic memory, we have to track the count in
  element update and delete path.

- cpumap
  The element size of each cpumap element is not fixed. If we want to
  track the usage, we have to count the size of all elements in the
  element update and delete path. So I just put it aside currently.

- local_storage, bpf_local_storage
  When we attach or detach a cgroup, it will allocate or free memory. If
  we want to track the dynamic memory, we also need to do something in
  the update and delete path. So I just put it aside currently.

- offload map
  The element update and delete of offload map is via the netdev dev_ops,
  in which it may dynamically allocate or free memory, but this dynamic
  memory isn't counted in offload map memory usage currently.

The result of each map can be found in the individual patch.

We may also need to track per-container bpf memory usage, that will be
addressed by a different patchset.

Changes:
v3->v4: code improvement on ringbuf (Andrii)
        use READ_ONCE() to read lpm_trie (Tao)
        explain why we can't get bpf memory usage from memcg.
v2->v3: check callback at map creation time and avoid warning (Alexei)
        fix build error under CONFIG_BPF=n (lkp@intel.com)
v1->v2: calculate the memory usage within bpf (Alexei)
- [v1] bpf, mm: bpf memory usage
  https://lwn.net/Articles/921991/
- [RFC PATCH v2] mm, bpf: Add BPF into /proc/meminfo
  https://lwn.net/Articles/919848/
- [RFC PATCH v1] mm, bpf: Add BPF into /proc/meminfo
  https://lwn.net/Articles/917647/
- [RFC PATCH] bpf, mm: Add a new item bpf into memory.stat
  https://lore.kernel.org/bpf/20220921170002.29557-1-laoar.shao@gmail.com/
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07 09:33:56 -08:00
Yafang Shao
6b4a6ea2c6 bpf: enforce all maps having memory usage callback
We have implemented the memory usage callback for all maps, and we enforce
that any newly added map has a callback as well. We check this callback at
map creation time; if it doesn't have the callback, we will return
EINVAL.
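
The creation-time check amounts to something like this (a sketch; where
exactly it lives in map_create() is an assumption):

  /* every map type must now report its memory usage */
  if (!ops->map_mem_usage)
          return -EINVAL;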

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-19-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07 09:33:43 -08:00
Yafang Shao
9629363cd0 bpf: offload map memory usage
A new helper is introduced to calculate offload map memory usage. But
currently the memory dynamically allocated in netdev dev_ops, like
nsim_map_update_elem, is not counted. Let's just put it aside now.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-18-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07 09:33:43 -08:00
Yafang Shao
b4fd0d672b bpf, net: xskmap memory usage
A new helper is introduced to calculate xskmap memory usage.

The xskmap memory usage can change dynamically when we add or remove an
xsk_map_node. Hence we need to track the count of xsk_map_node to get its
memory usage.

The result as follows,
- before
10: xskmap  name count_map  flags 0x0
        key 4B  value 4B  max_entries 65536  memlock 524288B

- after
10: xskmap  name count_map  flags 0x0 <<< no elements case
        key 4B  value 4B  max_entries 65536  memlock 524608B

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-17-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07 09:33:43 -08:00
Yafang Shao
73d2c61919 bpf, net: sock_map memory usage
sockmap and sockhash don't have anything in common in their allocation, so
let's introduce separate helpers to calculate their memory usage.

The result is as follows,

- before
28: sockmap  name count_map  flags 0x0
        key 4B  value 4B  max_entries 65536  memlock 524288B
29: sockhash  name count_map  flags 0x0
        key 4B  value 4B  max_entries 65536  memlock 524288B

- after
28: sockmap  name count_map  flags 0x0
        key 4B  value 4B  max_entries 65536  memlock 524608B
29: sockhash  name count_map  flags 0x0  <<<< no updated elements
        key 4B  value 4B  max_entries 65536  memlock 1048896B

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-16-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07 09:33:43 -08:00
Yafang Shao
7490b7f1c0 bpf, net: bpf_local_storage memory usage
A new helper is introduced into the bpf_local_storage map to calculate the
memory usage. This helper is also used by other maps like bpf_cgrp_storage,
bpf_inode_storage, bpf_task_storage, etc.

Note that currently the dynamically allocated storage elements are not
counted in the usage, since that would take extra runtime overhead in the
element update or delete path. So let's put it aside for now and implement
it in the future when someone really needs it.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-15-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07 09:33:43 -08:00
Yafang Shao
2f536977d6 bpf: local_storage memory usage
A new helper is introduced to calculate local_storage map memory usage.
Currently the dynamically allocated elements are not counted, since that
would add runtime overhead in the element update or delete path. So let's
put it aside for now and implement it in the future if the user really
needs it.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-14-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07 09:33:43 -08:00
Yafang Shao
f062226d8d bpf: bpf_struct_ops memory usage
A new helper is introduced to calculate bpf_struct_ops memory usage.

The result as follows,

- before
1: struct_ops  name count_map  flags 0x0
        key 4B  value 256B  max_entries 1  memlock 4096B
        btf_id 73

- after
1: struct_ops  name count_map  flags 0x0
        key 4B  value 256B  max_entries 1  memlock 5016B
        btf_id 73

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-13-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07 09:33:43 -08:00
Yafang Shao
c6e66b42a3 bpf: queue_stack_maps memory usage
A new helper is introduced to calculate queue_stack_maps memory usage.

The result as follows,

- before
20: queue  name count_map  flags 0x0
        key 0B  value 4B  max_entries 65536  memlock 266240B
21: stack  name count_map  flags 0x0
        key 0B  value 4B  max_entries 65536  memlock 266240B

- after
20: queue  name count_map  flags 0x0
        key 0B  value 4B  max_entries 65536  memlock 524288B
21: stack  name count_map  flags 0x0
        key 0B  value 4B  max_entries 65536  memlock 524288B

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-12-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07 09:33:42 -08:00
Yafang Shao
fa5e83df17 bpf: devmap memory usage
A new helper is introduced to calculate the memory usage of devmap and
devmap_hash. The number of dynamically allocated elements is already
recorded for devmap_hash, but not for devmap. To track the memory size of
dynamically allocated elements, this patch also counts them for devmap.

The result as follows,
- before
40: devmap  name count_map  flags 0x80
        key 4B  value 4B  max_entries 65536  memlock 524288B
41: devmap_hash  name count_map  flags 0x80
        key 4B  value 4B  max_entries 65536  memlock 524288B

- after
40: devmap  name count_map  flags 0x80  <<<< no elements
        key 4B  value 4B  max_entries 65536  memlock 524608B
41: devmap_hash  name count_map  flags 0x80 <<<< no elements
        key 4B  value 4B  max_entries 65536  memlock 524608B

Note that the number of buckets is the same as max_entries for devmap_hash
in this case.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-11-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07 09:33:42 -08:00
Yafang Shao
835f1fca95 bpf: cpumap memory usage
A new helper is introduced to calculate cpumap memory usage. The size of
cpu_entries can change dynamically when we update or delete a cpumap
element, but this patch doesn't include the memory size of cpu_entry yet.
We could calculate the memory usage dynamically when we alloc or free a
cpu_entry, but that would take extra runtime overhead, so let's just put
it aside for now. Note that the size of different cpu_entry objects may
differ as well.

The result as follows,
- before
48: cpumap  name count_map  flags 0x4
        key 4B  value 4B  max_entries 64  memlock 4096B

- after
48: cpumap  name count_map  flags 0x4
        key 4B  value 4B  max_entries 64  memlock 832B

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-10-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07 09:33:42 -08:00
Yafang Shao
71a49abe73 bpf: bloom_filter memory usage
Introduce a new helper to calculate the bloom_filter memory usage.

The result as follows,
- before
16: bloom_filter  flags 0x0
        key 0B  value 8B  max_entries 65536  memlock 524288B

- after
16: bloom_filter  flags 0x0
        key 0B  value 8B  max_entries 65536  memlock 65856B

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-9-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07 09:33:42 -08:00
Yafang Shao
2f7e4ab2ca bpf: ringbuf memory usage
A new helper ringbuf_map_mem_usage() is introduced to calculate ringbuf
memory usage.

The result as follows,
- before
15: ringbuf  name count_map  flags 0x0
        key 0B  value 0B  max_entries 65536  memlock 0B

- after
15: ringbuf  name count_map  flags 0x0
        key 0B  value 0B  max_entries 65536  memlock 78424B

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230305124615.12358-8-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07 09:33:42 -08:00
Yafang Shao
2e89caf055 bpf: reuseport_array memory usage
A new helper is introduced to calculate reuseport_array memory usage.

The result as follows,
- before
14: reuseport_sockarray  name count_map  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 1048576B

- after
14: reuseport_sockarray  name count_map  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 524544B

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-7-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07 09:33:42 -08:00
Yafang Shao
cbb9b6068c bpf: stackmap memory usage
A new helper is introduced to get stackmap memory usage. Some small
memory allocations are ignored as their memory size is quite small
compared to the total usage.

The result as follows,
- before
16: stack_trace  name count_map  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 1048576B

- after
16: stack_trace  name count_map  flags 0x0
        key 4B  value 8B  max_entries 65536  memlock 2097472B

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-6-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07 09:33:42 -08:00
Yafang Shao
1746d0555a bpf: arraymap memory usage
Introduce array_map_mem_usage() to calculate arraymap memory usage. In
this helper, some small memory allocations are ignored, like the
allocation of struct bpf_array_aux in prog_array. The inner_map_meta in
array_of_map is also ignored.

The result as follows,

- before
11: array  name count_map  flags 0x0
        key 4B  value 4B  max_entries 65536  memlock 524288B
12: percpu_array  name count_map  flags 0x0
        key 4B  value 4B  max_entries 65536  memlock 8912896B
13: perf_event_array  name count_map  flags 0x0
        key 4B  value 4B  max_entries 65536  memlock 524288B
14: prog_array  name count_map  flags 0x0
        key 4B  value 4B  max_entries 65536  memlock 524288B
15: cgroup_array  name count_map  flags 0x0
        key 4B  value 4B  max_entries 65536  memlock 524288B

- after
11: array  name count_map  flags 0x0
        key 4B  value 4B  max_entries 65536  memlock 524608B
12: percpu_array  name count_map  flags 0x0
        key 4B  value 4B  max_entries 65536  memlock 17301824B
13: perf_event_array  name count_map  flags 0x0
        key 4B  value 4B  max_entries 65536  memlock 524608B
14: prog_array  name count_map  flags 0x0
        key 4B  value 4B  max_entries 65536  memlock 524608B
15: cgroup_array  name count_map  flags 0x0
        key 4B  value 4B  max_entries 65536  memlock 524608B

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-5-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07 09:33:42 -08:00
Yafang Shao
304849a27b bpf: hashtab memory usage
htab_map_mem_usage() is introduced to calculate hashmap memory usage. In
this helper, some small memory allocations are ignored, as their size is
quite small compared with the total size. The inner_map_meta in
hash_of_map is also ignored.

The result for hashtab as follows,

- before this change
1: hash  name count_map  flags 0x1  <<<< no prealloc, fully set
        key 16B  value 24B  max_entries 1048576  memlock 41943040B
2: hash  name count_map  flags 0x1  <<<< no prealloc, none set
        key 16B  value 24B  max_entries 1048576  memlock 41943040B
3: hash  name count_map  flags 0x0  <<<< prealloc
        key 16B  value 24B  max_entries 1048576  memlock 41943040B

The memlock is always a fixed size, whether the map is preallocated or
not, and whatever the count of allocated elements is.

- after this change
1: hash  name count_map  flags 0x1    <<<< non prealloc, fully set
        key 16B  value 24B  max_entries 1048576  memlock 117441536B
2: hash  name count_map  flags 0x1    <<<< non prealloc, non set
        key 16B  value 24B  max_entries 1048576  memlock 16778240B
3: hash  name count_map  flags 0x0    <<<< prealloc
        key 16B  value 24B  max_entries 1048576  memlock 109056000B

The memlock now reflects what the hashtab has actually allocated.

The result for percpu hash map as follows,
- before this change
4: percpu_hash  name count_map  flags 0x0       <<<< prealloc
        key 16B  value 24B  max_entries 1048576  memlock 822083584B
5: percpu_hash  name count_map  flags 0x1       <<<< no prealloc
        key 16B  value 24B  max_entries 1048576  memlock 822083584B

- after this change
4: percpu_hash  name count_map  flags 0x0
        key 16B  value 24B  max_entries 1048576  memlock 897582080B
5: percpu_hash  name count_map  flags 0x1
        key 16B  value 24B  max_entries 1048576  memlock 922748736B

At worst, the difference can be 10x, for example,
- before this change
6: hash  name count_map  flags 0x0
        key 4B  value 4B  max_entries 1048576  memlock 8388608B

- after this change
6: hash  name count_map  flags 0x0
        key 4B  value 4B  max_entries 1048576  memlock 83889408B

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230305124615.12358-4-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07 09:33:42 -08:00
Yafang Shao
41d5941e7f bpf: lpm_trie memory usage
trie_mem_usage() is introduced to calculate the lpm_trie memory usage.
Some small memory allocations are ignored. The inner node is also
ignored.

The result as follows,

- before
10: lpm_trie  flags 0x1
        key 8B  value 8B  max_entries 65536  memlock 1048576B

- after
10: lpm_trie  flags 0x1
        key 8B  value 8B  max_entries 65536  memlock 2291536B

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-3-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07 09:33:42 -08:00
Yafang Shao
90a5527d76 bpf: add new map ops ->map_mem_usage
Add a new map ops callback, ->map_mem_usage, to print the memory usage of
a bpf map.

This is a preparation for the follow-up changes.
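
A hedged sketch of the hook being added (simplified; the exact member
placement and the wrapper used by the fdinfo/memlock reporting are in the
kernel patch, so treat the details below as assumptions):

  struct bpf_map_ops {
          /* ... existing callbacks ... */

          /* report the approximate memory usage of the map, in bytes */
          u64 (*map_mem_usage)(const struct bpf_map *map);
  };

  static unsigned long bpf_map_memory_usage(const struct bpf_map *map)
  {
          return map->ops->map_mem_usage(map);
  }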

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230305124615.12358-2-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07 09:33:41 -08:00
Nathan Chancellor
2d5bcdcda8 bpf: Increase size of BTF_ID_LIST without CONFIG_DEBUG_INFO_BTF again
After commit 66e3a13e7c ("bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr"),
clang builds without CONFIG_DEBUG_INFO_BTF warn:

  kernel/bpf/verifier.c:10298:24: warning: array index 16 is past the end of the array (that has type 'u32[16]' (aka 'unsigned int[16]')) [-Warray-bounds]
                                     meta.func_id == special_kfunc_list[KF_bpf_dynptr_slice_rdwr]) {
                                                     ^                  ~~~~~~~~~~~~~~~~~~~~~~~~
  kernel/bpf/verifier.c:9150:1: note: array 'special_kfunc_list' declared here
  BTF_ID_LIST(special_kfunc_list)
  ^
  include/linux/btf_ids.h:207:27: note: expanded from macro 'BTF_ID_LIST'
  #define BTF_ID_LIST(name) static u32 __maybe_unused name[16];
                            ^
  1 warning generated.

A warning of this nature was previously addressed by
commit beb3d47d1d ("bpf: Fix a BTF_ID_LIST bug with CONFIG_DEBUG_INFO_BTF not set")
but there have been new kfuncs added since then.

Quadruple the size of the CONFIG_DEBUG_INFO_BTF=n definition so that
this problem is unlikely to show up for some time.
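
The change boils down to enlarging the stub array in the
CONFIG_DEBUG_INFO_BTF=n definition quoted above; per the "quadruple"
wording, that means:

  #define BTF_ID_LIST(name) static u32 __maybe_unused name[64];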

Link: https://github.com/ClangBuiltLinux/linux/issues/1810
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230307-bpf-kfuncs-warray-bounds-v1-1-00ad3191f3a6@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07 07:49:28 -08:00
Jakub Kicinski
36e5e391a2 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZAZsBwAKCRDbK58LschI
 g3W1AQCQnO6pqqX5Q2aYDAZPlZRtV2TRLjuqrQE0dHW/XLAbBgD/bgsAmiKhPSCG
 2mTt6izpTQVlZB0e8KcDIvbYd9CE3Qc=
 =EjJQ
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-03-06

We've added 85 non-merge commits during the last 13 day(s) which contain
a total of 131 files changed, 7102 insertions(+), 1792 deletions(-).

The main changes are:

1) Add skb and XDP typed dynptrs which allow BPF programs more
   ergonomic and less brittle iteration through data and variable-sized
   accesses, from Joanne Koong.

2) Bigger batch of BPF verifier improvements to prepare for upcoming BPF
   open-coded iterators allowing for less restrictive looping capabilities,
   from Andrii Nakryiko.

3) Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF
   programs to NULL-check before passing such pointers into kfunc,
   from Alexei Starovoitov.

4) Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in
   local storage maps, from Kumar Kartikeya Dwivedi.

5) Add BPF verifier support for ST instructions in convert_ctx_access()
   which will help new -mcpu=v4 clang flag to start emitting them,
   from Eduard Zingerman.

6) Make uprobe attachment Android APK aware by supporting attachment
   to functions inside ELF objects contained in APKs via function names,
   from Daniel Müller.

7) Add a new flag BPF_F_TIMER_ABS flag for bpf_timer_start() helper
   to start the timer with absolute expiration value instead of relative
   one, from Tero Kristo.

8) Add a new kfunc bpf_cgroup_from_id() to look up cgroups via id,
   from Tejun Heo.

9) Extend libbpf to support users manually attaching kprobes/uprobes
   in the legacy/perf/link mode, from Menglong Dong.

10) Implement workarounds in the mips BPF JIT for DADDI/R4000,
   from Jiaxun Yang.

11) Enable mixing bpf2bpf and tailcalls for the loongarch BPF JIT,
    from Hengqi Chen.

12) Extend BPF instruction set doc with describing the encoding of BPF
    instructions in terms of how bytes are stored under big/little endian,
    from Jose E. Marchesi.

13) Follow-up to enable kfunc support for riscv BPF JIT, from Pu Lehui.

14) Fix bpf_xdp_query() backwards compatibility on old kernels,
    from Yonghong Song.

15) Fix BPF selftest cross compilation with CLANG_CROSS_FLAGS,
    from Florent Revest.

16) Improve bpf_cpumask_ma to only allocate one bpf_mem_cache,
    from Hou Tao.

17) Fix BPF verifier's check_subprogs to not unnecessarily mark
    a subprogram with has_tail_call, from Ilya Leoshkevich.

18) Fix arm syscall regs spec in libbpf's bpf_tracing.h, from Puranjay Mohan.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
  selftests/bpf: Add test for legacy/perf kprobe/uprobe attach mode
  selftests/bpf: Split test_attach_probe into multi subtests
  libbpf: Add support to set kprobe/uprobe attach mode
  tools/resolve_btfids: Add /libsubcmd to .gitignore
  bpf: add support for fixed-size memory pointer returns for kfuncs
  bpf: generalize dynptr_get_spi to be usable for iters
  bpf: mark PTR_TO_MEM as non-null register type
  bpf: move kfunc_call_arg_meta higher in the file
  bpf: ensure that r0 is marked scratched after any function call
  bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper
  bpf: clean up visit_insn()'s instruction processing
  selftests/bpf: adjust log_fixup's buffer size for proper truncation
  bpf: honor env->test_state_freq flag in is_state_visited()
  selftests/bpf: enhance align selftest's expected log matching
  bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER}
  bpf: improve stack slot state printing
  selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access()
  selftests/bpf: test if pointer type is tracked for BPF_ST_MEM
  bpf: allow ctx writes using BPF_ST_MEM instruction
  bpf: Use separate RCU callbacks for freeing selem
  ...
====================

Link: https://lore.kernel.org/r/20230307004346.27578-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-06 20:36:39 -08:00
Andrii Nakryiko
8f4c92f002 Merge branch 'libbpf: allow users to set kprobe/uprobe attach mode'
Menglong Dong says:

====================

From: Menglong Dong <imagedong@tencent.com>

By default, libbpf will attach the kprobe/uprobe BPF program in the
latest mode supported by the kernel. In this series, we add support
to let users manually attach kprobe/uprobe in legacy/perf/link mode in
the 1st patch.

And in the 2nd patch, we split the 'attach_probe' test into multiple
subtests, as Andrii suggested.

In the 3rd patch, we add tests for loading kprobe/uprobe in the
different modes.

Changes since v3:
- rename eBPF to BPF in the doc
- use OPTS_GET() to get the value of 'force_ioctl_attach'
- error out on attach mode is not supported
- use test_attach_probe_manual__open_and_load() directly

Changes since v2:
- fix the typo in the 2th patch

Changes since v1:
- some small changes in the 1th patch, as Andrii suggested
- split 'attach_probe' into multi subtests
====================

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-03-06 09:38:08 -08:00
Menglong Dong
c7aec81b31 selftests/bpf: Add test for legacy/perf kprobe/uprobe attach mode
Add tests for kprobe/uprobe attaching in default, legacy, perf and
link mode. The tests pass:

./test_progs -t attach_probe
$5/1     attach_probe/manual-default:OK
$5/2     attach_probe/manual-legacy:OK
$5/3     attach_probe/manual-perf:OK
$5/4     attach_probe/manual-link:OK
$5/5     attach_probe/auto:OK
$5/6     attach_probe/kprobe-sleepable:OK
$5/7     attach_probe/uprobe-lib:OK
$5/8     attach_probe/uprobe-sleepable:OK
$5/9     attach_probe/uprobe-ref_ctr:OK
$5       attach_probe:OK
Summary: 1/9 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Biao Jiang <benbjiang@tencent.com>
Link: https://lore.kernel.org/bpf/20230306064833.7932-4-imagedong@tencent.com
2023-03-06 09:38:08 -08:00
Menglong Dong
7391ec6391 selftests/bpf: Split test_attach_probe into multi subtests
In order to adapt to older kernels, we now split the "attach_probe"
test into multiple subtests:

  manual // manual attach tests for kprobe/uprobe
  auto // auto-attach tests for kprobe and uprobe
  kprobe-sleepable // kprobe sleepable test
  uprobe-lib // uprobe tests for library function by name
  uprobe-sleepable // uprobe sleepable test
  uprobe-ref_ctr // uprobe ref_ctr test

As a sleepable kprobe needs to set the BPF_F_SLEEPABLE flag before
loading, we need to move it to a standalone skel file, in case it is not
supported by the kernel and makes the whole loading fail.

Therefore, we can only enable part of the subtests on older kernels.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Biao Jiang <benbjiang@tencent.com>
Link: https://lore.kernel.org/bpf/20230306064833.7932-3-imagedong@tencent.com
2023-03-06 09:38:08 -08:00
Menglong Dong
f8b299bc6a libbpf: Add support to set kprobe/uprobe attach mode
By default, libbpf will attach the kprobe/uprobe BPF program in the
latest mode supported by the kernel. In this patch, we add support to
let users manually attach kprobe/uprobe in legacy or perf mode.

There are 3 modes supported by the kernel to attach kprobe/uprobe:

  LEGACY: create perf event in legacy way and don't use bpf_link
  PERF: create perf event with perf_event_open() and don't use bpf_link
  LINK: create perf event with perf_event_open() and use bpf_link

Users now can manually choose the mode with
bpf_program__attach_uprobe_opts()/bpf_program__attach_kprobe_opts().

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Biao Jiang <benbjiang@tencent.com>
Link: https://lore.kernel.org/bpf/20230113093427.1666466-1-imagedong@tencent.com/
Link: https://lore.kernel.org/bpf/20230306064833.7932-2-imagedong@tencent.com
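
A hedged usage sketch of the opts-based attach named above (the option field
and enum value names are assumptions about the final libbpf API, not taken
from this log):

  LIBBPF_OPTS(bpf_kprobe_opts, opts,
          .attach_mode = PROBE_ATTACH_MODE_LEGACY,   /* assumed name; or _PERF / _LINK */
  );
  struct bpf_link *link;

  link = bpf_program__attach_kprobe_opts(prog, "do_sys_openat2", &opts);
  if (!link)
          /* e.g. the requested attach mode is not supported by this kernel */
          return -errno;
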
2023-03-06 09:38:04 -08:00
Rong Tao
fd4cb29f2a tools/resolve_btfids: Add /libsubcmd to .gitignore
Add libsubcmd to .gitignore, otherwise after compiling the kernel it
would result in the following:

    # bpf-next...bpf-next/master
    ?? tools/bpf/resolve_btfids/libsubcmd/

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/tencent_F13D670D5D7AA9C4BD868D3220921AAC090A@qq.com
2023-03-06 16:23:01 +01:00
Andrii Nakryiko
f4b4eee616 bpf: add support for fixed-size memory pointer returns for kfuncs
Support direct fixed-size (and, for now, read-only) memory access when a
kfunc's return type is a pointer to a non-struct type. Calculate the type
size and let the BPF program access that many bytes directly. This is
crucial for the numbers iterator.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230302235015.2044271-13-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04 11:14:32 -08:00
Andrii Nakryiko
a461f5adf1 bpf: generalize dynptr_get_spi to be usable for iters
Generalize the logic of fetching special stack slot object state using
spi (stack slot index). This will be used by STACK_ITER logic next.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230302235015.2044271-12-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04 11:14:32 -08:00
Andrii Nakryiko
d5271c5b19 bpf: mark PTR_TO_MEM as non-null register type
PTR_TO_MEM register without PTR_MAYBE_NULL is indeed non-null. This is
important for BPF verifier to be able to prune guaranteed not to be
taken branches. This is always the case with open-coded iterators.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230302235015.2044271-11-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04 11:14:32 -08:00
Andrii Nakryiko
d0e1ac2279 bpf: move kfunc_call_arg_meta higher in the file
Move struct bpf_kfunc_call_arg_meta higher in the file and put it next
to struct bpf_call_arg_meta, so it can be used from more functions.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230302235015.2044271-10-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04 11:14:32 -08:00
Andrii Nakryiko
553a64a85c bpf: ensure that r0 is marked scratched after any function call
r0 is important (unless called function is void-returning, but that's
taken care of by print_verifier_state() anyways) in verifier logs.
Currently for helpers we seem to print it in verifier log, but for
kfuncs we don't.

Instead of figuring out where in the maze of code we accidentally set r0
as scratched for helpers and why we don't do that for kfuncs, just
enforce that after any function call r0 is marked as scratched.

Also, perhaps, we should reconsider "scratched" terminology, as it's
mightily confusing. "Touched" would seem more appropriate. But I left
that for follow ups for now.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230302235015.2044271-9-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04 11:14:32 -08:00
Andrii Nakryiko
c1ee85a980 bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper
It's not correct to assume that any BPF_CALL instruction is a helper
call. Fix visit_insn()'s detection of the bpf_timer_set_callback() helper
by also checking insn->src_reg == 0. For kfuncs, insn->src_reg would be
set to BPF_PSEUDO_KFUNC_CALL, and for subprog calls it will be
BPF_PSEUDO_CALL.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230302235015.2044271-8-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04 11:14:32 -08:00
Andrii Nakryiko
653ae3a874 bpf: clean up visit_insn()'s instruction processing
Instead of referencing processed instruction repeatedly as insns[t]
throughout entire visit_insn() function, take a local insn pointer and
work with it in a cleaner way.

It makes enhancing this function further a bit easier as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230302235015.2044271-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04 11:14:32 -08:00
Andrii Nakryiko
fffc893b6b selftests/bpf: adjust log_fixup's buffer size for proper truncation
Adjust log_fixup's expected buffer length to fix the test. It's pretty
finicky in its length expectation, but it doesn't break often. So just
adjust the length to work on current kernel and with follow up iterator
changes as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230302235015.2044271-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04 11:14:32 -08:00
Andrii Nakryiko
98ddcf389d bpf: honor env->test_state_freq flag in is_state_visited()
env->test_state_freq flag can be set by user by passing
BPF_F_TEST_STATE_FREQ program flag. This is used in a bunch of selftests
to have predictable state checkpoints at every jump and so on.

Currently, the bounded loop handling heuristic ignores this flag if the
number of processed jumps and/or processed instructions is below some
thresholds, which throws off that reliable state checkpointing.

Honor this flag in all circumstances by disabling heuristic if
env->test_state_freq is set.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230302235015.2044271-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04 11:14:31 -08:00
Andrii Nakryiko
6f876e75d3 selftests/bpf: enhance align selftest's expected log matching
Allow to search for expected register state in all the verifier log
output that's related to specified instruction number.

See added comment for an example of possible situation that is happening
due to a simple enhancement done in the next patch, which fixes handling
of env->test_state_freq flag in state checkpointing logic.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230302235015.2044271-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04 11:14:31 -08:00
Andrii Nakryiko
567da5d253 bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER}
Teach regsafe() logic to handle PTR_TO_MEM, PTR_TO_BUF, and
PTR_TO_TP_BUFFER similarly to PTR_TO_MAP_{KEY,VALUE}. That is, instead of
exact match for var_off and range, use tnum_in() and range_within()
checks, allowing more general verified state to subsume more specific
current state. This allows to match wider range of valid and safe
states, speeding up verification and detecting wider range of equivalent
states for upcoming open-coded iteration looping logic.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230302235015.2044271-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04 11:14:31 -08:00
Andrii Nakryiko
d54e0f6c1a bpf: improve stack slot state printing
Improve stack slot state printing to provide more useful and relevant
information, especially for dynptrs. While previously we'd see something
like:

  8: (85) call bpf_ringbuf_reserve_dynptr#198   ; R0_w=scalar() fp-8_w=dddddddd fp-16_w=dddddddd refs=2

Now we'll see way more useful:

  8: (85) call bpf_ringbuf_reserve_dynptr#198   ; R0_w=scalar() fp-16_w=dynptr_ringbuf(ref_id=2) refs=2

I experimented with printing the range of slots taken by dynptr,
something like:

  fp-16..8_w=dynptr_ringbuf(ref_id=2)

But it felt very awkward and pretty useless. So we print the lowest
address (most negative offset) only.

The general structure of this code is now also set up for easier
extension and will accommodate ITER slots naturally.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230302235015.2044271-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04 11:14:31 -08:00