linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-18 09:44:18 +08:00

History

Martin KaFai Lau 695ba2651a bpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4 After doing map_perf_test with a much bigger BPF_F_NO_COMMON_LRU map, the perf report shows a lot of time spent in rotating the inactive list (i.e. __bpf_lru_list_rotate_inactive): > map_perf_test 32 8 10000 1000000 \| awk '{sum += $3}END{print sum}' 19644783 (19M/s) > map_perf_test 32 8 10000000 10000000 \| awk '{sum += $3}END{print sum}' 6283930 (6.28M/s) By inactive, it usually means the element is not in cache. Hence, there is a need to tune the PERCPU_NR_SCANS value. This patch finds a better number of elements to scan during each list rotation. The PERCPU_NR_SCANS (which is defined the same as PERCPU_FREE_TARGET) decreases from 16 elements to 4 elements. This change only affects the BPF_F_NO_COMMON_LRU map. The test_lru_dist does not show meaningful difference between 16 and 4. Our production L4 load balancer which uses the LRU map for conntrack-ing also shows little change in cache hit rate. Since both benchmark and production data show no cache-hit difference, PERCPU_NR_SCANS is lowered from 16 to 4. We can consider making it configurable if we find a usecase later that shows another value works better and/or use a different rotation strategy. After this change: > map_perf_test 32 8 10000000 10000000 \| awk '{sum += $3}END{print sum}' 9240324 (9.2M/s) i.e. 6.28M/s -> 9.2M/s The test_lru_dist has not shown meaningful difference: > test_lru_dist zipf.100k.a1_01.out 4000 1: nr_misses: 31575 (Before) vs 31566 (After) > test_lru_dist zipf.100k.a0_01.out 40000 1 nr_misses: 67036 (Before) vs 67031 (After) Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>		2017-04-17 13:55:52 -04:00
..
fault-injection	fault-injection: fix failcmd.sh warning	2012-07-31 18:42:38 -07:00
ktest	Greg Kroah-Hartman reported to me that the ktest of v4.10 locked up in an	2017-03-08 11:06:05 -08:00
nvdimm	tools/testing/nvdimm: make iset cookie predictable	2017-03-01 00:09:51 -08:00
radix-tree	radix tree test suite: Specify -m32 in LDFLAGS too	2017-03-07 13:18:24 -05:00
selftests	bpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4	2017-04-17 13:55:52 -04:00