mirror of
https://mirrors.bfsu.edu.cn/git/linux.git
synced 2025-01-16 10:54:09 +08:00
cc95a07fef
Using per-cpu storage for @x86_cpu_to_logical_apicid is not optimal. Broadcast IPI will need at least one cache line per cpu to access this field. __x2apic_send_IPI_mask() is using standard bitmask operators. By converting x86_cpu_to_logical_apicid to an array, we divide by 16x number of needed cache lines, because we find 16 values per cache line. CPU prefetcher can kick nicely. Also move @cluster_masks to READ_MOSTLY section to avoid false sharing. Tested on a dual socket host with 256 cpus, cost for a full broadcast is now 11 usec instead of 33 usec. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20211007143556.574911-1-eric.dumazet@gmail.com |
||
---|---|---|
.. | ||
apic_common.c | ||
apic_flat_64.c | ||
apic_noop.c | ||
apic_numachip.c | ||
apic.c | ||
bigsmp_32.c | ||
hw_nmi.c | ||
io_apic.c | ||
ipi.c | ||
local.h | ||
Makefile | ||
msi.c | ||
probe_32.c | ||
probe_64.c | ||
vector.c | ||
x2apic_cluster.c | ||
x2apic_phys.c | ||
x2apic_uv_x.c |