2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2025-01-04 11:43:54 +08:00
linux-next/include
Feng Tang 802f1d522d mm: page_counter: re-layout structure to reduce false sharing
When checking a memory cgroup related performance regression [1], from the
perf c2c profiling data, we found high false sharing for accessing 'usage'
and 'parent'.

On 64 bit system, the 'usage' and 'parent' are close to each other, and
easy to be in one cacheline (for cacheline size == 64+ B).  'usage' is
usally written, while 'parent' is usually read as the cgroup's
hierarchical counting nature.

So move the 'parent' to the end of the structure to make sure they
are in different cache lines.

Following are some performance data with the patch, against v5.11-rc1.  [
In the data, A means a platform with 2 sockets 48C/96T, B is a platform of
4 sockests 72C/144T, and if a %stddev will be shown bigger than 2%,
P100/P50 means number of test tasks equals to 100%/50% of nr_cpu]

will-it-scale/malloc1
---------------------
	   v5.11-rc1			v5.11-rc1+patch

A-P100	     15782 ±  2%      -0.1%      15765 ±  3%  will-it-scale.per_process_ops
A-P50	     21511            +8.9%      23432        will-it-scale.per_process_ops
B-P100	      9155            +2.2%       9357        will-it-scale.per_process_ops
B-P50	     10967            +7.1%      11751 ±  2%  will-it-scale.per_process_ops

will-it-scale/pagefault2
------------------------
	   v5.11-rc1			v5.11-rc1+patch

A-P100	     79028            +3.0%      81411        will-it-scale.per_process_ops
A-P50	    183960 ±  2%      +4.4%     192078 ±  2%  will-it-scale.per_process_ops
B-P100	     85966            +9.9%      94467 ±  3%  will-it-scale.per_process_ops
B-P50	    198195            +9.8%     217526        will-it-scale.per_process_ops

fio (4k/1M is block size)
-------------------------
	   v5.11-rc1			v5.11-rc1+patch

A-P50-r-4k     16881 ±  2%    +1.2%      17081 ±  2%  fio.read_bw_MBps
A-P50-w-4k      3931          +4.5%       4111 ±  2%  fio.write_bw_MBps
A-P50-r-1M     15178          -0.2%      15154        fio.read_bw_MBps
A-P50-w-1M      3924          +0.1%       3929        fio.write_bw_MBps

[1].https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/

Link: https://lkml.kernel.org/r/1611040814-33449-1-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24 13:38:29 -08:00
..
acpi IOMMU Updates for Linux v5.12 2021-02-22 10:31:29 -08:00
asm-generic Modules updates for v5.12 2021-02-23 10:15:33 -08:00
clocksource
crypto Keyrings miscellany 2021-02-23 16:09:23 -08:00
drm - WARN if plane src coords are too big (Ville) 2021-02-04 12:57:28 +10:00
dt-bindings Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2021-02-23 14:56:23 -08:00
keys
kunit
kvm
linux mm: page_counter: re-layout structure to reduce false sharing 2021-02-24 13:38:29 -08:00
math-emu
media Fixes around VM_FPNMAP and follow_pfn 2021-02-22 17:45:02 -08:00
memory
misc
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2021-02-17 13:19:24 -08:00
pcmcia
ras
rdma RDMA/ipoib: Remove racy Subnet Manager sendonly join checks 2021-02-16 14:42:58 -04:00
scsi
soc IOMMU Updates for Linux v5.12 2021-02-22 10:31:29 -08:00
sound ASoC: Updates for v5.12 2021-02-17 21:16:27 +01:00
target scsi: target: core: Change ASCQ for residual write 2021-01-26 23:12:18 -05:00
trace mm, tracing: record slab name for kmem_cache_free() 2021-02-24 13:38:26 -08:00
uapi Changes in gfs2: 2021-02-23 14:04:04 -08:00
vdso
video
xen x86: 2021-02-21 13:31:43 -08:00