linux/drivers/base
Scott Cheloha 4fb6eabf10 drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
Searching for a particular memory block by id is an O(n) operation because
each memory block's underlying device is kept in an unsorted linked list
on the subsystem bus.

We can cut the lookup cost to O(log n) if we cache each memory block
in an xarray.  This time complexity improvement is significant on
systems with many memory blocks.  For example:

1. A 128GB POWER9 VM with 256MB memblocks has 512 blocks.  With this
   change memory_dev_init() completes ~12ms faster and walk_memory_blocks()
   completes ~12ms faster.

Before:
[    0.005042] memory_dev_init: adding memory blocks
[    0.021591] memory_dev_init: added memory blocks
[    0.022699] walk_memory_blocks: walking memory blocks
[    0.038730] walk_memory_blocks: walked memory blocks 0-511

After:
[    0.005057] memory_dev_init: adding memory blocks
[    0.009415] memory_dev_init: added memory blocks
[    0.010519] walk_memory_blocks: walking memory blocks
[    0.014135] walk_memory_blocks: walked memory blocks 0-511

2. A 256GB POWER9 LPAR with 256MB memblocks has 1024 blocks.  With
   this change memory_dev_init() completes ~88ms faster and
   walk_memory_blocks() completes ~87ms faster.

Before:
[    0.252246] memory_dev_init: adding memory blocks
[    0.395469] memory_dev_init: added memory blocks
[    0.409413] walk_memory_blocks: walking memory blocks
[    0.433028] walk_memory_blocks: walked memory blocks 0-511
[    0.433094] walk_memory_blocks: walking memory blocks
[    0.500244] walk_memory_blocks: walked memory blocks 131072-131583

After:
[    0.245063] memory_dev_init: adding memory blocks
[    0.299539] memory_dev_init: added memory blocks
[    0.313609] walk_memory_blocks: walking memory blocks
[    0.315287] walk_memory_blocks: walked memory blocks 0-511
[    0.315349] walk_memory_blocks: walking memory blocks
[    0.316988] walk_memory_blocks: walked memory blocks 131072-131583

3. A 32TB POWER9 LPAR with 256MB memblocks has 131072 blocks.  With
   this change we complete memory_dev_init() ~37 minutes faster and
   walk_memory_blocks() at least ~30 minutes faster.  The exact timing
   for walk_memory_blocks() is missing, though I observed that the
   soft lockups in walk_memory_blocks() disappeared with the change,
   suggesting that lower bound.

Before:
[   13.703907] memory_dev_init: adding blocks
[ 2287.406099] memory_dev_init: added all blocks
[ 2347.494986] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
[ 2527.625378] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
[ 2707.761977] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
[ 2887.899975] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
[ 3068.028318] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
[ 3248.158764] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
[ 3428.287296] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
[ 3608.425357] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
[ 3788.554572] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
[ 3968.695071] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
[ 4148.823970] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160

After:
[   13.696898] memory_dev_init: adding blocks
[   15.660035] memory_dev_init: added all blocks
(the walk_memory_blocks traces disappear)

There should be no significant negative impact for machines with few
memory blocks.  A sparse xarray has a small footprint, and an O(log n)
lookup is slower than an O(n) lookup only for the smallest numbers of
memory blocks, and even then only negligibly.

1. A 16GB x86 machine with 128MB memblocks has 132 blocks.  With this
   change memory_dev_init() completes ~300us faster and walk_memory_blocks()
   completes no faster or slower.  The improvement is pretty close to noise.

Before:
[    0.224752] memory_dev_init: adding memory blocks
[    0.227116] memory_dev_init: added memory blocks
[    0.227183] walk_memory_blocks: walking memory blocks
[    0.227183] walk_memory_blocks: walked memory blocks 0-131

After:
[    0.224911] memory_dev_init: adding memory blocks
[    0.226935] memory_dev_init: added memory blocks
[    0.227089] walk_memory_blocks: walking memory blocks
[    0.227089] walk_memory_blocks: walked memory blocks 0-131
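
The difference between the two lookup strategies can be illustrated with
a minimal user-space sketch.  This is not the kernel code and does not
use the kernel's xarray API: struct block, struct node, list_find(),
index_store() and index_load() are hypothetical names, and the
fixed-depth tree with 64 slots per node is an assumption chosen only to
show why an id-indexed tree needs a bounded number of steps per lookup
while the unsorted list needs up to n.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a memory block (not struct memory_block). */
struct block {
	unsigned long id;
	struct block *next;	/* link in an unsorted list */
};

/* O(n): walk the unsorted list until the id matches. */
static struct block *list_find(struct block *head, unsigned long id)
{
	for (struct block *b = head; b; b = b->next)
		if (b->id == id)
			return b;
	return NULL;
}

/* Toy fixed-depth radix tree: 6 bits of the id select a slot per level. */
#define BITS	6
#define SLOTS	(1UL << BITS)
#define LEVELS	4		/* covers ids below 2^24 */

struct node {
	void *slot[SLOTS];	/* child nodes, or blocks at the last level */
};

/* Store a block under its id, allocating interior nodes on demand. */
static int index_store(struct node *root, unsigned long id, struct block *b)
{
	struct node *n = root;

	if (id >> (BITS * LEVELS))
		return -1;	/* id does not fit in this toy tree */

	for (int level = LEVELS - 1; level > 0; level--) {
		unsigned long i = (id >> (level * BITS)) & (SLOTS - 1);

		if (!n->slot[i]) {
			n->slot[i] = calloc(1, sizeof(struct node));
			if (!n->slot[i])
				return -1;
		}
		n = n->slot[i];
	}
	n->slot[id & (SLOTS - 1)] = b;
	return 0;
}

/* At most LEVELS steps per lookup, however many blocks are stored. */
static struct block *index_load(const struct node *root, unsigned long id)
{
	const struct node *n = root;

	if (id >> (BITS * LEVELS))
		return NULL;

	for (int level = LEVELS - 1; level > 0; level--) {
		n = n->slot[(id >> (level * BITS)) & (SLOTS - 1)];
		if (!n)
			return NULL;	/* sparse: this subtree was never populated */
	}
	return n->slot[id & (SLOTS - 1)];
}
```

With sparse ids such as 0-511 and 131072-131583, only the interior nodes
along populated paths are allocated, which is the property the footprint
argument above relies on.  A caller would zero-allocate the root with
calloc(), call index_store() once per block, and then use index_load()
wherever it previously walked the list.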

[david@redhat.com: document the locking]
  Link: http://lkml.kernel.org/r/bc21eec6-7251-4c91-2f57-9a0671f8d414@redhat.com
Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Nathan Lynch <nathanl@linux.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rick Lindsley <ricklind@linux.vnet.ibm.com>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Link: http://lkml.kernel.org/r/20200121231028.13699-1-cheloha@linux.ibm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 20:09:49 -07:00
firmware_loader firmware_loader: revert removal of the fw_fallback_config export 2020-04-26 10:42:15 +02:00
power Merge branches 'pm-core' and 'pm-sleep' 2020-06-01 15:19:08 +02:00
regmap Merge remote-tracking branch 'regmap/for-5.8' into regmap-next 2020-05-29 14:03:32 +01:00
test Driver core changes for 5.6-rc1 2020-01-29 10:18:20 -08:00
arch_topology.c arm64 updates for 5.7: 2020-03-31 10:05:01 -07:00
attribute_container.c scsi: drivers: base: Support atomic version of attribute_container_device_trigger 2020-01-15 22:55:36 -05:00
base.h device.h: move devtmpfs prototypes out of the file 2019-12-16 10:10:18 +01:00
bus.c device.h: move 'struct bus' stuff out to device/bus.h 2019-12-16 10:11:12 +01:00
cacheinfo.c Driver Core and debugfs changes for 5.3-rc1 2019-07-12 12:24:03 -07:00
class.c device.h: move 'struct class' stuff out to device/class.h 2019-12-16 10:11:14 +01:00
component.c component: Silence bind error on -EPROBE_DEFER 2020-04-28 17:54:15 +02:00
container.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
core.c for-5.8/block-2020-06-01 2020-06-02 15:29:19 -07:00
cpu.c CPU (hotplug) updates: 2020-03-30 18:06:39 -07:00
dd.c driver core: Ensure wait_for_device_probe() waits until the deferred_probe_timeout fires 2020-04-28 17:57:13 +02:00
devcon.c Merge generic_lookup_helpers into usb-next 2019-09-03 17:11:07 +02:00
devcoredump.c devcoredump: fix typo in comment 2019-08-15 17:38:11 +02:00
devres.c drivers/base/devres: introduce devm_release_action() 2019-06-13 17:34:56 -10:00
devtmpfs.c Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-02-08 13:26:41 -08:00
driver.c device.h: move 'struct driver' stuff out to device/driver.h 2019-12-16 10:11:16 +01:00
firmware.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
hypervisor.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
init.c base: fix order of OF initialization 2018-07-07 17:54:29 +02:00
isa.c Merge 4.15-rc3 into driver-core-next 2017-12-11 08:50:05 +01:00
Kconfig kunit: building kunit as a module breaks allmodconfig 2020-01-10 14:36:37 -07:00
Makefile drivers: base: Introducing software nodes to the firmware node framework 2018-11-26 18:19:11 +01:00
map.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
memory.c drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup 2020-06-03 20:09:49 -07:00
module.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
node.c Merge branch 'akpm' (patches from Andrew) 2020-06-02 12:21:36 -07:00
pinctrl.c driver core: Remove redundant license text 2017-12-07 18:36:44 +01:00
platform-msi.c platform-msi: Free descriptors in platform_msi_domain_free() 2018-12-13 09:35:31 +00:00
platform.c A fair amount of stuff this time around, dominated by yet another massive 2020-06-01 15:45:27 -07:00
property.c device property: Export fwnode_get_name() 2020-03-16 07:47:58 +01:00
soc.c base: soc: Handle custom soc information sysfs entries 2019-10-10 14:35:32 +02:00
swnode.c software node: Allow register and unregister software node groups 2020-04-20 14:41:56 +03:00
syscore.c treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively 2019-04-09 14:19:06 +02:00
topology.c topology: Create core_cpus and die_cpus sysfs attributes 2019-05-23 10:08:34 +02:00
transport_class.c scsi: drivers: base: Propagate errors through the transport component 2020-01-15 22:55:37 -05:00