linux/arch/powerpc/kernel
Nishanth Aravamudan 2fabf084b6 powerpc: reorder per-cpu NUMA information's initialization
There is an issue currently where NUMA information is used on powerpc
(and possibly ia64) before it has been read from the device-tree, which
leads to large slab consumption with CONFIG_SLUB and memoryless nodes.

NUMA powerpc non-boot CPU's cpu_to_node/cpu_to_mem is only accurate
after start_secondary(), similar to ia64, which is invoked via
smp_init().

Commit 6ee0578b4d ("workqueue: mark init_workqueues() as
early_initcall()") made init_workqueues() be invoked via
do_pre_smp_initcalls(), which is obviously before the secondary
processors are online.

Additionally, the following commits changed init_workqueues() to use
cpu_to_node to determine the node to use for kthread_create_on_node:

bce903809a ("workqueue: add wq_numa_tbl_len and
wq_numa_possible_cpumask[]")
f3f90ad469 ("workqueue: determine NUMA node of workers accourding to
the allowed cpumask")

Therefore, when init_workqueues() runs, it sees all CPUs as being on
Node 0. On LPARs or KVM guests where Node 0 is memoryless, this leads to
a high number of slab deactivations
(http://www.spinics.net/lists/linux-mm/msg67489.html).

Fix this by initializing the powerpc-specific CPU<->node/local memory
node mapping as early as possible, which on powerpc is
do_init_bootmem(). Currently that function initializes the mapping for
the boot CPU, but we extend it to setup the mapping for all possible
CPUs. Then, in smp_prepare_cpus(), we can correspondingly set the
per-cpu values for all possible CPUs. That ensures that before the
early_initcalls run (and really as early as possible), the per-cpu NUMA
mapping is accurate.

While testing memoryless nodes on PowerKVM guests with a fix to the
workqueue logic to use cpu_to_mem() instead of cpu_to_node(), with a
guest topology of:

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
node 1 size: 16336 MB
node 1 free: 15329 MB
node distances:
node   0   1
  0:  10  40
  1:  40  10

the slab consumption decreases from

Slab:             932416 kB
SUnreclaim:       902336 kB

to

Slab:             395264 kB
SUnreclaim:       359424 kB

And we a corresponding increase in the slab efficiency from

slab                                   mem     objs    slabs
                                      used   active   active
------------------------------------------------------------
kmalloc-16384                       337 MB   11.28%  100.00%
task_struct                         288 MB    9.93%  100.00%

to

slab                                   mem     objs    slabs
                                      used   active   active
------------------------------------------------------------
kmalloc-16384                        37 MB  100.00%  100.00%
task_struct                          31 MB  100.00%  100.00%

Powerpc didn't support memoryless nodes until recently (64bb80d87f
"powerpc/numa: Enable CONFIG_HAVE_MEMORYLESS_NODES" and 8c27226119
"powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"). Those commits also
helped improve memory consumption with these kind of environments.

Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 15:14:05 +10:00
..
vdso32 powerpc/booke64: Use SPRG7 for VDSO 2014-03-19 19:57:14 -05:00
vdso64 powerpc/booke64: Use SPRG7 for VDSO 2014-03-19 19:57:14 -05:00
.gitignore
align.c KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handler 2014-05-30 14:26:25 +02:00
asm-offsets.c Here are the PPC and ARM changes for KVM, which I separated because 2014-08-07 11:35:30 -07:00
audit.c
btext.c powerpc/btext: Fix CONFIG_PPC_EARLY_DEBUG_BOOTX on ppc32 2013-08-27 16:01:23 +10:00
cacheinfo.c powerpc/pseries: Update dynamic cache nodes for suspend/resume operation 2014-03-07 15:54:49 +11:00
cacheinfo.h
compat_audit.c
cpu_setup_6xx.S
cpu_setup_44x.S
cpu_setup_fsl_booke.S powerpc: No need to use dot symbols when branching to a function 2014-04-23 10:05:16 +10:00
cpu_setup_pa6t.S
cpu_setup_power.S powerpc/powernv: Enable POWER8 doorbell IPIs 2014-06-11 17:05:12 +10:00
cpu_setup_ppc970.S
cputable.c powerpc: Remove CLASSIC_PPC 2014-07-28 14:11:22 +10:00
crash_dump.c powerpc/crashdump : Fix page frame number check in copy_oldmem_page 2014-02-28 18:06:25 +11:00
crash.c arch,powerpc: Convert smp_mb__*() 2014-04-18 14:20:41 +02:00
dbell.c powerpc: Add accounting for Doorbell interrupts 2013-04-18 15:59:55 +10:00
dma-iommu.c powerpc/iommu: Update the generic code to use dynamic iommu page sizes 2013-12-30 14:17:19 +11:00
dma-swiotlb.c
dma.c powerpc/powernv: Add iommu DMA bypass support for IODA2 2014-02-11 16:07:37 +11:00
eeh_cache.c powerpc/eeh: Replace pr_warning() with pr_warn() 2014-08-05 15:41:34 +10:00
eeh_dev.c powerpc/eeh: Replace pr_warning() with pr_warn() 2014-08-05 15:41:34 +10:00
eeh_driver.c powerpc/eeh: Replace pr_warning() with pr_warn() 2014-08-05 15:41:34 +10:00
eeh_event.c powerpc/powernv: Fix killed EEH event 2014-06-11 17:04:33 +10:00
eeh_pe.c powerpc/eeh: Aux PE data for error log 2014-08-05 15:41:43 +10:00
eeh_sysfs.c powerpc/eeh: Skip eeh sysfs when eeh is disabled 2014-06-05 13:20:38 +10:00
eeh.c powerpc/eeh: Export eeh_iommu_group_to_pe() 2014-08-07 13:00:02 +10:00
entry_32.S powerpc/32bit:Store temporary result in r0 instead of r8 2013-06-01 08:29:27 +10:00
entry_64.S powerpc/book3s: Add basic infrastructure to handle HMI in Linux. 2014-08-05 16:33:48 +10:00
epapr_hcalls.S powerpc: Add paravirt idle loop for 64-bit Book-E 2013-03-13 14:19:36 -05:00
epapr_paravirt.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2014-06-10 18:54:22 -07:00
exceptions-64e.S powerpc: Remove platforms/wsp and associated pieces 2014-06-11 16:35:38 +10:00
exceptions-64s.S powerpc: Fix "attempt to move .org backwards" error 2014-08-13 15:13:25 +10:00
fadump.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2014-06-10 18:54:22 -07:00
firmware.c
fpu.S powerpc: Don't corrupt transactional state when using FP/VMX in kernel 2014-01-15 13:59:11 +11:00
fsl_booke_entry_mapping.S powerpc: enable the relocatable support for the fsl booke 32bit kernel 2014-01-09 17:52:16 -06:00
ftrace.c powerpc/ftrace: Add call to ftrace_graph_is_dead() in function graph code 2014-07-18 13:56:56 -04:00
head_8xx.S powerpc/8xx: Fixing issue with CONFIG_PIN_TLB 2013-10-28 21:11:21 -05:00
head_32.S
head_40x.S powerpc: Remove check for CONFIG_SERIAL_TEXT_DEBUG 2014-06-11 16:31:21 +10:00
head_44x.S powerpc/ppc476: Disable BTAC 2014-08-13 15:13:42 +10:00
head_64.S Merge remote-tracking branch 'scott/next' into next 2014-08-05 14:13:41 +10:00
head_booke.h powerpc: Fix interrupt range check on debug exception 2013-05-02 10:31:01 +10:00
head_fsl_booke.S powerpc/fsl_booke: smp support for booting a relocatable kernel above 64M 2014-01-09 17:52:18 -06:00
hw_breakpoint.c powerpc: Fix smp_processor_id() in preemptible splat in set_breakpoint 2014-05-20 10:54:06 +10:00
ibmebus.c PPC: ibmebus: convert bus code to use bus_groups 2013-09-26 15:49:42 -07:00
idle_6xx.S
idle_book3e.S powerpc: No need to use dot symbols when branching to a function 2014-04-23 10:05:16 +10:00
idle_e500.S
idle_power4.S powerpc: No need to use dot symbols when branching to a function 2014-04-23 10:05:16 +10:00
idle_power7.S powerpc/book3s: Fix endianess issue for HMI handling on napping cpus. 2014-08-05 16:34:01 +10:00
idle.c powerpc/idle: Convert use of typedef ctl_table to struct ctl_table 2013-07-01 11:10:35 +10:00
io-workarounds.c powerpc: Better split CONFIG_PPC_INDIRECT_PIO and CONFIG_PPC_INDIRECT_MMIO 2013-08-14 14:57:50 +10:00
io.c powerpc/powernv: Add PIO accessors for Power8 LPC bus 2013-08-14 14:58:08 +10:00
iomap.c powerpc/kerenl: Enable EEH for IO accessors 2014-06-24 12:43:13 +10:00
iommu.c powerpc/powernv: Fix IOMMU group lost 2014-08-13 15:13:42 +10:00
irq.c powerpc/book3s: Add basic infrastructure to handle HMI in Linux. 2014-08-05 16:33:48 +10:00
isa-bridge.c
jump_label.c
kgdb.c powerpc: Delete non-required instances of include <linux/init.h> 2014-01-15 13:46:44 +11:00
kprobes.c powerpc/kprobes: Fix jprobes on ABI v2 (LE) 2014-06-24 14:05:55 +10:00
kvm_emul.S
kvm.c At over 200 commits, covering almost all supported architectures, this 2014-06-04 08:47:12 -07:00
l2cr_6xx.S
legacy_serial.c powerpc/serial: Use saner flags when creating legacy ports 2014-06-05 13:20:36 +10:00
machine_kexec_32.c
machine_kexec_64.c powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode 2014-05-28 13:24:26 +10:00
machine_kexec.c powerpc: Fix endian issues in kexec and crash dump code 2014-02-11 11:24:52 +11:00
Makefile powerpc: Remove platforms/wsp and associated pieces 2014-06-11 16:35:38 +10:00
mce_power.c powerpc/book3s: Recover from MC in sapphire on SCOM read via MMIO. 2014-03-07 15:52:10 +11:00
mce.c powerpc/book3s: Recover from MC in sapphire on SCOM read via MMIO. 2014-03-07 15:52:10 +11:00
misc_32.S powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack 2014-02-17 11:19:34 +11:00
misc_64.S powerpc: module: handle MODVERSION for .TOC. 2014-04-23 10:05:28 +10:00
misc.S
module_32.c powerpc: Move local setup.h declarations to arch includes 2013-10-30 16:00:31 +11:00
module_64.c powerpc/module: Fix TOC symbol CRC 2014-06-25 13:10:48 +10:00
module.c powerpc: Move local setup.h declarations to arch includes 2013-10-30 16:00:31 +11:00
msi.c
nvram_64.c arch/powerpc/kernel: Use %12.12s instead of %12s to avoid memory overflow 2013-11-25 11:50:57 +11:00
of_platform.c
paca.c KVM: PPC: Book3S PR: Rework SLB switching code 2014-05-30 14:26:30 +02:00
pci_32.c powerpc/pci: Support per-aperture memory offset 2013-05-06 13:40:40 +10:00
pci_64.c powerpc/PCI: Fix NULL dereference in sys_pciconfig_iobase() list traversal 2014-04-14 16:33:49 -06:00
pci_dn.c powerpc: Make PCI device node device tree accesses endian safe 2013-08-14 15:33:31 +10:00
pci_of_scan.c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2014-06-10 18:54:22 -07:00
pci-common.c powerpc/pci: Remove duplicate logic 2014-07-03 16:45:04 -06:00
pci-hotplug.c powerpc/PCI: Use pci_is_bridge() to simplify code 2014-05-27 14:57:36 -06:00
pmc.c
ppc32.h powerpc: switch to generic old sigaction() 2013-02-03 18:16:10 -05:00
ppc_ksyms.c powerpc: memcpy optimization for 64bit LE 2014-04-30 15:26:18 +10:00
ppc_save_regs.S
proc_powerpc.c proc_powerpc: switch to fixed_size_llseek() 2013-06-29 12:57:50 +04:00
process.c powerpc: Reduce scariness of interrupt frames in stack traces 2014-08-05 16:34:28 +10:00
prom_init_check.sh powerpc/powernv: Remove OPAL v1 takeover 2014-06-25 13:10:47 +10:00
prom_init.c powerpc/powernv: Remove OPAL v1 takeover 2014-06-25 13:10:47 +10:00
prom_parse.c powerpc: of_parse_dma_window should take a __be32 *dma_window 2013-08-14 15:33:26 +10:00
prom.c Merge remote-tracking branch 'scott/next' into next 2014-08-05 14:13:41 +10:00
ptrace32.c powerpc: move debug registers in a structure 2013-10-18 18:44:49 -05:00
ptrace.c powerpc: PTRACE_PEEKUSR always returns FPR0 2013-12-13 15:48:33 +11:00
reloc_32.S powerpc: Don't flush/invalidate the d/icache for an unknown relocation type 2013-07-01 11:10:34 +10:00
reloc_64.S powerpc: Align p_dyn, p_rela and p_st symbols 2014-03-07 13:50:19 +11:00
rtas_flash.c powerpc: Fix endianness of flash_block_list in rtas_flash 2014-07-28 11:30:54 +10:00
rtas_pci.c powerpc/eeh: Block PCI-CFG access during PE reset 2014-04-28 17:34:02 +10:00
rtas-proc.c
rtas-rtc.c
rtas.c of/fdt: update of_get_flat_dt_prop in prep for libfdt 2014-04-30 00:59:15 -05:00
rtasd.c powerpc/le: Enable RTAS events support 2014-04-07 10:33:12 +10:00
setup_32.c powerpc: Make boot_cpuid common between 32 and 64-bit 2014-04-07 10:33:14 +10:00
setup_64.c arch/powerpc: replace obsolete strict_strto* calls 2014-08-08 15:57:28 -07:00
setup-common.c Merge remote-tracking branch 'scott/next' into next 2014-08-05 14:13:41 +10:00
signal_32.c powerpc: Use sigsp() 2014-08-06 13:04:32 +02:00
signal_64.c powerpc: Use sigsp() 2014-08-06 13:04:32 +02:00
signal.c powerpc: Use sigsp() 2014-08-06 13:04:32 +02:00
signal.h powerpc: Use get_signal() signal_setup_done() 2014-08-06 13:03:09 +02:00
smp-tbsync.c powerpc: Delete non-required instances of include <linux/init.h> 2014-01-15 13:46:44 +11:00
smp.c powerpc: reorder per-cpu NUMA information's initialization 2014-08-13 15:14:05 +10:00
stacktrace.c
suspend.c
swsusp_32.S
swsusp_64.c
swsusp_asm64.S powerpc: Only save/restore SDR1 if in hypervisor mode 2013-10-31 12:37:29 +11:00
swsusp_booke.S powerpc/fsl-booke: Use SPRN_SPRGn rather than mfsprg/mtsprg 2014-01-07 19:06:03 -06:00
swsusp.c
sys_ppc32.c unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE 2013-05-09 13:46:38 -04:00
syscalls.c powerpc: Delete non-required instances of include <linux/init.h> 2014-01-15 13:46:44 +11:00
sysfs.c powerpc: Fix regression of per-CPU DSCR setting 2014-05-28 13:35:40 +10:00
systbl_chk.c
systbl_chk.sh
systbl.S powerpc: Use standard macros for sys_sigpending() & sys_old_getrlimit() 2014-07-28 14:09:23 +10:00
tau_6xx.c
time.c clocksource: Get rid of cycle_last 2014-07-23 15:01:52 -07:00
tm.S powerpc: Fix regression of per-CPU DSCR setting 2014-05-28 13:35:40 +10:00
traps.c powerpc/book3s: Add basic infrastructure to handle HMI in Linux. 2014-08-05 16:33:48 +10:00
udbg_16550.c powerpc: Remove platforms/wsp and associated pieces 2014-06-11 16:35:38 +10:00
udbg.c powerpc: Remove platforms/wsp and associated pieces 2014-06-11 16:35:38 +10:00
uprobes.c uprobes/powerpc: Kill arch_uprobe->ainsn 2013-11-20 16:31:01 +01:00
vdso.c arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove default gate area 2014-08-08 15:57:27 -07:00
vecemu.c powerpc: Put FP/VSX and VR state into structures 2013-10-11 17:26:49 +11:00
vector.S powerpc: Don't corrupt transactional state when using FP/VMX in kernel 2014-01-15 13:59:11 +11:00
vio.c arch/powerpc: replace obsolete strict_strto* calls 2014-08-08 15:57:28 -07:00
vmlinux.lds.S powerpc/modules: Module CRC relocation fix causes perf issues 2013-07-24 14:18:43 +10:00