linux/mm
Hiro Yoshioka c22ce143d1 [PATCH] x86: cache pollution aware __copy_from_user_ll()
Use the x86 cache-bypassing copy instructions for copy_from_user().
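The idea of a cache-bypassing copy can be illustrated in userspace. The sketch below is hypothetical: copy_nocache is not a kernel function. It assumes an x86 compiler with SSE2, where `_mm_stream_si32` emits MOVNTI, a non-temporal store that writes around the cache instead of allocating lines for the destination. On other targets it falls back to an ordinary byte copy. It shows only the store side. It omits the fault handling and tail zeroing that a real `__copy_from_user` variant needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si32 (MOVNTI); also pulls in _mm_sfence */
#endif

/*
 * Hypothetical userspace sketch of a cache-bypassing copy.  The 4-byte-aligned
 * body of the destination is written with non-temporal stores, which go
 * through write-combining buffers rather than filling cache lines.
 * Without SSE2 it degrades to a plain byte copy.
 */
static void copy_nocache(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

#if defined(__SSE2__)
    /* Byte-copy until the destination is 4-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }
    /* Bulk: load 4 bytes (alignment-safe via memcpy), store non-temporally. */
    while (n >= 4) {
        int v;
        memcpy(&v, s, sizeof v);
        _mm_stream_si32((int *)(void *)d, v);
        d += 4;
        s += 4;
        n -= 4;
    }
    /* Streaming stores are weakly ordered; fence before dst is read. */
    _mm_sfence();
#endif
    /* Remaining tail bytes (or the whole buffer without SSE2). */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}
```

The call copy_nocache(dst, src, len) leaves dst holding the same bytes that memcpy would. The benefit appears only when the destination is not read again soon, so any gain depends on the workload.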

Some performance data:

Total of GLOBAL_POWER_EVENTS (CPU cycle samples)

2.6.12.4.orig    1921587
2.6.12.4.nt      1599424
1599424/1921587=83.23% (16.77% reduction)

BSQ_CACHE_REFERENCE (L3 cache miss)
2.6.12.4.orig      57427
2.6.12.4.nt        20858
20858/57427=36.32% (63.7% reduction)

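L3 cache miss reduction in the copy routine (__copy_from_user_ll in 2.6.12.4.orig, __copy_user_zeroing_intel_nocache in 2.6.12.4.nt)
samples  %        app name                 symbol name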
37408    65.1412  vmlinux                  __copy_from_user_ll
23        0.1103  vmlinux                  __copy_user_zeroing_intel_nocache
23/37408=0.061% (99.94% reduction)
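The reductions above are plain ratios of the .nt sample counts to the .orig sample counts. The small sketch below recomputes them. pct_of, print_reduction and print_profile_summary are illustrative helpers and are not part of the patch or of OProfile.

```c
#include <assert.h>
#include <stdio.h>

/* after/before expressed as a percentage */
static double pct_of(double after, double before)
{
    assert(before > 0.0);
    return 100.0 * after / before;
}

/* Print "after/before=X% (Y% reduction)" in the style of the log above. */
static void print_reduction(const char *label, double after, double before)
{
    double r = pct_of(after, before);
    printf("%-34s %.0f/%.0f=%.2f%% (%.2f%% reduction)\n",
           label, after, before, r, 100.0 - r);
}

/* Sample counts copied from the profile summary above. */
static void print_profile_summary(void)
{
    print_reduction("GLOBAL_POWER_EVENTS (cycles)", 1599424, 1921587);
    print_reduction("BSQ_CACHE_REFERENCE (L3 miss)", 20858, 57427);
    print_reduction("L3 misses in the copy routine", 23, 37408);
}
```

Calling print_profile_summary() prints one line per comparison, using the same after/before=X% form as the figures above.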

Top 5 of 2.6.12.4.nt
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples  %        app name                 symbol name
128392    8.0274  vmlinux                  __copy_user_zeroing_intel_nocache
64206     4.0143  vmlinux                  journal_add_journal_head
59746     3.7355  vmlinux                  do_get_write_access
47674     2.9807  vmlinux                  journal_put_journal_head
46021     2.8774  vmlinux                  journal_dirty_metadata
pattern9-0-cpu4-0-09011728/summary.out

Counted BSQ_CACHE_REFERENCE events (cache references seen by the bus unit) with a unit mask of 0x3f (multiple flags) count 3000
samples  %        app name                 symbol name
69755     4.2861  vmlinux                  __copy_user_zeroing_intel_nocache
55685     3.4215  vmlinux                  journal_add_journal_head
52371     3.2179  vmlinux                  __find_get_block
45504     2.7960  vmlinux                  journal_put_journal_head
36005     2.2123  vmlinux                  journal_stop
pattern9-0-cpu4-0-09011744/summary.out

Counted BSQ_CACHE_REFERENCE events (cache references seen by the bus unit) with a unit mask of 0x200 (read 3rd level cache miss) count 3000
samples  %        app name                 symbol name
1147      5.4994  vmlinux                  journal_add_journal_head
881       4.2240  vmlinux                  journal_dirty_data
872       4.1809  vmlinux                  blk_rq_map_sg
734       3.5192  vmlinux                  journal_commit_transaction
617       2.9582  vmlinux                  radix_tree_delete
pattern9-0-cpu4-0-09011731/summary.out

iozone results:

original 2.6.12.4 CPU time = 207.768 sec
cache aware       CPU time = 184.783 sec
(total of three runs)
184.783/207.768=88.94% (11.06% reduction)

original:
pattern9-0-cpu4-0-08191720/iozone.out:  CPU Utilization: Wall time   45.997    CPU time   64.527    CPU utilization 140.28 %
pattern9-0-cpu4-0-08191741/iozone.out:  CPU Utilization: Wall time   46.878    CPU time   71.933    CPU utilization 153.45 %
pattern9-0-cpu4-0-08191743/iozone.out:  CPU Utilization: Wall time   45.152    CPU time   71.308    CPU utilization 157.93 %

cache aware:
pattern9-0-cpu4-0-09011728/iozone.out:  CPU Utilization: Wall time   44.842    CPU time   62.465    CPU utilization 139.30 %
pattern9-0-cpu4-0-09011731/iozone.out:  CPU Utilization: Wall time   44.718    CPU time   59.273    CPU utilization 132.55 %
pattern9-0-cpu4-0-09011744/iozone.out:  CPU Utilization: Wall time   44.367    CPU time   63.045    CPU utilization 142.10 %
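Each kernel's CPU time total (207.768 s and 184.783 s) is the sum of the three per-run CPU times listed above. The utilization column appears to be CPU time divided by wall time; for example, 64.527 / 45.997 gives about 140.28%. The sketch below reproduces both figures. total_cpu_time and utilization_pct are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the per-run "CPU time" values reported by iozone. */
static double total_cpu_time(const double *runs, size_t n)
{
    assert(runs != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += runs[i];
    return sum;
}

/* Assumed definition of iozone's "CPU utilization": CPU time over wall time. */
static double utilization_pct(double cpu_time, double wall_time)
{
    assert(wall_time > 0.0);
    return 100.0 * cpu_time / wall_time;
}

/* CPU time figures copied from the three runs listed above. */
static const double orig_runs[]  = { 64.527, 71.933, 71.308 };
static const double aware_runs[] = { 62.465, 59.273, 63.045 };
```

The ratio of the two totals, 100.0 * total_cpu_time(aware_runs, 3) / total_cpu_time(orig_runs, 3), gives the 88.94% figure quoted above.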

Signed-off-by: Hiro Yoshioka <hyoshiok@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23 07:42:56 -07:00
bootmem.c [PATCH] x86_64: Handle empty PXMs that only contain hotplug memory 2006-04-09 11:53:16 -07:00
fadvise.c [PATCH] sys_sync_file_range() 2006-03-31 12:18:54 -08:00
filemap_xip.c [PATCH] replace inode_update_time with file_update_time 2006-01-10 08:01:30 -08:00
filemap.c [PATCH] x86: cache pollution aware __copy_from_user_ll() 2006-06-23 07:42:56 -07:00
filemap.h [PATCH] x86: cache pollution aware __copy_from_user_ll() 2006-06-23 07:42:56 -07:00
fremap.c [PATCH] fix update_mmu_cache in fremap.c 2006-06-23 07:42:52 -07:00
highmem.c BUG_ON() Conversion in mm/highmem.c 2006-04-02 13:47:35 +02:00
hugetlb.c [PATCH] tightening hugetlb strict accounting 2006-06-23 07:42:48 -07:00
internal.h [PATCH] remove set_page_count() outside mm/ 2006-03-22 07:54:02 -08:00
Kconfig [PATCH] Swapless page migration: modify core logic 2006-06-23 07:42:50 -07:00
madvise.c [PATCH] Fix MADV_REMOVE protection checking 2006-04-17 18:22:18 -07:00
Makefile [PATCH] uninline zone helpers 2006-03-27 08:44:48 -08:00
memory_hotplug.c [PATCH] update vm_total_pages at memory hotadd 2006-06-23 07:42:52 -07:00
memory.c [PATCH] add page_mkwrite() vm_operations method 2006-06-23 07:42:51 -07:00
mempolicy.c [PATCH] SELinux: add security_task_movememory calls to mm code 2006-06-23 07:42:54 -07:00
mempool.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial 2006-03-26 09:41:18 -08:00
migrate.c [PATCH] SELinux: add security_task_movememory calls to mm code 2006-06-23 07:42:54 -07:00
mincore.c [PATCH] freepgt: sys_mincore ignore FIRST_USER_PGD_NR 2005-04-19 13:29:20 -07:00
mlock.c [PATCH] move capable() to capability.h 2006-01-11 18:42:13 -08:00
mmap.c [PATCH] add page_mkwrite() vm_operations method 2006-06-23 07:42:51 -07:00
mmzone.c [PATCH] uninline zone helpers 2006-03-27 08:44:48 -08:00
mprotect.c [PATCH] add page_mkwrite() vm_operations method 2006-06-23 07:42:51 -07:00
mremap.c [PATCH] move capable() to capability.h 2006-01-11 18:42:13 -08:00
msync.c The comment describing how MS_ASYNC works in msync.c is confusing 2006-03-24 18:30:53 +01:00
nommu.c [PATCH] overcommit: use totalreserve_pages for nommu 2006-04-11 06:18:32 -07:00
oom_kill.c [PATCH] mm: fix typos in comments in mm/oom_kill.c 2006-06-23 07:42:47 -07:00
page_alloc.c [PATCH] printk() should not be called under zone->lock 2006-06-23 07:42:52 -07:00
page_io.c [PATCH] mm: split page table lock 2005-10-29 21:40:42 -07:00
page-writeback.c [PATCH] writeback: fix range handling 2006-06-23 07:42:49 -07:00
pdflush.c [PATCH] Swap Migration V5: PF_SWAPWRITE to allow writing to swap 2006-01-08 20:12:41 -08:00
prio_tree.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
readahead.c [PATCH] ext3_readdir: use generic readahead 2006-03-23 07:38:09 -08:00
rmap.c [PATCH] More page migration: use migration entries for file pages 2006-06-23 07:42:51 -07:00
shmem.c [PATCH] migration: remove unnecessary PageSwapCache checks 2006-06-23 07:42:46 -07:00
slab.c [PATCH] slab: kmalloc, kzalloc comments cleanup and fix 2006-06-23 07:42:52 -07:00
slob.c [PATCH] mm/slob.c: for_each_possible_cpu(), not NR_CPUS 2006-04-19 09:13:49 -07:00
sparse.c [PATCH] sparsemem: record nid during memory present 2006-06-23 07:42:51 -07:00
swap_state.c BUG_ON() Conversion in mm/swap_state.c 2006-04-01 01:25:12 +02:00
swap.c [PATCH] for_each_possible_cpu: fixes for generic part 2006-03-28 09:16:05 -08:00
swapfile.c [PATCH] swapoff: use atomic_inc_not_zero() on mm_users 2006-06-23 07:42:51 -07:00
thrash.c [PATCH] temporarily disable swap token on memory pressure 2005-11-28 14:42:25 -08:00
tiny-shmem.c [PATCH] do_truncate() call fix in tiny-shmem.c 2006-01-12 09:08:49 -08:00
truncate.c [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem 2006-01-09 15:59:24 -08:00
util.c [PATCH] slab: optimize constant-size kzalloc calls 2006-03-25 08:22:49 -08:00
vmalloc.c [PATCH] mm: introduce remap_vmalloc_range() 2006-06-23 07:42:49 -07:00
vmscan.c [PATCH] initialise total_memory() earlier 2006-06-23 07:42:52 -07:00