Commit Graph

1982 Commits

Author SHA1 Message Date
Konstantin Khlebnikov
78c1d78488 radix-tree: introduce bit-optimized iterator
A series of radix tree cleanups, and usage of them in the core pagecache
code.

Micro-benchmark:

lookup 14 slots (typical page-vector size)
in radix-tree there earch <step> slot filled and tagged
before/after - nsec per full scan through tree

* Intel Sandy Bridge i7-2620M 4Mb L3
New code always faster

* AMD Athlon 6000+ 2x1Mb L2, without L3
New code generally faster,
Minor degradation (marked with "*") for huge sparse trees

* i386 on Sandy Bridge
New code faster for common cases: tagged and dense trees.
Some degradations for non-tagged lookup on sparse trees.

Ideally, there might help __ffs() analog for searching first non-zero
long element in array, gcc sometimes cannot optimize this loop corretly.

Numbers:

CPU: Intel Sandy Bridge i7-2620M 4Mb L3

radix-tree with 1024 slots:

tagged lookup

step  1      before  7156        after  3613
step  2      before  5399        after  2696
step  3      before  4779        after  1928
step  4      before  4456        after  1429
step  5      before  4292        after  1213
step  6      before  4183        after  1052
step  7      before  4157        after  951
step  8      before  4016        after  812
step  9      before  3952        after  851
step  10     before  3937        after  732
step  11     before  4023        after  709
step  12     before  3872        after  657
step  13     before  3892        after  633
step  14     before  3720        after  591
step  15     before  3879        after  578
step  16     before  3561        after  513

normal lookup

step  1      before  4266       after  3301
step  2      before  2695       after  2129
step  3      before  2083       after  1712
step  4      before  1801       after  1534
step  5      before  1628       after  1313
step  6      before  1551       after  1263
step  7      before  1475       after  1185
step  8      before  1432       after  1167
step  9      before  1373       after  1092
step  10     before  1339       after  1134
step  11     before  1292       after  1056
step  12     before  1319       after  1030
step  13     before  1276       after  1004
step  14     before  1256       after  987
step  15     before  1228       after  992
step  16     before  1247       after  999

radix-tree with 1024*1024*128 slots:

tagged lookup

step  1      before  1086102841  after  674196409
step  2      before  816839155   after  498138306
step  7      before  599728907   after  240676762
step  15     before  555729253   after  185219677
step  63     before  606637748   after  128585664
step  64     before  608384432   after  102945089
step  65     before  596987114   after  123996019
step  128    before  304459225   after  56783056
step  256    before  158846855   after  31232481
step  512    before  86085652    after  18950595
step  12345  before  6517189     after  1674057

normal lookup

step  1      before  626064869  after  544418266
step  2      before  418809975  after  336321473
step  7      before  242303598  after  207755560
step  15     before  208380563  after  176496355
step  63     before  186854206  after  167283638
step  64     before  176188060  after  170143976
step  65     before  185139608  after  167487116
step  128    before  88181865   after  86913490
step  256    before  45733628   after  45143534
step  512    before  24506038   after  23859036
step  12345  before  2177425    after  2018662

* AMD Athlon 6000+ 2x1Mb L2, without L3

radix-tree with 1024 slots:

tag-lookup

step  1      before  8164        after  5379
step  2      before  5818        after  5581
step  3      before  4959        after  4213
step  4      before  4371        after  3386
step  5      before  4204        after  2997
step  6      before  4950        after  2744
step  7      before  4598        after  2480
step  8      before  4251        after  2288
step  9      before  4262        after  2243
step  10     before  4175        after  2131
step  11     before  3999        after  2024
step  12     before  3979        after  1994
step  13     before  3842        after  1929
step  14     before  3750        after  1810
step  15     before  3735        after  1810
step  16     before  3532        after  1660

normal-lookup

step  1      before  7875        after  5847
step  2      before  4808        after  4071
step  3      before  4073        after  3462
step  4      before  3677        after  3074
step  5      before  4308        after  2978
step  6      before  3911        after  3807
step  7      before  3635        after  3522
step  8      before  3313        after  3202
step  9      before  3280        after  3257
step  10     before  3166        after  3083
step  11     before  3066        after  3026
step  12     before  2985        after  2982
step  13     before  2925        after  2924
step  14     before  2834        after  2808
step  15     before  2805        after  2803
step  16     before  2647        after  2622

radix-tree with 1024*1024*128 slots:

tag-lookup

step  1      before  1288059720  after  951736580
step  2      before  961292300   after  884212140
step  7      before  768905140   after  547267580
step  15     before  771319480   after  456550640
step  63     before  504847640   after  242704304
step  64     before  392484800   after  177920786
step  65     before  491162160   after  246895264
step  128    before  208084064   after  97348392
step  256    before  112401035   after  51408126
step  512    before  75825834    after  29145070
step  12345  before  5603166     after  2847330

normal-lookup

step  1      before  1025677120  after  861375100
step  2      before  647220080   after  572258540
step  7      before  505518960   after  484041813
step  15     before  430483053   after  444815320	*
step  63     before  388113453   after  404250546	*
step  64     before  374154666   after  396027440	*
step  65     before  381423973   after  396704853	*
step  128    before  190078700   after  202619384	*
step  256    before  100886756   after  102829108	*
step  512    before  64074505    after  56158720
step  12345  before  4237289     after  4422299		*

* i686 on Sandy bridge

radix-tree with 1024 slots:

tagged lookup

step  1      before  7990        after  4019
step  2      before  5698        after  2897
step  3      before  5013        after  2475
step  4      before  4630        after  1721
step  5      before  4346        after  1759
step  6      before  4299        after  1556
step  7      before  4098        after  1513
step  8      before  4115        after  1222
step  9      before  3983        after  1390
step  10     before  4077        after  1207
step  11     before  3921        after  1231
step  12     before  3894        after  1116
step  13     before  3840        after  1147
step  14     before  3799        after  1090
step  15     before  3797        after  1059
step  16     before  3783        after  745

normal lookup

step  1      before  5103       after  3499
step  2      before  3299       after  2550
step  3      before  2489       after  2370
step  4      before  2034       after  2302		*
step  5      before  1846       after  2268		*
step  6      before  1752       after  2249		*
step  7      before  1679       after  2164		*
step  8      before  1627       after  2153		*
step  9      before  1542       after  2095		*
step  10     before  1479       after  2109		*
step  11     before  1469       after  2009		*
step  12     before  1445       after  2039		*
step  13     before  1411       after  2013		*
step  14     before  1374       after  2046		*
step  15     before  1340       after  1975		*
step  16     before  1331       after  2000		*

radix-tree with 1024*1024*128 slots:

tagged lookup

step  1      before  1225865377  after  667153553
step  2      before  842427423   after  471533007
step  7      before  609296153   after  276260116
step  15     before  544232060   after  226859105
step  63     before  519209199   after  141343043
step  64     before  588980279   after  141951339
step  65     before  521099710   after  138282060
step  128    before  298476778   after  83390628
step  256    before  149358342   after  43602609
step  512    before  76994713    after  22911077
step  12345  before  5328666     after  1472111

normal lookup

step  1      before  819284564  after  533635310
step  2      before  512421605  after  364956155
step  7      before  271443305  after  305721345	*
step  15     before  223591630  after  273960216	*
step  63     before  190320247  after  217770207	*
step  64     before  178538168  after  267411372	*
step  65     before  186400423  after  215347937	*
step  128    before  88106045   after  140540612	*
step  256    before  44812420   after  70660377		*
step  512    before  24435438   after  36328275		*
step  12345  before  2123924    after  2148062		*

bloat-o-meter delta for this patchset + patchset with related shmem cleanups

bloat-o-meter: x86_64

add/remove: 4/3 grow/shrink: 5/6 up/down: 928/-939 (-11)
function                                     old     new   delta
radix_tree_next_chunk                          -     499    +499
shmem_unuse                                  428     554    +126
shmem_radix_tree_replace                     131     227     +96
find_get_pages_tag                           354     419     +65
find_get_pages_contig                        345     407     +62
find_get_pages                               362     396     +34
__kstrtab_radix_tree_next_chunk                -      22     +22
__ksymtab_radix_tree_next_chunk                -      16     +16
__kcrctab_radix_tree_next_chunk                -       8      +8
radix_tree_gang_lookup_slot                  204     203      -1
static.shmem_xattr_set                       384     381      -3
radix_tree_gang_lookup_tag_slot              208     191     -17
radix_tree_gang_lookup                       231     187     -44
radix_tree_gang_lookup_tag                   247     199     -48
shmem_unlock_mapping                         278     190     -88
__lookup                                     217       -    -217
__lookup_tag                                 242       -    -242
radix_tree_locate_item                       279       -    -279

bloat-o-meter: i386

add/remove: 3/3 grow/shrink: 8/9 up/down: 1075/-1275 (-200)
function                                     old     new   delta
radix_tree_next_chunk                          -     757    +757
shmem_unuse                                  352     449     +97
find_get_pages_contig                        269     322     +53
shmem_radix_tree_replace                     113     154     +41
find_get_pages_tag                           277     318     +41
dcache_dir_lseek                             426     458     +32
__kstrtab_radix_tree_next_chunk                -      22     +22
vc_do_resize                                 968     977      +9
snd_pcm_lib_read1                            725     733      +8
__ksymtab_radix_tree_next_chunk                -       8      +8
netlbl_cipsov4_list                         1120    1127      +7
find_get_pages                               293     291      -2
new_slab                                     467     459      -8
bitfill_unaligned_rev                        425     417      -8
radix_tree_gang_lookup_tag_slot              177     146     -31
blk_dump_cmd                                 267     229     -38
radix_tree_gang_lookup_slot                  212     134     -78
shmem_unlock_mapping                         221     128     -93
radix_tree_gang_lookup_tag                   275     162    -113
radix_tree_gang_lookup                       255     126    -129
__lookup                                     227       -    -227
__lookup_tag                                 271       -    -271
radix_tree_locate_item                       277       -    -277

This patch:

Implement a clean, simple and effective radix-tree iteration routine.

Iterating divided into two phases:
* lookup next chunk in radix-tree leaf node
* iterating through slots in this chunk

Main iterator function radix_tree_next_chunk() returns pointer to first
slot, and stores in the struct radix_tree_iter index of next-to-last slot.
 For tagged-iterating it also constuct bitmask of tags for retunted chunk.
 All additional logic implemented as static-inline functions and macroses.

Also adds radix_tree_find_next_bit() static-inline variant of
find_next_bit() optimized for small constant size arrays, because
find_next_bit() too heavy for searching in an array with one/two long
elements.

[akpm@linux-foundation.org: rework comments a bit]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Tested-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28 17:14:37 -07:00
Srivatsa S. Bhat
38b93780a5 lib/cpumask.c: remove __any_online_cpu()
__any_online_cpu() is not optimal and also unnecessary.  So, replace its
use by faster cpumask_* operations.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28 17:14:35 -07:00
Linus Torvalds
fa453a625d Merge branch 'for-linus-3.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML changes from Richard Weinberger:
 "Mostly bug fixes and cleanups"

* 'for-linus-3.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: (35 commits)
  um: Update defconfig
  um: Switch to large mcmodel on x86_64
  MTD: Relax dependencies
  um: Wire CONFIG_GENERIC_IO up
  um: Serve io_remap_pfn_range()
  Introduce CONFIG_GENERIC_IO
  um: allow SUBARCH=x86
  um: most of the SUBARCH uses can be killed
  um: deadlock in line_write_interrupt()
  um: don't bother trying to rebuild CHECKFLAGS for USER_OBJS
  um: use the right ifdef around exports in user_syms.c
  um: a bunch of headers can be killed by using generic-y
  um: ptrace-generic.h doesn't need user.h
  um: kill HOST_TASK_PID
  um: remove pointless include of asm/fixmap.h from asm/pgtable.h
  um: asm-offsets.h might as well come from underlying arch...
  um: merge processor_{32,64}.h a bit...
  um: switch close_chan() to struct line
  um: race fix: initialize delayed_work *before* registering IRQ
  um: line->have_irq is never checked...
  ...
2012-03-27 18:29:53 -07:00
Richard Weinberger
087fafd152 Introduce CONFIG_GENERIC_IO
There are situations where CONFIG_HAS_IOMEM is too restrictive.
For example CONFIG_MTD_NAND_NANDSIM depends on CONFIG_HAS_IOMEM
but it works perfectly fine if an architecture without io memory
just includes asm-generic/io.h or implements everything defined in it.
UML is such a corner case.

Signed-off-by: Richard Weinberger <richard@nod.at>
2012-03-25 00:29:56 +01:00
Linus Torvalds
11bcb32848 The following text was taken from the original review request:
"[PATCH 0/3] RFC - module.h usage cleanups in fs/ and lib/"
 		https://lkml.org/lkml/2012/2/29/589
 --
 
 Fix up files in fs/ and lib/ dirs to only use module.h if they really
 need it.
 
 These are trivial in scope vs. the work done previously.  We now have
 things where any few remaining cleanups can be farmed out to arch or
 subsystem maintainers, and I have done so when possible.  What is
 remaining here represents the bits that don't clearly lie within a
 single arch/subsystem boundary, like the fs dir and the lib dir.
 
 Some duplicate includes arising from overlapping fixes from
 independent subsystem maintainer submissions are also quashed.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPbNw3AAoJEOvOhAQsB9HWA7wQALrsQ6V6Z+B3KsvSoD5kFnpZ
 Y+4uggs+GdUdWmtRrZnTBp896gGuUgBxc3syA2XWd7Oqi49+c5c1m0cFxKyVdIHm
 fB+jmxS69soADtHR3cXmxcQshrUzUf2rTn8frcw4O/BmJuplv4xT9uPQzwGaRSZT
 gomQsQ1bGnkwjO2jfS8f/N5Mjr8u/z0WF7TTOTUSq+Cv3BervPaSPF1Ea6J8oo+N
 4+/n8RlU1HWiI4inrgrFPN6UHmE45BAL2xGbB47LgooHJW8P5kAnU+vxGScaoy1Q
 JKX9WKT3VCiwR3VOPa86iLKP3Y8a3VlhyGn+yzzcYkGX/n0tbT7aoRhQm21sGIv0
 DoeXWe7aiiY8cEW69G6GIfRPFl+Zh81m1Whbu7IZT/sV3asx6jWmEXE8CgCfeDt5
 mNQk9D4Irf6+rmCSbeSVC4L0eFfLxNFouNyh2aus/q+gIjKNKYwZQryHrodK4wpv
 UgMKSTZfPrTAWay2gCNWNqo3Zs8e1LDqkftetxeU3jx2kTuaNzBl4Y7mhsX7sLYe
 MsFX3JUJ2pn6XWbgqcY+bdr/mzgsCrjzqdf15MTUzEc5SIfVF+XpNNZN1ITwl6UA
 /ZH9keBu1mEdCoPU5W74kYwx4p35hIeWJGfc0MRp07ruf941F+SBgMD11B0+06f0
 pN0DcITTkD16+sS4x1cB
 =Z4w0
 -----END PGP SIGNATURE-----

Merge tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux

Pull cleanup of fs/ and lib/ users of module.h from Paul Gortmaker:
 "Fix up files in fs/ and lib/ dirs to only use module.h if they really
  need it.

  These are trivial in scope vs the work done previously.  We now have
  things where any few remaining cleanups can be farmed out to arch or
  subsystem maintainers, and I have done so when possible.  What is
  remaining here represents the bits that don't clearly lie within a
  single arch/subsystem boundary, like the fs dir and the lib dir.

  Some duplicate includes arising from overlapping fixes from
  independent subsystem maintainer submissions are also quashed."

Fix up trivial conflicts due to clashes with other include file cleanups
(including some due to the previous bug.h cleanup pull).

* tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
  lib: reduce the use of module.h wherever possible
  fs: reduce the use of module.h wherever possible
  includecheck: delete any duplicate instances of module.h
2012-03-24 10:24:31 -07:00
Linus Torvalds
ed2d265d12 The following text was taken from the original review request:
"[RFC - PATCH 0/7] consolidation of BUG support code."
 		https://lkml.org/lkml/2012/1/26/525
 --
 
 The changes shown here are to unify linux's BUG support under
 the one <linux/bug.h> file.  Due to historical reasons, we have
 some BUG code in bug.h and some in kernel.h -- i.e. the support for
 BUILD_BUG in linux/kernel.h predates the addition of linux/bug.h,
 but old code in kernel.h wasn't moved to bug.h at that time.  As
 a band-aid, kernel.h was including <asm/bug.h> to pseudo link them.
 
 This has caused confusion[1] and general yuck/WTF[2] reactions.
 Here is an example that violates the principle of least surprise:
 
       CC      lib/string.o
       lib/string.c: In function 'strlcat':
       lib/string.c:225:2: error: implicit declaration of function 'BUILD_BUG_ON'
       make[2]: *** [lib/string.o] Error 1
       $
       $ grep linux/bug.h lib/string.c
       #include <linux/bug.h>
       $
 
 We've included <linux/bug.h> for the BUG infrastructure and yet we
 still get a compile fail!  [We've not kernel.h for BUILD_BUG_ON.]
 Ugh - very confusing for someone who is new to kernel development.
 
 With the above in mind, the goals of this changeset are:
 
 1) find and fix any include/*.h files that were relying on the
    implicit presence of BUG code.
 2) find and fix any C files that were consuming kernel.h and
    hence relying on implicitly getting some/all BUG code.
 3) Move the BUG related code living in kernel.h to <linux/bug.h>
 4) remove the asm/bug.h from kernel.h to finally break the chain.
 
 During development, the order was more like 3-4, build-test, 1-2.
 But to ensure that git history for bisect doesn't get needless
 build failures introduced, the commits have been reorderd to fix
 the problem areas in advance.
 
 [1]  https://lkml.org/lkml/2012/1/3/90
 [2]  https://lkml.org/lkml/2012/1/17/414
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPbNwpAAoJEOvOhAQsB9HWrqYP/A0t9VB0nK6e42F0OR2P14MZ
 GJFtf1B++wwioIrx+KSWSRfSur1C5FKhDbxLR3I/pvkAYl4+T4JvRdMG6xJwxyip
 CC1kVQQNDjWVVqzjz2x6rYkOffx6dUlw/ERyIyk+OzP+1HzRIsIrugMqbzGLlX0X
 y0v2Tbd0G6xg1DV8lcRdp95eIzcGuUvdb2iY2LGadWZczEOeSXx64Jz3QCFxg3aL
 LFU4oovsg8Nb7MRJmqDvHK/oQf5vaTm9WSrS0pvVte0msSQRn8LStYdWC0G9BPCS
 GwL86h/eLXlUXQlC5GpgWg1QQt5i2QpjBFcVBIG0IT5SgEPMx+gXyiqZva2KwbHu
 LKicjKtfnzPitQnyEV/N6JyV1fb1U6/MsB7ebU5nCCzt9Gr7MYbjZ44peNeprAtu
 HMvJ/BNnRr4Ha6nPQNu952AdASPKkxmeXFUwBL1zUbLkOX/bK/vy1ujlcdkFxCD7
 fP3t7hghYa737IHk0ehUOhrE4H67hvxTSCKioLUAy/YeN1IcfH/iOQiCBQVLWmoS
 AqYV6ou9cqgdYoyila2UeAqegb+8xyubPIHt+lebcaKxs5aGsTg+r3vq5juMDAPs
 iwSVYUDcIw9dHer1lJfo7QCy3QUTRDTxh+LB9VlHXQICgeCK02sLBOi9hbEr4/H8
 Ko9g8J3BMxcMkXLHT9ud
 =PYQT
 -----END PGP SIGNATURE-----

Merge tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux

Pull <linux/bug.h> cleanup from Paul Gortmaker:
 "The changes shown here are to unify linux's BUG support under the one
  <linux/bug.h> file.  Due to historical reasons, we have some BUG code
  in bug.h and some in kernel.h -- i.e.  the support for BUILD_BUG in
  linux/kernel.h predates the addition of linux/bug.h, but old code in
  kernel.h wasn't moved to bug.h at that time.  As a band-aid, kernel.h
  was including <asm/bug.h> to pseudo link them.

  This has caused confusion[1] and general yuck/WTF[2] reactions.  Here
  is an example that violates the principle of least surprise:

      CC      lib/string.o
      lib/string.c: In function 'strlcat':
      lib/string.c:225:2: error: implicit declaration of function 'BUILD_BUG_ON'
      make[2]: *** [lib/string.o] Error 1
      $
      $ grep linux/bug.h lib/string.c
      #include <linux/bug.h>
      $

  We've included <linux/bug.h> for the BUG infrastructure and yet we
  still get a compile fail! [We've not kernel.h for BUILD_BUG_ON.] Ugh -
  very confusing for someone who is new to kernel development.

  With the above in mind, the goals of this changeset are:

  1) find and fix any include/*.h files that were relying on the
     implicit presence of BUG code.
  2) find and fix any C files that were consuming kernel.h and hence
     relying on implicitly getting some/all BUG code.
  3) Move the BUG related code living in kernel.h to <linux/bug.h>
  4) remove the asm/bug.h from kernel.h to finally break the chain.

  During development, the order was more like 3-4, build-test, 1-2.  But
  to ensure that git history for bisect doesn't get needless build
  failures introduced, the commits have been reorderd to fix the problem
  areas in advance.

	[1]  https://lkml.org/lkml/2012/1/3/90
	[2]  https://lkml.org/lkml/2012/1/17/414"

Fix up conflicts (new radeon file, reiserfs header cleanups) as per Paul
and linux-next.

* tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
  kernel.h: doesn't explicitly use bug.h, so don't include it.
  bug: consolidate BUILD_BUG_ON with other bug code
  BUG: headers with BUG/BUG_ON etc. need linux/bug.h
  bug.h: add include of it to various implicit C users
  lib: fix implicit users of kernel.h for TAINT_WARN
  spinlock: macroize assert_spin_locked to avoid bug.h dependency
  x86: relocate get/set debugreg fcns to include/asm/debugreg.
2012-03-24 10:08:39 -07:00
Linus Torvalds
f1d38e423a Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl
Pull sysctl updates from Eric Biederman:

 - Rewrite of sysctl for speed and clarity.

   Insert/remove/Lookup in sysctl are all now O(NlogN) operations, and
   are no longer bottlenecks in the process of adding and removing
   network devices.

   sysctl is now focused on being a filesystem instead of system call
   and the code can all be found in fs/proc/proc_sysctl.c.  Hopefully
   this means the code is now approachable.

   Much thanks is owed to Lucian Grinjincu for keeping at this until
   something was found that was usable.

 - The recent proc_sys_poll oops found by the fuzzer during hibernation
   is fixed.

* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl: (36 commits)
  sysctl: protect poll() in entries that may go away
  sysctl: Don't call sysctl_follow_link unless we are a link.
  sysctl: Comments to make the code clearer.
  sysctl: Correct error return from get_subdir
  sysctl: An easier to read version of find_subdir
  sysctl: fix memset parameters in setup_sysctl_set()
  sysctl: remove an unused variable
  sysctl: Add register_sysctl for normal sysctl users
  sysctl: Index sysctl directories with rbtrees.
  sysctl: Make the header lists per directory.
  sysctl: Move sysctl_check_dups into insert_header
  sysctl: Modify __register_sysctl_paths to take a set instead of a root and an nsproxy
  sysctl: Replace root_list with links between sysctl_table_sets.
  sysctl: Add sysctl_print_dir and use it in get_subdir
  sysctl: Stop requiring explicit management of sysctl directories
  sysctl: Add a root pointer to ctl_table_set
  sysctl: Rewrite proc_sys_readdir in terms of first_entry and next_entry
  sysctl: Rewrite proc_sys_lookup introducing find_entry and lookup_entry.
  sysctl: Normalize the root_table data structure.
  sysctl: Factor out insert_header and erase_header
  ...
2012-03-23 18:08:58 -07:00
KAMEZAWA Hiroyuki
1ac101a5d6 procfs: add num_to_str() to speed up /proc/stat
== stat_check.py
num = 0
with open("/proc/stat") as f:
        while num < 1000 :
                data = f.read()
                f.seek(0, 0)
                num = num + 1
==

perf shows

    20.39%  stat_check.py  [kernel.kallsyms]    [k] format_decode
    13.41%  stat_check.py  [kernel.kallsyms]    [k] number
    12.61%  stat_check.py  [kernel.kallsyms]    [k] vsnprintf
    10.85%  stat_check.py  [kernel.kallsyms]    [k] memcpy
     4.85%  stat_check.py  [kernel.kallsyms]    [k] radix_tree_lookup
     4.43%  stat_check.py  [kernel.kallsyms]    [k] seq_printf

This patch removes most of calls to vsnprintf() by adding num_to_str()
and seq_print_decimal_ull(), which prints decimal numbers without rich
functions provided by printf().

On my 8cpu box.
== Before patch ==
[root@bluextal test]# time ./stat_check.py

real    0m0.150s
user    0m0.026s
sys     0m0.121s

== After patch ==
[root@bluextal test]# time ./stat_check.py

real    0m0.055s
user    0m0.022s
sys     0m0.030s

[akpm@linux-foundation.org: remove incorrect comment, use less statck in num_to_str(), move comment from .h to .c, simplify seq_put_decimal_ull()]
[andrea@betterlinux.com: avoid breaking the ABI in /proc/stat]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrea Righi <andrea@betterlinux.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Turner <pjt@google.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:42 -07:00
Darrick J. Wong
5cde7656d0 crc32: select an algorithm via Kconfig
Allow the kernel builder to choose a crc32* algorithm for the kernel.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Cc: Bob Pearson <rpearson@systemfabricworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:38 -07:00
Darrick J. Wong
577eba9e22 crc32: add self-test code for crc32c
Add self-test code for crc32c.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Cc: Bob Pearson <rpearson@systemfabricworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:38 -07:00
Darrick J. Wong
46c5801eaf crc32: bolt on crc32c
Reuse the existing crc32 code to stamp out a crc32c implementation.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Bob Pearson <rpearson@systemfabricworks.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:38 -07:00
Bob Pearson
78dff41897 crc32: add note about this patchset to crc32.c
Add a comment at the top of crc32.c

[djwong@us.ibm.com: Minor changelog tweaks]
Signed-off-by: Bob Pearson <rpearson@systemfabricworks.com>
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:38 -07:00
Bob Pearson
0292c497b6 crc32: optimize loop counter for x86
Add two changes that improve the performance of x86 systems

1. replace main loop with incrementing counter this change improves
   the performance of the selftest by about 5-6% on Nehalem CPUs.  The
   apparent reason is that the compiler can use the loop index to perform
   an indexed memory access.  This is reported to make the performance of
   PowerPC CPUs to get worse.

2. replace the rem_len loop with incrementing counter this change
   improves the performance of the selftest, which has more than the usual
   number of occurances, by about 1-2% on x86 CPUs.  In actual work loads
   the length is most often a multiple of 4 bytes and this code does not
   get executed as often if at all.  Again this change is reported to make
   the performance of PowerPC get worse.

[djwong@us.ibm.com: Minor changelog tweaks]
Signed-off-by: Bob Pearson <rpearson@systemfabricworks.com>
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:37 -07:00
Bob Pearson
324eb0f17d crc32: add slice-by-8 algorithm to existing code
Add slicing-by-8 algorithm to the existing slicing-by-4 algorithm.  This
consists of:

- extend largest BITS size from 32 to 64
- extend tables from tab[4][256] to up to tab[8][256]
- Add code for inner loop.

[djwong@us.ibm.com: Minor changelog tweaks]
Signed-off-by: Bob Pearson <rpearson@systemfabricworks.com>
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:37 -07:00
Bob Pearson
9a1dbf6a29 crc32: make CRC_*_BITS definition correspond to actual bit counts
crc32.c provides a choice of one of several algorithms for computing the
LSB and LSB versions of the CRC32 checksum based on the parameters
CRC_LE_BITS and CRC_BE_BITS.

In the original version the values 1, 2, 4 and 8 respectively selected
versions of the alrogithm that computed the crc 1, 2, 4 and 32 bits as a
time.

This patch series adds a new version that computes the CRC 64 bits at a
time.  To make things easier to understand the parameter has been
reinterpreted to actually stand for the number of bits processed in each
step of the algorithm so that the old value 8 has been replaced with the
value 32.

This also allows us to add in a widely used crc algorithm that computes
the crc 8 bits at a time called the Sarwate algorithm.

[djwong@us.ibm.com: Minor changelog tweaks]
Signed-off-by: Bob Pearson <rpearson@systemfabricworks.com>
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:37 -07:00
Bob Pearson
ce4320ddda crc32: fix mixing of endian-specific types
crc32.c in its original version freely mixed u32, __le32 and __be32 types
which caused warnings from sparse with __CHECK_ENDIAN__.  This patch fixes
these by forcing the types to u32.

[djwong@us.ibm.com: Minor changelog tweaks]
Signed-off-by: Bob Pearson <rpearson@systemfabricworks.com>
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:37 -07:00
Bob Pearson
60e58d5c9d crc32: miscellaneous cleanups
Misc cleanup of lib/crc32.c and related files.

- remove unnecessary header files.

- straighten out some convoluted ifdef's

- rewrite some references to 2 dimensional arrays as 1 dimensional
  arrays to make them correct.  I.e.  replace tab[i] with tab[0][i].

- a few trivial whitespace changes

- fix a warning in gen_crc32tables.c caused by a mismatch in the type of
  the pointer passed to output table.  Since the table is only used at
  kernel compile time, it is simpler to make the table big enough to hold
  the largest column size used.  One cannot make the column size smaller
  in output_table because it has to be used by both the le and be tables
  and they can have different column sizes.

[djwong@us.ibm.com: Minor changelog tweaks]
Signed-off-by: Bob Pearson <rpearson@systemfabricworks.com>
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:37 -07:00
Bob Pearson
3863ef31dc crc32: simplify unit test code
Replace the unit test provided in crc32.c, which doesn't have a makefile
and doesn't compile with current headers, with a simpler self test
routine that also gives a measure of performance and runs at module init
time.  The self test option can be enabled through a configuration
option CONFIG_CRC32_SELFTEST.

The test stresses the pre and post loops and is thus not very realistic
since actual uses will likely have addresses and lengths that are at
least 4 byte aligned.  However, the main loop is long enough so that the
performance is dominated by that loop.

The expected values for crc32_le and crc32_be were generated with the
original version of crc32.c using CRC_BITS_LE = 8 and CRC_BITS_BE = 8.
These values were then used to check all the values of the BITS
parameters in both the original and new versions.

The performance results show some variability from run to run in spite
of attempts to both warm the cache and reduce the amount of OS noise by
limiting interrutps during the test.  To get comparable results and to
analyse options wrt performance the best time reported over a small
sample of runs has been taken.

[djwong@us.ibm.com: Minor changelog tweaks]
Signed-off-by: Bob Pearson <rpearson@systemfabricworks.com>
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:37 -07:00
Bob Pearson
fbedceb100 crc32: move long comment about crc32 fundamentals to Documentation/
Move a long comment from lib/crc32.c to Documentation/crc32.txt where it
will more likely get read.

Edited the resulting document to add an explanation of the slicing-by-n
algorithm.

[djwong@us.ibm.com: minor changelog tweaks]
[akpm@linux-foundation.org: fix typo, per George]
Signed-off-by: George Spelvin <linux@horizon.com>
Signed-off-by: Bob Pearson <rpearson@systemfabricworks.com>
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:37 -07:00
Bob Pearson
e30c7a8fcf crc32: remove two instances of trailing whitespaces
This patchset (re)uses Bob Pearson's crc32 slice-by-8 code to stamp out
a software crc32c implementation.  It removes the crc32c implementation
in crypto/ in favor of using the stamped-out one in lib/.  There is also
a change to Kconfig so that the kernel builder can pick an
implementation best suited for the hardware.

The motivation for this patchset is that I am working on adding full
metadata checksumming to ext4.  As far as performance impact of adding
checksumming goes, I see nearly no change with a standard mail server
ffsb simulation.  On a test that involves only file creation and
deletion and extent tree writes, I see a drop of about 50 pcercent with
the current kernel crc32c implementation; this improves to a drop of
about 20 percent with the enclosed crc32c code.

When metadata is usually a small fraction of total IO, this new
implementation doesn't help much because metadata is usually a small
fraction of total IO.  However, when we are doing IO that is almost all
metadata (such as rm -rf'ing a tree), then this patch speeds up the
operation substantially.

Incidentally, given that iscsi, sctp, and btrfs also use crc32c, this
patchset should improve their speed as well.  I have not yet quantified
that, however.  This latest submission combines Bob's patches from late
August 2011 with mine so that they can be one coherent patch set.
Please excuse my inability to combine some of the patches; I've been
advised to leave Bob's patches alone and build atop them instead.  :/

Since the last posting, I've also collected some crc32c test results on
a bunch of different x86/powerpc/sparc platforms.  The results can be
viewed here: http://goo.gl/sgt3i ; the "crc32-kern-le" and "crc32c"
columns describe the performance of the kernel's current crc32 and
crc32c software implementations.  The "crc32c-by8-le" column shows
crc32c performance with this patchset applied.  I expect crc32
performance to be roughly the same.

The two _boost columns at the right side of the spreadsheet shows how much
faster the new implementation is over the old one.  As you can see, crc32
rises substantially, and crc32c experiences a huge increase.

This patch:

- remove trailing whitespace from lib/crc32.c
- remove trailing whitespace from lib/crc32defs.h

[djwong@us.ibm.com: changelog tweaks]
Signed-off-by: Bob Pearson <rpearson@systemfabricworks.com>
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:37 -07:00
Xiao Guangrong
97e834c504 prio_tree: introduce prio_set_parent()
Introduce prio_set_parent() to abstract the operation which is used to
attach the node to its parent.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:36 -07:00
Xiao Guangrong
742245d5c2 prio_tree: simplify prio_tree_expand()
In current code, the deleted-node is recorded from first to last,
actually, we can directly attach these node on 'node' we will insert as
the left child, it can let the code more readable.

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:36 -07:00
Xiao Guangrong
f35368dd1c prio_tree: cleanup prio_tree_left()/prio_tree_right()
Introduce iter_walk_down()/iter_walk_up() to remove the common code
between prio_tree_left() and prio_tree_right().

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:36 -07:00
Xiao Guangrong
f42240d729 prio_tree: remove unnecessary code in prio_tree_replace
Remove the code since 'node' has already been initialized in the begin of
the function

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:36 -07:00
Akinobu Mita
f43804bf5f string: memchr_inv() speed improvements
- Generate a 64-bit pattern more efficiently

memchr_inv needs to generate a 64-bit pattern filled with a target
character.  The operation can be done by more efficient way.

- Don't call the slow check_bytes() if the memory area is 64-bit aligned

memchr_inv compares contiguous 64-bit words with the 64-bit pattern as
much as possible.  The outside of the region is checked by check_bytes()
that scans for each byte.  Unfortunately, the first 64-bit word is
unexpectedly scanned by check_bytes() even if the memory area is aligned
to a 64-bit boundary.

Both changes were originally suggested by Eric Dumazet.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:35 -07:00
Cong Wang
d314d74c69 nmi watchdog: do not use cpp symbol in Kconfig
ARCH_HAS_NMI_WATCHDOG is a macro defined by arch, but config
HARDLOCKUP_DETECTOR depends on it.  This is wrong, ARCH_HAS_NMI_WATCHDOG
has to be a Kconfig config, and arch's need it should select it
explicitly.

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Cc: David Howells <dhowells@redhat.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23 16:58:31 -07:00
Hugh Dickins
9f7de8275b idr: make idr_get_next() good for rcu_read_lock()
Make one small adjustment to idr_get_next(): take the height from the top
layer (stable under RCU) instead of from the root (unprotected by RCU), as
idr_find() does: so that it can be used with RCU locking.  Copied comment
on RCU locking from idr_find().

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-21 17:55:01 -07:00
Linus Torvalds
9f3938346a Merge branch 'kmap_atomic' of git://github.com/congwang/linux
Pull kmap_atomic cleanup from Cong Wang.

It's been in -next for a long time, and it gets rid of the (no longer
used) second argument to k[un]map_atomic().

Fix up a few trivial conflicts in various drivers, and do an "evil
merge" to catch some new uses that have come in since Cong's tree.

* 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
  feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
  highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
  drbd: remove the second argument of k[un]map_atomic()
  zcache: remove the second argument of k[un]map_atomic()
  gma500: remove the second argument of k[un]map_atomic()
  dm: remove the second argument of k[un]map_atomic()
  tomoyo: remove the second argument of k[un]map_atomic()
  sunrpc: remove the second argument of k[un]map_atomic()
  rds: remove the second argument of k[un]map_atomic()
  net: remove the second argument of k[un]map_atomic()
  mm: remove the second argument of k[un]map_atomic()
  lib: remove the second argument of k[un]map_atomic()
  power: remove the second argument of k[un]map_atomic()
  kdb: remove the second argument of k[un]map_atomic()
  udf: remove the second argument of k[un]map_atomic()
  ubifs: remove the second argument of k[un]map_atomic()
  squashfs: remove the second argument of k[un]map_atomic()
  reiserfs: remove the second argument of k[un]map_atomic()
  ocfs2: remove the second argument of k[un]map_atomic()
  ntfs: remove the second argument of k[un]map_atomic()
  ...
2012-03-21 09:40:26 -07:00
Linus Torvalds
4a52246302 driver core merge for 3.4-rc1
Here's the big driver core merge for 3.4-rc1.
 
 Lots of various things here, sysfs fixes/tweaks (with the nlink breakage
 reverted), dynamic debugging updates, w1 drivers, hyperv driver updates,
 and a variety of other bits and pieces, full information in the
 shortlog.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iEYEABECAAYFAk9neCsACgkQMUfUDdst+ylyQwCfY2eizvzw5HhjQs8gOiBRDADe
 yrgAnj1Zan2QkoCnQIFJNAoxqNX9yAhd
 =biH6
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core patches for 3.4-rc1 from Greg KH:
 "Here's the big driver core merge for 3.4-rc1.

  Lots of various things here, sysfs fixes/tweaks (with the nlink
  breakage reverted), dynamic debugging updates, w1 drivers, hyperv
  driver updates, and a variety of other bits and pieces, full
  information in the shortlog."

* tag 'driver-core-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (78 commits)
  Tools: hv: Support enumeration from all the pools
  Tools: hv: Fully support the new KVP verbs in the user level daemon
  Drivers: hv: Support the newly introduced KVP messages in the driver
  Drivers: hv: Add new message types to enhance KVP
  regulator: Support driver probe deferral
  Revert "sysfs: Kill nlink counting."
  uevent: send events in correct order according to seqnum (v3)
  driver core: minor comment formatting cleanups
  driver core: move the deferred probe pointer into the private area
  drivercore: Add driver probe deferral mechanism
  DS2781 Maxim Stand-Alone Fuel Gauge battery and w1 slave drivers
  w1_bq27000: Only one thread can access the bq27000 at a time.
  w1_bq27000 - remove w1_bq27000_write
  w1_bq27000: remove unnecessary NULL test.
  sysfs: Fix memory leak in sysfs_sd_setsecdata().
  intel_idle: Revert change of auto_demotion_disable_flags for Nehalem
  w1: Fix w1_bq27000
  driver-core: documentation: fix up Greg's email address
  powernow-k6: Really enable auto-loading
  powernow-k7: Fix CPU family number
  ...
2012-03-20 11:16:20 -07:00
Linus Torvalds
9c2b957db1 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events changes for v3.4 from Ingo Molnar:

 - New "hardware based branch profiling" feature both on the kernel and
   the tooling side, on CPUs that support it.  (modern x86 Intel CPUs
   with the 'LBR' hardware feature currently.)

   This new feature is basically a sophisticated 'magnifying glass' for
   branch execution - something that is pretty difficult to extract from
   regular, function histogram centric profiles.

   The simplest mode is activated via 'perf record -b', and the result
   looks like this in perf report:

	$ perf record -b any_call,u -e cycles:u branchy

	$ perf report -b --sort=symbol
	    52.34%  [.] main                   [.] f1
	    24.04%  [.] f1                     [.] f3
	    23.60%  [.] f1                     [.] f2
	     0.01%  [k] _IO_new_file_xsputn    [k] _IO_file_overflow
	     0.01%  [k] _IO_vfprintf_internal  [k] _IO_new_file_xsputn
	     0.01%  [k] _IO_vfprintf_internal  [k] strchrnul
	     0.01%  [k] __printf               [k] _IO_vfprintf_internal
	     0.01%  [k] main                   [k] __printf

   This output shows from/to branch columns and shows the highest
   percentage (from,to) jump combinations - i.e.  the most likely taken
   branches in the system.  "branches" can also include function calls
   and any other synchronous and asynchronous transitions of the
   instruction pointer that are not 'next instruction' - such as system
   calls, traps, interrupts, etc.

   This feature comes with (hopefully intuitive) flat ascii and TUI
   support in perf report.

 - Various 'perf annotate' visual improvements for us assembly junkies.
   It will now recognize function calls in the TUI and by hitting enter
   you can follow the call (recursively) and back, amongst other
   improvements.

 - Multiple threads/processes recording support in perf record, perf
   stat, perf top - which is activated via a comma-list of PIDs:

	perf top -p 21483,21485
	perf stat -p 21483,21485 -ddd
	perf record -p 21483,21485

 - Support for per UID views, via the --uid paramter to perf top, perf
   report, etc.  For example 'perf top --uid mingo' will only show the
   tasks that I am running, excluding other users, root, etc.

 - Jump label restructurings and improvements - this includes the
   factoring out of the (hopefully much clearer) include/linux/static_key.h
   generic facility:

	struct static_key key = STATIC_KEY_INIT_FALSE;

	...

	if (static_key_false(&key))
	        do unlikely code
	else
	        do likely code

	...
	static_key_slow_inc();
	...
	static_key_slow_inc();
	...

   The static_key_false() branch will be generated into the code with as
   little impact to the likely code path as possible.  the
   static_key_slow_*() APIs flip the branch via live kernel code patching.

   This facility can now be used more widely within the kernel to
   micro-optimize hot branches whose likelihood matches the static-key
   usage and fast/slow cost patterns.

 - SW function tracer improvements: perf support and filtering support.

 - Various hardenings of the perf.data ABI, to make older perf.data's
   smoother on newer tool versions, to make new features integrate more
   smoothly, to support cross-endian recording/analyzing workflows
   better, etc.

 - Restructuring of the kprobes code, the splitting out of 'optprobes',
   and a corner case bugfix.

 - Allow the tracing of kernel console output (printk).

 - Improvements/fixes to user-space RDPMC support, allowing user-space
   self-profiling code to extract PMU counts without performing any
   system calls, while playing nice with the kernel side.

 - 'perf bench' improvements

 - ... and lots of internal restructurings, cleanups and fixes that made
   these features possible.  And, as usual this list is incomplete as
   there were also lots of other improvements

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (120 commits)
  perf report: Fix annotate double quit issue in branch view mode
  perf report: Remove duplicate annotate choice in branch view mode
  perf/x86: Prettify pmu config literals
  perf report: Enable TUI in branch view mode
  perf report: Auto-detect branch stack sampling mode
  perf record: Add HEADER_BRANCH_STACK tag
  perf record: Provide default branch stack sampling mode option
  perf tools: Make perf able to read files from older ABIs
  perf tools: Fix ABI compatibility bug in print_event_desc()
  perf tools: Enable reading of perf.data files from different ABI rev
  perf: Add ABI reference sizes
  perf report: Add support for taken branch sampling
  perf record: Add support for sampling taken branch
  perf tools: Add code to support PERF_SAMPLE_BRANCH_STACK
  x86/kprobes: Split out optprobe related code to kprobes-opt.c
  x86/kprobes: Fix a bug which can modify kernel code permanently
  x86/kprobes: Fix instruction recovery on optimized path
  perf: Add callback to flush branch_stack on context switch
  perf: Disable PERF_SAMPLE_BRANCH_* when not supported
  perf/x86: Add LBR software filter support for Intel CPUs
  ...
2012-03-20 10:29:15 -07:00
Linus Torvalds
5928a2b60c Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU changes for v3.4 from Ingo Molnar.  The major features of this
series are:

 - making RCU more aggressive about entering dyntick-idle mode in order
   to improve energy efficiency

 - converting a few more call_rcu()s to kfree_rcu()s

 - applying a number of rcutree fixes and cleanups to rcutiny

 - removing CONFIG_SMP #ifdefs from treercu

 - allowing RCU CPU stall times to be set via sysfs

 - adding CPU-stall capability to rcutorture

 - adding more RCU-abuse diagnostics

 - updating documentation

 - fixing yet more issues located by the still-ongoing top-to-bottom
   inspection of RCU, this time with a special focus on the CPU-hotplug
   code path.

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
  rcu: Stop spurious warnings from synchronize_sched_expedited
  rcu: Hold off RCU_FAST_NO_HZ after timer posted
  rcu: Eliminate softirq-mediated RCU_FAST_NO_HZ idle-entry loop
  rcu: Add RCU_NONIDLE() for idle-loop RCU read-side critical sections
  rcu: Allow nesting of rcu_idle_enter() and rcu_idle_exit()
  rcu: Remove redundant check for rcu_head misalignment
  PTR_ERR should be called before its argument is cleared.
  rcu: Convert WARN_ON_ONCE() in rcu_lock_acquire() to lockdep
  rcu: Trace only after NULL-pointer check
  rcu: Call out dangers of expedited RCU primitives
  rcu: Rework detection of use of RCU by offline CPUs
  lockdep: Add CPU-idle/offline warning to lockdep-RCU splat
  rcu: No interrupt disabling for rcu_prepare_for_idle()
  rcu: Move synchronize_sched_expedited() to rcutree.c
  rcu: Check for illegal use of RCU from offlined CPUs
  rcu: Update stall-warning documentation
  rcu: Add CPU-stall capability to rcutorture
  rcu: Make documentation give more realistic rcutorture duration
  rcutorture: Permit holding off CPU-hotplug operations during boot
  rcu: Print scheduling-clock information on RCU CPU stall-warning messages
  ...
2012-03-20 10:10:18 -07:00
Cong Wang
c3eede8e0a lib: remove the second argument of k[un]map_atomic()
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:27 +08:00
Ingo Molnar
35239e23c6 Merge branch 'perf/urgent' into perf/core
Merge reason: We are going to queue up a dependent patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-12 20:44:11 +01:00
Tom Herbert
930c514f69 dql: Fix undefined jiffies
In some configurations, jiffies may be undefined in
lib/dynamic_queue_limits.c.  Adding include of jiffies.h to avoid
this.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-11 19:59:43 -07:00
Greg Kroah-Hartman
263a5c8e16 Merge 3.3-rc6 into driver-core-next
This was done to resolve a conflict in the drivers/base/cpu.c file.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-09 12:35:53 -08:00
Andrew Vagin
7b60a18da3 uevent: send events in correct order according to seqnum (v3)
The queue handling in the udev daemon assumes that the events are
ordered.

Before this patch uevent_seqnum is incremented under sequence_lock,
than an event is send uner uevent_sock_mutex. I want to say that code
contained a window between incrementing seqnum and sending an event.

This patch locks uevent_sock_mutex before incrementing uevent_seqnum.

v2: delete sequence_lock, uevent_seqnum is protected by uevent_sock_mutex
v3: unlock the mutex before the goto exit

Thanks for Kay for the comments.

Signed-off-by: Andrew Vagin <avagin@openvz.org>
Tested-By: Kay Sievers <kay.sievers@vrfy.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-08 12:56:40 -08:00
Paul Gortmaker
8bc3bcc93a lib: reduce the use of module.h wherever possible
For files only using THIS_MODULE and/or EXPORT_SYMBOL, map
them onto including export.h -- or if the file isn't even
using those, then just delete the include.  Fix up any implicit
include dependencies that were being masked by module.h along
the way.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-03-07 15:04:04 -05:00
Jan Beulich
5756b76e4d vsprintf: make %pV handling compatible with kasprintf()
kasprintf() (and potentially other functions that I didn't run across so
far) want to evaluate argument lists twice.  Caring to do so for the
primary list is obviously their job, but they can't reasonably be
expected to check the format string for instances of %pV, which however
need special handling too: On architectures like x86-64 (as opposed to
e.g.  ix86), using the same argument list twice doesn't produce the
expected results, as an internally managed cursor gets updated during
the first run.

Fix the problem by always acting on a copy of the original list when
handling %pV.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-06 08:22:26 -08:00
Stephen Boyd
9f78ff005a debugobjects: Fix selftest for static warnings
debugobjects is now printing a warning when a fixup for a NOTAVAILABLE
object is run.  This causes the selftest to fail like:

	ODEBUG: selftest warnings failed 4 != 5

We could just increase the number of warnings that the selftest is
expecting to see because that is actually what has changed.  But, it turns
out that fixup_activate() was written with inverted logic and thus a fixup
for a static object returned 1 indicating the object had been fixed, and 0
otherwise.  Fix the logic to be correct and update the counts to reflect
that nothing needed fixing for a static object.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05 15:49:43 -08:00
Ingo Molnar
737f24bda7 Merge branch 'perf/urgent' into perf/core
Conflicts:
	tools/perf/builtin-record.c
	tools/perf/builtin-top.c
	tools/perf/perf.h
	tools/perf/util/top.h

Merge reason: resolve these cherry-picking conflicts.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-05 09:20:08 +01:00
Paul Gortmaker
50af5ead3b bug.h: add include of it to various implicit C users
With bug.h currently living right in linux/kernel.h there
are files that use BUG_ON and friends but are not including
the header explicitly.  Fix them up so we can remove the
presence in kernel.h file.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-02-29 17:15:08 -05:00
Paul Gortmaker
b116ee4d77 lib: fix implicit users of kernel.h for TAINT_WARN
A pending header cleanup will cause this to show up as:

lib/average.c:38: error: 'TAINT_WARN' undeclared (first use in this function)
lib/list_debug.c:24: error: 'TAINT_WARN' undeclared (first use in this function)

and TAINT_WARN comes from include/linux/kernel.h file.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-02-28 17:49:27 -05:00
Paul E. McKenney
a858af2875 rcu: Print scheduling-clock information on RCU CPU stall-warning messages
There have been situations where RCU CPU stall warnings were caused by
issues in scheduling-clock timer initialization.  To make it easier to
track these down, this commit causes the RCU CPU stall-warning messages
to print out the number of scheduling-clock interrupts taken in the
current grace period for each stalled CPU.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:03:49 -08:00
Paul E. McKenney
5c8806a037 rcu: Move RCU_TRACE to lib/Kconfig.debug
The RCU_TRACE kernel parameter has always been intended for debugging,
not for production use.  Formalize this by moving RCU_TRACE from
init/Kconfig to lib/Kconfig.debug.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21 09:03:26 -08:00
Fernando Luis Vázquez Cao
5f32908943 watchdog: Update Kconfig entries
The soft and hard lockup thresholds have changed so the
corresponding Kconfig entries need to be updated accordingly.
Add a reference to  watchdog_thresh while at it.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/r/1328827342-6253-2-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-11 15:11:30 +01:00
David Howells
690d137f44 Reduce the number of expensive division instructions done by _parse_integer()
_parse_integer() does one or two division instructions (which are slow)
per digit parsed to perform the overflow check.

Furthermore, these are particularly expensive examples of division
instruction as the number of clock cycles required to complete them may
go up with the position of the most significant set bit in the dividend:

	if (*res > div_u64(ULLONG_MAX - val, base))

which is as maximal as possible.

Worse, on 32-bit arches, more than one of these division instructions
may be required per digit.

So, assuming we don't support a base of more than 16, skip the check if the
top nibble of the result is not set at this point.

Signed-off-by: David Howells <dhowells@redhat.com>
[ Changed it to not dereference the pointer all the time - even if the
  compiler can and does optimize it away, the code just looks cleaner.
  And edited the top nybble test slightly to make the code generated on
  x86-64 better in the loop - test against a hoisted constant instead of
  shifting and testing the result ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-09 10:09:30 -08:00
Linus Torvalds
95025d6b27 arch: fix ioport mapping on mips,sh
Kevin Cernekee reported that recent cleanup
 that replaced pci_iomap with a generic function
 failed to take into account the differences
 in io port handling on mips and sh architectures.
 
 Rather than revert the changes reintroducing the
 code duplication, this patchset fixes this
 by adding ability for architectures to override
 ioport mapping for pci devices.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQEcBAABAgAGBQJPL9X8AAoJECgfDbjSjVRpgzgIAIQGeWIP0JUWDDhLfYscTBZx
 96Am+8+maVl12t+evVH8lMnJDSqjKH0kWxk6CTQSUo57gZ4ne1SxbZ0+s5DcsE6m
 XAnIvkA+4pw36l4QRkEj8g+yrhpQhqaiKJt/l80jaVFGVAw47WrxGKatUe9L90lI
 X7+xa/F5zvZO6oamEI94SAojIvmKkZfHIjGc/NaZLaWHRysdFf8Ek13mj2+9FLq3
 dxmg9F14eS2X59tIkN4BLM4Dq8UyZqraT0N/0bO0Tetqx0azzNZbsBsg5RwQ15IF
 Ei0dMFARoT9UcIpdSwtpGGCoqYa5yRHFT1g54hv/Pon0mKUjG7Fpz5LRWmZ5Xic=
 =b1R2
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

arch: fix ioport mapping on mips,sh

Kevin Cernekee reported that recent cleanup that replaced pci_iomap with
a generic function failed to take into account the differences in io
port handling on mips and sh architectures.

Rather than revert the changes reintroducing the code duplication, this
patchset fixes this by adding ability for architectures to override
ioport mapping for pci devices.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  sh: use the the PCI channels's io_map_base
  mips: use the the PCI controller's io_map_base
  lib: add NO_GENERIC_PCI_IOPORT_MAP
2012-02-07 14:32:24 -08:00
Linus Torvalds
2f2fde9272 Merge branches 'core-urgent-for-linus', 'perf-urgent-for-linus', 'sched-urgent-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  bugs, x86: Fix printk levels for panic, softlockups and stack dumps

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf top: Fix number of samples displayed
  perf tools: Fix strlen() bug in perf_event__synthesize_event_type()
  perf tools: Fix broken build by defining _GNU_SOURCE in Makefile
  x86/dumpstack: Remove unneeded check in dump_trace()
  perf: Fix broken interrupt rate throttling

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/rt: Fix task stack corruption under __ARCH_WANT_INTERRUPTS_ON_CTXSW
  sched: Fix ancient race in do_exit()
  sched/nohz: Fix nohz cpu idle load balancing state with cpu hotplug
  sched/s390: Fix compile error in sched/core.c
  sched: Fix rq->nr_uninterruptible update race

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/reboot: Remove VersaLogic Menlow reboot quirk
  x86/reboot: Skip DMI checks if reboot set by user
  x86: Properly parenthesize cmpxchg() macro arguments
2012-02-02 11:11:13 -08:00
David Miller
a99e7e5f36 lib: Fix 32-bit sparc udiv_qrnnd() definition in mpilib's longlong.h
This copy of longlong.h is extremely dated and results in compile
errors on sparc32 when MPILIB is enabled, copy over the more uptodate
implementation from arch/sparc/math/sfp-util_32.h

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: James Morris <jmorris@namei.org>
2012-02-02 10:34:25 +11:00
David Miller
c6df4b17c8 lib: Fix multiple definitions of clz_tab
Both sparc 32-bit's software divide assembler and MPILIB provide
clz_tab[] with identical contents.

Break it out into a seperate object file and select it when
SPARC32 or MPILIB is set.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: James Morris <jmorris@namei.org>
2012-02-02 10:34:23 +11:00