Follow the practice described in section 8.2.2 of RFC5661: When sending a
read/write or setattr stateid, set the seqid field to zero in order to
signal that the NFS server should apply the most recent locking state.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If we replay a READ or WRITE call, we should not be changing the
stateid. Currently, we may end up doing so, because the stateid
is only selected at xdr encode time.
This patch ensures that we select the stateid after we get an NFSv4.1
session slot, and that we keep that same stateid across retries.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the state recovery failed, we want to ensure that the application
doesn't try to use the same file descriptor for more reads or writes.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Pull more VFS bits from Al Viro:
"Unfortunately, it looks like xattr series will have to wait until the
next cycle ;-/
This pile contains 9p cleanups and fixes (races in v9fs_fid_add()
etc), fixup for nommu breakage in shmem.c, several cleanups and a bit
more file_inode() work"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
constify path_get/path_put and fs_struct.c stuff
fix nommu breakage in shmem.c
cache the value of file_inode() in struct file
9p: if v9fs_fid_lookup() gets to asking server, it'd better have hashed dentry
9p: make sure ->lookup() adds fid to the right dentry
9p: untangle ->lookup() a bit
9p: double iput() in ->lookup() if d_materialise_unique() fails
9p: v9fs_fid_add() can't fail now
v9fs: get rid of v9fs_dentry
9p: turn fid->dlist into hlist
9p: don't bother with private lock in ->d_fsdata; dentry->d_lock will do just fine
more file_inode() open-coded instances
selinux: opened file can't have NULL or negative ->f_path.dentry
(In the meantime, the hlist traversal macros have changed, so this
required a semantic conflict fixup for the newly hlistified fid->dlist)
This adds core architecture support for Imagination's Meta processor
cores, followed by some later miscellaneous arch/metag cleanups and
fixes which I kept separate to ease review:
- Support for basic Meta 1 (ATP) and Meta 2 (HTP) core architecture
- A few fixes all over, particularly for symbol prefixes
- A few privilege protection fixes
- Several cleanups (setup.c includes, split out a lot of metag_ksyms.c)
- Fix some missing exports
- Convert hugetlb to use vm_unmapped_area()
- Copy device tree to non-init memory
- Provide dma_get_sgtable()
Signed-off-by: James Hogan <james.hogan@imgtec.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJRMmVXAAoJEKHZs+irPybfivgP/inEXqJyfw59omQdjwvYcU/a
/u0MJ3UKSNS3U+HknfaFCy/Nwk1dqPLjqqyVC1V6AbUPBXlaEwGcimlNRx2uRjdq
Uh96upMLHsNuF/xiiR477g3RwY0egIJdM1R1bGi3mZ3vVrNQGF+wbni6f61xCWGz
M/4rDglpQvE79oLhYdgj6tidZtHQT0YWtERA9W90zkQWXGYmpFPKBKbfZAi5+rKQ
U6Gpg26orUugzXNaxltJEYKE8gjLTppEabx8DARnItZ4zCMy4dw5RBJ35RFvQw6e
eSmfgTy9w9WqBMY2+QMSgU0KQt1IITCzX7OlOXC0jALQJXoU0WWbOELlBVQLCwF1
T0OcR/5ZP/hIlOk5Dh+e9U3AtbASXdMtqA0ZUe78woH1CBf7Nc/0c0vRg23EdMh8
lnHDJxT/UqskoOYLI4kgWbEdLDy4uTh19U2pVi7VCo7ksLB9Bj9Xc8VSKgscSXTl
OwzN+c4Jgtu8FDFTp+Af4AT8pYGJ08j8L2ErsV2sOv3Q44U5WXdrMz3GSgwXj8+4
wZk3HvdkQVkMD5sJCUZgAswaN6BnbB0pHdCz4wMQ8jR/Ogs015Ipk64Ecym9S/4n
uES7PnDtt/4lb5EyX2ScbvdnZTAFTaaP7OOhC77BOQvbQjIW1tkAcxWJqRry86uS
iM0BFgK6Ohx3geqa5Ft0
=65cR
-----END PGP SIGNATURE-----
Merge tag 'metag-v3.9-rc1-v4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag
Pull new ImgTec Meta architecture from James Hogan:
"This adds core architecture support for Imagination's Meta processor
cores, followed by some later miscellaneous arch/metag cleanups and
fixes which I kept separate to ease review:
- Support for basic Meta 1 (ATP) and Meta 2 (HTP) core architecture
- A few fixes all over, particularly for symbol prefixes
- A few privilege protection fixes
- Several cleanups (setup.c includes, split out a lot of
metag_ksyms.c)
- Fix some missing exports
- Convert hugetlb to use vm_unmapped_area()
- Copy device tree to non-init memory
- Provide dma_get_sgtable()"
* tag 'metag-v3.9-rc1-v4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag: (61 commits)
metag: Provide dma_get_sgtable()
metag: prom.h: remove declaration of metag_dt_memblock_reserve()
metag: copy devicetree to non-init memory
metag: cleanup metag_ksyms.c includes
metag: move mm/init.c exports out of metag_ksyms.c
metag: move usercopy.c exports out of metag_ksyms.c
metag: move setup.c exports out of metag_ksyms.c
metag: move kick.c exports out of metag_ksyms.c
metag: move traps.c exports out of metag_ksyms.c
metag: move irq enable out of irqflags.h on SMP
genksyms: fix metag symbol prefix on crc symbols
metag: hugetlb: convert to vm_unmapped_area()
metag: export clear_page and copy_page
metag: export metag_code_cache_flush_all
metag: protect more non-MMU memory regions
metag: make TXPRIVEXT bits explicit
metag: kernel/setup.c: sort includes
perf: Enable building perf tools for Meta
metag: add boot time LNKGET/LNKSET check
metag: add __init to metag_cache_probe()
...
Pull watchdog updates from Wim Van Sebroeck:
"This contains:
- fixes and improvements
- devicetree bindings
- conversion to watchdog generic framework of the following drivers:
- booke_wdt
- bcm47xx_wdt.c
- at91sam9_wdt
- Removal of old STMP3xxx driver
- Addition of following new drivers:
- new driver for STMP3xxx and i.MX23/28
- Retu watchdog driver"
* git://www.linux-watchdog.org/linux-watchdog: (30 commits)
watchdog: sp805_wdt depends on ARM
watchdog: davinci_wdt: update to devm_* API
watchdog: davinci_wdt: use devm managed clk get
watchdog: at91rm9200: add DT support
watchdog: add timeout-sec property binding
watchdog: at91sam9_wdt: Convert to use the watchdog framework
watchdog: omap_wdt: Add option nowayout
watchdog: core: dt: add support for the timeout-sec dt property
watchdog: bcm47xx_wdt.c: add hard timer
watchdog: bcm47xx_wdt.c: rename wdt_time to timeout
watchdog: bcm47xx_wdt.c: rename ops methods
watchdog: bcm47xx_wdt.c: use platform device
watchdog: bcm47xx_wdt.c: convert to watchdog core api
watchdog: Convert BookE watchdog driver to watchdog infrastructure
watchdog: s3c2410_wdt: Use devm_* functions
watchdog: remove old STMP3xxx driver
watchdog: add new driver for STMP3xxx and i.MX23/28
rtc: stmp3xxx: add wdt-accessor function
watchdog: introduce retu_wdt driver
watchdog: intel_scu_watchdog: fix Kconfig dependency
...
Pull second set of slave-dmaengine updates from Vinod Koul:
"Arnd's patch moves the dw_dmac to use generic DMA binding. I agreed
to merge this late as it will avoid the conflicts between trees.
The second patch from Matt adding a dma_request_slave_channel_compat
API was supposed to be picked up, but somehow never got picked up.
Some patches dependent on this are already in -next :("
* 'next' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: dw_dmac: move to generic DMA binding
dmaengine: add dma_request_slave_channel_compat()
- Don't allow NFS silly-renamed files to be deleted
- Don't start the retransmission timer when out of socket space
- Fix a couple of pnfs-related Oopses.
- Fix one more NFSv4 state recovery deadlock
- Don't loop forever when LAYOUTGET returns NFS4ERR_LAYOUTTRYLATER
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJRMpMhAAoJEGcL54qWCgDy4BMP/0Zl7Ei7x9bJSb1C1lpPSo5p
Lr9XoHLYqhPcAwKUXQfgM5IkC69bE62bD5esmdDqkgZYqnmGE0E4LG6MsbsMmvzk
yug5WOxmjOFee7Bdpd8B86Z0qsa4l2TkQu2h9G3zE36P2rPKQaNzpteIjhis5UEQ
EfNyLoBdFcuUSh4ztMVZOzbAyDcbNfsyl03XVmlv+Qn/o0l42Zjth0qwOP60bjuM
zJF1CkHi5NLbXEhmOev9mA6UYz6zWRbiA/Yu92pomtXVDtOtzWpUniBIcf/S1ZH/
V8Gj6bWdHHyFCa2PjhY1/QdLBOPRPdxpAAJk+q48AKmzyiOU6g3lIHBp5ai1WZNI
1C+SYxABE/EJgq9SoQYGqq6SUiolrFulqnFHXF0jHF+ifdjoHjSRmpGQAoyoZ0k1
aSl+Ojqx7QHibJd8GZBavWc3upRDzhHDRRB3tkQCENi+hryBZxEyeS2Z54NmBRUN
tsOuyac6rtknZdD8Do4DMt9uc9u1DWicaiZbLfkP1VL1Angh6NKSA7qbmH6giLBS
9Y+DPcIk5e34uKQ21WTxFydGD+SMg0EMnOmfr6EYXWEHBhKNYVR+cHyH0mAF6RzX
enU2g0H2m+3vUQqajPUP0DV/eLGtdsvWvMjiskc3KX90CWfHmV2C8GFSxjV2OkT1
vG1KFrICO6DR2943Udit
=FMtb
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"We've just concluded another Connectathon interoperability testing
week, and so here are the fixes for the bugs that were discovered:
- Don't allow NFS silly-renamed files to be deleted
- Don't start the retransmission timer when out of socket space
- Fix a couple of pnfs-related Oopses.
- Fix one more NFSv4 state recovery deadlock
- Don't loop forever when LAYOUTGET returns NFS4ERR_LAYOUTTRYLATER"
* tag 'nfs-for-3.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: One line comment fix
NFSv4.1: LAYOUTGET EDELAY loops timeout to the MDS
SUNRPC: add call to get configured timeout
PNFS: set the default DS timeout to 60 seconds
NFSv4: Fix another open/open_recovery deadlock
nfs: don't allow nfs_find_actor to match inodes of the wrong type
NFSv4.1: Hold reference to layout hdr in layoutget
pnfs: fix resend_to_mds for directio
SUNRPC: Don't start the retransmission timer when out of socket space
NFS: Don't allow NFS silly-renamed files to be deleted, no signal
Pull btrfs update from Chris Mason:
"The biggest feature in the pull is the new (and still experimental)
raid56 code that David Woodhouse started long ago. I'm still working
on the parity logging setup that will avoid inconsistent parity after
a crash, so this is only for testing right now. But, I'd really like
to get it out to a broader audience to hammer out any performance
issues or other problems.
scrub does not yet correct errors on raid5/6 either.
Josef has another pass at fsync performance. The big change here is
to combine waiting for metadata with waiting for data, which is a big
latency win. It is also step one toward using atomics from the
hardware during a commit.
Mark Fasheh has a new way to use btrfs send/receive to send only the
metadata changes. SUSE is using this to make snapper more efficient
at finding changes between snapshosts.
Snapshot-aware defrag is also included.
Otherwise we have a large number of fixes and cleanups. Eric Sandeen
wins the award for removing the most lines, and I'm hoping we steal
this idea from XFS over and over again."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (118 commits)
btrfs: fixup/remove module.h usage as required
Btrfs: delete inline extents when we find them during logging
btrfs: try harder to allocate raid56 stripe cache
Btrfs: cleanup to make the function btrfs_delalloc_reserve_metadata more logic
Btrfs: don't call btrfs_qgroup_free if just btrfs_qgroup_reserve fails
Btrfs: remove reduplicate check about root in the function btrfs_clean_quota_tree
Btrfs: return ENOMEM rather than use BUG_ON when btrfs_alloc_path fails
Btrfs: fix missing deleted items in btrfs_clean_quota_tree
btrfs: use only inline_pages from extent buffer
Btrfs: fix wrong reserved space when deleting a snapshot/subvolume
Btrfs: fix wrong reserved space in qgroup during snap/subv creation
Btrfs: remove unnecessary dget_parent/dput when creating the pending snapshot
btrfs: remove a printk from scan_one_device
Btrfs: fix NULL pointer after aborting a transaction
Btrfs: fix memory leak of log roots
Btrfs: copy everything if we've created an inline extent
btrfs: cleanup for open-coded alignment
Btrfs: do not change inode flags in rename
Btrfs: use reserved space for creating a snapshot
clear chunk_alloc flag on retryable failure
...
* misc clean-ups in the MTD command-line partitioning parser (cmdlinepart)
* add flash locking support for STmicro chips serial flash chips, as well as
for CFI command set 2 chips.
* new driver for the ELM error correction HW module found in various TI chips,
enable the OMAP NAND driver to use the ELM HW error correction
* added number of new serial flash IDs
* various fixes and improvements in the gpmi NAND driver
* bcm47xx NAND driver improvements
* make the mtdpart module actually removable
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iEYEABECAAYFAlExEs8ACgkQdwG7hYl686Oa+gCgiBNt+I0MiixDeN+MGuE1uw9s
i2wAniD9sR2HDy6y5SkbdXK/aPr8ez/p
=xV1l
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20130301' of git://git.infradead.org/linux-mtd
Pull MTD update from David Woodhouse:
"Fairly unexciting MTD merge for 3.9:
- misc clean-ups in the MTD command-line partitioning parser
(cmdlinepart)
- add flash locking support for STmicro chips serial flash chips, as
well as for CFI command set 2 chips.
- new driver for the ELM error correction HW module found in various
TI chips, enable the OMAP NAND driver to use the ELM HW error
correction
- added number of new serial flash IDs
- various fixes and improvements in the gpmi NAND driver
- bcm47xx NAND driver improvements
- make the mtdpart module actually removable"
* tag 'for-linus-20130301' of git://git.infradead.org/linux-mtd: (45 commits)
mtd: map: BUG() in non handled cases
mtd: bcm47xxnflash: use pr_fmt for module prefix in messages
mtd: davinci_nand: Use managed resources
mtd: mtd_torturetest can cause stack overflows
mtd: physmap_of: Convert device allocation to managed devm_kzalloc()
mtd: at91: atmel_nand: for PMECC, add code to check the ONFI parameter ECC requirement.
mtd: atmel_nand: make pmecc-cap, pmecc-sector-size in dts is optional.
mtd: atmel_nand: avoid to report an error when lookup table offset is 0.
mtd: bcm47xxsflash: adjust names of bus-specific functions
mtd: bcm47xxpart: improve probing of nvram partition
mtd: bcm47xxpart: add support for other erase sizes
mtd: bcm47xxnflash: register this as normal driver
mtd: bcm47xxnflash: fix message
mtd: bcm47xxsflash: register this as normal driver
mtd: bcm47xxsflash: write number of written bytes
mtd: gpmi: add sanity check for the ECC
mtd: gpmi: set the Golois Field bit for mx6q's BCH
mtd: devices: elm: Removes <xx> literals in elm DT node
mtd: gpmi: fix a dereferencing freed memory error
mtd: fix the wrong timeo for panic_nand_wait()
...
Commit cc2383ec06 ("mm: introduce
arch-specific vma flag VM_ARCH_1") merged in v3.7-rc1.
The above commit combined several arch-specific vma flags into one, and
in the process it changed the VM_GROWSUP definition to depend on
specific architectures rather than CONFIG_STACK_GROWSUP. Therefore add
an ifdef for CONFIG_METAG to also set VM_GROWSUP.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-mm@kvack.org
Meta core internal interrupts (from HWSTATMETA and friends) are vectored
onto the TR1 core trigger for the current thread. This is demultiplexed
in irq-metag.c to individual Linux IRQs for each internal interrupt.
External SoC interrupts (from HWSTATEXT and friends) are vectored onto
the TR2 core trigger for the current thread. This is demultiplexed in
irq-metag-ext.c to individual Linux IRQs for each external SoC interrupt.
The external irqchip has devicetree bindings for configuring the number
of irq banks and the type of masking available.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Rob Landley <rob@landley.net>
Cc: Dom Cobley <popcornmix@gmail.com>
Cc: Simon Arlott <simon@fire.lp0.eu>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
Cc: devicetree-discuss@lists.ozlabs.org
Cc: linux-doc@vger.kernel.org
an SSD to be used as a cache in front of a slower device. Cache
tuning is delegated to interchangeable policy modules so these can
be developed independently of the mechanics needed to shuffle the
data around.
Other than that, kcopyd users acquire a throttling parameter, ioctl
buffer usage gets streamlined, more mempool reliance is reduced
and there are a few other bug fixes and tidy-ups.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRMTADAAoJEK2W1qbAHj1nAiUP/1Az6kkg+pYQAkD2f3aQLHi8
b3SfkZ5mMSZkBYRUuMzZ5oMuSVZuyoRQi8+qy5PsUSDYO8jH2ja825jOa584OqVW
XYBbNi6Te/1L+CMWWCYf5ShyZuu4o5TwIV7IWbaPsVWIwflaTF9WRyEjxG3WRM3X
XglrGJqb321Gx2ZI3OwxIPGyXRQWFTCmZ5pshAVAedspE86tKRh2kx3dgLw4qcu/
3YYMV8DBga+xgNVM/EuoXQLsPpuxdrsSfdYZo7cTvKevgEuO+0vgWZORjCWFfICd
ypwco9FHAU5dSjEiQ0rXO5Lhbv3Dx5vfm+vrzPSKUp4WVFhZKLHOZEFDylluEidv
jX7a5RohUj/mDOTp+tgZ1GbpTWXtjX9uhoQWbeqZmz189TYX+twnVaNR47Q+YjcL
VtYP7SgsUXTzdGJJ/RtGM6zhlgALI5dC6smNGbHLcyFcmIKIAa6LtyE86iPw14DY
axKfNwL/c5PmtcmAYxIBx6CDK+M+MunK50/BtX4pOrQyzgbyxtylvrgq/BzvI5m3
FwMkcziDMA0Dhwi1PpXz+MyOnVvGhEb6CrzYrylB98aFyDw2YvwUytBvisTmwSoJ
YNn4vwCwJIWetu0ls5SZERLHPOh3rjwaufq9nVXew/FXF8gNjrOTZLHz3XtZbtw5
TMntpytLQ5N9A7bRPuDz
=yXuO
-----END PGP SIGNATURE-----
Merge tag 'dm-3.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm
Pull device-mapper update from Alasdair G Kergon:
"The main addition here is a long-desired target framework to allow an
SSD to be used as a cache in front of a slower device. Cache tuning
is delegated to interchangeable policy modules so these can be
developed independently of the mechanics needed to shuffle the data
around.
Other than that, kcopyd users acquire a throttling parameter, ioctl
buffer usage gets streamlined, more mempool reliance is reduced and
there are a few other bug fixes and tidy-ups."
* tag 'dm-3.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: (30 commits)
dm cache: add cleaner policy
dm cache: add mq policy
dm: add cache target
dm persistent data: add bitset
dm persistent data: add transactional array
dm thin: remove cells from stack
dm bio prison: pass cell memory in
dm persistent data: add btree_walk
dm: add target num_write_bios fn
dm kcopyd: introduce configurable throttling
dm ioctl: allow message to return data
dm ioctl: optimize functions without variable params
dm ioctl: introduce ioctl_flags
dm: merge io_pool and tio_pool
dm: remove unused _rq_bio_info_cache
dm: fix limits initialization when there are no data devices
dm snapshot: add missing module aliases
dm persistent data: set some btree fn parms const
dm: refactor bio cloning
dm: rename bio cloning functions
...
Tim found:
WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
Hardware name: S2600CP
sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
smpboot: Booting Node 1, Processors #1
Modules linked in:
Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
Call Trace:
set_cpu_sibling_map+0x279/0x449
start_secondary+0x11d/0x1e5
Don Morris reproduced on a HP z620 workstation, and bisected it to
commit e8d1955258 ("acpi, memory-hotplug: parse SRAT before memblock
is ready")
It turns out movable_map has some problems, and it breaks several things
1. numa_init is called several times, NOT just for srat. so those
nodes_clear(numa_nodes_parsed)
memset(&numa_meminfo, 0, sizeof(numa_meminfo))
can not be just removed. Need to consider sequence is: numaq, srat, amd, dummy.
and make fall back path working.
2. simply split acpi_numa_init to early_parse_srat.
a. that early_parse_srat is NOT called for ia64, so you break ia64.
b. for (i = 0; i < MAX_LOCAL_APIC; i++)
set_apicid_to_node(i, NUMA_NO_NODE)
still left in numa_init. So it will just clear result from early_parse_srat.
it should be moved before that....
c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
early before override from INITRD is settled.
3. that patch TITLE is total misleading, there is NO x86 in the title,
but it changes critical x86 code. It caused x86 guys did not
pay attention to find the problem early. Those patches really should
be routed via tip/x86/mm.
4. after that commit, following range can not use movable ram:
a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
b. initrd... it will be freed after booting, so it could be on movable...
c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
anymore.
d. init_mem_mapping: can not put page table high anymore.
e. initmem_init: vmemmap can not be high local node anymore. That is
not good.
If node is hotplugable, the mem related range like page table and
vmemmap could be on the that node without problem and should be on that
node.
We have workaround patch that could fix some problems, but some can not
be fixed.
So just remove that offending commit and related ones including:
f7210e6c4a ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
protect movablecore_map in memblock_overlaps_region().")
01a178a94e ("acpi, memory-hotplug: support getting hotplug info from
SRAT")
27168d38fa ("acpi, memory-hotplug: extend movablemem_map ranges to
the end of node")
e8d1955258 ("acpi, memory-hotplug: parse SRAT before memblock is
ready")
fb06bc8e5f ("page_alloc: bootmem limit with movablecore_map")
42f47e27e7 ("page_alloc: make movablemem_map have higher priority")
6981ec3114 ("page_alloc: introduce zone_movable_limit[] to keep
movable limit for nodes")
34b71f1e04 ("page_alloc: add movable_memmap kernel parameter")
4d59a75125 ("x86: get pg_data_t's memory from other node")
Later we should have patches that will make sure kernel put page table
and vmemmap on local node ram instead of push them down to node0. Also
need to find way to put other kernel used ram to local node ram.
Reported-by: Tim Gardner <tim.gardner@canonical.com>
Reported-by: Don Morris <don.morris@hp.com>
Bisected-by: Don Morris <don.morris@hp.com>
Tested-by: Don Morris <don.morris@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull signal/compat fixes from Al Viro:
"Fixes for several regressions introduced in the last signal.git pile,
along with fixing bugs in truncate and ftruncate compat (on just about
anything biarch at least one of those two had been done wrong)."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
compat: restore timerfd settime and gettime compat syscalls
[regression] braino in "sparc: convert to ksignal"
fix compat truncate/ftruncate
switch lseek to COMPAT_SYSCALL_DEFINE
lseek() and truncate() on sparc really need sign extension
Add a num_write_bios function to struct target.
If an instance of a target sets this, it will be queried before the
target's mapping function is called on a write bio, and the response
controls the number of copies of the write bio that the target will
receive.
This provides a convenient way for a target to send the same data to
more than one device. The new cache target uses this in writethrough
mode, to send the data both to the cache and the backing device.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch allows the administrator to reduce the rate at which kcopyd
issues I/O.
Each module that uses kcopyd acquires a throttle parameter that can be
set in /sys/module/*/parameters.
We maintain a history of kcopyd usage by each module in the variables
io_period and total_period in struct dm_kcopyd_throttle. The actual
kcopyd activity is calculated as a percentage of time equal to
"(100 * io_period / total_period)". This is compared with the user-defined
throttle percentage threshold and if it is exceeded, we sleep.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Use 'bio' in the name of variables and functions that deal with
bios rather than 'request' to avoid confusion with the normal
block layer use of 'request'.
No functional changes.
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Avoid returning a truncated table or status string instead of setting
the DM_BUFFER_FULL_FLAG when the last target of a table fills the
buffer.
When processing a table or status request, the function retrieve_status
calls ti->type->status. If ti->type->status returns non-zero,
retrieve_status assumes that the buffer overflowed and sets
DM_BUFFER_FULL_FLAG.
However, targets don't return non-zero values from their status method
on overflow. Most targets returns always zero.
If a buffer overflow happens in a target that is not the last in the
table, it gets noticed during the next iteration of the loop in
retrieve_status; but if a buffer overflow happens in the last target, it
goes unnoticed and erroneously truncated data is returned.
In the current code, the targets behave in the following way:
* dm-crypt returns -ENOMEM if there is not enough space to store the
key, but it returns 0 on all other overflows.
* dm-thin returns errors from the status method if a disk error happened.
This is incorrect because retrieve_status doesn't check the error
code, it assumes that all non-zero values mean buffer overflow.
* all the other targets always return 0.
This patch changes the ti->type->status function to return void (because
most targets don't use the return code). Overflow is detected in
retrieve_status: if the status method fills up the remaining space
completely, it is assumed that buffer overflow happened.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Fix kernel-doc warnings in hsi files:
Warning(include/linux/hsi/hsi.h:136): Excess struct/union/enum/typedef member 'e_handler' description in 'hsi_client'
Warning(include/linux/hsi/hsi.h:136): Excess struct/union/enum/typedef member 'pclaimed' description in 'hsi_client'
Warning(include/linux/hsi/hsi.h:136): Excess struct/union/enum/typedef member 'nb' description in 'hsi_client'
Warning(drivers/hsi/hsi.c:434): No description found for parameter 'handler'
Warning(drivers/hsi/hsi.c:434): Excess function parameter 'cb' description in 'hsi_register_port_event'
Don't document "private:" fields with kernel-doc notation.
If you want to leave them fully documented, that's OK, but
then don't mark them as "private:".
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Carlos Chinea <carlos.chinea@nokia.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-omap@vger.kernel.org
Acked-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add support for watchdog drivers to initialize/set the timeout field
of the watchdog_device structure. The timeout field is initialised
either with the module timeout parameter value (if valid) or with the
timeout-sec dt property (if valid). If both are invalid the initial
value is unchanged.
Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Instead of accessing the function to set the watchdog timer directly,
register a platform driver the platform could register to use this
watchdog driver.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
This RTC also includes a watchdog timer. Provide an accessor function
for setting the watchdog timeout value which will be picked up by a
watchdog driver. Also register the platform_device for the watchdog here
to get the boot-time dependencies right.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iD8DBQBRLQ9iTWFXqwsgQ8kRAvK/AJ9HUkhHJsukjw35XEQPkhfwNs/XPgCglXB1
dF0wRbbL4d6pE9IloUsgYLg=
=JSSN
-----END PGP SIGNATURE-----
Merge tag 'lzo-update-signature-20130226' of git://github.com/markus-oberhumer/linux
Pull LZO compression update from Markus Oberhumer:
"Summary:
========
Update the Linux kernel LZO compression and decompression code to the
current upstream version which features significant performance
improvements on modern machines.
Some *synthetic* benchmarks:
============================
x86_64 (Sandy Bridge), gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:
compression speed decompression speed
LZO-2005 : 150 MB/sec 468 MB/sec
LZO-2012 : 434 MB/sec 1210 MB/sec
i386 (Sandy Bridge), gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:
compression speed decompression speed
LZO-2005 : 143 MB/sec 409 MB/sec
LZO-2012 : 372 MB/sec 1121 MB/sec
armv7 (Cortex-A9), Linaro gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:
compression speed decompression speed
LZO-2005 : 27 MB/sec 84 MB/sec
LZO-2012 : 44 MB/sec 117 MB/sec
**LZO-2013-UA : 47 MB/sec 167 MB/sec
Legend:
LZO-2005 : LZO version in current 3.8 kernel (which is based on
the LZO 2.02 release from 2005)
LZO-2012 : updated LZO version available in linux-next
**LZO-2013-UA : updated LZO version available in linux-next plus experimental
ARM Unaligned Access patch. This needs approval
from some ARM maintainer ist NOT YET INCLUDED."
Andrew Morton <akpm@linux-foundation.org> acks it and says:
"There's a new LZ4 on the block which is even faster than the sped-up
LZO, but various filesystems and things use LZO"
* tag 'lzo-update-signature-20130226' of git://github.com/markus-oberhumer/linux:
crypto: testmgr - update LZO compression test vectors
lib/lzo: Update LZO compression to current upstream version
lib/lzo: Rename lzo1x_decompress.c to lzo1x_decompress_safe.c
Pull EDAC fixes and ghes-edac from Mauro Carvalho Chehab:
"For:
- Some fixes at edac drivers (i7core_edac, sb_edac, i3200_edac);
- error injection support for i5100, when EDAC debug is enabled;
- fix edac when it is loaded builtin (early init for the subsystem);
- a "Firmware First" EDAC driver, allowing ghes to report errors via
EDAC (ghes-edac).
With regards to ghes-edac, this fixes a longstanding BZ at Red Hat
that happens with Nehalem and Sandy Bridge CPUs: when both GHES and
i7core_edac or sb_edac are running, the error reports are
unpredictable, as both BIOS and OS race to access the registers. With
ghes-edac, the EDAC core will refuse to register any other concurrent
memory error driver.
This patchset moves the ghes struct definitions to a separate header
file (include/acpi/ghes.h) and adds 3 hooks at apei/ghes.c to
register/unregister and to report errors via ghes-edac. Those changes
were acked by ghes driver maintainer (Huang)."
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (30 commits)
i5100_edac: convert to use simple_open()
ghes_edac: fix to use list_for_each_entry_safe() when delete list items
ghes_edac: Fix RAS tracing
ghes_edac: Make it compliant with UEFI spec 2.3.1
ghes_edac: Improve driver's printk messages
ghes_edac: Don't credit the same memory dimm twice
ghes_edac: do a better job of filling EDAC DIMM info
ghes_edac: add support for reporting errors via EDAC
ghes_edac: Register at EDAC core the BIOS report
ghes: add the needed hooks for EDAC error report
ghes: move structures/enum to a header file
edac: add support for error type "Info"
edac: add support for raw error reports
edac: reduce stack pressure by using a pre-allocated buffer
edac: lock module owner to avoid error report conflicts
edac: remove proc_name from mci structure
edac: add a new memory layer type
edac: initialize the core earlier
edac: better report error conditions in debug mode
i5100_edac: Remove two checkpatch warnings
...
Pull thermal management updates from Zhang Rui:
"Highlights:
- introduction of Dove thermal sensor driver.
- introduction of Kirkwood thermal sensor driver.
- introduction of intel_powerclamp thermal cooling device driver.
- add interrupt and DT support for rcar thermal driver.
- add thermal emulation support which allows platform thermal driver
to do software/hardware emulation for thermal issues."
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (36 commits)
thermal: rcar: remove __devinitconst
thermal: return an error on failure to register thermal class
Thermal: rename thermal governor Kconfig option to avoid generic naming
thermal: exynos: Use the new thermal trend type for quick cooling action.
Thermal: exynos: Add support for temperature falling interrupt.
Thermal: Dove: Add Themal sensor support for Dove.
thermal: Add support for the thermal sensor on Kirkwood SoCs
thermal: rcar: add Device Tree support
thermal: rcar: remove machine_power_off() from rcar_thermal_notify()
thermal: rcar: add interrupt support
thermal: rcar: add read/write functions for common/priv data
thermal: rcar: multi channel support
thermal: rcar: use mutex lock instead of spin lock
thermal: rcar: enable CPCTL to use hardware TSC deciding
thermal: rcar: use parenthesis on macro
Thermal: fix a build warning when CONFIG_THERMAL_EMULATION cleared
Thermal: fix a wrong comment
thermal: sysfs: Add a new sysfs node emul_temp for thermal emulation
PM: intel_powerclamp: off by one in start_power_clamp()
thermal: exynos: Miscellaneous fixes to support falling threshold interrupt
...
Pull nfsd changes from J Bruce Fields:
"Miscellaneous bugfixes, plus:
- An overhaul of the DRC cache by Jeff Layton. The main effect is
just to make it larger. This decreases the chances of intermittent
errors especially in the UDP case. But we'll need to watch for any
reports of performance regressions.
- Containerized nfsd: with some limitations, we now support
per-container nfs-service, thanks to extensive work from Stanislav
Kinsbursky over the last year."
Some notes about conflicts, since there were *two* non-data semantic
conflicts here:
- idr_remove_all() had been added by a memory leak fix, but has since
become deprecated since idr_destroy() does it for us now.
- xs_local_connect() had been added by this branch to make AF_LOCAL
connections be synchronous, but in the meantime Trond had changed the
calling convention in order to avoid a RCU dereference.
There were a couple of more obvious actual source-level conflicts due to
the hlist traversal changes and one just due to code changes next to
each other, but those were trivial.
* 'for-3.9' of git://linux-nfs.org/~bfields/linux: (49 commits)
SUNRPC: make AF_LOCAL connect synchronous
nfsd: fix compiler warning about ambiguous types in nfsd_cache_csum
svcrpc: fix rpc server shutdown races
svcrpc: make svc_age_temp_xprts enqueue under sv_lock
lockd: nlmclnt_reclaim(): avoid stack overflow
nfsd: enable NFSv4 state in containers
nfsd: disable usermode helper client tracker in container
nfsd: use proper net while reading "exports" file
nfsd: containerize NFSd filesystem
nfsd: fix comments on nfsd_cache_lookup
SUNRPC: move cache_detail->cache_request callback call to cache_read()
SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() function
SUNRPC: rework cache upcall logic
SUNRPC: introduce cache_detail->cache_request callback
NFS: simplify and clean cache library
NFS: use SUNRPC cache creation and destruction helper for DNS cache
nfsd4: free_stid can be static
nfsd: keep a checksum of the first 256 bytes of request
sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer
sunrpc: fix comment in struct xdr_buf definition
...
Pull Ceph updates from Sage Weil:
"A few groups of patches here. Alex has been hard at work improving
the RBD code, layout groundwork for understanding the new formats and
doing layering. Most of the infrastructure is now in place for the
final bits that will come with the next window.
There are a few changes to the data layout. Jim Schutt's patch fixes
some non-ideal CRUSH behavior, and a set of patches from me updates
the client to speak a newer version of the protocol and implement an
improved hashing strategy across storage nodes (when the server side
supports it too).
A pair of patches from Sam Lang fix the atomicity of open+create
operations. Several patches from Yan, Zheng fix various mds/client
issues that turned up during multi-mds torture tests.
A final set of patches expose file layouts via virtual xattrs, and
allow the policies to be set on directories via xattrs as well
(avoiding the awkward ioctl interface and providing a consistent
interface for both kernel mount and ceph-fuse users)."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (143 commits)
libceph: add support for HASHPSPOOL pool flag
libceph: update osd request/reply encoding
libceph: calculate placement based on the internal data types
ceph: update support for PGID64, PGPOOL3, OSDENC protocol features
ceph: update "ceph_features.h"
libceph: decode into cpu-native ceph_pg type
libceph: rename ceph_pg -> ceph_pg_v1
rbd: pass length, not op for osd completions
rbd: move rbd_osd_trivial_callback()
libceph: use a do..while loop in con_work()
libceph: use a flag to indicate a fault has occurred
libceph: separate non-locked fault handling
libceph: encapsulate connection backoff
libceph: eliminate sparse warnings
ceph: eliminate sparse warnings in fs code
rbd: eliminate sparse warnings
libceph: define connection flag helpers
rbd: normalize dout() calls
rbd: barriers are hard
rbd: ignore zero-length requests
...
The client will currently try LAYOUTGETs forever if a server is returning
NFS4ERR_LAYOUTTRYLATER or NFS4ERR_RECALLCONFLICT - even if the client no
longer needs the layout (ie process killed, unmounted).
This patch uses the DS timeout value (module parameter 'dataserver_timeo'
via rpc layer) to set an upper limit of how long the client tries LATOUTGETs
in this situation. Once the timeout is reached, IO is redirected to the MDS.
This also changes how the client checks if a layout is on the clp list
to avoid a double list_add.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Returns the configured timeout for the xprt of the rpc client.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Pull block driver bits from Jens Axboe:
"After the block IO core bits are in, please grab the driver updates
from below as well. It contains:
- Fix ancient regression in dac960. Nobody must be using that
anymore...
- Some good fixes from Guo Ghao for loop, fixing both potential
oopses and deadlocks.
- Improve mtip32xx for NUMA systems, by being a bit more clever in
distributing work.
- Add IBM RamSan 70/80 driver. A second round of fixes for that is
pending, that will come in through for-linus during the 3.9 cycle
as per usual.
- A few xen-blk{back,front} fixes from Konrad and Roger.
- Other minor fixes and improvements."
* 'for-3.9/drivers' of git://git.kernel.dk/linux-block:
loopdev: ignore negative offset when calculate loop device size
loopdev: remove an user triggerable oops
loopdev: move common code into loop_figure_size()
loopdev: update block device size in loop_set_status()
loopdev: fix a deadlock
xen-blkback: use balloon pages for persistent grants
xen-blkfront: drop the use of llist_for_each_entry_safe
xen/blkback: Don't trust the handle from the frontend.
xen-blkback: do not leak mode property
block: IBM RamSan 70/80 driver fixes
rsxx: add slab.h include to dma.c
drivers/block/mtip32xx: add missing GENERIC_HARDIRQS dependency
block: remove new __devinit/exit annotations on ramsam driver
block: IBM RamSan 70/80 device driver
drivers/block/mtip32xx/mtip32xx.c:1726:5: sparse: symbol 'mtip_send_trim' was not declared. Should it be static?
drivers/block/mtip32xx/mtip32xx.c:4029:1: sparse: symbol 'mtip_workq_sdbf0' was not declared. Should it be static?
dac960: return success instead of -ENOTTY
mtip32xx: add trim support
mtip32xx: Add workqueue and NUMA support
block: delete super ancient PC-XT driver for 1980's hardware
Pull block IO core bits from Jens Axboe:
"Below are the core block IO bits for 3.9. It was delayed a few days
since my workstation kept crashing every 2-8h after pulling it into
current -git, but turns out it is a bug in the new pstate code (divide
by zero, will report separately). In any case, it contains:
- The big cfq/blkcg update from Tejun and and Vivek.
- Additional block and writeback tracepoints from Tejun.
- Improvement of the should sort (based on queues) logic in the plug
flushing.
- _io() variants of the wait_for_completion() interface, using
io_schedule() instead of schedule() to contribute to io wait
properly.
- Various little fixes.
You'll get two trivial merge conflicts, which should be easy enough to
fix up"
Fix up the trivial conflicts due to hlist traversal cleanups (commit
b67bfe0d42: "hlist: drop the node parameter from iterators").
* 'for-3.9/core' of git://git.kernel.dk/linux-block: (39 commits)
block: remove redundant check to bd_openers()
block: use i_size_write() in bd_set_size()
cfq: fix lock imbalance with failed allocations
drivers/block/swim3.c: fix null pointer dereference
block: don't select PERCPU_RWSEM
block: account iowait time when waiting for completion of IO request
sched: add wait_for_completion_io[_timeout]
writeback: add more tracepoints
block: add block_{touch|dirty}_buffer tracepoint
buffer: make touch_buffer() an exported function
block: add @req to bio_{front|back}_merge tracepoints
block: add missing block_bio_complete() tracepoint
block: Remove should_sort judgement when flush blk_plug
block,elevator: use new hashtable implementation
cfq-iosched: add hierarchical cfq_group statistics
cfq-iosched: collect stats from dead cfqgs
cfq-iosched: separate out cfqg_stats_reset() from cfq_pd_reset_stats()
blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lock
block: RCU free request_queue
blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge()
...
Merge third patch-bumb from Andrew Morton:
"This wraps me up for -rc1.
- Lots of misc stuff and things which were deferred/missed from
patchbombings 1 & 2.
- ocfs2 things
- lib/scatterlist
- hfsplus
- fatfs
- documentation
- signals
- procfs
- lockdep
- coredump
- seqfile core
- kexec
- Tejun's large IDR tree reworkings
- ipmi
- partitions
- nbd
- random() things
- kfifo
- tools/testing/selftests updates
- Sasha's large and pointless hlist cleanup"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (163 commits)
hlist: drop the node parameter from iterators
kcmp: make it depend on CHECKPOINT_RESTORE
selftests: add a simple doc
tools/testing/selftests/Makefile: rearrange targets
selftests/efivarfs: add create-read test
selftests/efivarfs: add empty file creation test
selftests: add tests for efivarfs
kfifo: fix kfifo_alloc() and kfifo_init()
kfifo: move kfifo.c from kernel/ to lib/
arch Kconfig: centralise CONFIG_ARCH_NO_VIRT_TO_BUS
w1: add support for DS2413 Dual Channel Addressable Switch
memstick: move the dereference below the NULL test
drivers/pps/clients/pps-gpio.c: use devm_kzalloc
Documentation/DMA-API-HOWTO.txt: fix typo
include/linux/eventfd.h: fix incorrect filename is a comment
mtd: mtd_stresstest: use prandom_bytes()
mtd: mtd_subpagetest: convert to use prandom library
mtd: mtd_speedtest: use prandom_bytes
mtd: mtd_pagetest: convert to use prandom library
mtd: mtd_oobtest: convert to use prandom library
...
The original device tree binding for this driver, from Viresh Kumar
unfortunately conflicted with the generic DMA binding, and did not allow
to completely seperate slave device configuration from the controller.
This is an attempt to replace it with an implementation of the generic
binding, but it is currently completely untested, because I do not have
any hardware with this particular controller.
The patch applies on top of the slave-dma tree, which contains both the base
support for the generic DMA binding, as well as the earlier attempt from
Viresh. Both of these are currently not merged upstream however.
This version incorporates feedback from Viresh Kumar, Andy Shevchenko
and Russell King.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Vinod Koul <vinod.koul@linux.intel.com>
Cc: devicetree-discuss@lists.ozlabs.org
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Comment in eventfd.h referred to 'include/asm-generic/fcntl.h'
while the correct path is 'include/uapi/asm-generic/fcntl.h'.
Signed-off-by: Martin Sustrik <sustrik@250bpm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Given the obvious distinction between kernel and userspace supported
by uapi/, it seems unnecessary to comment on that.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While idr lookup isn't a particularly heavy operation, it still is too
substantial to use in hot paths without worrying about the performance
implications. With recent changes, each idr_layer covers 256 slots
which should be enough to cover most use cases with single idr_layer
making lookup hint very attractive.
This patch adds idr->hint which points to the idr_layer which
allocated an ID most recently and the fast path lookup becomes
if (look up target's prefix matches that of the hinted layer)
return hint->ary[ID's offset in the leaf layer];
which can be inlined.
idr->hint is set to the leaf node on idr_fill_slot() and cleared from
free_layer().
[andriy.shevchenko@linux.intel.com: always do slow path when hint is uninitialized]
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a field which carries the prefix of ID the idr_layer covers. This
will be used to implement lookup hint.
This patch doesn't make use of the new field and doesn't introduce any
behavior difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With recent preloading changes, idr no longer keeps full layer cache per
each idr instance (used to be ~6.5k per idr on 64bit) and the previous
patch removed restriction on the bitmap size. Both now allow us to have
larger layers.
Increase IDR_BITS to 8 regardless of BITS_PER_LONG. Each layer is
slightly larger than 2k on 64bit and 1k on 32bit and carries 256 entries.
The size isn't too large, especially compared to what we used to waste on
per-idr caches, and 256 entries should be able to serve most use cases
with single layer. The max tree depth is 4 which is much better than the
previous 6 on 64bit and 7 on 32bit.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, idr->bitmap is declared as an unsigned long which restricts
the number of bits an idr_layer can contain. All bitops can handle
arbitrary positive integer bit number and there's no reason for this
restriction.
Declare idr_layer->bitmap using DECLARE_BITMAP() instead of a single
unsigned long.
* idr_layer->bitmap is now an array. '&' dropped from params to
bitops.
* Replaced "== IDR_FULL" tests with bitmap_full() and removed
IDR_FULL.
* Replaced find_next_bit() on ~bitmap with find_next_zero_bit().
* Replaced "bitmap = 0" with bitmap_clear().
This patch doesn't (or at least shouldn't) introduce any behavior
changes.
[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
MAX_IDR_MASK is another weirdness in the idr interface. As idr covers
whole positive integer range, it's defined as 0x7fffffff or INT_MAX.
Its usage in idr_find(), idr_replace() and idr_remove() is bizarre.
They basically mask off the sign bit and operate on the rest, so if
the caller, by accident, passes in a negative number, the sign bit
will be masked off and the remaining part will be used as if that was
the input, which is worse than crashing.
The constant is visible in idr.h and there are several users in the
kernel.
* drivers/i2c/i2c-core.c:i2c_add_numbered_adapter()
Basically used to test if adap->nr is a negative number which isn't
-1 and returns -EINVAL if so. idr_alloc() already has negative
@start checking (w/ WARN_ON_ONCE), so this can go away.
* drivers/infiniband/core/cm.c:cm_alloc_id()
drivers/infiniband/hw/mlx4/cm.c:id_map_alloc()
Used to wrap cyclic @start. Can be replaced with max(next, 0).
Note that this type of cyclic allocation using idr is buggy. These
are prone to spurious -ENOSPC failure after the first wraparound.
* fs/super.c:get_anon_bdev()
The ID allocated from ida is masked off before being tested whether
it's inside valid range. ida allocated ID can never be a negative
number and the masking is unnecessary.
Update idr_*() functions to fail with -EINVAL when negative @id is
specified and update other MAX_IDR_MASK users as described above.
This leaves MAX_IDR_MASK without any user, remove it and relocate
other MAX_IDR_* constants to lib/idr.c.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: "Marciniszyn, Mike" <mike.marciniszyn@intel.com>
Cc: Jack Morgenstein <jackm@dev.mellanox.co.il>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Wolfram Sang <wolfram@the-dreams.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current idr interface is very cumbersome.
* For all allocations, two function calls - idr_pre_get() and
idr_get_new*() - should be made.
* idr_pre_get() doesn't guarantee that the following idr_get_new*()
will not fail from memory shortage. If idr_get_new*() returns
-EAGAIN, the caller is expected to retry pre_get and allocation.
* idr_get_new*() can't enforce upper limit. Upper limit can only be
enforced by allocating and then freeing if above limit.
* idr_layer buffer is unnecessarily per-idr. Each idr ends up keeping
around MAX_IDR_FREE idr_layers. The memory consumed per idr is
under two pages but it makes it difficult to make idr_layer larger.
This patch implements the following new set of allocation functions.
* idr_preload[_end]() - Similar to radix preload but doesn't fail.
The first idr_alloc() inside preload section can be treated as if it
were called with @gfp_mask used for idr_preload().
* idr_alloc() - Allocate an ID w/ lower and upper limits. Takes
@gfp_flags and can be used w/o preloading. When used inside
preloaded section, the allocation mask of preloading can be assumed.
If idr_alloc() can be called from a context which allows sufficiently
relaxed @gfp_mask, it can be used by itself. If, for example,
idr_alloc() is called inside spinlock protected region, preloading can
be used like the following.
idr_preload(GFP_KERNEL);
spin_lock(lock);
id = idr_alloc(idr, ptr, start, end, GFP_NOWAIT);
spin_unlock(lock);
idr_preload_end();
if (id < 0)
error;
which is much simpler and less error-prone than idr_pre_get and
idr_get_new*() loop.
The new interface uses per-pcu idr_layer buffer and thus the number of
idr's in the system doesn't affect the amount of memory used for
preloading.
idr_layer_alloc() is introduced to handle idr_layer allocations for
both old and new ID allocation paths. This is a bit hairy now but the
new interface is expected to replace the old and the internal
implementation eventually will become simpler.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
idr uses -1, IDR_NEED_TO_GROW and IDR_NOMORE_SPACE to communicate
exception conditions internally. The return value is later translated
to errno values using _idr_rc_to_errno().
This is confusing. Drop the custom ones and consistently use -EAGAIN
for "tree needs to grow", -ENOMEM for "need more memory" and -ENOSPC for
"ran out of ID space".
Due to the weird memory preloading mechanism, [ra]_get_new*() return
-EAGAIN on memory shortage, so we need to substitute -ENOMEM w/
-EAGAIN on those interface functions. They'll eventually be cleaned
up and the translations will go away.
This patch doesn't introduce any functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Move idr_for_each_entry() definition next to other idr related
definitions.
* Make id[r|a]_get_new() inline wrappers of id[r|a]_get_new_above().
This changes the implementation of idr_get_new() but the new
implementation is trivial. This patch doesn't introduce any
functional change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Tab align fields like a normal person.
* Drop the unnecessary 0 inits from IDR_INIT().
This patch is purely cosmetic.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>