344344 Commits

Author SHA1 Message Date
Ralf Baechle
ae1242a546 MIPS: PowerTV: Fix build.
CC      arch/mips/powertv/init.o
/home/ralf/src/linux/linux-mips/arch/mips/powertv/init.c: In function ‘mips_nmi_setup’:
/home/ralf/src/linux/linux-mips/arch/mips/powertv/init.c:80:8: error: variable ‘base’ set but not used [-Werror=unused-but-set-variable]
/home/ralf/src/linux/linux-mips/arch/mips/powertv/init.c: In function ‘mips_ejtag_setup’:
/home/ralf/src/linux/linux-mips/arch/mips/powertv/init.c:94:8: error: variable ‘base’ set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

As they stand, these two functions don't serve any useful purpose, so
I've deleted them entirely.

This warning exists in gcc 4.6.0 and newer.  Kernels 2.6.40 and newer use
-Wno-unused-but-set-variable to suppress it.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13 18:15:28 +01:00
Dave Jones
686957e71d MIPS: IP27: Correct fucked grammar in ops-bridge.c
I had no idea just how broken IOC3 was until I read this.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13 18:15:28 +01:00
Ralf Baechle
b99fbc10df MIPS: Highmem: Fix build error if CONFIG_DEBUG_HIGHMEM is disabled
CC      arch/mips/mm/highmem.o
/home/ralf/src/linux/linux-mips/arch/mips/mm/highmem.c: In function ‘__kunmap_atomic’:
/home/ralf/src/linux/linux-mips/arch/mips/mm/highmem.c:70:6: error: variable ‘type’ set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

This warning exists in gcc 4.6.0 and newer.  Kernels 2.6.40 and newer use
-Wno-unused-but-set-variable to suppress it.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13 18:15:27 +01:00
Ralf Baechle
a16dad7763 MIPS: Fix potencial corruption
Normally r4k_dma_cache_inv should only ever be called with cacheline
aligned addresses.  If, however, it isn't, there is the theoretical
possibility of data corruption.  There is no correct way of handling this
and anyway, it should only happen if the DMA API is used incorrectly
so drop

There is a different corruption scenario with these CACHE instructions
removed but again there is no way of handling this correctly and it can
be triggered only through incorrect use of the DMA API.

So just get rid of the complexity.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Reported-by: James Rodriguez <jamesr@juniper.net>
2012-12-13 18:15:27 +01:00
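The alignment assumption behind a16dad7763 can be modelled in user space. This is only a sketch: the helper names and the 32-byte line size used in the usage note are hypothetical and are not taken from the kernel sources. It shows why a buffer that does not start and end on a cache-line boundary shares lines with neighbouring data, which is the data at risk when those lines are invalidated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when [addr, addr + size) starts and ends on a
 * cache-line boundary. line_size must be a non-zero power of two. */
bool dma_range_is_line_aligned(uintptr_t addr, size_t size, size_t line_size)
{
	uintptr_t mask = (uintptr_t)line_size - 1;

	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
	return ((addr & mask) == 0) && (((addr + size) & mask) == 0);
}

/* Hypothetical helper: number of bytes outside the buffer that live in the
 * same cache lines as the buffer, i.e. unrelated data that an invalidate of
 * those lines could discard. */
size_t dma_range_collateral_bytes(uintptr_t addr, size_t size,
				  size_t line_size)
{
	uintptr_t mask = (uintptr_t)line_size - 1;
	uintptr_t start, end;

	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
	start = addr & ~mask;                 /* round start down to a line */
	end = (addr + size + mask) & ~mask;   /* round end up to a line */
	return (size_t)(end - start) - size;
}
```

With an assumed 32-byte line, a 64-byte buffer at 0x1004 is not line aligned and shares its first and last lines with 32 bytes of unrelated data, while the same buffer at 0x1000 shares none.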
Ralf Baechle
51d943f07d MIPS: Fix for warning from FPU emulation code
The default implementation of 'cpu_has_fpu' macro calls
smp_processor_id() which causes this warning to be printed when
preemption is enabled:

[    4.664000] Algorithmics/MIPS FPU Emulator v1.5
[    4.676000] BUG: using smp_processor_id() in preemptible [00000000] code: ini
[    4.700000] caller is fpu_emulator_cop1Handler+0x434/0x27b8

This problem got introduced in November 2009 by
af1d2af877ef6c36990671bc86a5b9c5bb50b1da (lmo) [MIPS: Fix emulation of
64-bit FPU on 64-bit CPUs.] resp.  da0bac3341
(kernel.org) [MIPS: Fix emulation of 64-bit FPU on FPU-less 64-bit CPUs.]
in 2.6.32.

Fixed by rewriting cop1_64bit() to return a constant whenever possible
and, most importantly, to avoid the use of cpu_has_fpu entirely.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Reported-by: Jayachandran C <jchandra@broadcom.com>
Initial-patch-by: Jayachandran C <jchandra@broadcom.com>
Patchwork: https://patchwork.linux-mips.org/patch/4225/
2012-12-13 18:15:27 +01:00
Maciej W. Rozycki
051ff44a8b MIPS: Handle COP3 Unusable exception as COP1X for FP emulation
Our FP emulator is hardcoded for the MIPS IV FP instruction set and does
not match the FP ISA with the general ISA.  However for the few MIPS IV FP
instructions that use the COP1X major opcode it relies on the Coprocessor
Unusable exception to be delivered as a COP1 rather than COP3 exception.
This includes indexed transfer (LDXC1, etc.) and FP multiply-accumulate
(MADD.D, etc.) instructions.

 All the MIPS I, II, III and IV processors and some newer chips that do not
implement the FPU use the COP3 exception however.  Therefore I believe the
kernel should follow and redirect any COP3 Unusable traps to the emulator
unless an actual FPU part or core is present.

 This is a change that implements it.  Any minor opcode encodings that are
not recognised as valid FP instructions are rejected by the emulator and
will result in a SIGILL signal being delivered as they currently do.  We
do not support vendor-specific coprocessor 3 implementations supported
with MIPS I and MIPS II ISA processors; we never set CP0.Status.CU3.

[Ralf: On MIPS IV processors the kernel always enables the XX bit which
replaces the CU3 bit of earlier architecture revisions.]

 If matching between the CPU and the FPU ISA is considered required one
day, this can still be done in the emulator itself.  I think the CpU
exception dispatcher is not the right place to do this anyway, as there
are further differences between MIPS I, MIPS II, MIPS III, MIPS IV and
MIPS32 FP ISAs.

 Corresponding explanation of this implementation is included within the
change itself.

Signed-off-by: Maciej W. Rozycki <macro@codesourcery.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/project/linux-mips/list/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13 18:15:27 +01:00
Huacai Chen
8add1ecb81 MIPS: Fix poweroff failure when HOTPLUG_CPU configured.
When powering off the machine, kernel_power_off() calls
disable_nonboot_cpus().  If HOTPLUG_CPU is configured,
disable_nonboot_cpus() is not an empty function but attempts to actually
disable the non-boot CPUs.  Since the system state is SYSTEM_POWER_OFF,
play_dead() won't be called and thus disable_nonboot_cpus() hangs.  This
patch avoids that poweroff failure.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Hongliang Tao <taohl@lemote.com>
Signed-off-by: Hua Yan <yanh@lemote.com>
Cc: Yong Zhang <yong.zhang@windriver.com>
Cc: stable@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: Fuxin Zhang <zhangfx@lemote.com>
Cc: Zhangjin Wu <wuzhangjin@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/4211/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13 18:15:26 +01:00
Florian Fainelli
b88fb18e7e MIPS: MT: Fix build with CONFIG_UIDGID_STRICT_TYPE_CHECKS=y
When CONFIG_UIDGID_STRICT_TYPE_CHECKS is enabled, plain integer checking
between different uids/gids is explicitly turned into a build failure
by making the k{uid,gid}_t types a structure containing a value:

arch/mips/kernel/mips-mt-fpaff.c: In function 'check_same_owner':
arch/mips/kernel/mips-mt-fpaff.c:53:22: error: invalid operands to
binary == (have 'kuid_t' and 'kuid_t')
arch/mips/kernel/mips-mt-fpaff.c:54:15: error: invalid operands to
binary == (have 'kuid_t' and 'kuid_t')

To ensure proper comparison between uids, use the helper function
uid_eq(), which does the right thing whether this config option is
turned on or off.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
Patchwork: https://patchwork.linux-mips.org/patch/4717/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13 18:15:26 +01:00
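The build failure fixed in b88fb18e7e can be reproduced in miniature outside the kernel. The sketch below is a user-space model, not the kernel's definition: the field name `val` and the `same_owner()` helper are assumptions made for illustration, while `kuid_t` and `uid_eq()` are the names used in the commit message. Wrapping the integer in a struct makes `a == b` a compile error, so every comparison has to go through a helper.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model of a strictly typed uid. Because the value is wrapped
 * in a struct, `a == b` does not compile, which is the error quoted in
 * the commit message. The field name `val` is an assumption. */
typedef struct {
	unsigned int val;
} kuid_t;

/* Stand-in for uid_eq(): compare the wrapped values. */
bool uid_eq(kuid_t left, kuid_t right)
{
	return left.val == right.val;
}

/* Hypothetical analogue of an ownership check: the caller matches if its
 * effective uid equals either the target's effective uid or its uid. */
bool same_owner(kuid_t caller_euid, kuid_t target_euid, kuid_t target_uid)
{
	return uid_eq(caller_euid, target_euid) ||
	       uid_eq(caller_euid, target_uid);
}
```

The same helper works when the type is a plain integer, which is why a single call site can serve both settings of the config option.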
Paul Bolle
a685bc3dab MIPS: Remove unused smvp.h
This header was added in commit 39b8d52542
(kernel.org) / b6e90cd0ae7a556080d9ea2ec1b8f6d9accad9d4 (lmo) ([MIPS] Add
support for MIPS CMP platform.).  None of the functions it declared were
ever included in the tree. Commit cb7f39d2bc
(kernel.org) / b6e90cd0ae7a556080d9ea2ec1b8f6d9accad9d4 (lmo) ([MIPS] Remove
unused maltasmp.h.) removed the sole file that included it because that
file was itself unused.

[ralf@linux-mips.org: The whole mess happened because somebody at MIPS
thought it was a good idea to rename VSMP ("Virtual SMP") to SMVP.  Which
is an IBMesque ETLA in contrast to VSMP, so public kernels as opposed to
MTI's inhouse kernels never followed suit.]

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/3950/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13 18:15:26 +01:00
David Daney
e1ced09797 MIPS/EDAC: Improve OCTEON EDAC support.
Some initialization errors are reported with the existing OCTEON EDAC
support patch.  Also some parts have more than one memory controller.

Fix the errors and add multiple controllers if present.

Signed-off-by: David Daney <david.daney@cavium.com>
2012-12-13 18:15:26 +01:00
David Daney
abe105a4d8 MIPS: OCTEON: Add definitions for OCTEON memory contoller registers.
Signed-off-by: David Daney <david.daney@cavium.com>
2012-12-13 18:15:25 +01:00
David Daney
6bbf6a6d48 MIPS: OCTEON: Add OCTEON family definitions to octeon-model.h
Used by follow-on EDAC patches.

Signed-off-by: David Daney <david.daney@cavium.com>
2012-12-13 18:15:25 +01:00
David Daney
1007c4bc0f ata: pata_octeon_cf: Use correct byte order for DMA in when built little-endian.
We need to set the 'endian' bit in this case.

Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David Daney <david.daney@cavium.com>
2012-12-13 18:15:25 +01:00
David Daney
43f01da0f2 MIPS/OCTEON/ata: Convert pata_octeon_cf.c to use device tree.
The patch needs to eliminate the definition of OCTEON_IRQ_BOOTDMA so
that the device tree code can map the interrupt, so in order to not
temporarily break things, we do a single patch to both the interrupt
registration code and the pata_octeon_cf driver.

Also rolled in is a conversion to use hrtimers and corrections to the
timing calculations.

Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David Daney <david.daney@cavium.com>
2012-12-13 18:15:24 +01:00
Ralf Baechle
f772cdb2bd MIPS: Remove usage of CEVT_R4K_LIB config option.
Manuel Lauss <manuel.lauss@gmail.com> writes:

I introduced it as a fallback because early revisions of Alchemy hardware
we shipped had a non-functional 32kHz timer and had to rely on the r4k
timer instead.  Previously the r4k timer was initialized regardless, but
it's useless with the "wait" instruction.

So long story short:   I need either the on-chip 32kHz timer OR the r4k
timer if the 32kHz one is unusable, but not both, and r4k timer is useless
when au1k_idle is in use.

The current in-kernel Alchemy boards all work with the 32kHz timer, so I'm
not against removing R4K_LIB symbols.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13 18:15:24 +01:00
Steven J. Hill
d7ea335c05 MIPS: Remove usage of CSRC_R4K_LIB config option.
Manuel Lauss <manuel.lauss@gmail.com> writes:

I introduced it as a fallback because early revisions of Alchemy hardware
we shipped had a non-functional 32kHz timer and had to rely on the r4k
timer instead.  Previously the r4k timer was initialized regardless, but
it's useless with the "wait" instruction.

So long story short:   I need either the on-chip 32kHz timer OR the r4k
timer if the 32kHz one is unusable, but not both, and r4k timer is useless
when au1k_idle is in use.

The current in-kernel Alchemy boards all work with the 32kHz timer, so I'm
not against removing R4K_LIB symbols.

Signed-off-by: Steven J. Hill <sjhill@mips.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13 18:15:24 +01:00
Florian Fainelli
dcb96a4e36 MIPS: AR7: use part_probe_types to specificy the partition parser to use
This patch changes the physmap-flash platform data on AR7 to pass the
correct partition parser, ar7part, to be used by the "physmap-flash" mapping
driver so we get the partitions probed correctly.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
Cc: blogic@openwrt.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4654/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13 18:15:23 +01:00
Masanari Iida
d08be0dbe8 MIPS: Lantiq: Fix typo in "endianness" in dma.c
Correct spelling typo ENDIANESS to ENDIANNESS in arch/mips/lantiq/xway/dma.c

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Cc: trivial@kernel.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/4613/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13 18:15:23 +01:00
Ralf Baechle
0e2794b0b7 MIPS: Kconfig: Rename several firmware related config symbols.
With the upcoming merge of the ARC architecture there is a small likelihood
of conflicting use for the CONFIG_ARC config symbol.  Rename it to
CONFIG_FW_ARC.  Also rename CONFIG_ARC32 to CONFIG_FW_ARC32, CONFIG_ARC64
to CONFIG_FW_ARC64.

For consistency also rename CONFIG_SNIPROM to CONFIG_FW_SNIPROM and
CONFIG_CFE to CONFIG_FW_CFE.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13 17:02:14 +01:00
Ralf Baechle
abe77f90dc MIPS: Octeon: Add kexec and kdump support
[ralf@linux-mips.org: Original patch by Maxim Uvarov <muvarov@gmail.com>
with plenty of further shining, polishing, debugging and testing by me.]

Signed-off-by: Maxim Uvarov <muvarov@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: kexec@lists.infradead.org
Cc: horms@verge.net.au
Patchwork: https://patchwork.linux-mips.org/patch/1026/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13 17:00:39 +01:00
Ralf Baechle
7aa1c8f47e MIPS: kdump: Add support
[ralf@linux-mips.org: Original patch by Maxim Uvarov <muvarov@gmail.com>
with plenty of further shining, polishing, debugging and testing by me.]

Signed-off-by: Maxim Uvarov <muvarov@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: kexec@lists.infradead.org
Cc: horms@verge.net.au
Patchwork: https://patchwork.linux-mips.org/patch/1025/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2012-12-13 16:46:47 +01:00
Linus Torvalds
6be35c700f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David Miller:

1) Allow to dump, monitor, and change the bridge multicast database
   using netlink.  From Cong Wang.

2) RFC 5961 TCP blind data injection attack mitigation, from Eric
   Dumazet.

3) Networking user namespace support from Eric W. Biederman.

4) tuntap/virtio-net multiqueue support by Jason Wang.

5) Support for checksum offload of encapsulated packets (basically,
   tunneled traffic can still be checksummed by HW).  From Joseph
   Gasparakis.

6) Allow BPF filter access to VLAN tags, from Eric Dumazet and
   Daniel Borkmann.

7) Bridge port parameters over netlink and BPDU blocking support
   from Stephen Hemminger.

8) Improve data access patterns during inet socket demux by rearranging
   socket layout, from Eric Dumazet.

9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and
   Jon Maloy.

10) Update TCP socket hash sizing to be more in line with current day
    realities.  The existing heuristics were chosen a decade ago.
    From Eric Dumazet.

11) Fix races, queue bloat, and excessive wakeups in ATM and
    associated drivers, from Krzysztof Mazur and David Woodhouse.

12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions
    in VXLAN driver, from David Stevens.

13) Add "oops_only" mode to netconsole, from Amerigo Wang.

14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also
    allow DCB netlink to work on namespaces other than the initial
    namespace.  From John Fastabend.

15) Support PTP in the Tigon3 driver, from Matt Carlson.

16) tun/vhost zero copy fixes and improvements, plus turn it on
    by default, from Michael S. Tsirkin.

17) Support per-association statistics in SCTP, from Michele
    Baldessari.

And many, many, driver updates, cleanups, and improvements.  Too
numerous to mention individually.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
  net/mlx4_en: Add support for destination MAC in steering rules
  net/mlx4_en: Use generic etherdevice.h functions.
  net: ethtool: Add destination MAC address to flow steering API
  bridge: add support of adding and deleting mdb entries
  bridge: notify mdb changes via netlink
  ndisc: Unexport ndisc_{build,send}_skb().
  uapi: add missing netconf.h to export list
  pkt_sched: avoid requeues if possible
  solos-pci: fix double-free of TX skb in DMA mode
  bnx2: Fix accidental reversions.
  bna: Driver Version Updated to 3.1.2.1
  bna: Firmware update
  bna: Add RX State
  bna: Rx Page Based Allocation
  bna: TX Intr Coalescing Fix
  bna: Tx and Rx Optimizations
  bna: Code Cleanup and Enhancements
  ath9k: check pdata variable before dereferencing it
  ath5k: RX timestamp is reported at end of frame
  ath9k_htc: RX timestamp is reported at end of frame
  ...
2012-12-12 18:07:07 -08:00
Linus Torvalds
e37aa63e87 MN10300 changes 2012-12-12
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUMi2WhOxKuMESys7AQK2bxAApJQL2x6/k4swH933rhdVooA2TiMVST3l
 XSy6yil6Qeqz82RDnVMfxQ069N8iP5x93fE918V6UzeIrUmKEL8xD2UJCZzjW6B9
 vmBNrD6VUGdiBhTcGY7er4EtlnRf1XJgUPmfdIEAJoZ8VMkKyYAGkckW2I8hiYbZ
 gyF+ONc+CHxspqS1CzNUmmbP84T6rij2fydqLaSNNnQYnEfICt7dciv73KBQYMtn
 AsCLcmWW4DkZ37VL6Bg8yvgRaxbNlZpS0Rl5oKS65rYX9azt/SvujSta0UEv+uYF
 m/2HqExwgo8HZHKyIEpRgBLqfOfekJATbSLEq3jEgA73MLdzw2DTgpJQOmWCjtjN
 7bROv2O57e8ttxb81x10YyInzOTYOd18XEb2Qa6O4wbB5TS8MxZywfuTfL+sdfsN
 pquqyKNgxD7HqqxIcWSNKGxkPPZ/Xk/JmgcQFVCjpvvdCizsFTwWeiAd81Jz0Dn+
 SLL345nlDJPVukgIiDiwm9UvkyG0Pg03K5k6+7QOWB/5AdPqgRUeOi6gqQE7ZQ9G
 GK8/2xX4xFJ8LLPqfh2X+1PUesa8Dhph4NorsW4comJtPcLuh30XbwIKTjpBF90y
 7OILeZeQ+qFu8S9lLSQOr6zxs3/9uKP8ADoOAnFmUEE2PkzvZqMrnrlezAyqtNht
 LkVa/IR/z50=
 =ex0l
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20121212' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-mn10300

Pull MN10300 changes from David Howells:
 "miscellaneous MN10300 arch patches.  I've based it on top of Al Viro's
  signal tree - so these patches should be pulled after that."

* tag 'for-linus-20121212' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-mn10300:
  MN10300: Use asm-generic/pci_iomap.h
  MN10300: Get rid of unused variable from ASB2305 PCI code
  MN10300: ASB2305 PCI code needs linux/irq.h
  mn10300/mm/fault.c: Port OOM changes to do_page_fault
  MN10300: Handle cacheable PCI regions in pci_iomap()
  MN10300: fix debug polling in ttySM driver
  MN10300: ttySM: clean up unnecessary casting
  MN10300: fix SMP synchronization between txdma and serial driver
  MN10300: fix serial port vdma irq setup for SMP
  MN10300: cleanup IRQ affinity setting
  MN10300: ttySM: Use memory barriers correctly in circular buffer logic
2012-12-12 17:50:34 -08:00
Lin Feng
98870901cc mm/bootmem.c: remove unused wrapper function reserve_bootmem_generic()
reserve_bootmem_generic() has no callers, so remove it.

Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
Dominik Dingel
66521d5aa6 mm/memory.c: remove unused code from do_wp_page()
page_mkwrite is initialized to zero and only set once; from that point
there is no way to reach the oom or oom_free_new labels.

[akpm@linux-foundation.org: cleanup]
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
Kirill A. Shutemov
816422ad76 asm-generic, mm: pgtable: consolidate zero page helpers
We have two different implementations of is_zero_pfn() and my_zero_pfn()
helpers: for architectures with and without zero page coloring.

Let's consolidate them in <asm-generic/pgtable.h>.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
Naoya Horiguchi
56f2fb1476 mm/hugetlb.c: fix warning on freeing hwpoisoned hugepage
Fix the warning from __list_del_entry() which is triggered when a process
tries to do free_huge_page() for a hwpoisoned hugepage.

free_huge_page() can be called for hwpoisoned hugepage from
unpoison_memory().  This function gets refcount once and clears
PageHWPoison, and then puts refcount twice to return the hugepage back to
free pool.  The second put_page() finally reaches free_huge_page().

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
Naoya Horiguchi
5f24ae585b hwpoison, hugetlbfs: fix RSS-counter warning
Memory error handling on hugepages can break an RSS counter, which emits a
message like "Bad rss-counter state mm:ffff88040abecac0 idx:1 val:-1".
This is because PageAnon returns true for hugepage (this behavior is
necessary for reverse mapping to work on hugetlbfs).

[akpm@linux-foundation.org: clean up code layout]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
Naoya Horiguchi
8c4894c6bc hwpoison, hugetlbfs: fix "bad pmd" warning in unmapping hwpoisoned hugepage
When a process which used a hwpoisoned hugepage tries to exit() or
munmap(), the kernel can print out a "bad pmd" message because the page
table walker in free_pgtables() encounters a 'hwpoisoned entry' on pmd.

This is because currently we fail to clear the hwpoisoned entry in
__unmap_hugepage_range(), so this patch simply does it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
Michel Lespinasse
4128997b5f mm: protect against concurrent vma expansion
expand_stack() runs with a shared mmap_sem lock.  Because of this, there
could be multiple concurrent stack expansions in the same mm, which may
cause problems in the vma gap update code.

I propose to solve this by taking the mm->page_table_lock around such vma
expansions, in order to avoid the concurrency issue.  We only have to
worry about concurrent expand_stack() calls here, since we hold a shared
mmap_sem lock and all vma modifications other than expand_stack() are done
under an exclusive mmap_sem lock.

I previously tried to achieve the same effect by making sure all growable
vmas in a given mm would share the same anon_vma, which we already lock
here.  However this turned out to be difficult - all of the schemes I
tried for refcounting the growable anon_vma and clearing turned out ugly.
So, I'm now proposing only the minimal fix.

The overhead of taking the page table lock during stack expansion is
expected to be small: glibc doesn't use expandable stacks for the threads
it creates, so having multiple growable stacks is actually uncommon and we
don't expect the page table lock to get bounced between threads.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
Michal Hocko
c95d26c2ff memcg: do not check for mm in __mem_cgroup_count_vm_event
The mm given to __mem_cgroup_count_vm_event() cannot be NULL because the
function is either called from the page fault path or vma->vm_mm is used.
So the check can be dropped.

The check was introduced by commit 456f998ec8 ("memcg: add the
pagefault count into memcg stats") because the originally proposed patch
used current->mm for shmem but this has been changed to vma->vm_mm later
on without the check being removed (thanks to Hugh for this
recollection).

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ying Han <yinghan@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
Hugh Dickins
220f2ac913 tmpfs: support SEEK_DATA and SEEK_HOLE (reprise)
Revert 3.5's commit f21f806220 ("tmpfs: revert SEEK_DATA and
SEEK_HOLE") to reinstate 4fb5ef089b ("tmpfs: support SEEK_DATA and
SEEK_HOLE"), with the intervening additional arg to
generic_file_llseek_size().

In 3.8, ext4 is expected to join btrfs, ocfs2 and xfs with proper
SEEK_DATA and SEEK_HOLE support; and a good case has now been made for
it on tmpfs, so let's join the party.

It's quite easy for tmpfs to scan the radix_tree to support llseek's new
SEEK_DATA and SEEK_HOLE options: so add them while the minutiae are
still on my mind (in particular, the !PageUptodate-ness of pages
fallocated but still unwritten).

[akpm@linux-foundation.org: fix warning with CONFIG_TMPFS=n]
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jaegeuk Hanse <jaegeuk.hanse@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Zheng Liu <wenqing.lz@taobao.com>
Cc: Jeff liu <jeff.liu@oracle.com>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Josef Bacik <josef@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
Jiang Liu
01cefaef40 mm: provide more accurate estimation of pages occupied by memmap
If SPARSEMEM is enabled, it won't build page structures for non-existing
pages (holes) within a zone, so provide a more accurate estimation of
pages occupied by memmap if there are bigger holes within the zone.

And pages for highmem zones' memmap will be allocated from lowmem, so
charge nr_kernel_pages for that.

[akpm@linux-foundation.org: mark calc_memmap_size __paging_init]
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Cc: Chris Clayton <chris2553@googlemail.com>
Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Tested-by: Jianguo Wu <wujianguo@huawei.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
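The estimate described in 01cefaef40 can be sketched as a small calculation. This is a simplified model, not the kernel's calc_memmap_size(): the page size, the size of a page descriptor, and both function names are assumptions, and the sketch ignores any threshold on how large the holes must be before present pages are used.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PAGE_SIZE        4096UL  /* assumed page size in bytes */
#define DEMO_STRUCT_PAGE_SIZE 64UL    /* assumed size of one page descriptor */

/* Pages needed to hold one descriptor per tracked page, rounded up. */
unsigned long memmap_pages_for(unsigned long nr_pages)
{
	unsigned long bytes = nr_pages * DEMO_STRUCT_PAGE_SIZE;

	return (bytes + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}

/* Hypothetical estimate: with SPARSEMEM no descriptors are built for
 * holes, so only present pages are counted; otherwise the whole spanned
 * range needs descriptors. */
unsigned long estimate_memmap_pages(unsigned long spanned_pages,
				    unsigned long present_pages,
				    bool sparsemem)
{
	assert(present_pages <= spanned_pages);
	return memmap_pages_for(sparsemem ? present_pages : spanned_pages);
}
```

Under these assumed sizes, a zone spanning 1048576 pages with only 262144 present would be charged 16384 memmap pages without SPARSEMEM and 4096 with it.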
Yan Hong
02c0ab684f fs/buffer.c: remove redundant initialization in alloc_page_buffers()
buffer_head comes from kmem_cache_zalloc(), no need to zero its fields.

Signed-off-by: Yan Hong <clouds.yan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:35 -08:00
Yan Hong
a3f3c29cb2 fs/buffer.c: do not inline exported function
It makes no sense to inline an exported function.

Signed-off-by: Yan Hong <clouds.yan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
Yan Hong
5aaea51dfb writeback: fix a typo in comment
Signed-off-by: Yan Hong <clouds.yan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
Jiang Liu
9feedc9d83 mm: introduce new field "managed_pages" to struct zone
Currently a zone's present_pages is calculated as below, which is
inaccurate and may cause trouble for memory hotplug.

	spanned_pages - absent_pages - memmap_pages - dma_reserve.

While fixing bugs caused by inaccurate zone->present_pages, we found
zone->present_pages has been abused.  The field zone->present_pages may
have different meanings in different contexts:

1) pages existing in a zone.
2) pages managed by the buddy system.

For more discussions about the issue, please refer to:
  http://lkml.org/lkml/2012/11/5/866
  https://patchwork.kernel.org/patch/1346751/

This patchset introduces a new field named "managed_pages" to
struct zone, which counts "pages managed by the buddy system".  It also reverts
zone->present_pages to counting "physical pages existing in a zone", which
keeps it consistent with pgdat->node_present_pages.

We will set an initial value for zone->managed_pages in function
free_area_init_core() and will adjust it later if the initial value is
inaccurate.

For DMA/normal zones, the initial value is set to:

	(spanned_pages - absent_pages - memmap_pages - dma_reserve)

Later zone->managed_pages will be adjusted to the accurate value when the
bootmem allocator frees all free pages to the buddy system in function
free_all_bootmem_node() and free_all_bootmem().

The bootmem allocator doesn't touch highmem pages, so highmem zones'
managed_pages is set to the accurate value "spanned_pages - absent_pages"
in function free_area_init_core() and won't be updated anymore.

This patch also adds a new field "managed_pages" to /proc/zoneinfo
and sysrq showmem.

[akpm@linux-foundation.org: small comment tweaks]
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
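The page accounting described in 9feedc9d83 can be written out as a few small functions. This is a sketch under stated assumptions: `struct zone_model` and the function names are hypothetical and carry only the quantities named in the commit message, and clamping the result to zero is an addition of this sketch rather than something the commit describes.

```c
#include <assert.h>

/* Hypothetical, trimmed-down model holding only the quantities named in
 * the commit message; the real struct zone has many more members. */
struct zone_model {
	unsigned long spanned_pages;  /* pages spanned, holes included */
	unsigned long absent_pages;   /* pages that fall into holes */
	unsigned long memmap_pages;   /* pages consumed by the memmap */
	unsigned long dma_reserve;    /* pages set aside as DMA reserve */
};

/* present_pages after the change: physical pages existing in the zone. */
unsigned long present_pages(const struct zone_model *z)
{
	assert(z->absent_pages <= z->spanned_pages);
	return z->spanned_pages - z->absent_pages;
}

/* Initial managed_pages for DMA/normal zones:
 * spanned_pages - absent_pages - memmap_pages - dma_reserve.
 * Clamped at zero here, which is an assumption of this sketch. */
unsigned long initial_managed_pages(const struct zone_model *z)
{
	unsigned long present = present_pages(z);
	unsigned long overhead = z->memmap_pages + z->dma_reserve;

	return present > overhead ? present - overhead : 0;
}

/* managed_pages for highmem zones: spanned_pages - absent_pages. */
unsigned long highmem_managed_pages(const struct zone_model *z)
{
	return present_pages(z);
}
```

For a zone with 1000 spanned pages, 100 absent, 20 used by the memmap and 10 reserved for DMA, this gives 900 present pages and an initial managed_pages of 870; a highmem zone with the same span and holes would have 900 managed pages.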
David Rientjes
c2d23f919b mm, oom: remove statically defined arch functions of same name
out_of_memory() is a globally defined function to call the oom killer.
x86, sh, and powerpc all use a function of the same name within file scope
in their respective fault.c unnecessarily.  Inline the functions into the
pagefault handlers to clean the code up.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
David Rientjes
0fa84a4bfa mm, oom: remove redundant sleep in pagefault oom handler
out_of_memory() will already cause current to schedule if it has not been
killed, so doing it again in pagefault_out_of_memory() is redundant.
Remove it.

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
David Rientjes
efacd02e4f mm, oom: cleanup pagefault oom handler
To lock the entire system from parallel oom killing, it's possible to pass
in a zonelist with all zones rather than using for_each_populated_zone()
for the iteration.  This obsoletes try_set_system_oom() and
clear_system_oom() so that they can be removed.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
Lai Jiangshan
09285af75d memory_hotplug: allow online/offline memory to result movable node
Memory management can now handle movable nodes, or nodes which don't have
any normal memory, so we can dynamically configure and add a movable node by:

	onlining ZONE_MOVABLE memory from a previously offline node
	offlining the last normal memory, which results in a node without normal memory

Movable nodes are very important for power saving, hardware partitioning and
high-availability systems (hardware fault management).

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
Lai Jiangshan
20b2f52b73 numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
We need a node which only contains movable memory.  This feature is very
important for node hotplug.  If a node has normal/highmem, the memory may
be used by the kernel and can't be offlined.  If the node only contains
movable memory, we can offline the memory and the node.

Everything is now prepared, so we can actually introduce N_MEMORY.
Add CONFIG_MOVABLE_NODE so that it can be used for movable-dedicated nodes.

[akpm@linux-foundation.org: fix Kconfig text]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
David Rientjes
68ae564bba mm, memcg: avoid unnecessary function call when memcg is disabled
While profiling numa/core v16 with cgroup_disable=memory on the command
line, I noticed mem_cgroup_count_vm_event() still showed up as high as
0.60% in perf top.

This occurs because the function is called extremely often even when memcg
is disabled.

To fix this, inline the check for mem_cgroup_disabled() so we avoid the
unnecessary function call if memcg is disabled.
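
The pattern described above can be sketched in user-space C.  This is only
an illustration under stated assumptions, not the kernel's code: the names
memcg_disabled_flag, slow_path_calls, count_vm_event_slow(),
memcg_is_disabled() and count_vm_event() are hypothetical stand-ins.  The
cheap check sits in an inline wrapper, so the out-of-line function is only
called when the feature is enabled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
static bool memcg_disabled_flag;        /* models cgroup_disable=memory */
static unsigned long slow_path_calls;   /* counts out-of-line calls taken */

/* Out-of-line slow path; in a real build this would live in a .c file. */
static void count_vm_event_slow(int idx)
{
	(void)idx;
	slow_path_calls++;
}

/* Cheap predicate the compiler can inline at every call site. */
static inline bool memcg_is_disabled(void)
{
	return memcg_disabled_flag;
}

/* Inline wrapper: bail out before paying for the function call. */
static inline void count_vm_event(int idx)
{
	if (memcg_is_disabled())
		return;
	count_vm_event_slow(idx);
}
```

With memcg_disabled_flag set, calling count_vm_event() leaves
slow_path_calls untouched; with it clear, each call reaches the slow path.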

Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Glauber Costa <glommer@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
Andrew Morton
05b0afd73d mm: add a reminder comment for __GFP_BITS_SHIFT
Cc: Glauber Costa <glommer@parallels.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:34 -08:00
Joonsoo Kim
2897b4d29d mm: WARN_ON_ONCE if f_op->mmap() change vma's start address
While reviewing the source code, I found a comment which mentions that
after f_op->mmap(), the vma's start address can be changed.  I didn't verify
that this is really possible, because there are so many f_op->mmap()
implementations.  But if some mmap() does change the vma's start
address, that is a possible error situation, because we have already prepared
prev vma, rb_link and rb_parent, and these are related to the original address.

So add WARN_ON_ONCE to find out whether this situation really happens.
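
A minimal user-space sketch of the idea, assuming a simplified model rather
than the real mm code: struct fake_vma, map_region(), the two fake_mmap_*
callbacks, warn_count and the WARN_ONCE_IF() macro are all hypothetical.
The caller records the address it prepared for, invokes the callback, and
warns at most once if the callback moved the start address.

```c
#include <stdbool.h>
#include <stdio.h>

static int warn_count;	/* how many warnings were actually emitted */

/* Hypothetical once-only warning, loosely modelled on WARN_ON_ONCE. */
#define WARN_ONCE_IF(cond)						\
	do {								\
		static bool warned;					\
		if ((cond) && !warned) {				\
			warned = true;					\
			warn_count++;					\
			fprintf(stderr, "warning: %s\n", #cond);	\
		}							\
	} while (0)

struct fake_vma {
	unsigned long vm_start;
};

/* A callback that (wrongly) moves the start address. */
static int fake_mmap_moves(struct fake_vma *vma)
{
	vma->vm_start += 0x1000;
	return 0;
}

/* A well-behaved callback that leaves the start address alone. */
static int fake_mmap_keeps(struct fake_vma *vma)
{
	(void)vma;
	return 0;
}

static int map_region(struct fake_vma *vma, unsigned long addr,
		      int (*mmap_cb)(struct fake_vma *))
{
	int err;

	vma->vm_start = addr;	/* links were prepared for this address */
	err = mmap_cb(vma);
	if (err)
		return err;

	/* Detect a callback that changed the address behind our back. */
	WARN_ONCE_IF(addr != vma->vm_start);
	return 0;
}
```

Calling map_region() repeatedly with fake_mmap_moves prints the warning
only the first time, which keeps the log readable while still flagging the
condition.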

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:33 -08:00
Greg Thelen
44e33e8f95 res_counter: delete res_counter_write()
Since commit 628f423553 ("memcg: limit change shrink usage") both
res_counter_write() and write_strategy_fn have been unused.  This patch
deletes them both.

Signed-off-by: Greg Thelen <gthelen@google.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:33 -08:00
Lai Jiangshan
6715ddf945 hotplug: update nodemasks management
Update nodemasks management for N_MEMORY.

[lliubbo@gmail.com: fix build]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:33 -08:00
Lai Jiangshan
4b0ef1fe8a page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states initialization
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.

Since we introduced N_MEMORY, we update the initialization of node_states.
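
The distinction between the two states can be illustrated with a simplified
user-space model.  This is a sketch under assumptions, not the kernel's
nodemask implementation: struct fake_node, the node_states bitmask array and
classify_node() are hypothetical, and only the state names are taken from
the text above.

```c
#include <stdbool.h>

enum node_state { N_NORMAL_MEMORY, N_HIGH_MEMORY, N_MEMORY, NR_NODE_STATES };

/* Hypothetical per-node page counts, split by zone type. */
struct fake_node {
	unsigned long normal_pages;
	unsigned long high_pages;
	unsigned long movable_pages;
};

/* One bitmask per state; bit nid is set if node nid is in that state. */
static unsigned long node_states[NR_NODE_STATES];

static void node_set_state(int nid, enum node_state state)
{
	node_states[state] |= 1UL << nid;
}

static bool node_state(int nid, enum node_state state)
{
	return (node_states[state] >> nid) & 1UL;
}

static void classify_node(int nid, const struct fake_node *n)
{
	if (n->normal_pages)
		node_set_state(nid, N_NORMAL_MEMORY);
	/* Normal or high memory. */
	if (n->normal_pages || n->high_pages)
		node_set_state(nid, N_HIGH_MEMORY);
	/* Any memory at all, including movable-only nodes. */
	if (n->normal_pages || n->high_pages || n->movable_pages)
		node_set_state(nid, N_MEMORY);
}
```

In this model a node holding only movable pages is in N_MEMORY but not in
N_HIGH_MEMORY, which is why code that must visit every node with memory
should iterate over N_MEMORY.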

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:33 -08:00
Lai Jiangshan
48fb2e240c vmscan: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:33 -08:00
Lai Jiangshan
3c466d46a9 init: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.

The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:33 -08:00