Commit Graph

531620 Commits

Author SHA1 Message Date
Ivan Khoronzhuk
863ef5ba29 Documentation: ABI: sysfs-firmware-dmi: add -entries suffix to file name
The dmi-sysfs module adds DMI table structures entries under
/sys/firmware/dmi/entries only, so rename documentation file to
sysfs-firmware-dmi-entries as more appropriate. Without renaming it's
confusing to differ this from sysfs-firmware-dmi-tables that adds raw
DMI table and actually adds "dmi" kobject.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@globallogic.com>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
2015-06-25 09:06:57 +02:00
Ivan Khoronzhuk
d7f96f97c4 firmware: dmi_scan: add SBMIOS entry and DMI tables
Some utils, like dmidecode and smbios, need to access SMBIOS entry
table area in order to get information like SMBIOS version, size, etc.
Currently it's done via /dev/mem. But for situation when /dev/mem
usage is disabled, the utils have to use dmi sysfs instead, which
doesn't represent SMBIOS entry and adds code/delay redundancy when direct
access for table is needed.

So this patch creates dmi/tables and adds SMBIOS entry point to allow
utils in question to work correctly without /dev/mem. Also patch adds
raw dmi table to simplify dmi table processing in user space, as
proposed by Jean Delvare.

Tested-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@globallogic.com>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
2015-06-25 09:06:56 +02:00
Jean Delvare
6e0ad59e3d firmware: dmi_scan: Trim DMI table length before exporting it
The SMBIOS v3 entry points specify a maximum length for the DMI table,
not the exact length. Thus there may be garbage after the end-of-table
marker, which we don't want to export to user-space. Adjust dmi_len
when we find the end-of-table marker, so that only the actual table
payload is exported.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Ivan Khoronzhuk <ivan.khoronzhuk@globallogic.com>
2015-06-25 09:06:56 +02:00
Ivan Khoronzhuk
eb4c5ea50e firmware: dmi_scan: Rename dmi_table to dmi_decode_table
The "dmi_table" function looks like data instance, but it does DMI
table decode. This patch renames it to "dmi_decode_table" name as
more appropriate. That allows us to use "dmi_table" name for correct
purposes.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@globallogic.com>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
2015-06-25 09:06:56 +02:00
Jean Delvare
d4aeef9323 firmware: dmi: List my quilt tree
I'll be maintaining the pending patches to the dmi_scan and dmi-id
drivers as a quilt tree.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
2015-06-25 09:06:56 +02:00
Jean Delvare
17cd5bd539 firmware: dmi_scan: Only honor end-of-table for 64-bit tables
A 32-bit entry point to a DMI table says how many structures the table
contains. The SMBIOS specification explicitly says that end-of-table
markers should be ignored if they are not actually at the end of the
DMI table. So only honor the end-of-table marker for tables accessed
through 64-bit entry points, as they do not specify a structure count.

Fixes: fc43026278 ("dmi: add support for SMBIOS 3.0 64-bit entry point")
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Matt Fleming <matt.fleming@intel.com>
2015-06-25 09:06:55 +02:00
Vinod Koul
657d61275d dmaengine: xgene: fix file permission
drivers/dma/xgene-dma.c has file permissions 775, which is wrong, it should
be 664, so fix it

Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-06-25 09:22:32 +05:30
Stefan Agner
0fe25d6110 dmaengine: fsl-edma: clear pending interrupts on initialization
Clear pending interrupts before requesting interrupts and move
interrupt initialization after channels have been initialized.
This avoids a NULL pointer dereference panic when using kexec
while DMA requests were running.

Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-06-25 09:22:32 +05:30
Maxime Ripard
b206d9a23a dmaengine: xdmac: Add memset support
The XDMAC supports memset transfers, both over contiguous areas, and over
discontiguous areas through a LLI.

The current memset operation only supports contiguous memset for now, add some
support for it. Scatter-gathered memset will come eventually.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2015-06-25 09:22:32 +05:30
Vinod Koul
f2704052cb Merge branch 'topic/pxa' into for-linus 2015-06-25 09:21:58 +05:30
Vinod Koul
4fb9c15b4f Merge branch 'topic/xdmac' into for-linus 2015-06-25 09:21:49 +05:30
Vinod Koul
0e0fa66e39 Merge branch 'topic/omap' into for-linus 2015-06-25 09:21:43 +05:30
Vinod Koul
9324fdf526 Merge branch 'topic/core' into for-linus 2015-06-25 09:21:37 +05:30
Linus Torvalds
aefbef10e3 Merge branch 'akpm' (patches from Andrew)
Merge first patchbomb from Andrew Morton:

 - a few misc things

 - ocfs2 udpates

 - kernel/watchdog.c feature work (took ages to get right)

 - most of MM.  A few tricky bits are held up and probably won't make 4.2.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (91 commits)
  mm: kmemleak_alloc_percpu() should follow the gfp from per_alloc()
  mm, thp: respect MPOL_PREFERRED policy with non-local node
  tmpfs: truncate prealloc blocks past i_size
  mm/memory hotplug: print the last vmemmap region at the end of hot add memory
  mm/mmap.c: optimization of do_mmap_pgoff function
  mm: kmemleak: optimise kmemleak_lock acquiring during kmemleak_scan
  mm: kmemleak: avoid deadlock on the kmemleak object insertion error path
  mm: kmemleak: do not acquire scan_mutex in kmemleak_do_cleanup()
  mm: kmemleak: fix delete_object_*() race when called on the same memory block
  mm: kmemleak: allow safe memory scanning during kmemleak disabling
  memcg: convert mem_cgroup->under_oom from atomic_t to int
  memcg: remove unused mem_cgroup->oom_wakeups
  frontswap: allow multiple backends
  x86, mirror: x86 enabling - find mirrored memory ranges
  mm/memblock: allocate boot time data structures from mirrored memory
  mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute
  mm: do not ignore mapping_gfp_mask in page cache allocation paths
  mm/cma.c: fix typos in comments
  mm/oom_kill.c: print points as unsigned int
  mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages
  ...
2015-06-24 20:47:21 -07:00
Linus Torvalds
266da6f142 Miscellaneous pstore improvements
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJViyiCAAoJEKurIx+X31iB3acP/1kPYKClGZLoqH6wqHNM/djq
 ROsPm9PDt9g7WZ/yw5HpKuUzC+XhOg4odYpZIy+6PogW6BUCumxPCLVq/qbVSPhT
 Q7Pv0mlmjyJS+kj6FncWuJrJG0xQfKYE6OeYrnkyHfrHYmJunf1bS6K71CDGHJAa
 O2bQr4E67twuU8yQR8BZ+YlZu4NTzPYZ4JWmb9Wepm3seIM2GEJsNRZ7WJXH7BOv
 BHOPi8FDr8fMkA+2WitE853gYvcTYcuxlsDgRumtGzWDhRIUH8Q5yS9QLAFd0Rly
 BW7YOHYCY1L75RJxnVTWd04GNrepxe4LY1bbtx+mqI6FrdMw0dK0M5BKuohV+BT0
 tBC/anSHBOqua/aDA6m+8c+p8I7qp1wXNHtmm15lqKQg1YHvh0Rs7FlP3HBdLDxQ
 rmUQHTcVQGaf00GCTUgsEn80kW8FYYtOnh4FJcbSkLdU8/mkr1+1rE/3i8ob/AL9
 EsJgzqptT9/VBX3j6tZgk8tt6xstLMGVw/DmScjxeLqA2WoaINh5XRPgGCGWQW6T
 wa9nFGBuJFtJv9NfVlMEe8xekxyDa6xPIgoCheWBcV9NFRmfBrUmbG4yF127nqTG
 TXYBzFPSxdBn0FqGIaz+6RPbcGd7tZuz317sIYHTgHBWQHeoWG9TJGHeGt8L5uql
 eKMoHAvVRZIyBuZruUeB
 =248K
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull pstore updates from Tony Luck:
 "Miscellaneous pstore improvements"

* tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  ramoops: make it possible to change mem_type param.
  pstore/ram: verify ramoops header before saving record
  fs/pstore: Optimization function ramoops_init_przs
  fs/pstore: update the backend parameter in pstore module
  pstore: do not use message compression without lock
2015-06-24 20:42:21 -07:00
Linus Torvalds
cfcc0ad47f Merge tag 'for-f2fs-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "New features:
   - per-file encryption (e.g., ext4)
   - FALLOC_FL_ZERO_RANGE
   - FALLOC_FL_COLLAPSE_RANGE
   - RENAME_WHITEOUT

  Major enhancement/fixes:
   - recovery broken superblocks
   - enhance f2fs_trim_fs with a discard_map
   - fix a race condition on dentry block allocation
   - fix a deadlock during summary operation
   - fix a missing fiemap result

  .. and many minor bug fixes and clean-ups were done"

* tag 'for-f2fs-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (83 commits)
  f2fs: do not trim preallocated blocks when truncating after i_size
  f2fs crypto: add alloc_bounce_page
  f2fs crypto: fix to handle errors likewise ext4
  f2fs: drop the volatile_write flag only
  f2fs: skip committing valid superblock
  f2fs: setting discard option in parse_options()
  f2fs: fix to return exact trimmed size
  f2fs: support FALLOC_FL_INSERT_RANGE
  f2fs: hide common code in f2fs_replace_block
  f2fs: disable the discard option when device doesn't support
  f2fs crypto: remove alloc_page for bounce_page
  f2fs: fix a deadlock for summary page lock vs. sentry_lock
  f2fs crypto: clean up error handling in f2fs_fname_setup_filename
  f2fs crypto: avoid f2fs_inherit_context for symlink
  f2fs crypto: do not set encryption policy for non-directory by ioctl
  f2fs crypto: allow setting encryption policy once
  f2fs crypto: check context consistent for rename2
  f2fs: avoid duplicated code by reusing f2fs_read_end_io
  f2fs crypto: use per-inode tfm structure
  f2fs: recovering broken superblock during mount
  ...
2015-06-24 20:38:29 -07:00
Linus Torvalds
a7296b49fb Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF fixes and cleanups from Jan Kara:
 "The contains some small fixes and improvements in error handling for
  UDF.

  Bundled is also one ext3 coding style fix and a fix in quota
  documentation"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: fix udf_load_pvoldesc()
  udf: remove double err declaration in udf_file_write_iter()
  UDF: support NFSv2 export
  fs: ext3: super: fixed a space coding style issue
  quota: Update documentation
  udf: Return error from udf_find_entry()
  udf: Make udf_get_filename() return error instead of 0 length file name
  udf: bug on exotic flag in udf_get_filename()
  udf: improve error management in udf_CS0toNLS()
  udf: improve error management in udf_CS0toUTF8()
  udf: unicode: update function name in comments
  udf: remove unnecessary test in udf_build_ustr_exact()
  udf: Return -ENOMEM when allocation fails in udf_get_filename()
2015-06-24 20:07:10 -07:00
Linus Torvalds
1e467e68e5 Documentation updates for 4.2
The main thing here is Ingo's big subdirectory documenting feature support
 for each architecture.  Beyond that, it's the usual pile of fixes, tweaks,
 and small additions.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVi0g2AAoJEI3ONVYwIuV6Me4QAIfa79z05ABSjlyWaKw46plH
 lULR9cyHdR59JVPHKjSOfT9/c+GOdoz6kkXQoe/TgVyj5fRB8seUW5GJXCASndkk
 aVd4c6yKFH1NISXsSdVQC0JbpgAURgcSR6x59It++fG3NINvXronFTWGMBHMLKcI
 A2hM2jNP914Dy5r4ipWZKzF1KxIlqK9kmLxlNoE6/LoQfBhh1dMdnyfuM11sguAy
 s5pr9JeCPbWC0RE7st/qEivXF4lpj6hd3XoYfM2Y+oukj5xEPQevLTLHOgtesnx9
 guUAul5Sw27n+Dx8I0Qxf1n+5SkrijoAa72g5vAxTs+ilOey67qba012NaYSy7RK
 s15XOIZ/1JTS9JjkO7GR5NbG6AiIIAH5P+Y501ivCIrsWciTOgKj7cOzakIEV8/P
 NX4120Lh5lbBrWeYkl8WbgMO0Me8cThbALC+rncF/wjvGyREKyxNlZ9qvBqmHYjG
 5Et2DT+rANaDmmblgMK3tX/zI1g3pN51e+CRF+Hzh1jZD3MZ/i+KS4qgfGFDzMIj
 uoniO5VfyD4zRbyv4Grg7XMpXiP8xFxKDypglYiXzzwlkarUgbMGOoFE7AkiPOKB
 t9gLPetbDsDyU/bSpzHlfObZp+q+pCxHPhyLS7hxEi3gBxYajIMbkpHHJugnE0+H
 TfkIhy6QQm1vAPTpRXaE
 =ODt8
 -----END PGP SIGNATURE-----

Merge tag 'docs-for-linus' of git://git.lwn.net/linux-2.6

Pull documentation updates from Jonathan Corbet:
 "The main thing here is Ingo's big subdirectory documenting feature
  support for each architecture.  Beyond that, it's the usual pile of
  fixes, tweaks, and small additions"

* tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (79 commits)
  doc:md: fix typo in md.txt.
  Documentation/mic/mpssd: don't build x86 userspace when cross compiling
  Documentation/prctl: don't build tsc tests when cross compiling
  Documentation/vDSO: don't build tests when cross compiling
  Doc:ABI/testing: Fix typo in sysfs-bus-fcoe
  Doc: Docbook: Change wikipedia's URL from http to https in scsi.tmpl
  Doc: Change wikipedia's URL from http to https
  Documentation/kernel-parameters: add missing pciserial to the earlyprintk
  Doc:pps: Fix typo in pps.txt
  kbuild : Fix documentation of INSTALL_HDR_PATH
  Documentation: filesystems: updated struct file_operations documentation in vfs.txt
  kbuild: edit explanation of clean-files variable
  Doc: ja_JP: Fix typo in HOWTO
  Move freefall program from Documentation/ to tools/
  Documentation: ARM: EXYNOS: Describe boot loaders interface
  Doc:nfc: Fix typo in nfc-hci.txt
  vfs: Minor documentation fix
  Doc: networking: txtimestamp: fix printf format warning
  Documentation, intel_pstate: Improve legacy mode internal governors description
  Documentation: extend use case for EXPORT_SYMBOL_GPL()
  ...
2015-06-24 20:01:36 -07:00
Linus Torvalds
14738e0331 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input subsystem updates from Dmitry Torokhov:
 "Thanks to Samuel Thibault input device (keyboard) LEDs are no longer
  hardwired within the input core but use LED subsystem and so allow use
  of different triggers; Hans de Goede did a large update for the ALPS
  touchpad driver; we have new TI drv2665 haptics driver and DA9063
  OnKey driver, and host of other drivers got various fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (55 commits)
  Input: pixcir_i2c_ts - fix receive error
  MAINTAINERS: remove non existent input mt git tree
  Input: improve usage of gpiod API
  tty/vt/keyboard: define LED triggers for VT keyboard lock states
  tty/vt/keyboard: define LED triggers for VT LED states
  Input: export LEDs as class devices in sysfs
  Input: cyttsp4 - use swap() in cyttsp4_get_touch()
  Input: goodix - do not explicitly set evbits in input device
  Input: goodix - export id and version read from device
  Input: goodix - fix variable length array warning
  Input: goodix - fix alignment issues
  Input: add OnKey driver for DA9063 MFD part
  Input: elan_i2c - add product IDs FW names
  Input: elan_i2c - add support for multi IC type and iap format
  Input: focaltech - report finger width to userspace
  tty: remove platform_sysrq_reset_seq
  Input: synaptics_i2c - use proper boolean values
  Input: psmouse - use true instead of 1 for boolean values
  Input: cyapa - fix a few typos in comments
  Input: stmpe-ts - enforce device tree only mode
  ...
2015-06-24 19:56:58 -07:00
Linus Torvalds
45471cd98d EDAC changes, v2:
* New APM X-Gene SoC EDAC driver (Loc Ho)
 
 * AMD error injection module improvements (Aravind Gopalakrishnan)
 
 * Altera Arria 10 support (Thor Thayer)
 
 * misc fixes and cleanups all over the place
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJViuInAAoJEBLB8Bhh3lVKHT8QAKkHIMreO8obo09haxNJlfdF
 BaG7SNEDhvcgQ1B76RsjnjkUpsivvUt+mCYMP+BxcAqFrTA33UZCCOK5tEhGb1wr
 matRdR6+aezqAl2e/0/Ti25bWOkDxcOeazh2TyezuyIXtaJjOq1oZC7OaYGmxPun
 NlZY+/uY1eiHlewKsK04y8G8J5i4wGoKnuxBvOyELT90+a+fLfAOshAp0D4r0piB
 Znv0ydsHlu+Wx57slg1DktlsyswmcGS9WfWwwTlELOLulKgN8wEAVYzUB5pJzNbz
 ehq0J4wYz95juXADC4M4tEjErHVJNl6PbyMqwt0+XUUJ1NSgOj7Q6iqwxDoZX8km
 oxiLVydQBtoIzF1LojFKAVZDFnrMKHKwK3RaDaUJjTI90+tVzEU8xsBlUf6+EgD2
 Ss2RH8Gfuf52RdtwHh9++T1ur5rM9YNCAm31msq06mcOf0bEtmDbhZ+fVC5mjhqB
 fIb3hxnk0r2BVg+ZCN/boxGS6RzUtYVcCXaBPDMeHcg9BEEds70KCFEcsX7TvJIg
 5/SHI+033MylqkX2zrgDQLj7CQk3R0jaotHVbdhLupyOldcM7r5uF+VO84drNWGN
 GfM2lpyE/swZWnzKuotgYIGR1XvFjtJAVAyNGIvwP+ajjTsqXzEnLSLClY5LWfYd
 nSSSMpCCqsEmhoWftOix
 =Id4f
 -----END PGP SIGNATURE-----

Merge tag 'edac_for_4.2_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull EDAC updates from Borislav Petkov:

 - New APM X-Gene SoC EDAC driver (Loc Ho)

 - AMD error injection module improvements (Aravind Gopalakrishnan)

 - Altera Arria 10 support (Thor Thayer)

 - misc fixes and cleanups all over the place

* tag 'edac_for_4.2_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (28 commits)
  EDAC: Update Documentation/edac.txt
  EDAC: Fix typos in Documentation/edac.txt
  EDAC, mce_amd_inj: Set MISCV on injection
  EDAC, mce_amd_inj: Move bit preparations before the injection
  EDAC, mce_amd_inj: Cleanup and simplify README
  EDAC, altera: Do not allow suspend when EDAC is enabled
  EDAC, mce_amd_inj: Make inj_type static
  arm: socfpga: dts: Add Arria10 SDRAM EDAC DTS support
  EDAC, altera: Add Arria10 EDAC support
  EDAC, altera: Refactor for Altera CycloneV SoC
  EDAC, altera: Generalize driver to use DT Memory size
  EDAC, mce_amd_inj: Add README file
  EDAC, mce_amd_inj: Add individual permissions field to dfs_node
  EDAC, mce_amd_inj: Modify flags attribute to use string arguments
  EDAC, mce_amd_inj: Read out number of MCE banks from the hardware
  EDAC, mce_amd_inj: Use MCE_INJECT_GET macro for bank node too
  EDAC, xgene: Fix cpuid abuse
  EDAC, mpc85xx: Extend error address to 64 bit
  EDAC, mpc8xxx: Adapt for FSL SoC
  EDAC, edac_stub: Drop arch-specific include
  ...
2015-06-24 19:52:06 -07:00
Linus Torvalds
93a4b1b946 Here is the bulk of pin control changes for the v4.2 series:
- Core functionality:
   - Enable exclusive pin ownership: it is possible to flag a pin
     controller so that GPIO and other functions cannot use a single
     pin simultaneously.
 
 - New drivers:
   - NXP LPC18xx System Control Unit pin controller
   - Imagination Pistachio SoC pin controller
 
 - New subdrivers:
   - Freescale i.MX7d SoC
   - Intel Sunrisepoint-H PCH
   - Renesas PFC R8A7793
   - Renesas PFC R8A7794
   - Mediatek MT6397, MT8127
   - SiRF Atlas 7
   - Allwinner A33
   - Qualcomm MSM8660
   - Marvell Armada 395
   - Rockchip RK3368
 
 - Cleanups:
   - A big cleanup of the Marvell MVEBU driver rectifying it to
     correspond to reality
   - Drop platform device probing from the SH PFC driver, we are now a
     DT only shop for SuperH
   - Drop obsolte multi-platform check for SH PFC
   - Various janitorial: constification, grammar etc
 
 - Improvements:
   - The AT91 GPIO portions now supports the set_multiple() feature
   - Split out SPI pins on the Xilinx Zynq
   - Support DTs without specific function nodes in the i.MX driver
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVin37AAoJEEEQszewGV1zIlQP/i6+C47z3OV67hYAOmlGoynl
 wsdTFbyp+GIPl3N1r0lRzxOfQsuc9t93iDMrC5ssN9VFaj8MgH/j3XKWf5A55iVn
 u7nNQzIFjzTwl58/Pu4oM+d9l5i26o44teFKh3xI4aup4AFed3+lDkQtRipgo29c
 V4y+6SaQxQ46e2qaOAM20gEagm2a8EvChn1Zo/HLQnnmZcKBxgObJna7iTZWm+fN
 LzyBWtczFYPxfQ9IqYzklyeou4ohfrcHzqN71IEtmGMXxob+i04QS9FQXaPitgBG
 UORjwFVh8690n3ETQobjLrylOF5F/3+RdCGqanYOLgaJ0aix4+EByLz9FbxLPnJk
 4Utijk2SKxLUb3dXZIfpwKtmPmvLJkFqwSazN5WDIg9Rjqz/H1p9UTWP0cfPRwJa
 9INDZeK833kjYdtK6UMBpuNFkgGtpKTlhMX/cI78KYsEwVgK8r69b7uNr+2OUMgh
 4i7dbHgb5/NpHlUlacVPTBvXf7C1iQ//vqh0Oc20lp/mAY1tVGuYRHno6QVyRtfS
 DmCNPtbAgCa9FmP/t5NA8a3wana2ObTT2NCNMGEue7tJxVX4YaLpwIAEnUSHSJOQ
 seI8HT2M1yEiSes9V+OuigHt3pKk68fMe0ZqDkovcd4QBlub6WTAPXWrXpbHtBCo
 k+hT8TlDYaDbQkNDzXtg
 =UyKm
 -----END PGP SIGNATURE-----

Merge tag 'pinctrl-v4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control updates from Linus Walleij:
 "Here is the bulk of pin control changes for the v4.2 series: Quite a
  lot of new SoC subdrivers and two new main drivers this time, apart
  from that business as usual.

  Details:

  Core functionality:
   - Enable exclusive pin ownership: it is possible to flag a pin
     controller so that GPIO and other functions cannot use a single pin
     simultaneously.

  New drivers:
   - NXP LPC18xx System Control Unit pin controller
   - Imagination Pistachio SoC pin controller

  New subdrivers:
   - Freescale i.MX7d SoC
   - Intel Sunrisepoint-H PCH
   - Renesas PFC R8A7793
   - Renesas PFC R8A7794
   - Mediatek MT6397, MT8127
   - SiRF Atlas 7
   - Allwinner A33
   - Qualcomm MSM8660
   - Marvell Armada 395
   - Rockchip RK3368

  Cleanups:
   - A big cleanup of the Marvell MVEBU driver rectifying it to
     correspond to reality
   - Drop platform device probing from the SH PFC driver, we are now a
     DT only shop for SuperH
   - Drop obsolte multi-platform check for SH PFC
   - Various janitorial: constification, grammar etc

  Improvements:
   - The AT91 GPIO portions now supports the set_multiple() feature
   - Split out SPI pins on the Xilinx Zynq
   - Support DTs without specific function nodes in the i.MX driver"

* tag 'pinctrl-v4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (99 commits)
  pinctrl: rockchip: add support for the rk3368
  pinctrl: rockchip: generalize perpin driver-strength setting
  pinctrl: sh-pfc: r8a7794: add SDHI pin groups
  pinctrl: sh-pfc: r8a7794: add MMCIF pin groups
  pinctrl: sh-pfc: add R8A7794 PFC support
  pinctrl: make pinctrl_register() return proper error code
  pinctrl: mvebu: armada-39x: add support for Armada 395 variant
  pinctrl: mvebu: armada-39x: add missing SATA functions
  pinctrl: mvebu: armada-39x: add missing PCIe functions
  pinctrl: mvebu: armada-38x: add ptp functions
  pinctrl: mvebu: armada-38x: add ua1 functions
  pinctrl: mvebu: armada-38x: add nand functions
  pinctrl: mvebu: armada-38x: add sata functions
  pinctrl: mvebu: armada-xp: add dram functions
  pinctrl: mvebu: armada-xp: add nand rb function
  pinctrl: mvebu: armada-xp: add spi1 function
  pinctrl: mvebu: armada-39x: normalize ref clock naming
  pinctrl: mvebu: armada-xp: rename spi to spi0
  pinctrl: mvebu: armada-370: align spi1 clock pin naming
  pinctrl: mvebu: armada-370: align VDD cpu-pd pin naming with datasheet
  ...
2015-06-24 19:21:02 -07:00
Dave Airlie
6b8eeca65b drm/dp/mst: close deadlock in connector destruction.
I've only seen this once, and I failed to capture the
lockdep backtrace, but I did some investigations.

If we are calling into the MST layer from EDID probing,
we have the mode_config mutex held, if during that EDID
probing, the MST hub goes away, then we can get a deadlock
where the connector destruction function in the driver
tries to retake the mode config mutex.

This offloads connector destruction to a workqueue,
and avoid the subsequenct lock ordering issue.

Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-06-25 11:57:23 +10:00
Linus Torvalds
d59b92f93d == Changes to existing drivers ==
- Supply MODULE_DEVICE_TABLE() to ensure probing
  - Constify struct; da9052_bl
  - Enable compile test; lcd_l4f00242t03, lcd_lms283fg05, backlight_gpio
  - Suspend/resume bugfix; lp855x_bl
  - devm_gpiod_get_optional() API fixup; pwm_bl
  - Error handling fixup; backlight
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJViXR7AAoJEFGvii+H/HdhZPsP/3kxXQNCt2voEj96MFLXWLwq
 1cbWbPboslNkdYPY9ygLTIqop/NOoEwiZMLE2Ea3uBDFewLU27vEX4waG+ILITjG
 /KxBAUXnNvFTrv9X+5hiOm6D+6kLk2M2eEygSC6f9QgBXP4VjApkVvpehdDKWT5x
 X11wlfx/TYhLcj5iggyW39fACp8Aig7LvOigS7fjfhwn1PAjWVw6NLrxmIlysWbH
 8qMfL0u8Ks5BYSh4xr5ATrB6OLx5Hu3mv9d8AK8o4XsRXOrFtR/dMThOLcJq/bFi
 4Vp4roi/30RSpow1yKPS3+TBRFbn2PG+6G6GVbWCO/uVkQiaxteyM79P0gEy5Y2a
 8WvV3vOMYY1/FszCOIfrJbj4No5/Bc2fObLXYDursLYMOPhUNrWeyRxbLfHTpKR3
 kim8XFGzLE5qFLqQWheqkHDq24y1iz6fl4YEZ8avf1rnDfzNJx8fnHk2uXZrW6ru
 HdjXbGC4pht1j6uM+DDROZ3iM5+2AMb/ASPLSCqslXXG82BCva3wasrMd3RttEQN
 bUltoWgWAMonIYoNx3CYwOGS9sWFllq1b0dTl1qDQPRT55sR4zcMZbMp16A0whJ7
 bJQMflbkgWDCeiMg5vXrImxmBBwfAe6IQ3yfEMUNNf+CzJxVnGjvpWvDWHo5iqhd
 Ul8lq/XdnXvpSSUugWhM
 =8cYG
 -----END PGP SIGNATURE-----

Merge tag 'backlight-for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight

Pull backlight updates from Lee Jones:
 "Changes to existing drivers:

   - supply MODULE_DEVICE_TABLE() to ensure probing
   - constify struct; da9052_bl
   - enable compile test; lcd_l4f00242t03, lcd_lms283fg05, backlight_gpio
   - suspend/resume bugfix; lp855x_bl
   - devm_gpiod_get_optional() API fixup; pwm_bl
   - error handling fixup; backlight"

* tag 'backlight-for-linus-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
  backlight: Change the return type of backlight_update_status() to int
  backlight: pwm_bl: Simplify usage of devm_gpiod_get_optional
  backlight: lp855x: Don't clear level on suspend/blank
  backlight: Allow compile test of GPIO consumers if !GPIOLIB
  video: backlight: da9052: Constify platform_device_id
  gpio-backlight: Discover driver during boot time
2015-06-24 18:57:00 -07:00
Dan Williams
0ba1c63489 libnvdimm: write blk label set
After 'uuid', 'size', 'sector_size', and optionally 'alt_name' have been
set to valid values the labels on the dimm can be updated.  The
difference with the pmem case is that blk namespaces are limited to one
dimm and can cover discontiguous ranges in dpa space.

Also, after allocating label slots, it is useful for userspace to know
how many slots are left.  Export this information in sysfs.

Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Dan Williams
f524bf271a libnvdimm: write pmem label set
After 'uuid', 'size', and optionally 'alt_name' have been set to valid
values the labels on the dimms can be updated.

Write procedure is:
1/ Allocate and write new labels in the "next" index
2/ Free the old labels in the working copy
3/ Write the bitmap and the label space on the dimm
4/ Write the index to make the update valid

Label ranges directly mirror the dpa resource values for the given
label_id of the namespace.

Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Dan Williams
1b40e09a12 libnvdimm: blk labels and namespace instantiation
A blk label set describes a namespace comprised of one or more
discontiguous dpa ranges on a single dimm.  They may alias with one or
more pmem interleave sets that include the given dimm.

This is the runtime/volatile configuration infrastructure for sysfs
manipulation of 'alt_name', 'uuid', 'size', and 'sector_size'.  A later
patch will make these settings persistent by writing back the label(s).

Unlike pmem namespaces, multiple blk namespaces can be created per
region.  Once a blk namespace has been created a new seed device
(unconfigured child of a parent blk region) is instantiated.  As long as
a region has 'available_size' != 0 new child namespaces may be created.

Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Dan Williams
bf9bccc14c libnvdimm: pmem label sets and namespace instantiation.
A complete label set is a PMEM-label per-dimm per-interleave-set where
all the UUIDs match and the interleave set cookie matches the hosting
interleave set.

Present sysfs attributes for manipulation of a PMEM-namespace's
'alt_name', 'uuid', and 'size' attributes.  A later patch will make
these settings persistent by writing back the label.

Note that PMEM allocations grow forwards from the start of an interleave
set (lowest dimm-physical-address (DPA)).  BLK-namespaces that alias
with a PMEM interleave set will grow allocations backward from the
highest DPA.

Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Dan Williams
4a826c83db libnvdimm: namespace indices: read and validate
This on media label format [1] consists of two index blocks followed by
an array of labels.  None of these structures are ever updated in place.
A sequence number tracks the current active index and the next one to
write, while labels are written to free slots.

    +------------+
    |            |
    |  nsindex0  |
    |            |
    +------------+
    |            |
    |  nsindex1  |
    |            |
    +------------+
    |   label0   |
    +------------+
    |   label1   |
    +------------+
    |            |
     ....nslot...
    |            |
    +------------+
    |   labelN   |
    +------------+

After reading valid labels, store the dpa ranges they claim into
per-dimm resource trees.

[1]: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf

Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Dan Williams
eaf961536e libnvdimm, nfit: add interleave-set state-tracking infrastructure
On platforms that have firmware support for reading/writing per-dimm
label space, a portion of the dimm may be accessible via an interleave
set PMEM mapping in addition to the dimm's BLK (block-data-window
aperture(s)) interface.  A label, stored in a "configuration data
region" on the dimm, disambiguates which dimm addresses are accessed
through which exclusive interface.

Add infrastructure that allows the kernel to block modifications to a
label in the set while any member dimm is active.  Note that this is
meant only for enforcing "no modifications of active labels" via the
coarse ioctl command.  Adding/deleting namespaces from an active
interleave set is always possible via sysfs.

Another aspect of tracking interleave sets is tracking their integrity
when DIMMs in a set are physically re-ordered.  For this purpose we
generate an "interleave-set cookie" that can be recorded in a label and
validated against the current configuration.  It is the bus provider
implementation's responsibility to calculate the interleave set cookie
and attach it to a given region.

Cc: Neil Brown <neilb@suse.de>
Cc: <linux-acpi@vger.kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Dan Williams
9f53f9fa4a libnvdimm, pmem: add libnvdimm support to the pmem driver
nd_pmem attaches to persistent memory regions and namespaces emitted by
the libnvdimm subsystem, and, same as the original pmem driver, presents
the system-physical-address range as a block device.

The existing e820-type-12 to pmem setup is converted to an nvdimm_bus
that emits an nd_namespace_io device.

Note that the X in 'pmemX' is now derived from the parent region.  This
provides some stability to the pmem devices names from boot-to-boot.
The minor numbers are also more predictable by passing 0 to
alloc_disk().

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Dan Williams
18da2c9ee4 libnvdimm, pmem: move pmem to drivers/nvdimm/
Prepare the pmem driver to consume PMEM namespaces emitted by regions of
an nvdimm_bus instance.  No functional change.

Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Dan Williams
3d88002e4a libnvdimm: support for legacy (non-aliasing) nvdimms
The libnvdimm region driver is an intermediary driver that translates
non-volatile "region"s into "namespace" sub-devices that are surfaced by
persistent memory block-device drivers (PMEM and BLK).

ACPI 6 introduces the concept that a given nvdimm may simultaneously
offer multiple access modes to its media through direct PMEM load/store
access, or windowed BLK mode.  Existing nvdimms mostly implement a PMEM
interface, some offer a BLK-like mode, but never both as ACPI 6 defines.
If an nvdimm is single interfaced, then there is no need for dimm
metadata labels.  For these devices we can take the region boundaries
directly to create a child namespace device (nd_namespace_io).

Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Dan Williams
1f7df6f88b libnvdimm, nfit: regions (block-data-window, persistent memory, volatile memory)
A "region" device represents the maximum capacity of a BLK range (mmio
block-data-window(s)), or a PMEM range (DAX-capable persistent memory or
volatile memory), without regard for aliasing.  Aliasing, in the
dimm-local address space (DPA), is resolved by metadata on a dimm to
designate which exclusive interface will access the aliased DPA ranges.
Support for the per-dimm metadata/label arrvies is in a subsequent
patch.

The name format of "region" devices is "regionN" where, like dimms, N is
a global ida index assigned at discovery time.  This id is not reliable
across reboots nor in the presence of hotplug.  Look to attributes of
the region or static id-data of the sub-namespace to generate a
persistent name.  However, if the platform configuration does not change
it is reasonable to expect the same region id to be assigned at the next
boot.

"region"s have 2 generic attributes "size", and "mapping"s where:
- size: the BLK accessible capacity or the span of the
  system physical address range in the case of PMEM.

- mappingN: a tuple describing a dimm's contribution to the region's
  capacity in the format (<nmemX>,<dpa>,<size>).  For a PMEM-region
  there will be at least one mapping per dimm in the interleave set.  For
  a BLK-region there is only "mapping0" listing the starting DPA of the
  BLK-region and the available DPA capacity of that space (matches "size"
  above).

The max number of mappings per "region" is hard coded per the
constraints of sysfs attribute groups.  That said the number of mappings
per region should never exceed the maximum number of possible dimms in
the system.  If the current number turns out to not be enough then the
"mappings" attribute clarifies how many there are supposed to be. "32
should be enough for anybody...".

Cc: Neil Brown <neilb@suse.de>
Cc: <linux-acpi@vger.kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Dan Williams
4d88a97aa9 libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure
* Implement the device-model infrastructure for loading modules and
  attaching drivers to nvdimm devices.  This is a simple association of a
  nd-device-type number with a driver that has a bitmask of supported
  device types.  To facilitate userspace bind/unbind operations 'modalias'
  and 'devtype', that also appear in the uevent, are added as generic
  sysfs attributes for all nvdimm devices.  The reason for the device-type
  number is to support sub-types within a given parent devtype, be it a
  vendor-specific sub-type or otherwise.

* The first consumer of this infrastructure is the driver
  for dimm devices.  It simply uses control messages to retrieve and
  store the configuration-data image (label set) from each dimm.

Note: nd_device_register() arranges for asynchronous registration of
      nvdimm bus devices by default.

Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Dan Williams
62232e45f4 libnvdimm: control (ioctl) messages for nvdimm_bus and nvdimm devices
Most discovery/configuration of the nvdimm-subsystem is done via sysfs
attributes.  However, some nvdimm_bus instances, particularly the
ACPI.NFIT bus, define a small set of messages that can be passed to the
platform.  For convenience we derive the initial libnvdimm-ioctl command
formats directly from the NFIT DSM Interface Example formats.

    ND_CMD_SMART: media health and diagnostics
    ND_CMD_GET_CONFIG_SIZE: size of the label space
    ND_CMD_GET_CONFIG_DATA: read label space
    ND_CMD_SET_CONFIG_DATA: write label space
    ND_CMD_VENDOR: vendor-specific command passthrough
    ND_CMD_ARS_CAP: report address-range-scrubbing capabilities
    ND_CMD_ARS_START: initiate scrubbing
    ND_CMD_ARS_STATUS: report on scrubbing state
    ND_CMD_SMART_THRESHOLD: configure alarm thresholds for smart events

If a platform later defines different commands than this set it is
straightforward to extend support to those formats.

Most of the commands target a specific dimm.  However, the
address-range-scrubbing commands target the bus.  The 'commands'
attribute in sysfs of an nvdimm_bus, or nvdimm, enumerate the supported
commands for that object.

Cc: <linux-acpi@vger.kernel.org>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Dan Williams
e6dfb2de47 libnvdimm, nfit: dimm/memory-devices
Enable nvdimm devices to be registered on a nvdimm_bus.  The kernel
assigned device id for nvdimm devicesis dynamic.  If userspace needs a
more static identifier it should consult a provider-specific attribute.
In the case where NFIT is the provider, the 'nmemX/nfit/handle' or
'nmemX/nfit/serial' attributes may be used for this purpose.

Cc: Neil Brown <neilb@suse.de>
Cc: <linux-acpi@vger.kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Dan Williams
45def22c1f libnvdimm: control character device and nvdimm_bus sysfs attributes
The control device for a nvdimm_bus is registered as an "nd" class
device.  The expectation is that there will usually only be one "nd" bus
registered under /sys/class/nd.  However, we allow for the possibility
of multiple buses and they will listed in discovery order as
ndctl0...ndctlN.  This character device hosts the ioctl for passing
control messages.  The initial command set has a 1:1 correlation with
the commands listed in the by the "NFIT DSM Example" document [1], but
this scheme is extensible to future command sets.

Note, nd_ioctl() and the backing ->ndctl() implementation are defined in
a subsequent patch.  This is simply the initial registrations and sysfs
attributes.

[1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

Cc: Neil Brown <neilb@suse.de>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: <linux-acpi@vger.kernel.org>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Dan Williams
b94d5230d0 libnvdimm, nfit: initial libnvdimm infrastructure and NFIT support
A struct nvdimm_bus is the anchor device for registering nvdimm
resources and interfaces, for example, a character control device,
nvdimm devices, and I/O region devices.  The ACPI NFIT (NVDIMM Firmware
Interface Table) is one possible platform description for such
non-volatile memory resources in a system.  The nfit.ko driver attaches
to the "ACPI0012" device that indicates the presence of the NFIT and
parses the table to register a struct nvdimm_bus instance.

Cc: <linux-acpi@vger.kernel.org>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-06-24 21:24:10 -04:00
Larry Finger
8a8c35fadf mm: kmemleak_alloc_percpu() should follow the gfp from per_alloc()
Beginning at commit d52d3997f8 ("ipv6: Create percpu rt6_info"), the
following INFO splat is logged:

  ===============================
  [ INFO: suspicious RCU usage. ]
  4.1.0-rc7-next-20150612 #1 Not tainted
  -------------------------------
  kernel/sched/core.c:7318 Illegal context switch in RCU-bh read-side critical section!
  other info that might help us debug this:
  rcu_scheduler_active = 1, debug_locks = 0
   3 locks held by systemd/1:
   #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff815f0c8f>] rtnetlink_rcv+0x1f/0x40
   #1:  (rcu_read_lock_bh){......}, at: [<ffffffff816a34e2>] ipv6_add_addr+0x62/0x540
   #2:  (addrconf_hash_lock){+...+.}, at: [<ffffffff816a3604>] ipv6_add_addr+0x184/0x540
  stack backtrace:
  CPU: 0 PID: 1 Comm: systemd Not tainted 4.1.0-rc7-next-20150612 #1
  Hardware name: TOSHIBA TECRA A50-A/TECRA A50-A, BIOS Version 4.20   04/17/2014
  Call Trace:
    dump_stack+0x4c/0x6e
    lockdep_rcu_suspicious+0xe7/0x120
    ___might_sleep+0x1d5/0x1f0
    __might_sleep+0x4d/0x90
    kmem_cache_alloc+0x47/0x250
    create_object+0x39/0x2e0
    kmemleak_alloc_percpu+0x61/0xe0
    pcpu_alloc+0x370/0x630

Additional backtrace lines are truncated.  In addition, the above splat
is followed by several "BUG: sleeping function called from invalid
context at mm/slub.c:1268" outputs.  As suggested by Martin KaFai Lau,
these are the clue to the fix.  Routine kmemleak_alloc_percpu() always
uses GFP_KERNEL for its allocations, whereas it should follow the gfp
from its callers.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: <stable@vger.kernel.org>	[3.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:46 -07:00
Vlastimil Babka
0867a57c4f mm, thp: respect MPOL_PREFERRED policy with non-local node
Since commit 077fcf116c ("mm/thp: allocate transparent hugepages on
local node"), we handle THP allocations on page fault in a special way -
for non-interleave memory policies, the allocation is only attempted on
the node local to the current CPU, if the policy's nodemask allows the
node.

This is motivated by the assumption that THP benefits cannot offset the
cost of remote accesses, so it's better to fallback to base pages on the
local node (which might still be available, while huge pages are not due
to fragmentation) than to allocate huge pages on a remote node.

The nodemask check prevents us from violating e.g.  MPOL_BIND policies
where the local node is not among the allowed nodes.  However, the
current implementation can still give surprising results for the
MPOL_PREFERRED policy when the preferred node is different than the
current CPU's local node.

In such case we should honor the preferred node and not use the local
node, which is what this patch does.  If hugepage allocation on the
preferred node fails, we fall back to base pages and don't try other
nodes, with the same motivation as is done for the local node hugepage
allocations.  The patch also moves the MPOL_INTERLEAVE check around to
simplify the hugepage specific test.

The difference can be demonstrated using in-tree transhuge-stress test
on the following 2-node machine where half memory on one node was
occupied to show the difference.

> numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 24 25 26 27 28 29 30 31 32 33 34 35
node 0 size: 7878 MB
node 0 free: 3623 MB
node 1 cpus: 12 13 14 15 16 17 18 19 20 21 22 23 36 37 38 39 40 41 42 43 44 45 46 47
node 1 size: 8045 MB
node 1 free: 7818 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

Before the patch:
> numactl -p0 -C0 ./transhuge-stress
transhuge-stress: 2.197 s/loop, 0.276 ms/page,   7249.168 MiB/s 7962 succeed,    0 failed, 1786 different pages

> numactl -p0 -C12 ./transhuge-stress
transhuge-stress: 2.962 s/loop, 0.372 ms/page,   5376.172 MiB/s 7962 succeed,    0 failed, 3873 different pages

Number of successful THP allocations corresponds to free memory on node 0 in
the first case and node 1 in the second case, i.e. -p parameter is ignored and
cpu binding "wins".

After the patch:
> numactl -p0 -C0 ./transhuge-stress
transhuge-stress: 2.183 s/loop, 0.274 ms/page,   7295.516 MiB/s 7962 succeed,    0 failed, 1760 different pages

> numactl -p0 -C12 ./transhuge-stress
transhuge-stress: 2.878 s/loop, 0.361 ms/page,   5533.638 MiB/s 7962 succeed,    0 failed, 1750 different pages

> numactl -p1 -C0 ./transhuge-stress
transhuge-stress: 4.628 s/loop, 0.581 ms/page,   3440.893 MiB/s 7962 succeed,    0 failed, 3918 different pages

The -p parameter is respected regardless of cpu binding.

> numactl -C0 ./transhuge-stress
transhuge-stress: 2.202 s/loop, 0.277 ms/page,   7230.003 MiB/s 7962 succeed,    0 failed, 1750 different pages

> numactl -C12 ./transhuge-stress
transhuge-stress: 3.020 s/loop, 0.379 ms/page,   5273.324 MiB/s 7962 succeed,    0 failed, 3916 different pages

Without -p parameter, hugepage restriction to CPU-local node works as before.

Fixes: 077fcf116c ("mm/thp: allocate transparent hugepages on local node")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: <stable@vger.kernel.org>	[4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:46 -07:00
Josef Bacik
afa2db2fb6 tmpfs: truncate prealloc blocks past i_size
One of the rocksdb people noticed that when you do something like this

    fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 10M)
    pwrite(fd, buf, 5M, 0)
    ftruncate(5M)

on tmpfs, the file would still take up 10M: which led to super fun
issues because we were getting ENOSPC before we thought we should be
getting ENOSPC.  This patch fixes the problem, and mirrors what all the
other fs'es do (and was agreed to be the correct behaviour at LSF).

I tested it locally to make sure it worked properly with the following

    xfs_io -f -c "falloc -k 0 10M" -c "pwrite 0 5M" -c "truncate 5M" file

Without the patch we have "Blocks: 20480", with the patch we have the
correct value of "Blocks: 10240".

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Zhu Guihua
c435a39057 mm/memory hotplug: print the last vmemmap region at the end of hot add memory
When hot add two nodes continuously, we found the vmemmap region info is
a bit messed.  The last region of node 2 is printed when node 3 hot
added, like the following:

  Initmem setup node 2 [mem 0x0000000000000000-0xffffffffffffffff]
   On node 2 totalpages: 0
   Built 2 zonelists in Node order, mobility grouping on.  Total pages: 16090539
   Policy zone: Normal
   init_memory_mapping: [mem 0x40000000000-0x407ffffffff]
    [mem 0x40000000000-0x407ffffffff] page 1G
    [ffffea1000000000-ffffea10001fffff] PMD -> [ffff8a077d800000-ffff8a077d9fffff] on node 2
    [ffffea1000200000-ffffea10003fffff] PMD -> [ffff8a077de00000-ffff8a077dffffff] on node 2
  ...
    [ffffea101f600000-ffffea101f9fffff] PMD -> [ffff8a074ac00000-ffff8a074affffff] on node 2
    [ffffea101fa00000-ffffea101fdfffff] PMD -> [ffff8a074a800000-ffff8a074abfffff] on node 2
  Initmem setup node 3 [mem 0x0000000000000000-0xffffffffffffffff]
   On node 3 totalpages: 0
   Built 3 zonelists in Node order, mobility grouping on.  Total pages: 16090539
   Policy zone: Normal
   init_memory_mapping: [mem 0x60000000000-0x607ffffffff]
    [mem 0x60000000000-0x607ffffffff] page 1G
    [ffffea101fe00000-ffffea101fffffff] PMD -> [ffff8a074a400000-ffff8a074a5fffff] on node 2 <=== node 2 ???
    [ffffea1800000000-ffffea18001fffff] PMD -> [ffff8a074a600000-ffff8a074a7fffff] on node 3
    [ffffea1800200000-ffffea18005fffff] PMD -> [ffff8a074a000000-ffff8a074a3fffff] on node 3
    [ffffea1800600000-ffffea18009fffff] PMD -> [ffff8a0749c00000-ffff8a0749ffffff] on node 3
  ...

The cause is the last region was missed at the and of hot add memory,
and p_start, p_end, node_start were not reset, so when hot add memory to
a new node, it will consider they are not contiguous blocks and print
the previous one.  So we print the last vmemmap region at the end of hot
add memory to avoid the confusion.

Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Piotr Kwapulinski
e37609bb36 mm/mmap.c: optimization of do_mmap_pgoff function
The simple check for zero length memory mapping may be performed
earlier.  So that in case of zero length memory mapping some unnecessary
code is not executed at all.  It does not make the code less readable
and saves some CPU cycles.

Signed-off-by: Piotr Kwapulinski <kwapulinski.piotr@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Catalin Marinas
93ada579b0 mm: kmemleak: optimise kmemleak_lock acquiring during kmemleak_scan
The kmemleak memory scanning uses finer grained object->lock spinlocks
primarily to avoid races with the memory block freeing.  However, the
pointer lookup in the rb tree requires the kmemleak_lock to be held.
This is currently done in the find_and_get_object() function for each
pointer-like location read during scanning.  While this allows a low
latency on kmemleak_*() callbacks on other CPUs, the memory scanning is
slower.

This patch moves the kmemleak_lock outside the scan_block() loop,
acquiring/releasing it only once per scanned memory block.  The
allow_resched logic is moved outside scan_block() and a new
scan_large_block() function is implemented which splits large blocks in
MAX_SCAN_SIZE chunks with cond_resched() calls in-between.  A redundant
(object->flags & OBJECT_NO_SCAN) check is also removed from
scan_object().

With this patch, the kmemleak scanning performance is significantly
improved: at least 50% with lock debugging disabled and over an order of
magnitude with lock proving enabled (on an arm64 system).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Catalin Marinas
9d5a4c730d mm: kmemleak: avoid deadlock on the kmemleak object insertion error path
While very unlikely (usually kmemleak or sl*b bug), the create_object()
function in mm/kmemleak.c may fail to insert a newly allocated object into
the rb tree.  When this happens, kmemleak disables itself and prints
additional information about the object already found in the rb tree.
Such printing is done with the parent->lock acquired, however the
kmemleak_lock is already held.  This is a potential race with the scanning
thread which acquires object->lock and kmemleak_lock in a

This patch removes the locking around the 'parent' object information
printing.  Such object cannot be freed or removed from object_tree_root
and object_list since kmemleak_lock is already held.  There is a very
small risk that some of the object data is being modified on another CPU
but the only downside is inconsistent information printing.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Catalin Marinas
5f369f374b mm: kmemleak: do not acquire scan_mutex in kmemleak_do_cleanup()
The kmemleak_do_cleanup() work thread already waits for the kmemleak_scan
thread to finish via kthread_stop().  Waiting in kthread_stop() while
scan_mutex is held may lead to deadlock if kmemleak_scan_thread() also
waits to acquire for scan_mutex.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Catalin Marinas
e781a9ab48 mm: kmemleak: fix delete_object_*() race when called on the same memory block
Calling delete_object_*() on the same pointer is not a standard use case
(unless there is a bug in the code calling kmemleak_free()).  However,
during kmemleak disabling (error or user triggered via /sys), there is a
potential race between kmemleak_free() calls on a CPU and
__kmemleak_do_cleanup() on a different CPU.

The current delete_object_*() implementation first performs a look-up
holding kmemleak_lock, increments the object->use_count and then
re-acquires kmemleak_lock to remove the object from object_tree_root and
object_list.

This patch simplifies the delete_object_*() mechanism to both look up
and remove an object from the object_tree_root and object_list
atomically (guarded by kmemleak_lock).  This allows safe concurrent
calls to delete_object_*() on the same pointer without additional
locking for synchronising the kmemleak_free_enabled flag.

A side effect is a slight improvement in the delete_object_*() performance
by avoiding acquiring kmemleak_lock twice and incrementing/decrementing
object->use_count.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Catalin Marinas
c5f3b1a51a mm: kmemleak: allow safe memory scanning during kmemleak disabling
The kmemleak scanning thread can run for minutes.  Callbacks like
kmemleak_free() are allowed during this time, the race being taken care
of by the object->lock spinlock.  Such lock also prevents a memory block
from being freed or unmapped while it is being scanned by blocking the
kmemleak_free() -> ...  -> __delete_object() function until the lock is
released in scan_object().

When a kmemleak error occurs (e.g.  it fails to allocate its metadata),
kmemleak_enabled is set and __delete_object() is no longer called on
freed objects.  If kmemleak_scan is running at the same time,
kmemleak_free() no longer waits for the object scanning to complete,
allowing the corresponding memory block to be freed or unmapped (in the
case of vfree()).  This leads to kmemleak_scan potentially triggering a
page fault.

This patch separates the kmemleak_free() enabling/disabling from the
overall kmemleak_enabled nob so that we can defer the disabling of the
object freeing tracking until the scanning thread completed.  The
kmemleak_free_part() is deliberately ignored by this patch since this is
only called during boot before the scanning thread started.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>
Tested-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Tejun Heo
c2b42d3cad memcg: convert mem_cgroup->under_oom from atomic_t to int
memcg->under_oom tracks whether the memcg is under OOM conditions and is
an atomic_t counter managed with mem_cgroup_[un]mark_under_oom().  While
atomic_t appears to be simple synchronization-wise, when used as a
synchronization construct like here, it's trickier and more error-prone
due to weak memory ordering rules, especially around atomic_read(), and
false sense of security.

For example, both non-trivial read sites of memcg->under_oom are a bit
problematic although not being actually broken.

* mem_cgroup_oom_register_event()

  It isn't explicit what guarantees the memory ordering between event
  addition and memcg->under_oom check.  This isn't broken only because
  memcg_oom_lock is used for both event list and memcg->oom_lock.

* memcg_oom_recover()

  The lockless test doesn't have any explanation why this would be
  safe.

mem_cgroup_[un]mark_under_oom() are very cold paths and there's no point
in avoiding locking memcg_oom_lock there.  This patch converts
memcg->under_oom from atomic_t to int, puts their modifications under
memcg_oom_lock and documents why the lockless test in
memcg_oom_recover() is safe.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00
Tejun Heo
f4b90b70b7 memcg: remove unused mem_cgroup->oom_wakeups
Since commit 4942642080 ("mm: memcg: handle non-error OOM situations
more gracefully"), nobody uses mem_cgroup->oom_wakeups.  Remove it.

While at it, also fold memcg_wakeup_oom() into memcg_oom_recover() which
is its only user.  This cleanup was suggested by Michal.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:45 -07:00