linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-02 08:34:20 +08:00

Author	SHA1	Message	Date
Linus Torvalds	dfab34aa61	ARM: arm-soc device-tree updates for 3.10, part 1 Device-tree updates for 3.10. The bulk of the churn in this branch is due to i.MX moving from C-defined pin control over to device tree, which is a one-time conversion that will allow greater flexibility down the road. Besides that, there's PCI-e bindings for Marvell mvebu platforms and a handful of cleanups to tegra due to the new include file functionality of the device tree compiler. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJRgg+aAAoJEIwa5zzehBx3/q0P/RumfsMePxhmSU4HM16a3w0B 9jg7wd9BxVrJUzTY9F7z+Q72x0u5USUtVnyoY5s68DQMkFyhBQUuKCCiwCqtpCBN 2Uf0JQjYHdqEFKgN6DiPxSVRPXC8jmMzYGRk5RTI5kVWxaBEMdw9rTo0x4vol/Cv 7Z+W+gixXZbgydH/ogqly1MQc9vWliRTfU2zv2WOZ7TLyyEd2lOjMMBIX/n3vI4l T32JOUDgIYK841s9n2eNQGEjqB/OghMMrQsdjUAd++je6QtqgZk9+uHfPFC1C0wQ 3F93te9HleluYcOcxGmedK3B9QO2Y8y1XHe+uxLZVKXBR+6/5AtSwZFRQm10uMCI JUz3j6tRAWDAOin2vXZcf2CVPn5HZbh3D67WuUdfxMngH0XHvSZRC9eRd70jWvDe 9FY4NRTjRSLu/VtgCzF8tSA3cEylhyKYdK6Cf0nbwQ26JTO2VNNCnjuCbRfWp+E1 y0jIQwsaiNLEBwbesNbnFrj+YTTAZBI4+Y5HrSV7Og5/5X9BWs11KAkRppNOj0Uc WnqG26SssuBNBVHPOO2RrOwq3n2VphQ/BB8j9yrpWtcAlQxdjmVqFj/GIIiHr2Wm GuKWgM5fn+xF0oeCriq4Ti5eCJQ7Ev6Er46WrGQDBniZWVi05aP51ks1bfwbfHqn z1o5QfLpr4PkJPk0mnim =8X1b -----END PGP SIGNATURE----- Merge tag 'dt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC device-tree updates from Olof Johansson: "Part 1 of device-tree updates for 3.10. The bulk of the churn in this branch is due to i.MX moving from C-defined pin control over to device tree, which is a one-time conversion that will allow greater flexibility down the road. Besides that, there's PCI-e bindings for Marvell mvebu platforms and a handful of cleanups to tegra due to the new include file functionality of the device tree compiler" * tag 'dt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (113 commits) arm: mvebu: PCIe Device Tree informations for Armada XP GP arm: mvebu: PCIe Device Tree informations for Armada 370 DB arm: mvebu: PCIe Device Tree informations for Armada 370 Mirabox arm: mvebu: PCIe Device Tree informations for Armada XP DB arm: mvebu: PCIe Device Tree informations for OpenBlocks AX3-4 arm: mvebu: add PCIe Device Tree informations for Armada XP arm: mvebu: add PCIe Device Tree informations for Armada 370 ARM: sunxi: unify osc24M_fixed and osc24M arm: vt8500: Add SDHC support to WM8505 DT ARM: dts: Add a 64 bits version of the skeleton device tree ARM: mvebu: Add Device Bus and CFI flash memory support to defconfig ARM: mvebu: Add support for NOR flash device on Openblocks AX3 board ARM: mvebu: Add support for NOR flash device on Armada XP-GP board ARM: mvebu: Add Device Bus support for Armada 370/XP SoC ARM: dts: imx6dl-wandboard: Add USB Host support ARM: dts: imx51 cpu node ARM: dts: Add missing imx27-phytec-phycore dtb target ARM: dts: Add NFC support for i.MX27 Phytec PCM038 module ARM: i.MX51: Add PATA support ARM: dts: Add initial support for Wandboard Dual-Lite ...	2013-05-02 09:28:03 -07:00
Vinod Koul	b2396f7984	Merge branch 'topic/of' into for-linus Conflicts: include/linux/dmaengine.h Signed-off-by: Vinod Koul <vinod.koul@intel.com>	2013-05-02 21:52:26 +05:30
Lars-Peter Clausen	de61608acf	dma:of: Use a mutex to protect the of_dma_list Currently the OF DMA code uses a spin lock to protect the of_dma_list from concurrent access and a per controller reference count to protect the controller from being freed while a request operation is in progress. If of_dma_controller_free() is called for a controller who's reference count is not zero it will return -EBUSY and not remove the controller. This is fine up until here, but leaves the question what the caller of of_dma_controller_free() is supposed to do if the controller couldn't be freed. The only viable solution for the caller is to spin on of_dma_controller_free() until it returns success. E.g. do { ret = of_dma_controller_free(dev->of_node) } while (ret != -EBUSY); This is rather ugly and unnecessary and none of the current users of of_dma_controller_free() check it's return value anyway. Instead protect the list by a mutex. The mutex will be held as long as a request operation is in progress. So if of_dma_controller_free() is called while a request operation is in progress it will be put to sleep and only wake up once the request operation has finished. This means that it is no longer possible to register or unregister OF DMA controllers from a context where it's not possible to sleep. But I doubt that we'll ever need this. Also rename of_dma_get_controller back to of_dma_find_controller. Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Vinod Koul <vinod.koul@intel.com>	2013-05-02 21:50:38 +05:30
Linus Torvalds	a7726350e0	ARM: arm-soc cleanup for 3.10 Here is a collection of cleanup patches. Among the pieces that stand out are: - The deletion of h720x platforms - Split of at91 non-dt platforms to their own Kconfig file to keep them separate - General cleanups and refactoring of i.MX and MXS platforms - Some restructuring of clock tables for OMAP - Convertion of PMC driver for Tegra to dt-only - Some renames of sunxi -> sun4i (Allwinner A10) - ... plus a bunch of other stuff that I haven't mentioned -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJRggUqAAoJEIwa5zzehBx3HjEQAJwp7heRs/HwTDzmzcyHkRMV usbaa9dHBuAZ0DzsWjLK99xEn8VWD9TvbeP6hN5gNhxko06UVza3o8PI2iV1ztMB 9K3u2+LS5on/5cOxnsU1va16h5hBZ0ZIgNx5NY+PZ5mBY6v1U3qTjljPP62iXp63 w+sdXeZDe/c5JvuoDRbY0OBR++3Jp8cQg7KbU78jWz3r5D2rC1zwhkf2audcRY6b jIWTj9M8CHynh/D6OzKqDcOYorBHNSRj0YbiWS2nnMfm+0V8nya00EPRpCPRiBUb sobSy1CI9Qxiih3bOf6QCfzCRzJ5hbtE0zlI8g3bqtEZ1yOsE949HrKapWHJJdIU JNTXrxXORAnaRhbzvSPNpp/iJBSDQRsfEETgv5BuHg/4lzTQfzElySbcgb4EeoHr 7Zt8ZR2/Du+u76qIPqs19ES3Wx+nOEOfSDAgZmlfPvlwmlGDYvqAXoeJ006VXnhG JacLuD/cFnJ1w00Bcl48ZXMIsVkoRqjvsCG5q688HGXMM1lU8DfgUpQY6OCWAbdu kFnBinJZk+HbE8FGS8O0BoQ+oiC0YIr2XhATL66PGHq7bLHb5ycwvZ7mrfC0AN9j M9hqTFednwfo9wF8vSj5nMsxXwP8/mky4ECGoFvLsMYDosunrNVnAHtTgDSE+ZgO 6kQJ1P8jBBXn2LyjF88W =xCAx -----END PGP SIGNATURE----- Merge tag 'cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC cleanup from Olof Johansson: "Here is a collection of cleanup patches. Among the pieces that stand out are: - The deletion of h720x platforms - Split of at91 non-dt platforms to their own Kconfig file to keep them separate - General cleanups and refactoring of i.MX and MXS platforms - Some restructuring of clock tables for OMAP - Convertion of PMC driver for Tegra to dt-only - Some renames of sunxi -> sun4i (Allwinner A10) - ... plus a bunch of other stuff that I haven't mentioned" * tag 'cleanup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (119 commits) ARM: i.MX: remove unused ARCH_* configs ARM i.MX53: remove platform ahci support ARM: sunxi: Rework the restart code irqchip: sunxi: Rename sunxi to sun4i irqchip: sunxi: Make use of the IRQCHIP_DECLARE macro clocksource: sunxi: Rename sunxi to sun4i clocksource: sunxi: make use of CLKSRC_OF clocksource: sunxi: Cleanup the timer code ARM: at91: remove trailing semicolon from macros ARM: at91/setup: fix trivial typos ARM: EXYNOS: remove "config EXYNOS_DEV_DRM" ARM: EXYNOS: change the name of USB ohci header ARM: SAMSUNG: Remove unnecessary code for dma ARM: S3C24XX: Remove unused GPIO drive strength register definitions ARM: OMAP4+: PM: Restore CPU power state to ON with clockdomain force wakeup method ARM: S3C24XX: Removed unneeded dependency on CPU_S3C2412 ARM: S3C24XX: Removed unneeded dependency on CPU_S3C2410 ARM: S3C24XX: Removed unneeded dependency on ARCH_S3C24XX for boards ARM: SAMSUNG: Fix typo "CONFIG_SAMSUNG_DEV_RTC" ARM: S5P64X0: Fix typo "CONFIG_S5P64X0_SETUP_SDHCI" ...	2013-05-02 09:03:55 -07:00
Frederic Weisbecker	c032862fba	Merge commit '8700c95adb03' into timers/nohz The full dynticks tree needs the latest RCU and sched upstream updates in order to fix some dependencies. Merge a common upstream merge point that has these updates. Conflicts: include/linux/perf_event.h kernel/rcutree.h kernel/rcutree_plugin.h Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2013-05-02 17:54:19 +02:00
David Miller	4ada8db38a	net: Restore NETIF_F_* bit ordering. Commit `8ad227ff89` ("net: vlan: add 802.1ad support") added some new NETIF_F_* features bits, but it added them in the middle of existing values. Userland depends upon the flag bits via the per-netdevice 'flags' sysfs file. So restore the previous ordering by adding the new flags at the end. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-05-02 07:34:58 -07:00
Bhanu Prakash Gollapudi	c13d2b6d36	[SCSI] bnx2fc: Include chip number in the symbolic name [jejb: move PCI_DEVICE_ID definitions to include/pci_ids.h] Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2013-05-02 07:32:24 -07:00
Alex Deucher	62d1f92e06	drm/radeon: add new richland pci ids Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2013-05-02 10:01:49 -04:00
Alex Deucher	18932a2841	drm/radeon: add some new SI PCI ids Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2013-05-02 10:01:48 -04:00
Paul Mackerras	5975a2e095	KVM: PPC: Book3S: Add API for in-kernel XICS emulation This adds the API for userspace to instantiate an XICS device in a VM and connect VCPUs to it. The API consists of a new device type for the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl, which is used to assert and deassert interrupt inputs of the XICS. The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES. Each attribute within this group corresponds to the state of one interrupt source. The attribute number is the same as the interrupt source number. This does not support irq routing or irqfd yet. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>	2013-05-02 15:28:36 +02:00
Michael S. Tsirkin	5012a3a384	tcm_vhost: header split up move uapi parts to vhost.h move .c private parts to .c itself Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Asias He <asias@redhat.com> Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>	2013-05-02 13:40:15 +03:00
Joerg Roedel	0c4513be3d	Merge branches 'iommu/fixes', 'x86/vt-d', 'x86/amd', 'ppc/pamu', 'core' and 'arm/tegra' into next	2013-05-02 12:10:19 +02:00
Alex Elder	4f0dcb10cf	libceph: create source file "net/ceph/snapshot.c" This creates a new source file "net/ceph/snapshot.c" to contain utility routines related to ceph snapshot contexts. The main motivation was to define ceph_create_snap_context() as a common way to create these structures, but I've moved the definitions of ceph_get_snap_context() and ceph_put_snap_context() there too. (The benefit of inlining those is very small, and I'd rather keep this collection of functions together.) Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:20:08 -07:00
Alex Elder	c3f56102f2	libceph: validate timespec conversions A ceph timespec contains 32-bit unsigned values for its seconds and nanoseconds components. For a standard timespec, both fields are signed, and the seconds field is almost surely 64 bits. Add some explicit casts so the fact that this conversion is taking place is obvious. Also trip a bug if we ever try to put out of range (negative or too big) values into a ceph timespec. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:17 -07:00
Alex Elder	b587398a4f	libceph: add signed type limits Flesh out the limits defined in <linux/ceph/decode.h> to include the maximum and minimum values for signed type S8, S16, S32, and S64. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:16 -07:00
Alex Elder	6c57b5545d	libceph: support pages for class request data Add the ability to provide an array of pages as outbound request data for object class method calls. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:06 -07:00
Alex Elder	49719778bf	libceph: support raw data requests Allow osd request ops that aren't otherwise structured (not class, extent, or watch ops) to specify "raw" data to be used to hold incoming data for the op. Make use of this capability for the osd STAT op. Prefix the name of the private function osd_req_op_init() with "_", and expose a new function by that (earlier) name whose purpose is to initialize osd ops with (only) implied data. For now we'll just support the use of a page array for an osd op with incoming raw data. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:19:00 -07:00
Alex Elder	406e2c9f92	libceph: kill off osd data write_request parameters In the incremental move toward supporting distinct data items in an osd request some of the functions had "write_request" parameters to indicate, basically, whether the data belonged to in_data or the out_data. Now that we maintain the data fields in the op structure there is no need to indicate the direction, so get rid of the "write_request" parameters. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:58 -07:00
Alex Elder	26be88087a	libceph: change how "safe" callback is used An osd request currently has two callbacks. They inform the initiator of the request when we've received confirmation for the target osd that a request was received, and when the osd indicates all changes described by the request are durable. The only time the second callback is used is in the ceph file system for a synchronous write. There's a race that makes some handling of this case unsafe. This patch addresses this problem. The error handling for this callback is also kind of gross, and this patch changes that as well. In ceph_sync_write(), if a safe callback is requested we want to add the request on the ceph inode's unsafe items list. Because items on this list must have their tid set (by ceph_osd_start_request()), the request added after the call to that function returns. The problem with this is that there's a race between starting the request and adding it to the unsafe items list; the request may already be complete before ceph_sync_write() even begins to put it on the list. To address this, we change the way the "safe" callback is used. Rather than just calling it when the request is "safe", we use it to notify the initiator the bounds (start and end) of the period during which the request is unsafe. So the initiator gets notified just before the request gets sent to the osd (when it is "unsafe"), and again when it's known the results are durable (it's no longer unsafe). The first call will get made in __send_request(), just before the request message gets sent to the messenger for the first time. That function is only called by __send_queued(), which is always called with the osd client's request mutex held. We then have this callback function insert the request on the ceph inode's unsafe list when we're told the request is unsafe. This will avoid the race because this call will be made under protection of the osd client's request mutex. It also nicely groups the setup and cleanup of the state associated with managing unsafe requests. The name of the "safe" callback field is changed to "unsafe" to better reflect its new purpose. It has a Boolean "unsafe" parameter to indicate whether the request is becoming unsafe or is now safe. Because the "msg" parameter wasn't used, we drop that. This resolves the original problem reportedin: http://tracker.ceph.com/issues/4706 Reported-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>	2013-05-01 21:18:52 -07:00
Alex Elder	04017e29bb	libceph: make method call data be a separate data item Right now the data for a method call is specified via a pointer and length, and it's copied--along with the class and method name--into a pagelist data item to be sent to the osd. Instead, encode the data in a data item separate from the class and method names. This will allow large amounts of data to be supplied to methods without copying. Only rbd uses the class functionality right now, and when it really needs this it will probably need to use a page array rather than a page list. But this simple implementation demonstrates the functionality on the osd client, and that's enough for now. This resolves: http://tracker.ceph.com/issues/4104 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:35 -07:00
Alex Elder	90af36022a	libceph: add, don't set data for a message Change the names of the functions that put data on a pagelist to reflect that we're adding to whatever's already there rather than just setting it to the one thing. Currently only one data item is ever added to a message, but that's about to change. This resolves: http://tracker.ceph.com/issues/2770 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:34 -07:00
Alex Elder	ca8b3a6917	libceph: implement multiple data items in a message This patch adds support to the messenger for more than one data item in its data list. A message data cursor has two more fields to support this: - a count of the number of bytes left to be consumed across all data items in the list, "total_resid" - a pointer to the head of the list (for validation only) The cursor initialization routine has been split into two parts: the outer one, which initializes the cursor for traversing the entire list of data items; and the inner one, which initializes the cursor to start processing a single data item. When a message cursor is first initialized, the outer initialization routine sets total_resid to the length provided. The data pointer is initialized to the first data item on the list. From there, the inner initialization routine finishes by setting up to process the data item the cursor points to. Advancing the cursor consumes bytes in total_resid. If the resid field reaches zero, it means the current data item is fully consumed. If total_resid indicates there is more data, the cursor is advanced to point to the next data item, and then the inner initialization routine prepares for using that. (A check is made at this point to make sure we don't wrap around the front of the list.) The type-specific init routines are modified so they can be given a length that's larger than what the data item can support. The resid field is initialized to the smaller of the provided length and the length of the entire data item. When total_resid reaches zero, we're done. This resolves: http://tracker.ceph.com/issues/3761 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:33 -07:00
Alex Elder	5240d9f95d	libceph: replace message data pointer with list In place of the message data pointer, use a list head which links through message data items. For now we only support a single entry on that list. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:32 -07:00
Alex Elder	8ae4f4f5c0	libceph: have cursor point to data Rather than having a ceph message data item point to the cursor it's associated with, have the cursor point to a data item. This will allow a message cursor to be used for more than one data item. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:30 -07:00
Alex Elder	36153ec9dd	libceph: move cursor into message A message will only be processing a single data item at a time, so there's no need for each data item to have its own cursor. Move the cursor embedded in the message data structure into the message itself. To minimize the impact, keep the data->cursor field, but make it be a pointer to the cursor in the message. Move the definition of ceph_msg_data above ceph_msg_data_cursor so the cursor can point to the data without a forward definition rather than vice-versa. This and the upcoming patches are part of: http://tracker.ceph.com/issues/3761 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:29 -07:00
Alex Elder	c851c49591	libceph: record bio length The bio is the only data item type that doesn't record its full length. Fix that. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:28 -07:00
Alex Elder	ea96571f7b	libceph: fix possible CONFIG_BLOCK build problem This patch: 15a0d7b libceph: record message data length did not enclose some bio-specific code inside CONFIG_BLOCK as it should have. Fix that. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:26 -07:00
Alex Elder	5476492fba	libceph: kill off osd request r_data_in and r_data_out Finally! Convert the osd op data pointers into real structures, and make the switch over to using them instead of having all ops share the in and/or out data structures in the osd request. Set up a new function to traverse the set of ops and release any data associated with them (pages). This and the patches leading up to it resolve: http://tracker.ceph.com/issues/4657 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:25 -07:00
Alex Elder	ec9123c567	libceph: set the data pointers when encoding ops Still using the osd request r_data_in and r_data_out pointer, but we're basically only referring to it via the data pointers in the osd ops. And we're transferring that information to the request or reply message only when the op indicates it's needed, in osd_req_encode_op(). To avoid a forward reference, ceph_osdc_msg_data_set() was moved up in the file. Don't bother calling ceph_osd_data_init(), in ceph_osd_alloc(), because the ops array will already be zeroed anyway. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:24 -07:00
Alex Elder	a4ce40a9a7	libceph: combine initializing and setting osd data This ends up being a rather large patch but what it's doing is somewhat straightforward. Basically, this is replacing two calls with one. The first of the two calls is initializing a struct ceph_osd_data with data (either a page array, a page list, or a bio list); the second is setting an osd request op so it associates that data with one of the op's parameters. In place of those two will be a single function that initializes the op directly. That means we sort of fan out a set of the needed functions: - extent ops with pages data - extent ops with pagelist data - extent ops with bio list data and - class ops with page data for receiving a response We also have define another one, but it's only used internally: - class ops with pagelist data for request parameters Note that we still haven't gotten rid of the osd request's r_data_in and r_data_out fields. All the osd ops refer to them for their data. For now, these data fields are pointers assigned to the appropriate r_data_* field when these new functions are called. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:23 -07:00
Alex Elder	5f562df5f5	libceph: format class info at init time An object class method is formatted using a pagelist which contains the class name, the method name, and the data concatenated into an osd request's outbound data. Currently when a class op is initialized in osd_req_op_cls_init(), the lengths of and pointers to these three items are recorded. Later, when the op is getting formatted into the request message, a new pagelist is created and that is when these items get copied into the pagelist. This patch makes it so the pagelist to hold these items is created when the op is initialized instead. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:19 -07:00
Alex Elder	c99d2d4abb	libceph: specify osd op by index in request An osd request now holds all of its source op structures, and every place that initializes one of these is in fact initializing one of the entries in the the osd request's array. So rather than supplying the address of the op to initialize, have caller specify the osd request and an indication of which op it would like to initialize. This better hides the details the op structure (and faciltates moving the data pointers they use). Since osd_req_op_init() is a common routine, and it's not used outside the osd client code, give it static scope. Also make it return the address of the specified op (so all the other init routines don't have to repeat that code). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:15 -07:00
Alex Elder	8c042b0df9	libceph: add data pointers in osd op structures An extent type osd operation currently implies that there will be corresponding data supplied in the data portion of the request (for write) or response (for read) message. Similarly, an osd class method operation implies a data item will be supplied to receive the response data from the operation. Add a ceph_osd_data pointer to each of those structures, and assign it to point to eithre the incoming or the outgoing data structure in the osd message. The data is not always available when an op is initially set up, so add two new functions to allow setting them after the op has been initialized. Begin to make use of the data item pointer available in the osd operation rather than the request data in or out structure in places where it's convenient. Add some assertions to verify pointers are always set the way they're expected to be. This is a sort of stepping stone toward really moving the data into the osd request ops, to allow for some validation before making that jump. This is the first in a series of patches that resolve: http://tracker.ceph.com/issues/4657 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:14 -07:00
Alex Elder	54d5064912	libceph: rename data out field in osd request op There are fields "indata" and "indata_len" defined the ceph osd request op structure. The "in" part is with from the point of view of the osd server, but is a little confusing here on the client side. Change their names to use "request" instead of "in" to indicate that it defines data provided with the request (as opposed the data returned in the response). Rename the local variable in osd_req_encode_op() to match. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:13 -07:00
Alex Elder	79528734f3	libceph: keep source rather than message osd op array An osd request keeps a pointer to the osd operations (ops) array that it builds in its request message. In order to allow each op in the array to have its own distinct data, we will need to keep track of each op's data, and that information does not go over the wire. As long as we're tracking the data we might as well just track the entire (source) op definition for each of the ops. And if we're doing that, we'll have no more need to keep a pointer to the wire-encoded version. This patch makes the array of source ops be kept with the osd request structure, and uses that instead of the version encoded in the message in places where that was previously used. The array will be embedded in the request structure, and the maximum number of ops we ever actually use is currently 2. So reduce CEPH_OSD_MAX_OP to 2 to reduce the size of the structure. The result of doing this sort of ripples back up, and as a result various function parameters and local variables become unnecessary. Make r_num_ops be unsigned, and move the definition of struct ceph_osd_req_op earlier to ensure it's defined where needed. It does not yet add per-op data, that's coming soon. This resolves: http://tracker.ceph.com/issues/4656 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:12 -07:00
Alex Elder	43bfe5de9f	libceph: define osd data initialization helpers Define and use functions that encapsulate the initializion of a ceph_osd_data structure. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:06 -07:00
Alex Elder	e5975c7c8e	ceph: build osd request message later for writepages Hold off building the osd request message in ceph_writepages_start() until just before it will be submitted to the osd client for execution. We'll still create the request and allocate the page pointer array after we learn we have at least one page to write. A local variable will be used to keep track of the allocated array of pages. Wait until just before submitting the request for assigning that page array pointer to the request message. Create ands use a new function osd_req_op_extent_update() whose purpose is to serve this one spot where the length value supplied when an osd request's op was initially formatted might need to get changed (reduced, never increased) before submitting the request. Previously, ceph_writepages_start() assigned the message header's data length because of this update. That's no longer necessary, because ceph_osdc_build_request() will recalculate the right value to use based on the content of the ops in the request. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:18:02 -07:00
Alex Elder	acead002b2	libceph: don't build request in ceph_osdc_new_request() This patch moves the call to ceph_osdc_build_request() out of ceph_osdc_new_request() and into its caller. This is in order to defer formatting osd operation information into the request message until just before request is started. The only unusual (ab)user of ceph_osdc_build_request() is ceph_writepages_start(), where the final length of write request may change (downward) based on the current inode size or the oldest snapshot context with dirty data for the inode. The remaining callers don't change anything in the request after has been built. This means the ops array is now supplied by the caller. It also means there is no need to pass the mtime to ceph_osdc_new_request() (it gets provided to ceph_osdc_build_request()). And rather than passing a do_sync flag, have the number of ops in the ops array supplied imply adding a second STARTSYNC operation after the READ or WRITE requested. This and some of the patches that follow are related to having the messenger (only) be responsible for filling the content of the message header, as described here: http://tracker.ceph.com/issues/4589 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:17:58 -07:00
Alex Elder	a193080481	libceph: record message data length Keep track of the length of the data portion for a message in a separate field in the ceph_msg structure. This information has been maintained in wire byte order in the message header, but that's going to change soon. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:17:57 -07:00
Alex Elder	fdce58ccb5	libceph: record length of bio list with bio When assigning a bio pointer to an osd request, we don't have an efficient way of knowing the total length bytes in the bio list. That information is available at the point it's set up by the rbd code, so record it with the osd data when it's set. This and the next patch are related to maintaining the length of a message's data independent of the message header, as described here: http://tracker.ceph.com/issues/4589 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:17:56 -07:00
Alex Elder	ace6d3a96f	libceph: drop ceph_osd_request->r_con_filling_msg A field in an osd request keeps track of whether a connection is currently filling the request's reply message. This patch gets rid of that field. An osd request includes two messages--a request and a reply--and they're both associated with the connection that existed to its the target osd at the time the request was created. An osd request can be dropped early, even when it's in flight. And at that time both messages are released. It's possible the reply message has been supplied to its connection to receive an incoming response message at the time the osd request gets dropped. So ceph_osdc_release_request() revokes that message from the connection before releasing it so things get cleaned up properly. Previously this may have caused a problem, because the connection that a message was associated with might have gone away before the revoke request. And to avoid any problems using that connection, the osd client held a reference to it when it supplies its response message. However since this commit: `38941f80` libceph: have messages point to their connection all messages hold a reference to the connection they are associated with whenever the connection is actively operating on the message (i.e. while the message is queued to send or sending, and when it data is being received into it). And if a message has no connection associated with it, ceph_msg_revoke_incoming() won't do anything when asked to revoke it. As a result, there is no need to keep an additional reference to the connection associated with a message when we hand the message to the messenger when it calls our alloc_msg() method to receive something. If the connection were operating on it, it would have its own reference, and if not, there's no work to be done when we need to revoke it. So get rid of the osd request's r_con_filling_msg field. This resolves: http://tracker.ceph.com/issues/4647 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:17:54 -07:00
Alex Elder	ef4859d647	libceph: define ceph_decode_pgid() only once There are two basically identical definitions of __decode_pgid() in libceph, one in "net/ceph/osdmap.c" and the other in "net/ceph/osd_client.c". Get rid of both, and instead define a single inline version in "include/linux/ceph/osdmap.h". Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:17:52 -07:00
Alex Elder	33803f3300	libceph: define source request op functions The rbd code has a function that allocates and populates a ceph_osd_req_op structure (the in-core version of an osd request operation). When reviewed, Josh suggested two things: that the big varargs function might be better split into type-specific functions; and that this functionality really belongs in the osd client rather than rbd. This patch implements both of Josh's suggestions. It breaks up the rbd function into separate functions and defines them in the osd client module as exported interfaces. Unlike the rbd version, however, the functions don't allocate an osd_req_op structure; they are provided the address of one and that is initialized instead. The rbd function has been eliminated and calls to it have been replaced by calls to the new routines. The rbd code now now use a stack (struct) variable to hold the op rather than allocating and freeing it each time. For now only the capabilities used by rbd are implemented. Implementing all the other osd op types, and making the rest of the code use it will be done separately, in the next few patches. Note that only the extent, cls, and watch portions of the ceph_osd_req_op structure are currently used. Delete the others (xattr, pgls, and snap) from its definition so nobody thinks it's actually implemented or needed. We can add it back again later if needed, when we know it's been tested. This (and a few follow-on patches) resolves: http://tracker.ceph.com/issues/3861 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:17:45 -07:00
Alex Elder	adfe695a25	ceph: move max constant definitions Move some definitions for max integer values out of the rbd code and into the more central "decode.h" header file. These really belong in a Linux (or libc) header somewhere, but I haven't gotten around to proposing that yet. This is in preparation for moving some code out of rbd.c and into the osd client. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:17:42 -07:00
Alex Elder	6644ed7b7e	libceph: make message data be a pointer Begin the transition from a single message data item to a list of them by replacing the "data" structure in a message with a pointer to a ceph_msg_data structure. A null pointer will indicate the message has no data; replace the use of ceph_msg_has_data() with a simple check for a null pointer. Create functions ceph_msg_data_create() and ceph_msg_data_destroy() to dynamically allocate and free a data item structure of a given type. When a message has its data item "set," allocate one of these to hold the data description, and free it when the last reference to the message is dropped. This partially resolves: http://tracker.ceph.com/issues/4429 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:17:37 -07:00
Alex Elder	f5db90bcf2	libceph: kill last of ceph_msg_pos The only remaining field in the ceph_msg_pos structure is did_page_crc. In the new cursor model of things that flag (or something like it) belongs in the cursor. Define a new field "need_crc" in the cursor (which applies to all types of data) and initialize it to true whenever a cursor is initialized. In write_partial_message_data(), the data CRC still will be computed as before, but it will check the cursor->need_crc field to determine whether it's needed. Any time the cursor is advanced to a new piece of a data item, need_crc will be set, and this will cause the crc for that entire piece to be accumulated into the data crc. In write_partial_message_data() the intermediate crc value is now held in a local variable so it doesn't have to be byte-swapped so many times. In read_partial_msg_data() we do something similar (but mainly for consistency there). With that, the ceph_msg_pos structure can go away, and it no longer needs to be passed as an argument to prepare_message_data(). This cleanup is related to: http://tracker.ceph.com/issues/4428 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:17:34 -07:00
Alex Elder	859a35d552	libceph: kill most of ceph_msg_pos All but one of the fields in the ceph_msg_pos structure are now never used (only assigned), so get rid of them. This allows several small blocks of code to go away. This is cleanup of old code related to: http://tracker.ceph.com/issues/4428 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:17:33 -07:00
Alex Elder	4c59b4a278	libceph: collapse all data items into one It turns out that only one of the data item types is ever used at any one time in a single message (currently). - A page array is used by the osd client (on behalf of the file system) and by rbd. Only one osd op (and therefore at most one data item) is ever used at a time by rbd. And the only time the file system sends two, the second op contains no data. - A bio is only used by the rbd client (and again, only one data item per message) - A page list is used by the file system and by rbd for outgoing data, but only one op (and one data item) at a time. We can therefore collapse all three of our data item fields into a single field "data", and depend on the messenger code to properly handle it based on its type. This allows us to eliminate quite a bit of duplicated code. This is related to: http://tracker.ceph.com/issues/4429 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:17:30 -07:00
Alex Elder	6518be47f9	libceph: kill ceph message bio_iter, bio_seg The bio_iter and bio_seg fields in a message are no longer used, we use the cursor instead. So get rid of them and the functions that operate on them them. This is related to: http://tracker.ceph.com/issues/4428 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:17:26 -07:00
Alex Elder	25aff7c559	libceph: record residual bytes for all message data types All of the data types can use this, not just the page array. Until now, only the bio type doesn't have it available, and only the initiator of the request (the rbd client) is able to supply the length of the full request without re-scanning the bio list. Change the cursor init routines so the length is supplied based on the message header "data_len" field, and use that length to intiialize the "resid" field of the cursor. In addition, change the way "last_piece" is defined so it is based on the residual number of bytes in the original request. This is necessary (at least for bio messages) because it is possible for a read request to succeed without consuming all of the space available in the data buffer. This resolves: http://tracker.ceph.com/issues/4427 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:17:24 -07:00
Sage Weil	e9966076cd	libceph: wrap auth methods in a mutex The auth code is called from a variety of contexts, include the mon_client (protected by the monc's mutex) and the messenger callbacks (currently protected by nothing). Avoid chaos by protecting all auth state with a mutex. Nothing is blocking, so this should be simple and lightweight. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2013-05-01 21:17:15 -07:00
Sage Weil	27859f9773	libceph: wrap auth ops in wrapper functions Use wrapper functions that check whether the auth op exists so that callers do not need a bunch of conditional checks. Simplifies the external interface. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2013-05-01 21:17:14 -07:00
Sage Weil	0bed9b5c52	libceph: add update_authorizer auth method Currently the messenger calls out to a get_authorizer con op, which will create a new authorizer if it doesn't yet have one. In the meantime, when we rotate our service keys, the authorizer doesn't get updated. Eventually it will be rejected by the server on a new connection attempt and get invalidated, and we will then rebuild a new authorizer, but this is not ideal. Instead, if we do have an authorizer, call a new update_authorizer op that will verify that the current authorizer is using the latest secret. If it is not, we will build a new one that does. This avoids the transient failure. This fixes one of the sorry sequence of events for bug http://tracker.ceph.com/issues/4282 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2013-05-01 21:17:13 -07:00
Sage Weil	3a23083bda	libceph: implement RECONNECT_SEQ feature This is an old protocol extension that allows the client and server to avoid resending old messages after a reconnect (following a socket error). Instead, the exchange their sequence numbers during the handshake. This avoids sending a bunch of useless data over the socket. It has been supported in the server code since v0.22 (Sep 2010). Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>	2013-05-01 21:17:09 -07:00
Alex Elder	9d2a06c275	libceph: kill message trail The wart that is the ceph message trail can now be removed, because its only user was the osd client, and the previous patch made that no longer the case. The result allows write_partial_msg_pages() to be simplified considerably. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:17:05 -07:00
Alex Elder	95e072eb38	libceph: kill osd request r_trail The osd trail is a pagelist, used only for a CALL osd operation to hold the class and method names, along with any input data for the call. It is only currently used by the rbd client, and when it's used it is the only bit of outbound data in the osd request. Since we already support (non-trail) pagelist data in a message, we can just save this outbound CALL data in the "normal" pagelist rather than the trail, and get rid of the trail entirely. The existing pagelist support depends on the pagelist being dynamically allocated, and ownership of it is passed to the messenger once it's been attached to a message. (That is to say, the messenger releases and frees the pagelist when it's done with it). That means we need to dynamically allocate the pagelist also. Note that we simply assert that the allocation of a pagelist structure succeeds. Appending to a pagelist might require a dynamic allocation, so we're already assuming we won't run into trouble doing so (we're just ignore any failures--and that should be fixed at some point). This resolves: http://tracker.ceph.com/issues/4407 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:17:04 -07:00
Alex Elder	9a5e6d09dd	libceph: have osd requests support pagelist data Add support for recording a ceph pagelist as data associated with an osd request. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:17:03 -07:00
Alex Elder	175face2ba	libceph: let osd ops determine request data length The length of outgoing data in an osd request is dependent on the osd ops that are embedded in that request. Each op is encoded into a request message using osd_req_encode_op(), so that should be used to determine the amount of outgoing data implied by the op as it is encoded. Have osd_req_encode_op() return the number of bytes of outgoing data implied by the op being encoded, and accumulate and use that in ceph_osdc_build_request(). As a result, ceph_osdc_build_request() no longer requires its "len" parameter, so get rid of it. Using the sum of the op lengths rather than the length provided is a valid change because: - The only callers of osd ceph_osdc_build_request() are rbd and the osd client (in ceph_osdc_new_request() on behalf of the file system). - When rbd calls it, the length provided is only non-zero for write requests, and in that case the single op has the same length value as what was passed here. - When called from ceph_osdc_new_request(), (it's not all that easy to see, but) the length passed is also always the same as the extent length encoded in its (single) write op if present. This resolves: http://tracker.ceph.com/issues/4406 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:17:02 -07:00
Alex Elder	e766d7b55e	libceph: implement pages array cursor Implement and use cursor routines for page array message data items for outbound message data. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:17:01 -07:00
Alex Elder	6aaa4511de	libceph: implement bio message data item cursor Implement and use cursor routines for bio message data items for outbound message data. (See the previous commit for reasoning in support of the changes in out_msg_pos_next().) Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:16:59 -07:00
Alex Elder	dd236fcb65	libceph: prepare for other message data item types This just inserts some infrastructure in preparation for handling other types of ceph message data items. No functional changes, just trying to simplify review by separating out some noise. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:16:57 -07:00
Alex Elder	fe38a2b67b	libceph: start defining message data cursor This patch lays out the foundation for using generic routines to manage processing items of message data. For simplicity, we'll start with just the trail portion of a message, because it stands alone and is only present for outgoing data. First some basic concepts. We'll use the term "data item" to represent one of the ceph_msg_data structures associated with a message. There are currently four of those, with single-letter field names p, l, b, and t. A data item is further broken into "pieces" which always lie in a single page. A data item will include a "cursor" that will track state as the memory defined by the item is consumed by sending data from or receiving data into it. We define three routines to manipulate a data item's cursor: the "init" routine; the "next" routine; and the "advance" routine. The "init" routine initializes the cursor so it points at the beginning of the first piece in the item. The "next" routine returns the page, page offset, and length (limited by both the page and item size) of the next unconsumed piece in the item. It also indicates to the caller whether the piece being returned is the last one in the data item. The "advance" routine consumes the requested number of bytes in the item (advancing the cursor). This is used to record the number of bytes from the current piece that were actually sent or received by the network code. It returns an indication of whether the result means the current piece has been fully consumed. This is used by the message send code to determine whether it should calculate the CRC for the next piece processed. The trail of a message is implemented as a ceph pagelist. The routines defined for it will be usable for non-trail pagelist data as well. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:16:56 -07:00
Alex Elder	437945094f	libceph: abstract message data Group the types of message data into an abstract structure with a type indicator and a union containing fields appropriate to the type of data it represents. Use this to represent the pages, pagelist, bio, and trail in a ceph message. Verify message data is of type NONE in ceph_msg_data_set_*() routines. Since information about message data of type NONE really should not be interpreted, get rid of the other assertions in those functions. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:16:55 -07:00
Alex Elder	f9e15777af	libceph: be explicit about message data representation A ceph message has a data payload portion. The memory for that data (either the source of data to send or the location to place data that is received) is specified in several ways. The ceph_msg structure includes fields for all of those ways, but this mispresents the fact that not all of them are used at a time. Specifically, the data in a message can be in: - an array of pages - a list of pages - a list of Linux bios - a second list of pages (the "trail") (The two page lists are currently only ever used for outgoing data.) Impose more structure on the ceph message, making the grouping of some of these fields explicit. Shorten the name of the "page_alignment" field. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:16:54 -07:00
Alex Elder	97fb1c7f66	libceph: define ceph_msg_has_() data macros Define and use macros ceph_msg_has_() to determine whether to operate on the pages, pagelist, bio, and trail fields of a message. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:16:53 -07:00
Alex Elder	4a73ef27ad	libceph: record message data byte length Record the number of bytes of data in a page array rather than the number of pages in the array. It can be assumed that the page array is of sufficient size to hold the number of bytes indicated (and offset by the indicated alignment). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:16:42 -07:00
Alex Elder	27fa83852b	libceph: isolate other message data fields Define ceph_msg_data_set_pagelist(), ceph_msg_data_set_bio(), and ceph_msg_data_set_trail() to clearly abstract the assignment of the remaining data-related fields in a ceph message structure. Use the new functions in the osd client and mds client. This partially resolves: http://tracker.ceph.com/issues/4263 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:16:40 -07:00
Alex Elder	f1baeb2b9f	libceph: set page info with byte length When setting page array information for message data, provide the byte length rather than the page count ceph_msg_data_set_pages(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:16:39 -07:00
Alex Elder	02afca6ca0	libceph: isolate message page field manipulation Define a function ceph_msg_data_set_pages(), which more clearly abstracts the assignment page-related fields for data in a ceph message structure. Use this new function in the osd client and mds client. Ideally, these fields would never be set more than once (with BUG_ON() calls to guarantee that). At the moment though the osd client sets these every time it receives a message, and in the event of a communication problem this can happen more than once. (This will be resolved shortly, but setting up these helpers first makes it all a bit easier to work with.) Rearrange the field order in a ceph_msg structure to group those that are used to define the possible data payloads. This partially resolves: http://tracker.ceph.com/issues/4263 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:16:38 -07:00
Alex Elder	e0c594878e	libceph: record byte count not page count Record the byte count for an osd request rather than the page count. The number of pages can always be derived from the byte count (and alignment/offset) but the reverse is not true. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:16:36 -07:00
Alex Elder	7b11ba3758	libceph: define CEPH_MSG_MAX_MIDDLE_LEN This is probably unnecessary but the code read as if it were wrong in read_partial_message(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:16:29 -07:00
Alex Elder	0fff87ec79	libceph: separate read and write data An osd request defines information about where data to be read should be placed as well as where data to write comes from. Currently these are represented by common fields. Keep information about data for writing separate from data to be read by splitting these into data_in and data_out fields. This is the key patch in this whole series, in that it actually identifies which osd requests generate outgoing data and which generate incoming data. It's less obvious (currently) that an osd CALL op generates both outgoing and incoming data; that's the focus of some upcoming work. This resolves: http://tracker.ceph.com/issues/4127 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:16:27 -07:00
Alex Elder	2ac2b7a6d4	libceph: distinguish page and bio requests An osd request uses either pages or a bio list for its data. Use a union to record information about the two, and add a data type tag to select between them. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:16:25 -07:00
Alex Elder	2794a82a11	libceph: separate osd request data info Pull the fields in an osd request structure that define the data for the request out into a separate structure. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:16:24 -07:00
Alex Elder	153e5167e0	libceph: don't assign page info in ceph_osdc_new_request() Currently ceph_osdc_new_request() assigns an osd request's r_num_pages and r_alignment fields. The only thing it does after that is call ceph_osdc_build_request(), and that doesn't need those fields to be assigned. Move the assignment of those fields out of ceph_osdc_new_request() and into its caller. As a result, the page_align parameter is no longer used, so get rid of it. Note that in ceph_sync_write(), the value for req->r_num_pages had already been calculated earlier (as num_pages, and fortunately it was computed the same way). So don't bother recomputing it, but because it's not needed earlier, move that calculation after the call to ceph_osdc_new_request(). Hold off making the assignment to r_alignment, doing it instead r_pages and r_num_pages are getting set. Similarly, in start_read(), nr_pages already holds the number of pages in the array (and is calculated the same way), so there's no need to recompute it. Move the assignment of the page alignment down with the others there as well. This and the next few patches are preparation work for: http://tracker.ceph.com/issues/4127 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:16:23 -07:00
Alex Elder	41766f87f5	libceph: rename ceph_calc_object_layout() The purpose of ceph_calc_object_layout() is to fill in the pool number and seed for a ceph_pg structure provided, based on a given osd map and target object id. Currently that function takes a file layout parameter, but the only thing used out of that is its pool number. Change the function so it takes a pool number rather than the full file layout structure. Only update the ceph_pg if the pool is found in the osd map. Get rid of few useless lines of code from the function while there. Since the function now very clearly just fills in the ceph_pg structure it's provided, rename it ceph_calc_ceph_pg(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:16:17 -07:00
Alex Elder	ec02a2f2ff	libceph: kill ceph_msg->pagelist_count The pagelist_count field is never actually used, so get rid of it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:16:16 -07:00
Alex Elder	2a24d1f4bd	libceph: use (void ) for untyped data in osd ops Two of the fields defining osd operations are defined using (char ) while the data they represent are really untyped, not character strings. Change them to have type (void *). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:16:15 -07:00
Alex Elder	0d5af16435	libceph: complete lingering requests only once An osd request marked to linger will be re-submitted in the event a connection to the target osd gets dropped. Currently, if there is a callback function associated with a request it will be called each time a request is submitted--which for lingering requests can be more than once. Change it so a request--including lingering ones--will get completed (from the perspective of the user of the osd client) exactly once. This resolves: http://tracker.ceph.com/issues/3967 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:16:12 -07:00
Alex Elder	d4b515fa10	libceph: distinguish page array and pagelist count Use distinct fields for tracking the number of pages in a message's page array and in a message's page list. Currently only one or the other is used at a time, but that will be changing soon. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:14:28 -07:00
Alex Elder	07c09b7255	libceph: make ceph_msg->bio_seg be unsigned The bio_seg field is used by the ceph messenger in iterating through a bio. It should never have a negative value, so make it an unsigned. (I contemplated making it unsigned short to match the struct bio definition, but it offered no benefit.) Change variables used to hold bio_seg values to all be unsigned as well. Change two variable names in init_bio_iter() to match the convention used everywhere else. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>	2013-05-01 21:14:23 -07:00
Linus Torvalds	20b4fb4852	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS updates from Al Viro, Misc cleanups all over the place, mainly wrt /proc interfaces (switch create_proc_entry to proc_create(), get rid of the deprecated create_proc_read_entry() in favor of using proc_create_data() and seq_file etc). 7kloc removed. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits) don't bother with deferred freeing of fdtables proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h proc: Make the PROC_I() and PDE() macros internal to procfs proc: Supply a function to remove a proc entry by PDE take cgroup_open() and cpuset_open() to fs/proc/base.c ppc: Clean up scanlog ppc: Clean up rtas_flash driver somewhat hostap: proc: Use remove_proc_subtree() drm: proc: Use remove_proc_subtree() drm: proc: Use minor->index to label things, not PDE->name drm: Constify drm_proc_list[] zoran: Don't print proc_dir_entry data in debug reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show() proc: Supply an accessor for getting the data from a PDE's parent airo: Use remove_proc_subtree() rtl8192u: Don't need to save device proc dir PDE rtl8187se: Use a dir under /proc/net/r8180/ proc: Add proc_mkdir_data() proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h} proc: Move PDE_NET() to fs/proc/proc_net.c ...	2013-05-01 17:51:54 -07:00
Linus Torvalds	b9394d8a65	Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/efi changes from Peter Anvin: "The bulk of these changes are cleaning up the efivars handling and breaking it up into a tree of files. There are a number of fixes as well. The entire changeset is pretty big, but most of it is code movement. Several of these commits are quite new; the history got very messed up due to a mismerge with the urgent changes for rc8 which completely broke IA64, and so Ingo requested that we rebase it to straighten it out." * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi: remove "kfree(NULL)" efi: locking fix in efivar_entry_set_safe() efi, pstore: Read data from variable store before memcpy() efi, pstore: Remove entry from list when erasing efi, pstore: Initialise 'entry' before iterating efi: split efisubsystem from efivars efivarfs: Move to fs/efivarfs efivars: Move pstore code into the new EFI directory efivars: efivar_entry API efivars: Keep a private global pointer to efivars efi: move utf16 string functions to efi.h x86, efi: Make efi_memblock_x86_reserve_range more readable efivarfs: convert to use simple_open()	2013-05-01 15:51:46 -07:00
James Hogan	126de6b20b	linkage.h: fix build breakage due to symbol prefix handling Al's commit `e1b5bb6d12` ("consolidate cond_syscall and SYSCALL_ALIAS declarations") broke the build on blackfin and metag due to the following code: #ifndef SYMBOL_NAME #ifdef CONFIG_SYMBOL_PREFIX #define SYMBOL_NAME(x) CONFIG_SYMBOL_PREFIX ## x #else #define SYMBOL_NAME(x) x #endif #endif #define __SYMBOL_NAME(x) __stringify(SYMBOL_NAME(x)) __stringify literally stringifies CONFIG_SYMBOL_PREFIX ##x, so you get lines like this in kernel/sys_ni.s: .weak CONFIG_SYMBOL_PREFIXsys_quotactl .set CONFIG_SYMBOL_PREFIXsys_quotactl,CONFIG_SYMBOL_PREFIXsys_ni_syscall The patches in Rusty's modules-next tree such as "CONFIG_SYMBOL_PREFIX: cleanup." cleans up the whole mess around symbol prefixes, so this patch just attempts to fix the build in the meantime. The intermediate definition of SYMBOL_NAME above isn't used and is incorrect when CONFIG_SYMBOL_PREFIX is defined as CONFIG_SYMBOL_PREFIX is a quoted string literal, so define __SYMBOL_NAME directly depending on CONFIG_SYMBOL_PREFIX. Signed-off-by: James Hogan <james.hogan@imgtec.com> Mea-culpa-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Mike Frysinger <vapier@gentoo.org> Cc: uclinux-dist-devel@blackfin.uclinux.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-05-01 15:51:29 -07:00
Al Viro	ac3e3c5b11	don't bother with deferred freeing of fdtables Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-05-01 17:31:42 -04:00
David Howells	59d8053f1e	proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h Move non-public declarations and definitions from linux/proc_fs.h to fs/proc/internal.h. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-05-01 17:29:47 -04:00
David Howells	c30480b92c	proc: Make the PROC_I() and PDE() macros internal to procfs Make the PROC_I() and PDE() macros internal to procfs. This means making PDE_DATA() out of line. This could be made more optimal by storing PDE()->data into inode->i_private. Also provide a __PDE_DATA() that is inline and internal to procfs. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-05-01 17:29:47 -04:00
David Howells	a8ca16ea7b	proc: Supply a function to remove a proc entry by PDE Supply a function (proc_remove()) to remove a proc entry (and any subtree rooted there) by proc_dir_entry pointer rather than by name and (optionally) root dir entry pointer. This allows us to eliminate all remaining pde->name accesses outside of procfs. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Grant Likely <grant.likely@linaro.or> cc: linux-acpi@vger.kernel.org cc: openipmi-developer@lists.sourceforge.net cc: devicetree-discuss@lists.ozlabs.org cc: linux-pci@vger.kernel.org cc: netdev@vger.kernel.org cc: netfilter-devel@vger.kernel.org cc: alsa-devel@alsa-project.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-05-01 17:29:46 -04:00
Al Viro	8d8b97ba49	take cgroup_open() and cpuset_open() to fs/proc/base.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-05-01 17:29:46 -04:00
David Howells	b63e6aa502	drm: proc: Use minor->index to label things, not PDE->name Use minor->index to label things, not the name field from the proc_dir_entry of the /proc/dwm/<minor>/ directory. Also, use "%u" not "%d" to render the value and use a 12-byte buffer in which to render the integer, not a 16-byte buffer. The longest string an unsigned int can give you is 10 chars (4294967295) plus a NUL, so round up to 12 as the stack is likely to be 4- or 8-byte aligned. Signed-off-by: David Howells <dhowells@redhat.com> cc: dri-devel@lists.freedesktop.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-05-01 17:29:44 -04:00
David Howells	ce089b5472	drm: Constify drm_proc_list[] Constify drm_proc_list[] and related pointers. Signed-off-by: David Howells <dhowells@redhat.com> cc: dri-devel@lists.freedesktop.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-05-01 17:29:43 -04:00
David Howells	4a520d2769	proc: Supply an accessor for getting the data from a PDE's parent Supply an accessor function for getting the private data from the parent proc_dir_entry struct of the proc_dir_entry struct associated with an inode. ReiserFS, for instance, stores the super_block pointer in the proc directory it makes for that super_block, and a pointer to the respective seq_file show function in each of the proc files in that directory. This allows a reduction in the number of file_operations structs, open functions and seq_operations structs required. The problem otherwise is that each show function requires two pieces of data but only has storage for one per PDE (and this has no release function). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> cc: Jerry Chuang <jerry-chuang@realtek.com> cc: Maxim Mikityanskiy <maxtram95@gmail.com> cc: YAMANE Toshiaki <yamanetoshi@gmail.com> cc: linux-wireless@vger.kernel.org cc: linux-scsi@vger.kernel.org cc: devel@driverdev.osuosl.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-05-01 17:29:42 -04:00
David Howells	270b5ac215	proc: Add proc_mkdir_data() Add proc_mkdir_data() to allow procfs directories to be created that are annotated at the time of creation with private data rather than doing this post-creation. This means no access is then required to the proc_dir_entry struct to set this. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> cc: Neela Syam Kolli <megaraidlinux@lsi.com> cc: Jerry Chuang <jerry-chuang@realtek.com> cc: linux-scsi@vger.kernel.org cc: devel@driverdev.osuosl.org cc: linux-wireless@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-05-01 17:29:41 -04:00
David Howells	34db8aaf0f	proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h} Move some bits from linux/proc_fs.h to linux/of.h, signal.h and tty.h. Also move proc_tty_init() and proc_device_tree_init() to fs/proc/internal.h as they're internal to procfs. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Grant Likely <grant.likely@secretlab.ca> cc: devicetree-discuss@lists.ozlabs.org cc: linux-arch@vger.kernel.org cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> cc: Jri Slaby <jslaby@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-05-01 17:29:40 -04:00
David Howells	4abfd02989	proc: Move PDE_NET() to fs/proc/proc_net.c Move PDE_NET() to fs/proc/proc_net.c as that's where the only user is. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-05-01 17:29:40 -04:00
David Howells	0bb80f2405	proc: Split the namespace stuff out into linux/proc_ns.h Split the proc namespace stuff out into linux/proc_ns.h. Signed-off-by: David Howells <dhowells@redhat.com> cc: netdev@vger.kernel.org cc: Serge E. Hallyn <serge.hallyn@ubuntu.com> cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-05-01 17:29:39 -04:00
David Howells	271a15eabe	proc: Supply PDE attribute setting accessor functions Supply accessor functions to set attributes in proc_dir_entry structs. The following are supplied: proc_set_size() and proc_set_user(). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> cc: linuxppc-dev@lists.ozlabs.org cc: linux-media@vger.kernel.org cc: netdev@vger.kernel.org cc: linux-wireless@vger.kernel.org cc: linux-pci@vger.kernel.org cc: netfilter-devel@vger.kernel.org cc: alsa-devel@alsa-project.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-05-01 17:29:18 -04:00
Linus Torvalds	73287a43cc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: "Highlights (1721 non-merge commits, this has to be a record of some sort): 1) Add 'random' mode to team driver, from Jiri Pirko and Eric Dumazet. 2) Make it so that any driver that supports configuration of multiple MAC addresses can provide the forwarding database add and del calls by providing a default implementation and hooking that up if the driver doesn't have an explicit set of handlers. From Vlad Yasevich. 3) Support GSO segmentation over tunnels and other encapsulating devices such as VXLAN, from Pravin B Shelar. 4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton. 5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita Dukkipati. 6) In the PHY layer, allow supporting wake-on-lan in situations where the PHY registers have to be written for it to be configured. Use it to support wake-on-lan in mv643xx_eth. From Michael Stapelberg. 7) Significantly improve firewire IPV6 support, from YOSHIFUJI Hideaki. 8) Allow multiple packets to be sent in a single transmission using network coding in batman-adv, from Martin Hundebøll. 9) Add support for T5 cxgb4 chips, from Santosh Rastapur. 10) Generalize the VXLAN forwarding tables so that there is more flexibility in configurating various aspects of the endpoints. From David Stevens. 11) Support RSS and TSO in hardware over GRE tunnels in bxn2x driver, from Dmitry Kravkov. 12) Zero copy support in nfnelink_queue, from Eric Dumazet and Pablo Neira Ayuso. 13) Start adding networking selftests. 14) In situations of overload on the same AF_PACKET fanout socket, or per-cpu packet receive queue, minimize drop by distributing the load to other cpus/fanouts. From Willem de Bruijn and Eric Dumazet. 15) Add support for new payload offset BPF instruction, from Daniel Borkmann. 16) Convert several drivers over to mdoule_platform_driver(), from Sachin Kamat. 17) Provide a minimal BPF JIT image disassembler userspace tool, from Daniel Borkmann. 18) Rewrite F-RTO implementation in TCP to match the final specification of it in RFC4138 and RFC5682. From Yuchung Cheng. 19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear you like netlink, so I implemented netlink dumping of netlink sockets.") From Andrey Vagin. 20) Remove ugly passing of rtnetlink attributes into rtnl_doit functions, from Thomas Graf. 21) Allow userspace to be able to see if a configuration change occurs in the middle of an address or device list dump, from Nicolas Dichtel. 22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes Frederic Sowa. 23) Increase accuracy of packet length used by packet scheduler, from Jason Wang. 24) Beginning set of changes to make ipv4/ipv6 fragment handling more scalable and less susceptible to overload and locking contention, from Jesper Dangaard Brouer. 25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_() instead. From Hong Zhiguo. 26) Optimize route usage in IPVS by avoiding reference counting where possible, from Julian Anastasov. 27) Convert IPVS schedulers to RCU, also from Julian Anastasov. 28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger Eitzenberger. 29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG, nfnetlink_log, and nfnetlink_queue. From Gao feng. 30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa. 31) Support several new r8169 chips, from Hayes Wang. 32) Support tokenized interface identifiers in ipv6, from Daniel Borkmann. 33) Use usbnet_link_change() helper in USB net driver, from Ming Lei. 34) Add 802.1ad vlan offload support, from Patrick McHardy. 35) Support mmap() based netlink communication, also from Patrick McHardy. 36) Support HW timestamping in mlx4 driver, from Amir Vadai. 37) Rationalize AF_PACKET packet timestamping when transmitting, from Willem de Bruijn and Daniel Borkmann. 38) Bring parity to what's provided by /proc/net/packet socket dumping and the info provided by netlink socket dumping of AF_PACKET sockets. From Nicolas Dichtel. 39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin Poirier" git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits) filter: fix va_list build error af_unix: fix a fatal race with bit fields bnx2x: Prevent memory leak when cnic is absent bnx2x: correct reading of speed capabilities net: sctp: attribute printl with __printf for gcc fmt checks netlink: kconfig: move mmap i/o into netlink kconfig netpoll: convert mutex into a semaphore netlink: Fix skb ref counting. net_sched: act_ipt forward compat with xtables mlx4_en: fix a build error on 32bit arches Revert "bnx2x: allow nvram test to run when device is down" bridge: avoid OOPS if root port not found drivers: net: cpsw: fix kernel warn on cpsw irq enable sh_eth: use random MAC address if no valid one supplied 3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA) tg3: fix to append hardware time stamping flags unix/stream: fix peeking with an offset larger than data in queue unix/dgram: fix peeking with an offset larger than data in queue unix/dgram: peek beyond 0-sized skbs openvswitch: Remove unneeded ovs_netdev_get_ifindex() ...	2013-05-01 14:08:52 -07:00
Xi Wang	20074f357d	filter: fix va_list build error This patch fixes the following build error. In file included from include/linux/filter.h:52:0, from arch/arm/net/bpf_jit_32.c:14: include/linux/printk.h:54:2: error: unknown type name ‘va_list’ include/linux/printk.h:105:21: error: unknown type name ‘va_list’ include/linux/printk.h:108:30: error: unknown type name ‘va_list’ Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-01 16:28:48 -04:00
Linus Torvalds	251df49db3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry Torokhov: "Assorted fixes and cleanups to the existing drivers plus a new driver for IMS Passenger Control Unit device they use for ther in-flight entertainment system." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (44 commits) Input: trackpoint - Optimize trackpoint init to use power-on reset Input: apbps2 - convert to devm_ioremap_resource() Input: ALPS - use %ph to print buffers ARM - shmobile: Armadillo800EVA: Move st1232 reset pin handling Input: st1232 - add reset pin handling Input: st1232 - convert to devm_* infrastructure Input: MT - handle semi-mt devices in core Input: adxl34x - use spi_get_drvdata() Input: ad7877 - use spi_get_drvdata() and spi_set_drvdata() Input: ads7846 - use spi_get_drvdata() and spi_set_drvdata() Input: ims-pcu - fix a memory leak on error Input: sysrq - supplement reset sequence with timeout functionality Input: tegra-kbc - support for defining row/columns based on SoC Input: imx_keypad - switch to using managed resources Input: arc_ps2 - add support for device tree Input: mma8450 - fix signed 12bits to 32bits conversion Input: eeti_ts - remove redundant null check Input: edt-ft5x06 - remove redundant null check before kfree Input: ad714x - add CONFIG_PM_SLEEP to suspend/resume functions Input: adxl34x - add CONFIG_PM_SLEEP to suspend/resume functions ...	2013-05-01 13:20:04 -07:00
Eric Dumazet	60bc851ae5	af_unix: fix a fatal race with bit fields Using bit fields is dangerous on ppc64/sparc64, as the compiler [1] uses 64bit instructions to manipulate them. If the 64bit word includes any atomic_t or spinlock_t, we can lose critical concurrent changes. This is happening in af_unix, where unix_sk(sk)->gc_candidate/ gc_maybe_cycle/lock share the same 64bit word. This leads to fatal deadlock, as one/several cpus spin forever on a spinlock that will never be available again. A safer way would be to use a long to store flags. This way we are sure compiler/arch wont do bad things. As we own unix_gc_lock spinlock when clearing or setting bits, we can use the non atomic __set_bit()/__clear_bit(). recursion_level can share the same 64bit location with the spinlock, as it is set only with this spinlock held. [1] bug fixed in gcc-4.8.0 : http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080 Reported-by: Ambrose Feinstein <ambrose@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-01 15:13:49 -04:00
Neil Horman	bd7c4b604a	netpoll: convert mutex into a semaphore Bart Van Assche recently reported a warning to me: <IRQ> [<ffffffff8103d79f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8103d7fa>] warn_slowpath_null+0x1a/0x20 [<ffffffff814761dd>] mutex_trylock+0x16d/0x180 [<ffffffff813968c9>] netpoll_poll_dev+0x49/0xc30 [<ffffffff8136a2d2>] ? __alloc_skb+0x82/0x2a0 [<ffffffff81397715>] netpoll_send_skb_on_dev+0x265/0x410 [<ffffffff81397c5a>] netpoll_send_udp+0x28a/0x3a0 [<ffffffffa0541843>] ? write_msg+0x53/0x110 [netconsole] [<ffffffffa05418bf>] write_msg+0xcf/0x110 [netconsole] [<ffffffff8103eba1>] call_console_drivers.constprop.17+0xa1/0x1c0 [<ffffffff8103fb76>] console_unlock+0x2d6/0x450 [<ffffffff8104011e>] vprintk_emit+0x1ee/0x510 [<ffffffff8146f9f6>] printk+0x4d/0x4f [<ffffffffa0004f1d>] scsi_print_command+0x7d/0xe0 [scsi_mod] This resulted from my commit `ca99ca14c` which introduced a mutex_trylock operation in a path that could execute in interrupt context. When mutex debugging is enabled, the above warns the user when we are in fact exectuting in interrupt context interrupt context. After some discussion, It seems that a semaphore is the proper mechanism to use here. While mutexes are defined to be unusable in interrupt context, no such condition exists for semaphores (save for the fact that the non blocking api calls, like up and down_trylock must be used when in irq context). Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Bart Van Assche <bvanassche@acm.org> CC: Bart Van Assche <bvanassche@acm.org> CC: David Miller <davem@davemloft.net> CC: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2013-05-01 15:00:24 -04:00
Linus Torvalds	a49fe6d59a	Merge branch 'topic/omap3isp' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull omap3isp clk support from Mauro Carvalho Chehab: "This patch were sent in separate as it depends on a merge from clock framework, that you merged in commit 362ed48dee50" * 'topic/omap3isp' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: [media] omap3isp: Use the common clock framework	2013-05-01 09:57:04 -07:00
Linus Torvalds	823e75f723	Merge branch 'ipc-scalability' Merge IPC cleanup and scalability patches from Andrew Morton. This cleans up many of the oddities in the IPC code, uses the list iterator helpers, splits out locking and adds per-semaphore locks for greater scalability of the IPC semaphore code. Most normal user-level locking by now uses futexes (ie pthreads, but also a lot of specialized locks), but SysV IPC semaphores are apparently still used in some big applications, either for portability reasons, or because they offer tracking and undo (and you don't need to have a special shared memory area for them). Our IPC semaphore scalability was pitiful. We used to lock much too big ranges, and we used to have a single ipc lock per ipc semaphore array. Most loads never cared, but some do. There are some numbers in the individual commits. * ipc-scalability: ipc: sysv shared memory limited to 8TiB ipc/msg.c: use list_for_each_entry_[safe] for list traversing ipc,sem: fine grained locking for semtimedop ipc,sem: have only one list in struct sem_queue ipc,sem: open code and rename sem_lock ipc,sem: do not hold ipc lock more than necessary ipc: introduce lockless pre_down ipcctl ipc: introduce obtaining a lockless ipc object ipc: remove bogus lock comment for ipc_checkid ipc/msgutil.c: use linux/uaccess.h ipc: refactor msg list search into separate function ipc: simplify msg list search ipc: implement MSG_COPY as a new receive mode ipc: remove msg handling from queue scan ipc: set EFAULT as default error in load_msg() ipc: tighten msg copy loops ipc: separate msg allocation from userspace copy ipc: clamp with min()	2013-05-01 08:17:51 -07:00
Robin Holt	d69f3bad46	ipc: sysv shared memory limited to 8TiB Trying to run an application which was trying to put data into half of memory using shmget(), we found that having a shmall value below 8EiB-8TiB would prevent us from using anything more than 8TiB. By setting kernel.shmall greater than 8EiB-8TiB would make the job work. In the newseg() function, ns->shm_tot which, at 8TiB is INT_MAX. ipc/shm.c: 458 static int newseg(struct ipc_namespace ns, struct ipc_params params) 459 { ... 465 int numpages = (size + PAGE_SIZE -1) >> PAGE_SHIFT; ... 474 if (ns->shm_tot + numpages > ns->shm_ctlall) 475 return -ENOSPC; [akpm@linux-foundation.org: make ipc/shm.c:newseg()'s numpages size_t, not int] Signed-off-by: Robin Holt <holt@sgi.com> Reported-by: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-05-01 08:12:58 -07:00
Linus Torvalds	149b306089	Mostly performance and bug fixes, plus some cleanups. The one new feature this merge window is a new ioctl EXT4_IOC_SWAP_BOOT which allows installation of a hidden inode designed for boot loaders. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABCAAGBQJRfpDwAAoJENNvdpvBGATwsjwP/17V3AN6XTEhZK80p3/qN5YD N2QIHeyYIqCGpczLs2TQEkxWX6nqpDggAPXY956wvgeEMQV+pQ+DLO4Ol9+p5WD2 hrklleYhtOjFQ3Xh4lqrEi5FzKVzWagVDLqgUjALJ+D+hkDB7ZQT/fm2sH45rzot xBp3aVqANU8GqAAbEW4/Ng9ZGMx0dpANiU2svbjM71sv2dCLFmWAkz+GgZsMbuJZ vnKIZP6I6plwP3LuZzEbVCA7F2PzC4ywEOJKjIEvgHpX6uMDR3FX8pD5Dlo/o6e2 eP+KLnD43mJMxBmTn22x5Sm0N6DUzJCEELRJWB9wCZoLdEvbEWRxT3qsPXfLWelG 2jj4bImXF2CqYEsJww5FV2WdXXdnuM57pZym5vMZGAFyKPSCJobA4Y3XRdXkBfXf Gq/cFoPYv2EcBIhz3zrRj+tbY8esbO9wOnF6+x+AF10BspD2V7nuoVdWVhOf0A3v i9ifGPwLk3e3xHr9oXheo7IWn52oviZeyD77d7D7MLhgn+xU4LaVhW3R63Q+mI4D 0TXG25R1CVcE7wyFy3gqSVXSCDO0JcQBL5LgcL+wAGXcHPAXqBpN2DFTPo+9fJH2 g3YMwr+wMbci1XRVQ2vdTt/nBZYjOCh6PgRmg3KjTz11Ra5EsjQvYjKWYwqf2RGn QhCgbzd/qtZfNJztLvr7 =GCT2 -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Mostly performance and bug fixes, plus some cleanups. The one new feature this merge window is a new ioctl EXT4_IOC_SWAP_BOOT which allows installation of a hidden inode designed for boot loaders." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits) ext4: fix type-widening bug in inode table readahead code ext4: add check for inodes_count overflow in new resize ioctl ext4: fix Kconfig documentation for CONFIG_EXT4_DEBUG ext4: fix online resizing for ext3-compat file systems jbd2: trace when lock_buffer in do_get_write_access takes a long time ext4: mark metadata blocks using bh flags buffer: add BH_Prio and BH_Meta flags ext4: mark all metadata I/O with REQ_META ext4: fix readdir error in case inline_data+^dir_index. ext4: fix readdir error in the case of inline_data+dir_index jbd2: use kmem_cache_zalloc instead of kmem_cache_alloc/memset ext4: mext_insert_extents should update extent block checksum ext4: move quota initialization out of inode allocation transaction ext4: reserve xattr index for Rich ACL support jbd2: reduce journal_head size ext4: clear buffer_uninit flag when submitting IO ext4: use io_end for multiple bios ext4: make ext4_bio_write_page() use BH_Async_Write flags ext4: Use kstrtoul() instead of parse_strtoul() ext4: defragmentation code cleanup ...	2013-05-01 08:04:12 -07:00
Linus Torvalds	b0ca4d0123	3.10 dma-buf updates -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJRgQj+AAoJEAG+/NWsLn5b+ZgP/2To0IUb3hyvcfCEnGytzOu1 zIkDMyqTYtPy3r+gdZ7LkZYJkJ0NrO9DwWUDcPEY/9mBL0kwOBIgWumUi2ABESQI /tuTYzG0MKo5UNRr/aSK5erLSZeSbRht4DE8oiK9mw1O6/mUq5c4J/V+z6AYWC8m 4cad8+XGU6x1g0a13VIhO8p2OcfV/ZyGwOeHN4SMZhkoHt3Le8wVXYMUDDofmCKs lkkCetleQFcHLtwIMRC3SX15mIF57LhicK5JfmvEYvI4ghVu5eKi5B59aGdhG0R9 p7cKZ10TgDbr2I8gvQNZ/jSKhMCxLeGHeA781YWqJRU/YdvoYltVVOeWH60YZJdF K/dwaPsVkE68xVpTO5xieCTY9nSk1q258gQLgPkgThQYuRuPgWwAN/GbaFYGJjew vGmpp2F7VZxrVjBO89Rqgp47KWZE9ibEpc4Gb6pg7thqRl/joi6sosP0zGUApnFT fJSKMvXgNGDk6r8b5F2EYKTeLAHDuQ+FqzZeZ1+B/IOEwmyUosqslUlhmEVXqawW CUMO0MkimIfv7stPcgx/v7e3lQDz/8EGI8NBjJIipkeF+Fo39bUt4MA56SVp/lBC YyPokFjZGcqtADKHxZzNYTpGx/9NTMCDR510qhqITWQXd+e6ygbmLUgwdOOiBlQM gQDdPh29sPULSad37tbr =VZTM -----END PGP SIGNATURE----- Merge tag 'tag-for-linus-3.10' of git://git.linaro.org/people/sumitsemwal/linux-dma-buf Pull dma-buf updates from Sumit Semwal: "Added debugfs support to dma-buf" * tag 'tag-for-linus-3.10' of git://git.linaro.org/people/sumitsemwal/linux-dma-buf: dma-buf: Add debugfs support dma-buf: replace dma_buf_export() with dma_buf_export_named()	2013-05-01 07:44:37 -07:00
Linus Torvalds	08d7676083	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull compat cleanup from Al Viro: "Mostly about syscall wrappers this time; there will be another pile with patches in the same general area from various people, but I'd rather push those after both that and vfs.git pile are in." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: syscalls.h: slightly reduce the jungles of macros get rid of union semop in sys_semctl(2) arguments make do_mremap() static sparc: no need to sign-extend in sync_file_range() wrapper ppc compat wrappers for add_key(2) and request_key(2) are pointless x86: trim sys_ia32.h x86: sys32_kill and sys32_mprotect are pointless get rid of compat_sys_semctl() and friends in case of ARCH_WANT_OLD_COMPAT_IPC merge compat sys_ipc instances consolidate compat lookup_dcookie() convert vmsplice to COMPAT_SYSCALL_DEFINE switch getrusage() to COMPAT_SYSCALL_DEFINE switch epoll_pwait to COMPAT_SYSCALL_DEFINE convert sendfile{,64} to COMPAT_SYSCALL_DEFINE switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect make HAVE_SYSCALL_WRAPPERS unconditional consolidate cond_syscall and SYSCALL_ALIAS declarations teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long get rid of duplicate logics in __SC_....[1-6] definitions	2013-05-01 07:21:43 -07:00
Miklos Szeredi	60b9df7a54	fuse: add flag to turn on async direct IO Without async DIO write requests to a single file were always serialized. With async DIO that's no longer the case. So don't turn on async DIO by default for fear of breaking backward compatibility. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2013-05-01 14:37:21 +02:00
Sumit Semwal	b89e35636b	dma-buf: Add debugfs support Add debugfs support to make it easier to print debug information about the dma-buf buffers. Cc: Dave Airlie <airlied@redhat.com> [minor fixes on init and warning fix] Cc: Dan Carpenter <dan.carpenter@oracle.com> [remove double unlock in fail case] Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>	2013-05-01 16:36:22 +05:30
Sumit Semwal	78df969550	dma-buf: replace dma_buf_export() with dma_buf_export_named() For debugging purposes, it is useful to have a name-string added while exporting buffers. Hence, dma_buf_export() is replaced with dma_buf_export_named(), which additionally takes 'exp_name' as a parameter. For backward compatibility, and for lazy exporters who don't wish to name themselves, a #define dma_buf_export() is also made available, which adds a __FILE__ instead of 'exp_name'. Cc: Daniel Vetter <daniel.vetter@ffwll.ch> [Thanks for the idea!] Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>	2013-05-01 16:35:36 +05:30
Michael S. Tsirkin	bc7562355f	Merge branch 'for-next-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending into vhost-net-next	2013-05-01 09:16:50 +03:00
Linus Torvalds	5f56886521	Merge branch 'akpm' (incoming from Andrew) Merge third batch of fixes from Andrew Morton: "Most of the rest. I still have two large patchsets against AIO and IPC, but they're a bit stuck behind other trees and I'm about to vanish for six days. - random fixlets - inotify - more of the MM queue - show_stack() cleanups - DMI update - kthread/workqueue things - compat cleanups - epoll udpates - binfmt updates - nilfs2 - hfs - hfsplus - ptrace - kmod - coredump - kexec - rbtree - pids - pidns - pps - semaphore tweaks - some w1 patches - relay updates - core Kconfig changes - sysrq tweaks" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (109 commits) Documentation/sysrq: fix inconstistent help message of sysrq key ethernet/emac/sysrq: fix inconstistent help message of sysrq key sparc/sysrq: fix inconstistent help message of sysrq key powerpc/xmon/sysrq: fix inconstistent help message of sysrq key ARM/etm/sysrq: fix inconstistent help message of sysrq key power/sysrq: fix inconstistent help message of sysrq key kgdb/sysrq: fix inconstistent help message of sysrq key lib/decompress.c: fix initconst notifier-error-inject: fix module names in Kconfig kernel/sys.c: make prctl(PR_SET_MM) generally available UAPI: remove empty Kbuild files menuconfig: print more info for symbol without prompts init/Kconfig: re-order CONFIG_EXPERT options to fix menuconfig display kconfig menu: move Virtualization drivers near other virtualization options Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS relay: use macro PAGE_ALIGN instead of FIX_SIZE kernel/relay.c: move FIX_SIZE macro into relay.c kernel/relay.c: remove unused function argument actor drivers/w1/slaves/w1_ds2760.c: fix the error handling in w1_ds2760_add_slave() drivers/w1/slaves/w1_ds2781.c: fix the error handling in w1_ds2781_add_slave() ...	2013-04-30 17:37:43 -07:00
David Howells	22145aa1f6	UAPI: remove empty Kbuild files Remove empty Kbuild files as they cause problems with the patch program which removes files that become empty. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:09 -07:00
zhangwei(Jovi)	536b39ecf1	kernel/relay.c: move FIX_SIZE macro into relay.c It's better to place FIX_SIZE macro in relay.c, instead of relay.h Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:09 -07:00
Raphael S.Carvalho	5cc5445164	pid_namespace.c/.h: simplify defines Move BITS_PER_PAGE from pid_namespace.c to pid_namespace.h, since we can simplify the define PID_MAP_ENTRIES by using the BITS_PER_PAGE. [akpm@linux-foundation.org: kernel/pid.c:54:1: warning: "BITS_PER_PAGE" redefined] Signed-off-by: Raphael S.Carvalho <raphael.scarv@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:07 -07:00
Oleg Nesterov	e56fb28740	exec: do not abuse ->cred_guard_mutex in threadgroup_lock() threadgroup_lock() takes signal->cred_guard_mutex to ensure that thread_group_leader() is stable. This doesn't look nice, the scope of this lock in do_execve() is huge. And as Dave pointed out this can lead to deadlock, we have the following dependencies: do_execve: cred_guard_mutex -> i_mutex cgroup_mount: i_mutex -> cgroup_mutex attach_task_by_pid: cgroup_mutex -> cred_guard_mutex Change de_thread() to take threadgroup_change_begin() around the switch-the-leader code and change threadgroup_lock() to avoid ->cred_guard_mutex. Note that de_thread() can't sleep with ->group_rwsem held, this can obviously deadlock with the exiting leader if the writer is active, so it does threadgroup_change_end() before schedule(). Reported-by: Dave Jones <davej@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:07 -07:00
Oleg Nesterov	403bad72b6	coredump: only SIGKILL should interrupt the coredumping task There are 2 well known and ancient problems with coredump/signals, and a lot of related bug reports: - do_coredump() clears TIF_SIGPENDING but of course this can't help if, say, SIGCHLD comes after that. In this case the coredump can fail unexpectedly. See for example wait_for_dump_helper()->signal_pending() check but there are other reasons. - At the same time, dumping a huge core on the slow media can take a lot of time/resources and there is no way to kill the coredumping task reliably. In particular this is not oom_kill-friendly. This patch tries to fix the 1st problem, and makes the preparation for the next changes. We add the new SIGNAL_GROUP_COREDUMP flag set by zap_threads() to indicate that this process dumps the core. prepare_signal() checks this flag and nacks any signal except SIGKILL. Note that this check tries to be conservative, in the long term we should probably treat the SIGNAL_GROUP_EXIT case equally but this needs more discussion. See marc.info/?l=linux-kernel&m=120508897917439 Notes: - recalc_sigpending() doesn't check SIGNAL_GROUP_COREDUMP. The patch assumes that dump_write/etc paths should never call it, but we can change it as well. - There is another source of TIF_SIGPENDING, freezer. This will be addressed separately. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Tested-by: Mandeep Singh Baines <msb@chromium.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Neil Horman <nhorman@redhat.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Roland McGrath <roland@hack.frob.com> Cc: Tejun Heo <tj@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:06 -07:00
Lucas De Marchi	66e5b7e194	kmod: remove call_usermodehelper_fns() This function suffers from not being able to determine if the cleanup is called in case it returns -ENOMEM. Nobody is using it anymore, so let's remove it. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Tejun Heo <tj@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:06 -07:00
Lucas De Marchi	938e4b22e2	usermodehelper: export call_usermodehelper_exec() and call_usermodehelper_setup() call_usermodehelper_setup() + call_usermodehelper_exec() need to be called instead of call_usermodehelper_fns() when the cleanup function needs to be called even when an ENOMEM error occurs. In this case using call_usermodehelper_fns() the user can't distinguish if the cleanup function was called or not. [akpm@linux-foundation.org: export call_usermodehelper_setup() to modules] Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Tejun Heo <tj@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:05 -07:00
Andrey Vagin	84c751bd4a	ptrace: add ability to retrieve signals without removing from a queue (v4) This patch adds a new ptrace request PTRACE_PEEKSIGINFO. This request is used to retrieve information about pending signals starting with the specified sequence number. Siginfo_t structures are copied from the child into the buffer starting at "data". The argument "addr" is a pointer to struct ptrace_peeksiginfo_args. struct ptrace_peeksiginfo_args { u64 off; /* from which siginfo to start / u32 flags; s32 nr; / how may siginfos to take */ }; "nr" has type "s32", because ptrace() returns "long", which has 32 bits on i386 and a negative values is used for errors. Currently here is only one flag PTRACE_PEEKSIGINFO_SHARED for dumping signals from process-wide queue. If this flag is not set, signals are read from a per-thread queue. The request PTRACE_PEEKSIGINFO returns a number of dumped signals. If a signal with the specified sequence number doesn't exist, ptrace returns zero. The request returns an error, if no signal has been dumped. Errors: EINVAL - one or more specified flags are not supported or nr is negative EFAULT - buf or addr is outside your accessible address space. A result siginfo contains a kernel part of si_code which usually striped, but it's required for queuing the same siginfo back during restore of pending signals. This functionality is required for checkpointing pending signals. Pedro Alves suggested using it in "gdb" to peek at pending signals. gdb already uses PTRACE_GETSIGINFO to get the siginfo for the signal which was already dequeued. This functionality allows gdb to look at the pending signals which were not reported yet. The prototype of this code was developed by Oleg Nesterov. Signed-off-by: Andrew Vagin <avagin@openvz.org> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pedro Alves <palves@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:05 -07:00
Stephen Rothwell	1a0df59444	kernel/compat.c: make do_sysinfo() static The only use outside of kernel/timer.c was in kernel/compat.c, so move compat_sys_sysinfo() next to sys_sysinfo() in kernel/timer.c. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:03 -07:00
Andy Shevchenko	16c7fa0582	lib/string_helpers: introduce generic string_unescape There are several places in kernel where modules unescapes input to convert C-Style Escape Sequences into byte codes. The patch provides generic implementation of such approach. Test cases are also included into the patch. [akpm@linux-foundation.org: clarify comment] [akpm@linux-foundation.org: export get_random_int() to modules] Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Baron <jbaron@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: William Hubbs <w.d.hubbs@gmail.com> Cc: Chris Brannon <chris@the-brannons.com> Cc: Kirk Reiser <kirk@braille.uwo.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:03 -07:00
Fan Du	74e3d1e17b	include/linux/fs.h: disable preempt when acquire i_size_seqcount write lock Two rt tasks bind to one CPU core. The higher priority rt task A preempts a lower priority rt task B which has already taken the write seq lock, and then the higher priority rt task A try to acquire read seq lock, it's doomed to lockup. rt task A with lower priority: call write i_size_write rt task B with higher priority: call sync, and preempt task A write_seqcount_begin(&inode->i_size_seqcount); i_size_read inode->i_size = i_size; read_seqcount_begin <-- lockup here... So disable preempt when acquiring every i_size_seqcount write lock will cure the problem. Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:03 -07:00
liguang	3440a1ca99	kernel/smp.c: remove 'priv' of call_single_data The 'priv' field is redundant; we can pass data via 'info'. Signed-off-by: liguang <lig.fnst@cn.fujitsu.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:03 -07:00
Tejun Heo	3d1cb2059d	workqueue: include workqueue info when printing debug dump of a worker task One of the problems that arise when converting dedicated custom threadpool to workqueue is that the shared worker pool used by workqueue anonimizes each worker making it more difficult to identify what the worker was doing on which target from the output of sysrq-t or debug dump from oops, BUG() and friends. This patch implements set_worker_desc() which can be called from any workqueue work function to set its description. When the worker task is dumped for whatever reason - sysrq-t, WARN, BUG, oops, lockdep assertion and so on - the description will be printed out together with the workqueue name and the worker function pointer. The printing side is implemented by print_worker_info() which is called from functions in task dump paths - sched_show_task() and dump_stack_print_info(). print_worker_info() can be safely called on any task in any state as long as the task struct itself is accessible. It uses probe_*() functions to access worker fields. It may print garbage if something went very wrong, but it wouldn't cause (another) oops. The description is currently limited to 24bytes including the terminating \0. worker->desc_valid and workder->desc[] are added and the 64 bytes marker which was already incorrect before adding the new fields is moved to the correct position. Here's an example dump with writeback updated to set the bdi name as worker desc. Hardware name: Bochs Modules linked in: Pid: 7, comm: kworker/u9:0 Not tainted 3.9.0-rc1-work+ #1 Workqueue: writeback bdi_writeback_workfn (flush-8:0) ffffffff820a3ab0 ffff88000f6e9cb8 ffffffff81c61845 ffff88000f6e9cf8 ffffffff8108f50f 0000000000000000 0000000000000000 ffff88000cde16b0 ffff88000cde1aa8 ffff88001ee19240 ffff88000f6e9fd8 ffff88000f6e9d08 Call Trace: [<ffffffff81c61845>] dump_stack+0x19/0x1b [<ffffffff8108f50f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8108f56a>] warn_slowpath_null+0x1a/0x20 [<ffffffff81200150>] bdi_writeback_workfn+0x2a0/0x3b0 ... Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:02 -07:00
Tejun Heo	cd42d559e4	kthread: implement probe_kthread_data() One of the problems that arise when converting dedicated custom threadpool to workqueue is that the shared worker pool used by workqueue anonimizes each worker making it more difficult to identify what the worker was doing on which target from the output of sysrq-t or debug dump from oops, BUG() and friends. For example, after writeback is converted to use workqueue instead of priviate thread pool, there's no easy to tell which backing device a writeback work item was working on at the time of task dump, which, according to our writeback brethren, is important in tracking down issues with a lot of mounted file systems on a lot of different devices. This patchset implements a way for a work function to mark its execution instance so that task dump of the worker task includes information to indicate what the work item was doing. An example WARN dump would look like the following. WARNING: at fs/fs-writeback.c:1015 bdi_writeback_workfn+0x2b4/0x3c0() Modules linked in: CPU: 0 Pid: 28 Comm: kworker/u18:0 Not tainted 3.9.0-rc1-work+ #24 Hardware name: empty empty/S3992, BIOS 080011 10/26/2007 Workqueue: writeback bdi_writeback_workfn (flush-8:16) ffffffff820a3a98 ffff88015b927cb8 ffffffff81c61855 ffff88015b927cf8 ffffffff8108f500 0000000000000000 ffff88007a171948 ffff88007a1716b0 ffff88015b49df00 ffff88015b8d3940 0000000000000000 ffff88015b927d08 Call Trace: [<ffffffff81c61855>] dump_stack+0x19/0x1b [<ffffffff8108f500>] warn_slowpath_common+0x70/0xa0 ... This patch: Implement probe_kthread_data() which returns kthread_data if accessible. The function is equivalent to kthread_data() except that the specified @task may not be a kthread or its vfork_done is already cleared rendering struct kthread inaccessible. In the former case, probe_kthread_data() may return any value. In the latter, NULL. This will be used to safely print debug information without affecting synchronization in the normal paths. Workqueue debug info printing on dump_stack() and friends will make use of it. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:02 -07:00
Tejun Heo	a43cb95d54	dump_stack: unify debug information printed by show_regs() show_regs() is inherently arch-dependent but it does make sense to print generic debug information and some archs already do albeit in slightly different forms. This patch introduces a generic function to print debug information from show_regs() so that different archs print out the same information and it's much easier to modify what's printed. show_regs_print_info() prints out the same debug info as dump_stack() does plus task and thread_info pointers. * Archs which didn't print debug info now do. alpha, arc, blackfin, c6x, cris, frv, h8300, hexagon, ia64, m32r, metag, microblaze, mn10300, openrisc, parisc, score, sh64, sparc, um, xtensa * Already prints debug info. Replaced with show_regs_print_info(). The printed information is superset of what used to be there. arm, arm64, avr32, mips, powerpc, sh32, tile, unicore32, x86 * s390 is special in that it used to print arch-specific information along with generic debug info. Heiko and Martin think that the arch-specific extra isn't worth keeping s390 specfic implementation. Converted to use the generic version. Note that now all archs print the debug info before actual register dumps. An example BUG() dump follows. kernel BUG at /work/os/work/kernel/workqueue.c:4841! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #7 Hardware name: empty empty/S3992, BIOS 080011 10/26/2007 task: ffff88007c85e040 ti: ffff88007c860000 task.ti: ffff88007c860000 RIP: 0010:[<ffffffff8234a07e>] [<ffffffff8234a07e>] init_workqueues+0x4/0x6 RSP: 0000:ffff88007c861ec8 EFLAGS: 00010246 RAX: ffff88007c861fd8 RBX: ffffffff824466a8 RCX: 0000000000000001 RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffffffff8234a07a RBP: ffff88007c861ec8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8234a07a R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff88015f7ff000 CR3: 00000000021f1000 CR4: 00000000000007f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff88007c861ef8 ffffffff81000312 ffffffff824466a8 ffff88007c85e650 0000000000000003 0000000000000000 ffff88007c861f38 ffffffff82335e5d ffff88007c862080 ffffffff8223d8c0 ffff88007c862080 ffffffff81c47760 Call Trace: [<ffffffff81000312>] do_one_initcall+0x122/0x170 [<ffffffff82335e5d>] kernel_init_freeable+0x9b/0x1c8 [<ffffffff81c47760>] ? rest_init+0x140/0x140 [<ffffffff81c4776e>] kernel_init+0xe/0xf0 [<ffffffff81c6be9c>] ret_from_fork+0x7c/0xb0 [<ffffffff81c47760>] ? rest_init+0x140/0x140 ... v2: Typo fix in x86-32. v3: CPU number dropped from show_regs_print_info() as dump_stack_print_info() has been updated to print it. s390 specific implementation dropped as requested by s390 maintainers. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Sam Ravnborg <sam@ravnborg.org> Acked-by: Chris Metcalf <cmetcalf@tilera.com> [tile bits] Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon bits] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:02 -07:00
Tejun Heo	98e5e1bf72	dump_stack: implement arch-specific hardware description in task dumps x86 and ia64 can acquire extra hardware identification information from DMI and print it along with task dumps; however, the usage isn't consistent. * x86 show_regs() collects vendor, product and board strings and print them out with PID, comm and utsname. Some of the information is printed again later in the same dump. * warn_slowpath_common() explicitly accesses the DMI board and prints it out with "Hardware name:" label. This applies to both x86 and ia64 but is irrelevant on all other archs. * ia64 doesn't show DMI information on other non-WARN dumps. This patch introduces arch-specific hardware description used by dump_stack(). It can be set by calling dump_stack_set_arch_desc() during boot and, if exists, printed out in a separate line with "Hardware name:" label. dmi_set_dump_stack_arch_desc() is added which sets arch-specific description from DMI data. It uses dmi_ids_string[] which is set from dmi_present() used for DMI debug message. It is superset of the information x86 show_regs() is using. The function is called from x86 and ia64 boot code right after dmi_scan_machine(). This makes the explicit DMI handling in warn_slowpath_common() unnecessary. Removed. show_regs() isn't yet converted to use generic debug information printing and this patch doesn't remove the duplicate DMI handling in x86 show_regs(). The next patch will unify show_regs() handling and remove the duplication. An example WARN dump follows. WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505() Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #3 Hardware name: empty empty/S3992, BIOS 080011 10/26/2007 0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48 ffffffff8108f500 ffffffff82228240 0000000000000040 ffffffff8234a08e 0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58 Call Trace: [<ffffffff81c614dc>] dump_stack+0x19/0x1b [<ffffffff8108f500>] warn_slowpath_common+0x70/0xa0 [<ffffffff8108f54a>] warn_slowpath_null+0x1a/0x20 [<ffffffff8234a0c3>] init_workqueues+0x35/0x505 ... v2: Use the same string as the debug message from dmi_present() which also contains BIOS information. Move hardware name into its own line as warn_slowpath_common() did. This change was suggested by Bjorn Helgaas. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:02 -07:00
Tejun Heo	196779b9b4	dump_stack: consolidate dump_stack() implementations and unify their behaviors Both dump_stack() and show_stack() are currently implemented by each architecture. show_stack(NULL, NULL) dumps the backtrace for the current task as does dump_stack(). On some archs, dump_stack() prints extra information - pid, utsname and so on - in addition to the backtrace while the two are identical on other archs. The usages in arch-independent code of the two functions indicate show_stack(NULL, NULL) should print out bare backtrace while dump_stack() is used for debugging purposes when something went wrong, so it does make sense to print additional information on the task which triggered dump_stack(). There's no reason to require archs to implement two separate but mostly identical functions. It leads to unnecessary subtle information. This patch expands the dummy fallback dump_stack() implementation in lib/dump_stack.c such that it prints out debug information (taken from x86) and invokes show_stack(NULL, NULL) and drops arch-specific dump_stack() implementations in all archs except blackfin. Blackfin's dump_stack() does something wonky that I don't understand. Debug information can be printed separately by calling dump_stack_print_info() so that arch-specific dump_stack() implementation can still emit the same debug information. This is used in blackfin. This patch brings the following behavior changes. * On some archs, an extra level in backtrace for show_stack() could be printed. This is because the top frame was determined in dump_stack() on those archs while generic dump_stack() can't do that reliably. It can be compensated by inlining dump_stack() but not sure whether that'd be necessary. * Most archs didn't use to print debug info on dump_stack(). They do now. An example WARN dump follows. WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505() Hardware name: empty Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #9 0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48 ffffffff8108f50f ffffffff82228240 0000000000000040 ffffffff8234a03c 0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58 Call Trace: [<ffffffff81c614dc>] dump_stack+0x19/0x1b [<ffffffff8108f50f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8108f56a>] warn_slowpath_null+0x1a/0x20 [<ffffffff8234a071>] init_workqueues+0x35/0x505 ... v2: CPU number added to the generic debug info as requested by s390 folks and dropped the s390 specific dump_stack(). This loses %ksp from the debug message which the maintainers think isn't important enough to keep the s390-specific dump_stack() implementation. dump_stack_print_info() is moved to kernel/printk.c from lib/dump_stack.c. Because linkage is per objecct file, dump_stack_print_info() living in the same lib file as generic dump_stack() means that archs which implement custom dump_stack() - at this point, only blackfin - can't use dump_stack_print_info() as that will bring in the generic version of dump_stack() too. v1 The v1 patch broke build on blackfin due to this issue. The build breakage was reported by Fengguang Wu. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390 bits] Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Sam Ravnborg <sam@ravnborg.org> Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon bits] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:02 -07:00
Dan Magenheimer	10a7a07713	xen: tmem: enable Xen tmem shim to be built/loaded as a module Allow Xen tmem shim to be built/loaded as a module. Xen self-ballooning and frontswap-selfshrinking are now also "lazily" initialized when the Xen tmem shim is loaded as a module, unless explicitly disabled by module parameters. Note runtime dependency disallows loading if cleancache/frontswap lazy initialization patches are not present. If built-in (not built as a module), the original mechanism of enabling via a kernel boot parameter is retained, but this should be considered deprecated. Note that module unload is explicitly not yet supported. [v1: Removed the [CLEANCACHE\|FRONTSWAP]_HAS_LAZY_INIT ifdef] [v2: Squashed the xen/tmem: Remove the subsys call patch in] [akpm@linux-foundation.org: fix build (disable_frontswap_selfshrinking undeclared)] Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Bob Liu <lliubbo@gmail.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Andor Daam <andor.daam@googlemail.com> Cc: Florian Schmaus <fschmaus@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Stefan Hengelein <ilendir@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:01 -07:00
Bob Liu	ff610a1d55	mm: cleancache: clean up cleancache_enabled cleancache_ops is used to decide whether backend is registered. So now cleancache_enabled is always true if defined CONFIG_CLEANCACHE. Signed-off-by: Bob Liu <lliubbo@gmail.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Andor Daam <andor.daam@googlemail.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Florian Schmaus <fschmaus@gmail.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Stefan Hengelein <ilendir@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:01 -07:00
Konrad Rzeszutek Wilk	833f8662af	cleancache: Make cleancache_init use a pointer for the ops Instead of using a backend_registered to determine whether a backend is enabled. This allows us to remove the backend_register check and just do 'if (cleancache_ops)' [v1: Rebase on top of b97c4b430b0a (ramster->zcache move] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Bob Liu <lliubbo@gmail.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Andor Daam <andor.daam@googlemail.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Florian Schmaus <fschmaus@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Stefan Hengelein <ilendir@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:01 -07:00
Minchan Kim	4f89849da2	frontswap: get rid of swap_lock dependency Frontswap initialization routine depends on swap_lock, which want to be atomic about frontswap's first appearance. IOW, frontswap is not present and will fail all calls OR frontswap is fully functional but if new swap_info_struct isn't registered by enable_swap_info, swap subsystem doesn't start I/O so there is no race between init procedure and page I/O working on frontswap. So let's remove unnecessary swap_lock dependency. Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Minchan Kim <minchan@kernel.org> [v1: Rebased on my branch, reworked to work with backends loading late] [v2: Added a check for !map] [v3: Made the invalidate path follow the init path] [v4: Address comments by Wanpeng Li <liwanp@linux.vnet.ibm.com>] Signed-off-by: Konrad Rzeszutek Wilk <konrad@darnok.org> Signed-off-by: Bob Liu <lliubbo@gmail.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Andor Daam <andor.daam@googlemail.com> Cc: Florian Schmaus <fschmaus@gmail.com> Cc: Stefan Hengelein <ilendir@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:00 -07:00
Bob Liu	f066ea230a	mm: frontswap: cleanup code After allowing tmem backends to build/run as modules, frontswap_enabled always true if defined CONFIG_FRONTSWAP. But frontswap_test() depends on whether backend is registered, mv it into frontswap.c using fronstswap_ops to make the decision. frontswap_set/clear are not used outside frontswap, so don't export them. Signed-off-by: Bob Liu <lliubbo@gmail.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Andor Daam <andor.daam@googlemail.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Florian Schmaus <fschmaus@gmail.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Stefan Hengelein <ilendir@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:00 -07:00
Konrad Rzeszutek Wilk	1e01c968db	frontswap: make frontswap_init use a pointer for the ops This simplifies the code in the frontswap - we can get rid of the 'backend_registered' test and instead check against frontswap_ops. [v1: Rebase on top of `703ba7fe5e` (ramster->zcache move] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Bob Liu <lliubbo@gmail.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Andor Daam <andor.daam@googlemail.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Florian Schmaus <fschmaus@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Stefan Hengelein <ilendir@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:00 -07:00
Vincent Stehlé	ace6128d60	memory hotplug: fix warnings Fix the following compilation warnings: mm/slab.c: In function `kmem_cache_init_late': mm/slab.c:1778:2: warning: statement with no effect [-Wunused-value] mm/page_cgroup.c: In function `page_cgroup_init': mm/page_cgroup.c:305:2: warning: statement with no effect [-Wunused-value] Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-30 17:04:00 -07:00
Dave Airlie	219b47339c	drm/prime: keep a reference from the handle to exported dma-buf (v6) Currently we have a problem with this: 1. i915: create gem object 2. i915: export gem object to prime 3. radeon: import gem object 4. close prime fd 5. radeon: unref object 6. i915: unref object i915 has an imported object reference in its file priv, that isn't cleaned up properly until fd close. The reference gets added at step 2, but at step 6 we don't have enough info to clean it up. The solution is to take a reference on the dma-buf when we export it, and drop the reference when the gem handle goes away. So when we export a dma_buf from a gem object, we keep track of it with the handle, we take a reference to the dma_buf. When we close the handle (i.e. userspace is finished with the buffer), we drop the reference to the dma_buf, and it gets collected. This patch isn't meant to fix any other problem or bikesheds, and it doesn't fix any races with other scenarios. v1.1: move export symbol line back up. v2: okay I had to do a bit more, as the first patch showed a leak on one of my tests, that I found using the dma-buf debugfs support, the problem case is exporting a buffer twice with the same handle, we'd add another export handle for it unnecessarily, however we now fail if we try to export the same object with a different gem handle, however I'm not sure if that is a case I want to support, and I've gotten the code to WARN_ON if we hit something like that. v2.1: rebase this patch, write better commit msg. v3: cleanup error handling, track import vs export in linked list, these two patches were separate previously, but seem to work better like this. v4: danvet is correct, this code is no longer useful, since the buffer better exist, so remove it. v5: always take a reference to the dma buf object, import or export. (Imre Deak contributed this originally) v6: square the circle, remove import vs export tracking now that there is no difference Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>	2013-05-01 09:30:15 +10:00
Linus Torvalds	2e1deaad1e	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem update from James Morris: "Just some minor updates across the subsystem" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: ima: eliminate passing d_name.name to process_measurement() TPM: Retry SaveState command in suspend path tpm/tpm_i2c_infineon: Add small comment about return value of __i2c_transfer tpm/tpm_i2c_infineon.c: Add OF attributes type and name to the of_device_id table entries tpm_i2c_stm_st33: Remove duplicate inclusion of header files tpm: Add support for new Infineon I2C TPM (SLB 9645 TT 1.2 I2C) char/tpm: Convert struct i2c_msg initialization to C99 format drivers/char/tpm/tpm_ppi: use strlcpy instead of strncpy tpm/tpm_i2c_stm_st33: formatting and white space changes Smack: include magic.h in smackfs.c selinux: make security_sb_clone_mnt_opts return an error on context mismatch seccomp: allow BPF_XOR based ALU instructions. Fix NULL pointer dereference in smack_inode_unlink() and smack_inode_rmdir() Smack: add support for modification of existing rules smack: SMACK_MAGIC to include/uapi/linux/magic.h Smack: add missing support for transmute bit in smack_str_from_perm() Smack: prevent revoke-subject from failing when unseen label is written to it tomoyo: use DEFINE_SRCU() to define tomoyo_ss tomoyo: use DEFINE_SRCU() to define tomoyo_ss	2013-04-30 16:27:51 -07:00
Linus Torvalds	3ed1c478ef	Power management and ACPI updates for 3.10-rc1 - ARM big.LITTLE cpufreq driver from Viresh Kumar. - exynos5440 cpufreq driver from Amit Daniel Kachhap. - cpufreq core cleanup and code consolidation from Viresh Kumar and Stratos Karafotis. - cpufreq scalability improvement from Nathan Zimmer. - AMD "frequency sensitivity feedback" powersave bias for the ondemand cpufreq governor from Jacob Shin. - cpuidle code consolidation and cleanups from Daniel Lezcano. - ARM OMAP cpuidle fixes from Santosh Shilimkar and Daniel Lezcano. - ACPICA fixes and other improvements from Bob Moore, Jung-uk Kim, Lv Zheng, Yinghai Lu, Tang Chen, Colin Ian King, and Linn Crosetto. - ACPI core updates related to hotplug from Toshi Kani, Paul Bolle, Yasuaki Ishimatsu, and Rafael J. Wysocki. - Intel Lynxpoint LPSS (Low-Power Subsystem) support improvements from Rafael J. Wysocki and Andy Shevchenko. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABAgAGBQJRf8M8AAoJEKhOf7ml8uNsud4P/3cabXP5lDipzibRrpOiONse puuvIdhtNdMRMc3t1oSDjNH/w/JA51Gc+ICGFAORiyVmqxBc85mpT6J5ibqV7hNd pCqbKJceoB5PajHZSx22e4wG9O7YN1k3r80p38IfFzA+Ct0KNSuE0ixMEfHKYjiq p5pXswk6TY3gtBReH9agrafHqDtXw4IMTE0asMuJ+BorPW7vQeiNlrkuA+0qmDuu 26O0Pm2TVkx1ryfTjdM9zSZ9X2G4JuM8rm1/VFZWQJTExwlv3bA2Za1nvQNJlJ99 6JZ0JXfAehcEW2Ye0sqsZ8HSEabDVHM29QvvOszJ5RpBXERiOCHOkhvFleCoTpn0 Xq0rtXPrLMH1G28Ej+cxmsAjfzOLV2Byg30CAoI/GCLuQ+xh+VMCpuNYQuld25CG 9rtYd0fWESeYsAebhDcX0E3xyzJtbrHtOb9PyGwNkbAJ8YQfhVSMCOPi2SX2wa+Q qXLXi2VaHvjBSUKcAv5BmM+Ya57Be+88D0LxbgXbUeOnYefUK1ljldKDDshkMjgG P4LPdm4JpoB5ncXSOO1Dz9w9QnNcFexSUySd/TtKLNMha1vEHV8ISzNPYY+9IdXf XN0VZbFnUDzdj+Fwna7zyFb1cGihDYJKAtpXvRd8Y6RGUxKx9uGLAFJZw/xZB/cR KZKuML5O8MgJuef37F38 =H/se -----END PGP SIGNATURE----- Merge tag 'pm+acpi-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI updates from Rafael J Wysocki: - ARM big.LITTLE cpufreq driver from Viresh Kumar. - exynos5440 cpufreq driver from Amit Daniel Kachhap. - cpufreq core cleanup and code consolidation from Viresh Kumar and Stratos Karafotis. - cpufreq scalability improvement from Nathan Zimmer. - AMD "frequency sensitivity feedback" powersave bias for the ondemand cpufreq governor from Jacob Shin. - cpuidle code consolidation and cleanups from Daniel Lezcano. - ARM OMAP cpuidle fixes from Santosh Shilimkar and Daniel Lezcano. - ACPICA fixes and other improvements from Bob Moore, Jung-uk Kim, Lv Zheng, Yinghai Lu, Tang Chen, Colin Ian King, and Linn Crosetto. - ACPI core updates related to hotplug from Toshi Kani, Paul Bolle, Yasuaki Ishimatsu, and Rafael J Wysocki. - Intel Lynxpoint LPSS (Low-Power Subsystem) support improvements from Rafael J Wysocki and Andy Shevchenko. * tag 'pm+acpi-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (192 commits) cpufreq: Revert incorrect commit `5800043` cpufreq: MAINTAINERS: Add co-maintainer cpuidle: add maintainer entry ACPI / thermal: do not always return THERMAL_TREND_RAISING for active trip points ARM: s3c64xx: cpuidle: use init/exit common routine cpufreq: pxa2xx: initialize variables ACPI: video: correct acpi_video_bus_add error processing SH: cpuidle: use init/exit common routine ARM: S5pv210: compiling issue, ARM_S5PV210_CPUFREQ needs CONFIG_CPU_FREQ_TABLE=y ACPI: Fix wrong parameter passed to memblock_reserve cpuidle: fix comment format pnp: use %*phC to dump small buffers isapnp: remove debug leftovers ARM: imx: cpuidle: use init/exit common routine ARM: davinci: cpuidle: use init/exit common routine ARM: kirkwood: cpuidle: use init/exit common routine ARM: calxeda: cpuidle: use init/exit common routine ARM: tegra: cpuidle: use init/exit common routine for tegra3 ARM: tegra: cpuidle: use init/exit common routine for tegra2 ARM: OMAP4: cpuidle: use init/exit common routine ...	2013-04-30 15:21:02 -07:00
Linus Torvalds	151173e8ce	Highlights: - OpenFirmware/DeviceTree support for the Power Supply core: the core now automatically populates supplied_from hierarchy from the device tree. With these patches chargers and batteries can now lookup each other without the board files support shim. Rhyland Klein at NVIDIA did the work; - New ST-Ericsson ABX500 hwmon driver. The driver is heavily using the AB85xx core and depends on some recent changes to it, so that is why the driver comes through the battery tree. It has an appropriate ack from the hwmon maintainer (i.e. Guenter Roeck). Martin Persson at ST-Ericsson and Hongbo Zhang at Linaro authored the driver; - Final bits to sync AB85xx ST-Ericsson changes into mainline. The changes touch mfd parts, but these were acked by the appropriate MFD maintainer (i.e. Samuel Ortiz). Lee Jones at Linaro did most of the work and lead the submission process. Minor changes, but still worth mentioning: - Battery temperature reporting fix for Nokia N900 phones; - Versatile Express poweroff driver moved into drivers/power/reset/. - Tree-wise: use devm_kzalloc() where appropriate; - Tree-wise: dev_pm_ops cleanups/fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABAgAGBQJRf1BPAAoJEGgI9fZJve1bwyoP/3Gv+qStzbN7mUtIVEvH3EAe aVJwlODFzEZjk5xoiw7Dc8PuBE8O948hWOnQyCuUQ8+OfK6SyNIjexPYy3Z25a0F cX9JMj7rtPWHvxo2q/YuKwBPZoxj/JIPyxwUT7akwXoHAV059fvcy9R1DfFX2Qur hSP0NXTg+guvEpxGV4bC2l+LWZPmDFK9n0RsorttYaBvsiRDWl0c2TY2byofYlBw ++m/rI8Qgl8db8pKq/WDue62HtMt3kmZj6ZIgej3Wb0+GIRmYHMyPIyAkf82Wlw2 g2sGNPT7cstrSNOozegzJ7UghJObcYDFf10NCgvFMNjmAT1dAwdneQHEWy6Ek7pT 9X3e0LmaFqVbufFp4xFiLkMutsCLLTnGyXIbzs7RkTm3XBVIUqiDWtI6I6X44ohG 6PJn8vUlufu7owXrqFpgSBar2U1vfoQdhInmz4hbQeff0qn2nX/BGNwhxYptZ549 TudsI9WGzJ6fvYQ56zh6+BfiA0FmjhUiSKOtrXImrhxE6gUf3IOJyMQlkxLx5t8D uuhBmO0J6kDi2lqF6alOEo+UDefJj4mUJn2tnIdis90+lNQlSV02GEtiwFT1zt1z LFW0xshQkxZ4lMa28h35FB1/Z11ApUOe4Es+OKADDJhAnxdZzXcAwIRyPRRPgdsy jTnJno+Kxk9wXLcekxVE =5BdE -----END PGP SIGNATURE----- Merge tag 'for-v3.10' of git://git.infradead.org/battery-2.6 Pull battery updates from Anton Vorontsov: "Highlights: - OpenFirmware/DeviceTree support for the Power Supply core: the core now automatically populates supplied_from hierarchy from the device tree. With these patches chargers and batteries can now lookup each other without the board files support shim. Rhyland Klein at NVIDIA did the work - New ST-Ericsson ABX500 hwmon driver. The driver is heavily using the AB85xx core and depends on some recent changes to it, so that is why the driver comes through the battery tree. It has an appropriate ack from the hwmon maintainer (i.e. Guenter Roeck). Martin Persson at ST-Ericsson and Hongbo Zhang at Linaro authored the driver - Final bits to sync AB85xx ST-Ericsson changes into mainline. The changes touch mfd parts, but these were acked by the appropriate MFD maintainer (ie Samuel Ortiz). Lee Jones at Linaro did most of the work and lead the submission process. Minor changes, but still worth mentioning: - Battery temperature reporting fix for Nokia N900 phones - Versatile Express poweroff driver moved into drivers/power/reset/ - Tree-wide: use devm_kzalloc() where appropriate - Tree-wide: dev_pm_ops cleanups/fixes" * tag 'for-v3.10' of git://git.infradead.org/battery-2.6: (112 commits) pm2301-charger: Fix suspend/resume charger-manager: Use kmemdup instead of kzalloc + memcpy power_supply: Populate supplied_from hierarchy from the device tree power_supply: Add core support for supplied_from power_supply: Define Binding for power-supplies rx51_battery: Fix reporting temperature hwmon: Add ST-Ericsson ABX500 hwmon driver ab8500_bmdata: Export abx500_res_to_temp tables for hwmon ab8500_{bmdata,fg}: Add const attributes to some data arrays ab8500_bmdata: Eliminate CamelCase warning of some variables ab8500_btemp: Make ab8500_btemp_get* interfaces public goldfish_battery: Use resource_size() lp8788-charger: Use PAGE_SIZE for the sysfs read operation max8925_power: Use devm_kzalloc() da9030_battery: Use devm_kzalloc() da9052-battery: Use devm_kzalloc() ds2760_battery: Use devm_kzalloc() ds2780_battery: Use devm_kzalloc() gpio-charger: Use devm_kzalloc() isp1704_charger: Use devm_kzalloc() ...	2013-04-30 15:15:24 -07:00
Linus Torvalds	5aa1c98862	SCSI misc on 20130430 The patch set is mostly driver updates (qla4, qla2 [ISF support updates], lpfc, aacraid [dual firmware image support]) and a few bug fixes. Signed-off-by: James Bottomley <JBottomley@Parallels.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQEcBAABAgAGBQJRgAd8AAoJEDeqqVYsXL0Mg5wH/3P4wlXaRyqvyrFk1WSkmklZ 6YxzIKn/RLFmJlJvkaTT7N02ble2UqTluB6+5+AorU/jqz6DArxHsPnyv0/+2pXS zYmp1hrcLn9dB3sZ2Y32jU2GlzHq+LSJSjjnUrA/uRrq1KTP09KCJtGbZUkvy710 x1/3e3I8u2bvBAehUkKvazg5+xlw/XImJ+IVXgUWOyiv1mNbqNEtT5qYt7sjnhLu Mg2VfKTSb+kzxSpol3v51vh/wqY6unVcI/a9HxanihkDtkqRBRhg8wXpNAYI2xNK uNLGq0VFpyCYfBZX0aqh01QZ++EUU2TNlRreVg4/crQnBR88EI3KtUsA1BOthwQ= =x6dL -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull first round of SCSI updates from James "Jej B" Bottomley: "The patch set is mostly driver updates (qla4, qla2 [ISF support updates], lpfc, aacraid [dual firmware image support]) and a few bug fixes" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (47 commits) [SCSI] iscsi_tcp: support PF_MEMALLOC/__GFP_MEMALLOC [SCSI] libiscsi: avoid unnecessary multiple NULL assignments [SCSI] qla4xxx: Update driver version to 5.03.00-k8 [SCSI] qla4xxx: Added print statements to display AENs [SCSI] qla4xxx: Use correct value for max flash node entries [SCSI] qla4xxx: Restrict logout from boot target session using session id [SCSI] qla4xxx: Use correct flash ddb offset for ISP40XX [SCSI] isci: add CONFIG_PM_SLEEP to suspend/resume functions [SCSI] scsi_dh_alua: Add module parameter to allow failover to non preferred path without STPG [SCSI] qla2xxx: Update the driver version to 8.05.00.03-k. [SCSI] qla2xxx: Obtain loopback iteration count from bsg request. [SCSI] qla2xxx: Add clarifying printk to thermal access fail cases. [SCSI] qla2xxx: Remove duplicated include form qla_isr.c [SCSI] qla2xxx: Enhancements to support ISPFx00. [SCSI] qla4xxx: Update driver version to 5.03.00-k7 [SCSI] qla4xxx: Replace dev type macros with generic portal type macros [SCSI] scsi_transport_iscsi: Declare portal type string macros for generic use [SCSI] qla4xxx: Add flash node mgmt support [SCSI] libiscsi: export function iscsi_switch_str_param [SCSI] scsi_transport_iscsi: Add flash node mgmt support ...	2013-04-30 13:16:38 -07:00
Linus Torvalds	6da6dc2380	Merge branch 'for-next-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target update from Nicholas Bellinger: "The highlights this round include: - Add fileio support for WRITE_SAME w/ UNMAP=1 discard (asias) - Add fileio support for UNMAP discard (asias) - Add tcm_vhost hotplug support to work with upstream QEMU vhost-scsi-pci code (asias + mst) - Check for aborted sequence in tcm_fc response path (mdr) - Add initial iscsit_transport support into iscsi-target code (nab) - Refactor iscsi-target RX PDU logic + export request PDU handling (nab) - Refactor iscsi-target TX queue logic + export response PDU creation (nab) - Add new iSCSI Extentions for RDMA (ISER) target driver (Or + nab) The biggest changes revolve around iscsi-target refactoring in order to support the iser-target driver. This includes the conversion of the iscsi-target data-path to use modern se_cmd->cmd_kref counting, and allowing transport independent aspects of RX/TX PDU request/response handling be shared across existing traditional iscsi-target code, and the new iser-target code. Thanks to Or Gerlitz + Mellanox for supporting the iser-target development effort!" * 'for-next-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (25 commits) iser-target: Add iSCSI Extensions for RDMA (iSER) target driver tcm_vhost: Enable VIRTIO_SCSI_F_HOTPLUG tcm_vhost: Add ioctl to get and set events missed flag tcm_vhost: Add hotplug/hotunplug support tcm_vhost: Refactor the lock nesting rule tcm_fc: Check for aborted sequence iscsi-target: Add iser network portal attribute iscsi-target: Refactor TX queue logic + export response PDU creation iscsi-target: Refactor RX PDU logic + export request PDU handling iscsi-target: Add per transport iscsi_cmd alloc/free iscsi-target: Add iser-target parameter keys + setup during login iscsi-target: Initial traditional TCP conversion to iscsit_transport iscsi-target: Add iscsit_transport API template target: Add export of target_get_sess_cmd symbol target: Change default sense key of NOT_READY target/file: Set is_nonrot attribute target: Add sbc_execute_unmap() helper target/iblock: Add iblock_do_unmap() helper target/file: Add fd_do_unmap() helper target/file: Add UNMAP emulation support ...	2013-04-30 13:14:57 -07:00
Eric Paris	b24a30a730	audit: fix event coverage of AUDIT_ANOM_LINK The userspace audit tools didn't like the existing formatting of the AUDIT_ANOM_LINK event. It needed to be expanded to emit an AUDIT_PATH event as well, so this implements the change. The bulk of the patch is moving code out of auditsc.c into audit.c and audit.h for general use. It expands audit_log_name to include an optional "struct path" argument for the simple case of just needing to report a pathname. This also makes audit_log_task_info available when syscall auditing is not enabled, since it is needed in either case for process details. Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Steve Grubb <sgrubb@redhat.com>	2013-04-30 15:31:28 -04:00
Richard Guy Briggs	46e959ea29	audit: add an option to control logging of passwords with pam_tty_audit Most commands are entered one line at a time and processed as complete lines in non-canonical mode. Commands that interactively require a password, enter canonical mode to do this while shutting off echo. This pair of features (icanon and !echo) can be used to avoid logging passwords by audit while still logging the rest of the command. Adding a member (log_passwd) to the struct audit_tty_status passed in by pam_tty_audit allows control of canonical mode without echo per task. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>	2013-04-30 15:31:28 -04:00
Eric Paris	4d3fb709b2	helper for some session id stuff	2013-04-30 15:31:28 -04:00
Eric Paris	b122c3767c	audit: use a consistent audit helper to log lsm information We have a number of places we were reimplementing the same code to write out lsm labels. Just do it one darn place. Signed-off-by: Eric Paris <eparis@redhat.com>	2013-04-30 15:31:28 -04:00
Eric Paris	152f497b9b	audit: push loginuid and sessionid processing down Since we are always current, we can push a lot of this stuff to the bottom and get rid of useless interfaces and arguments. Signed-off-by: Eric Paris <eparis@redhat.com>	2013-04-30 15:31:28 -04:00
Eric Paris	dc9eb698f4	audit: stop pushing loginid, uid, sessionid as arguments We always use current. Stop pulling this when the skb comes in and pushing it around as arguments. Just get it at the end when you need it. Signed-off-by: Eric Paris <eparis@redhat.com>	2013-04-30 15:31:28 -04:00
Linus Torvalds	8728f986fe	NFS client bugfixes and cleanups for 3.10 - NLM: stable fix for NFSv2/v3 blocking locks - NFSv4.x: stable fixes for the delegation recall error handling code - NFSv4.x: Security flavour negotiation fixes and cleanups by Chuck Lever - SUNRPC: A number of RPCSEC_GSS fixes and cleanups also from Chuck - NFSv4.x assorted state management and reboot recovery bugfixes - NFSv4.1: In cases where we have already looked up a file, and hold a valid filehandle, use the new open-by-filehandle operation instead of opening by name. - Allow the NFSv4.1 callback thread to freeze - NFSv4.x: ensure that file unlock waits for readahead to complete - NFSv4.1: ensure that the RPC layer doesn't override the NFS session table size negotiation by limiting the number of slots. - NFSv4.x: Fix SETATTR spec compatibility issues -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQIcBAABAgAGBQJRfu+cAAoJEGcL54qWCgDylxkP/24cXOLHMKMYnIab0cQYIW2m SQgADGE+MqgTVlWjGVWublVMDY1R51iINsksAjxMtXYt50FdBJEqV2uxIGi4VnbR nR9eppqQ6vk5e6r5+aZyVmWKoLFnJ4MBF6OpPUZB5mf8iH/fiixmSYLvseopPdDj bjHwCxg+xEgew5EhQF/xqkEfkAp2NN84xUksTWb9uDIW2c3SJweY/ZVR2Zsqpugm oqYVtrSLvNKqINQG8OP10s+mRWULwoqapF+kEHlxNbRy26C0zlbXPaneSgYzqHsY OyRkAT7uJJqStYlqdW7k+DhyNMB+T33WAGJpWQlfJGYk5d/n0rtBJDVo0hfhCSQr VkOXiO9J08NMbelCu4+0CJii7h5GCaqpuJEEmNL6AlF/TJVkIQJuRaG2+WDmEtO2 oYd4UfXlAbUuts1SW7u/yyN/yrjVTm1tZYRBqn2VJdqh1s8dMxEWPct2Yn314mpS ODAnbDkEhtWlc9cloSRnwKec5WcxMZb19IJeK9ZvHm7PfIu/QHtj6Ren8s1//bZI OMQxC/Vf/wcjMdNtr7QdMNxWG1aK8DL9mYP5XwCrkZ560LIrtxmhyqYeoGAfgO5u +K/gKmQwjsaPhEa8jbP2/wI0II9yKPWj/fVwqhbhqaBUx5GA2iAKcdpPP6JAMAti +PXkLTtkyrIgSNwzl63S =Hgot -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes and cleanups from Trond Myklebust: - NLM: stable fix for NFSv2/v3 blocking locks - NFSv4.x: stable fixes for the delegation recall error handling code - NFSv4.x: Security flavour negotiation fixes and cleanups by Chuck Lever - SUNRPC: A number of RPCSEC_GSS fixes and cleanups also from Chuck - NFSv4.x assorted state management and reboot recovery bugfixes - NFSv4.1: In cases where we have already looked up a file, and hold a valid filehandle, use the new open-by-filehandle operation instead of opening by name. - Allow the NFSv4.1 callback thread to freeze - NFSv4.x: ensure that file unlock waits for readahead to complete - NFSv4.1: ensure that the RPC layer doesn't override the NFS session table size negotiation by limiting the number of slots. - NFSv4.x: Fix SETATTR spec compatibility issues * tag 'nfs-for-3.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (67 commits) NFSv4: Warn once about servers that incorrectly apply open mode to setattr NFSv4: Servers should only check SETATTR stateid open mode on size change NFSv4: Don't recheck permissions on open in case of recovery cached open NFSv4.1: Don't do a delegated open for NFS4_OPEN_CLAIM_DELEG_CUR_FH modes NFSv4.1: Use the more efficient open_noattr call for open-by-filehandle NFS: Retry SETCLIENTID with AUTH_SYS instead of AUTH_NONE NFSv4: Ensure that we clear the NFS_OPEN_STATE flag when appropriate LOCKD: Ensure that nlmclnt_block resets block->b_status after a server reboot NFSv4: Ensure the LOCK call cannot use the delegation stateid NFSv4: Use the open stateid if the delegation has the wrong mode nfs: Send atime and mtime as a 64bit value NFSv4: Record the OPEN create mode used in the nfs4_opendata structure NFSv4.1: Set the RPC_CLNT_CREATE_INFINITE_SLOTS flag for NFSv4.1 transports SUNRPC: Allow rpc_create() to request that TCP slots be unlimited SUNRPC: Fix a livelock problem in the xprt->backlog queue NFSv4: Fix handling of revoked delegations by setattr NFSv4 release the sequence id in the return on close case nfs: remove unnecessary check for NULL inode->i_flock from nfs_delegation_claim_locks NFS: Ensure that NFS file unlock waits for readahead to complete NFS: Add functionality to allow waiting on all outstanding reads to complete ...	2013-04-30 11:28:08 -07:00
Stanislaw Gruszka	f300213415	Revert "math64: New div64_u64_rem helper" This reverts commit `f792685006`. The cputime scaling code was changed/fixed and does not need the div64_u64_rem() primitive anymore. It has no other users, so let's remove them. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: rostedt@goodmis.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dave Hansen <dave@sr71.net> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1367314507-9728-4-git-send-email-sgruszka@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-04-30 19:13:05 +02:00
Linus Torvalds	87c1f0f8c9	Misc arch/metag changes for v3.10-rc1 - Various fixes for the interrupting perf counter handling in metag's perf backend. - Add OProfile support based on perf. - Sets up cache partitions for SMP so bootloader doesn't have to. - Patch from Paul Bolle to remove ARCH_POPULATES_NODE_MAP again (touches microblaze too). - Add TLS pointer regset to metag ptrace api. - Add exported metag DSP extended context handling header <asm/ech.h>. - Increase defconfig log buffer size to 128KiB. - Various fixes, typos, missing exports. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQIcBAABAgAGBQJRf8FdAAoJEKHZs+irPybfY68P/3Orph99480CpWq2gNjGgzrU 0xTpVaF+7/qmfdCBPi3byZXX+RdNXQC3MrSzf3pFBcLsEs5JpTngvOzrqR+ZJBg3 83p5Gv7m/Gtl67s3rymiex7eM3AnTu4TmwXGv2C7Equ7kwq21jQbgxXoOq1YSBdk hsT/eNDUwfFV6WCSexsG7svt23LVCoaT0hwo0HniEG/ej3ezPSTJ9Y0WuZR0oDfx EBl6INcNr8yhhS7UPLZhqF5oDJWpu22zPv14g8FXrAF/QXvfBSjH49xUtOOt1TYS VFOHDAQCjPMEJDltkxKcDUtNw1WW5i1y0HMcblAHFpa1YYZQdy4JwR1vgZxJs9eW 2wNFo0omUi/EU0s98AvVKU1hs4aDpVovENVGpzIEbkkFWsDDT2G6Pp30uHSrmZRb yvn87JwijPOgLnc8xrN/7jeWJ9fP/KX08YTS7/8H7U7PZBos9aSAqp/V0yD2e+Sh xSqzo2xYAey+yNCGvuyx/AePjIP/EblQI+/zjDoGcFS+OhvwK9N1TN0s1rGk7xG8 KXjke4xcaxP9a+p8TQAf/m9DTmXxXjIcK99HZ4oKv5IrEWZI4P8aFRq/H/N7aZTt +vscajLQpprFnd0FsnzD+ntuCrc/fMofz3l29kW7uRpRJghf4KJCxyRho40KDvsB 31L4HvWrU8m/FhVhJ+2q =JhIi -----END PGP SIGNATURE----- Merge tag 'metag-for-v3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag Pull arch/metag update from James Hogan: - Various fixes for the interrupting perf counter handling in metag's perf backend. - Add OProfile support based on perf. - Sets up cache partitions for SMP so bootloader doesn't have to. - Patch from Paul Bolle to remove ARCH_POPULATES_NODE_MAP again (touches microblaze too). - Add TLS pointer regset to metag ptrace api. - Add exported metag DSP extended context handling header <asm/ech.h>. - Increase defconfig log buffer size to 128KiB. - Various fixes, typos, missing exports. * tag 'metag-for-v3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag: metag: defconfigs: increase log buffer 8KiB => 128KiB metag: avoid unnecessary builtin dtb rebuilds metag: add exported <asm/ech.h> for extended context handling metag: export _metag_da_present and cpu_2_hwthread_id metag: ptrace: Implement NT_METAG_TLS memblock: Kill ARCH_POPULATES_NODE_MAP once more metag: cachepart: fix get_global_dcache_size() typo metag: cachepart: take into account small cache bits metag: smp: copy cache partition and enable GCOn metag: OProfile support metag: perf: prepare for use by oprofile metag: perf: don't reset TXTACTCYC metag: perf: use hard_processor_id() to get thread metag: perf: fix frequency sampling (dynamic period) metag: perf: add missing prev_count updates metag: perf: fixes for interrupting perf counters metag: perf: fix wrap handling in delta calculation metag: perf: fix core internal / perf channel mux	2013-04-30 10:09:39 -07:00
Linus Torvalds	240c3c3424	Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media update from Mauro Carvalho Chehab: - OF documentation and patches at core and drivers, to be used by for embedded media systems - some I2C drivers used on go7007 were rewritten/promoted from staging: sony-btf-mpx, tw2804, tw9903, tw9906, wis-ov7640, wis-uda1342 - add fimc-is driver (Exynos) - add a new radio driver: radio-si476x - add a two new tuners: r820t and tuner_it913x - split camera code on em28xx driver and add more models - the cypress firmware load is used outside dvb usb drivers. So, move it to a common directory to make easier to re-use it - siano media driver updated to work with sms2270 devices - several work done in order to promote go7007 and solo6x1x out of staging (still, there are some pending issues) - several API compliance fixes at v4l2 drivers that don't behave as expected - as usual, lots of driver fixes, improvements, cleanups and new device addition at the existing drivers. * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (831 commits) [media] cx88: make core less verbose [media] em28xx: fix oops at em28xx_dvb_bus_ctrl() [media] s5c73m3: fix indentation of the help section in Kconfig [media] cx25821-alsa: get rid of a __must_check warning [media] cx25821-video: declare cx25821_vidioc_s_std as static [media] cx25821-video: remove maxw from cx25821_vidioc_try_fmt_vid_cap [media] r820t: Remove a warning for an unused value [media] dib0090: Fix a warning at dib0090_set_EFUSE [media] dib8000: fix a warning [media] dib8000: Fix sub-channel range [media] dib8000: store dtv_property_cache in a temp var [media] dib8000: warning fix: declare internal functions as static [media] r820t: quiet gcc warning on n_ring [media] r820t: memory leak in release() [media] r820t: precendence bug in r820t_xtal_check() [media] videodev2.h: Remove the unused old V4L1 buffer types [media] anysee: Grammar s/report the/report to/ [media] anysee: Initialize ret = 0 in anysee_frontend_attach() [media] media: videobuf2: fix the length check for mmap [media] em28xx: save isoc endpoint number for DVB only if endpoint has alt settings with xMaxPacketSize != 0 ...	2013-04-30 09:58:16 -07:00
Linus Torvalds	19b344efa3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID updates from Jiri Kosina: - hid driver transport cleanup, finalizing the long-desired decoupling of core from transport layers, by Benjamin Tissoires and Henrik Rydberg - support for hybrid finger/pen multitouch HID devices, by Benjamin Tissoires - fix for long-standing issue in Logitech unifying driver sometimes not inializing properly due to device specifics, by Andrew de los Reyes - Wii remote driver updates to support 2nd generation of devices, by David Herrmann - support for Apple IR remote - roccat driver now supports new devices (Roccat Kone Pure, IskuFX), by Stefan Achatz - debugfs locking fixes in hid debug interface, by Jiri Kosina * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (43 commits) HID: protect hid_debug_list HID: debug: break out hid_dump_report() into hid-debug HID: Add PID for Japanese version of NE4K keyboard HID: hid-lg4ff add support for new version of DFGT wheel HID: icade: u16 which never < 0 HID: clarify Magic Mouse Kconfig description HID: appleir: add support for Apple ir devices HID: roccat: added media key support for Kone HID: hid-lenovo-tpkbd: remove doubled hid_get_drvdata HID: i2c-hid: fix length for set/get report in i2c hid HID: wiimote: parse reduced status reports HID: wiimote: add 2nd generation Wii Remote IDs HID: wiimote: use unique battery names HID: hidraw: warn if userspace headers are outdated HID: multitouch: force BTN_STYLUS for pen devices HID: multitouch: append " Pen" to the name of the stylus input HID: multitouch: add handling for pen in dual-sensors device HID: multitouch: change touch sensor detection in mt_input_configured() HID: multitouch: do not map usage from non used reports HID: multitouch: breaks out touch handling in specific functions ...	2013-04-30 09:37:55 -07:00
Linus Torvalds	5d434fcb25	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual stuff, mostly comment fixes, typo fixes, printk fixes and small code cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (45 commits) mm: Convert print_symbol to %pSR gfs2: Convert print_symbol to %pSR m32r: Convert print_symbol to %pSR iostats.txt: add easy-to-find description for field 6 x86 cmpxchg.h: fix wrong comment treewide: Fix typo in printk and comments doc: devicetree: Fix various typos docbook: fix 8250 naming in device-drivers pata_pdc2027x: Fix compiler warning treewide: Fix typo in printks mei: Fix comments in drivers/misc/mei treewide: Fix typos in kernel messages pm44xx: Fix comment for "CONFIG_CPU_IDLE" doc: Fix typo "CONFIG_CGROUP_CGROUP_MEMCG_SWAP" mmzone: correct "pags" to "pages" in comment. kernel-parameters: remove outdated 'noresidual' parameter Remove spurious _H suffixes from ifdef comments sound: Remove stray pluses from Kconfig file radio-shark: Fix printk "CONFIG_LED_CLASS" doc: put proper reference to CONFIG_MODULE_SIG_ENFORCE ...	2013-04-30 09:36:50 -07:00
Linus Torvalds	5a5a1bf099	Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS changes from Ingo Molnar: - Add an Intel CMCI hotplug fix - Add AMD family 16h EDAC support - Make the AMD MCE banks code more flexible for virtual environments * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: amd64_edac: Add Family 16h support x86/mce: Rework cmci_rediscover() to play well with CPU hotplug x86, MCE, AMD: Use MCG_CAP MSR to find out number of banks on AMD x86, MCE, AMD: Replace shared_bank array with is_shared_bank() helper	2013-04-30 08:42:45 -07:00
Linus Torvalds	ab86e974f0	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core timer updates from Ingo Molnar: "The main changes in this cycle's merge are: - Implement shadow timekeeper to shorten in kernel reader side blocking, by Thomas Gleixner. - Posix timers enhancements by Pavel Emelyanov: - allocate timer ID per process, so that exact timer ID allocations can be re-created be checkpoint/restore code. - debuggability and tooling (/proc/PID/timers, etc.) improvements. - suspend/resume enhancements by Feng Tang: on certain new Intel Atom processors (Penwell and Cloverview), there is a feature that the TSC won't stop in S3 state, so the TSC value won't be reset to 0 after resume. This can be taken advantage of by the generic via the CLOCK_SOURCE_SUSPEND_NONSTOP flag: instead of using the RTC to recover/approximate sleep time, the main (and precise) clocksource can be used. - Fix /proc/timer_list for 4096 CPUs by Nathan Zimmer: on so many CPUs the file goes beyond 4MB of size and thus the current simplistic seqfile approach fails. Convert /proc/timer_list to a proper seq_file with its own iterator. - Cleanups and refactorings of the core timekeeping code by John Stultz. - International Atomic Clock time is managed by the NTP code internally currently but not exposed externally. Separate the TAI code out and add CLOCK_TAI support and TAI support to the hrtimer and posix-timer code, by John Stultz. - Add deep idle support enhacement to the broadcast clockevents core timer code, by Daniel Lezcano: add an opt-in CLOCK_EVT_FEAT_DYNIRQ clockevents feature (which will be utilized by future clockevents driver updates), which allows the use of IRQ affinities to avoid spurious wakeups of idle CPUs - the right CPU with an expiring timer will be woken. - Add new ARM bcm281xx clocksource driver, by Christian Daudt - ... various other fixes and cleanups" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) clockevents: Set dummy handler on CPU_DEAD shutdown timekeeping: Update tk->cycle_last in resume posix-timers: Remove unused variable clockevents: Switch into oneshot mode even if broadcast registered late timer_list: Convert timer list to be a proper seq_file timer_list: Split timer_list_show_tickdevices posix-timers: Show sigevent info in proc file posix-timers: Introduce /proc/PID/timers file posix timers: Allocate timer id per process (v2) timekeeping: Make sure to notify hrtimers when TAI offset changes hrtimer: Fix ktime_add_ns() overflow on 32bit architectures hrtimer: Add expiry time overflow check in hrtimer_interrupt timekeeping: Shorten seq_count region timekeeping: Implement a shadow timekeeper timekeeping: Delay update of clock->cycle_last timekeeping: Store cycle_last value in timekeeper struct as well ntp: Remove ntp_lock, using the timekeeping locks to protect ntp state timekeeping: Simplify tai updating from do_adjtimex timekeeping: Hold timekeepering locks in do_adjtimex and hardpps timekeeping: Move ADJ_SETOFFSET to top level do_adjtimex() ...	2013-04-30 08:15:40 -07:00
Matt Fleming	8a415b8c05	efi, pstore: Read data from variable store before memcpy() Seiji reported getting empty dmesg-* files, because the data was never actually read in efi_pstore_read_func(), and so the memcpy() was copying garbage data. This patch necessitated adding __efivar_entry_get() which is callable between efivar_entry_iter_{begin,end}(). We can also delete __efivar_entry_size() because efi_pstore_read_func() was the only caller. Reported-by: Seiji Aguchi <seiji.aguchi@hds.com> Tested-by: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Matthew Garrett <matthew.garrett@nebula.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2013-04-30 16:03:10 +01:00
Linus Torvalds	8700c95adb	Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP/hotplug changes from Ingo Molnar: "This is a pretty large, multi-arch series unifying and generalizing the various disjunct pieces of idle routines that architectures have historically copied from each other and have grown in random, wildly inconsistent and sometimes buggy directions: 101 files changed, 455 insertions(+), 1328 deletions(-) this went through a number of review and test iterations before it was committed, it was tested on various architectures, was exposed to linux-next for quite some time - nevertheless it might cause problems on architectures that don't read the mailing lists and don't regularly test linux-next. This cat herding excercise was motivated by the -rt kernel, and was brought to you by Thomas "the Whip" Gleixner." * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) idle: Remove GENERIC_IDLE_LOOP config switch um: Use generic idle loop ia64: Make sure interrupts enabled when we "safe_halt()" sparc: Use generic idle loop idle: Remove unused ARCH_HAS_DEFAULT_IDLE bfin: Fix typo in arch_cpu_idle() xtensa: Use generic idle loop x86: Use generic idle loop unicore: Use generic idle loop tile: Use generic idle loop tile: Enter idle with preemption disabled sh: Use generic idle loop score: Use generic idle loop s390: Use generic idle loop powerpc: Use generic idle loop parisc: Use generic idle loop openrisc: Use generic idle loop mn10300: Use generic idle loop mips: Use generic idle loop microblaze: Use generic idle loop ...	2013-04-30 07:50:17 -07:00
Linus Torvalds	16fa94b532	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "The main changes in this development cycle were: - full dynticks preparatory work by Frederic Weisbecker - factor out the cpu time accounting code better, by Li Zefan - multi-CPU load balancer cleanups and improvements by Joonsoo Kim - various smaller fixes and cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits) sched: Fix init NOHZ_IDLE flag sched: Prevent to re-select dst-cpu in load_balance() sched: Rename load_balance_tmpmask to load_balance_mask sched: Move up affinity check to mitigate useless redoing overhead sched: Don't consider other cpus in our group in case of NEWLY_IDLE sched: Explicitly cpu_idle_type checking in rebalance_domains() sched: Change position of resched_cpu() in load_balance() sched: Fix wrong rq's runnable_avg update with rt tasks sched: Document task_struct::personality field sched/cpuacct/UML: Fix header file dependency bug on the UML build cgroup: Kill subsys.active flag sched/cpuacct: No need to check subsys active state sched/cpuacct: Initialize cpuacct subsystem earlier sched/cpuacct: Initialize root cpuacct earlier sched/cpuacct: Allocate per_cpu cpuusage for root cpuacct statically sched/cpuacct: Clean up cpuacct.h sched/cpuacct: Remove redundant NULL checks in cpuacct_acount_field() sched/cpuacct: Remove redundant NULL checks in cpuacct_charge() sched/cpuacct: Add cpuacct_acount_field() sched/cpuacct: Add cpuacct_init() ...	2013-04-30 07:43:28 -07:00
Linus Torvalds	e0972916e8	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Features: - Add "uretprobes" - an optimization to uprobes, like kretprobes are an optimization to kprobes. "perf probe -x file sym%return" now works like kretprobes. By Oleg Nesterov. - Introduce per core aggregation in 'perf stat', from Stephane Eranian. - Add memory profiling via PEBS, from Stephane Eranian. - Event group view for 'annotate' in --stdio, --tui and --gtk, from Namhyung Kim. - Add support for AMD NB and L2I "uncore" counters, by Jacob Shin. - Add Ivy Bridge-EP uncore support, by Zheng Yan - IBM zEnterprise EC12 oprofile support patchlet from Robert Richter. - Add perf test entries for checking breakpoint overflow signal handler issues, from Jiri Olsa. - Add perf test entry for for checking number of EXIT events, from Namhyung Kim. - Add perf test entries for checking --cpu in record and stat, from Jiri Olsa. - Introduce perf stat --repeat forever, from Frederik Deweerdt. - Add --no-demangle to report/top, from Namhyung Kim. - PowerPC fixes plus a couple of cleanups/optimizations in uprobes and trace_uprobes, by Oleg Nesterov. Various fixes and refactorings: - Fix dependency of the python binding wrt libtraceevent, from Naohiro Aota. - Simplify some perf_evlist methods and to allow 'stat' to share code with 'record' and 'trace', by Arnaldo Carvalho de Melo. - Remove dead code in related to libtraceevent integration, from Namhyung Kim. - Revert "perf sched: Handle PERF_RECORD_EXIT events" to get 'perf sched lat' back working, by Arnaldo Carvalho de Melo - We don't use Newt anymore, just plain libslang, by Arnaldo Carvalho de Melo. - Kill a bunch of die() calls, from Namhyung Kim. - Fix build on non-glibc systems due to libio.h absence, from Cody P Schafer. - Remove some perf_session and tracing dead code, from David Ahern. - Honor parallel jobs, fix from Borislav Petkov - Introduce tools/lib/lk library, initially just removing duplication among tools/perf and tools/vm. from Borislav Petkov ... and many more I missed to list, see the shortlog and git log for more details." * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (136 commits) perf/x86/intel/P4: Robistify P4 PMU types perf/x86/amd: Fix AMD NB and L2I "uncore" support perf/x86/amd: Remove old-style NB counter support from perf_event_amd.c perf/x86: Check all MSRs before passing hw check perf/x86/amd: Add support for AMD NB and L2I "uncore" counters perf/x86/intel: Add Ivy Bridge-EP uncore support perf/x86/intel: Fix SNB-EP CBO and PCU uncore PMU filter management perf/x86: Avoid kfree() in CPU_{STARTING,DYING} uprobes/perf: Avoid perf_trace_buf_prepare/submit if ->perf_events is empty uprobes/tracing: Don't pass addr=ip to perf_trace_buf_submit() uprobes/tracing: Change create_trace_uprobe() to support uretprobes uprobes/tracing: Make seq_printf() code uretprobe-friendly uprobes/tracing: Make register_uprobe_event() paths uretprobe-friendly uprobes/tracing: Make uprobe_{trace,perf}_print() uretprobe-friendly uprobes/tracing: Introduce is_ret_probe() and uretprobe_dispatcher() uprobes/tracing: Introduce uprobe_{trace,perf}_print() helpers uprobes/tracing: Generalize struct uprobe_trace_entry_head uprobes/tracing: Kill the pointless local_save_flags/preempt_count calls uprobes/tracing: Kill the pointless seq_print_ip_sym() call uprobes/tracing: Kill the pointless task_pt_regs() calls ...	2013-04-30 07:41:01 -07:00
Linus Torvalds	1f889ec62c	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main changes in this cycle are mostly related to preparatory work for the full-dynticks work: - Remove restrictions on no-CBs CPUs, make RCU_FAST_NO_HZ take advantage of numbered callbacks, do callback accelerations based on numbered callbacks. Posted to LKML at https://lkml.org/lkml/2013/3/18/960 - RCU documentation updates. Posted to LKML at https://lkml.org/lkml/2013/3/18/570 - Miscellaneous fixes. Posted to LKML at https://lkml.org/lkml/2013/3/18/594" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) rcu: Make rcu_accelerate_cbs() note need for future grace periods rcu: Abstract rcu_start_future_gp() from rcu_nocb_wait_gp() rcu: Rename n_nocb_gp_requests to need_future_gp rcu: Push lock release to rcu_start_gp()'s callers rcu: Repurpose no-CBs event tracing to future-GP events rcu: Rearrange locking in rcu_start_gp() rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks rcu: Accelerate RCU callbacks at grace-period end rcu: Export RCU_FAST_NO_HZ parameters to sysfs rcu: Distinguish "rcuo" kthreads by RCU flavor rcu: Add event tracing for no-CBs CPUs' grace periods rcu: Add event tracing for no-CBs CPUs' callback registration rcu: Introduce proper blocking to no-CBs kthreads GP waits rcu: Provide compile-time control for no-CBs CPUs rcu: Tone down debugging during boot-up and shutdown. rcu: Add softirq-stall indications to stall-warning messages rcu: Documentation update rcu: Make bugginess of code sample more evident rcu: Fix hlist_bl_set_first_rcu() annotation rcu: Delete unused rcu_node "wakemask" field ...	2013-04-30 07:39:01 -07:00
Mauro Carvalho Chehab	df90e22589	Merge branch 'devel-for-v3.10' into v4l_for_linus * patchwork: (831 commits) [media] cx88: make core less verbose [media] em28xx: fix oops at em28xx_dvb_bus_ctrl() [media] s5c73m3: fix indentation of the help section in Kconfig [media] cx25821-alsa: get rid of a __must_check warning [media] cx25821-video: declare cx25821_vidioc_s_std as static [media] cx25821-video: remove maxw from cx25821_vidioc_try_fmt_vid_cap [media] r820t: Remove a warning for an unused value [media] dib0090: Fix a warning at dib0090_set_EFUSE [media] dib8000: fix a warning [media] dib8000: Fix sub-channel range [media] dib8000: store dtv_property_cache in a temp var [media] dib8000: warning fix: declare internal functions as static [media] r820t: quiet gcc warning on n_ring [media] r820t: memory leak in release() [media] r820t: precendence bug in r820t_xtal_check() [media] videodev2.h: Remove the unused old V4L1 buffer types [media] anysee: Grammar s/report the/report to/ [media] anysee: Initialize ret = 0 in anysee_frontend_attach() [media] media: videobuf2: fix the length check for mmap [media] em28xx: save isoc endpoint number for DVB only if endpoint has alt settings with xMaxPacketSize != 0 ... Conflicts: drivers/media/pci/cx25821/cx25821-video.c drivers/media/platform/Kconfig	2013-04-30 09:01:04 -03:00
Matt Fleming	a614e1923d	Linux 3.9 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQEcBAABAgAGBQJRfcB+AAoJEHm+PkMAQRiGmLAH/0bIpdOYylJhRDmVOztXpANP jRYYH00UiSIBz8XO463dbbtevT2pB8pIw5TCxBWBi/V5rnJS9X5pvAHyNZBDUvYd 3BQCQ2cnQ+6stFpi4o6NciZzQShDGMmUxAOD6ejZM35/P2l+ZKrNqBwy3R4oeMuZ /WUYZTCfFF3G7qgkHoOwIjM6c34v0tpqLfx4R5CdTnKe0Ow0OGb5ko5+lefD6i9m 6cd2GFlWeIUvw0FSMLyB+HN6Tkf3JnwrklP+vuLNV+uOq5BLwggGc6A1eS51IuVJ e/ZkGTtirz+mZiG5lvqSXHaVEObPsbm32XfVVHp1SiE+TIugDb2uhtEQEv+a43w= =UOGY -----END PGP SIGNATURE----- Merge tag 'v3.9' into efi-for-tip2 Resolve conflicts for Ingo. Conflicts: drivers/firmware/Kconfig drivers/firmware/efivars.c Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2013-04-30 11:42:13 +01:00
Shimoda, Yoshihiro	18a1053f7b	sudmac: add support for SUDMAC Some Renesas USB modules have SUDMAC. This patch supports it using the shdma-base driver. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>	2013-04-30 15:50:12 +05:30
Jiri Kosina	72c16d9a5c	Merge branch 'for-3.10/mt-hybrid-finger-pen' into for-linus Conflicts: drivers/hid/hid-multitouch.c	2013-04-30 10:17:48 +02:00
Jiri Kosina	4f5a810429	Merge branches 'for-3.10/appleir', 'for-3.10/hid-debug', 'for-3.10/hid-driver-transport-cleanups', 'for-3.10/i2c-hid' and 'for-3.10/logitech' into for-linus	2013-04-30 10:12:44 +02:00
Jiri Kosina	2353f2bea3	HID: protect hid_debug_list Accesses to hid_device->hid_debug_list are not serialized properly, which could result in SMP concurrency issues when HID debugfs events are accessesed by multiple userspace processess. Serialize all the list operations by a mutex. Spotted by Al Viro. Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-04-30 10:09:31 +02:00
Benjamin Tissoires	a5f04b9df1	HID: debug: break out hid_dump_report() into hid-debug No semantic changes, but hid_dump_report should be in hid-debug.c, not in hid-core.c Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2013-04-30 10:09:06 +02:00
Linus Torvalds	56847d857c	Merge branch 'akpm' (incoming from Andrew) Merge second batch of fixes from Andrew Morton: - various misc bits - some printk updates - a new "SRAM" driver. - MAINTAINERS updates - the backlight driver queue - checkpatch updates - a few init/ changes - a huge number of drivers/rtc changes - fatfs updates - some lib/idr.c work - some renaming of the random driver interfaces * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (285 commits) net: rename random32 to prandom net/core: remove duplicate statements by do-while loop net/core: rename random32() to prandom_u32() net/netfilter: rename random32() to prandom_u32() net/sched: rename random32() to prandom_u32() net/sunrpc: rename random32() to prandom_u32() scsi: rename random32() to prandom_u32() lguest: rename random32() to prandom_u32() uwb: rename random32() to prandom_u32() video/uvesafb: rename random32() to prandom_u32() mmc: rename random32() to prandom_u32() drbd: rename random32() to prandom_u32() kernel/: rename random32() to prandom_u32() mm/: rename random32() to prandom_u32() lib/: rename random32() to prandom_u32() x86: rename random32() to prandom_u32() x86: pageattr-test: remove srandom32 call uuid: use prandom_bytes() raid6test: use prandom_bytes() sctp: convert sctp_assoc_set_id() to use idr_alloc_cyclic() ...	2013-04-29 19:47:50 -07:00
Linus Torvalds	191a712090	Merge branch 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - Fixes and a lot of cleanups. Locking cleanup is finally complete. cgroup_mutex is no longer exposed to individual controlelrs which used to cause nasty deadlock issues. Li fixed and cleaned up quite a bit including long standing ones like racy cgroup_path(). - device cgroup now supports proper hierarchy thanks to Aristeu. - perf_event cgroup now supports proper hierarchy. - A new mount option "__DEVEL__sane_behavior" is added. As indicated by the name, this option is to be used for development only at this point and generates a warning message when used. Unfortunately, cgroup interface currently has too many brekages and inconsistencies to implement a consistent and unified hierarchy on top. The new flag is used to collect the behavior changes which are necessary to implement consistent unified hierarchy. It's likely that this flag won't be used verbatim when it becomes ready but will be enabled implicitly along with unified hierarchy. The option currently disables some of broken behaviors in cgroup core and also .use_hierarchy switch in memcg (will be routed through -mm), which can be used to make very unusual hierarchy where nesting is partially honored. It will also be used to implement hierarchy support for blk-throttle which would be impossible otherwise without introducing a full separate set of control knobs. This is essentially versioning of interface which isn't very nice but at this point I can't see any other options which would allow keeping the interface the same while moving towards hierarchy behavior which is at least somewhat sane. The planned unified hierarchy is likely to require some level of adaptation from userland anyway, so I think it'd be best to take the chance and update the interface such that it's supportable in the long term. Maintaining the existing interface does complicate cgroup core but shouldn't put too much strain on individual controllers and I think it'd be manageable for the foreseeable future. Maybe we'll be able to drop it in a decade. Fix up conflicts (including a semantic one adding a new #include to ppc that was uncovered by header the file changes) as per Tejun. * 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (45 commits) cpuset: fix compile warning when CONFIG_SMP=n cpuset: fix cpu hotplug vs rebuild_sched_domains() race cpuset: use rebuild_sched_domains() in cpuset_hotplug_workfn() cgroup: restore the call to eventfd->poll() cgroup: fix use-after-free when umounting cgroupfs cgroup: fix broken file xattrs devcg: remove parent_cgroup. memcg: force use_hierarchy if sane_behavior cgroup: remove cgrp->top_cgroup cgroup: introduce sane_behavior mount option move cgroupfs_root to include/linux/cgroup.h cgroup: convert cgroupfs_root flag bits to masks and add CGRP_ prefix cgroup: make cgroup_path() not print double slashes Revert "cgroup: remove bind() method from cgroup_subsys." perf: make perf_event cgroup hierarchical cgroup: implement cgroup_is_descendant() cgroup: make sure parent won't be destroyed before its children cgroup: remove bind() method from cgroup_subsys. devcg: remove broken_hierarchy tag cgroup: remove cgroup_lock_is_held() ...	2013-04-29 19:14:20 -07:00
Linus Torvalds	46d9be3e5e	Merge branch 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: "A lot of activities on workqueue side this time. The changes achieve the followings. - WQ_UNBOUND workqueues - the workqueues which are per-cpu - are updated to be able to interface with multiple backend worker pools. This involved a lot of churning but the end result seems actually neater as unbound workqueues are now a lot closer to per-cpu ones. - The ability to interface with multiple backend worker pools are used to implement unbound workqueues with custom attributes. Currently the supported attributes are the nice level and CPU affinity. It may be expanded to include cgroup association in future. The attributes can be specified either by calling apply_workqueue_attrs() or through /sys/bus/workqueue/WQ_NAME/* if the workqueue in question is exported through sysfs. The backend worker pools are keyed by the actual attributes and shared by any workqueues which share the same attributes. When attributes of a workqueue are changed, the workqueue binds to the worker pool with the specified attributes while leaving the work items which are already executing in its previous worker pools alone. This allows converting custom worker pool implementations which want worker attribute tuning to use workqueues. The writeback pool is already converted in block tree and there are a couple others are likely to follow including btrfs io workers. - WQ_UNBOUND's ability to bind to multiple worker pools is also used to make it NUMA-aware. Because there's no association between work item issuer and the specific worker assigned to execute it, before this change, using unbound workqueue led to unnecessary cross-node bouncing and it couldn't be helped by autonuma as it requires tasks to have implicit node affinity and workers are assigned randomly. After these changes, an unbound workqueue now binds to multiple NUMA-affine worker pools so that queued work items are executed in the same node. This is turned on by default but can be disabled system-wide or for individual workqueues. Crypto was requesting NUMA affinity as encrypting data across different nodes can contribute noticeable overhead and doing it per-cpu was too limiting for certain cases and IO throughput could be bottlenecked by one CPU being fully occupied while others have idle cycles. While the new features required a lot of changes including restructuring locking, it didn't complicate the execution paths much. The unbound workqueue handling is now closer to per-cpu ones and the new features are implemented by simply associating a workqueue with different sets of backend worker pools without changing queue, execution or flush paths. As such, even though the amount of change is very high, I feel relatively safe in that it isn't likely to cause subtle issues with basic correctness of work item execution and handling. If something is wrong, it's likely to show up as being associated with worker pools with the wrong attributes or OOPS while workqueue attributes are being changed or during CPU hotplug. While this creates more backend worker pools, it doesn't add too many more workers unless, of course, there are many workqueues with unique combinations of attributes. Assuming everything else is the same, NUMA awareness costs an extra worker pool per NUMA node with online CPUs. There are also a couple things which are being routed outside the workqueue tree. - block tree pulled in workqueue for-3.10 so that writeback worker pool can be converted to unbound workqueue with sysfs control exposed. This simplifies the code, makes writeback workers NUMA-aware and allows tuning nice level and CPU affinity via sysfs. - The conversion to workqueue means that there's no 1:1 association between a specific worker, which makes writeback folks unhappy as they want to be able to tell which filesystem caused a problem from backtrace on systems with many filesystems mounted. This is resolved by allowing work items to set debug info string which is printed when the task is dumped. As this change involves unifying implementations of dump_stack() and friends in arch codes, it's being routed through Andrew's -mm tree." * 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (84 commits) workqueue: use kmem_cache_free() instead of kfree() workqueue: avoid false negative WARN_ON() in destroy_workqueue() workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity workqueue: implement NUMA affinity for unbound workqueues workqueue: introduce put_pwq_unlocked() workqueue: introduce numa_pwq_tbl_install() workqueue: use NUMA-aware allocation for pool_workqueues workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq() workqueue: map an unbound workqueues to multiple per-node pool_workqueues workqueue: move hot fields of workqueue_struct to the end workqueue: make workqueue->name[] fixed len workqueue: add workqueue->unbound_attrs workqueue: determine NUMA node of workers accourding to the allowed cpumask workqueue: drop 'H' from kworker names of unbound worker pools workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[] workqueue: move pwq_pool_locking outside of get/put_unbound_pool() workqueue: fix memory leak in apply_workqueue_attrs() workqueue: fix unbound workqueue attrs hashing / comparison workqueue: fix race condition in unbound workqueue free path workqueue: remove pwq_lock which is no longer used ...	2013-04-29 19:07:40 -07:00
Linus Torvalds	ce8aa48929	Merge branch 'for-3.10-async' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull async update from Tejun Heo: "This contains three cleanup patches for async from Lai. All three patches are essentially cosmetic." * 'for-3.10-async' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: async: rename and redefine async_func_ptr async: remove unused @node from struct async_domain async: simplify lowest_in_progress()	2013-04-29 19:06:59 -07:00
Akinobu Mita	8d564368a9	net: rename random32 to prandom Commit `496f2f93b1` ("random32: rename random32 to prandom") renamed random32() and srandom32() to prandom_u32() and prandom_seed() respectively. net_random() and net_srandom() need to be redefined with prandom_* in order to finish the naming transition. While I'm at it, enclose macro argument of net_srandom() with parenthesis. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:44 -07:00
Jeff Layton	a66c04b453	inotify: convert inotify_add_to_idr() to use idr_alloc_cyclic() Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rlove@rlove.org> Cc: Eric Paris <eparis@parisplace.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:41 -07:00
Jeff Layton	3e6628c4b3	idr: introduce idr_alloc_cyclic() As Tejun points out, there are several users of the IDR facility that attempt to use it in a cyclic fashion. These users are likely to see -ENOSPC errors after the counter wraps one or more times however. This patchset adds a new idr_alloc_cyclic routine and converts several of these users to it. Many of these users are in obscure parts of the kernel, and I don't have a good way to test some of them. The change is pretty straightforward though, so hopefully it won't be an issue. There is one other cyclic user of idr_alloc that I didn't touch in ipc/util.c. That one is doing some strange stuff that I didn't quite understand, but it looks like it should probably be converted later somehow. This patch: Thus spake Tejun Heo: Ooh, BTW, the cyclic allocation is broken. It's prone to -ENOSPC after the first wraparound. There are several cyclic users in the kernel and I think it probably would be best to implement cyclic support in idr. This patch does that by adding new idr_alloc_cyclic function that such users in the kernel can use. With this, there's no need for a caller to keep track of the last value used as that's now tracked internally. This should prevent the ENOSPC problems that can hit when the "last allocated" counter exceeds INT_MAX. Later patches will convert existing cyclic users to the new interface. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Eric Paris <eparis@parisplace.org> Cc: Jack Morgenstein <jackm@dev.mellanox.co.il> Cc: John McCutchan <john@johnmccutchan.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Robert Love <rlove@rlove.org> Cc: Roland Dreier <roland@purestorage.com> Cc: Sridhar Samudrala <sri@us.ibm.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Tom Tucker <tom@opengridcomputing.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:41 -07:00
Namjae Jeon	ea3983ace6	fat: restructure export_operations Define two nfs export_operation structures,one for 'stale_rw' mounts and the other for 'nostale_ro'. The latter uses i_pos as a basis for encoding and decoding file handles. Also, assign i_pos to kstat->ino. The logic for rebuilding the inode is added in the subsequent patches. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ravishankar N <ravi.n1@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:40 -07:00
Jingoo Han	6636a9944b	drivers/rtc/class.c: use struct device as the first argument for devm_rtc_device_register() Other devm_* APIs use 'struct device dev' as the first argument. Thus, in order to sync with other devm_ functions, struct device is used as the first argument for devm_rtc_device_register(). Signed-off-by: Jingoo Han <jg1.han@samsung.com> Cc: Tejun Heo <tj@kernel.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:22 -07:00
Jingoo Han	3e217b6602	rtc: add devm_rtc_device_{register,unregister}() These functions allow the driver core to automatically clean up any allocation made by rtc drivers. Thus it simplifies the error paths. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:21 -07:00
Andy Shevchenko	2e0fb404c8	lib, net: make isodigit() public and use it There are at least two users of isodigit(). Let's make it a public function of ctype.h. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:19 -07:00
Matus Ujhelyi	4d22f8c306	drivers/video/backlight/tps65217_bl.c add default brightness value option Signed-off-by: Matus Ujhelyi <matus.ujhelyi@streamunlimited.com> Cc: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:19 -07:00
Kim, Milo	c365e59d47	backlight: lp855x: remove duplicate platform data The 'load_new_rom_data' was used for checking whether new ROM data should be updated or not. However, we can decide it with 'size_program' data. If the size is greater than 0, it means updating ROM area is required. Otherwise, the default ROM data will be used. Therefore, this duplicate platform data can be removed. Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:19 -07:00
Kim, Milo	98e35be2ba	backlight: lp855x: fix initial brightness type Valid range of the brightness is from 0 to 255, so initial brightness is changed from integer to u8. Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:18 -07:00
Kim, Milo	0b81857339	backlight: lp855x: move backlight mode platform data The brightness of LP855x devices is controlled by I2C register or PWM input . This mode was selected through the platform data, but it can be chosen by the driver internally without platform data configuration. How to decide the control mode: If the PWM period has specific value, the mode is PWM input. On the other hand, the mode is register-based. This mode selection is done on the _probe(). Move 'mode' from a header file to the driver private data structure, 'lp855 x'. And correlated code was replaced. Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:18 -07:00
Kim, Milo	600ffd33d0	backlight: lp855x: convert a type of device name Configurable data, backlight device name is set to constant character type. Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:18 -07:00
Andrew Bresticker	46e1915eef	drivers/video/backlight/platform_lcd.c: introduce probe callback Platform LCD devices may need to do some device-specific initialization before they can be used (regulator or GPIO setup, for example), but currently the driver does not support any way of doing this. This patch adds a probe() callback to plat_lcd_data which platform LCD devices can set to indicate that device-specific initialization is needed. Signed-off-by: Andrew Bresticker <abrestic@chromium.org> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Reviewed-by: Doug Anderson <dianders@chromium.org> Acked-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:18 -07:00
Andrew Morton	1b2c289b4f	include/linux/printk.h: include stdarg.h printk.h uses va_list but doesn't include stdarg.h. Hence printk.h is unusable unless its includer has already included kernel.h (which includes stdarg.h). Remove the dependency by including stdarg.h in printk.h Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:13 -07:00
Thomas Gleixner	d0380e6c3c	early_printk: consolidate random copies of identical code The early console implementations are the same all over the place. Move the print function to kernel/printk and get rid of the copies. [akpm@linux-foundation.org: arch/mips/kernel/early_printk.c needs kernel.h for va_list] [paul.gortmaker@windriver.com: sh4: make the bios early console support depend on EARLY_PRINTK] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Russell King <linux@arm.linux.org.uk> Acked-by: Mike Frysinger <vapier@gentoo.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Richard Weinberger <richard@nod.at> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:13 -07:00
zhangwei(Jovi)	07c65f4d1a	printk/tracing: rework console tracing Commit `7ff9554bb5` ("printk: convert byte-buffer to variable-length record buffer") removed start and end parameters from call_console_drivers, but those parameters still exist in include/trace/events/printk.h. Without start and end parameters handling, printk tracing became more simple as: trace_console(text, len); Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:13 -07:00
Guenter Roeck	2fb0815c9e	gcc4: disable __compiletime_object_size for GCC 4.6+ __builtin_object_size is known to be broken on gcc 4.6+. See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48880 for details. This causes unnecssary build warnings and errors such as In function 'copy_from_user', inlined from 'sb16_copy_from_user' at sound/oss/sb_audio.c:878:22: arch/x86/include/asm/uaccess_32.h:211:26: error: call to 'copy_from_user_overflow' declared with attribute error: copy_from_user() buffer size is not provably correct make[3]: [sound/oss/sb_audio.o] Error 1 (ignored) Disable it where broken. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:13 -07:00
Philipp Zabel	657eee7d25	media: coda: use genalloc API This patch depends on "genalloc: add devres support, allow to find a managed pool by device", which provides the of_get_named_gen_pool and dev_get_gen_pool functions. Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Acked-by: Javier Martin <javier.martin@vista-silicon.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Cc: Michal Simek <monstr@monstr.eu> Cc: Dong Aisheng <dong.aisheng@linaro.org> Cc: Fabio Estevam <fabio.estevam@freescale.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Huang Shijie <shijie8@gmail.com> Cc: Matt Porter <mporter@ti.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Rob Herring <rob.herring@calxeda.com> Cc: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:13 -07:00
Philipp Zabel	9375db07ad	genalloc: add devres support, allow to find a managed pool by device This patch adds three exported functions to lib/genalloc.c: devm_gen_pool_create, dev_get_gen_pool, and of_get_named_gen_pool. devm_gen_pool_create is a managed version of gen_pool_create that keeps track of the pool via devres and allows the management code to automatically destroy it after device removal. dev_get_gen_pool retrieves the gen_pool for a given device, if it was created with devm_gen_pool_create, using devres_find. of_get_named_gen_pool retrieves the gen_pool for a given device node and property name, where the property must contain a phandle pointing to a platform device node. The corresponding platform device is then fed into dev_get_gen_pool and the resulting gen_pool is returned. [akpm@linux-foundation.org: make the of_get_named_gen_pool() stub static, fixing a zillion link errors] [akpm@linux-foundation.org: squish "struct device declared inside parameter list" warning] Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Acked-by: Grant Likely <grant.likely@secretlab.ca> Tested-by: Michal Simek <monstr@monstr.eu> Cc: Fabio Estevam <fabio.estevam@freescale.com> Cc: Matt Porter <mporter@ti.com> Cc: Dong Aisheng <dong.aisheng@linaro.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Rob Herring <rob.herring@calxeda.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Javier Martin <javier.martin@vista-silicon.com> Cc: Huang Shijie <shijie8@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 18:28:13 -07:00
Linus Torvalds	73154383f0	Merge branch 'akpm' (incoming from Andrew) Merge first batch of fixes from Andrew Morton: - A couple of kthread changes - A few minor audit patches - A number of fbdev patches. Florian remains AWOL so I'm picking up some of these. - A few kbuild things - ocfs2 updates - Almost all of the MM queue (And in the meantime, I already have the second big batch from Andrew pending in my mailbox ;^) * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (149 commits) memcg: take reference before releasing rcu_read_lock mem hotunplug: fix kfree() of bootmem memory mmKconfig: add an option to disable bounce mm, nobootmem: do memset() after memblock_reserve() mm, nobootmem: clean-up of free_low_memory_core_early() fs/buffer.c: remove unnecessary init operation after allocating buffer_head. numa, cpu hotplug: change links of CPU and node when changing node number by onlining CPU mm: fix memory_hotplug.c printk format warning mm: swap: mark swap pages writeback before queueing for direct IO swap: redirty page if page write fails on swap file mm, memcg: give exiting processes access to memory reserves thp: fix huge zero page logic for page with pfn == 0 memcg: avoid accessing memcg after releasing reference fs: fix fsync() error reporting memblock: fix missing comment of memblock_insert_region() mm: Remove unused parameter of pages_correctly_reserved() firmware, memmap: fix firmware_map_entry leak mm/vmstat: add note on safety of drain_zonestat mm: thp: add split tail pages to shrink page list in page reclaim mm: allow for outstanding swap writeback accounting ...	2013-04-29 17:29:08 -07:00
Ville Syrjälä	c55b6b3da2	drm: Kill user_modes list and the associated ioctls There is no way to use modes added to the user_modes list. We never look at the contents of said list in the kernel, and the only operations userspace can do are attach and detach. So the only "benefit" of this interface is wasting kernel memory. Fortunately it seems no real user space application ever used these ioctls. So just kill them. Also remove the prototypes for the non-existing drm_mode_addmode_ioctl() and drm_mode_rmmode_ioctl() functions. v2: Use drm_noop instead of completely removing the ioctls Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2013-04-30 10:03:07 +10:00
Ville Syrjälä	ea9cbb063c	drm: Silence some sparse warnings drivers/gpu/drm/drm_pci.c:155:5: warning: symbol 'drm_pci_set_busid' was not declared. Should it be static? drivers/gpu/drm/drm_pci.c:197:5: warning: symbol 'drm_pci_set_unique' was not declared. Should it be static? drivers/gpu/drm/drm_pci.c:269:5: warning: symbol 'drm_pci_agp_init' was not declared. Should it be static? drivers/gpu/drm/drm_crtc.c:181:1: warning: symbol 'drm_get_dirty_info_name' was not declared. Should it be static? drivers/gpu/drm/drm_crtc.c:1123:5: warning: symbol 'drm_mode_group_init' was not declared. Should it be static? drivers/gpu/drm/drm_modes.c:918:6: warning: symbol 'drm_mode_validate_clocks' was not declared. Should it be static? Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2013-04-30 10:02:25 +10:00
Linus Torvalds	362ed48dee	The common clock framework changes for 3.10 include many fixes for existing platforms, as well as adoption of the framework by new platforms and devices. Some long-needed fixes to the core framework are here as well as new features such as improved initialization of clocks from DT as well as framework reentrancy for nested clock operations. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJRfqtLAAoJEDqPOy9afJhJsxwP/RLvfeeMIU3804ahVNK2C59h ehJ06ZP+b0u0A7+YSC7CX1pHXIFW+UoZgYLJiLdV2kEdpOIKMELZyUcEVB97u1Of TVlsmHfTLv2zVAq/LYRVSKFYeMUd/6RRoq7Cm6hoj638IVeXG7C+8pei2aVZe++t 1ENmb4UGFJ7NLfpE5zQ3fEuIfHfuWA8Od6SmPaV/YG5Io8HgkDGF3/tCJURJGII6 xLN2Rh8qbFktJLVvKe6yLyvUEZiWh8A6HNPyNiFYYGX11wU76zK2wMN3BW6Nn/kW 3PubzISoKRaoCZvuVK+CoLWnhFl2LteFVVmL1TBc/jxJe6q+rLX33sXl1q9K+SLt POnHf/7nDyO3zbZWgfRR1r3FdeZqdLYw8HVsLcOKFcv9n1UligzuUNml5PklKwNh BDMmSo5ytS1QPV1e9ZtVrk6IyvDyrenwfDW1Mw43ST6D23FVrivywB4X9ur6WljI d1/CBvQXQZ11Hd4OAvqRL8QYFJvc5WlERjSd1j6I6XS6xioKOTKMkUC/KpRcCid9 avA6mJ5k/a1jTojvh2wl37paI//OzY0VDlxRSeMZIu9Dsn29DnPlE5CLg535Ovu+ mn9OtLFEDNnlgWCMQYUehGd7ITgtwrB/fxxNeBbMYjDz4AIirR2BIvMR7I8CMTQz M0rHu8NpwKH6eqC6kAup =+LO3 -----END PGP SIGNATURE----- Merge tag 'clk-for-linus-3.10' of git://git.linaro.org/people/mturquette/linux Pull clock framework update from Michael Turquette: "The common clock framework changes for 3.10 include many fixes for existing platforms, as well as adoption of the framework by new platforms and devices. Some long-needed fixes to the core framework are here as well as new features such as improved initialization of clocks from DT as well as framework reentrancy for nested clock operations." * tag 'clk-for-linus-3.10' of git://git.linaro.org/people/mturquette/linux: (44 commits) clk: add clk_ignore_unused option to keep boot clocks on clk: ux500: fix mismatched types clk: vexpress: Add separate SP810 driver clk: si5351: make clk-si5351 depend on CONFIG_OF clk: export __clk_get_flags for modular clock providers clk: vt8500: Missing breaks in vtwm_pll_round_rate/_set_rate. clk: sunxi: Unify oscillator clock clk: composite: allow fixed rates & fixed dividers clk: composite: rename 'div' references to 'rate' clk: add si5351 i2c common clock driver clk: add device tree fixed-factor-clock binding support clk: Properly handle notifier return values clk: ux500: abx500: Define clock tree for ab850x clk: ux500: Add support for sysctrl clocks clk: mvebu: Fix valid value range checking for cpu_freq_select clk: Fixup locking issues for clk_set_parent clk: Fixup errorhandling for clk_set_parent clk: Restructure code for __clk_reparent clk: sunxi: drop an unnecesary kmalloc clk: sunxi: drop CLK_IGNORE_UNUSED ...	2013-04-29 16:43:54 -07:00
Linus Torvalds	61f3d0a988	spi: Updates for v3.10 A fairly quiet release for SPI, mainly driver work. A few highlights: - Supports bits per word compatibility checking in the core. - Allow use of the IP used in Freescale SPI controllers outside Freescale SoCs. - DMA support for the Atmel SPI driver. - New drivers for the BCM2835 and Tegra114. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJRfoxOAAoJELSic+t+oim9P2IP/0bKjrSdJ3aypqi5k8hF7Sw0 ksWYyYQ7yVIQlr+2zCIn3YO69/Z8OzJf2skGW7NW9TZ/mSXp0NXB/E4v5+fB+d4h +Dj/eFQG/T39RLSvuHsuJP0VAFTzigFM2DGZ4yQDUIyxZQiG4U3R50rOmj91GeDK s00By0nVAQVnnHcQJ4KDr82Z30NoPW32caz1GzB3xCkXO3HnDSNXnOHa93fxrVGx iyN52gkmLyyD9MwxzMHvxIg/HY3/US5i7RkgUuWRhVaG+gwEOrfrC9PmniFyJUf/ qbqnoP2xQB50eo4DeCMZDknxgWb7n8S/FbmXYxUcVZVqYbkNuHEAP0SqroMlgc55 cVu0zQ84qwwU3jmngg7CkVvqxw2L3znYjEr0StfxmpJwr93Tn0yaWLjzTuY57zaz BWuHG0SK1+wghCwdzqQBpRY7yRg9lE+1S81YQoLRYTqYz6fT6TwhLpdTUNpP2zIu Ue1rM3JEgYr5TsOF/vZV8MuNXvodhCvzsv95Mm5G2R3uSCN/0LApVi6A96AAk6ms WpFvqSZ2+ugEVE+ZUgmOqXjUuOTKxooTwfIZEogXKabBtHmGCGLXG7wwG5X4thBy UJgfvm0LE+zmAGVGmZycnyfDu+JSs1ofnkUGJb28edyP4HOlbm+6gHvxGMf2iUpw nqrbZ2lvUdiu69SGeV53 =+Omc -----END PGP SIGNATURE----- Merge tag 'spi-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi updates from Mark Brown: "A fairly quiet release for SPI, mainly driver work. A few highlights: - Supports bits per word compatibility checking in the core. - Allow use of the IP used in Freescale SPI controllers outside Freescale SoCs. - DMA support for the Atmel SPI driver. - New drivers for the BCM2835 and Tegra114" * tag 'spi-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (68 commits) spi-topcliff-pch: fix to use list_for_each_entry_safe() when delete list items spi-topcliff-pch: missing platform_driver_unregister() on error in pch_spi_init() ARM: dts: add pinctrl property for spi node for atmel SoC ARM: dts: add spi nodes for the atmel boards ARM: dts: add spi nodes for atmel SoC ARM: at91: add clocks for spi dt entries spi/spi-atmel: add dmaengine support spi/spi-atmel: add flag to controller data for lock operations spi/spi-atmel: add physical base address spi/sirf: fix MODULE_DEVICE_TABLE MAINTAINERS: Add git repository and update my address spi/s3c64xx: Check for errors in dmaengine prepare_transfer() spi/s3c64xx: Fix non-dmaengine usage spi: omap2-mcspi: fix error return code in omap2_mcspi_probe() spi/s3c64xx: let device core setup the default pin configuration MAINTAINERS: Update Grant's email address and maintainership spi: omap2-mcspi: Fix transfers if DMADEVICES is not set spi: s3c64xx: move to generic dmaengine API spi-gpio: init CS before spi_bitbang_setup() spi: spi-mpc512x-psc: let transmiter/receiver enabled when in xfer loop ...	2013-04-29 16:38:41 -07:00
Linus Torvalds	8ded8d4e4f	regulator: Updates for v3.10 The diffstat and changelog here is dominated by Lee Jones' heroic efforts to sync the ab8500 driver that's been maintained out of tree with mainline (plus Axel's cleanup work on the results) but there's a few other things here: - Axel Lin added regulator_map_voltage_ascend() optimising a common pattern for drivers using the core code. - Milo Kim tought the regulator core to handle regulators sharing an enable GPIO, avoiding the need to do hacks to support such systems. - Andrew Bresticker added code to handle missing supplies for regulators more sensibly for device tree systems, reducing the need for stubbing there. plus the usual batch of driver specific updates and fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJRfohQAAoJELSic+t+oim9MvgP/2PxUNYQ8XIEXwCFP4GGsS8x NxFRAEpRJPBqa6qnTDywnN5VyBykTqKDfcA0EILW7Hz+EzWKpqucsloli0e7B4VE PMXf5s5YPwRLDslAv1VYMzKbGzzB2jDWlVjhXtiBRNXSUv2zwR1MWnoguQzSXq8J pE4uh6u2/5FUO/upcQ9LxmmBGr2CFZ/egKK3HvAWpidWOO9ykzIA8VzAD5dKAwzV Bo63ia51ymxn1HyokhPtIpko4+J6KYO3Lts8vi+g1DT1aA1nAHTcN4ewUl1v5NkD xFBpt06m95AQ7y9oQ1gdcGXDefnfdrzPtFZkofVVJpYNMtcbxOoO+WJk2ZUBjhrZ cpVmvqELfRp/eMr1xe1XJIuLelyE+bOCx36F5FQgGCQNI+gNWT2SlRCWeH4VLhh+ Zeuqnhlce5Chv0wsjrNk4biwj981V3uKNo/n/O9mDQAXLYC2AVGJbXL04EcoxXag ButmfjWshYUzEXmxpXD9+pas4EMsuziWqCQjtuVRtTf9XSWkps39mitPRu3h2aWg IwWlk3/eMI3WPr7eE7vcu5PvOnQ9Nm6fasx3NhxjiYBVwktyprV3tMhKDBjt8qdG frzOfimOUGumeKinFm7tfP5EQE4prfwpN/kT+PPleNeXARe3AWKLsatO1mEtey9b t1PC8z5k8/9rBDIWvzq3 =rX43 -----END PGP SIGNATURE----- Merge tag 'regulator-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator updates from Mark Brown: "The diffstat and changelog here is dominated by Lee Jones' heroic efforts to sync the ab8500 driver that's been maintained out of tree with mainline (plus Axel's cleanup work on the results) but there's a few other things here: - Axel Lin added regulator_map_voltage_ascend() optimising a common pattern for drivers using the core code. - Milo Kim tought the regulator core to handle regulators sharing an enable GPIO, avoiding the need to do hacks to support such systems. - Andrew Bresticker added code to handle missing supplies for regulators more sensibly for device tree systems, reducing the need for stubbing there. plus the usual batch of driver specific updates and fixes" * tag 'regulator-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (152 commits) regulator: mc13892: Fix MC13892_SWITCHERS0_SWxHI bit in set_voltage_sel regulator: Remove NULL test before calling regulator_unregister() regulator: mc13783: Add device tree probe support regulator: mc13xxx: Add warning of incorrect names of regulators regulator: max77686: Don't update max77686->opmode if update register fails regulator: max8952: Add missing config.of_node setting for regulator register regulator: ab3100: Fix regulator register error handling regulator: tps6524x: Use regulator_map_voltage_ascend regulator: lp8788-buck: Use regulator_map_voltage_ascend regulator: lp872x: Use regulator_map_voltage_ascend regulator: mc13892: Use regulator_map_voltage_ascend for mc13892_sw_regulator_ops regulator: tps65023: Use regulator_map_voltage_ascend regulator: tps65023: Merge tps65020 ldo1 and ldo2 vsel table regulator: tps6507x: Use regulator_map_voltage_ascend regulator: mc13892: Fix MC13892_SWITCHERS0_SWxHI bit in set_voltage_sel regulator: ab3100: device tree support regulator: ab3100: refactor probe to use IDs regulator: max8973: Don't override control1 variable when set ramp delay bits regulator: tps80031: Convert tps80031_dcdc_ops to [get\|set]_voltage_sel_regmap regulator: tps80031: Fix LDO2 track mode for TPS80031 or TPS80032-ES1.0 ...	2013-04-29 16:32:25 -07:00
Linus Torvalds	7b053842b9	regmap: Updates for v3.10 In user visible terms just a couple of enhancements here, though there was a moderate amount of refactoring required in order to support the register cache sync performance improvements. - Support for block and asynchronous I/O during register cache syncing; this provides a use case dependant performance improvement. - Additional debugfs information on the memory consuption and register set. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIbBAABAgAGBQJRfoZRAAoJELSic+t+oim9gXAP+JhAihmIQJlhUxZkXojFhClD SKNWuFHmFC6VGndv52HPZR7nLN6hIlT4VUqk/rEw58R/RTqGuuWGc0KnKJf7ipid 6CdutuOP6q8mgs02kGKFAWRbSl++IXJ4TwvBbiyDMBmmngFoJY+gnmtnpP+PzcAd LA3fn54jDWzBKCSlFBEC5acYxOMPmzm2uW13mO8Gy1RJrUkXfOemEFsyP0NVNJys N0Zslp4nUUWmEu41UujuAUGZ7xXnnNQF5R4/RdS3+p22+sCEe7/mhLU1AxalUT4c m9h9U2UKoXqRBuFQ9kRGwM2Gufjg33DoB0ExqIDEgaD2kRdAdAo/WhTHLxTiQEfq 6YXGZYwl0QUC1KcUwUWJZIq/nECibaYDAoyooNzLQNPAbbO6gdjsTIVCaZK8U/k6 D8bWAM4eRbv6xwXEd8rKW5+2f41dnsb5O3OgbdEEBZnbQ8UizI9KDGbPB3ARV2RI Xqn+lYZV/q/99Bb3Pn0oS6Ud/tz5BqN4w3N84H0KcvcRHXvYjkdQ6ulsterRykOa gYWfsCKTbm2C1zBLGDPXkDablodLZmzoCs4ajeIt6zIELNzuIsI3trprpT85RtrS cjYl61ECuypPYBIW4uzxxBk/FeiEjQ4ndgQ4MgVnUfx0NpmG2N9LlDc2r6i+UgV/ EBxvYlPsEzQYLKoiJl8= =RG1W -----END PGP SIGNATURE----- Merge tag 'regmap-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap updates from Mark Brown: "In user visible terms just a couple of enhancements here, though there was a moderate amount of refactoring required in order to support the register cache sync performance improvements. - Support for block and asynchronous I/O during register cache syncing; this provides a use case dependant performance improvement. - Additional debugfs information on the memory consuption and register set" * tag 'regmap-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: (23 commits) regmap: don't corrupt work buffer in _regmap_raw_write() regmap: cache: Fix format specifier in dev_dbg regmap: cache: Make regcache_sync_block_raw static regmap: cache: Write consecutive registers in a single block write regmap: cache: Split raw and non-raw syncs regmap: cache: Factor out block sync regmap: cache: Factor out reg_present support from rbtree cache regmap: cache: Use raw I/O to sync rbtrees if we can regmap: core: Provide regmap_can_raw_write() operation regmap: cache: Provide a get address of value operation regmap: Cut down on the average # of nodes in the rbtree cache regmap: core: Make raw write available to regcache regmap: core: Warn on invalid operation combinations regmap: irq: Clarify error message when we fail to request primary IRQ regmap: rbtree Expose total memory consumption in the rbtree debugfs entry regmap: debugfs: Add a registers `range' file regmap: debugfs: Simplify calculation of `c->max_reg' regmap: cache: Store caches in native register format where possible regmap: core: Split out in place value parsing regmap: cache: Use regcache_get_value() to check if we updated ...	2013-04-29 16:31:26 -07:00
Joonsoo Kim	b4def3509d	mm, nobootmem: clean-up of free_low_memory_core_early() Remove unused argument and make function static, because there is no user outside of nobootmem.c Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Yinghai Lu <yinghai@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Jiang Liu <liuj97@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:39 -07:00
Shaohua Li	5bc7b8aca9	mm: thp: add split tail pages to shrink page list in page reclaim In page reclaim, huge page is split. split_huge_page() adds tail pages to LRU list. Since we are reclaiming a huge page, it's better we reclaim all subpages of the huge page instead of just the head page. This patch adds split tail pages to shrink page list so the tail pages can be reclaimed soon. Before this patch, run a swap workload: thp_fault_alloc 3492 thp_fault_fallback 608 thp_collapse_alloc 6 thp_collapse_alloc_failed 0 thp_split 916 With this patch: thp_fault_alloc 4085 thp_fault_fallback 16 thp_collapse_alloc 90 thp_collapse_alloc_failed 0 thp_split 1272 fallback allocation is reduced a lot. [akpm@linux-foundation.org: fix CONFIG_SWAP=n build] Signed-off-by: Shaohua Li <shli@fusionio.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:38 -07:00
Seth Jennings	1eec6702a8	mm: allow for outstanding swap writeback accounting To prevent flooding the swap device with writebacks, frontswap backends need to count and limit the number of outstanding writebacks. The incrementing of the counter can be done before the call to __swap_writepage(). However, the caller must receive a notification when the writeback completes in order to decrement the counter. To achieve this functionality, this patch modifies __swap_writepage() to take the bio completion callback function as an argument. end_swap_bio_write(), the normal bio completion function, is also made non-static so that code doing the accounting can call it after the accounting is done. There should be no behavioural change to existing code. Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Acked-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:38 -07:00
Seth Jennings	2f772e6cad	mm: break up swap_writepage() for frontswap backends swap_writepage() is currently where frontswap hooks into the swap write path to capture pages with the frontswap_store() function. However, if a frontswap backend wants to "resume" the writeback of a page to the swap device, it can't call swap_writepage() as the page will simply reenter the backend. This patch separates swap_writepage() into a top and bottom half, the bottom half named __swap_writepage() to allow a frontswap backend, like zswap, to resume writeback beyond the frontswap_store() hook. __add_to_swap_cache() is also made non-static so that the page for which writeback is to be resumed can be added to the swap cache. Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Acked-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:38 -07:00
Anton Vorontsov	70ddf637ee	memcg: add memory.pressure_level events With this patch userland applications that want to maintain the interactivity/memory allocation cost can use the pressure level notifications. The levels are defined like this: The "low" level means that the system is reclaiming memory for new allocations. Monitoring this reclaiming activity might be useful for maintaining cache level. Upon notification, the program (typically "Activity Manager") might analyze vmstat and act in advance (i.e. prematurely shutdown unimportant services). The "medium" level means that the system is experiencing medium memory pressure, the system might be making swap, paging out active file caches, etc. Upon this event applications may decide to further analyze vmstat/zoneinfo/memcg or internal memory usage statistics and free any resources that can be easily reconstructed or re-read from a disk. The "critical" level means that the system is actively thrashing, it is about to out of memory (OOM) or even the in-kernel OOM killer is on its way to trigger. Applications should do whatever they can to help the system. It might be too late to consult with vmstat or any other statistics, so it's advisable to take an immediate action. The events are propagated upward until the event is handled, i.e. the events are not pass-through. Here is what this means: for example you have three cgroups: A->B->C. Now you set up an event listener on cgroups A, B and C, and suppose group C experiences some pressure. In this situation, only group C will receive the notification, i.e. groups A and B will not receive it. This is done to avoid excessive "broadcasting" of messages, which disturbs the system and which is especially bad if we are low on memory or thrashing. So, organize the cgroups wisely, or propagate the events manually (or, ask us to implement the pass-through events, explaining why would you need them.) Performance wise, the memory pressure notifications feature itself is lightweight and does not require much of bookkeeping, in contrast to the rest of memcg features. Unfortunately, as of current memcg implementation, pages accounting is an inseparable part and cannot be turned off. The good news is that there are some efforts[1] to improve the situation; plus, implementing the same, fully API-compatible[2] interface for CONFIG_MEMCG=n case (e.g. embedded) is also a viable option, so it will not require any changes on the userland side. [1] http://permalink.gmane.org/gmane.linux.kernel.cgroups/6291 [2] http://lkml.org/lkml/2013/2/21/454 [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix CONFIG_CGROPUPS=n warnings] Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Acked-by: Kirill A. Shutemov <kirill@shutemov.name> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Glauber Costa <glommer@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Leonid Moiseichuk <leonid.moiseichuk@nokia.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:38 -07:00
David Rientjes	4edd7ceff0	mm, hotplug: avoid compiling memory hotremove functions when disabled __remove_pages() is only necessary for CONFIG_MEMORY_HOTREMOVE. PowerPC pseries will return -EOPNOTSUPP if unsupported. Adding an #ifdef causes several other functions it depends on to also become unnecessary, which saves in .text when disabled (it's disabled in most defconfigs besides powerpc, including x86). remove_memory_block() becomes static since it is not referenced outside of drivers/base/memory.c. Build tested on x86 and powerpc with CONFIG_MEMORY_HOTREMOVE both enabled and disabled. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:37 -07:00
Toshi Kani	825f787bb4	resource: add release_mem_region_adjustable() Add release_mem_region_adjustable(), which releases a requested region from a currently busy memory resource. This interface adjusts the matched memory resource accordingly even if the requested region does not match exactly but still fits into. This new interface is intended for memory hot-delete. During bootup, memory resources are inserted from the boot descriptor table, such as EFI Memory Table and e820. Each memory resource entry usually covers the whole contigous memory range. Memory hot-delete request, on the other hand, may target to a particular range of memory resource, and its size can be much smaller than the whole contiguous memory. Since the existing release interfaces like __release_region() require a requested region to be exactly matched to a resource entry, they do not allow a partial resource to be released. This new interface is restrictive (i.e. release under certain conditions), which is consistent with other release interfaces, __release_region() and __release_resource(). Additional release conditions, such as an overlapping region to a resource entry, can be supported after they are confirmed as valid cases. There is no change to the existing interfaces since their restriction is valid for I/O resources. [akpm@linux-foundation.org: use GFP_ATOMIC under write_lock()] [akpm@linux-foundation.org: switch back to GFP_KERNEL, less buggily] [akpm@linux-foundation.org: remove unneeded and wrong kfree(), per Toshi] Signed-off-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by : Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Cc: T Makphaibulchoke <tmac@hp.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:37 -07:00
Yijing Wang	f1cb08798e	mm: remove CONFIG_HOTPLUG ifdefs CONFIG_HOTPLUG is going away as an option, cleanup CONFIG_HOTPLUG ifdefs in mm files. Signed-off-by: Yijing Wang <wangyijing@huawei.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:37 -07:00
Andrew Shewmaker	4eeab4f558	mm: replace hardcoded 3% with admin_reserve_pages knob Add an admin_reserve_kbytes knob to allow admins to change the hardcoded memory reserve to something other than 3%, which may be multiple gigabytes on large memory systems. Only about 8MB is necessary to enable recovery in the default mode, and only a few hundred MB are required even when overcommit is disabled. This affects OVERCOMMIT_GUESS and OVERCOMMIT_NEVER. admin_reserve_kbytes is initialized to min(3% free pages, 8MB) I arrived at 8MB by summing the RSS of sshd or login, bash, and top. Please see first patch in this series for full background, motivation, testing, and full changelog. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: make init_admin_reserve() static] Signed-off-by: Andrew Shewmaker <agshew@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:36 -07:00
Andrew Shewmaker	c9b1d0981f	mm: limit growth of 3% hardcoded other user reserve Add user_reserve_kbytes knob. Limit the growth of the memory reserved for other user processes to min(3% current process size, user_reserve_pages). Only about 8MB is necessary to enable recovery in the default mode, and only a few hundred MB are required even when overcommit is disabled. user_reserve_pages defaults to min(3% free pages, 128MB) I arrived at 128MB by taking the max VSZ of sshd, login, bash, and top ... then adding the RSS of each. This only affects OVERCOMMIT_NEVER mode. Background 1. user reserve __vm_enough_memory reserves a hardcoded 3% of the current process size for other applications when overcommit is disabled. This was done so that a user could recover if they launched a memory hogging process. Without the reserve, a user would easily run into a message such as: bash: fork: Cannot allocate memory 2. admin reserve Additionally, a hardcoded 3% of free memory is reserved for root in both overcommit 'guess' and 'never' modes. This was intended to prevent a scenario where root-cant-log-in and perform recovery operations. Note that this reserve shrinks, and doesn't guarantee a useful reserve. Motivation The two hardcoded memory reserves should be updated to account for current memory sizes. Also, the admin reserve would be more useful if it didn't shrink too much. When the current code was originally written, 1GB was considered "enterprise". Now the 3% reserve can grow to multiple GB on large memory systems, and it only needs to be a few hundred MB at most to enable a user or admin to recover a system with an unwanted memory hogging process. I've found that reducing these reserves is especially beneficial for a specific type of application load: * single application system * one or few processes (e.g. one per core) * allocating all available memory * not initializing every page immediately * long running I've run scientific clusters with this sort of load. A long running job sometimes failed many hours (weeks of CPU time) into a calculation. They weren't initializing all of their memory immediately, and they weren't using calloc, so I put systems into overcommit 'never' mode. These clusters run diskless and have no swap. However, with the current reserves, a user wishing to allocate as much memory as possible to one process may be prevented from using, for example, almost 2GB out of 32GB. The effect is less, but still significant when a user starts a job with one process per core. I have repeatedly seen a set of processes requesting the same amount of memory fail because one of them could not allocate the amount of memory a user would expect to be able to allocate. For example, Message Passing Interfce (MPI) processes, one per core. And it is similar for other parallel programming frameworks. Changing this reserve code will make the overcommit never mode more useful by allowing applications to allocate nearly all of the available memory. Also, the new admin_reserve_kbytes will be safer than the current behavior since the hardcoded 3% of available memory reserve can shrink to something useless in the case where applications have grabbed all available memory. Risks * "bash: fork: Cannot allocate memory" The downside of the first patch-- which creates a tunable user reserve that is only used in overcommit 'never' mode--is that an admin can set it so low that a user may not be able to kill their process, even if they already have a shell prompt. Of course, a user can get in the same predicament with the current 3% reserve--they just have to launch processes until 3% becomes negligible. * root-cant-log-in problem The second patch, adding the tunable rootuser_reserve_pages, allows the admin to shoot themselves in the foot by setting it too small. They can easily get the system into a state where root-can't-log-in. However, the new admin_reserve_kbytes will be safer than the current behavior since the hardcoded 3% of available memory reserve can shrink to something useless in the case where applications have grabbed all available memory. Alternatives * Memory cgroups provide a more flexible way to limit application memory. Not everyone wants to set up cgroups or deal with their overhead. * We could create a fourth overcommit mode which provides smaller reserves. The size of useful reserves may be drastically different depending on the whether the system is embedded or enterprise. * Force users to initialize all of their memory or use calloc. Some users don't want/expect the system to overcommit when they malloc. Overcommit 'never' mode is for this scenario, and it should work well. The new user and admin reserve tunables are simple to use, with low overhead compared to cgroups. The patches preserve current behavior where 3% of memory is less than 128MB, except that the admin reserve doesn't shrink to an unusable size under pressure. The code allows admins to tune for embedded and enterprise usage. FAQ * How is the root-cant-login problem addressed? What happens if admin_reserve_pages is set to 0? Root is free to shoot themselves in the foot by setting admin_reserve_kbytes too low. On x86_64, the minimum useful reserve is: 8MB for overcommit 'guess' 128MB for overcommit 'never' admin_reserve_pages defaults to min(3% free memory, 8MB) So, anyone switching to 'never' mode needs to adjust admin_reserve_pages. * How do you calculate a minimum useful reserve? A user or the admin needs enough memory to login and perform recovery operations, which includes, at a minimum: sshd or login + bash (or some other shell) + top (or ps, kill, etc.) For overcommit 'guess', we can sum resident set sizes (RSS) because we only need enough memory to handle what the recovery programs will typically use. On x86_64 this is about 8MB. For overcommit 'never', we can take the max of their virtual sizes (VSZ) and add the sum of their RSS. We use VSZ instead of RSS because mode forces us to ensure we can fulfill all of the requested memory allocations-- even if the programs only use a fraction of what they ask for. On x86_64 this is about 128MB. When swap is enabled, reserves are useful even when they are as small as 10MB, regardless of overcommit mode. When both swap and overcommit are disabled, then the admin should tune the reserves higher to be absolutley safe. Over 230MB each was safest in my testing. * What happens if user_reserve_pages is set to 0? Note, this only affects overcomitt 'never' mode. Then a user will be able to allocate all available memory minus admin_reserve_kbytes. However, they will easily see a message such as: "bash: fork: Cannot allocate memory" And they won't be able to recover/kill their application. The admin should be able to recover the system if admin_reserve_kbytes is set appropriately. * What's the difference between overcommit 'guess' and 'never'? "Guess" allows an allocation if there are enough free + reclaimable pages. It has a hardcoded 3% of free pages reserved for root. "Never" allows an allocation if there is enough swap + a configurable percentage (default is 50) of physical RAM. It has a hardcoded 3% of free pages reserved for root, like "Guess" mode. It also has a hardcoded 3% of the current process size reserved for additional applications. * Why is overcommit 'guess' not suitable even when an app eventually writes to every page? It takes free pages, file pages, available swap pages, reclaimable slab pages into consideration. In other words, these are all pages available, then why isn't overcommit suitable? Because it only looks at the present state of the system. It does not take into account the memory that other applications have malloced, but haven't initialized yet. It overcommits the system. Test Summary There was little change in behavior in the default overcommit 'guess' mode with swap enabled before and after the patch. This was expected. Systems run most predictably (i.e. no oom kills) in overcommit 'never' mode with swap enabled. This also allowed the most memory to be allocated to a user application. Overcommit 'guess' mode without swap is a bad idea. It is easy to crash the system. None of the other tested combinations crashed. This matches my experience on the Roadrunner supercomputer. Without the tunable user reserve, a system in overcommit 'never' mode and without swap does not allow the admin to recover, although the admin can. With the new tunable reserves, a system in overcommit 'never' mode and without swap can be configured to: 1. maximize user-allocatable memory, running close to the edge of recoverability 2. maximize recoverability, sacrificing allocatable memory to ensure that a user cannot take down a system Test Description Fedora 18 VM - 4 x86_64 cores, 5725MB RAM, 4GB Swap System is booted into multiuser console mode, with unnecessary services turned off. Caches were dropped before each test. Hogs are user memtester processes that attempt to allocate all free memory as reported by /proc/meminfo In overcommit 'never' mode, memory_ratio=100 Test Results 3.9.0-rc1-mm1 Overcommit \| Swap \| Hogs \| MB Got/Wanted \| OOMs \| User Recovery \| Admin Recovery ---------- ---- ---- ------------- ---- ------------- -------------- guess yes 1 5432/5432 no yes yes guess yes 4 5444/5444 1 yes yes guess no 1 5302/5449 no yes yes guess no 4 - crash no no never yes 1 5460/5460 1 yes yes never yes 4 5460/5460 1 yes yes never no 1 5218/5432 no no yes never no 4 5203/5448 no no yes 3.9.0-rc1-mm1-tunablereserves User and Admin Recovery show their respective reserves, if applicable. Overcommit \| Swap \| Hogs \| MB Got/Wanted \| OOMs \| User Recovery \| Admin Recovery ---------- ---- ---- ------------- ---- ------------- -------------- guess yes 1 5419/5419 no - yes 8MB yes guess yes 4 5436/5436 1 - yes 8MB yes guess no 1 5440/5440 * - yes 8MB yes guess no 4 - crash - no 8MB no * process would successfully mlock, then the oom killer would pick it never yes 1 5446/5446 no 10MB yes 20MB yes never yes 4 5456/5456 no 10MB yes 20MB yes never no 1 5387/5429 no 128MB no 8MB barely never no 1 5323/5428 no 226MB barely 8MB barely never no 1 5323/5428 no 226MB barely 8MB barely never no 1 5359/5448 no 10MB no 10MB barely never no 1 5323/5428 no 0MB no 10MB barely never no 1 5332/5428 no 0MB no 50MB yes never no 1 5293/5429 no 0MB no 90MB yes never no 1 5001/5427 no 230MB yes 338MB yes never no 4* 4998/5424 no 230MB yes 338MB yes * more memtesters were launched, able to allocate approximately another 100MB Future Work - Test larger memory systems. - Test an embedded image. - Test other architectures. - Time malloc microbenchmarks. - Would it be useful to be able to set overcommit policy for each memory cgroup? - Some lines are slightly above 80 chars. Perhaps define a macro to convert between pages and kb? Other places in the kernel do this. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: make init_user_reserve() static] Signed-off-by: Andrew Shewmaker <agshew@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:36 -07:00
Andrew Morton	f02c696800	include/linux/memory.h: implement register_hotmemory_notifier() When CONFIG_MEMORY_HOTPLUG=n, we don't want the memory-hotplug notifier handlers to be included in the .o files, for space reasons. The existing hotplug_memory_notifier() tries to handle this but testing with gcc-4.4.4 shows that it doesn't work - the hotplug functions are still present in the .o files. So implement a new register_hotmemory_notifier() which is a copy of register_hotcpu_notifier(), and which actually works as desired. hotplug_memory_notifier() and register_memory_notifier() callsites should be converted to use this new register_hotmemory_notifier(). While we're there, let's repair the existing hotplug_memory_notifier(): it simply stomps on the register_memory_notifier() return value, so well-behaved code cannot check for errors. Apparently non of the existing callers were well-behaved :( Cc: Andrew Shewmaker <agshew@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:36 -07:00
Cody P Schafer	f9872caf07	page_alloc: make setup_nr_node_ids() usable for arch init code powerpc and x86 were opencoding copies of setup_nr_node_ids(), which page_alloc provides but makes static. Make it avaliable to the archs in linux/mm.h. Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:36 -07:00
Johannes Weiner	0aad818b2d	sparse-vmemmap: specify vmemmap population range in bytes The sparse code, when asking the architecture to populate the vmemmap, specifies the section range as a starting page and a number of pages. This is an awkward interface, because none of the arch-specific code actually thinks of the range in terms of 'struct page' units and always translates it to bytes first. In addition, later patches mix huge page and regular page backing for the vmemmap. For this, they need to call vmemmap_populate_basepages() on sub-section ranges with PAGE_SIZE and PMD_SIZE in mind. But these are not necessarily multiples of the 'struct page' size and so this unit is too coarse. Just translate the section range into bytes once in the generic sparse code, then pass byte ranges down the stack. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: David S. Miller <davem@davemloft.net> Tested-by: David S. Miller <davem@davemloft.net> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:35 -07:00
David Rientjes	949f7ec576	mm, hugetlb: include hugepages in meminfo Particularly in oom conditions, it's troublesome that hugetlb memory is not displayed. All other meminfo that is emitted will not add up to what is expected, and there is no artifact left in the kernel log to show that a potentially significant amount of memory is actually allocated as hugepages which are not available to be reclaimed. Booting with hugepages=8192 on the command line, this memory is now shown in oom conditions. For example, with echo m > /proc/sysrq-trigger: Node 0 hugepages_total=2048 hugepages_free=2048 hugepages_surp=0 hugepages_size=2048kB Node 1 hugepages_total=2048 hugepages_free=2048 hugepages_surp=0 hugepages_size=2048kB Node 2 hugepages_total=2048 hugepages_free=2048 hugepages_surp=0 hugepages_size=2048kB Node 3 hugepages_total=2048 hugepages_free=2048 hugepages_surp=0 hugepages_size=2048kB [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:35 -07:00
Hugh Dickins	6ee8630e02	mm: allow arch code to control the user page table ceiling On architectures where a pgd entry may be shared between user and kernel (e.g. ARM+LPAE), freeing page tables needs a ceiling other than 0. This patch introduces a generic USER_PGTABLES_CEILING that arch code can override. It is the responsibility of the arch code setting the ceiling to ensure the complete freeing of the page tables (usually in pgd_free()). [catalin.marinas@arm.com: commit log; shift_arg_pages(), asm-generic/pgtables.h changes] Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: <stable@vger.kernel.org> [3.3+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:34 -07:00
Atsushi Kumagai	13ba3fcbbe	kexec, vmalloc: export additional vmalloc layer information Now, vmap_area_list is exported as VMCOREINFO for makedumpfile to get the start address of vmalloc region (vmalloc_start). The address which contains vmalloc_start value is represented as below: vmap_area_list.next - OFFSET(vmap_area.list) + OFFSET(vmap_area.va_start) However, both OFFSET(vmap_area.va_start) and OFFSET(vmap_area.list) aren't exported as VMCOREINFO. So this patch exports them externally with small cleanup. [akpm@linux-foundation.org: vmalloc.h should include list.h for list_head] Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Dave Anderson <anderson@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ingo Molnar <mingo@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:34 -07:00
Joonsoo Kim	f1c4069e1d	mm, vmalloc: export vmap_area_list, instead of vmlist Although our intention is to unexport internal structure entirely, but there is one exception for kexec. kexec dumps address of vmlist and makedumpfile uses this information. We are about to remove vmlist, then another way to retrieve information of vmalloc layer is needed for makedumpfile. For this purpose, we export vmap_area_list, instead of vmlist. Signed-off-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Anderson <anderson@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:34 -07:00
Joonsoo Kim	db3808c1ba	mm, vmalloc: move get_vmalloc_info() to vmalloc.c Now get_vmalloc_info() is in fs/proc/mmu.c. There is no reason that this code must be here and it's implementation needs vmlist_lock and it iterate a vmlist which may be internal data structure for vmalloc. It is preferable that vmlist_lock and vmlist is only used in vmalloc.c for maintainability. So move the code to vmalloc.c Signed-off-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Dave Anderson <anderson@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Ingo Molnar <mingo@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:33 -07:00
Darrick J. Wong	7136851117	mm: make snapshotting pages for stable writes a per-bio operation Walking a bio's page mappings has proved problematic, so create a new bio flag to indicate that a bio's data needs to be snapshotted in order to guarantee stable pages during writeback. Next, for the one user (ext3/jbd) of snapshotting, hook all the places where writes can be initiated without PG_writeback set, and set BIO_SNAP_STABLE there. We must also flag journal "metadata" bios for stable writeout, since file data can be written through the journal. Finally, the MS_SNAP_STABLE mount flag (only used by ext3) is now superfluous, so get rid of it. [akpm@linux-foundation.org: rename _submit_bh()'s `flags' to `bio_flags', delobotomize the _submit_bh declaration] [akpm@linux-foundation.org: teeny cleanup] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Artem Bityutskiy <dedekind1@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:33 -07:00
Gerald Schaefer	106c992a5e	mm/hugetlb: add more arch-defined huge_pte functions Commit `abf09bed3c` ("s390/mm: implement software dirty bits") introduced another difference in the pte layout vs. the pmd layout on s390, thoroughly breaking the s390 support for hugetlbfs. This requires replacing some more pte_xxx functions in mm/hugetlbfs.c with a huge_pte_xxx version. This patch introduces those huge_pte_xxx functions and their generic implementation in asm-generic/hugetlb.h, which will now be included on all architectures supporting hugetlbfs apart from s390. This change will be a no-op for those architectures. [akpm@linux-foundation.org: fix warning] Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Hillf Danton <dhillf@gmail.com> Acked-by: Michal Hocko <mhocko@suse.cz> [for !s390 parts] Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:33 -07:00
Josh Triplett	146732ce10	fs: don't compile in drop_caches.c when CONFIG_SYSCTL=n drop_caches.c provides code only invokable via sysctl, so don't compile it in when CONFIG_SYSCTL=n. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:33 -07:00
Michal Hocko	6d2488f64a	cgroup: remove css_get_next Now that we have generic and well ordered cgroup tree walkers there is no need to keep css_get_next in the place. Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Ying Han <yinghan@google.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Glauber Costa <glommer@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:33 -07:00
Jiang Liu	cfa11e08ed	mm: introduce free_highmem_page() helper to free highmem pages into buddy system The original goal of this patchset is to fix the bug reported by https://bugzilla.kernel.org/show_bug.cgi?id=53501 Now it has also been expanded to reduce common code used by memory initializion. This is the second part, which applies to the previous part at: http://marc.info/?l=linux-mm&m=136289696323825&w=2 It introduces a helper function free_highmem_page() to free highmem pages into the buddy system when initializing mm subsystem. Introduction of free_highmem_page() is one step forward to clean up accesses and modificaitons of totalhigh_pages, totalram_pages and zone->managed_pages etc. I hope we could remove all references to totalhigh_pages from the arch/ subdirectory. We have only tested these patchset on x86 platforms, and have done basic compliation tests using cross-compilers from ftp.kernel.org. That means some code may not pass compilation on some architectures. So any help to test this patchset are welcomed! There are several other parts still under development: Part3: refine code to manage totalram_pages, totalhigh_pages and zone->managed_pages Part4: introduce helper functions to simplify mem_init() and remove the global variable num_physpages. This patch: Introduce helper function free_highmem_page(), which will be used by architectures with HIGHMEM enabled to free highmem pages into the buddy system. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Suzuki K. Poulose" <suzuki@in.ibm.com> Cc: Alexander Graf <agraf@suse.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Attilio Rao <attilio.rao@citrix.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Cong Wang <amwang@redhat.com> Cc: David Daney <david.daney@cavium.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Michel Lespinasse <walken@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rik van Riel <riel@redhat.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:31 -07:00
Jiang Liu	69afade72a	mm: introduce common help functions to deal with reserved/managed pages The original goal of this patchset is to fix the bug reported by https://bugzilla.kernel.org/show_bug.cgi?id=53501 Now it has also been expanded to reduce common code used by memory initializion. This is the first part, which applies to v3.9-rc1. It introduces following common helper functions to simplify free_initmem() and free_initrd_mem() on different architectures: adjust_managed_page_count(): will be used to adjust totalram_pages, totalhigh_pages, zone->managed_pages when reserving/unresering a page. __free_reserved_page(): free a reserved page into the buddy system without adjusting page statistics info free_reserved_page(): free a reserved page into the buddy system and adjust page statistics info mark_page_reserved(): mark a page as reserved and adjust page statistics info free_reserved_area(): free a continous ranges of pages by calling free_reserved_page() free_initmem_default(): default method to free __init pages. We have only tested these patchset on x86 platforms, and have done basic compliation tests using cross-compilers from ftp.kernel.org. That means some code may not pass compilation on some architectures. So any help to test this patchset are welcomed! There are several other parts still under development: Part2: introduce free_highmem_page() to simplify freeing highmem pages Part3: refine code to manage totalram_pages, totalhigh_pages and zone->managed_pages Part4: introduce helper functions to simplify mem_init() and remove the global variable num_physpages. This patch: Code to deal with reserved/managed pages are duplicated by many architectures, so introduce common help functions to reduce duplicated code. These common help functions will also be used to concentrate code to modify totalram_pages and zone->managed_pages, which makes the code much more clear. Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Anatolij Gustschin <agust@denx.de> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Howells <dhowells@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Mikael Starvik <starvik@axis.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:29 -07:00
Paul E. McKenney	8375ad98cc	vm: adjust ifdef for TINY_RCU There is an ifdef in page_cache_get_speculative() that checks for !SMP and TREE_RCU, which has been an impossible combination since the advent of TINY_RCU. The ifdef enables a fastpath that is valid when preemption is disabled by rcu_read_lock() in UP systems, which is the case when TINY_RCU is enabled. This commit therefore adjusts the ifdef to generate the fastpath when TINY_RCU is enabled. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reported-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:28 -07:00
Andrew Morton	250297edf8	mm/shmem.c: remove an ifdef Create a CONFIG_MMU=y stub for ramfs_nommu_expand_for_mapping() in the usual fashion. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Wolfram Sang <wsa@the-dreams.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:28 -07:00
David Rientjes	4b59e6c473	mm, show_mem: suppress page counts in non-blockable contexts On large systems with a lot of memory, walking all RAM to determine page types may take a half second or even more. In non-blockable contexts, the page allocator will emit a page allocation failure warning unless __GFP_NOWARN is specified. In such contexts, irqs are typically disabled and such a lengthy delay may even result in NMI watchdog timeouts. To fix this, suppress the page walk in such contexts when printing the page allocation failure warning. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@suse.de> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:28 -07:00
Robert Jarzmik	fe0bfaaff8	mm: trace filemap add and del Use the events API to trace filemap loading and unloading of file pieces into the page cache. This patch aims at tracing the eviction reload cycle of executable and shared libraries pages in a memory constrained environment. The typical usage is to spot a specific device and inode (for example /lib/libc.so) to see the eviction cycles, and find out if frequently used code is rather spread across many pages (bad) or coallesced (good). Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Cc: Dave Chinner <david@fromorbit.com> Cc: Hugh Dickins <hughd@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:28 -07:00
James Hogan	2c2fea1195	debug_locks.h: make warning more verbose The WARN_ON(1) in DEBUG_LOCKS_WARN_ON is surprisingly awkward to track down when it's hit, as it's usually buried in macros, causing multiple instances to land on the same line number. This patch makes it more useful by switching to: WARN(1, "DEBUG_LOCKS_WARN_ON(%s)", #c); so that the particular DEBUG_LOCKS_WARN_ON is more easily identified and grep'd for. For example: WARNING: at kernel/mutex.c:198 _mutex_lock_nested+0x31c/0x380() DEBUG_LOCKS_WARN_ON(l->magic != l) Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: David Howells <dhowells@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:27 -07:00
Haiyang Zhang	68a2d20b79	drivers/video: add Hyper-V Synthetic Video Frame Buffer Driver This is the driver for the Hyper-V Synthetic Video, which supports screen resolution up to Full HD 1920x1080 on Windows Server 2012 host, and 1600x1200 on Windows Server 2008 R2 or earlier. It also solves the double mouse cursor issue of the emulated video mode. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Cc: Olaf Hering <olaf@aepfle.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-04-29 15:54:26 -07:00
Linus Torvalds	9e8529afc4	Tracing updates for Linux 3.10 Along with the usual minor fixes and clean ups there are a few major changes with this pull request. 1) Multiple buffers for the ftrace facility This feature has been requested by many people over the last few years. I even heard that Google was about to implement it themselves. I finally had time and cleaned up the code such that you can now create multiple instances of the ftrace buffer and have different events go to different buffers. This way, a low frequency event will not be lost in the noise of a high frequency event. Note, currently only events can go to different buffers, the tracers (ie. function, function_graph and the latency tracers) still can only be written to the main buffer. 2) The function tracer triggers have now been extended. The function tracer had two triggers. One to enable tracing when a function is hit, and one to disable tracing. Now you can record a stack trace on a single (or many) function(s), take a snapshot of the buffer (copy it to the snapshot buffer), and you can enable or disable an event to be traced when a function is hit. 3) A perf clock has been added. A "perf" clock can be chosen to be used when tracing. This will cause ftrace to use the same clock as perf uses, and hopefully this will make it easier to interleave the perf and ftrace data for analysis. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJRfnTPAAoJEOdOSU1xswtMqYYH/1WIdrwXmxHflErnYkCIr3sU QtYae2K5A1HcgiqOvRJrdWMOt016iMx5CaQQyBFM1vvMiPY0sTWRmwNxDfZzz9LN 10jRvWEzZSLtzl+a9mkFWLEpr5nR/QODOxkWFCnRWscp46sp04LSTxGDYsOnPQZB sam/AQ1h4xA+DqDBChm9BDEUEPorGleTlN54LBaCGgSFGvrbF+eAg2s4vHNAQAvQ 8d5xjSE9zC7J+FqbVxvJTbKI3+EqKL6hMsJKsKfi0SI+FuxBaFMSltXck5zKyTI4 HpNJzXCmw+v90Tju7oMkPHh6RTbESPCHoGU+wqE52fM6m7oScVeuI/kfc6USwU4= =W1n+ -----END PGP SIGNATURE----- Merge tag 'trace-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "Along with the usual minor fixes and clean ups there are a few major changes with this pull request. 1) Multiple buffers for the ftrace facility This feature has been requested by many people over the last few years. I even heard that Google was about to implement it themselves. I finally had time and cleaned up the code such that you can now create multiple instances of the ftrace buffer and have different events go to different buffers. This way, a low frequency event will not be lost in the noise of a high frequency event. Note, currently only events can go to different buffers, the tracers (ie function, function_graph and the latency tracers) still can only be written to the main buffer. 2) The function tracer triggers have now been extended. The function tracer had two triggers. One to enable tracing when a function is hit, and one to disable tracing. Now you can record a stack trace on a single (or many) function(s), take a snapshot of the buffer (copy it to the snapshot buffer), and you can enable or disable an event to be traced when a function is hit. 3) A perf clock has been added. A "perf" clock can be chosen to be used when tracing. This will cause ftrace to use the same clock as perf uses, and hopefully this will make it easier to interleave the perf and ftrace data for analysis." * tag 'trace-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (82 commits) tracepoints: Prevent null probe from being added tracing: Compare to 1 instead of zero for is_signed_type() tracing: Remove obsolete macro guard _TRACE_PROFILE_INIT ftrace: Get rid of ftrace_profile_bits tracing: Check return value of tracing_init_dentry() tracing: Get rid of unneeded key calculation in ftrace_hash_move() tracing: Reset ftrace_graph_filter_enabled if count is zero tracing: Fix off-by-one on allocating stat->pages kernel: tracing: Use strlcpy instead of strncpy tracing: Update debugfs README file tracing: Fix ftrace_dump() tracing: Rename trace_event_mutex to trace_event_sem tracing: Fix comment about prefix in arch_syscall_match_sym_name() tracing: Convert trace_destroy_fields() to static tracing: Move find_event_field() into trace_events.c tracing: Use TRACE_MAX_PRINT instead of constant tracing: Use pr_warn_once instead of open coded implementation ring-buffer: Add ring buffer startup selftest tracing: Bring Documentation/trace/ftrace.txt up to date tracing: Add "perf" trace_clock ... Conflicts: kernel/trace/ftrace.c kernel/trace/trace.c	2013-04-29 13:55:38 -07:00
J. Bruce Fields	b1df763723	Merge branch 'nfs-for-next' of git://linux-nfs.org/~trondmy/nfs-2.6 into for-3.10 Note conflict: Chuck's patches modified (and made static) gss_mech_get_by_OID, which is still needed by gss-proxy patches. The conflict resolution is a bit minimal; we may want some more cleanup.	2013-04-29 16:23:34 -04:00
David Howells	2f96b8c1d5	proc: Split kcore bits from linux/procfs.h into linux/kcore.h Split kcore bits from linux/procfs.h into linux/kcore.h. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> cc: linux-mips@linux-mips.org cc: sparclinux@vger.kernel.org cc: x86@kernel.org cc: linux-mm@kvack.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-04-29 15:42:02 -04:00
David Howells	3cb5bf1bf9	proc: Delete create_proc_read_entry() Delete create_proc_read_entry() as it no longer has any users. Also delete read_proc_t, write_proc_t, the read_proc member of the proc_dir_entry struct and the support functions that use them. This saves a pointer for every PDE allocated. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-04-29 15:42:00 -04:00
David Howells	6bbefe8679	hostap: Don't use create_proc_read_entry() Don't use create_proc_read_entry() as that is deprecated, but rather use proc_create_data() and seq_file instead. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> cc: Jouni Malinen <j@w1.fi> cc: John W. Linville <linville@tuxdriver.com> cc: Johannes Berg <johannes@sipsolutions.net> cc: linux-wireless@vger.kernel.org cc: netdev@vger.kernel.org cc: devel@driverdev.osuosl.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-04-29 15:41:56 -04:00
David Howells	11db656ad4	nubus: Don't use create_proc_read_entry() Don't use create_proc_read_entry() as that is deprecated, but rather use proc_create_data() and seq_file instead. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-m68k@lists.linux-m68k.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-04-29 15:41:54 -04:00
David Howells	d0206fb555	procfs: Mark create_proc_read_entry deprecated Mark create_proc_read_entry deprecated. proc_create[_data]() should be used instead. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-04-29 15:41:50 -04:00
Al Viro	3dc20cb282	new helper: read_code() switch binfmts that use ->read() to that (and to kernel_read() in several cases in binfmt_flat - sure, it's nommu, but still, doing ->read() into kmalloc'ed buffer...) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-04-29 15:40:23 -04:00
John W. Linville	17a2911f33	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem	2013-04-29 15:31:57 -04:00
Linus Torvalds	ec25e246b9	USB patches for 3.10-rc1 Here's the big USB pull request for 3.10-rc1. Lots of USB patches here, the majority being USB gadget changes and USB-serial driver cleanups, the rest being ARM build fixes / cleanups, and individual driver updates. We also finally got some chipidea fixes, which have been delayed for a number of kernel releases, as the maintainer has now reappeared. All of these have been in linux-next for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iEYEABECAAYFAlF+md4ACgkQMUfUDdst+ymkSgCfZWIiCtiX/li0yJqSiRB4yYJx Ex0AoNemOOf6ywvSOHPbILTbJ1G+c/PX =JmvB -----END PGP SIGNATURE----- Merge tag 'usb-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB patches from Greg Kroah-Hartman: "Here's the big USB pull request for 3.10-rc1. Lots of USB patches here, the majority being USB gadget changes and USB-serial driver cleanups, the rest being ARM build fixes / cleanups, and individual driver updates. We also finally got some chipidea fixes, which have been delayed for a number of kernel releases, as the maintainer has now reappeared. All of these have been in linux-next for a while" * tag 'usb-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (568 commits) USB: ehci-msm: USB_MSM_OTG needs USB_PHY USB: OHCI: avoid conflicting platform drivers USB: OMAP: ISP1301 needs USB_PHY USB: lpc32xx: ISP1301 needs USB_PHY USB: ftdi_sio: enable two UART ports on ST Microconnect Lite usb: phy: tegra: don't call into tegra-ehci directly usb: phy: phy core cannot yet be a module USB: Fix initconst in ehci driver usb-storage: CY7C68300A chips do not support Cypress ATACB USB: serial: option: Added support Olivetti Olicard 145 USB: ftdi_sio: correct ST Micro Connect Lite PIDs ARM: mxs_defconfig: add CONFIG_USB_PHY ARM: imx_v6_v7_defconfig: add CONFIG_USB_PHY usb: phy: remove exported function from __init section usb: gadget: zero: put function instances on unbind usb: gadget: f_sourcesink.c: correct a copy-paste misnomer usb: gadget: cdc2: fix error return code in cdc_do_config() usb: gadget: multi: fix error return code in rndis_do_config() usb: gadget: f_obex: fix error return code in obex_bind() USB: storage: convert to use module_usb_driver() ...	2013-04-29 12:19:23 -07:00
Linus Torvalds	507ffe4f38	TTY/Serial driver update for 3.10-rc1 Here's the big tty/serial driver merge request for 3.10-rc1 Once again, Jiri has a number of TTY driver fixes and cleanups, and Peter Hurley came through with a bunch of ldisc fixes that resolve a number of reported issues. There are some other serial driver cleanups as well. All of these have been in the linux-next tree for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iEYEABECAAYFAlF+nSMACgkQMUfUDdst+ymy9QCfRmYn0MC0W+Q1D3Spz87gVsuo cqEAniu1BEkYZpjAz7ZlIN07Ao0jbQOR =Osu/ -----END PGP SIGNATURE----- Merge tag 'tty-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver update from Greg Kroah-Hartman: "Here's the big tty/serial driver merge request for 3.10-rc1 Once again, Jiri has a number of TTY driver fixes and cleanups, and Peter Hurley came through with a bunch of ldisc fixes that resolve a number of reported issues. There are some other serial driver cleanups as well. All of these have been in the linux-next tree for a while" * tag 'tty-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (117 commits) tty/serial/sirf: fix MODULE_DEVICE_TABLE serial: mxs: drop superfluous {get\|put}_device serial: mxs: fix buffer overflow ARM: PL011: add support for extended FIFO-size of PL011-r1p5 serial_core.c: add put_device() after device_find_child() tty: Fix unsafe bit ops in tty_throttle_safe/unthrottle_safe serial: sccnxp: Replace pdata.init/exit with regulator API serial: sccnxp: Do not override device name TTY: pty, fix compilation warning TTY: rocket, fix compilation warning TTY: ircomm: fix DTR being raised on hang up TTY: synclinkmp: fix DTR being raised on hang up TTY: synclink_gt: fix DTR being raised on hang up TTY: synclink: fix DTR being raised on hang up serial: 8250_dw: Fix the stub for dw8250_probe_acpi() serial: 8250_dw: Convert to devm_ioremap() serial: 8250_dw: Set port capabilities based on CPR register serial: 8250_dw: Let ACPI code extract the DMA client info serial: 8250_dw: Support clk framework also with ACPI serial: 8250_dw: Enable runtime PM ...	2013-04-29 12:16:17 -07:00
Eric Dumazet	6a5dc9e598	net: Add MIB counters for checksum errors Add MIB counters for checksum errors in IP layer, and TCP/UDP/ICMP layers, to help diagnose problems. $ nstat -a \| grep Csum IcmpInCsumErrors 72 0.0 TcpInCsumErrors 382 0.0 UdpInCsumErrors 463221 0.0 Icmp6InCsumErrors 75 0.0 Udp6InCsumErrors 173442 0.0 IpExtInCsumErrors 10884 0.0 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-29 15:14:03 -04:00
Eric Dumazet	aebda156a5	net: defer net_secret[] initialization Instead of feeding net_secret[] at boot time, defer the init at the point first socket is created. This permits some platforms to use better entropy sources than the ones available at boot time. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-29 15:14:02 -04:00
Linus Torvalds	fdc719b63a	Staging driver tree update for 3.10-rc1 Here's the big staging driver tree update for 3.10-rc1 This update contains loads of comedi driver cleanups and fixes in here, iio updates, android driver changes, and other various staging driver cleanups. Thanks to some drivers being removed, and the comedi driver cleanups, we have removed more code than we added: 627 files changed, 65145 insertions(+), 76321 deletions(-) which is always nice to see. All of these have been in linux-next for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iEYEABECAAYFAlF+nF8ACgkQMUfUDdst+ynlIwCfYm2pSkA0w1K56mftq1T0hpMH b9IAmwQlfEHSIKeAxqRO3RRrfLu5XD7L =Jnxr -----END PGP SIGNATURE----- Merge tag 'staging-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver tree update from Greg Kroah-Hartman: "Here's the big staging driver tree update for 3.10-rc1 This update contains loads of comedi driver cleanups and fixes in here, iio updates, android driver changes, and other various staging driver cleanups. Thanks to some drivers being removed, and the comedi driver cleanups, we have removed more code than we added: 627 files changed, 65145 insertions(+), 76321 deletions(-) which is always nice to see. All of these have been in linux-next for a while." * tag 'staging-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (940 commits) staging: comedi: ni_labpc: fix legacy driver build staging: comedi: das800: cleanup the cio-das802/16 fifo comments staging: comedi: das800: rename CamelCase vars in das800_ai_do_cmd() staging: comedi: das800: tidy up the private data staging: comedi: das800: tidy up das800_interrupt() staging: comedi: das800: tidy up das800_ai_insn_read() staging: comedi: das800: tidy up das800_di_insn_bits() staging: comedi: das800: tidy up das800_do_insn_bits() staging: comedi: das800: remove extra divisor calculation call staging: comedi: das800: rename {enable,disable}_das800 staging: comedi: das800: tidy up subdevice init staging: comedi: das800: allow attaching without interrupt support staging: comedi: das800: interrupts are required for async command support staging: comedi: das800: tidy up das800_ai_do_cmdtest() staging: comedi: das800: remove 'volatile' on private data variables staging: comedi: das800: cleanup the boardinfo staging: comedi: das800: cleanup range table declarations staging: comedi: das800: introduce das800_ind_{write, read}() staging: comedi: das800: remove forward declarations staging: comedi: das800: move das800_set_frequency() ...	2013-04-29 11:34:17 -07:00
Linus Torvalds	2794b5d408	Driver core update for 3.10-rc1 Here's the merge request for the driver core tree for 3.10-rc1 It's pretty small, just a number of driver core and sysfs updates and fixes, all of which have been in linux-next for a while now. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iEYEABECAAYFAlF+m4cACgkQMUfUDdst+ymp+wCgv/F7zAhZsKW5YT9A/FsTNl3m Ge8AnRlfYPwxM1Zt4kIuDAwfKuLTYV/B =swS7 -----END PGP SIGNATURE----- Merge tag 'driver-core-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core update from Greg Kroah-Hartman: "Here's the merge request for the driver core tree for 3.10-rc1 It's pretty small, just a number of driver core and sysfs updates and fixes, all of which have been in linux-next for a while now. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" Fixed conflict in kernel/rtmutex-tester.c, the locking tree had a better fix for the same sysfs file mode problem. * tag 'driver-core-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: PM / Runtime: Idle devices asynchronously after probe\|release driver core: handle user namespaces properly with the uid/gid devtmpfs change driver core: devtmpfs: fix compile failure with CONFIG_UIDGID_STRICT_TYPE_CHECKS devtmpfs: add base.h include driver core: add uid and gid to devtmpfs sysfs: check if one entry has been removed before freeing sysfs: fix crash_notes_size build warning sysfs: fix use after free in case of concurrent read/write and readdir rtmutex-tester: fix mode of sysfs files Documentation: Add ABI entry for crash_notes and crash_notes_size sysfs: Add crash_notes_size to export percpu note size driver core: platform_device.h: fix checkpatch errors and warnings driver core: platform.c: fix checkpatch errors and warnings driver core: warn that platform_driver_probe can not use deferred probing sysfs: use atomic_inc_unless_negative in sysfs_get_active base: core: WARN() about bogus permissions on device attributes device: separate all subsys mutexes	2013-04-29 11:31:50 -07:00
David S. Miller	14d3692f04	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== The following patchset contains relevant updates for the Netfilter tree, they are: * Enhancements for ipset: Add the counter extension for sets, this information can be used from the iptables set match, to change the matching behaviour. Jozsef required to add the extension infrastructure and moved the existing timeout support upon it. This also includes a change in net/sched/em_ipset to adapt it to the new extension structure. * Enhancements for performance boosting in nfnetlink_queue: Add new configuration flags that allows user-space to receive big packets (GRO) and to disable checksumming calculation. This were proposed by Eric Dumazet during the Netfilter Workshop 2013 in Copenhagen. Florian Westphal was kind enough to find the time to materialize the proposal. * A sparse fix from Simon, he noticed it in the SCTP NAT helper, the fix required a change in the interface of sctp_end_cksum. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-29 14:29:06 -04:00
Linus Torvalds	4f567cbc95	Char / Misc driver update for 3.10-rc1 Here's the big char / misc driver update for 3.10-rc1 A number of various driver updates, the majority being new functionality in the MEI driver subsystem (it's now a subsystem, it started out just a single driver), extcon updates, memory updates, hyper-v updates, and a bunch of other small stuff that doesn't fit in any other tree. All of these have been in linux-next for a while Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iEYEABECAAYFAlF+mtYACgkQMUfUDdst+ymFXQCfdLsD4Cxz+jkgW+tljh9i70XD OFkAnRPMMhLS8/kddf02lLMYzYUFdy1U =zaFJ -----END PGP SIGNATURE----- Merge tag 'char-misc-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver update from Greg Kroah-Hartman: "Here's the big char / misc driver update for 3.10-rc1 A number of various driver updates, the majority being new functionality in the MEI driver subsystem (it's now a subsystem, it started out just a single driver), extcon updates, memory updates, hyper-v updates, and a bunch of other small stuff that doesn't fit in any other tree. All of these have been in linux-next for a while" * tag 'char-misc-3.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (148 commits) Tools: hv: Fix a checkpatch warning tools: hv: skip iso9660 mounts in hv_vss_daemon tools: hv: use FIFREEZE/FITHAW in hv_vss_daemon tools: hv: use getmntent in hv_vss_daemon Tools: hv: Fix a checkpatch warning tools: hv: fix checks for origin of netlink message in hv_vss_daemon Tools: hv: fix warnings in hv_vss_daemon misc: mark spear13xx-pcie-gadget as broken mei: fix krealloc() misuse in in mei_cl_irq_read_msg() mei: reduce flow control only for completed messages mei: reseting -> resetting mei: fix reading large reposnes mei: revamp mei_irq_read_client_message function mei: revamp mei_amthif_irq_read_message mei: revamp hbm state machine Revert "drivers/scsi: use module_pcmcia_driver() in pcmcia drivers" Revert "scsi: pcmcia: nsp_cs: remove module init/exit function prototypes" scsi: pcmcia: nsp_cs: remove module init/exit function prototypes mei: wd: fix line over 80 characters misc: tsl2550: Use dev_pm_ops ...	2013-04-29 11:18:34 -07:00
Simon Horman	eee1d5a147	sctp: Correct type and usage of sctp_end_cksum() Change the type of the crc32 parameter of sctp_end_cksum() from __be32 to __u32 to reflect that fact that it is passed to cpu_to_le32(). There are five in-tree users of sctp_end_cksum(). The following four had warnings flagged by sparse which are no longer present with this change. net/netfilter/ipvs/ip_vs_proto_sctp.c:sctp_nat_csum() net/netfilter/ipvs/ip_vs_proto_sctp.c:sctp_csum_check() net/sctp/input.c:sctp_rcv_checksum() net/sctp/output.c:sctp_packet_transmit() The fifth user is net/netfilter/nf_nat_proto_sctp.c:sctp_manip_pkt(). It has been updated to pass a __u32 instead of a __be32, the value in question was already calculated in cpu byte-order. net/netfilter/nf_nat_proto_sctp.c:sctp_manip_pkt() has also been updated to assign the return value of sctp_end_cksum() directly to a variable of type __le32, matching the type of the return value. Previously the return value was assigned to a variable of type __be32 and then that variable was finally assigned to another variable of type __le32. Problems flagged by sparse. Compile and sparse tested only. Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:09:08 +02:00
Florian Westphal	00bd1cc24a	netfilter: nfnetlink_queue: avoid expensive gso segmentation and checksum fixup Userspace can now indicate that it can cope with larger-than-mtu sized packets and packets that have invalid ipv4/tcp checksums. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:09:07 +02:00
Florian Westphal	7237190df8	netfilter: nfnetlink_queue: add skb info attribute Once we allow userspace to receive gso/gro packets, userspace needs to be able to determine when checksums appear to be broken, but are not. NFQA_SKB_CSUMNOTREADY means 'checksums will be fixed in kernel later, pretend they are ok'. NFQA_SKB_GSO could be used for statistics, or to determine when packet size exceeds mtu. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:09:06 +02:00
Florian Westphal	a5fedd43d5	netfilter: move skb_gso_segment into nfnetlink_queue module skb_gso_segment is expensive, so it would be nice if we could avoid it in the future. However, userspace needs to be prepared to receive larger-than-mtu-packets (which will also have incorrect l3/l4 checksums), so we cannot simply remove it. The plan is to add a per-queue feature flag that userspace can set when binding the queue. The problem is that in nf_queue, we only have a queue number, not the queue context/configuration settings. This patch should have no impact other than the skb_gso_segment call now being in a function that has access to the queue config data. A new size attribute in nf_queue_entry is needed so nfnetlink_queue can duplicate the entry of the gso skb when segmenting the skb while also copying the route key. The follow up patch adds switch to disable skb_gso_segment when queue config says so. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2013-04-29 20:09:05 +02:00

... 3 4 5 6 7 ...

58723 Commits