linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-19 02:04:19 +08:00

Author	SHA1	Message	Date
Michal Hocko	ae79878356	mm: make vm_munmap killable Almost all current users of vm_munmap are ignoring the return value and so they do not handle potential error. This means that some VMAs might stay behind. This patch doesn't try to solve those potential problems. Quite contrary it adds a new failure mode by using down_write_killable in vm_munmap. This should be safer than other failure modes, though, because the process is guaranteed to die as soon as it leaves the kernel and exit_mmap will clean the whole address space. This will help in the OOM conditions when the oom victim might be stuck waiting for the mmap_sem for write which in turn can block oom_reaper which relies on the mmap_sem for read to make a forward progress and reclaim the address space of the victim. Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Michal Hocko	9fbeb5ab59	mm: make vm_mmap killable All the callers of vm_mmap seem to check for the failure already and bail out in one way or another on the error which means that we can change it to use killable version of vm_mmap_pgoff and return -EINTR if the current task gets killed while waiting for mmap_sem. This also means that vm_mmap_pgoff can be killable by default and drop the additional parameter. This will help in the OOM conditions when the oom victim might be stuck waiting for the mmap_sem for write which in turn can block oom_reaper which relies on the mmap_sem for read to make a forward progress and reclaim the address space of the victim. Please note that load_elf_binary is ignoring vm_mmap error for current->personality & MMAP_PAGE_ZERO case but that shouldn't be a problem because the address is not used anywhere and we never return to the userspace if we got killed. Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Michal Hocko	dc0ef0df7b	mm: make mmap_sem for write waits killable for mm syscalls This is a follow up work for oom_reaper [1]. As the async OOM killing depends on oom_sem for read we would really appreciate if a holder for write didn't stood in the way. This patchset is changing many of down_write calls to be killable to help those cases when the writer is blocked and waiting for readers to release the lock and so help __oom_reap_task to process the oom victim. Most of the patches are really trivial because the lock is help from a shallow syscall paths where we can return EINTR trivially and allow the current task to die (note that EINTR will never get to the userspace as the task has fatal signal pending). Others seem to be easy as well as the callers are already handling fatal errors and bail and return to userspace which should be sufficient to handle the failure gracefully. I am not familiar with all those code paths so a deeper review is really appreciated. As this work is touching more areas which are not directly connected I have tried to keep the CC list as small as possible and people who I believed would be familiar are CCed only to the specific patches (all should have received the cover though). This patchset is based on linux-next and it depends on down_write_killable for rw_semaphores which got merged into tip locking/rwsem branch and it is merged into this next tree. I guess it would be easiest to route these patches via mmotm because of the dependency on the tip tree but if respective maintainers prefer other way I have no objections. I haven't covered all the mmap_write(mm->mmap_sem) instances here $ git grep "down_write(.\<mmap_sem\>)" next/master \| wc -l 98 $ git grep "down_write(.\<mmap_sem\>)" \| wc -l 62 I have tried to cover those which should be relatively easy to review in this series because this alone should be a nice improvement. Other places can be changed on top. [0] http://lkml.kernel.org/r/1456752417-9626-1-git-send-email-mhocko@kernel.org [1] http://lkml.kernel.org/r/1452094975-551-1-git-send-email-mhocko@kernel.org [2] http://lkml.kernel.org/r/1456750705-7141-1-git-send-email-mhocko@kernel.org This patch (of 18): This is the first step in making mmap_sem write waiters killable. It focuses on the trivial ones which are taking the lock early after entering the syscall and they are not changing state before. Therefore it is very easy to change them to use down_write_killable and immediately return with -EINTR. This will allow the waiter to pass away without blocking the mmap_sem which might be required to make a forward progress. E.g. the oom reaper will need the lock for reading to dismantle the OOM victim address space. The only tricky function in this patch is vm_mmap_pgoff which has many call sites via vm_mmap. To reduce the risk keep vm_mmap with the original non-killable semantic for now. vm_munmap callers do not bother checking the return value so open code it into the munmap syscall path for now for simplicity. Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Kieran Bingham	e10af1328b	MAINTAINERS: add co-maintainer for scripts/gdb Add myself as a co-maintainer for scripts/gdb supporting Jan Kizka Link: http://lkml.kernel.org/r/fb5d34ce563f33d2f324f26f592b24ded30032ee.1462865983.git.jan.kiszka@siemens.com Signed-off-by: Kieran Bingham <kieran@bingham.xyz> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Kieran Bingham	b3b0842985	scripts/gdb: decode bytestream on dmesg for Python3 The recent fixes to lx-dmesg, now allow the command to print successfully on Python3, however the python interpreter wraps the bytes for each line with a b'<text>' marker. To remove this, we need to decode the line, where .decode() will default to 'UTF-8' Link: http://lkml.kernel.org/r/d67ccf93f2479c94cb3399262b9b796e0dbefcf2.1462865983.git.jan.kiszka@siemens.com Signed-off-by: Kieran Bingham <kieran@bingham.xyz> Acked-by: Dom Cote <buzdelabuz2@gmail.com> Tested-by: Dom Cote <buzdelabuz2@gmail.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Dom Cote	d21d5b9eb0	scripts/gdb: fix issue with dmesg.py and python 3.X When built against Python 3, GDB differs in the return type for its read_memory function, causing the lx-dmesg command to fail. Now that we have an improved read_16() we can use the new read_memoryview() abstraction to make lx-dmesg return valid data on both current Python APIs Tested with python 3.4 and 2.7 Tested with gdb 7.7 Link: http://lkml.kernel.org/r/28477b727ff7fe3101fd4e426060e8a68317a639.1462865983.git.jan.kiszka@siemens.com Signed-off-by: Dom Cote <buzdelabuz2+git@gmail.com> [kieran@bingham.xyz: Adjusted commit log to better reflect code changes] Tested-by: Kieran Bingham <kieran@bingham.xyz> (Py2.7,Py3.4,GDB10) Signed-off-by: Kieran Bingham <kieran@bingham.xyz> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Dom Cote	321958d971	scripts/gdb: improve types abstraction for gdb python scripts Change the read_u16 function so it accepts both 'str' and 'byte' as type for the arguments. When calling read_memory() from gdb API, depending on if it was built with 2.7 or 3.X, the format used to return the data will differ ( 'str' for 2.7, and 'byte' for 3.X ). Add a function read_memoryview() to be able to get a 'memoryview' object back from read_memory() both with python 2.7 and 3.X . Tested with python 3.4 and 2.7 Tested with gdb 7.7 Link: http://lkml.kernel.org/r/73621f564503137a002a639d174e4fb35f73f462.1462865983.git.jan.kiszka@siemens.com Signed-off-by: Dom Cote <buzdelabuz2+git@gmail.com> Tested-by: Kieran Bingham <kieran@bingham.xyz> (Py2.7,Py3.4,GDB10) Signed-off-by: Kieran Bingham <kieran@bingham.xyz> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Kieran Bingham	9f66dee720	scripts/gdb: add lx_thread_info_by_pid helper The tasks module already provides helpers to find the task struct by pid, and the thread_info by task struct; however this is cumbersome to utilise on the gdb commandline. Wrap these two functionalities together in an extra single helper to allow exploring the thread info, from a PID value Link: http://lkml.kernel.org/r/dadc5667f053ec811eb3e3033d99d937fedbc93b.1462865983.git.jan.kiszka@siemens.com Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Kieran Bingham	9b5580359a	scripts/gdb: add documentation example for radix tree Provide a worked example for utilising the lx_radix_tree_lookup function Link: http://lkml.kernel.org/r/e786008ac5aec4b84198812805b326d718bdeb4b.1462865983.git.jan.kiszka@siemens.com Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Kieran Bingham	e127a73d41	scripts/gdb: add a Radix Tree Parser Linux makes use of the Radix Tree data structure to store pointers indexed by integer values. This structure is utilised across many structures in the kernel including the IRQ descriptor tables, and several filesystems. This module provides a method to lookup values from a structure given its head node. Usage: The function lx_radix_tree_lookup, must be given a symbol of type struct radix_tree_root, and an index into that tree. The object returned is a generic integer value, and must be cast correctly to the type based on the storage in the data structure. For example, to print the irq descriptor in the sparse irq_desc_tree at index 18, try the following: (gdb) print (struct irq_desc)$lx_radix_tree_lookup(irq_desc_tree, 18) Link: http://lkml.kernel.org/r/d2028c55e50cf95a9b7f8ca0d11885174b0cc709.1462865983.git.jan.kiszka@siemens.com Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Jan Kiszka	4bc393dbcf	scripts/gdb: cast CPU numbers to integer We won't see more than 2 billion CPUs any time soon, and having cpu_list return long makes the output of lx-cpus a bit ugly. Link: http://lkml.kernel.org/r/dcb45c3b0a59e0fd321fa56ff7aa398458c689b3.1462865983.git.jan.kiszka@siemens.com Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Kieran Bingham	b1503934a5	scripts/gdb: add cpu iterators The linux kernel provides macro's for iterating against values from the cpu_list masks. By providing some commonly used masks, we can mirror the kernels helper macros with easy to use generators. Link: http://lkml.kernel.org/r/d045c6599771ada1999d49612ee30fd2f9acf17f.1462865983.git.jan.kiszka@siemens.com Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Kieran Bingham	c1a153992e	scripts/gdb: add mount point list command lx-mounts will identify current mount points based on the 'init_task' namespace by default, as we do not yet have a kernel thread list implementation to select the current running thread. Optionally, a user can specify a PID to list from that process' namespace Link: http://lkml.kernel.org/r/e614c7bc32d2350b4ff1627ec761a7148e65bfe6.1462865983.git.jan.kiszka@siemens.com Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Kieran Bingham	e7165a2d7d	scripts/gdb: add io resource readers Provide iomem_resource and ioports_resource printers and command hooks It can be quite interesting to halt the kernel as it's booting and check to see this list as it is being populated. It should be useful in the event that a kernel is not booting, you can identify what memory resources have been registered Link: http://lkml.kernel.org/r/f0a6b9fa9c92af4d7ed2e7343ccc84150e9c6fc5.1462865983.git.jan.kiszka@siemens.com Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Kieran Bingham	74627cf2df	scripts/gdb: provide a dentry_name VFS path helper Walk the VFS entries, pre-pending the iname strings to generate a full VFS path name from a dentry. Link: http://lkml.kernel.org/r/4328fdb2d15ba7f1b21ad21c2eecc38d9cfc4d13.1462865983.git.jan.kiszka@siemens.com Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Kieran Bingham	958ef8a09a	scripts/gdb: support !CONFIG_MODULES gracefully If CONFIG_MODULES is not enabled, lx-lsmod tries to find a non-existent symbol and generates an unfriendly traceback: (gdb) lx-lsmod Address Module Size Used by Traceback (most recent call last): File "scripts/gdb/linux/modules.py", line 75, in invoke for module in module_list(): File "scripts/gdb/linux/modules.py", line 24, in module_list module_ptr_type = module_type.get_type().pointer() File "scripts/gdb/linux/utils.py", line 28, in get_type self._type = gdb.lookup_type(self._name) gdb.error: No struct type named module. Error occurred in Python command: No struct type named module. Catch the error and return an empty module_list() for a clean command output as follows: (gdb) lx-lsmod Address Module Size Used by (gdb) Link: http://lkml.kernel.org/r/94d533819437408b85ae5864f939dd7ca6fbfcd6.1462865983.git.jan.kiszka@siemens.com Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Kieran Bingham	e78f3d70b3	scripts/gdb: provide exception catching parser If we attempt to read a value that is not available to GDB, an exception is raised. Most of the time, this is a good thing; however on occasion we will want to be able to determine if a symbol is available. By catching the exception to simply return None, we can determine if we tried to read an invalid value, without the exception taking our execution context away from us Link: http://lkml.kernel.org/r/c72b25c06fc66e1d68371154097e2cbb112555d8.1462865983.git.jan.kiszka@siemens.com Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Kieran Bingham	619ccaf3e9	scripts/gdb: convert modules usage to lists functions Simplify the module list functions with the new list_for_each_entry abstractions Link: http://lkml.kernel.org/r/ad0101c9391088608166fcec26af179868973d86.1462865983.git.jan.kiszka@siemens.com Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Kieran Bingham	a84be61d0e	scripts/gdb: provide kernel list item generators Facilitate linked-list items by providing a generator to return the dereferenced, and type-cast objects from a kernel linked list Link: http://lkml.kernel.org/r/2b0998564e6e5abe53585d466f87e491331fd2a4.1462865983.git.jan.kiszka@siemens.com Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Kieran Bingham	f197d75fca	scripts/gdb: provide linux constants Some macro's and defines are needed when parsing memory, and without compiling the kernel as -g3 they are not available in the debug-symbols. We use the pre-processor here to extract constants to a dedicated module for the linux debugger extensions Top level Kbuild is used to call in and generate the constants file, while maintaining dependencies on autogenerated files in include/generated Link: http://lkml.kernel.org/r/bc3df9c25f57ea72177c066a51a446fc19e2c27f.1462865983.git.jan.kiszka@siemens.com Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Jan Kiszka	0c22fde8b0	scripts/gdb: Adjust module reference counter reported by lx-lsmod This takes the MODULE_REF_BASE into account. Link: http://lkml.kernel.org/r/d926d2d54caa034adb964b52215090cbdb875249.1462865983.git.jan.kiszka@siemens.com Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Konstantin Khlebnikov	7efb2a7b85	arch/defconfig: remove CONFIG_RESOURCE_COUNTERS This option was replaced by PAGE_COUNTER which is selected by MEMCG. Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Muhammad Falak R Wani	de6cdcb5ce	drivers/memstick/core/mspro_block: use kmemdup Use kmemdup when some other buffer is immediately copied into allocated region. It replaces call to allocation followed by memcpy, by a single call to kmemdup. [akpm@linux-foundation.org: remove unneeded cast to void*] Link: http://lkml.kernel.org/r/1463665743-16269-1-git-send-email-falakreyaz@gmail.com Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Oleksandr Natalenko	a831979f85	rtsx_usb_ms: use schedule_timeout_idle() in polling loop First version of this patch has already been posted to LKML by Ben Hutchings ~6 months ago, but no further action were performed. Ben's original message: : rtsx_usb_ms creates a task that mostly sleeps, but tasks in : uninterruptible sleep still contribute to the load average (for : bug-compatibility with Unix). A load average of ~1 on a system that : should be idle is somewhat alarming. : : Change the sleep to be interruptible, but still ignore signals. References: https://bugs.debian.org/765717 Link: http://lkml.kernel.org/r/b49f95ae83057efa5d96f532803cba47@natalenko.name Signed-off-by: Oleksandr Natalenko <oleksandr@natalenko.name> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Lee Jones <lee.jones@linaro.org> Cc: Wolfram Sang <wsa@the-dreams.de> Cc: Roger Tseng <rogerable@realtek.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Corey Minyard	a0c20deae9	kdump: fix gdb macros work work with newer and 64-bit kernels Lots of little changes needed to be made to clean these up, remove the four byte pointer assumption and traverse the pid queue properly. Also consolidate the traceback code into a single function instead of having three copies of it. Link: http://lkml.kernel.org/r/1462926655-9390-1-git-send-email-minyard@acm.org Signed-off-by: Corey Minyard <cminyard@mvista.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Haren Myneni <hbabu@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Xunlei Pang	7a0058ec78	s390/kexec: consolidate crash_map/unmap_reserved_pages() and arch_kexec_protect(unprotect)_crashkres() Commit 3f625002581b ("kexec: introduce a protection mechanism for the crashkernel reserved memory") is a similar mechanism for protecting the crash kernel reserved memory to previous crash_map/unmap_reserved_pages() implementation, the new one is more generic in name and cleaner in code (besides, some arch may not be allowed to unmap the pgtable). Therefore, this patch consolidates them, and uses the new arch_kexec_protect(unprotect)_crashkres() to replace former crash_map/unmap_reserved_pages() which by now has been only used by S390. The consolidation work needs the crash memory to be mapped initially, this is done in machine_kdump_pm_init() which is after reserve_crashkernel(). Once kdump kernel is loaded, the new arch_kexec_protect_crashkres() implemented for S390 will actually unmap the pgtable like before. Signed-off-by: Xunlei Pang <xlpang@redhat.com> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Minfei Huang <mhuang@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Minfei Huang	0eea08678e	kexec: do a cleanup for function kexec_load There are a lof of work to be done in function kexec_load, not only for allocating structs and loading initram, but also for some misc. To make it more clear, wrap a new function do_kexec_load which is used to allocate structs and load initram. And the pre-work will be done in kexec_load. Signed-off-by: Minfei Huang <mnfhuang@gmail.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Xunlei Pang <xlpang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Minfei Huang	917a35605f	kexec: make a pair of map/unmap reserved pages in error path For some arch, kexec shall map the reserved pages, then use them, when we try to start the kdump service. kexec may return directly, without unmaping the reserved pages, if it fails during starting service. To fix it, we make a pair of map/unmap reserved pages both in generic path and error path. This patch only affects s390. Other architecturess don't implement the interface of crash_unmap_reserved_pages and crash_map_reserved_pages. It isn't a urgent patch. Kernel can work well without any risk, although the reserved pages are not unmapped before returning in error path. Signed-off-by: Minfei Huang <mnfhuang@gmail.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Xunlei Pang <xlpang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Xunlei Pang	1e5768ae75	kexec: provide arch_kexec_protect(unprotect)_crashkres() Implement the protection method for the crash kernel memory reservation for the 64-bit x86 kdump. Signed-off-by: Xunlei Pang <xlpang@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Minfei Huang <mhuang@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Xunlei Pang	9b492cf580	kexec: introduce a protection mechanism for the crashkernel reserved memory For the cases that some kernel (module) path stamps the crash reserved memory(already mapped by the kernel) where has been loaded the second kernel data, the kdump kernel will probably fail to boot when panic happens (or even not happens) leaving the culprit at large, this is unacceptable. The patch introduces a mechanism for detecting such cases: 1) After each crash kexec loading, it simply marks the reserved memory regions readonly since we no longer access it after that. When someone stamps the region, the first kernel will panic and trigger the kdump. The weak arch_kexec_protect_crashkres() is introduced to do the actual protection. 2) To allow multiple loading, once 1) was done we also need to remark the reserved memory to readwrite each time a system call related to kdump is made. The weak arch_kexec_unprotect_crashkres() is introduced to do the actual protection. The architecture can make its specific implementation by overriding arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres(). Signed-off-by: Xunlei Pang <xlpang@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Minfei Huang <mhuang@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Oleg Nesterov	9eb8a659de	exec: remove the no longer needed remove_arg_zero()->free_arg_page() remove_arg_zero() does free_arg_page() for no reason. This was needed before and only if CONFIG_MMU=y: see commit `4fc75ff481` ("exec: fix remove_arg_zero"), install_arg_page() was called for every page != NULL in bprm->page[] array. Today install_arg_page() has already gone and free_arg_page() is nop after another commit `b6a2fea393` ("mm: variable length argument support"). CONFIG_MMU=n does free_arg_pages() in free_bprm() and thus it doesn't need remove_arg_zero()->free_arg_page() too; apart from get_arg_page() it never checks if the page in bprm->page[] was allocated or not, so the "extra" non-freed page is fine. OTOH, this free_arg_page() can add the minor pessimization, the caller is going to do copy_strings_kernel() right after remove_arg_zero() which will likely need to re-allocate the same page again. And as Hujunjie pointed out, the "offset == PAGE_SIZE" check is wrong because we are going to increment bprm->p once again before return, so CONFIG_MMU=n "leaks" the page anyway if '0' is the final byte in this page. NOTE: remove_arg_zero() assumes that argv[0] is null-terminated but this is not necessarily true. copy_strings() does "len = strnlen_user(...)", then copy_from_user(len) but another thread or debuger can overwrite the trailing '0' in between. Afaics nothing really bad can happen because we must always have the null-terminated bprm->filename copied by the 1st copy_strings_kernel(), but perhaps we should change this code to check "bprm->p < bprm->exec" anyway, and/or change copy_strings() to ensure that the last byte in string is always zero. Link: http://lkml.kernel.org/r/20160517155335.GA31435@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported by: hujunjie <jj.net@163.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Andi Kleen	725fc629ff	kernek/fork.c: allocate idle task for a CPU always on its local node Linux preallocates the task structs of the idle tasks for all possible CPUs. This currently means they all end up on node 0. This also implies that the cache line of MWAIT, which is around the flags field in the task struct, are all located in node 0. We see a noticeable performance improvement on Knights Landing CPUs when the cache lines used for MWAIT are located in the local nodes of the CPUs using them. I would expect this to give a (likely slight) improvement on other systems too. The patch implements placing the idle task in the node of its CPUs, by passing the right target node to copy_process() [akpm@linux-foundation.org: use NUMA_NO_NODE, not a bare -1] Link: http://lkml.kernel.org/r/1463492694-15833-1-git-send-email-andi@firstfloor.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Oleg Nesterov	5c8ccefdf4	signal: move the "sig < SIGRTMIN" check into siginmask(sig) All the users of siginmask() must ensure that sig < SIGRTMIN. sig_fatal() doesn't and this is wrong: UBSAN: Undefined behaviour in kernel/signal.c:911:6 shift exponent 32 is too large for 32-bit type 'long unsigned int' the patch doesn't add the neccesary check to sig_fatal(), it moves the check into siginmask() and updates other callers. Link: http://lkml.kernel.org/r/20160517195052.GA15187@redhat.com Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Wang Xiaoqiang	747800efbe	kernel/signal.c: convert printk(KERN_<LEVEL> ...) to pr_<level>(...) Use pr_<level> instead of printk(KERN_<LEVEL> ). Signed-off-by: Wang Xiaoqiang <wangxq10@lzu.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Tetsuo Handa	c96fc2d85f	signal: make oom_flags a bool Currently the size of "struct signal_struct"->oom_flags member is sizeof(unsigned) bytes, but only one flag OOM_FLAG_ORIGIN which is updated by current thread is defined. We can convert OOM_FLAG_ORIGIN into a bool, and reuse the saved bytes for updating from the OOM killer and/or the OOM reaper thread. By the way, do we care about a race window between run_store() and swapoff() because it would be theoretically possible that two threads sharing the "struct signal_struct" concurrently call respective functions? If we care, we can make oom_flags an atomic_t. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Oleg Nesterov	91c4e8ea8f	wait: allow sys_waitid() to accept __WNOTHREAD/__WCLONE/__WALL I see no reason why waitid() can't support other linux-specific flags allowed in sys_wait4(). In particular this change can help if we reconsider the previous change ("wait/ptrace: assume __WALL if the child is traced") which adds the "automagical" __WALL for debugger. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: Pedro Alves <palves@redhat.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: <syzkaller@googlegroups.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Oleg Nesterov	bf959931dd	wait/ptrace: assume __WALL if the child is traced The following program (simplified version of generated by syzkaller) #include <pthread.h> #include <unistd.h> #include <sys/ptrace.h> #include <stdio.h> #include <signal.h> void thread_func(void arg) { ptrace(PTRACE_TRACEME, 0,0,0); return 0; } int main(void) { pthread_t thread; if (fork()) return 0; while (getppid() != 1) ; pthread_create(&thread, NULL, thread_func, NULL); pthread_join(thread, NULL); return 0; } creates an unreapable zombie if /sbin/init doesn't use __WALL. This is not a kernel bug, at least in a sense that everything works as expected: debugger should reap a traced sub-thread before it can reap the leader, but without __WALL/__WCLONE do_wait() ignores sub-threads. Unfortunately, it seems that /sbin/init in most (all?) distributions doesn't use it and we have to change the kernel to avoid the problem. Note also that most init's use sys_waitid() which doesn't allow __WALL, so the necessary user-space fix is not that trivial. This patch just adds the "ptrace" check into eligible_child(). To some degree this matches the "tsk->ptrace" in exit_notify(), ->exit_signal is mostly ignored when the tracee reports to debugger. Or WSTOPPED, the tracer doesn't need to set this flag to wait for the stopped tracee. This obviously means the user-visible change: __WCLONE and __WALL no longer have any meaning for debugger. And I can only hope that this won't break something, but at least strace/gdb won't suffer. We could make a more conservative change. Say, we can take __WCLONE into account, or !thread_group_leader(). But it would be nice to not complicate these historical/confusing checks. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: Pedro Alves <palves@redhat.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Ryusuke Konishi	076a378ba6	nilfs2: fix block comments This fixes block comments with proper formatting to eliminate the following checkpatch.pl warnings: "WARNING: Block comments use * on subsequent lines" "WARNING: Block comments use a trailing */ on a separate line" Link: http://lkml.kernel.org/r/1462886671-3521-8-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Ryusuke Konishi	80d6505232	nilfs2: remove loops of single statement macros This fixes checkpatch.pl warning "WARNING: Single statement macros should not use a do {} while (0) loop". Link: http://lkml.kernel.org/r/1462886671-3521-7-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Ryusuke Konishi	7f00184e9c	nilfs2: remove unnecessary else after return or break This fixes the checkpatch.pl warning that suggests else is not generally useful after a break or return. Link: http://lkml.kernel.org/r/1462886671-3521-6-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Ryusuke Konishi	0c6c44cb9f	nilfs2: avoid bare use of 'unsigned' This fixes checkpatch.pl warning "WARNING: Prefer 'unsigned int' to bare use of 'unsigned'". Link: http://lkml.kernel.org/r/1462886671-3521-5-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Ryusuke Konishi	7592ecde65	nilfs2: fix code indent coding style issue This fixes checkpatch.pl warning "WARNING: suspect code indent for conditional statements". Link: http://lkml.kernel.org/r/1462886671-3521-4-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Ryusuke Konishi	c9cb9b5c85	nilfs2: remove space before semicolon This fixes the checkpatch.pl warning "WARNING: space prohibited before semicolon" at nilfs_store_magic_and_option(). Link: http://lkml.kernel.org/r/1462886671-3521-3-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Ryusuke Konishi	06f4abf6ca	nilfs2: do not emit extra newline on nilfs_warning() and nilfs_error() This updates call sites of nilfs_warning() and nilfs_error() so that they don't add a duplicate newline. These output functions are already designed to add a trailing newline to the message. Link: http://lkml.kernel.org/r/1462886671-3521-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Ryusuke Konishi	facb9ec5e6	nilfs2: clean trailing semicolons in macros Remove trailing semicolons from macros, as suggested by checkpatch.pl. Link: http://lkml.kernel.org/r/1461935747-10380-12-git-send-email-konishi.ryusuke@lab.ntt.co.jp [konishi.ryusuke@lab.ntt.co.jp: fix style issues] Link: http://lkml.kernel.org/r/20160509.231703.1481729973362188932.konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Ryusuke Konishi	4ad364ca1c	nilfs2: add missing line spacing Clean up checkpatch.pl warnings "WARNING: Missing a blank line after declarations" from nilfs2. Link: http://lkml.kernel.org/r/1461935747-10380-11-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Ryusuke Konishi	e7a142aaa0	nilfs2: replace __attribute__((packed)) with __packed This fixes the following checkpatch.pl warning: WARNING: __packed is preferred over __attribute__((packed)) #23: FILE: export.h:23: +} __attribute__ ((packed)); Link: http://lkml.kernel.org/r/1461935747-10380-10-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Ryusuke Konishi	2d19961d83	nilfs2: move cleanup code of metadata file from inode routines Refactor nilfs_clear_inode() and nilfs_i_callback() so that cleanup code or resource deallocation related to metadata file will be moved out to mdt.c. Link: http://lkml.kernel.org/r/1461935747-10380-9-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Ryusuke Konishi	24e20ead2f	nilfs2: get rid of nilfs_mdt_mark_block_dirty() nilfs_mdt_mark_block_dirty() can be replaced with primary functions like nilfs_mdt_get_block() and mark_buffer_dirty(), and it's used only by nilfs_ioctl_mark_blocks_dirty(). This gets rid of the function to simplify the interface of metadata file. Link: http://lkml.kernel.org/r/1461935747-10380-8-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00
Ryusuke Konishi	756cbdb353	nilfs2: clarify permission to replicate the design To respond to a certain developer's request, this explicitly state that developers can reimplement the nilfs2 design for other operating systems to share data stored in that format. Link: http://lkml.kernel.org/r/1461935747-10380-7-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-05-23 17:04:14 -07:00

1 2 3 4 5 ...

599141 Commits