Commit 2dabb32484 ("Btrfs: Direct I/O read: Work on sectorsized blocks")
introduced this bug while iterating over bio pages in the dio read endio
hook, and it can end in a segmentation fault of the task doing the dio read.
The cause is 'if (nr_sectors--)', which makes the code assume that there is
one more block in the same page. The page offset is then increased, so the
bio created to repair the bad block gets an incorrect bvec.bv_offset, and a
later access to the page content triggers a segmentation fault.
This also adds an ASSERT to check the page offset against the page size.
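The off-by-one can be illustrated with a small userspace sketch. This is not the btrfs code: the function names, SKETCH_PAGE_SIZE and the loop shape are assumptions made for illustration, and callers are assumed to pass nr_sectors >= 1.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Buggy shape: post-decrement tests the old count, so one extra pass runs. */
static size_t last_pgoff_buggy(unsigned int nr_sectors, size_t sectorsize)
{
    size_t pgoff = 0;

next_block:
    /* ... the block at pgoff would be verified or repaired here ... */
    if (nr_sectors--) {
        pgoff += sectorsize;
        goto next_block;
    }
    return pgoff; /* offset used by the final pass */
}

/* Fixed shape: decrement first, then test, and assert the offset stays in the page. */
static size_t last_pgoff_fixed(unsigned int nr_sectors, size_t sectorsize)
{
    size_t pgoff = 0;

next_block:
    /* ... the block at pgoff would be verified or repaired here ... */
    nr_sectors--;
    if (nr_sectors) {
        pgoff += sectorsize;
        assert(pgoff < SKETCH_PAGE_SIZE);
        goto next_block;
    }
    return pgoff;
}
```

With one 4096-byte block per page, the buggy variant makes a second pass at offset 4096, which is already outside the page, while the fixed variant stops at offset 0.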
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When doing directIO repair, we have this oops:
[ 1458.532816] general protection fault: 0000 [#1] SMP
...
[ 1458.536291] Workqueue: btrfs-endio-repair btrfs_endio_repair_helper [btrfs]
[ 1458.536893] task: ffff88082a42d100 task.stack: ffffc90002b3c000
[ 1458.537499] RIP: 0010:btrfs_retry_endio+0x7e/0x1a0 [btrfs]
...
[ 1458.543261] Call Trace:
[ 1458.543958] ? rcu_read_lock_sched_held+0xc4/0xd0
[ 1458.544374] bio_endio+0xed/0x100
[ 1458.544750] end_workqueue_fn+0x3c/0x40 [btrfs]
[ 1458.545257] normal_work_helper+0x9f/0x900 [btrfs]
[ 1458.545762] btrfs_endio_repair_helper+0x12/0x20 [btrfs]
[ 1458.546224] process_one_work+0x34d/0xb70
[ 1458.546570] ? process_one_work+0x29e/0xb70
[ 1458.546938] worker_thread+0x1cf/0x960
[ 1458.547263] ? process_one_work+0xb70/0xb70
[ 1458.547624] kthread+0x17d/0x180
[ 1458.547909] ? kthread_create_on_node+0x70/0x70
[ 1458.548300] ret_from_fork+0x31/0x40
It turns out that btrfs_retry_endio is trying to get inode from a directIO
page.
This fixes the problem by using the saved inode pointer, done->inode.
btrfs_retry_endio_nocsum has the same problem, and it's fixed as well.
Also clean up the unused @start (which is too trivial for a separate patch).
Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The opposite case was already handled correctly in the very next switch entry.
Also drop ssd_spread when turning on nossd.
Reported-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
On SPARC, the udl driver filled my kernel log with these messages:
[186668.910612] Kernel unaligned access at TPC[76609c] udl_render_hline+0x13c/0x3a0
Use put_unaligned_be16 to avoid them. On x86 this results in the same
code, but on SPARC the compiler emits two single-byte stores.
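As a rough illustration of what an unaligned big-endian 16-bit store amounts to, here is a userspace sketch. sketch_put_be16() is a hypothetical stand-in written for this example, not the kernel's put_unaligned_be16() implementation.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for an unaligned big-endian store: the value is
 * written one byte at a time, so the destination needs no alignment.
 */
static void sketch_put_be16(uint16_t val, void *dst)
{
    uint8_t *p = dst;

    p[0] = (uint8_t)(val >> 8);   /* most significant byte first */
    p[1] = (uint8_t)(val & 0xff); /* then the least significant byte */
}
```

Writing through a uint8_t pointer avoids the 16-bit store to an odd address that triggers the unaligned access messages on strict-alignment machines.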
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Acked-by: David Airlie <airlied@linux.ie>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20170407200229.20642-1-j.neuschaefer@gmx.net
Only call synchronize_rcu_expedited after unlocking struct_mutex to
avoid deadlock because the workqueues depend on struct_mutex.
From the original patch by Andrea:
synchronize_rcu/synchronize_sched/synchronize_rcu_expedited() will
hang until its own workqueues are run. The i915 gem workqueues will
wait on the struct_mutex to be released. So we cannot wait for a
quiescent state using those rcu primitives while holding the
struct_mutex or it creates a circular lock dependency resulting in
kernel hangs (which is reproducible but goes undetected by lockdep).
kswapd0 D 0 700 2 0x00000000
Call Trace:
? __schedule+0x1a5/0x660
? schedule+0x36/0x80
? _synchronize_rcu_expedited.constprop.65+0x2ef/0x300
? wake_up_bit+0x20/0x20
? rcu_stall_kick_kthreads.part.54+0xc0/0xc0
? rcu_exp_wait_wake+0x530/0x530
? i915_gem_shrink+0x34b/0x4b0
? i915_gem_shrinker_scan+0x7c/0x90
? i915_gem_shrinker_scan+0x7c/0x90
? shrink_slab.part.61.constprop.72+0x1c1/0x3a0
? shrink_zone+0x154/0x160
? kswapd+0x40a/0x720
? kthread+0xf4/0x130
? try_to_free_pages+0x450/0x450
? kthread_create_on_node+0x40/0x40
? ret_from_fork+0x23/0x30
plasmashell D 0 4657 4614 0x00000000
Call Trace:
? __schedule+0x1a5/0x660
? schedule+0x36/0x80
? schedule_preempt_disabled+0xe/0x10
? __mutex_lock.isra.4+0x1c9/0x790
? i915_gem_close_object+0x26/0xc0
? i915_gem_close_object+0x26/0xc0
? drm_gem_object_release_handle+0x48/0x90
? drm_gem_handle_delete+0x50/0x80
? drm_ioctl+0x1fa/0x420
? drm_gem_handle_create+0x40/0x40
? pipe_write+0x391/0x410
? __vfs_write+0xc6/0x120
? do_vfs_ioctl+0x8b/0x5d0
? SyS_ioctl+0x3b/0x70
? entry_SYSCALL_64_fastpath+0x13/0x94
kworker/0:0 D 0 29186 2 0x00000000
Workqueue: events __i915_gem_free_work
Call Trace:
? __schedule+0x1a5/0x660
? schedule+0x36/0x80
? schedule_preempt_disabled+0xe/0x10
? __mutex_lock.isra.4+0x1c9/0x790
? del_timer_sync+0x44/0x50
? update_curr+0x57/0x110
? __i915_gem_free_objects+0x31/0x300
? __i915_gem_free_objects+0x31/0x300
? __i915_gem_free_work+0x2d/0x40
? process_one_work+0x13a/0x3b0
? worker_thread+0x4a/0x460
? kthread+0xf4/0x130
? process_one_work+0x3b0/0x3b0
? kthread_create_on_node+0x40/0x40
? ret_from_fork+0x23/0x30
Fixes: 3d3d18f086 ("drm/i915: Avoid rcu_barrier() from reclaim paths (shrinker)")
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit 8f612d0551)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
i915 is currently doing a full GPU reset at the end of
i915_gem_suspend() followed by GuC suspend in i915_drm_suspend(). This
GPU reset clobbers the GuC, causing the suspend request to then fail,
leaving the GuC in an undefined state. We need to tell the GuC to
suspend before we do the direct intel_gpu_reset().
v2: Commit message update. (Chris, Daniele)
Fixes: 1c777c5d1d ("drm/i915/hsw: Fix GPU hang during resume from S3-devices state")
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1491387710-20553-1-git-send-email-sagar.a.kamble@intel.com
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit fd08923384)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
After commit 47c950d102 ("pinctrl: cherryview: Do not add all
southwest and north GPIOs to IRQ domain") the driver does not add all
GPIOs to the irqdomain. The reason for that is that those GPIOs cannot
generate IRQs at all, only GPEs (General Purpose Events). This causes
Linux virtual IRQ numbering to change.
However, it seems some CYAN Chromebooks, including the Acer Chromebook,
hardcode these Linux IRQ numbers in the ACPI tables of the machine.
Since the numbering is different now, the IRQ meant for the keyboard does
not match the Linux virtual IRQ number anymore, making the keyboard
non-functional.
Work around this by adding a special quirk just for these machines where
we add back all GPIOs to the irqdomain. The rest of the Cherryview/Braswell
based machines will not be affected by the change.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=194945
Fixes: 47c950d102 ("pinctrl: cherryview: Do not add all southwest and north GPIOs to IRQ domain")
Reported-by: Adam S Levy <theadamlevy@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
This fixes Continuous Availability when errors during
file reopen are encountered.
cifs_user_readv and cifs_user_writev would wait forever if the
result of cifs_reopen_file is not stored for later inspection.
The result is in fact checked and, in case of errors, the chain
of function calls that schedules reads and writes in
a separate thread is skipped.
These threads wake up the corresponding waiters once the reads
and writes are done.
However, because the return value is not stored, the check of rc
inspects a previous value (always zero) instead.
This leads to pending reads/writes being added to the list, making
cifs_user_readv and cifs_user_writev wait forever.
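The pattern described above can be sketched in userspace as follows. All names here (sketch_reopen, queue_io_buggy, queue_io_fixed, the queued counter) are hypothetical and only mirror the shape of the problem, not the actual CIFS code.

```c
#include <errno.h>

/* Hypothetical stand-in for a reopen call that may fail. */
static int sketch_reopen(int fail)
{
    return fail ? -EAGAIN : 0;
}

/* Buggy shape: the result is dropped, so the stale rc (zero) is checked. */
static int queue_io_buggy(int reopen_fails, int *queued)
{
    int rc = 0;

    sketch_reopen(reopen_fails); /* return value not stored */
    if (rc)
        return rc;
    (*queued)++; /* request is queued even though it will never complete */
    return 0;
}

/* Fixed shape: store the result so the error check sees the real value. */
static int queue_io_fixed(int reopen_fails, int *queued)
{
    int rc = sketch_reopen(reopen_fails);

    if (rc)
        return rc;
    (*queued)++;
    return 0;
}
```

In the buggy variant a failed reopen still leaves one request on the list, which is the situation that leaves the waiter blocked.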
Signed-off-by: Germano Percossi <germano.percossi@citrix.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
STATUS_BAD_NETWORK_NAME can be received during node failover,
causing the flag to be set and making the reconnect thread
always unsuccessful, thereafter.
Once the only place where it is set is removed, the remaining
bits are rendered moot.
Removing it does not prevent "mount" from failing when a
non-existent share is passed.
What happens when the share really ceases to exist while the
share is mounted is undefined now as much as it was before.
Signed-off-by: Germano Percossi <germano.percossi@citrix.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
In case of error, smb2_reconnect_server reschedules itself
with a delay, to avoid being too aggressive.
Signed-off-by: Germano Percossi <germano.percossi@citrix.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Commit 1a967d6c9b ("correctly to
anonymous authentication for the NTLM(v2) authentication") introduces
a regression in handling errors related to attempting a guest
connection to a Windows share which requires authentication. This
should result in a permission denied error but actually causes the
kernel module to enter a never-ending loop trying to follow a DFS
referral which doesn't exist.
The root cause is that the failure now occurs later in the process,
during tree connect and not at session setup, and all errors
in tree connect are interpreted as needing to follow the DFS paths,
which isn't correct in this case. So, check the returned error against
EACCES and fail if that is the error returned.
Feedback from Aurelien:
PS> net user guest /activate:no
PS> mkdir C:\guestshare
PS> icacls C:\guestshare /grant 'Everyone:(OI)(CI)F'
PS> new-smbshare -name guestshare -path C:\guestshare -fullaccess Everyone
I've tested v3.10, v4.4, master, master+your patch using default options
(empty or no user "NU") and user=abc (U).
NT_LOGON_FAILURE in session setup: LF
This is what you seem to have in 3.10.
NT_ACCESS_DENIED in tree connect to the share: AD
This is what you get before your infinite loop.
             | NU   U
--------------------------------
3.10         | LF   LF
4.4          | LF   LF
master       | AD   LF
master+patch | AD   LF
No infinite DFS loop :(
All these issues result in mount failing very fast with permission denied.
I guess it could be from either the Windows version or the share/folder
ACL. A deeper analysis of the packets might reveal more.
In any case I did not notice any issues on a basic DFS setup with
the patch so I don't think it introduced any regressions, which is
probably all that matters. It still bothers me a little I couldn't hit
the bug.
I've included kernel output w/ debugging output and network capture of
my tests if anyone wants to have a look at it. (master+patch = ml-guestfix).
Signed-off-by: Mark Syms <mark.syms@citrix.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Tested-by: Aurelien Aptel <aaptel@suse.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Currently during receiving a read response mid->resp_buf can be
NULL when it is being passed to cifs_discard_remaining_data() from
cifs_readv_discard(). Fix it by always passing server->smallbuf
instead and initializing mid->resp_buf at the end of read response
processing.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Run this:
    touch file0
    for ((; ;))
    {
        mount -t cpuset xxx file0
    }
And this concurrently:
    touch file1
    for ((; ;))
    {
        mount -t cpuset xxx file1
    }
We'll trigger a warning like this:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 4675 at lib/percpu-refcount.c:317 percpu_ref_kill_and_confirm+0x92/0xb0
percpu_ref_kill_and_confirm called more than once on css_release!
CPU: 1 PID: 4675 Comm: mount Not tainted 4.11.0-rc5+ #5
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
Call Trace:
dump_stack+0x63/0x84
__warn+0xd1/0xf0
warn_slowpath_fmt+0x5f/0x80
percpu_ref_kill_and_confirm+0x92/0xb0
cgroup_kill_sb+0x95/0xb0
deactivate_locked_super+0x43/0x70
deactivate_super+0x46/0x60
...
---[ end trace a79f61c2a2633700 ]---
Here's a race:
  Thread A                              Thread B
  cgroup1_mount()
    # alloc a new cgroup root
    cgroup_setup_root()
                                        cgroup1_mount()
                                          # no sb yet, returns NULL
                                          kernfs_pin_sb()
                                          # but succeeds in getting the refcnt,
                                          # so re-use cgroup root
                                          percpu_ref_tryget_live()
    # alloc sb with cgroup root
    cgroup_do_mount()
  cgroup_kill_sb()
                                          # alloc another sb with same root
                                          cgroup_do_mount()
                                        cgroup_kill_sb()
We end up using the same cgroup root for two different superblocks,
so percpu_ref_kill() will be called twice on the same root when the
two superblocks are destroyed.
Fix this by making sure the superblock pinning really succeeded.
Cc: stable@vger.kernel.org # 3.16+
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
virtio-pci registers a per-vq affinity hint when using MSIX,
but fails to remove it when freeing the interrupt, resulting
in this type of splat:
[ 31.111202] WARNING: CPU: 0 PID: 2823 at kernel/irq/manage.c:1503 __free_irq+0x2c4/0x2c8
[ 31.114689] Modules linked in:
[ 31.116101] CPU: 0 PID: 2823 Comm: kexec Not tainted 4.10.0+ #6941
[ 31.118911] Hardware name: Generic DT based system
[ 31.121319] [<c022fb78>] (unwind_backtrace) from [<c0229d8c>] (show_stack+0x18/0x1c)
[ 31.125017] [<c0229d8c>] (show_stack) from [<c05192f4>] (dump_stack+0x84/0x98)
[ 31.128427] [<c05192f4>] (dump_stack) from [<c023d940>] (__warn+0xf4/0x10c)
[ 31.131910] [<c023d940>] (__warn) from [<c023da20>] (warn_slowpath_null+0x28/0x30)
[ 31.135543] [<c023da20>] (warn_slowpath_null) from [<c0290238>] (__free_irq+0x2c4/0x2c8)
[ 31.139355] [<c0290238>] (__free_irq) from [<c02902d0>] (free_irq+0x44/0x78)
[ 31.142909] [<c02902d0>] (free_irq) from [<c059d3a8>] (vp_del_vqs+0x68/0x1c0)
[ 31.146299] [<c059d3a8>] (vp_del_vqs) from [<c056ca4c>] (pci_device_shutdown+0x3c/0x78)
The obvious fix is to drop the affinity hint before freeing the
interrupt.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This reverts commit 5c34d002dc.
Conflicts:
drivers/virtio/virtio_pci_common.c
The cleanup seems to be one of the changes that broke
hibernation for some users. We are still not sure why
but the revert helps.
This reverts the cleanup changes but keeps the affinity support.
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This reverts commit 07ec51480b.
Conflicts:
drivers/virtio/virtio_pci_common.c
Unfortunately the idea does not work with threadirqs
as more than 32 queues can then map to a single interrupt.
Further, the cleanup seems to be one of the changes that broke
hibernation for some users. We are still not sure why
but the revert helps.
This reverts the cleanup changes but keeps the affinity support.
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This reverts commit 53a020c661.
The cleanup seems to be one of the changes that broke
hibernation for some users. We are still not sure why
but the revert helps.
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This reverts commit 52a6151612.
Conflicts:
drivers/virtio/virtio_pci_common.c
The cleanup seems to be one of the changes that broke
hibernation for some users. We are still not sure why
but the revert helps.
This reverts the cleanup changes but keeps the affinity support.
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This reverts commit de85ec8b07.
Follow-up patches will revert 07ec51480b ("virtio_pci: use shared
interrupts for virtqueues") that triggered the problem so no need for
this one anymore.
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The retry queue is intended to provide a temporary buffer in the case
of transient errors when communicating with auditd. It is not meant
as a long-lived queue; that functionality is provided by the hold
queue.
This patch fixes a problem identified by Seth where the retry queue
could grow uncontrollably if an auditd instance did not connect to
the kernel to drain the queues. This commit fixes this by doing the
following:
* Make sure we always call auditd_reset() if we decide the connection
with audit is really dead. There were some cases in
kauditd_hold_skb() where we did not reset the connection, this patch
relocates the reset calls to kauditd_thread() so all the error
conditions are caught and the connection reset. As a side effect,
this means we could move auditd_reset() and get rid of the forward
definition at the top of kernel/audit.c.
* We never checked the status of the auditd connection when
processing the main audit queue which meant that the retry queue
could grow unchecked. This patch adds a call to auditd_reset()
after the main queue has been processed if auditd is not connected,
the auditd_reset() call will make sure the retry and hold queues are
correctly managed/flushed so that the retry queue remains reasonable.
Cc: <stable@vger.kernel.org> # 4.10.x-: 5b52330bbf
Reported-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Commit 1259feddd0f8 ("pinctrl: samsung: Fix the width of
PINCFG_TYPE_DRV bitfields for Exynos5433") already fixed
the different width of PINCFG_TYPE_DRV compared to previous Exynos SoCs.
However, the wrong merge conflict resolution was chosen in commit
7f36f5d11c ("Merge tag 'v4.10-rc6' into devel"), effectively dropping
the changes for PINCFG_TYPE_DRV. Re-do them here.
The macro EXYNOS_PIN_BANK_EINTW is no longer used so remove it.
Fixes: 7f36f5d11c ("Merge tag 'v4.10-rc6' into devel")
Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Pull CIFS fixes from Steve French:
"This is a set of CIFS/SMB3 fixes for stable.
There is another set of four SMB3 reconnect fixes for stable in
progress but they are still being reviewed/tested, so didn't want to
wait any longer to send these five below"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
Reset TreeId to zero on SMB2 TREE_CONNECT
CIFS: Fix build failure with smb2
Introduce cifs_copy_file_range()
SMB3: Rename clone_range to copychunk_range
Handle mismatched open calls
Pull ARM fixes from Russell King:
"A number of ARM fixes:
- prevent oopses caused by dma_get_sgtable() and declared DMA
coherent memory
- fix boot failure on nommu caused by ID_PFR1 access
- a number of kprobes fixes from Jon Medhurst and Masami Hiramatsu"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8665/1: nommu: access ID_PFR1 only if CPUID scheme
ARM: dma-mapping: disallow dma_get_sgtable() for non-kernel managed memory
arm: kprobes: Align stack to 8-bytes in test code
arm: kprobes: Fix the return address of multiple kretprobes
arm: kprobes: Skip single-stepping in recursing path if possible
arm: kprobes: Allow to handle reentered kprobe on single-stepping
Here are 3 small fixes for 4.11-rc6. One resolves a reported issue with
sysfs files that NeilBrown found, one is a documentation fix for the
stable kernel rules, and the last is a small MAINTAINERS file update for
kernfs.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'driver-core-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are 3 small fixes for 4.11-rc6.
One resolves a reported issue with sysfs files that NeilBrown found,
one is a documentation fix for the stable kernel rules, and the last
is a small MAINTAINERS file update for kernfs"
* tag 'driver-core-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
MAINTAINERS: separate out kernfs maintainership
sysfs: be careful of error returns from ops->show()
Documentation: stable-kernel-rules: fix stable-tag format
Here are a number of small IIO and staging driver fixes for 4.11-rc6.
Nothing big here, just iio fixes for reported issues, and an ashmem fix
for a very old bug that has been reported by a number of Android
vendors.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'staging-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO driver fixes from Greg KH:
"Here are a number of small IIO and staging driver fixes for 4.11-rc6.
Nothing big here, just iio fixes for reported issues, and an ashmem
fix for a very old bug that has been reported by a number of Android
vendors"
* tag 'staging-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: android: ashmem: lseek failed due to no FMODE_LSEEK.
iio: hid-sensor-attributes: Fix sensor property setting failure.
iio: accel: hid-sensor-accel-3d: Fix duplicate scan index error
iio: core: Fix IIO_VAL_FRACTIONAL_LOG2 for negative values
iio: st_pressure: initialize lps22hb bootime
iio: bmg160: reset chip when probing
iio: cros_ec_sensors: Fix return value to get raw and calibbias data.
Pull VFS fixes from Al Viro:
"statx followup fixes and a fix for stack-smashing on alpha"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
alpha: fix stack smashing in old_adjtimex(2)
statx: Include a mask for stx_attributes in struct statx
statx: Reserve the top bit of the mask for future struct expansion
xfs: report crtime and attribute flags to statx
ext4: Add statx support
statx: optimize copy of struct statx to userspace
statx: remove incorrect part of vfs_statx() comment
statx: reject unknown flags when using NULL path
Documentation/filesystems: fix documentation for ->getattr()
Pull block fixes from Jens Axboe:
"Here's a pull request for 4.11-rc, fixing a set of issues mostly
centered around the new scheduling framework. These have been brewing
for a while, but split up into what we absolutely need in 4.11, and
what we can defer until 4.12. These are well tested, on both single
queue and multiqueue setups, and with and without shared tags. They
fix several hangs that have happened in testing.
This is obviously larger than I would have preferred at this point in
time, but I don't think we can shave much off this and still get the
desired results.
In detail, this pull request contains:
- a set of five fixes for NVMe, mostly from Christoph and one from
Roland.
- a series from Bart, fixing issues with dm-mq and SCSI shared tags
and scheduling. Note that one of those patches' commit messages may
read like an optimization, but it is in fact an important fix for
queue restarts in particular.
- a series from Omar, most importantly fixing a hang with multiple
hardware queues when we fail to get a driver tag. Another important
fix in there is for resizing hardware queues, which nbd does when
handling multiple sockets for one connection.
- fixing an imbalance in putting the ctx for hctx request allocations
from Minchan"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: Restart a single queue if tag sets are shared
dm rq: Avoid that request processing stalls sporadically
scsi: Avoid that SCSI queues get stuck
blk-mq: Introduce blk_mq_delay_run_hw_queue()
blk-mq: remap queues when adding/removing hardware queues
blk-mq-sched: fix crash in switch error path
blk-mq-sched: set up scheduler tags when bringing up new queues
blk-mq-sched: refactor scheduler initialization
blk-mq: use the right hctx when getting a driver tag fails
nvmet: fix byte swap in nvmet_parse_io_cmd
nvmet: fix byte swap in nvmet_execute_write_zeroes
nvmet: add missing byte swap in nvmet_get_smart_log
nvme: add missing byte swap in nvme_setup_discard
nvme: Correct NVMF enum values to match NVMe-oF rev 1.0
block: do not put mq context in blk_mq_alloc_request_hctx
An issue was detected with pin control on the Freescale i.MX after
the refactorings for more general group and function handling. We now
have the proper fix for this.
Merge tag 'pinctrl-v4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fix from Linus Walleij:
"This late fix for pin control is hopefully the last I send this cycle.
The problem was detected early in the v4.11 release cycle and there
has been some back and forth on how to solve it. Sadly the proper fix
arrives late, but at least not too late.
An issue was detected with pin control on the Freescale i.MX after the
refactorings for more general group and function handling.
We now have the proper fix for this"
* tag 'pinctrl-v4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: core: Fix pinctrl_register_and_init() with pinctrl_enable()
Headed to stable:
- disable HFSCR[TM] if TM is not supported, fixes a potential host kernel crash
triggered by a hostile guest, but only in configurations that no one uses
- don't try to fix up misaligned load-with-reservation instructions
- fix flush_(d|i)cache_range() called from modules on little endian kernels
- add missing global TLB invalidate if cxl is active
- fix missing preempt_disable() in crc32c-vpmsum
And a fix for selftests build changes that went in this release:
- selftests/powerpc: Fix standalone powerpc build
Thanks to:
Benjamin Herrenschmidt, Frederic Barrat, Oliver O'Halloran, Paul Mackerras.
Merge tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 4.11:
Headed to stable:
- disable HFSCR[TM] if TM is not supported, fixes a potential host
kernel crash triggered by a hostile guest, but only in
configurations that no one uses
- don't try to fix up misaligned load-with-reservation instructions
- fix flush_(d|i)cache_range() called from modules on little endian
kernels
- add missing global TLB invalidate if cxl is active
- fix missing preempt_disable() in crc32c-vpmsum
And a fix for selftests build changes that went in this release:
- selftests/powerpc: Fix standalone powerpc build
Thanks to: Benjamin Herrenschmidt, Frederic Barrat, Oliver O'Halloran,
Paul Mackerras"
* tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/crypto/crc32c-vpmsum: Fix missing preempt_disable()
powerpc/mm: Add missing global TLB invalidate if cxl is active
powerpc/64: Fix flush_(d|i)cache_range() called from modules
powerpc: Don't try to fix up misaligned load-with-reservation instructions
powerpc: Disable HFSCR[TM] if TM is not supported
selftests/powerpc: Fix standalone powerpc build
In the case that compat_get_bitmap fails we do not want to copy the
bitmap to the user as it will contain uninitialized stack data and leak
sensitive data.
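A minimal userspace sketch of the idea, assuming a helper that can fail without filling its destination. sketch_get_bitmap() and sketch_set_policy() are hypothetical names, and memcpy() merely stands in for the copy to user space.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for the bitmap fetch: on failure dst is left untouched. */
static int sketch_get_bitmap(unsigned long *dst, const unsigned long *src,
                             size_t n)
{
    if (!src)
        return -EFAULT;
    memcpy(dst, src, n * sizeof(*dst));
    return 0;
}

/* Bail out on failure so the uninitialized stack buffer is never copied out. */
static int sketch_set_policy(unsigned long *user_out,
                             const unsigned long *user_in, size_t n)
{
    unsigned long bm[4]; /* deliberately uninitialized, as in the scenario */

    if (n > 4)
        return -EINVAL;
    if (sketch_get_bitmap(bm, user_in, n))
        return -EFAULT; /* nothing from bm reaches the caller */
    memcpy(user_out, bm, n * sizeof(*bm));
    return 0;
}
```

The point of the early return is that the output buffer is only written once bm is known to hold valid data.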
Signed-off-by: Chris Salls <salls@cs.ucsb.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, inputting the following command will succeed but actually the
value will be truncated:
# echo 0x12ffffffff > /proc/sys/net/ipv4/tcp_notsent_lowat
This is not friendly to the user, so instead, we should report an error
when the value is larger than UINT_MAX.
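The difference between truncating and rejecting can be shown with a small userspace sketch. sketch_parse_u32() is a hypothetical helper built on strtoull(); it is not the sysctl parsing code, and returning -EINVAL is an assumption of this example.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse a number into an unsigned int, rejecting values above UINT_MAX. */
static int sketch_parse_u32(const char *s, unsigned int *out)
{
    char *end;
    unsigned long long v;

    errno = 0;
    v = strtoull(s, &end, 0); /* base 0 accepts the 0x prefix */
    if (errno || end == s)
        return -EINVAL;
    if (v > UINT_MAX)
        return -EINVAL; /* report an error instead of silently truncating */
    *out = (unsigned int)v;
    return 0;
}
```

Without the range check, the cast alone would keep only the low 32 bits of the parsed value.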
Fixes: e7d316a02f ("sysctl: handle error writing UINT_MAX to u32 fields")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Separate out kernfs from driver core and add myself as a
co-maintainer.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ops->show() can return a negative error code.
Commit 65da3484d9 ("sysfs: correctly handle short reads on PREALLOC attrs.")
(in v4.4) caused this to be stored in an unsigned 'size_t' variable, so errors
would look like large numbers.
As a result, if an error is returned, sysfs_kf_read() will return the
value of 'count', typically 4096.
Commit 17d0774f80 ("sysfs: correctly handle read offset on PREALLOC attrs")
(in v4.8) extended this error to use the unsigned large 'len' as a size for
memmove().
Consequently, if ->show returns an error, then the first read() on the
sysfs file will return 4096 and could return uninitialized memory to
user-space.
If the application performs a subsequent read, this will trigger a memmove()
with an extremely large count, and is likely to crash the machine in bizarre ways.
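The signedness problem can be reproduced in a few lines of userspace C. The function names are hypothetical and sketch_show() simply models a ->show() callback that always fails; this is a sketch of the pattern, not the sysfs code.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h> /* ssize_t (POSIX) */

/* Hypothetical ->show() that fails with a negative error code. */
static ssize_t sketch_show(void)
{
    return -ENODEV;
}

/* Buggy shape: the error lands in an unsigned size_t and looks huge. */
static ssize_t read_buggy(size_t count)
{
    size_t len = (size_t)sketch_show();

    return (ssize_t)(count < len ? count : len); /* min(count, len) */
}

/* Fixed shape: keep the result signed and return errors before using it. */
static ssize_t read_fixed(size_t count)
{
    ssize_t len = sketch_show();

    if (len < 0)
        return len;
    return (ssize_t)(count < (size_t)len ? count : (size_t)len);
}
```

Because the converted error is larger than any realistic count, the buggy variant reports the full count as bytes read, which matches the 4096 described above.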
This bug can currently only be triggered by reading from an md
sysfs attribute declared with __ATTR_PREALLOC() during the
brief period between when mddev_put() deletes an mddev from
the ->all_mddevs list, and when mddev_delayed_delete() - which is
scheduled on a workqueue - completes.
Before this, an error won't be returned by the ->show().
After this, the ->show() won't be called.
I can reproduce it reliably only by adding a delay such as
usleep_range(500000,700000);
early in mddev_delayed_delete(). Then after creating an
md device md0 run
echo clear > /sys/block/md0/md/array_state; cat /sys/block/md0/md/array_state
The bug can be triggered without the usleep.
Fixes: 65da3484d9 ("sysfs: correctly handle short reads on PREALLOC attrs.")
Fixes: 17d0774f80 ("sysfs: correctly handle read offset on PREALLOC attrs")
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A patch documenting how to specify which kernels a particular fix should
be backported to (seemingly) inadvertently added a minus sign after the
kernel version. This particular stable-tag format had never been used
prior to this patch, and was not present when the patch in question
was first submitted (it was added in v2 without any comment).
Drop the minus sign to avoid any confusion.
Fixes: fdc81b7910 ("stable_kernel_rules: Add clause about specification of kernel versions to patch.")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
vfs_llseek() checks whether the file mode has FMODE_LSEEK set
and returns failure if it does not. ashmem files do support
lseek, so add FMODE_LSEEK to the ashmem file.
Comment From Greg Hackmann:
ashmem_llseek() passes the llseek() call through to the backing
shmem file. 91360b02ab ("ashmem: use vfs_llseek()") changed
this from directly calling the file's llseek() op into a VFS
layer call. This also adds a check for the FMODE_LSEEK bit, so
without that bit ashmem_llseek() now always fails with -ESPIPE.
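The effect of the missing mode bit can be sketched in userspace as
follows. This is only a model: MODEL_FMODE_LSEEK, struct model_file and
model_vfs_llseek() are hypothetical names, and the bit value is arbitrary
rather than the kernel's FMODE_LSEEK.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical mode bit; the value is arbitrary for this model. */
#define MODEL_FMODE_LSEEK 0x4u

/* Hypothetical, heavily reduced stand-in for struct file. */
struct model_file {
	unsigned int f_mode;
	long long pos;
};

/* Models a VFS-layer seek that refuses files without the seek bit. */
static long long model_vfs_llseek(struct model_file *file, long long offset)
{
	assert(file != NULL);
	if (!(file->f_mode & MODEL_FMODE_LSEEK))
		return -ESPIPE;
	file->pos = offset;
	return file->pos;
}
```

A file created without the bit fails every seek with -ESPIPE, and the
same call succeeds once the bit is set in f_mode.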
Fixes: 91360b02ab ("ashmem: use vfs_llseek()")
Signed-off-by: Shuxiao Zhang <zhangshuxiao@xiaomi.com>
Tested-by: Greg Hackmann <ghackmann@google.com>
Cc: stable <stable@vger.kernel.org> # 3.18+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull sparc fixes from David Miller:
"Several fixes here, mostly having to do with either build errors or
memory corruptions depending upon whether you have THP enabled or not"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: remove unused wp_works_ok macro
sparc32: Export vac_cache_size to fix build error
sparc64: Fix memory corruption when THP is enabled
sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write()
arch/sparc: Avoid DCTI Couples
sparc64: kern_addr_valid regression
sparc64: Add support for 2G hugepages
sparc64: Fix size check in huge_pte_alloc
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM:
- Fix a problem with GICv3 userspace save/restore
- Clarify GICv2 userspace save/restore ABI
- Be more careful in clearing GIC LRs
- Add missing synchronization primitive to our MMU handling code
PPC:
- Check for a NULL return from kzalloc
s390:
- Prevent translation exception errors on valid page tables for the
instruction-execution-protection support
x86:
- Fix Page-Modification Logging when running a nested guest"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PPC: Book3S HV: Check for kmalloc errors in ioctl
KVM: nVMX: initialize PML fields in vmcs02
KVM: nVMX: do not leak PML full vmexit to L1
KVM: arm/arm64: vgic: Fix GICC_PMR uaccess on GICv3 and clarify ABI
KVM: arm64: Ensure LRs are clear when they should be
kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
KVM: s390: remove change-recording override support
arm/arm64: KVM: Take mmap_sem in kvm_arch_prepare_memory_region
arm/arm64: KVM: Take mmap_sem in stage2_unmap_vm
Pull audit cleanup from Paul Moore:
"A week later than I had hoped, but as promised, here is the audit
uninline-fix we talked about during the last audit pull request.
The patch is slightly different than what we originally discussed as
it made more sense to keep the audit_signal_info() function in
auditsc.c rather than move it and bunch of other related
variables/definitions into audit.c/audit.h.
At some point in the future I need to look at how the audit code is
organized across kernel/audit*, I suspect we could do things a bit
better, but it doesn't seem like a -rc release is a good place for
that ;)
Regardless, this patch passes our tests without problem and looks good
for v4.11"
* 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit:
audit: move audit_signal_info() into kernel/auditsc.c
Merge misc fixes from Andrew Morton:
"10 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: move pcp and lru-pcp draining into single wq
mailmap: update Yakir Yang email address
mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()
dax: fix radix tree insertion race
mm, thp: fix setting of defer+madvise thp defrag mode
ptrace: fix PTRACE_LISTEN race corrupting task->state
vmlinux.lds: add missing VMLINUX_SYMBOL macros
mm/page_alloc.c: fix print order in show_free_areas()
userfaultfd: report actual registered features in fdinfo
mm: fix page_vma_mapped_walk() for ksm pages
We currently have 2 specific WQ_RECLAIM workqueues in the mm code.
vmstat_wq for updating pcp stats and lru_add_drain_wq dedicated to drain
per cpu lru caches. This seems more than necessary because both can run
on a single WQ. Both do not block on locks requiring a memory
allocation nor perform any allocations themselves. We will save one
rescuer thread this way.
On the other hand, drain_all_pages() queues work on the system wq, which
doesn't have a rescuer, so this depends on memory allocation (when all
workers are stuck allocating and new ones cannot be created).
Initially we thought this would be more of a theoretical problem but
Hugh Dickins has reported:
: 4.11-rc has been giving me hangs after hours of swapping load. At
: first they looked like memory leaks ("fork: Cannot allocate memory");
: but for no good reason I happened to do "cat /proc/sys/vm/stat_refresh"
: before looking at /proc/meminfo one time, and the stat_refresh stuck
: in D state, waiting for completion of flush_work like many kworkers.
: kthreadd waiting for completion of flush_work in drain_all_pages().
This worker should be using WQ_RECLAIM as well in order to guarantee
forward progress. We can reuse the same one as for lru draining and
vmstat.
Link: http://lkml.kernel.org/r/20170307131751.24936-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Yang Li <pku.leo@gmail.com>
Tested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We got need_resched() warnings in swap_cgroup_swapoff() because
swap_cgroup_ctrl[type].length is particularly large.
Reschedule when needed.
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1704061315270.80559@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While running generic/340 in my test setup I hit the following race. It
can happen with kernels that support FS DAX PMDs, so v4.10 through
v4.11-rc5.
Thread 1                                  Thread 2
--------                                  --------
dax_iomap_pmd_fault()
  grab_mapping_entry()
    spin_lock_irq()
    get_unlocked_mapping_entry()
    'entry' is NULL, can't call lock_slot()
    spin_unlock_irq()
    radix_tree_preload()
                                          dax_iomap_pmd_fault()
                                            grab_mapping_entry()
                                              spin_lock_irq()
                                              get_unlocked_mapping_entry()
                                              ...
                                              lock_slot()
                                              spin_unlock_irq()
                                            dax_pmd_insert_mapping()
                                              <inserts a PMD mapping>
    spin_lock_irq()
    __radix_tree_insert() fails with -EEXIST
    <fall back to 4k fault, and die horribly
     when inserting a 4k entry where a PMD exists>
The issue is that we have to drop mapping->tree_lock while calling
radix_tree_preload(), but since we didn't have a radix tree entry to
lock (unlike in the pmd_downgrade case) we have no protection against
Thread 2 coming along and inserting a PMD at the same index. For 4k
entries we handled this with a special-case response to -EEXIST coming
from the __radix_tree_insert(), but this doesn't save us for PMDs
because the -EEXIST case can also mean that we collided with a 4k entry
in the radix tree at a different index, but one that is covered by our
PMD range.
So, correctly handle both the 4k and 2M collision cases by explicitly
re-checking the radix tree for an entry at our index once we reacquire
mapping->tree_lock.
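The "re-check after reacquiring the lock" pattern can be sketched with a
single-threaded userspace model. Everything here is hypothetical
(struct model_tree, model_insert(), model_racer()); the lock is a plain
flag, the other thread is modelled by a callback that runs in the window
where the lock is dropped, and the model simply reports -EEXIST rather
than showing what the real fault path does after a collision.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_SLOTS 8

/* Hypothetical stand-in for a lock-protected index -> entry mapping. */
struct model_tree {
	int locked;		/* models mapping->tree_lock */
	void *slot[MODEL_SLOTS];
};

static void model_lock(struct model_tree *t)
{
	assert(!t->locked);
	t->locked = 1;
}

static void model_unlock(struct model_tree *t)
{
	assert(t->locked);
	t->locked = 0;
}

static int racer_entry;

/* Models Thread 2 populating index 3 while Thread 1 has dropped the lock. */
static void model_racer(struct model_tree *t)
{
	model_lock(t);
	t->slot[3] = &racer_entry;
	model_unlock(t);
}

/*
 * Insert 'entry' at 'index'. 'unlocked_window' models whatever another
 * thread may do while the lock is dropped for preloading; it may be NULL.
 * Returns 0 on success, or -EEXIST if the slot was populated meanwhile.
 */
static int model_insert(struct model_tree *t, unsigned int index, void *entry,
			void (*unlocked_window)(struct model_tree *))
{
	assert(index < MODEL_SLOTS);

	model_lock(t);
	if (t->slot[index]) {
		model_unlock(t);
		return -EEXIST;
	}
	model_unlock(t);	/* lock dropped, as for a preload */

	if (unlocked_window)
		unlocked_window(t);

	model_lock(t);
	if (t->slot[index]) {	/* re-check after reacquiring the lock */
		model_unlock(t);
		return -EEXIST;
	}
	t->slot[index] = entry;
	model_unlock(t);
	return 0;
}
```

Passing model_racer as the callback for index 3 makes the second check
fire, so the caller's entry never overwrites the one inserted during the
unlocked window.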
This patch has made it through a clean xfstests run with the current
v4.11-rc5 based linux/master, and it also ran generic/340 500 times in a
loop. It used to fail within the first 10 iterations.
Link: http://lkml.kernel.org/r/20170406212944.2866-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: <stable@vger.kernel.org> [4.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Setting thp defrag mode of "defer+madvise" actually sets "defer" in the
kernel due to the name similarity and the out-of-order way the string is
checked in defrag_store().
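The ordering problem can be sketched in userspace as below. This is a
simplified model and not the comparison used in defrag_store(): it
assumes a plain "input starts with name" test, and starts_with(),
parse_buggy(), parse_fixed() and the enum values are hypothetical names.

```c
#include <assert.h>
#include <string.h>

enum model_mode { MODE_NONE, MODE_DEFER, MODE_DEFER_MADVISE };

/* True when 'buf' begins with 'name' (simplified prefix match). */
static int starts_with(const char *buf, const char *name)
{
	assert(buf != NULL);
	return strncmp(buf, name, strlen(name)) == 0;
}

/* Wrong order: "defer" is tested first and also matches "defer+madvise". */
static enum model_mode parse_buggy(const char *buf)
{
	if (starts_with(buf, "defer"))
		return MODE_DEFER;
	if (starts_with(buf, "defer+madvise"))
		return MODE_DEFER_MADVISE;
	return MODE_NONE;
}

/* Correct order: the longer name is tested before its own prefix. */
static enum model_mode parse_fixed(const char *buf)
{
	if (starts_with(buf, "defer+madvise"))
		return MODE_DEFER_MADVISE;
	if (starts_with(buf, "defer"))
		return MODE_DEFER;
	return MODE_NONE;
}
```

In the buggy ordering the second branch is unreachable for any input
that begins with "defer", which is why the longer name has to be tested
first.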
Check the string in the correct order so that
TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG is set appropriately for
"defer+madvise".
Fixes: 21440d7eb9 ("mm, thp: add new defer+madvise defrag option")
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1704051814420.137626@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In PT_SEIZED + LISTEN mode STOP/CONT signals cause a wakeup against
__TASK_TRACED. If this races with the ptrace_unfreeze_traced at the end
of a PTRACE_LISTEN, this can wake the task /after/ the check against
__TASK_TRACED, but before the reset of state to TASK_TRACED. This
causes it to instead clobber TASK_WAKING, allowing a subsequent wakeup
against TRACED while the task is still on the rq wake_list, corrupting
it.
Oleg said:
"The kernel can crash or this can lead to other hard-to-debug problems.
In short, "task->state = TASK_TRACED" in ptrace_unfreeze_traced()
assumes that nobody else can wake it up, but PTRACE_LISTEN breaks the
contract. Obviously it is very wrong to manipulate task->state if this
task is already running, or WAKING, or it sleeps again"
[akpm@linux-foundation.org: coding-style fixes]
Fixes: 9899d11f ("ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL")
Link: http://lkml.kernel.org/r/xm26y3vfhmkp.fsf_-_@bsegall-linux.mtv.corp.google.com
Signed-off-by: Ben Segall <bsegall@google.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>