The Cavium guys reported a soft lockup on their arm64 machine, caused by
commit c55a6ffa62 ("locking/osq: Relax atomic semantics"):
mutex_optimistic_spin+0x9c/0x1d0
__mutex_lock_slowpath+0x44/0x158
mutex_lock+0x54/0x58
kernfs_iop_permission+0x38/0x70
__inode_permission+0x88/0xd8
inode_permission+0x30/0x6c
link_path_walk+0x68/0x4d4
path_openat+0xb4/0x2bc
do_filp_open+0x74/0xd0
do_sys_open+0x14c/0x228
SyS_openat+0x3c/0x48
el0_svc_naked+0x24/0x28
This is because in osq_lock we initialise the node for the current CPU:
node->locked = 0;
node->next = NULL;
node->cpu = curr;
and then publish the current CPU in the lock tail:
old = atomic_xchg_acquire(&lock->tail, curr);
Once the update to lock->tail is visible to another CPU, the node is
then live and can be both read and updated by concurrent lockers.
Unfortunately, the ACQUIRE semantics of the xchg operation mean that
there is no guarantee the contents of the node will be visible before
lock tail is updated. This can lead to lock corruption when, for
example, a concurrent locker races to set the next field.
Fixes: c55a6ffa62 ("locking/osq: Relax atomic semantics"):
Reported-by: David Daney <ddaney@caviumnetworks.com>
Reported-by: Andrew Pinski <andrew.pinski@caviumnetworks.com>
Tested-by: Andrew Pinski <andrew.pinski@caviumnetworks.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1449856001-21177-1-git-send-email-will.deacon@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull libnvdimm fixes from Dan Williams:
- Two bug fixes for misuse of PAGE_MASK in scatterlist and dma-debug.
These are tagged for -stable. The scatterlist impact is potentially
corrupted dma addresses on HIGHMEM enabled platforms.
- A minor locking fix for the NFIT hot-add implementation that is new
in 4.4-rc. This would only trigger in the case a hot-add raced
driver removal.
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dma-debug: Fix dma_debug_entry offset calculation
Revert "scatterlist: use sg_phys()"
nfit: acpi_nfit_notify(): Do not leave device locked
commit e20538b82f
("gpio: Propagate errors from chip->get()")
started to propagate errors from the .get() functions since
we can get errors from the infrastructure of e.g. slowbus
GPIO expanders.
However it turns out a bunch of drivers relied on the core
to clamp the value, so we need to revert to the old behaviour
and go over all drivers and fix them to conform to the
expectations of the core before we go back to propagating
the error code.
Cc: stable@vger.kernel.org # 4.3+
Cc: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Cc: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Fixes: e20538b82f ("gpio: Propagate errors from chip->get()")
Reported-by: Michael Trimarchi <michael@amarulasolutions.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
The bgpio_get_set() call should return a value clamped to [0,1],
the current code will return a negative value if reading
bit 31, which turns the value negative as this is a signed value
and thus gets interpreted as an error by the gpiolib core.
Found on the gpio-mxc but applies to any MMIO driver.
Cc: stable@vger.kernel.org # 4.3+
Cc: kernel@pengutronix.de
Cc: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Fixes: e20538b82f ("gpio: Propagate errors from chip->get()")
Reported-by: Clemens Gruber <clemens.gruber@pqgruber.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
When doing a direct IO write, __blockdev_direct_IO() can call the
btrfs_get_blocks_direct() callback one or more times before it calls the
btrfs_submit_direct() callback. However it can fail after calling the
first callback and before calling the second callback, which is a problem
because the first one creates ordered extents and the second one is the
one that submits bios that cover the ordered extents created by the first
one. That means the ordered extents will never complete nor have any of
the flags BTRFS_ORDERED_IO_DONE / BTRFS_ORDERED_IOERR set, resulting in
subsequent operations (such as other direct IO writes, buffered writes or
hole punching) that lock the same IO range and lookup for ordered extents
in the range to hang forever waiting for those ordered extents because
they can not complete ever, since no bio was submitted.
Fix this by tracking a range of created ordered extents that don't have
yet corresponding bios submitted and completing the ordered extents in
the range if __blockdev_direct_IO() fails with an error.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
If readpages() (triggered by defrag or buffered reads) is called while a
direct IO write is in progress, we have a small time window where we can
deadlock, resulting in traces like the following being generated:
[84723.212993] INFO: task fio:2849 blocked for more than 120 seconds.
[84723.214310] Tainted: G W 4.3.0-rc5-btrfs-next-17+ #1
[84723.215640] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[84723.217313] fio D ffff88023ec75218 0 2849 2835 0x00000000
[84723.218778] ffff880122dfb6e8 0000000000000092 0000000000000000 ffff88023ec75200
[84723.220458] ffff88000e05d2c0 ffff880122dfc000 ffff88023ec75200 7fffffffffffffff
[84723.230597] 0000000000000002 ffffffff8147891a ffff880122dfb700 ffffffff8147856a
[84723.232085] Call Trace:
[84723.232625] [<ffffffff8147891a>] ? bit_wait+0x3c/0x3c
[84723.233529] [<ffffffff8147856a>] schedule+0x7d/0x95
[84723.234398] [<ffffffff8147baa3>] schedule_timeout+0x43/0x10b
[84723.235384] [<ffffffff810f82eb>] ? time_hardirqs_on+0x15/0x28
[84723.236426] [<ffffffff8108a23d>] ? trace_hardirqs_on+0xd/0xf
[84723.237502] [<ffffffff810af8a3>] ? read_seqcount_begin.constprop.20+0x57/0x6d
[84723.238807] [<ffffffff8108a09b>] ? trace_hardirqs_on_caller+0x16/0x1ab
[84723.242012] [<ffffffff8108a23d>] ? trace_hardirqs_on+0xd/0xf
[84723.243064] [<ffffffff810af2ad>] ? timekeeping_get_ns+0xe/0x33
[84723.244116] [<ffffffff810afa2e>] ? ktime_get+0x41/0x52
[84723.245029] [<ffffffff81477cff>] io_schedule_timeout+0xb7/0x12b
[84723.245942] [<ffffffff81477cff>] ? io_schedule_timeout+0xb7/0x12b
[84723.246596] [<ffffffff81478953>] bit_wait_io+0x39/0x45
[84723.247503] [<ffffffff81478b93>] __wait_on_bit_lock+0x49/0x8d
[84723.248540] [<ffffffff8111684f>] __lock_page+0x66/0x68
[84723.249558] [<ffffffff81081c9b>] ? autoremove_wake_function+0x3a/0x3a
[84723.250844] [<ffffffff81124a04>] lock_page+0x2c/0x2f
[84723.251871] [<ffffffff81124afc>] invalidate_inode_pages2_range+0xf5/0x2aa
[84723.253274] [<ffffffff81117c34>] ? filemap_fdatawait_range+0x12d/0x146
[84723.254757] [<ffffffff81118191>] ? filemap_fdatawrite_range+0x13/0x15
[84723.256378] [<ffffffffa05139a2>] btrfs_get_blocks_direct+0x1b0/0x664 [btrfs]
[84723.258556] [<ffffffff8119e3f9>] ? submit_page_section+0x7b/0x111
[84723.260064] [<ffffffff8119eb90>] do_blockdev_direct_IO+0x658/0xbdb
[84723.261479] [<ffffffffa05137f2>] ? btrfs_page_exists_in_range+0x1a9/0x1a9 [btrfs]
[84723.262961] [<ffffffffa050a8a6>] ? btrfs_writepage_start_hook+0xce/0xce [btrfs]
[84723.264449] [<ffffffff8119f144>] __blockdev_direct_IO+0x31/0x33
[84723.265614] [<ffffffff8119f144>] ? __blockdev_direct_IO+0x31/0x33
[84723.266769] [<ffffffffa050a8a6>] ? btrfs_writepage_start_hook+0xce/0xce [btrfs]
[84723.268264] [<ffffffffa050935d>] btrfs_direct_IO+0x1b9/0x259 [btrfs]
[84723.270954] [<ffffffffa050a8a6>] ? btrfs_writepage_start_hook+0xce/0xce [btrfs]
[84723.272465] [<ffffffff8111878c>] generic_file_direct_write+0xb3/0x128
[84723.273734] [<ffffffffa051955c>] btrfs_file_write_iter+0x228/0x404 [btrfs]
[84723.275101] [<ffffffff8116ca6f>] __vfs_write+0x7c/0xa5
[84723.276200] [<ffffffff8116cfab>] vfs_write+0xa0/0xe4
[84723.277298] [<ffffffff8116d79d>] SyS_write+0x50/0x7e
[84723.278327] [<ffffffff8147cd97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[84723.279595] INFO: lockdep is turned off.
[84723.379035] INFO: task btrfs:2923 blocked for more than 120 seconds.
[84723.380323] Tainted: G W 4.3.0-rc5-btrfs-next-17+ #1
[84723.381608] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[84723.383003] btrfs D ffff88023ed75218 0 2923 2859 0x00000000
[84723.384277] ffff88001311f860 0000000000000082 ffff88001311f840 ffff88023ed75200
[84723.385748] ffff88012c6751c0 ffff880013120000 ffff88012042fe68 ffff88012042fe30
[84723.387152] ffff880221571c88 0000000000000001 ffff88001311f878 ffffffff8147856a
[84723.388620] Call Trace:
[84723.389105] [<ffffffff8147856a>] schedule+0x7d/0x95
[84723.391882] [<ffffffffa051da32>] btrfs_start_ordered_extent+0x161/0x1fa [btrfs]
[84723.393718] [<ffffffff81081c61>] ? signal_pending_state+0x31/0x31
[84723.395659] [<ffffffffa0522c5b>] __do_contiguous_readpages.constprop.21+0x81/0xdc [btrfs]
[84723.397383] [<ffffffffa050ac96>] ? btrfs_submit_direct+0x3f0/0x3f0 [btrfs]
[84723.398852] [<ffffffffa0522da3>] __extent_readpages.constprop.20+0xed/0x100 [btrfs]
[84723.400561] [<ffffffff81123f6c>] ? __lru_cache_add+0x5d/0x72
[84723.401787] [<ffffffffa0523896>] extent_readpages+0x111/0x1a7 [btrfs]
[84723.403121] [<ffffffffa050ac96>] ? btrfs_submit_direct+0x3f0/0x3f0 [btrfs]
[84723.404583] [<ffffffffa05088fa>] btrfs_readpages+0x1f/0x21 [btrfs]
[84723.406007] [<ffffffff811226df>] __do_page_cache_readahead+0x168/0x1f4
[84723.407502] [<ffffffff81122988>] ondemand_readahead+0x21d/0x22e
[84723.408937] [<ffffffff81122988>] ? ondemand_readahead+0x21d/0x22e
[84723.410487] [<ffffffff81122af1>] page_cache_sync_readahead+0x3d/0x3f
[84723.411710] [<ffffffffa0535388>] btrfs_defrag_file+0x419/0xaaf [btrfs]
[84723.413007] [<ffffffffa0531db0>] ? kzalloc+0xf/0x11 [btrfs]
[84723.414085] [<ffffffffa0535b43>] btrfs_ioctl_defrag+0x125/0x14e [btrfs]
[84723.415307] [<ffffffffa0536753>] btrfs_ioctl+0x746/0x24c6 [btrfs]
[84723.416532] [<ffffffff81087481>] ? arch_local_irq_save+0x9/0xc
[84723.417731] [<ffffffff8113ad61>] ? __might_fault+0x4c/0xa7
[84723.418699] [<ffffffff8113ad61>] ? __might_fault+0x4c/0xa7
[84723.421532] [<ffffffff8113adba>] ? __might_fault+0xa5/0xa7
[84723.422629] [<ffffffff81171139>] ? cp_new_stat+0x15d/0x174
[84723.423712] [<ffffffff8117c610>] do_vfs_ioctl+0x427/0x4e6
[84723.424801] [<ffffffff81171175>] ? SYSC_newfstat+0x25/0x2e
[84723.425968] [<ffffffff8118574d>] ? __fget_light+0x4d/0x71
[84723.427063] [<ffffffff8117c726>] SyS_ioctl+0x57/0x79
[84723.428138] [<ffffffff8147cd97>] entry_SYSCALL_64_fastpath+0x12/0x6f
Consider the following logical and physical file layout:
logical: ... [ prealloc extent A ] [ prealloc extent B ] [ extent C ] ...
4K 8K 16K
physical: ... 12853248 12857344 1103101952 ...
(= 12853248 + 4K)
Extents A and B are physically adjacent. The following diagram shows a
sequence of events that lead to the deadlock when we attempt to do a
direct IO write against the file range [4K, 16K[ and a defrag is triggered
simultaneously.
CPU 1 CPU 2
btrfs_direct_IO()
btrfs_get_blocks_direct()
creates ordered extent A, covering
the 4k prealloc extent A (range [4K, 8K[)
btrfs_defrag_file()
page_cache_sync_readahead([0K, 1M[)
btrfs_readpages()
extent_readpages()
locks all pages in the file
range [0K, 128K[ through calls
to add_to_page_cache_lru()
__do_contiguous_readpages()
finds ordered extent A
waits for it to complete
btrfs_get_blocks_direct() called again
lock_extent_direct(range [8K, 16K[)
finds a page in range [8K, 16K[ through
btrfs_page_exists_in_range()
invalidate_inode_pages2_range([8K, 16K[)
--> tries to lock pages that are already
locked by the task at CPU 2
--> our task, running __blockdev_direct_IO(),
hangs waiting to lock the pages and the
submit bio callback, btrfs_submit_direct(),
ends up never being called, resulting in the
ordered extent A never completing (because a
corresponding bio is never submitted) and
CPU 2 will wait for it forever while holding
the pages locked
---> deadlock!
Fix this by removing the page invalidation approach when attempting to
lock the range for IO from the callback btrfs_get_blocks_direct() and
falling back buffered IO. This was a rare case anyway and well behaved
applications do not mix concurrent direct IO writes with buffered reads
anyway, being a concurrent defrag the only normal case that could lead
to the deadlock.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Commit 61de718fce ("Btrfs: fix memory corruption on failure to submit
bio for direct IO") fixed problems with the error handling code after we
fail to submit a bio for direct IO. However there were 2 problems that it
did not address when the failure is due to memory allocation failures for
direct IO writes:
1) We considered that there could be only one ordered extent for the whole
IO range, which is not always true, as we can have multiple;
2) It did not set the bit BTRFS_ORDERED_IO_DONE in the ordered extent,
which can make other tasks running btrfs_wait_logged_extents() hang
forever, since they wait for that bit to be set. The general assumption
is that regardless of an error, the BTRFS_ORDERED_IO_DONE is always set
and it precedes setting the bit BTRFS_ORDERED_COMPLETE.
Fix these issues by moving part of the btrfs_endio_direct_write() handler
into a new helper function and having that new helper function called when
we fail to allocate memory to submit the bio (and its private object) for
a direct IO write.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
When a transaction is aborted, or its commit fails before writing the new
superblock and calling btrfs_finish_extent_commit(), we leak reference
counts on the block groups attached to the transaction's delete_bgs list,
because btrfs_finish_extent_commit() is never called for those two cases.
Fix this by dropping their references at btrfs_put_transaction(), which
is called when transactions are aborted (by making the transaction kthread
commit the transaction) or if their commits fail.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
During the final phase of a device replace operation, I ran into a
transaction abort that resulted in the following trace:
[23919.655368] WARNING: CPU: 10 PID: 30175 at fs/btrfs/extent-tree.c:9843 btrfs_create_pending_block_groups+0x15e/0x1ab [btrfs]()
[23919.664742] BTRFS: Transaction aborted (error -2)
[23919.665749] Modules linked in: btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix4 parport psmouse acpi_cpufreq processor i2c_core evdev microcode pcspkr button serio_raw ext4 crc16 jbd2 mbcache sd_mod sg sr_mod cdrom virtio_scsi ata_generic ata_piix virtio_pci floppy virtio_ring libata e1000 virtio scsi_mod [last unloaded: btrfs]
[23919.679442] CPU: 10 PID: 30175 Comm: fsstress Not tainted 4.3.0-rc5-btrfs-next-17+ #1
[23919.682392] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[23919.689151] 0000000000000000 ffff8804020cbb50 ffffffff812566f4 ffff8804020cbb98
[23919.692604] ffff8804020cbb88 ffffffff8104d0a6 ffffffffa03eea69 ffff88041b678a48
[23919.694230] ffff88042ac38000 ffff88041b678930 00000000fffffffe ffff8804020cbbf0
[23919.696716] Call Trace:
[23919.698669] [<ffffffff812566f4>] dump_stack+0x4e/0x79
[23919.700597] [<ffffffff8104d0a6>] warn_slowpath_common+0x9f/0xb8
[23919.701958] [<ffffffffa03eea69>] ? btrfs_create_pending_block_groups+0x15e/0x1ab [btrfs]
[23919.703612] [<ffffffff8104d107>] warn_slowpath_fmt+0x48/0x50
[23919.705047] [<ffffffffa03eea69>] btrfs_create_pending_block_groups+0x15e/0x1ab [btrfs]
[23919.706967] [<ffffffffa0402097>] __btrfs_end_transaction+0x84/0x2dd [btrfs]
[23919.708611] [<ffffffffa0402300>] btrfs_end_transaction+0x10/0x12 [btrfs]
[23919.710099] [<ffffffffa03ef0b8>] btrfs_alloc_data_chunk_ondemand+0x121/0x28b [btrfs]
[23919.711970] [<ffffffffa0413025>] btrfs_fallocate+0x7d3/0xc6d [btrfs]
[23919.713602] [<ffffffff8108b78f>] ? lock_acquire+0x10d/0x194
[23919.714756] [<ffffffff81086dbc>] ? percpu_down_read+0x51/0x78
[23919.716155] [<ffffffff8116ef1d>] ? __sb_start_write+0x5f/0xb0
[23919.718918] [<ffffffff8116ef1d>] ? __sb_start_write+0x5f/0xb0
[23919.724170] [<ffffffff8116b579>] vfs_fallocate+0x170/0x1ff
[23919.725482] [<ffffffff8117c1d7>] ioctl_preallocate+0x89/0x9b
[23919.726790] [<ffffffff8117c5ef>] do_vfs_ioctl+0x406/0x4e6
[23919.728428] [<ffffffff81171175>] ? SYSC_newfstat+0x25/0x2e
[23919.729642] [<ffffffff8118574d>] ? __fget_light+0x4d/0x71
[23919.730782] [<ffffffff8117c726>] SyS_ioctl+0x57/0x79
[23919.731847] [<ffffffff8147cd97>] entry_SYSCALL_64_fastpath+0x12/0x6f
[23919.733330] ---[ end trace 166ef301a335832a ]---
This is due to a race between device replace and chunk allocation, which
the following diagram illustrates:
CPU 1 CPU 2
btrfs_dev_replace_finishing()
at this point
dev_replace->tgtdev->devid ==
BTRFS_DEV_REPLACE_DEVID (0ULL)
...
btrfs_start_transaction()
btrfs_commit_transaction()
btrfs_fallocate()
btrfs_alloc_data_chunk_ondemand()
btrfs_join_transaction()
--> starts a new transaction
do_chunk_alloc()
lock fs_info->chunk_mutex
btrfs_alloc_chunk()
--> creates extent map for
the new chunk with
em->bdev->map->stripes[i]->dev->devid
== X (X > 0)
--> extent map is added to
fs_info->mapping_tree
--> initial phase of bg A
allocation completes
unlock fs_info->chunk_mutex
lock fs_info->chunk_mutex
btrfs_dev_replace_update_device_in_mapping_tree()
--> iterates fs_info->mapping_tree and
replaces the device in every extent
map's map->stripes[] with
dev_replace->tgtdev, which still has
an id of 0ULL (BTRFS_DEV_REPLACE_DEVID)
btrfs_end_transaction()
btrfs_create_pending_block_groups()
--> starts final phase of
bg A creation (update device,
extent, and chunk trees, etc)
btrfs_finish_chunk_alloc()
btrfs_update_device()
--> attempts to update a device
item with ID == 0ULL
(BTRFS_DEV_REPLACE_DEVID)
which is the current ID of
bg A's
em->bdev->map->stripes[i]->dev->devid
--> doesn't find such item
returns -ENOENT
--> the device id should have been X
and not 0ULL
got -ENOENT from
btrfs_finish_chunk_alloc()
and aborts current transaction
finishes setting up the target device,
namely it sets tgtdev->devid to the value
of srcdev->devid, which is X (and X > 0)
frees the srcdev
unlock fs_info->chunk_mutex
So fix this by taking the device list mutex when processing the chunk's
extent map stripes to update the device items. This avoids getting the
wrong device id and use-after-free problems if the task finishing a
chunk allocation grabs the replaced device, which is freed while the
dev replace task is holding the device list mutex.
This happened while running fstest btrfs/071.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
When running on newer OPAL firmware that supports sending extra
OPAL_MSG types, we would print a warning on *every* message received.
This could be a problem for kernels that don't support OPAL_MSG_OCC
on machines that are running real close to thermal limits and the
OCC is throttling the chip. For a kernel that is paying attention to
the message queue, we could get these notifications quite often.
Conceivably, future message types could also come fairly often,
and printing that we didn't understand them 10,000 times provides
no further information than printing them once.
Cc: stable@vger.kernel.org
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This will better reflect its description i.e. "any needed setup..."
and not just do an "IPI request".
Signed-off-by: Noam Camus <noamc@ezchip.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
ARC dwarf unwinder only supports CIE version == 1
The boot time dwarf sanitizer (part of binary lookup table constructor)
would simply bail if it saw CIE version == 3, rendering unwinder with a
NULL lookup table.
It seems libgcc linked with kernel does have such entries.
With fallback linear search removed, and a NULL binary lookup table,
unwinder fails to generate any stack trace.
So allow graceful ignoring of unsupported CIE entries.
This problem was initially seen in Alexey's setup (and not mine) as he
was using buildroot built toolchain (libgcc) which doesn't get built with
CFLAGS_FOR_TARGET="-gdwarf-2 which is my default
Fixes STAR 9000985048: "kernel unwinder broken with stock tools"
Fixes: 2e22502c08 ARC: dw2 unwind: Remove falllback linear search thru FDE entries
Reported-by Alexey Brodkin <abrodkin@synopsys.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
The fix which removed linear searching of dwarf (because binary lookup
data always exists) missed out on the fact that modules don't get the
binary lookup tables info. This caused unwinding out of modules to stop
working.
So add binary lookup header setup (equivalent of eh_frame_hdr setup) to
modules as well.
While at it, confine the header setup to within unwinder code,
reducing one API exposed out of unwinder code.
Fixes: 2e22502c08 ARC: dw2 unwind: Remove falllback linear search thru FDE entries
Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
HIGHMEM support bumped the default memory size for nsim platform to 1G.
Thus total memory ended at the very edge of start of peripherals address
space. With linux link base shifted, memory started bleeding into
peripheral space which caused early boot bad_page spew !
Fixes: 29e332261d ("ARC: mm: HIGHMEM: populate high memory from DT")
Reported-by: Anton Kolesov <akolesov@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
fou->udp_offloads is managed by RCU. As it is actually included inside
the fou sockets, we cannot let the memory go out of scope before a grace
period. We either can synchronize_rcu or switch over to kfree_rcu to
manage the sockets. kfree_rcu seems appropriate as it is used by vxlan
and geneve.
Fixes: 23461551c0 ("fou: Support for foo-over-udp RX path")
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch fixes FCC port lock-up, which occurs as a result of a bug
during underrun/collision handling. Within the tx_startup() function
in mac-fcc.c, the address of last BD is not calculated correctly.
As a result of wrong calculation of the last BD address, the next
transmitted BD may be set to an area out of the transmit BD ring.
This actually causes to port lock-up and it is not recoverable.
Signed-off-by: Martin Roth <martin.roth@motorolasolutions.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit 15bf176db1 ("gianfar: Don't enable the Filer w/o the
Parser"), 'TSEC' model controllers (for example as seen on MPC8541E)
always have 8 bytes stripped from the front of received frames.
Only 'eTSEC' gianfar controllers have the RX Filer capability (amongst
other enhancements). Previously this was treated as always enabled
for both 'TSEC' and 'eTSEC' controllers.
In commit 15bf176db1 ("gianfar: Don't enable the Filer w/o the Parser")
a subtle change was made to the setting of 'uses_rxfcb' to effectively
always set it (since 'rx_filer_enable' was always true). This had the
side-effect of always stripping 8 bytes from the front of received frames
on 'TSEC' type controllers.
We now only enable the RX Filer capability on controller types that
support it, thereby avoiding the issue for 'TSEC' type controllers.
Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
Reviewed-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dma-debug uses struct dma_debug_entry to keep track of dma coherent
memory allocation requests. The virtual address is converted into a pfn
and an offset. Previously, the offset was calculated using an incorrect
bit mask. As a result, we saw incorrect error messages from dma-debug
like the following:
"DMA-API: exceeded 7 overlapping mappings of cacheline 0x03e00000"
Cacheline 0x03e00000 does not exist on our platform.
Cc: <stable@vger.kernel.org>
Fixes: 0abdd7a81b ("dma-debug: introduce debug_dma_assert_idle()")
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull ARM fixes from Russell King:
"Further ARM fixes:
- Anson Huang noticed that we were corrupting a register we shouldn't
be during suspend on some CPUs.
- Shengjiu Wang spotted a bug in the 'swp' instruction emulation.
- Will Deacon fixed a bug in the ASID allocator.
- Laura Abbott fixed the kernel permission protection to apply to all
threads running in the system.
- I've fixed two bugs with the domain access control register
handling, one to do with printing an appropriate value at oops
time, and the other to further fix the uaccess_with_memcpy code"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8475/1: SWP emulation: Restore original *data when failed
ARM: 8471/1: need to save/restore arm register(r11) when it is corrupted
ARM: fix uaccess_with_memcpy() with SW_DOMAIN_PAN
ARM: report proper DACR value in oops dumps
ARM: 8464/1: Update all mm structures with section adjustments
ARM: 8465/1: mm: keep reserved ASIDs in sync with mm after multiple rollovers
Docbook does not like the definition of macros inside a field declaration
and adds a warning. Move the definition out.
Fixes: 79462ad02e ("net: add validation for the socket syscall protocol argument")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit ba7c95ea38 ("rhashtable:
Fix sleeping inside RCU critical section in walk_stop") introduced
a new spinlock for the walker list. However, it did not convert
all existing users of the list over to the new spin lock. Some
continued to use the old mutext for this purpose. This obviously
led to corruption of the list.
The fix is to use the spin lock everywhere where we touch the list.
This also allows us to do rcu_rad_lock before we take the lock in
rhashtable_walk_start. With the old mutex this would've deadlocked
but it's safe with the new spin lock.
Fixes: ba7c95ea38 ("rhashtable: Fix sleeping inside RCU...")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
William Hua <william.hua@canonical.com> wrote:
>
> I wasn't aware there was an enforced minimum size. I simply set the
> nelem_hint in the rhastable_params struct to 1, expecting it to grow as
> needed. This caused a segfault afterwards when trying to insert an
> element.
OK we're doing the size computation before we enforce the limit
on min_size.
---8<---
We need to do the initial hash table size computation after we
have obtained the correct min_size/max_size parameters. Otherwise
we may end up with a hash table whose size is outside the allowed
envelope.
Fixes: a998f712f7 ("rhashtable: Round up/down min/max_size to...")
Reported-by: William Hua <william.hua@canonical.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix parent-device reference leak due to SPI-core taking an unnecessary
reference to the parent when allocating the master structure, a
reference that was never released.
Note that driver core takes its own reference to the parent when the
master device is registered.
Fixes: 49dce689ad ("spi doesn't need class_device")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
We use the spi_lock spinlock to protect against races between the device
being removed and file operations on the spidev. This means that in the
removal path all references to the device need to be done under lock as
in removal we dropping references to the device.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
This partially reverts commit a34236155a.
While reviewing the glibc patch to exploit the individual IPC calls,
Arnd & Andreas noticed that we were still requiring userspace to pass
IPC_64 in order to get the new style IPC API.
With a bit of cleanup in the kernel we can drop that requirement, and
instead only provide the new style API, which will simplify things for
userspace.
Rather than try and sneak that patch into 4.4, instead we will drop the
individual IPC calls for powerpc, and merge them again in 4.5 once the
cleanup patch has gone in.
Because we've already added sys_mlock2() as syscall #378, we don't do a
full revert of the IPC calls. Instead we drop the __NR #defines, and
send those now undefined syscall numbers to sys_ni_syscall(). This
leaves a gap in the syscall numbers, but we'll reuse them when we merge
the individual IPC calls.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
David Ahern added a vif field in the a4 part of inetpeer_addr struct.
This broke IPv4 TCP fast open client side and more generally tcp metrics
cache, because inetpeer_addr_cmp() is now comparing two u32 instead of
one.
inetpeer_set_addr_v4() needs to properly init vif field, otherwise
the comparison result depends on uninitialized data.
Fixes: 192132b9a0 ("net: Add support for VRFs to inetpeer cache")
Reported-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn reported that while we switch all interfaces to privacy stable mode
when setting the secret, we don't set this mode for new interfaces. This
does not make sense, so change this behaviour.
Fixes: 622c81d57b ("ipv6: generation of stable privacy addresses for link-local and autoconf")
Reported-by: Bjørn Mork <bjorn@mork.no>
Cc: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit db0fa0cb01 "scatterlist: use sg_phys()" did replacements of
the form:
phys_addr_t phys = page_to_phys(sg_page(s));
phys_addr_t phys = sg_phys(s) & PAGE_MASK;
However, this breaks platforms where sizeof(phys_addr_t) >
sizeof(unsigned long). Revert for 4.3 and 4.4 to make room for a
combined helper in 4.5.
Cc: <stable@vger.kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: db0fa0cb01 ("scatterlist: use sg_phys()")
Suggested-by: Joerg Roedel <joro@8bytes.org>
Reported-by: Vitaly Lavrov <vel21ripn@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
msg_iocb needs to be initialized on the recv/recvfrom path.
Otherwise afalg will wrongly interpret it as an async call.
Cc: stable@vger.kernel.org
Reported-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously the "vendor" and "product" IDs for the elan_i2c driver simply
reported 0000. This patch modifies the elan_i2c driver to include the
Elan vendor ID and the touchpad's product id under
input/input*/{vendor,product}.
Specifically, this is to allow us to apply a generic Elan gestures config
that will apply to all Elan touchpads on ChromeOS. These configs match to
input devices in various ways, but one major way is by matching on vendor
ID. Adding this patch allows the default Elan touchpad config to be
applied to Elan touchpads in this kernel by matching on devices that have
vendor ID 04f3.
Note that product ID is also available via custom sysfs entry "product_id"
as well.
Signed-off-by: Charlie Mooney <charliemooney@chromium.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
This has fixes spread thru driver, notably among them
- edma fixes for recent edma DT changes which went into 4.4
- odd fixes for at_hdmac
- minor fixes on bc dma and mic dma
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWcE7VAAoJEHwUBw8lI4NH+IsP+gJEq1+xwC+Qni+oW0hUirwd
jltn0PZiwawsBFFxj8ZoKBxRGcpLIG0YI0k/umdpeZE3bxg/IfffpBtZfZGSF8Gr
w5lFZab2tyTjThc7PpjkGJB0ks/Dv1qlPZRx2+SoRq1IZP3ROv7i5HcTjr0pYWur
PfGq7EkeBGxyVPeElSa7VhfzimyiDz/SS77ZgOPCagnu99rWc1A+bXGvTO367E1E
IugN3+ndfykIHw4I3WBuVO+IC3yyXvgE21LyTIsb81iCs/ZzB3Cijb8jR3dpmtlK
VoFJQwAdlJHw7J7pDWhMvM8HMYIErmLbFqZbDi6PHBe6ZYkLOP3z5QVIc67l1KIN
vzIDUDvSLrFrRZ06691A6+/3yhI/g+FdlBaLeWwpcdJbmXHoEen2HrcQAF4ZUTRw
RZQeDtgze1iBeqbzEeO5+esBcAxc2PUFKQHpt/vEB1kHoA9/KjUg5L8Mkjj6o/Xz
uwoolopYJwI9H8rKZnX25F89N8R8cNrPXIe+qiCqPQj9cg2bmXzSTHFPdbI+t5bN
YdOuV1qiYfzFPKStoQEjgmYEDduHw7ndUjGuw4CXGF0ctcYlkLbP3KATfLh4MLJb
KpQ2GguIjOixlwtx/v9ovRKtqjYH5Egxa6TPiXm/MNVP2NXIXW+OG4x6OcKkLD9/
KEDo6XywrAc0VkL+1EEy
=r53B
-----END PGP SIGNATURE-----
Merge tag 'dmaengine-fix-4.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
"This has fixes spread thru driver, notably among them:
- edma fixes for recent edma DT changes which went into 4.4
- odd fixes for at_hdmac
- minor fixes on bc dma and mic dma"
* tag 'dmaengine-fix-4.4-rc6' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: at_xdmac: fix at_xdmac_prep_dma_memcpy()
dmaengine: edma: DT: Change reserved slot array from 16bit to 32bit type
dmaengine: edma: DT: Change memcpy channel array from 16bit to 32bit type
dmaengine: mic_x100: add missing spin_unlock
dmaengine: bcm2835-dma: Convert to use DMA pool
dmaengine: at_xdmac: fix bad behavior in interleaved mode
dmaengine: at_xdmac: fix false condition for memset_sg transfers
dmaengine: at_xdmac: fix macro typo
* OMAP: fix analog tv-out when using omapdrm
* fsl: Fix kernel crash when diu_ops is not implemented
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWcCqqAAoJEPo9qoy8lh71QZoP/0Rftvw/AYsFMFaOyJALIguI
ou0PtZkAgAj5snmXcTcytTG7ifP1wyeJTh97SCStA8UmpbZ3o0DWkZrEMNM636Nv
iUS/TjDeJRYHZc/KJhfLQA2M3rNCIHv9YbdptXWzBzk73bp3LvAnutp5I0J2wDT6
zQdvRlIj0BYTBFJIl5UXnm/yVUcD3bEO79ezuwWPQ0wELl5iLRzkS0QHMvEpBe1k
u2+9iM7wz/6kk9jetrffbV6Jg0Pr3ASHR22Dzvq55RWAMZBhnOpcSpK9T6OyiByB
fmt5jguAAmxYDkPEPGEhSHTUcznkK/1Wmy3xRmR+DwOcQ9yjUj2/ThlQD4tsdlYu
+uF2lngXfi2ZK0dsYVStEL8KvoxVWIJEcqregjgEeBQ46t5OPgecd5YOpMNDa2v+
huHQpQ5ag7SBYl7EzaBTaO9gyxDJF9lupi1SvY3GRHugwkMqOQIT32lu2wtVTz59
h2h6y726Jtc1RedervwwmviaEa772ZQpUIEC7i0zsHazsBNaicb0CoxxsMXD+F5D
3BFsREZ+Bit/9f0Th540ar4EBlu2rFsqUspnJFWX15ITlIlbLc2cEqD+LeGnxlYL
ims/iQYpznwwc0nICSBNMIDh+tzMDdgjG8P+IV0qnv8SHYgYc+ceMfERij+gGQrS
e9grM4yFCWd51G8eu8/o
=q7XP
-----END PGP SIGNATURE-----
Merge tag 'fbdev-fixes-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux
Pull two fbdev fixes from Tomi Valkeinen:
- OMAP: fix analog tv-out when using omapdrm
- fsl: Fix kernel crash when diu_ops is not implemented
* tag 'fbdev-fixes-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
OMAPDSS: fix timings for VENC to match what omapdrm expects
video: fbdev: fsl: Fix kernel crash when diu_ops is not implemented
Stas Nichiporovich reported a regression in his HFSC qdisc setup
on a non multi queue device.
It turns out I mistakenly added a TCQ_F_NOPARENT flag on all qdisc
allocated in qdisc_create() for non multi queue devices, which was
rather buggy. I was clearly mislead by the TCQ_F_ONETXQUEUE that is
also set here for no good reason, since it only matters for the root
qdisc.
Fixes: 4eaf3b84f2 ("net_sched: fix qdisc_tree_decrease_qlen() races")
Reported-by: Stas Nichiporovich <stasn77@gmail.com>
Tested-by: Stas Nichiporovich <stasn77@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Bolle says:
====================
ser_gigaset: fix deallocation of platform device structure
Sascha Levin reported that the syzkaller fuzzer triggered a WARNING in
ser_gigaset (see https://lkml.kernel.org/g/56587467.8050102@oracle.com ). It
turned out that ser_gigaset has always deallocated its platform device
structure incorrectly. Tilman submitted the patch that fixes that (3/4) and a
related cleanup (4/4).
Tilman also submitted a minor cleanup of some NULL checks (1/4) that prompted
Alan to turn those checks into WARN_ONs (2/4). If no one hits these WARN_ONs in
the next couple of releases these WARN_ONs should be removed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
device->platform_data and platform_device->resource are never used
and remain NULL through their entire life. Drops the kfree() calls
for them from the device release method.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
When shutting down the device, the struct ser_cardstate must not be
kfree()d immediately after the call to platform_device_unregister()
since the embedded struct platform_device is still in use.
Move the kfree() call to the release method instead.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Fixes: 2869b23e4b ("drivers/isdn/gigaset: new M101 driver (v2)")
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
These checks do nothing useful to protect the code from races. On the
other hand if the old code has been masking a real bug we would like to
know about it.
The check for tiocmset is kept because it is valid for a tty driver to
have a NULL tiocmset method. That in itself is probably a mistake given
modern coding practices - but needs fixing in the tty layer.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f34d7a5b70 ("tty: The big operations rework") changed
tty->driver to tty->ops but left NULL checks for tty->driver untouched.
Fix.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
[pebolle: removed Fixes tag]
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull crypto fix from Herbert Xu:
"This fixes a boundary condition in the blkcipher SG walking code that
can lead to a crash when used with the new chacha20 algorithm"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: skcipher - Copy iv from desc even for 0-len walks
Pavel Machek reports a warning about W+X pages found in the "Persisent"
kmap area. After grepping for it (using the correct spelling), and not
finding it, I noticed how the debug printk was just misspelled. Fix it.
The actual mapping bug that Pavel reported is still open. It's
apparently a separate issue from the known EFI page tables, looks like
it's related to the HIGHMEM mappings.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The problem here is that at the end of the loop we test for if
idc->vnic_wait_limit is zero, but since idc->vnic_wait_limit-- is a
post-op, it actually ends up set to (u8)-1. I have fixed this by
moving the decrement inside the loop.
Fixes: 486a5bc77a ('qlcnic: Add support for 83xx suspend and resume.')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We test for if "tries" is zero at the end but "tries--" is a post-op so
it will end with "tries" set to -1. I have changed it to a pre-op
instead.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The problem here is that after the loop we test for "if (!i) " but
because "i--" is a post-op we exit with i set to -1. I have fixed this
by changing it to a pre-op instead. I had to change the starting value
from 3 to 4 so that we still iterate 3 times.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>