Eliminates need for a separate mempool to allocate 'struct dm_io'
objects from. As such, it saves an extra mempool allocation for each
original bio that DM core is issued.
This complicates the per-bio-data accessor functions by needing to
conditonally add extra padding to get to a target's per-bio-data. But
in the end this provides a decent performance improvement for all
bio-based DM devices.
On an NVMe-loop based testbed to a ramdisk (~3100 MB/s): bio-based
DM linear performance improved by 2% (went from 2665 to 2777 MB/s).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
These CRUD comments have worn out their welcome. The code is what it
is, over time it'll hopefully get better. But these comments serve no
purpose whatsoever.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
__send_changing_extent_only() must follow the same pattern that was
established with commit "dm: ensure bio submission follows a depth-first
tree walk". That is: submit first bio up to split boundary and then
split the remainder to further submissions.
Suggested-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
alloc_multiple_bios() assumes it can allocate the requested number of
bios but until now there was no gaurantee that the mempools would be
accomodating.
Suggested-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Now that all of DM has been revised and/or verified to no longer require
the use of BIOSET_NEED_RESCUER the dm_offload code may be removed.
Suggested-by: NeilBrown <neilb@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
DM targets can request multiple bios be sent to them by DM core (see:
num_{flush,discard,write_same,write_zeroes}_bios). But until now these
bios were allocated in an unsafe manner than could potentially exhaust
the DM device's bioset -- in the face of multiple threads each trying to
do multiple allocations from the same DM device's bioset.
Fix __send_duplicate_bios() by using the new alloc_multiple_bios(). The
allocation strategy used by alloc_multiple_bios() models that used by
dm-crypt.c:crypt_alloc_buffer().
Neil Brown initially proposed this fix but the implementation has been
revised enough that it inappropriate to attribute the entirety of it to
him.
Suggested-by: NeilBrown <neilb@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
No DM target provides num_write_bios and none has since dm-cache's
brief use in 2013.
Having the possibility of num_write_bios > 1 complicates bio
allocation. So remove the interface and assume there is only one bio
needed.
If a target ever needs more, it must provide a suitable bioset and
allocate itself based on its particular needs.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
A dm device can, in general, represent a tree of targets, each of which
handles a sub-range of the range of blocks handled by the parent.
The bio sequencing managed by generic_make_request() requires that bios
are generated and handled in a depth-first manner. Each call to a
make_request_fn() may submit bios to a single member device, and may
submit bios for a reduced region of the same device as the
make_request_fn.
In particular, any bios submitted to member devices must be expected to
be processed in order, so a later one must never wait for an earlier
one.
This ordering is usually achieved by using bio_split() to reduce a bio
to a size that can be completely handled by one target, and resubmitting
the remainder to the originating device. bio_queue_split() shows the
canonical approach.
dm doesn't follow this approach, largely because it has needed to split
bios since long before bio_split() was available. It currently can
submit bios to separate targets within the one dm_make_request() call.
Dependencies between these targets, as can happen with dm-snap, can
cause deadlocks if either bios gets stuck behind the other in the queues
managed by generic_make_request(). This requires the 'rescue'
functionality provided by dm_offload_{start,end}.
Some of this requirement can be removed by changing the order of bio
submission to follow the canonical approach. That is, if dm finds that
it needs to split a bio, the remainder should be sent to
generic_make_request() rather than being handled immediately. This
delays the handling until the first part is completely processed, so the
deadlock problems do not occur.
__split_and_process_bio() can be called both from dm_make_request() and
from dm_wq_work(). When called from dm_wq_work() the current approach
is perfectly satisfactory as each bio will be processed immediately.
When called from dm_make_request(), current->bio_list will be non-NULL,
and in this case it is best to create a separate "clone" bio for the
remainder.
When we use bio_clone_bioset() to split off the front part of a bio
and chain the two together and submit the remainder to
generic_make_request(), it is important that the newly allocated
bio is used as the head to be processed immediately, and the original
bio gets "bio_advance()"d and sent to generic_make_request() as the
remainder. Otherwise, if the newly allocated bio is used as the
remainder, and if it then needs to be split again, then the next
bio_clone_bioset() call will be made while holding a reference a bio
(result of the first clone) from the same bioset. This can potentially
exhaust the bioset mempool and result in a memory allocation deadlock.
Note that there is no race caused by reassigning cio.io->bio after already
calling __map_bio(). This bio will only be dereferenced again after
dec_pending() has found io->io_count to be zero, and this cannot happen
before the dec_pending() call at the end of __split_and_process_bio().
To provide the clone bio when splitting, we use q->bio_split. This
was previously being freed by bio-based dm to avoid having excess
rescuer threads. As bio_split bio sets no longer create rescuer
threads, there is little cost and much gain from restoring the
q->bio_split bio set.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The BIOSET_NEED_RESCUER flag is only needed when a make_request_fn might
do two allocations from the one bioset, and the second one could block
until the first bio completes.
dm_io() is called from make_request_fn() context. The closest it comes
to multiple allocations is in chunk_io() in dm-snap-persistent. But
there the code uses a separate thread to avoid problems.
So BIOSET_NEED_RESCUER is not needed.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The BIOSET_NEED_RESCUER flag is only needed when a make_request_fn might
do two allocations from the one bioset, and the second one could block
until the first bio completes.
dm-crypt does allocate from this bioset inside the dm make_request_fn,
but does so using GFP_NOWAIT so that the allocation will not block.
So BIOSET_NEED_RESCUER is not needed.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Clarify that dm_accept_partial_bio isn't allowed for REQ_OP_ZONE_RESET
bios.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
No need to calculate the reshaping progress because
mddev->curr_resync_completed holds it.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
During reshape, 'A' chars were reported in status rather than 'a'.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
In order to avoid redoing synchronization/recovery/reshape partially,
the raid set got frozen until after all passed in table line flags had
been cleared. The related table reload sequence had to be precisely
followed, or reshaping may lead to data corruption caused by the active
mapping carrying on with a reshape when the inactive mapping already
had retrieved a stale reshape position.
Harden by retrieving the actual resync/recovery/reshape position
during resume whilst the active table is suspended thus avoiding
to keep the raid set frozen altogether. This prevents superfluous
redoing of an already resynchronized or recovered segment and,
most importantly, potential for redoing of an already reshaped
segment causing data corruption.
Fixes: d39f0010e ("dm raid: fix raid_resume() to keep raid set frozen as needed")
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Verifying the current raid sets redundancy based on retrieved
superblock content has to use the superblock's raid level (e.g. raid0),
not the constructor requested one (e.g. raid10).
Using the requested raid level of raid10 lead to a "divide error"
on raid0 which defines data copies divided by to be zero.
Also check for bogus data copies.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Move raid_resume()'s setting of 'rw' and 'in_sync' to just prior to
mddev_resume().
Also, remove unused 'bitmap_loaded' member from "struct raid_set".
No functional changes.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fix various sync state issues causing racy/bogus sync ratio,
sync_action ad health chars in dm_status() info output.
Sync ratio could be N/N (i.e. 100%) shortly after raid set
creation, i.e. creating a new RaidLV or upconverting a linear LV to
raid1 thus:
"0 2097152 raid raid1 2 Aa 2097162/2097152 recover 0 0 -"
instead of:
"0 2097152 raid raid1 2 Aa 0/2097152 idle 0 0 -"
Sync action could be non-idle, when the MD thread was done with io.
Health chars could be 'A' when they should be 'a' for a short time
before a resynchonization started.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The raid_status() function passes the bool array_in_sync variable around
providing synchronization state of the MD array. Replace it with a
runtime flag. This will avoid a pattern of having to pass discrete
variables to various functions.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The MD sync thread updates recovery flags providing state of any
running, idle, frozen, recovering, reshaping, ... activity it performs
and updates respective flags asynchronously versus dm processing
raid_status(). To close that race window, take a single copy of the
flags and pass it into its callees.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
During a reshape request: if userspace reloads a "raid" table multiple
times, resulting in multiple superblock reads, the raid set needs to
stay frozen until all config changes (chunk size, layout data_offset,
delta_disks) have been stored in the superblocks and respective flags
cleared.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Check all component data device sizes versus calculated size.
Reject if device(s) are too small. Otherwise, MD will fail the
operation by accessing beyond the end of the data device.
An example use-case is that growing bitmap won't fit any more and the MD
runtime will report an error when DM raid should catch this earlier.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The raid set size is being revalidated unconditionally before a
reshaping conversion is started. MD requires the size to only be
reduced in case of a stripe removing (i.e. shrinking) reshape but not
when growing because the raid array has to stay small until after the
growing reshape finishes.
Fix by avoiding the size revalidation in preresume unless a shrinking
reshape is requested.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Pay attention to existing reshape space to define if a raid set needs
resizing. Otherwise we can hit "Can't resize a reshaping raid set"
when a reshape is being requested.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The md raid personalities call md_finish_reshape() at the end of a
reshape conversion which adjusts rdev->sectors.
Correct/check rdev->sectors before initiating a reshape and raise the
recovery pointer accordingly.
Otherwise, the DM raid coordinated reshape will fail.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
md_stop_writes() is called in raid_presuspend() causing deadlocks on
bios submitted afterwards -- which happens on loaded raid sets with
conversion requests.
Fix by moving md_stop_writes() to raid_postsuspend(). NOTE: when the
recovery's frozen (MD_RECOVERY_FROZEN), writes haven't been started (or
are already stopped) so don't stop them again.
Also remove superfluous readonly setting.
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
When system is under memory pressure it is observed that dm bufio
shrinker often reclaims only one buffer per scan. This change fixes
the following two issues in dm bufio shrinker that cause this behavior:
1. ((nr_to_scan - freed) <= retain_target) condition is used to
terminate slab scan process. This assumes that nr_to_scan is equal
to the LRU size, which might not be correct because do_shrink_slab()
in vmscan.c calculates nr_to_scan using multiple inputs.
As a result when nr_to_scan is less than retain_target (64) the scan
will terminate after the first iteration, effectively reclaiming one
buffer per scan and making scans very inefficient. This hurts vmscan
performance especially because mutex is acquired/released every time
dm_bufio_shrink_scan() is called.
New implementation uses ((LRU size - freed) <= retain_target)
condition for scan termination. LRU size can be safely determined
inside __scan() because this function is called after dm_bufio_lock().
2. do_shrink_slab() uses value returned by dm_bufio_shrink_count() to
determine number of freeable objects in the slab. However dm_bufio
always retains retain_target buffers in its LRU and will terminate
a scan when this mark is reached. Therefore returning the entire LRU size
from dm_bufio_shrink_count() is misleading because that does not
represent the number of freeable objects that slab will reclaim during
a scan. Returning (LRU size - retain_target) better represents the
number of freeable objects in the slab. This way do_shrink_slab()
returns 0 when (LRU size < retain_target) and vmscan will not try to
scan this shrinker avoiding scans that will not reclaim any memory.
Test: tested using Android device running
<AOSP>/system/extras/alloc-stress that generates memory pressure
and causes intensive shrinker scans
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Commit ca5beb76 ("dm mpath: micro-optimize the hot path relative to
MPATHF_QUEUE_IF_NO_PATH") caused bio-based DM-multipath to fail mptest's
"test_02_sdev_delete".
Restoring the logic that existed prior to commit ca5beb76 fixes this
bio-based DM-multipath regression. Also verified all mptest tests pass
with request-based DM-multipath.
This commit effectively reverts commit ca5beb76 -- but it does so
without reintroducing the need to take the m->lock spinlock in
must_push_back_{rq,bio}.
Fixes: ca5beb76 ("dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH")
Cc: stable@vger.kernel.org # 4.12+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
A NULL pointer is seen if two concurrent "vgchange -ay -K <vg name>"
processes race to load the dm-thin-pool module:
PID: 25992 TASK: ffff883cd7d23500 CPU: 4 COMMAND: "vgchange"
#0 [ffff883cd743d600] machine_kexec at ffffffff81038fa9
0000001 [ffff883cd743d660] crash_kexec at ffffffff810c5992
0000002 [ffff883cd743d730] oops_end at ffffffff81515c90
0000003 [ffff883cd743d760] no_context at ffffffff81049f1b
0000004 [ffff883cd743d7b0] __bad_area_nosemaphore at ffffffff8104a1a5
0000005 [ffff883cd743d800] bad_area at ffffffff8104a2ce
0000006 [ffff883cd743d830] __do_page_fault at ffffffff8104aa6f
0000007 [ffff883cd743d950] do_page_fault at ffffffff81517bae
0000008 [ffff883cd743d980] page_fault at ffffffff81514f95
[exception RIP: kmem_cache_alloc+108]
RIP: ffffffff8116ef3c RSP: ffff883cd743da38 RFLAGS: 00010046
RAX: 0000000000000004 RBX: ffffffff81121b90 RCX: ffff881bf1e78cc0
RDX: 0000000000000000 RSI: 00000000000000d0 RDI: 0000000000000000
RBP: ffff883cd743da68 R8: ffff881bf1a4eb00 R9: 0000000080042000
R10: 0000000000002000 R11: 0000000000000000 R12: 00000000000000d0
R13: 0000000000000000 R14: 00000000000000d0 R15: 0000000000000246
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
0000009 [ffff883cd743da70] mempool_alloc_slab at ffffffff81121ba5
0000010 [ffff883cd743da80] mempool_create_node at ffffffff81122083
0000011 [ffff883cd743dad0] mempool_create at ffffffff811220f4
0000012 [ffff883cd743dae0] pool_ctr at ffffffffa08de049 [dm_thin_pool]
0000013 [ffff883cd743dbd0] dm_table_add_target at ffffffffa0005f2f [dm_mod]
0000014 [ffff883cd743dc30] table_load at ffffffffa0008ba9 [dm_mod]
0000015 [ffff883cd743dc90] ctl_ioctl at ffffffffa0009dc4 [dm_mod]
The race results in a NULL pointer because:
Process A (vgchange -ay -K):
a. send DM_LIST_VERSIONS_CMD ioctl;
b. pool_target not registered;
c. modprobe dm_thin_pool and wait until end.
Process B (vgchange -ay -K):
a. send DM_LIST_VERSIONS_CMD ioctl;
b. pool_target registered;
c. table_load->dm_table_add_target->pool_ctr;
d. _new_mapping_cache is NULL and panic.
Note:
1. process A and process B are two concurrent processes.
2. pool_target can be detected by process B but
_new_mapping_cache initialization has not ended.
To fix dm-thin-pool, and other targets (cache, multipath, and snapshot)
with the same problem, simply dm_register_target() after all resources
created during module init (as labelled with __init) are finished.
Cc: stable@vger.kernel.org
Signed-off-by: monty <monty_pavel@sina.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Multiple refcounts are needed if the device was already added. The
micro-optimization of setting the refcount to 1 on first added (rather
than fall thru to a common refcount_inc) lost sight of the fact that the
refcount_inc is also needed for the case when the device already exists
and the mode need not be upgraded.
Fixes: 2a0b4682e0 ("dm: convert dm_dev_internal.count from atomic_t to refcount_t")
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Pull ARM fix from Russell King:
"Just one fix this time around, for the late commit in the merge window
that triggered a problem with qemu. Qemu is apparently also going to
receive a fix for the discovered issue"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: avoid faulting on qemu
Pull i2c fixes from Wolfram Sang:
"Here are two bugfixes for I2C, fixing a memleak in the core and irq
allocation for i801.
Also three bugfixes for the at24 eeprom driver which Bartosz collected
while taking over maintainership for this driver"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
eeprom: at24: check at24_read/write arguments
eeprom: at24: fix reading from 24MAC402/24MAC602
eeprom: at24: correctly set the size for at24mac402
i2c: i2c-boardinfo: fix memory leaks on devinfo
i2c: i801: Fix Failed to allocate irq -2147483648 error
Drop reference to obsolete maintainer tree
Fix overflow bug in pmbus driver
Fix SMBUS timeout problem in jc42 driver
For the SMBUS timeout handling, we had a brief discussion if this should
be considered a bug fix or a feature. Peter says "it fixes real problems
where the application misbehave due to faulty content when reading from
an eeprom", and he needs the patch in his company's v4.14 images. This is
good enough for me and warrants backport to stable kernels.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJaItAeAAoJEMsfJm/On5mBYrQQAJz+Ukg8blLKg1bqb01nZxEY
4vOxqZySpqR5icW6KP0zn/ws8eKyFjGQQsRxoRePo9Zfj2Y9RKRKVOoRrs7McZ6s
mO12KJBf13cGiB+Msm8JsjJv81E15RnDwsWw39+SPms3ueKiBDAl4xaH6PSeGKTZ
zB8teDk7MLLwCplRuNbB3qrc0BCj4AgeAu3omHO7PpKClOCHRieJPLaFE2Fpzsu/
P4RxUn4CFY0urgWJ5b9g5A3FdH8lOz8nfkiWPPnEb/IF+8tR9M3GYzwOp5r2uVul
uKszDMKKx3Q+Hi+67/Ou2uLhCDnaxYtFHiN+REB9dRi33BHSuIsc4riCqa1ZMz8c
XWQLbQq3u0bS1XgiegD4nihF2iNrj0fMcy2dcnUVWJNFKrfHjGAIodY0cKZJsZKW
RqzWlX/aUVIpCSKxtJm0xDNPJS5FqKXLCV0xsNMF2Mz2JosT8o4IARZNlY7Ap0Be
kRiQMXA/3y2RyeOUi82YHM0MorMt4icmTT3ztRrJpVbM1MBiiX2SefZquai5RLgp
o/qTOrJ0gD8XzBhwP8wQYP5BvpPX2UX3V0sjLcRDNakqaOaDuslAMtzZ0sNn37Ng
R+sAJNuFkLZkzhsa8IZSidngWMLvGS3Zjh2N75v6HcEaLrVoK2p6rBJM82Dg8cmp
0GwZkjd72bdXIXuRLpri
=5rYP
-----END PGP SIGNATURE-----
Merge tag 'hwmon-for-linus-v4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Fixes:
- Drop reference to obsolete maintainer tree
- Fix overflow bug in pmbus driver
- Fix SMBUS timeout problem in jc42 driver
For the SMBUS timeout handling, we had a brief discussion if this
should be considered a bug fix or a feature. Peter says "it fixes real
problems where the application misbehave due to faulty content when
reading from an eeprom", and he needs the patch in his company's v4.14
images. This is good enough for me and warrants backport to stable
kernels"
* tag 'hwmon-for-linus-v4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (jc42) optionally try to disable the SMBUS timeout
hwmon: (pmbus) Use 64bit math for DIRECT format values
hwmon: Drop reference to Jean's tree
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAlofxdUACgkQEacuoBRx
13JvfA//bB7nUyvHlglfYMOq6z4O3L7IB5bbWs+Z5hoccR0nsnYPXwe3huIyuxBa
vQLgztqpyzcsT3LYDWS7sD/NQQoHF0it2ZOSRl7pYo83I1KXeKcbimDp3NmurG89
kx8AEtdhD4XkP/E7IwCsYlO7xTms0d9hShoyy0+0/GB4St5u+NOip+zT3TQVjJBS
+ChnHMala2WBQji0wXmfOwFGHGEeEZXx5ZdrIheEiedFgOV0k7r/9IwaHVfh+DUb
Lyb8fRCqTWwUblyky8nybSAtl4ki4jU5FJhiPsa3tI0MO2Vt5kzwWHYCYyQXo8D5
BEYW1gsFY2R2SV/QF3SNmfWK+HQLgZl+MYzslCd25GiSDFD4mMhRvi85twig8mC8
w86oaY22IVPAh5aUeF7W1FyRdYmAESsG2gOOG5dyxf8XPeGL7IqaV+GJkj2tPTdC
OQ9q2hnO08e10e7Nub4k6NCWrIXK4WdNSjyRJz7DE2bvhWtYtnlDB3pCSKpxRNp7
6aHOMKHnJyNsGIGYmfq7/Zyq511EtYux3xSAZa3NwnnukEE5CeHnYleO4IXXBxNU
reVIq14QZ5AngT0QF7p+oE0evf2bqeNrv8i2UFF/qqtEA6mCbYnVsgu0+vzuy31t
k1X/PkfAgQLdqI5TDDABP3y++PSfdWp07hIdlG9wPKVlwjFj2z8=
=ldzq
-----END PGP SIGNATURE-----
Merge tag 'at24-4.15-fixes-for-wolfram' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into i2c/for-current
Please consider pulling the following fixes for v4.15. While it doesn't
fix any regression introduced in the v4.15 merge window, we have a
feature in at24 since linux v4.8 - reading the mac address block from
at24mac series - which turned out to be not working.
This pull request contains changes that fix it together with a patch
that hardens the read and write argument sanitization with
out-of-bounds checks that were missing.
Bugfixes:
- NFSv4: Ensure gcc 4.4.4 can compile initialiser for "invalid_stateid"
- SUNRPC: Allow connect to return EHOSTUNREACH
- SUNRPC: Handle ENETDOWN errors
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlohwp4ACgkQ18tUv7Cl
QOtq1A//RPOxJBPQsImfkVTiVzxZbS8k2/obJSZjPYoNozmywEJs9dnFYJVCFUGp
l9AvRd/SjXOVjGovk6ZhDCY3xA2eP1XfOLiVg7EhpczPVCRNJ34BUT7hWyxnTLSz
MKc1qLLfVaSjsLioO6YmdCPjiGC0KegrBKNlRlIbI+OjCq5aNJpz73Fb4mFgCp5M
taERunf7X29WHxAVn0c3mhIHN7tpCi9SgfbMURBEKLNrzj7RxnRY07dT1S9Mg/Yg
4FWU9FIpAyk9C9we/LR9jUywZQ3GGJFFFTOo8RfyMB/LR9RACSXnbHjhI1nUEQTb
R/NpBxlpvxEOapHdmw32jwj1fkY/WYlUiJekQhjEekp/HkFNdctQL8PjrhG6lIW7
eBfFqZ2RUhYF1OQ8k4o0pR60O2scH3/D7tZwpgnJMFSpQSMnPnU8K3gvn/B5Mi4f
UPDHtfj3GlWCIIJq1RIqKN4mt4tPktatnTCLIzDmqNbwqISwxow1lxmSesNejULo
MryXLLl5M3XegjokXs0d0hadoywswHRTAxXxQEZav0dKMcHq4F0NirVw+VOIyNCB
CztIVFI5Czzo4h4x99lgN26bNTysGMvse2qiPkVVr0CZt2leyrZyTl9khvDe3C0t
ijyq882b4LqibuQtnI3l/Pynrrowfp7fqYx7SO62VJjraBVYUzE=
=eQyi
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-4.15-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
"These patches fix a problem with compiling using an old version of
gcc, and also fix up error handling in the SUNRPC layer.
- NFSv4: Ensure gcc 4.4.4 can compile initialiser for
"invalid_stateid"
- SUNRPC: Allow connect to return EHOSTUNREACH
- SUNRPC: Handle ENETDOWN errors"
* tag 'nfs-for-4.15-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
SUNRPC: Handle ENETDOWN errors
SUNRPC: Allow connect to return EHOSTUNREACH
NFSv4: Ensure gcc 4.4.4 can compile initialiser for "invalid_stateid"
- Fix memory leaks that appeared after removing ifork inline data buffer
- Recover deferred rmap update log items in correct order
- Fix memory leaks when buffer construction fails
- Fix memory leaks when bmbt is corrupt
- Fix some uninitialized variables and math problems in the quota scrubber
- Add some omitted attribution tags on the log replay commit
- Fix some UBSAN complaints about integer overflows with large sparse files
- Implement an effective inode mode check in online fsck
- Fix log's inability to retry quota item writeout due to transient errors
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCgAGBQJaIDZ8AAoJEPh/dxk0SrTrTD4QAIUq223XSyqMJYkAK163zMj4
PADY30MV7uMlFBLEm3b7ZEWA/vtFzDM7Qpa61WN15oR5jEVSqSFes9AzuLeISqia
s7Hc1ksqgZLNaMnW+jQc4iT/yiCVhiWw3rFC4tahDVCF2lJO/la3ToUBbcoADAFk
kBYVN1H1t5b+n5+A9QY6+Vxm6LXGPPo8vNyCQCEtN+dE7CcSEL4Ff9H9GmJiVPzk
rG6uizwRvxZje/yY1jEnkCSI88Gj1v0L//VmIDDuGjCZleYxwbTQQO0l8p4S+Su8
48la8PZbk3KcBTfiRbcU0m4995DHDVT/mAOWHeZnv+ZI5jhDEe1lpJG5l65kwPK+
BOoTYaRaBv3yZvEOob6wEqyfT3A1dxXstKBJLPyHx+McqFH8+NV2WAry+6dedOkv
Hwz6+OlAFmuBuhOZAZSt0LSWxu/qYovo5lCSNrBtiLlmDyFjtdbanQ7s8oWaV7p/
wimNV4Y+Y3XiePOEUftnG8yxOULZS4KMeYsdJxj9HzaKloYHQer+MWfPe0gzExBb
eE3P9PckQpcx9hK8LE1irgDCDG6J2eb8b5sFZY0eNzngdtWCR/xYz3NFT+72kz3s
XOI0mByH1Ab0Q1lvJml0RyW86Uj7lpMD2SzV2nVhbYrW81rkkzb7AQx5VyO57Gq6
WAX9mHNNRcY+uVrbb8QQ
=oTB7
-----END PGP SIGNATURE-----
Merge tag 'xfs-4.15-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"Here are some bug fixes for 4.15-rc2.
- fix memory leaks that appeared after removing ifork inline data
buffer
- recover deferred rmap update log items in correct order
- fix memory leaks when buffer construction fails
- fix memory leaks when bmbt is corrupt
- fix some uninitialized variables and math problems in the quota
scrubber
- add some omitted attribution tags on the log replay commit
- fix some UBSAN complaints about integer overflows with large sparse
files
- implement an effective inode mode check in online fsck
- fix log's inability to retry quota item writeout due to transient
errors"
* tag 'xfs-4.15-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: Properly retry failed dquot items in case of error during buffer writeback
xfs: scrub inode mode properly
xfs: remove unused parameter from xfs_writepage_map
xfs: ubsan fixes
xfs: calculate correct offset in xfs_scrub_quota_item
xfs: fix uninitialized variable in xfs_scrub_quota
xfs: fix leaks on corruption errors in xfs_bmap.c
xfs: fortify xfs_alloc_buftarg error handling
xfs: log recovery should replay deferred ops in order
xfs: always free inline data before resetting inode fork during ifree
This tag contains a handful of small cleanups that are a result of
feedback that didn't make it into our original patch set, either because
the feedback hadn't been given yet, I missed the original emails, or
we weren't ready to submit the changes yet.
I've been maintaining the various cleanup patch sets I have as their own
branches, which I then merged together and signed. Each merge commit
has a short summary of the changes, and each branch is based on your
latest tag (4.15-rc1, in this case). If this isn't the right way to do
this then feel free to suggest something else, but it seems sane to me.
Here's a short summary of the changes, roughly in order of how
interesting they are.
* libgcc.h has been moved from include/lib, where it's the only member,
to include/linux. This is meant to avoid tab completion conflicts.
* VDSO entries for clock_get/gettimeofday/getcpu have been added. These
are simple syscalls now, but we want to let glibc use them from the
start so we can make them faster later.
* A VDSO entry for instruction cache flushing has been added so
userspace can flush the instruction cache.
* The VDSO symbol versions for __vdso_cmpxchg{32,64} have been removed,
as those VDSO entries don't actually exist.
* __io_writes has been corrected to respect the given type.
* A new READ_ONCE in arch_spin_is_locked().
* __test_and_op_bit_ord() is now actually ordered.
* Various small fixes throughout the tree to enable allmodconfig to
build cleanly.
* Removal of some dead code in our atomic support headers.
* Improvements to various comments in our atomic support headers.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAlohyvMTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRDvTKFQLMurQWkhD/wO/F8vrwsNMOWR8zxvHdB30KD+FHmr
X1+X9OqnH8AMd4Woj6pS0ap7g0GCKuLiI/bOTrQVVdTJpmKFaJ9rrwRCJzHq43yt
feRjKyPAFYlvf6YaIEJ3YHU0t3LO1eK27YyFMg6F8y+bZim6oK2GdyfYF0Xiik3B
L3NkDPSH4oplTJjUI+tzDZdMsuZKhxpXPnbNQA7YZLepz04jOPGWqFrA1C3gAaVQ
dj1OkOGTSyQFwia7LrIm2g0J5/mqpjAF0KjdiTsvH6G9x3V0HZYU5Br3kHgauWKc
YrNEbbDl8EakT5QocPf5F4Z8qpO9Hvxjwe2/z27usPtV9FQOPuDDPOygSPykwNNJ
bDfv9nIE3W7lN26BaRcV2ivY3r9ZpCEcq+qXIiTm3P/uTVqjMq54NkvHnj4ON1ih
DJZEgkM9L+rm7c9XDn627FBkmkeEndPJcQ3P/nopb5zGTYTb2HGrUt2nM+KR2vuE
FdYtA9+ll3OzyFO3OVVjiAlxr8Qnwf2wIWXJXxWpcmmchGJ5NeTSZtiD14pAP5eC
EDpoWwefvhqRMGdOlgq/fkx4Mrhz27euWXine3ZccprABAf7Hxkb/N5ojIJKT7qW
mN3HL3PC9P0t/HxQEu0q0NLLsP+X/1yZ5HmDl44Y7N8aeCrIUXaB61gsTt6Oi6Ha
PMJi5PI6VDDQbA==
=CCe+
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-4.15-rc2_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux
Pull RISC-V cleanups and ABI fixes from Palmer Dabbelt:
"This contains a handful of small cleanups that are a result of
feedback that didn't make it into our original patch set, either
because the feedback hadn't been given yet, I missed the original
emails, or we weren't ready to submit the changes yet.
I've been maintaining the various cleanup patch sets I have as their
own branches, which I then merged together and signed. Each merge
commit has a short summary of the changes, and each branch is based on
your latest tag (4.15-rc1, in this case). If this isn't the right way
to do this then feel free to suggest something else, but it seems sane
to me.
Here's a short summary of the changes, roughly in order of how
interesting they are.
- libgcc.h has been moved from include/lib, where it's the only
member, to include/linux. This is meant to avoid tab completion
conflicts.
- VDSO entries for clock_get/gettimeofday/getcpu have been added.
These are simple syscalls now, but we want to let glibc use them
from the start so we can make them faster later.
- A VDSO entry for instruction cache flushing has been added so
userspace can flush the instruction cache.
- The VDSO symbol versions for __vdso_cmpxchg{32,64} have been
removed, as those VDSO entries don't actually exist.
- __io_writes has been corrected to respect the given type.
- A new READ_ONCE in arch_spin_is_locked().
- __test_and_op_bit_ord() is now actually ordered.
- Various small fixes throughout the tree to enable allmodconfig to
build cleanly.
- Removal of some dead code in our atomic support headers.
- Improvements to various comments in our atomic support headers"
* tag 'riscv-for-linus-4.15-rc2_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux: (23 commits)
RISC-V: __io_writes should respect the length argument
move libgcc.h to include/linux
RISC-V: Clean up an unused include
RISC-V: Allow userspace to flush the instruction cache
RISC-V: Flush I$ when making a dirty page executable
RISC-V: Add missing include
RISC-V: Use define for get_cycles like other architectures
RISC-V: Provide stub of setup_profiling_timer()
RISC-V: Export some expected symbols for modules
RISC-V: move empty_zero_page definition to C and export it
RISC-V: io.h: type fixes for warnings
RISC-V: use RISCV_{INT,SHORT} instead of {INT,SHORT} for asm macros
RISC-V: use generic serial.h
RISC-V: remove spin_unlock_wait()
RISC-V: `sfence.vma` orderes the instruction cache
RISC-V: Add READ_ONCE in arch_spin_is_locked()
RISC-V: __test_and_op_bit_ord should be strongly ordered
RISC-V: Remove smb_mb__{before,after}_spinlock()
RISC-V: Remove __smp_bp__{before,after}_atomic
RISC-V: Comment on why {,cmp}xchg is ordered how it is
...
- Fix FP register corruption when SVE is not available or in use
- Fix out-of-tree module build failure when CONFIG_ARM64_MODULE_PLTS=y
- Missing 'const' generating errors with LTO builds
- Remove unsupported events from Cortex-A73 PMU description
- Removal of stale and incorrect comments
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJaIXOkAAoJELescNyEwWM0swYH/3iSLxKnGDht1M9xqa5V288z
eNC/Vw/Y/Sqi305reRK6gWbJ0hwtJLYSEK3tDbeL6C9v9mg8CIZNzbPI3vrEjAq+
n8yKmJVYaXlu9jmmo7vqF7LZ7LRgKZPO0cEKWZBR8LAYjD0zJPikwDR/JvTkGH75
1VnFfwuMykB989NMcVGQ1eD2G5RH13e2j9D2ErT0fbdcZ/MWpcviVVqMr4ggsQoR
imVozMPXXLQ/0LeUfr8IRIst3x0CgFwmMX7CDWoVJJJXB7Zq0nvNptEtlS5tUZ/x
1vbXJstFasG3EL6QKiKxfUvtbaa4Vm7xEBBIVABQij+iUw8Og1OBojVi0wBCE3s=
=9hCV
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The critical one here is a fix for fpsimd register corruption across
signals which was introduced by the SVE support code (the register
files overlap), but the others are worth having as well.
Summary:
- Fix FP register corruption when SVE is not available or in use
- Fix out-of-tree module build failure when CONFIG_ARM64_MODULE_PLTS=y
- Missing 'const' generating errors with LTO builds
- Remove unsupported events from Cortex-A73 PMU description
- Removal of stale and incorrect comments"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: context: Fix comments and remove pointless smp_wmb()
arm64: cpu_ops: Add missing 'const' qualifiers
arm64: perf: remove unsupported events for Cortex-A73
arm64: fpsimd: Fix failure to restore FPSIMD state after signals
arm64: pgd: Mark pgd_cache as __ro_after_init
arm64: ftrace: emit ftrace-mod.o contents through code
arm64: module-plts: factor out PLT generation code for ftrace
arm64: mm: cleanup stale AIVIVT references
Olaf said: Here's a short series of patches that produces a working
allmodconfig. Would be nice to see them go in so we can add build
coverage.
I've dropped patches 8 and 10 from the original set:
* [PATCH 08/10] (RISC-V: Set __ARCH_WANT_RENAMEAT to pick up generic
version) has a better fix that I've sent out for review, we don't want
renameat.
* [PATCH 10/10] (input: joystick: riscv has get_cycles) has already been
taken into Dmitry Torokhov's tree.
This merge contains the user-visible, ABI-breaking changes that we want
to make sure we have in Linux before our first release. Highlights
include:
* VDSO entries for clock_get/gettimeofday/getcpu have been added. These
are simple syscalls now, but we want to let glibc use them from the
start so we can make them faster later.
* A VDSO entry for instruction cache flushing has been added so
userspace can flush the instruction cache.
* The VDSO symbol versions for __vdso_cmpxchg{32,64} have been removed,
as those VDSO entries don't actually exist.
Conflicts:
arch/riscv/include/asm/tlbflush.h
This patch set is the result of some feedback that filtered through
after our original patch set was reviewed, some of which was the result
of me missing some email. It contains:
* A new READ_ONCE in arch_spin_is_locked()
* __test_and_op_bit_ord() is now actually ordered
* Improvements to various comments
* Removal of some dead code
Whoops -- I must have just been being an idiot again. Thanks to Segher
for finding the bug :).
CC: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Introducing a new include/lib directory just for this file totally
messes up tab completion for include/linux, which is highly annoying.
Move it to include/linux where we have headers for all kinds of other
lib/ code as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>