In the patch referred to below we added link tolerance as an additional
criteria for declaring broadcast transmission "stale" and resetting the
affected links.
However, the 'tolerance' field of the broadcast link is never set, and
remains at zero. This renders the whole commit without the intended
improving effect, but luckily also with no negative effect.
In this commit we add the missing initialization.
Fixes: a4dc70d46c ("tipc: extend link reset criteria for stale packet retransmission")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca says:
====================
net: ipv4: fixes for PMTU when link MTU changes
The first patch adapts the changes that commit e9fa1495d7 ("ipv6:
Reflect MTU changes on PMTU of exceptions for MTU-less routes") did in
IPv6 to IPv4: lower PMTU when the first hop's MTU drops below it, and
raise PMTU when the first hop was limiting PMTU discovery and its MTU
is increased.
The second patch fixes bugs introduced in commit d52e5a7e7c ("ipv4:
lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu") that
only appear once the first patch is applied.
Selftests for these cases were introduced in net-next commit
e44e428f59 ("selftests: pmtu: add basic IPv4 and IPv6 PMTU tests")
v2: add cover letter, and fix a few small things in patch 1
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When an MTU update with PMTU smaller than net.ipv4.route.min_pmtu is
received, we must clamp its value. However, we can receive a PMTU
exception with PMTU < old_mtu < ip_rt_min_pmtu, which would lead to an
increase in PMTU.
To fix this, take the smallest of the old MTU and ip_rt_min_pmtu.
Before this patch, in case of an update, the exception's MTU would
always change. Now, an exception can have only its lock flag updated,
but not the MTU, so we need to add a check on locking to the following
"is this exception getting updated, or close to expiring?" test.
Fixes: d52e5a7e7c ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 5aad1de5ea ("ipv4: use separate genid for next hop
exceptions"), exceptions get deprecated separately from cached
routes. In particular, administrative changes don't clear PMTU anymore.
As Stefano described in commit e9fa1495d7 ("ipv6: Reflect MTU changes
on PMTU of exceptions for MTU-less routes"), the PMTU discovered before
the local MTU change can become stale:
- if the local MTU is now lower than the PMTU, that PMTU is now
incorrect
- if the local MTU was the lowest value in the path, and is increased,
we might discover a higher PMTU
Similarly to what commit e9fa1495d7 did for IPv6, update PMTU in those
cases.
If the exception was locked, the discovered PMTU was smaller than the
minimal accepted PMTU. In that case, if the new local MTU is smaller
than the current PMTU, let PMTU discovery figure out if locking of the
exception is still needed.
To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU
notifier. By the time the notifier is called, dev->mtu has been
changed. This patch adds the old MTU as additional information in the
notifier structure, and a new call_netdevice_notifiers_u32() function.
Fixes: 5aad1de5ea ("ipv4: use separate genid for next hop exceptions")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fib6_info_alloc() function allocates percpu memory to hold per CPU
pointers to rt6_info, but this memory is never freed. Fix it.
Fixes: a64efe142f ("net/ipv6: introduce fib6_info struct and helpers")
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQIVAwUAW7vWy/u3V2unywtrAQJkeA//UTmUmv9bL8k38XvnnHQ82iTyIjvZ40md
9XbNagwjkxuDv2eqW1rTxk5+Po596Nb4s/cBvpyAABjCA+zoT0/fklK5PqsSKFla
ibAHlrBiv/1FlSUSTf4dvAfVVszRAItCP4OhKTU+0fkX3Hq75T7Lty1h92Yk/ijx
ccqSV9K/nB4WN9bbg+sDngu3RAVR3VJ5RhVSmrnE9xd3M9ihz2RHacJ6GWNQf5PV
j8cKuP3qY/2K245hb4sGn1i9OO6/WVVQhnojYA+jR8Vg4+8mY0Y++MLsW8ywlsuo
xlrDCVPBaR6YqjnlDoOXIXYZ8EbEsmy5FP51zfEZJqeKJcVPLavQyoku0xJ8fHgr
oK/3DBUkwgi0hiq4dC1CrmXZRSbBQqGTEJMA6AKh2ApXb9BkmhmqRGMx+2nqocwF
TjAQmCu0FJTAYdwJuqzv5SRFjocjrHGcYDpQ27CgNqSiiDlh8CZ1ej/mbV+o8pUQ
tOrpnRCLY8Nl55AgJbl2260mKcpcKQcRUwggKAOzqRjXwcy9q3KKm8H149qejJSl
AQihDkPVqBZrv5ODKR0z88e5cVF9ad+vBPd+2TrRVGtTWhm/q8N/0Hh0M2sYDSc6
vLklYfwI4U5U1/naJC9WCkW+ybj5ekVNHgW91ICvQVNG0TH2zfE7sUGMQn6lRZL8
jfpLTVbtKfI=
=q6E6
-----END PGP SIGNATURE-----
Merge tag 'rxrpc-fixes-20181008' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc: Fix packet reception code
Here are a set of patches that prepares for and fix problems in rxrpc's
package reception code. There serious problems are:
(A) There's a window between binding the socket and setting the data_ready
hook in which packets can find their way into the UDP socket's receive
queues.
(B) The skb_recv_udp() will return an error (and clear the error state) if
there was an error on the Tx side. rxrpc doesn't handle this.
(C) The rxrpc data_ready handler doesn't fully drain the UDP receive
queue.
(D) The rxrpc data_ready handler assumes it is called in a non-reentrant
state.
The second patch fixes (A) - (C); the third patch renders (B) and (C)
non-issues by using the recap_rcv hook instead of data_ready - and the
final patch fixes (D). That last is the most complex.
The preparatory patches are:
(1) Fix some places that are doing things in the wrong net namespace.
(2) Stop taking the rcu read lock as it's held by the IP input routine in
the call chain.
(3) Only end the Tx phase if *we* rotated the final packet out of the Tx
buffer.
(4) Don't assume that the call state won't change after dropping the
call_state lock.
(5) Only take receive window and MTU suze parameters from an ACK packet if
it's the latest ACK packet.
(6) Record connection-level abort information correctly.
(7) Fix a trace line.
And then there are three main patches - note that these are mixed in with
the preparatory patches somewhat:
(1) Fix the setup window (A), skb_recv_udp() error check (B) and packet
drainage (C).
(2) Switch to using the encap_rcv instead of data_ready to cut out the
effects of the UDP read queues and get the packets delivered directly.
(3) Add more locking into the various packet input paths to defend against
re-entrance (D).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In rds_send_mprds_hash(), if the calculated hash value is non-zero and
the MPRDS connections are not yet up, it will wait. But it should not
wait if the send is non-blocking. In this case, it should just use the
base c_path for sending the message.
Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
checked for CONFIG_DM_ZONED rather than CONFIG_BLK_DEV_ZONED.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJbvsLkAAoJEMUj8QotnQNaGLMIAOOCRjlNfzAgtFixyk53GF4g
oUbWaIhp1j27t3xq8K7A0yOG6mILZZd39mtnkW1Zo5TA+kWdMmJDij/QAhd4MKFl
m+SUjHr97hbXwRJSOm2qjfm6syfuSXLYWhW2/z9KI3bT6lQg5KAdhGGka3P5ZJ7r
awNxV2K9vuog4OAS8b56dHsELmhvdNd8dKZ8eX//eP9d/3FCGH7ZmKukxAZMf4OW
WK9ym5cDulCzvNMnfvCKmMZQJHoWvaIv6wckl20P8BImdM8hJmwPCrQvOqNRztHM
y15XyKHHqyDSwPByNCzJZ3KnHywdkg988p4bwRO/ciCYRCHfdmOxBlaY7SVIQi0=
=W4w5
-----END PGP SIGNATURE-----
Merge tag 'for-4.19/dm-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Mike writes:
"device mapper fix for 4.19 final
- Fix for earlier 4.19 final DM linear change that incorrectly
checked for CONFIG_DM_ZONED rather than CONFIG_BLK_DEV_ZONED."
* tag 'for-4.19/dm-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm linear: fix linear_end_io conditional definition
Update for 4.19-rc7 to fix numerous file clone and deduplication issues.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJbvo1jAAoJEK3oKUf0dfodurwP/3OGvJo11XjE5Wuh5tbcSOUm
duJdJvN+b0gx8Lb1IL2lHaBEt/Pw3PKi+KPx6d+pd47aflCYNMiU7I1kzejxvq9I
TBVdhxffAmNBU02VU/qmhucMnKfpVr/UX19f0sHZcfXbZkpIuHrSpI25MNSYqbEs
bRoMDkgbuu377hCJu6tnBAUm38z4iZdfJaGPVmJmuVMn8JJE8KA3Une/daxg0/HY
zbCMWP3SGJwqx7SyyeurQSkRS/8GG3LgbnYOz0FlaHfBd4JVJscHOZU08nrYmMAN
9SOowvahaPkDcyO8c6gRkyE6NIbOsb79718g+HTm4eqzyChnh+jnFTzitcTMuK2c
vjblXZSDKnN/P4UaEEzIAnQ1Eew4WhLrKuBr+3fCr2bKTGfj0iiX36CjEgk1k+Df
t1DV/Hj6me6crZpWO7GQZVLnDsiJBzaOsIgpYnB3vcJyof/cgm+5bfGFedDZFdpV
XTh0oBN8B7oQztd4MvcfQOGXSktrMtq+6RomO8kv77mVtHFnjnroKHq/wIft2nyI
rs7u7DFmCLyB9Lbm8aoeMPo4+0zxTp9385HSJ+XznHcjGXxkxIIGhhdU3omKxANa
j3UQB1+VRC4lHSluUXOW7vH0j1tX3gaS3z/yJ+gfcAdizUmdp0h0rMx8c9VR8SEJ
xv/fpbrcKHyBwuNvKtBm
=yj1P
-----END PGP SIGNATURE-----
Merge tag 'xfs-fixes-for-4.19-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Dave writes:
"xfs: fixes for 4.19-rc7
Update for 4.19-rc7 to fix numerous file clone and deduplication issues."
* tag 'xfs-fixes-for-4.19-rc7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix data corruption w/ unaligned reflink ranges
xfs: fix data corruption w/ unaligned dedupe ranges
xfs: update ctime and remove suid before cloning files
xfs: zero posteof blocks when cloning above eof
xfs: refactor clonerange preparation into a separate helper
The dm-linear target is independent of the dm-zoned target. For code
requiring support for zoned block devices, use CONFIG_BLK_DEV_ZONED
instead of CONFIG_DM_ZONED.
While at it, similarly to dm linear, also enable the DM_TARGET_ZONED_HM
feature in dm-flakey only if CONFIG_BLK_DEV_ZONED is defined.
Fixes: beb9caac21 ("dm linear: eliminate linear_end_io call if CONFIG_DM_ZONED disabled")
Fixes: 0be12c1c7f ("dm linear: add support for zoned block devices")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
mlx5e netdevice used to calculate fragment edges by a call to
mlx5_wq_cyc_get_frag_size(). This calculation did not give the correct
indication for queues smaller than a PAGE_SIZE, (broken by default on
PowerPC, where PAGE_SIZE == 64KB). Here it is replaced by the correct new
calls/API.
Since (TX/RX) Work Queues buffers are fragmented, here we introduce
changes to the API in core driver, so that it gets a stride index and
returns the index of last stride on same fragment, and an additional
wrapping function that returns the number of physically contiguous
strides that can be written contiguously to the work queue.
This obsoletes the following API functions, and their buggy
usage in EN driver:
* mlx5_wq_cyc_get_frag_size()
* mlx5_wq_cyc_ctr2fragix()
The new API improves modularity and hides the details of such
calculation for mlx5e netdevice and mlx5_ib rdma drivers.
New calculation is also more efficient, and improves performance
as follows:
Packet rate test: pktgen, UDP / IPv4, 64byte, single ring, 8K ring size.
Before: 16,477,619 pps
After: 17,085,793 pps
3.7% improvement
Fixes: 3a2f703312 ("net/mlx5: Use order-0 allocations for all WQ types")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The HW spec defines only bits 24-26 of pftype_wq as the page fault type,
use the required mask to ensure that.
Fixes: d9aaed8387 ("{net,IB}/mlx5: Refactor page fault handling")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Allocated memory for context should be freed once
finished working with it.
Fixes: d6c4f0298c ("net/mlx5: Refactor accel IPSec code")
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
cleanup a KMEM_CACHE if target registration fails.
- Two stable@ fixes for DM zoned target; 4.20 will have changes that
eliminate this code entirely but <= 4.19 needs these changes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJbviuKAAoJEMUj8QotnQNa06YH/1JiwvxQzgWVfyjWeYNlK81Y
BTo0CRH+T8qe0LVF3Y5Dz4oH3JVlU7SrlseTn57tR/gmPgE88XXByOlr1VAvqaEj
x//MAEVQcvWE8luF/QEK04/eUCPK+U0L4ix2YSKngS/IkeMzfEtSiki4FgRrR8OI
qLwNRpWoTOOBRMBkJEaDbD4uOzHKoK+LdPekbWrFx0j231Tp3iuxD+/gaRmWBKip
LFm45rXuFJB8+ZUrsO8wlHGMVoe8yo9D7qfdYG0DUXRcE+iXl7ml3kPPu09e9+zY
1JYxXJzsbGuRfknf0zj5sETxuE2fVJhhPv2zsIdIlGBvyRUO2ELUyRhOENhU4XI=
=RGlv
-----END PGP SIGNATURE-----
Merge tag 'for-4.19/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Mike writes:
"device mapper fixes for 4.19 final
- Fix a DM cache module init error path bug that doesn't properly
cleanup a KMEM_CACHE if target registration fails.
- Two stable@ fixes for DM zoned target; 4.20 will have changes that
eliminate this code entirely but <= 4.19 needs these changes."
* tag 'for-4.19/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm linear: eliminate linear_end_io call if CONFIG_DM_ZONED disabled
dm: fix report zone remapping to account for partition offset
dm cache: destroy migration_cache if cache target registration failed
values that came after a dereference pointer.
trace_printk() utilizes vbin_printf() and bstr_printf() to keep the
overhead of tracing down. vbin_printf() does not do any conversions
and just stors the string format and the raw arguments into the
buffer. bstr_printf() is used to read the buffer and does the conversions
to complete the printf() output.
This can be troublesome with dereferenced pointers because the reference
may be different from the time vbin_printf() is called to the time
bstr_printf() is called. To fix this, a prior commit changed vbin_printf()
to convert dereferenced pointers into strings and load the converted
string into the buffer. But the change to bstr_printf() had an off-by-one
error and didn't account for the nul character at the end of the string
and this corrupted the rest of the values in the format that came after
a dereferenced pointer.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCW737iRQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qnraAQDVbp0aWOpS73YUVbW/bArC8t8Z6/9h
bXLeCdSSa1BHswD+K+kj7NiVrxIzyXrotb40JoscLsaXSIEJjlNFHQKqxQQ=
=4BpJ
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Steven writes:
"vsprint fix:
It was reported that trace_printk() was not reporting properly
values that came after a dereference pointer.
trace_printk() utilizes vbin_printf() and bstr_printf() to keep the
overhead of tracing down. vbin_printf() does not do any conversions
and just stors the string format and the raw arguments into the
buffer. bstr_printf() is used to read the buffer and does the
conversions to complete the printf() output.
This can be troublesome with dereferenced pointers because the
reference may be different from the time vbin_printf() is called to
the time bstr_printf() is called. To fix this, a prior commit changed
vbin_printf() to convert dereferenced pointers into strings and load
the converted string into the buffer. But the change to bstr_printf()
had an off-by-one error and didn't account for the nul character at
the end of the string and this corrupted the rest of the values in
the format that came after a dereferenced pointer."
* tag 'trace-v4.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
vsprintf: Fix off-by-one bug in bstr_printf() processing dereferenced pointers
- Fix DT unittest on Oldworld MAC systems
-----BEGIN PGP SIGNATURE-----
iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAlu95pgQHHJvYmhAa2Vy
bmVsLm9yZwAKCRD6+121jbxhwxHjEACgsFN0hQTsM4EKWN2kXRrDH5S5amXLa787
rI+tbdKDUkKTk94Ue+8Gm681Y9Euhecd3rMkZ4D+v0s63hbvxZV6DawuVE0ifhX9
K5x2Ara592rkMs4GT9GLysjWwJ6c7mV1yJdzZnsqzVSXTkMYIBUf1MI8i8nLFet2
8BnOt8sLKE8NJ3VwgDgb4hrF2tSZyl8UD/Bdn2AKPAZ6hKFW1ZL1ef0OZq7EoiLS
W3/2XM8Z0Ti7HuONyWxP+GhBkQOcxkuBRui4L1MkFXb4ENyfJEh9uZ+dz/hoXORI
y6WtNIGaOFFCGbT0IvFqqPly29JNchIAhG4B6MsLVHtg1wgOJ0z9iL4AZkIQJ+/B
GB0+ApcQk6cFvEe0EunqSZgoV6SuVLni5dUlmLAbm6mlUtbHNUjlW9o4DBv7uB0n
F6Rh0ruXm5VbLildpHXi6WcP7ijT03mbB4B9PRVC+ve9S8sKU1T3u7V2pUdOIerX
Xmh0dIPrCWMPvNSMUeYJr14gYutS9slTUKz50GH9pwHK8IdsOsps8TuIGrIHeiiP
8D3fF+Tn1DHcj6yYWI/vHW8JX2RsVq9AlY3MybU7dk13uSyEPDEu8tkjlpBDn+P2
vvrepGynAm021otLLtfMx1DiuyUdhqt4aK9nc5YhjJkPY0PFjyEb3Jqv48DXKQFD
WKRthDIXSg==
=bQuC
-----END PGP SIGNATURE-----
Merge tag 'devicetree-fixes-for-4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Rob writes:
"Devicetree fixes for 4.19, part 3:
- Fix DT unittest on Oldworld MAC systems"
* tag 'devicetree-fixes-for-4.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: unittest: Disable interrupt node tests for old world MAC systems
Moshe Shemesh says:
====================
devlink param type string fixes
This patchset fixes devlink param infrastructure for string param type.
The devlink param infrastructure doesn't handle copying the string data
correctly. The first two patches fix it and the third patch adds helper
function to safely copy string value without exceeding
DEVLINK_PARAM_MAX_STRING_VALUE.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Devlink string param buffer is allocated at the size of
DEVLINK_PARAM_MAX_STRING_VALUE. Add helper function which makes sure
this size is not exceeded.
Renamed DEVLINK_PARAM_MAX_STRING_VALUE to
__DEVLINK_PARAM_MAX_STRING_VALUE to emphasize that it should be used by
devlink only. The driver should use the helper function instead to
verify it doesn't exceed the allowed length.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driverinit configuration mode value is held by devlink to enable the
driver fetch the value after reload command. In case the param type is
string devlink should copy the value from driver string buffer to
devlink string buffer on devlink_param_driverinit_value_set() and
vice-versa on devlink_param_driverinit_value_get().
Fixes: ec01aeb180 ("devlink: Add support for get/set driverinit value")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case devlink param type is string, it needs to copy the string value
it got from the input to devlink_param_value.
Fixes: e3b7ca18ad ("devlink: Add param set command")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some samples require headers installation, so commit 3fca1700c4
("kbuild: make samples really depend on headers_install") added
such dependency in the top Makefile. However, UML fails to build
with CONFIG_SAMPLES=y because UML does not support headers_install.
Fixes: 3fca1700c4 ("kbuild: make samples really depend on headers_install")
Reported-by: Kees Cook <keescook@chromium.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
It is best to avoid any extra overhead associated with bio completion.
DM core will indirectly call a DM target's .end_io if it is defined.
In the case of DM linear, there is no need to do so (for every bio that
completes) if CONFIG_DM_ZONED is not enabled.
Avoiding an extra indirect call for every bio completion is very
important for ensuring DM linear doesn't incur more overhead that
further widens the performance gap between dm-linear and raw block
devices.
Fixes: 0be12c1c7f ("dm linear: add support for zoned block devices")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Since 'commit 02e389e63e ("pinctrl: mcp23s08: fix irq setup order")' the
irq request isn't the last devm_* allocation. Without a deeper look at
the irq and testing this isn't a good solution. Since this driver relies
on the devm mechanism, requesting a interrupt should be the last thing
to avoid memory corruptions during unbinding.
'Commit 02e389e63e ("pinctrl: mcp23s08: fix irq setup order")' fixed the
order for the interrupt-controller use case only. The
mcp23s08_irq_setup() must be split into two to fix it for the
interrupt-controller use case and to register the irq at last. So the
irq will be freed first during unbind.
Cc: stable@vger.kernel.org
Cc: Jan Kundrát <jan.kundrat@cesnet.cz>
Cc: Dmitry Mastykin <mastichi@gmail.com>
Cc: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Fixes: 82039d244f ("pinctrl: mcp23s08: add pinconf support")
Fixes: 02e389e63e ("pinctrl: mcp23s08: fix irq setup order")
Signed-off-by: Marco Felsch <m.felsch@pengutronix.de>
Tested-by: Phil Reid <preid@electromag.com.au>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
gpiochip_set_cascaded_irqchip() is passed 'parent_irq' as an argument
and then the address of that argument is assigned to the gpio chips
gpio_irq_chip 'parents' pointer shortly thereafter. This can't ever
work, because we've just assigned some stack address to a pointer that
we plan to dereference later in gpiochip_irq_map(). I ran into this
issue with the KASAN report below when gpiochip_irq_map() tried to setup
the parent irq with a total junk pointer for the 'parents' array.
BUG: KASAN: stack-out-of-bounds in gpiochip_irq_map+0x228/0x248
Read of size 4 at addr ffffffc0dde472e0 by task swapper/0/1
CPU: 7 PID: 1 Comm: swapper/0 Not tainted 4.14.72 #34
Call trace:
[<ffffff9008093638>] dump_backtrace+0x0/0x718
[<ffffff9008093da4>] show_stack+0x20/0x2c
[<ffffff90096b9224>] __dump_stack+0x20/0x28
[<ffffff90096b91c8>] dump_stack+0x80/0xbc
[<ffffff900845a350>] print_address_description+0x70/0x238
[<ffffff900845a8e4>] kasan_report+0x1cc/0x260
[<ffffff900845aa14>] __asan_report_load4_noabort+0x2c/0x38
[<ffffff900897e098>] gpiochip_irq_map+0x228/0x248
[<ffffff900820cc08>] irq_domain_associate+0x114/0x2ec
[<ffffff900820d13c>] irq_create_mapping+0x120/0x234
[<ffffff900820da78>] irq_create_fwspec_mapping+0x4c8/0x88c
[<ffffff900820e2d8>] irq_create_of_mapping+0x180/0x210
[<ffffff900917114c>] of_irq_get+0x138/0x198
[<ffffff9008dc70ac>] spi_drv_probe+0x94/0x178
[<ffffff9008ca5168>] driver_probe_device+0x51c/0x824
[<ffffff9008ca6538>] __device_attach_driver+0x148/0x20c
[<ffffff9008ca14cc>] bus_for_each_drv+0x120/0x188
[<ffffff9008ca570c>] __device_attach+0x19c/0x2dc
[<ffffff9008ca586c>] device_initial_probe+0x20/0x2c
[<ffffff9008ca18bc>] bus_probe_device+0x80/0x154
[<ffffff9008c9b9b4>] device_add+0x9b8/0xbdc
[<ffffff9008dc7640>] spi_add_device+0x1b8/0x380
[<ffffff9008dcbaf0>] spi_register_controller+0x111c/0x1378
[<ffffff9008dd6b10>] spi_geni_probe+0x4dc/0x6f8
[<ffffff9008cab058>] platform_drv_probe+0xdc/0x130
[<ffffff9008ca5168>] driver_probe_device+0x51c/0x824
[<ffffff9008ca59cc>] __driver_attach+0x100/0x194
[<ffffff9008ca0ea8>] bus_for_each_dev+0x104/0x16c
[<ffffff9008ca58c0>] driver_attach+0x48/0x54
[<ffffff9008ca1edc>] bus_add_driver+0x274/0x498
[<ffffff9008ca8448>] driver_register+0x1ac/0x230
[<ffffff9008caaf6c>] __platform_driver_register+0xcc/0xdc
[<ffffff9009c4b33c>] spi_geni_driver_init+0x1c/0x24
[<ffffff9008084cb8>] do_one_initcall+0x240/0x3dc
[<ffffff9009c017d0>] kernel_init_freeable+0x378/0x468
[<ffffff90096e8240>] kernel_init+0x14/0x110
[<ffffff9008086fcc>] ret_from_fork+0x10/0x18
The buggy address belongs to the page:
page:ffffffbf037791c0 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x4000000000000000()
raw: 4000000000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: ffffffbf037791e0 ffffffbf037791e0 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffffffc0dde47180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffffc0dde47200: f1 f1 f1 f1 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f2 f2
>ffffffc0dde47280: f2 f2 00 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3
^
ffffffc0dde47300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffffc0dde47380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Let's leave around one unsigned int in the gpio_irq_chip struct for the
single parent irq case and repoint the 'parents' array at it. This way
code is left mostly intact to setup parents and we waste an extra few
bytes per structure of which there should be only a handful in a system.
Cc: Evan Green <evgreen@chromium.org>
Cc: Thierry Reding <treding@nvidia.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Fixes: e0d8972898 ("gpio: Implement tighter IRQ chip integration")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
When powering down a SDIO connected card during suspend, make sure to call
into the generic lbs_suspend() function before pulling the plug. This will
make sure the card is successfully deregistered from the system to avoid
communication to the card starving out.
Fixes: 7444a80929 ("libertas: fix suspend and resume for SDIO connected cards")
Signed-off-by: Daniel Mack <daniel@zonque.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
On systems with OF_IMAP_OLDWORLD_MAC set in of_irq_workarounds, the
devicetree interrupt parsing code is different, causing unit tests of
devicetree interrupt nodes to fail. Due to a bug in unittest code, which
tries to dereference an uninitialized pointer, this results in a crash.
OF: /testcase-data/phandle-tests/consumer-a: arguments longer than property
Unable to handle kernel paging request for data at address 0x00bc616e
Faulting instruction address: 0xc08e9468
Oops: Kernel access of bad area, sig: 11 [#1]
BE PREEMPT PowerMac
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 4.14.72-rc1-yocto-standard+ #1
task: cf8e0000 task.stack: cf8da000
NIP: c08e9468 LR: c08ea5bc CTR: c08ea5ac
REGS: cf8dbb50 TRAP: 0300 Not tainted (4.14.72-rc1-yocto-standard+)
MSR: 00001032 <ME,IR,DR,RI> CR: 82004044 XER: 00000000
DAR: 00bc616e DSISR: 40000000
GPR00: c08ea5bc cf8dbc00 cf8e0000 c13ca517 c13ca517 c13ca8a0 00000066 00000002
GPR08: 00000063 00bc614e c0b05865 000affff 82004048 00000000 c00047f0 00000000
GPR16: c0a80000 c0a9cc34 c13ca517 c0ad1134 05ffffff 000affff c0b05860 c0abeef8
GPR24: cecec278 cecec278 c0a8c4d0 c0a885e0 c13ca8a0 05ffffff c13ca8a0 c13ca517
NIP [c08e9468] device_node_gen_full_name+0x30/0x15c
LR [c08ea5bc] device_node_string+0x190/0x3c8
Call Trace:
[cf8dbc00] [c007f670] trace_hardirqs_on_caller+0x118/0x1fc (unreliable)
[cf8dbc40] [c08ea5bc] device_node_string+0x190/0x3c8
[cf8dbcb0] [c08eb794] pointer+0x25c/0x4d0
[cf8dbd00] [c08ebcbc] vsnprintf+0x2b4/0x5ec
[cf8dbd60] [c08ec00c] vscnprintf+0x18/0x48
[cf8dbd70] [c008e268] vprintk_store+0x4c/0x22c
[cf8dbda0] [c008ecac] vprintk_emit+0x94/0x130
[cf8dbdd0] [c008ff54] printk+0x5c/0x6c
[cf8dbe10] [c0b8ddd4] of_unittest+0x2220/0x26f8
[cf8dbea0] [c0004434] do_one_initcall+0x4c/0x184
[cf8dbf00] [c0b4534c] kernel_init_freeable+0x13c/0x1d8
[cf8dbf30] [c0004814] kernel_init+0x24/0x118
[cf8dbf40] [c0013398] ret_from_kernel_thread+0x5c/0x64
The problem was observed when running a qemu test for the g3beige machine
with devicetree unittests enabled.
Disable interrupt node tests on affected systems to avoid both false
unittest failures and the crash.
With this patch in place, unittest on the affected system passes with
the following message.
dt-test ### end of unittest - 144 passed, 0 failed
Fixes: 53a42093d9 ("of: Add device tree selftests")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Frank Rowand <frank.rowand@sony.com>
Signed-off-by: Rob Herring <robh@kernel.org>
This contains a fix to 57e94c8b97, which caused
cros_ec based chromebooks to truncate an entire column of their built-in
keyboard.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6gYDF28Li+nEiKLaHwn1ewov5lgFAlu9fNgACgkQHwn1ewov
5lh8hQ//UTmPH+KU/Ex+PkUyoRAEyQycF78dMJGDk+JUBEGclrqUhHeIWoXvW1ov
KMQr6mStcXBssZRUUhsyL8YmoC9dwGJzBU5oyZY8OjMaJMAiTq4RxxJLPrVVRp3F
fq+d+STEPN7wQmAo+jzq1d/Kdq5Y0IZ5G/C4qmj85jbx9dERnRgR6KJQTXS82qcJ
t6sI3CbKBfvC+DkVnDcZZTO934pwzaP4Fj+gLlBRGTEUFwbnrs3Yid9x9dMvctY4
zagnHo0ZQqAWqY/vjlVvJ/C0SV8JDcq3gP8QUrFnDhwFqvUBpM/60HN+VyKv2YoA
7DIV0Txdka1KvaBKW4gTsS7yP9AgprpfHDNirx1UTRWAZ8LMdFRVj7CEluMNuled
WMMQXVaecL5EQJ6pFd7kDr7X8oIrNe2o6fD4QyaT/Llzlf3dVpQm13ZKmI3ws3zu
422DSKcMMIjDJ7kdmT3fNHLhbRG7LdhK1fVRVAQT+OMYxhZ5/Pcvr3emq8F+WWhF
+4nyAlwhemPf9m0tfC6yWtZ3Bbv3ycILeIJE3t6jom1MRR9WRVS85sIlxJm6M+MO
cRjvIRNVgPaDKw1iawIV+hhleQvA0Q8ncEdxfD2thei9ilzeLVFsYpXW2HFN4rW0
sMdtYB+Y6vjHCIXno0ET4qgG5uqMWkMKMErFaPArTUQD8pzRft4=
=NoBP
-----END PGP SIGNATURE-----
Merge tag 'tag-chrome-platform-fixes-for-v4.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform
Benson writes:
"chrome-platform fix for v4.19-rc8
This contains a fix to 57e94c8b97 ("mfd: cros-ec: Increase maximum
mkbp event size"), which caused cros_ec based chromebooks to truncate
an entire column of their built-in keyboard."
* tag 'tag-chrome-platform-fixes-for-v4.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform:
mfd: cros-ec: copy the whole event in get_next_event_xfer
Dennis writes:
"percpu fixes for-4.19-rc8
The new percpu allocator introduced in 4.14 had a missing free for
the percpu metadata. This caused a memory leak when percpu memory is
being churned resulting in the allocation and deallocation of percpu
memory chunks"
* 'for-4.19-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
percpu: stop leaking bitmap metadata blocks
Four more patches for 4.19
- Fix resume after suspend-to-disk if resume-CPU != suspend-CPU
- Fix vfio-ccw check for pinned pages
- Two patches to avoid a usercopy-whitelist warning in vfio-ccw
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJbvEC8AAoJEDjwexyKj9rg9vEIANHnOfx9qzM6grkvc3hH4kTR
VSZ2kZE+bQOqmxUF2XiOuBZkjX6EWZKmWwL8p+5uIydxkqmUvDtG2hYqo7GKkddP
e3Qze+pSNz4uU+LT4t3wOUJfL1R1Pgrdea7rSr1DwbQQ8K4FiEImMr9pDfVPaHjq
XDnkpIwavG1s4Lxlkc6IbGHX0tYt2wp2lXO/QEp8sLjqAdUBnOiZhA3Ssy8veS+M
Q0u0Vgara5enhq9UDevr6e+jJ5w1qT4SZ0T38eif4FJSJMSZpWFe9qI92gKCIv9U
3oBZM/Qof0He9ow/Yl57rgxLmt8VGXiO5rL+VfT7j0H6kGkmao79SCvgyon0R+M=
=/FjH
-----END PGP SIGNATURE-----
Merge tag 's390-4.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Martin writes:
"s390 fixes for 4.19-rc8
Four more patches for 4.19:
- Fix resume after suspend-to-disk if resume-CPU != suspend-CPU
- Fix vfio-ccw check for pinned pages
- Two patches to avoid a usercopy-whitelist warning in vfio-ccw"
* tag 's390-4.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cio: Fix how vfio-ccw checks pinned pages
s390/cio: Refactor alloc of ccw_io_region
s390/cio: Convert ccw_io_region to pointer
s390/hibernate: fix error handling when suspend cpu != resume cpu
- Avoid suboptimal placement of our VDSO when using the legacy mmap
layout, which can prevent statically linked programs that were able to
allocate large amounts of memory using the brk syscall prior to the
introduction of our VDSO from functioning correctly.
- Fix up CONFIG_CMDLINE handling for platforms which ought to ignore DT
arguments but have incorrectly used them & lost other arguments since
v3.16.
- Fix a path in MAINTAINERS to use valid wildcards.
- Fixup a regression from v4.17 in memset() for systems using
CPU_DADDI_WORKAROUNDS.
-----BEGIN PGP SIGNATURE-----
iIsEABYIADMWIQRgLjeFAZEXQzy86/s+p5+stXUA3QUCW7zsfxUccGF1bC5idXJ0
b25AbWlwcy5jb20ACgkQPqefrLV1AN0CzAEAksqhQt+PJpWTQtdQs05nuRMec0ey
2wOt2fjRTY9hd9UA/31ftnc520IJ9A5HXHlypPiA5SqtbaD6Z6YcIi0PcmUO
=0/Ro
-----END PGP SIGNATURE-----
Merge tag 'mips_fixes_4.19_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Paul writes:
"A few MIPS fixes for 4.19:
- Avoid suboptimal placement of our VDSO when using the legacy mmap
layout, which can prevent statically linked programs that were able
to allocate large amounts of memory using the brk syscall prior to
the introduction of our VDSO from functioning correctly.
- Fix up CONFIG_CMDLINE handling for platforms which ought to ignore
DT arguments but have incorrectly used them & lost other arguments
since v3.16.
- Fix a path in MAINTAINERS to use valid wildcards.
- Fixup a regression from v4.17 in memset() for systems using
CPU_DADDI_WORKAROUNDS."
* tag 'mips_fixes_4.19_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: memset: Fix CPU_DADDI_WORKAROUNDS `small_fixup' regression
MAINTAINERS: MIPS/LOONGSON2 ARCHITECTURE - Use the normal wildcard style
MIPS: Fix CONFIG_CMDLINE handling
MIPS: VDSO: Always map near top of user memory
Commit 57e94c8b97 caused cros-ec keyboard events
be truncated on many chromebooks so that Left and Right keys on Column 12 were
always 0. Use ret as memcpy len to fix this.
The old code was using ec_dev->event_size, which is the event payload/data size
excluding event_type header, for the length of the memcpy operation. Use ret
as memcpy length to avoid the off by one and copy the whole msg->data.
Fixes: 57e94c8b97 ("mfd: cros-ec: Increase maximum mkbp event size")
Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Tested-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: Emil Karlson <jekarlson@gmail.com>
Signed-off-by: Benson Leung <bleung@chromium.org>
If dm-linear or dm-flakey are layered on top of a partition of a zoned
block device, remapping of the start sector and write pointer position
of the zones reported by a report zones BIO must be modified to account
for the target table entry mapping (start offset within the device and
entry mapping with the dm device). If the target's backing device is a
partition of a whole disk, the start sector on the physical device of
the partition must also be accounted for when modifying the zone
information. However, dm_remap_zone_report() was not considering this
last case, resulting in incorrect zone information remapping with
targets using disk partitions.
Fix this by calculating the target backing device start sector using
the position of the completed report zones BIO and the unchanged
position and size of the original report zone BIO. With this value
calculated, the start sector and write pointer position of the target
zones can be correctly remapped.
Fixes: 10999307c1 ("dm: introduce dm_remap_zone_report()")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Commit 7e6358d244 ("dm: fix various targets to dm_register_target
after module __init resources created") inadvertently introduced this
bug when it moved dm_register_target() after the call to KMEM_CACHE().
Fixes: 7e6358d244 ("dm: fix various targets to dm_register_target after module __init resources created")
Cc: stable@vger.kernel.org
Signed-off-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Arthur Kiyanovski says:
====================
minor bug fixes for ENA Ethernet driver
Arthur Kiyanovski (4):
net: ena: fix warning in rmmod caused by double iounmap
net: ena: fix rare bug when failed restart/resume is followed by
driver removal
net: ena: fix NULL dereference due to untimely napi initialization
net: ena: fix auto casting to boolean
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate potential auto casting compilation error.
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
napi poll functions should be initialized before running request_irq(),
to handle a rare condition where there is a pending interrupt, causing
the ISR to fire immediately while the poll function wasn't set yet,
causing a NULL dereference.
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a rare scenario when ena_device_restore() fails, followed by device
remove, an FLR will not be issued. In this case, the device will keep
sending asynchronous AENQ keep-alive events, even after driver removal,
leading to memory corruption.
Fixes: 8c5c7abdeb ("net: ena: add power management ops to the ENA driver")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Memory mapped with devm_ioremap is automatically freed when the driver
is disconnected from the device. Therefore there is no need to
explicitly call devm_iounmap.
Fixes: 0857d92f71 ("net: ena: add missing unmap bars on device removal")
Fixes: 411838e7b4 ("net: ena: fix rare kernel crash when bar memory remap fails")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 64bc06bb32 broke buffered writes to journaled files (chattr
+j): we'll try to journal the buffer heads of the page being written to
in gfs2_iomap_journaled_page_done. However, the iomap code no longer
creates buffer heads, so we'll BUG() in gfs2_page_add_databufs. Fix
that by creating buffer heads ourself when needed.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
On some SD cards over SPI, reading with the multiblock read command the last
sector will leave the card in a bad state.
Remove last sectors from the multiblock reading cmd.
Signed-off-by: Chris Boot <bootc@bootc.net>
Signed-off-by: Clément Péron <peron.clem@gmail.com>
Cc: stable@vger.kernel.org # v4.10+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Arnd Bergmann reported that turning on -Wvla found a new (unintended) VLA usage:
arch/x86/mm/pgtable.c: In function 'pgd_alloc':
include/linux/build_bug.h:29:45: error: ISO C90 forbids variable length array 'u_pmds' [-Werror=vla]
arch/x86/mm/pgtable.c:190:34: note: in expansion of macro 'static_cpu_has'
#define PREALLOCATED_USER_PMDS (static_cpu_has(X86_FEATURE_PTI) ? \
^~~~~~~~~~~~~~
arch/x86/mm/pgtable.c:431:16: note: in expansion of macro 'PREALLOCATED_USER_PMDS'
pmd_t *u_pmds[PREALLOCATED_USER_PMDS];
^~~~~~~~~~~~~~~~~~~~~~
Use the actual size of the array that is used for X86_FEATURE_PTI,
which is known at build time, instead of the variable size.
[ mingo: Squashed original fix with followup fix to avoid bisection breakage, wrote new changelog. ]
Reported-by: Arnd Bergmann <arnd@arndb.de>
Original-written-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Fixes: 1be3f247c288 ("x86/mm: Avoid VLA in pgd_alloc()")
Link: http://lkml.kernel.org/r/20181008235434.GA35035@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While the DOC at the beginning of lib/bitmap.c explicitly states that
"The number of valid bits in a given bitmap does _not_ need to be an
exact multiple of BITS_PER_LONG.", some of the bitmap operations do
indeed access BITS_PER_LONG portions of the provided bitmap no matter
the size of the provided bitmap. For example, if bitmap_intersects()
is provided with an 8 bit bitmap the operation will access
BITS_PER_LONG bits from the provided bitmap. While the operation
ensures that these extra bits do not affect the result, the memory
is still accessed.
The capacity bitmasks (CBMs) are typically stored in u32 since they
can never exceed 32 bits. A few instances exist where a bitmap_*
operation is performed on a CBM by simply pointing the bitmap operation
to the stored u32 value.
The consequence of this pattern is that some bitmap_* operations will
access out-of-bounds memory when interacting with the provided CBM. This
is confirmed with a KASAN test that reports:
BUG: KASAN: stack-out-of-bounds in __bitmap_intersects+0xa2/0x100
and
BUG: KASAN: stack-out-of-bounds in __bitmap_weight+0x58/0x90
Fix this by moving any CBM provided to a bitmap operation needing
BITS_PER_LONG to an 'unsigned long' variable.
[ tglx: Changed related function arguments to unsigned long and got rid
of the _cbm extra step ]
Fixes: 72d5050566 ("x86/intel_rdt: Add utilities to test pseudo-locked region possibility")
Fixes: 49f7b4efa1 ("x86/intel_rdt: Enable setting of exclusive mode")
Fixes: d9b48c86eb ("x86/intel_rdt: Display resource groups' allocations' size in bytes")
Fixes: 95f0b77efa ("x86/intel_rdt: Initialize new resource group with sane defaults")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/69a428613a53f10e80594679ac726246020ff94f.1538686926.git.reinette.chatre@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The rxrpc_input_packet() function and its call tree was built around the
assumption that data_ready() handler called from UDP to inform a kernel
service that there is data to be had was non-reentrant. This means that
certain locking could be dispensed with.
This, however, turns out not to be the case with a multi-queue network card
that can deliver packets to multiple cpus simultaneously. Each of those
cpus can be in the rxrpc_input_packet() function at the same time.
Fix by adding or changing some structure members:
(1) Add peer->rtt_input_lock to serialise access to the RTT buffer.
(2) Make conn->service_id into a 32-bit variable so that it can be
cmpxchg'd on all arches.
(3) Add call->input_lock to serialise access to the Rx/Tx state. Note
that although the Rx and Tx states are (almost) entirely separate,
there's no point completing the separation and having separate locks
since it's a bi-phasal RPC protocol rather than a bi-direction
streaming protocol. Data transmission and data reception do not take
place simultaneously on any particular call.
and making the following functional changes:
(1) In rxrpc_input_data(), hold call->input_lock around the core to
prevent simultaneous producing of packets into the Rx ring and
updating of tracking state for a particular call.
(2) In rxrpc_input_ping_response(), only read call->ping_serial once, and
check it before checking RXRPC_CALL_PINGING as that's a cheaper test.
The bit test and bit clear can then be combined. No further locking
is needed here.
(3) In rxrpc_input_ack(), take call->input_lock after we've parsed much of
the ACK packet. The superseded ACK check is then done both before and
after the lock is taken.
The handing of ackinfo data is split, parsing before the lock is taken
and processing with it held. This is keyed on rxMTU being non-zero.
Congestion management is also done within the locked section.
(4) In rxrpc_input_ackall(), take call->input_lock around the Tx window
rotation. The ACKALL packet carries no information and is only really
useful after all packets have been transmitted since it's imprecise.
(5) In rxrpc_input_implicit_end_call(), we use rx->incoming_lock to
prevent calls being simultaneously implicitly ended on two cpus and
also to prevent any races with incoming call setup.
(6) In rxrpc_input_packet(), use cmpxchg() to effect the service upgrade
on a connection. It is only permitted to happen once for a
connection.
(7) In rxrpc_new_incoming_call(), we have to recheck the routing inside
rx->incoming_lock to see if someone else set up the call, connection
or peer whilst we were getting there. We can't trust the values from
the earlier routing check unless we pin refs on them - which we want
to avoid.
Further, we need to allow for an incoming call to have its state
changed on another CPU between us making it live and us adjusting it
because the conn is now in the RXRPC_CONN_SERVICE state.
(8) In rxrpc_peer_add_rtt(), take peer->rtt_input_lock around the access
to the RTT buffer. Don't need to lock around setting peer->rtt.
For reference, the inventory of state-accessing or state-altering functions
used by the packet input procedure is:
> rxrpc_input_packet()
* PACKET CHECKING
* ROUTING
> rxrpc_post_packet_to_local()
> rxrpc_find_connection_rcu() - uses RCU
> rxrpc_lookup_peer_rcu() - uses RCU
> rxrpc_find_service_conn_rcu() - uses RCU
> idr_find() - uses RCU
* CONNECTION-LEVEL PROCESSING
- Service upgrade
- Can only happen once per conn
! Changed to use cmpxchg
> rxrpc_post_packet_to_conn()
- Setting conn->hi_serial
- Probably safe not using locks
- Maybe use cmpxchg
* CALL-LEVEL PROCESSING
> Old-call checking
> rxrpc_input_implicit_end_call()
> rxrpc_call_completed()
> rxrpc_queue_call()
! Need to take rx->incoming_lock
> __rxrpc_disconnect_call()
> rxrpc_notify_socket()
> rxrpc_new_incoming_call()
- Uses rx->incoming_lock for the entire process
- Might be able to drop this earlier in favour of the call lock
> rxrpc_incoming_call()
! Conflicts with rxrpc_input_implicit_end_call()
> rxrpc_send_ping()
- Don't need locks to check rtt state
> rxrpc_propose_ACK
* PACKET DISTRIBUTION
> rxrpc_input_call_packet()
> rxrpc_input_data()
* QUEUE DATA PACKET ON CALL
> rxrpc_reduce_call_timer()
- Uses timer_reduce()
! Needs call->input_lock()
> rxrpc_receiving_reply()
! Needs locking around ack state
> rxrpc_rotate_tx_window()
> rxrpc_end_tx_phase()
> rxrpc_proto_abort()
> rxrpc_input_dup_data()
- Fills the Rx buffer
- rxrpc_propose_ACK()
- rxrpc_notify_socket()
> rxrpc_input_ack()
* APPLY ACK PACKET TO CALL AND DISCARD PACKET
> rxrpc_input_ping_response()
- Probably doesn't need any extra locking
! Need READ_ONCE() on call->ping_serial
> rxrpc_input_check_for_lost_ack()
- Takes call->lock to consult Tx buffer
> rxrpc_peer_add_rtt()
! Needs to take a lock (peer->rtt_input_lock)
! Could perhaps manage with cmpxchg() and xadd() instead
> rxrpc_input_requested_ack
- Consults Tx buffer
! Probably needs a lock
> rxrpc_peer_add_rtt()
> rxrpc_propose_ack()
> rxrpc_input_ackinfo()
- Changes call->tx_winsize
! Use cmpxchg to handle change
! Should perhaps track serial number
- Uses peer->lock to record MTU specification changes
> rxrpc_proto_abort()
! Need to take call->input_lock
> rxrpc_rotate_tx_window()
> rxrpc_end_tx_phase()
> rxrpc_input_soft_acks()
- Consults the Tx buffer
> rxrpc_congestion_management()
- Modifies the Tx annotations
! Needs call->input_lock()
> rxrpc_queue_call()
> rxrpc_input_abort()
* APPLY ABORT PACKET TO CALL AND DISCARD PACKET
> rxrpc_set_call_completion()
> rxrpc_notify_socket()
> rxrpc_input_ackall()
* APPLY ACKALL PACKET TO CALL AND DISCARD PACKET
! Need to take call->input_lock
> rxrpc_rotate_tx_window()
> rxrpc_end_tx_phase()
> rxrpc_reject_packet()
There are some functions used by the above that queue the packet, after
which the procedure is terminated:
- rxrpc_post_packet_to_local()
- local->event_queue is an sk_buff_head
- local->processor is a work_struct
- rxrpc_post_packet_to_conn()
- conn->rx_queue is an sk_buff_head
- conn->processor is a work_struct
- rxrpc_reject_packet()
- local->reject_queue is an sk_buff_head
- local->processor is a work_struct
And some that offload processing to process context:
- rxrpc_notify_socket()
- Uses RCU lock
- Uses call->notify_lock to call call->notify_rx
- Uses call->recvmsg_lock to queue recvmsg side
- rxrpc_queue_call()
- call->processor is a work_struct
- rxrpc_propose_ACK()
- Uses call->lock to wrap __rxrpc_propose_ACK()
And a bunch that complete a call, all of which use call->state_lock to
protect the call state:
- rxrpc_call_completed()
- rxrpc_set_call_completion()
- rxrpc_abort_call()
- rxrpc_proto_abort()
- Also uses rxrpc_queue_call()
Fixes: 17926a7932 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
Fix the rxrpc_tx_packet trace line by storing the where parameter.
Fixes: 4764c0da69 ("rxrpc: Trace packet transmission")
Signed-off-by: David Howells <dhowells@redhat.com>
Fix connection-level abort handling to cache the abort and error codes
properly so that a new incoming call can be properly aborted if it races
with the parent connection being aborted by another CPU.
The abort_code and error parameters can then be dropped from
rxrpc_abort_calls().
Fixes: f5c17aaeb2 ("rxrpc: Calls should only have one terminal state")
Signed-off-by: David Howells <dhowells@redhat.com>