The read of ->dynticks_nmi_nesting in rcu_irq_enter() and rcu_irq_exit()
is currently protected with READ_ONCE(). However, this protection is
unnecessary because (1) ->dynticks_nmi_nesting is updated only by the
current CPU, (2) Although NMI handlers can update this field, they reset
it back to its old value before return, and (3) Interrupts are disabled,
so nothing else can modify it. The value of ->dynticks_nmi_nesting is
thus effectively constant, and so no protection is required.
This commit therefore removes the READ_ONCE() protection from these
two accesses.
Link: http://lkml.kernel.org/r/20170926031902.GA2074@linux.vnet.ibm.com
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The trampoline allocated by function tracer was overwriten by function_graph
tracer, and caused a memory leak. The save_global_trampoline should have
saved the previous trampoline in register_ftrace_graph() and restored it in
unregister_ftrace_graph(). But as it is implemented, save_global_trampoline was
only used in unregister_ftrace_graph as default value 0, and it overwrote the
previous trampoline's value. Causing the previous allocated trampoline to be
lost.
kmmeleak backtrace:
kmemleak_vmalloc+0x77/0xc0
__vmalloc_node_range+0x1b5/0x2c0
module_alloc+0x7c/0xd0
arch_ftrace_update_trampoline+0xb5/0x290
ftrace_startup+0x78/0x210
register_ftrace_function+0x8b/0xd0
function_trace_init+0x4f/0x80
tracing_set_tracer+0xe6/0x170
tracing_set_trace_write+0x90/0xd0
__vfs_write+0x37/0x170
vfs_write+0xb2/0x1b0
SyS_write+0x55/0xc0
do_syscall_64+0x67/0x180
return_from_SYSCALL_64+0x0/0x6a
[
Looking further into this, I found that this was left over from when the
function and function graph tracers shared the same ftrace_ops. But in
commit 5f151b2401 ("ftrace: Fix function_profiler and function tracer
together"), the two were separated, and the save_global_trampoline no
longer was necessary (and it may have been broken back then too).
-- Steven Rostedt
]
Link: http://lkml.kernel.org/r/20170912021454.5976-1-shuwang@redhat.com
Cc: stable@vger.kernel.org
Fixes: 5f151b2401 ("ftrace: Fix function_profiler and function tracer together")
Signed-off-by: Shu Wang <shuwang@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The mmu context on the 40x, 44x does not define pte_frag entry. This
causes gcc abort the compilation due to:
setup-common.c: In function ‘setup_arch’:
setup-common.c:908: error: ‘mm_context_t’ has no ‘pte_frag’
This patch fixes the issue by removing the pte_frag initialization in
setup-common.c.
This is possible, because the compiler will do the initialization,
since the mm_context is a sub struct of init_mm. init_mm is declared
in mm_types.h as external linkage.
According to C99 6.2.4.3:
An object whose identifier is declared with external linkage
[...] has static storage duration.
C99 defines in 6.7.8.10 that:
If an object that has static storage duration is not
initialized explicitly, then:
- if it has pointer type, it is initialized to a null pointer
Fixes: b1923caa6e ("powerpc: Merge 32-bit and 64-bit setup_arch()")
Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 41d0c2ecde ("powerpc/powernv: Fix local TLB flush for boot
and MCE on POWER9") introduced calls to __flush_tlb_power[89] from the
cpufeatures code, specifying the number of sets to flush.
However, these functions take an action argument, not a number of
sets. This means we hit the BUG() in __flush_tlb_{206,300} when using
cpufeatures-style configuration.
This change passes TLB_INVAL_SCOPE_GLOBAL instead.
Fixes: 41d0c2ecde ("powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
For write_pending if the queue is down or client failed then return -EIO
so that LIO can properly process the completed command. Prior we
returned 0 since LIO could not handle it properly. Now with commit
fa7e25cf13 ("target: Fix unknown fabric callback queue-full errors")
that patch addresses LIO's ability to handle things right.
Signed-off-by: Bryant G. Ly <bgly@us.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
iscsi_session_teardown was the only user of this function. Function
currently is just short for iscsi_remove_session + iscsi_free_session.
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Acked-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Session attributes exposed through sysfs were freed before the device
was destroyed, resulting in a potential use-after-free. Free these
attributes after removing the device.
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Acked-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A user may lower the max_sectors_kb setting in sysfs to accommodate
certain workloads. Previously we would always set the max I/O size to
either the block layer default or the optional preferred I/O size
reported by the device.
Keep the current heuristics for the initial setting of max_sectors_kb.
For subsequent invocations, only update the current queue limit if it
exceeds the capabilities of the hardware.
Cc: <stable@vger.kernel.org>
Reported-by: Don Brace <don.brace@microsemi.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Tested-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
SBC-4 states:
"A MAXIMUM UNMAP LBA COUNT field set to a non-zero value indicates the
maximum number of LBAs that may be unmapped by an UNMAP command"
"A MAXIMUM WRITE SAME LENGTH field set to a non-zero value indicates
the maximum number of contiguous logical blocks that the device server
allows to be unmapped or written in a single WRITE SAME command."
Despite the spec being clear on the topic, some devices incorrectly
expect WRITE SAME commands with the UNMAP bit set to be limited to the
value reported in MAXIMUM UNMAP LBA COUNT in the Block Limits VPD.
Implement a blacklist option that can be used to accommodate devices
with this behavior.
Cc: <stable@vger.kernel.org>
Reported-by: Bill Kuzeja <William.Kuzeja@stratus.com>
Reported-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
drm/i915 fixes for 4.14-rc3
Couple fixes for stable:
- Fix ELD connector types and consequently audio on DP (Jani).
- Ignore HDMI on Port A and consequently fix an ops on i915 probe
when VBT advertises HDMI on Port A (Jani).
And a small fix:
- That removes a reduntant hw_check on modeset. (Colin)
* tag 'drm-intel-fixes-2017-09-27' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915/bios: ignore HDMI on port A
drm/i915: remove redundant variable hw_check
drm/i915: always update ELD connector type after get modes
Starting from linux-4.4, 3WHS no longer takes the listener lock.
Since this time, we might hit a use-after-free in sk_filter_charge(),
if the filter we got in the memcpy() of the listener content
just happened to be replaced by a thread changing listener BPF filter.
To fix this, we need to make sure the filter refcount is not already
zero before incrementing it again.
Fixes: e994b2f0fb ("tcp: do not lock listener to process SYN packets")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph made it so that if we return'ed BLK_STS_RESOURCE whenever we
got ERESTARTSYS from sending our packets we'd return BLK_STS_OK, which
means we'd never requeue and just hang. We really need to return the
right value from the upper layer.
Fixes: fc17b6534e ("blk-mq: switch ->queue_rq return value to blk_status_t")
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The HDMI driver enables the bus and mod clocks in the bind function, but
does not disable them if it then bails our due to any errors. Neither
does it disable the clocks in the unbind function.
Fix this by adding a proper error path to the bind function, and
clk_disable_unprepare calls to the unbind function.
Also rename the err_cleanup_connector label to err_cleanup_encoder,
since it is the encoder that gets cleaned up.
Fixes: 9c5681011a ("drm/sun4i: Add HDMI support")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170929082306.16193-6-wens@csie.org
ahci_pci_reset_controller() calls ahci_reset_controller(), which may
fail, but ignores the result code and always returns success. This
may result in failures like below
ahci 0000:02:00.0: version 3.0
ahci 0000:02:00.0: enabling device (0000 -> 0003)
ahci 0000:02:00.0: SSS flag set, parallel bus scan disabled
ahci 0000:02:00.0: controller reset failed (0xffffffff)
ahci 0000:02:00.0: failed to stop engine (-5)
... repeated many times ...
ahci 0000:02:00.0: failed to stop engine (-5)
Unable to handle kernel paging request at virtual address ffff0000093f9018
...
PC is at ahci_stop_engine+0x5c/0xd8 [libahci]
LR is at ahci_deinit_port.constprop.12+0x1c/0xc0 [libahci]
...
[<ffff000000a17014>] ahci_stop_engine+0x5c/0xd8 [libahci]
[<ffff000000a196b4>] ahci_deinit_port.constprop.12+0x1c/0xc0 [libahci]
[<ffff000000a197d8>] ahci_init_controller+0x80/0x168 [libahci]
[<ffff000000a260f8>] ahci_pci_init_controller+0x60/0x68 [ahci]
[<ffff000000a26f94>] ahci_init_one+0x75c/0xd88 [ahci]
[<ffff000008430324>] local_pci_probe+0x3c/0xb8
[<ffff000008431728>] pci_device_probe+0x138/0x170
[<ffff000008585e54>] driver_probe_device+0x2dc/0x458
[<ffff0000085860e4>] __driver_attach+0x114/0x118
[<ffff000008583ca8>] bus_for_each_dev+0x60/0xa0
[<ffff000008585638>] driver_attach+0x20/0x28
[<ffff0000085850b0>] bus_add_driver+0x1f0/0x2a8
[<ffff000008586ae0>] driver_register+0x60/0xf8
[<ffff00000842f9b4>] __pci_register_driver+0x3c/0x48
[<ffff000000a3001c>] ahci_pci_driver_init+0x1c/0x1000 [ahci]
[<ffff000008083918>] do_one_initcall+0x38/0x120
where an obvious hardware level failure results in an unnecessary 15 second
delay and a subsequent crash.
So record the result code of ahci_reset_controller() and relay it, rather
than ignoring it.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Jiri Pirko says:
====================
mlxsw: Fixes in GRE offloading
Petr says:
This patchset fixes a couple unrelated problems in offloading IP-in-IP tunnels
in mlxsw driver.
- The first patch fixes a potential reference-counting problem that might lead
to a kernel crash.
- The second patch associates IPIP next hops with their loopback RIFs. Besides
being the right thing to do, it also fixes a problem where offloaded IPv6
routes that forward to IP-in-IP netdevices were not flagged as such.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When considering whether to set RTNH_F_OFFLOAD flag on an IPv6 route,
mlxsw_sp_fib6_entry_offload_set() looks up the mlxsw_sp_nexthop
corresponding to a given route, and decides based on whether the next
hop's offloaded flag was set. When looking for the matching next hop, it
also takes into account the device of the route, which must match next
hop's RIF.
IPIP next hops however hitherto didn't set the RIF. As a result, IPv6
routes forwarding traffic to IP-in-IP netdevices are never marked as
offloaded, even when they actually are.
Thus track RIF of IPIP next hops the same way as that of ETHERNET next
hops.
Fixes: 8f28a30976 ("mlxsw: spectrum_router: Support IPv6 overlay encap")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When creating a new RIF, bumping RIF count of the containing VR is the
last thing to be done. Symmetrically, when destroying a RIF, RIF count
is first dropped and only then the rest of the cleanup proceeds.
That's a problem for loopback RIFs. Those hold two VR references: one
for overlay and one for underlay. mlxsw_sp_rif_destroy() releases the
overlay one, and the deconfigure() callback the underlay one. But if
both overlay and underlay are the same, and if there are no other
artifacts holding the VR alive, this put actually destroys the VR. Later
on, when mlxsw_sp_rif_destroy() calls mlxsw_sp_vr_put() for the same VR,
the VR will already have been released and the kernel crashes with NULL
pointer dereference.
The underlying problem is that the RIF under destruction ends up
referencing the overlay VR much longer than it claims: all the way until
the call to mlxsw_sp_vr_put(). So line up the reference counting
properly to reflect this. Make corresponding changes in
mlxsw_sp_rif_create() as well for symmetry.
Fixes: 6ddb7426a7 ("mlxsw: spectrum_router: Introduce loopback RIFs")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The usx2y driver allocates the stream read/write buffers in continuous
pages depending on the stream setup, and this may spew the kernel
warning messages with a stack trace like:
WARNING: CPU: 1 PID: 1846 at mm/page_alloc.c:3883
__alloc_pages_slowpath+0x1ef2/0x2d70
Modules linked in:
CPU: 1 PID: 1846 Comm: kworker/1:2 Not tainted
....
It may confuse user as if it were any serious error, although this is
no fatal error and the driver handles the error case gracefully.
Since the driver has already some sanity check of the given size (128
and 256 pages), it can't pass any crazy value. So it's merely page
fragmentation.
This patch adds __GFP_NOWARN to each caller for suppressing such
kernel warnings. The original issue was spotted by syzkaller.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
previous commit 5d37ca14 "ceph: send LSSNAP request to auth mds
of directory inode" is buggy. It makes __choose_mds() choose mds
base on hash of '.snap' dentry.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
commit 3ae0bebc "ceph: queue cap snap only when snap realm's
context changes" introduced a regression: we may not call
queue_realm_cap_snaps() for newly created snap realm. This
regression allows unflushed snapshot data to be overwritten.
Link: http://tracker.ceph.com/issues/21483
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Currently data_abort_decode() dumps the ISS field as a decimal value
with a '0x' prefix, which is somewhat misleading.
Fix it to print as hexadecimal, as was intended.
Fixes: 1f9b8936f3 ("arm64: Decode information from ESR upon mem faults")
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This reverts commit 275353bb68 to fix a regression which can abort
'alsactl' program in alsa-utils due to assertion in alsa-lib.
alsactl: control.c:2513: snd_ctl_elem_value_get_integer: Assertion `idx < sizeof(obj->value.integer.value) / sizeof(obj->value.integer.value[0])' failed.
alsactl: control.c:2976: snd_ctl_elem_value_get_integer: Assertion `idx < ARRAY_SIZE(obj->value.integer.value)' failed.
This commit is a band-aid. In a point of usage of ALSA control interface,
the drivers still bring an issue that they prevent userspace applications
to have a consistent way to parse each levels of the dimension information
via ALSA control interface.
Let me investigate this issue. Current implementation of the drivers
have three control element sets with dimension information:
* 'Monitor Mixer Volume' (type: integer)
* 'VMixer Volume' (type: integer)
* 'VU-meters' (type: boolean)
Although the number of elements named as 'Monitor Mixer Volume' differs
depending on drivers in this group, it can be calculated by macros
defined by each driver (= (BX_NUM - BX_ANALOG_IN) * BX_ANALOG_IN). Each
of the elements has one member for value and has dimension information
with 2 levels (= BX_ANALOG_IN * (BX_NUM - BX_ANALOG_IN)). For these
elements, userspace applications are expected to handle the dimension
information so that all of the elements construct a matrix where the
number of rows and columns are represented by the dimension information.
The same way is applied to elements named as 'VMixer Volume'. The number
of these elements can also be calculated by macros defined by each
drivers (= PX_ANALOG_IN * BX_ANALOG_IN). Each of the element has one
member for value and has dimension information with 2 levels
(= BX_ANALOG_IN * PX_ANALOG_IN). All of the elements construct a matrix
with the dimension information.
An element named as 'VU-meters' gets a different way in a point of
dimension information. The element includes 96 members for value. The
element has dimension information with 3 levels (= 3 or 2 * 16 * 2). For
this element, userspace applications are expected to handle the dimension
information so that all of the members for value construct a matrix
where the number of rows and columns are represented by the dimension
information. This is different from the way for the former.
As a summary, the drivers were not designed to produce a consistent way to
parse the dimension information. This makes it hard for general userspace
applications such as amixer to parse the information by a consistent way,
and actually no userspace applications except for 'echomixer' utilize the
dimension information. Additionally, no drivers excluding this group use
the information.
The reverted commit was written based on the latter way. A commit
860c1994a7 ('ALSA: control: add dimension validator for userspace
elements') is written based on the latter way, too. The patch should be
reconsider too in the same time to re-define a consistent way to parse the
dimension information.
Reported-by: Mark Hills <mark@xwax.org>
Reported-by: S. Christian Collins <s.chriscollins@gmail.com>
Fixes: 275353bb68 ('ALSA: echoaudio: purge contradictions between dimension matrix members and total number of members')
Cc: <stable@vger.kernel.org> # v4.8+
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
This reverts commit fcaa4a07d2.
As noted by Masaki [1], 0x120A + trackpoint will not be used in mass
production machines, so remove the ID accordingly.
[1] http://www.spinics.net/lists/linux-input/msg53222.html
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
We should not try to bring HID device out of full power state before
calling hid_hw_close(), so that transport driver operates on powered up
device (making this inverse of the opening sequence).
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Benson Leung <bleung@chromium.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The wacom_get_hdev_data function is used to find and return a reference to
the "other half" of a Wacom device (i.e., the touch device associated with
a pen, or vice-versa). To ensure these references are properly accounted
for, the function is supposed to automatically increment the refcount before
returning. This was not done, however, for devices which have pen & touch
on different interfaces of the same USB device. This can lead to a WARNING
("refcount_t: underflow; use-after-free") when removing the module or device
as we call kref_put() more times than kref_get(). Triggering an "actual" use-
after-free would be difficult since both devices will disappear nearly-
simultaneously. To silence this warning and prevent the potential error, we
need to increment the refcount for all cases within wacom_get_hdev_data.
Fixes: 41372d5d40 ("HID: wacom: Augment 'oVid' and 'oPid' with heuristics for HID_GENERIC")
Cc: <stable@vger.kernel.org> # v4.9+
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Reviewed-by: Ping Cheng <ping.cheng@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
The driver strength selection is missed and required when selecting
hs400es. So, It is added here.
Fixes: 81ac2af657 ("mmc: core: implement enhanced strobe support")
Cc: stable@vger.kernel.org
Signed-off-by: Hankyung Yu <hankyung.yu@lge.com>
Signed-off-by: Chanho Min <chanho.min@lge.com>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
If this sanity check fails, we must free 'rss_indir'. Otherwise there is a
memory leak.
'goto err' as done in the other error handling paths to fix it.
Fixes: 46a3df9f97 ("net: hns3: Fix for setting rss_size incorrectly")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
On Armada 7K/8K we need to explicitly enable the bus clock. The bus clock
is optional because not all the SoCs need them but at least for Armada
7K/8K it is actually mandatory.
The binding documentation is updating accordingly.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This linksys dongle by default comes up in cdc_ether mode.
This patch allows r8152 to claim the device:
Bus 002 Device 002: ID 13b1:0041 Linksys
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The l2tp_eth module crashes if its netlink callbacks are run when the
pernet data aren't initialised.
We should normally register_pernet_device() before the genl callbacks.
However, the pernet data only maintain a list of l2tpeth interfaces,
and this list is never used. So let's just drop pernet handling
instead.
Fixes: d9e31d17ce ("l2tp: Add L2TP ethernet pseudowire support")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long says:
====================
ip_gre: a bunch of fixes for erspan
This patchset is to fix some issues that could cause 0 or low
performance, and even unexpected truncated packets on erspan.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch 'ip_gre: ipgre_tap device should keep dst' fixed
the issue ipgre_tap dev mtu couldn't be updated in tx path.
The same fix is needed for erspan as well.
Fixes: 84e54fe0a5 ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to __gre_tunnel_init, tunnel->hlen should be set as the
headers' length between inner packet and outer iphdr.
It would be used especially to calculate a proper mtu when updating
mtu in tnl_update_pmtu. Now without setting it, a bigger mtu value
than expected would be updated, which hurts performance a lot.
This patch is to fix it by setting tunnel->hlen with:
tunnel->tun_hlen + tunnel->encap_hlen + sizeof(struct erspanhdr)
Fixes: 84e54fe0a5 ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a ARPHRD_ETHER device, skb->len in erspan_xmit is the length
of the whole ether packet. So before checking if a packet size
exceeds the mtu, skb->len should subtract dev->hard_header_len.
Otherwise, all packets with max size according to mtu would be
trimmed to be truncated packet.
Fixes: 84e54fe0a5 ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
erspan only uses the first 10 bits of session_id as the key to look
up the tunnel. But in erspan_rcv, it missed 'session_id & ID_MASK'
when getting the key from session_id.
If any other flag is also set in session_id in a packet, it would
fail to find the tunnel due to incorrect key in erspan_rcv.
This patch is to add 'session_id & ID_MASK' there and also remove
the unnecessary variable session_id.
Fixes: 84e54fe0a5 ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull x86 fixes from Thomas Gleixner:
"This contains the following fixes and improvements:
- Avoid dereferencing an unprotected VMA pointer in the fault signal
generation code
- Fix inline asm call constraints for GCC 4.4
- Use existing register variable to retrieve the stack pointer
instead of forcing the compiler to create another indirect access
which results in excessive extra 'mov %rsp, %<dst>' instructions
- Disable branch profiling for the memory encryption code to prevent
an early boot crash
- Fix a sparse warning caused by casting the __user annotation in
__get_user_asm_u64() away
- Fix an off by one error in the loop termination of the error patch
in the x86 sysfs init code
- Add missing CPU IDs to various Intel specific drivers to enable the
functionality on recent hardware
- More (init) constification in the numachip code"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Use register variable to get stack pointer value
x86/mm: Disable branch profiling in mem_encrypt.c
x86/asm: Fix inline asm call constraints for GCC 4.4
perf/x86/intel/uncore: Correct num_boxes for IIO and IRP
perf/x86/intel/rapl: Add missing CPU IDs
perf/x86/msr: Add missing CPU IDs
perf/x86/intel/cstate: Add missing CPU IDs
x86: Don't cast away the __user in __get_user_asm_u64()
x86/sysfs: Fix off-by-one error in loop termination
x86/mm: Fix fault error path using unsafe vma pointer
x86/numachip: Add const and __initconst to numachip2_clockevent
Pull timer fixes from Thomas Gleixner:
"This adds a new timer wheel function which is required for the
conversion of the timer callback function from the 'unsigned long
data' argument to 'struct timer_list *timer'. This conversion has two
benefits:
1) It makes struct timer_list smaller
2) Many callers hand in a pointer to the timer or to the structure
containing the timer, which happens via type casting both at setup
and in the callback. This change gets rid of the typecasts.
Once the conversion is complete, which is planned for 4.15, the old
setup function and the intermediate typecast in the new setup function
go away along with the data field in struct timer_list.
Merging this now into mainline allows a smooth queueing of the actual
conversion in the affected maintainer trees without creating
dependencies"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
um/time: Fixup namespace collision
timer: Prepare to change timer callback argument type
Pull smp/hotplug fixes from Thomas Gleixner:
"This addresses the fallout of the new lockdep mechanism which covers
completions in the CPU hotplug code.
The lockdep splats are false positives, but there is no way to
annotate that reliably. The solution is to split the completions for
CPU up and down, which requires some reshuffling of the failure
rollback handling as well"
* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp/hotplug: Hotplug state fail injection
smp/hotplug: Differentiate the AP completion between up and down
smp/hotplug: Differentiate the AP-work lockdep class between up and down
smp/hotplug: Callback vs state-machine consistency
smp/hotplug: Rewrite AP state machine core
smp/hotplug: Allow external multi-instance rollback
smp/hotplug: Add state diagram
Pull scheduler fixes from Thomas Gleixner:
"The scheduler pull request comes with the following updates:
- Prevent a divide by zero issue by validating the input value of
sysctl_sched_time_avg
- Make task state printing consistent all over the place and have
explicit state characters for IDLE and PARKED so they wont be
displayed as 'D' state which confuses tools"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/sysctl: Check user input value of sysctl_sched_time_avg
sched/debug: Add explicit TASK_PARKED printing
sched/debug: Ignore TASK_IDLE for SysRq-W
sched/debug: Add explicit TASK_IDLE printing
sched/tracing: Use common task-state helpers
sched/tracing: Fix trace_sched_switch task-state printing
sched/debug: Remove unused variable
sched/debug: Convert TASK_state to hex
sched/debug: Implement consistent task-state printing
Pull perf fixes from Thomas Gleixner:
- Prevent a division by zero in the perf aux buffer handling
- Sync kernel headers with perf tool headers
- Fix a build failure in the syscalltbl code
- Make the debug messages of perf report --call-graph work correctly
- Make sure that all required perf files are in the MANIFEST for
container builds
- Fix the atrr.exclude kernel handling so it respects the
perf_event_paranoid and the user permissions
- Make perf test on s390x work correctly
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/aux: Only update ->aux_wakeup in non-overwrite mode
perf test: Fix vmlinux failure on s390x part 2
perf test: Fix vmlinux failure on s390x
perf tools: Fix syscalltbl build failure
perf report: Fix debug messages with --call-graph option
perf evsel: Fix attr.exclude_kernel setting for default cycles:p
tools include: Sync kernel ABI headers with tooling headers
perf tools: Get all of tools/{arch,include}/ in the MANIFEST
Pull locking fixes from Thomas Gleixner:
"Two fixes for locking:
- Plug a hole the pi_stat->owner serialization which was changed
recently and failed to fixup two usage sites.
- Prevent reordering of the rwsem_has_spinner() check vs the
decrement of rwsem count in up_write() which causes a missed
wakeup"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem-xadd: Fix missed wakeup due to reordering of load
futex: Fix pi_state->owner serialization
Pull irq fixes from Thomas Gleixner:
- Add a missing NULL pointer check in free_irq()
- Fix a memory leak/memory corruption in the generic irq chip
- Add missing rcu annotations for radix tree access
- Use ffs instead of fls when extracting data from a chip register in
the MIPS GIC irq driver
- Fix the unmasking of IPI interrupts in the MIPS GIC driver so they
end up at the target CPU and not at CPU0
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irq/generic-chip: Don't replace domain's name
irqdomain: Add __rcu annotations to radix tree accessors
irqchip/mips-gic: Use effective affinity to unmask
irqchip/mips-gic: Fix shifts to extract register fields
genirq: Check __free_irq() return value for NULL
Pull objtool fixes from Thomas Gleixner:
"Two small fixes for objtool:
- Support frame pointer setup via 'lea (%rsp), %rbp' which was not
yet supported and caused build warnings
- Disable unreacahble warnings for GCC4.4 and older to avoid false
positives caused by the compiler itself"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Support unoptimized frame pointer setup
objtool: Skip unreachable warnings for GCC 4.4 and older
Commit 2ca492e22c has moved the call to 'kfifo_alloc()' from after the
main 'if' statement to before it.
But it has not updated the error handling paths accordingly.
Fix all that:
- if 'kfifo_alloc()' fails we can return directly
- direct returns after 'kfifo_alloc()' must now go to 'out_mbox_free'
- 'goto out_mbox_free' must be replaced by 'goto out', otherwise the
'[pcc_]mbox_free_channel()' call will be missed.
Fixes: 2ca492e22c ("hwmon: (xgene) Fix crash when alarm occurs before driver probe")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
"uuid" must be invisible if both ns->uuid and ns->nguid are unset,
not if either one is.
Fixes: d934f9848a "nvme: provide UUID value to userspace"
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
In commit e3a77561e7 ("tipc: split up function tipc_msg_eval()"),
we have updated the function tipc_msg_lookup_dest() to set the error
codes to negative values at destination lookup failures. Thus when
the function sets the error code to -TIPC_ERR_NO_NAME, its inserted
into the 4 bit error field of the message header as 0xf instead of
TIPC_ERR_NO_NAME (1). The value 0xf is an unknown error code.
In this commit, we set only positive error code.
Fixes: e3a77561e7 ("tipc: split up function tipc_msg_eval()")
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni says:
====================
udp: fix early demux for mcast packets
Currently the early demux callbacks do not perform source address validation.
This is not an issue for TCP or UDP unicast, where the early demux
is only allowed for connected sockets and the source address is validated
for the first packet and never change.
The UDP protocol currently allows early demux also for unconnected multicast
sockets, and we are not currently doing any validation for them, after that
the first packet lands on the socket: beyond ignoring the rp_filter - if
enabled - any kind of martian sources are also allowed.
This series addresses the issue allowing the early demux callback to return an
error code, and performing the proper checks for unconnected UDP multicast
sockets before leveraging the rx dst cache.
Alternatively we could disable the early demux for unconnected mcast sockets,
but that would cause relevant performance regression - around 50% - while with
this series, with full rp_filter in place, we keep the regression to a more
moderate level.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The UDP early demux can leverate the rx dst cache even for
multicast unconnected sockets.
In such scenario the ipv4 source address is validated only on
the first packet in the given flow. After that, when we fetch
the dst entry from the socket rx cache, we stop enforcing
the rp_filter and we even start accepting any kind of martian
addresses.
Disabling the dst cache for unconnected multicast socket will
cause large performace regression, nearly reducing by half the
max ingress tput.
Instead we factor out a route helper to completely validate an
skb source address for multicast packets and we call it from
the UDP early demux for mcast packets landing on unconnected
sockets, after successful fetching the related cached dst entry.
This still gives a measurable, but limited performance
regression:
rp_filter = 0 rp_filter = 1
edmux disabled: 1182 Kpps 1127 Kpps
edmux before: 2238 Kpps 2238 Kpps
edmux after: 2037 Kpps 2019 Kpps
The above figures are on top of current net tree.
Applying the net-next commit 6e617de84e ("net: avoid a full
fib lookup when rp_filter is disabled.") the delta with
rp_filter == 0 will decrease even more.
Fixes: 421b3885bf ("udp: ipv4: Add udp early demux")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>