2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2025-01-02 10:43:57 +08:00
Commit Graph

840661 Commits

Author SHA1 Message Date
Paolo Abeni
720f1de402 pktgen: do not sleep with the thread lock held.
Currently, the process issuing a "start" command on the pktgen procfs
interface, acquires the pktgen thread lock and never release it, until
all pktgen threads are completed. The above can blocks indefinitely any
other pktgen command and any (even unrelated) netdevice removal - as
the pktgen netdev notifier acquires the same lock.

The issue is demonstrated by the following script, reported by Matteo:

ip -b - <<'EOF'
	link add type dummy
	link add type veth
	link set dummy0 up
EOF
modprobe pktgen
echo reset >/proc/net/pktgen/pgctrl
{
	echo rem_device_all
	echo add_device dummy0
} >/proc/net/pktgen/kpktgend_0
echo count 0 >/proc/net/pktgen/dummy0
echo start >/proc/net/pktgen/pgctrl &
sleep 1
rmmod veth

Fix the above releasing the thread lock around the sleep call.

Additionally we must prevent racing with forcefull rmmod - as the
thread lock no more protects from them. Instead, acquire a self-reference
before waiting for any thread. As a side effect, running

rmmod pktgen

while some thread is running now fails with "module in use" error,
before this patch such command hanged indefinitely.

Note: the issue predates the commit reported in the fixes tag, but
this fix can't be applied before the mentioned commit.

v1 -> v2:
 - no need to check for thread existence after flipping the lock,
   pktgen threads are freed only at net exit time
 -

Fixes: 6146e6a43b ("[PKTGEN]: Removes thread_{un,}lock() macros.")
Reported-and-tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-06 11:31:35 -07:00
Linus Torvalds
44e843eb5c As a result of some of Al Viro's great work, here are a few cleanups
with fixes for adfs:
 
 - factor out filename comparison, so we can be sure that adfs_compare()
   (used for namei compare) and adfs_match() (used for lookup) have the
   same behaviour.
 - factor out filename lowering (which is not the same as tolower() which
   will lower top-bit-set characters) to ensure that we have the same
   behaviour when comparing filenames as when we hash them.
 - factor out the object fixups, so we are applying all fixups to
   directory objects in the same way, independent of the disk format.
 - factor out the object name fixup (into the previously factored out
   function) to ensure that filenames are appropriately translated -
   for example, adfs allows '/' in filenames, which being the Unix path
   separator, need to be translated to a different character, which is
   normally '.' (DOS 8.3 filenames represent the . as a / on adfs, so
   this is the expected reverse translation.)
 - remove filename truncation; Al asked about this and apparently the
   decision is to remove it.  In any case, adfs's truncation was buggy,
   so this rids us of that bug by removing the truncation feature.
 - we now have only one location which adds the "filetype" suffix to the
   filename, so there's no point that code being out of line.
 - since we translate '/' into '.', an adfs filename of "/" or "//" would
   end up being translated to "." and ".." which have special meanings.
   In this case, change the first character to "^" to avoid these special
   directory names being abused.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIVAwUAXPD4oPTnkBvkraxkAQK35Q//bql8SDnF9sqy8YlVUmAgoIMdyQtiFuD7
 5UgRBPDl4bIm2Y+mtuYjU/u3nq6tZit0HukybUvD5yA64Opy1Ahkf8OB/f0f6TTU
 xjx55/D9QRj4loGxXHM6PuEO9GX+pIsYPFQoufPYq7hksQB2y1ETkjENk4W4PY2m
 gvbtQqQmz/B+G7PZvrMsZQV/BwYF3vhP8S/qLbgl3PAbciVofruXtPJNRxgOL8ot
 hbOIfT5x30YZpILzXqDZJq4mviWPru+FVJ1uIW1Nd5s8T/9seICxXMjFaMQJUSMn
 oIHCzC1WlP4uRbjwmJ+lyLlEPyYrgYN3+H1FcIO0MTYfBXwYZrVdFLW4TtWBracc
 8bRa+p9jeRe9jdlKpGaX12a4W7xQ3SmB2i8UFE+/epnBqPvuDPOM0h19XK+FclTH
 fAg7Ej1uBC1ROkTlW4OXBsLakXvIIka859DQYQduVKDw8kTSH4QyTdAE7qvOGz/d
 Y0XMeIUz+U1izVJ8ShHCVyttXnkBkC6Xwoc6RY3IET++Fu0hXCMdQMSf6Ta8zJjA
 EUAdb3GLYdJTqX6Oy9NTUd42GPnZwR5KMUOJd6v9ETU7gLDRDyu9ILpgrF+vFaKj
 Xnf7B+D4l1jdB5cU/MYzfzF7Ky80vDHjVr62PSvtb5X9F0pHOINgZJ5an8n80bDc
 dxQq92h9hCI=
 =mQU/
 -----END PGP SIGNATURE-----

Merge tag 'for-rc-adfs' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ADFS cleanups/fixes from Russell King:
 "As a result of some of Al Viro's great work, here are a few cleanups
  with fixes for adfs:

   - factor out filename comparison, so we can be sure that
     adfs_compare() (used for namei compare) and adfs_match() (used for
     lookup) have the same behaviour.

   - factor out filename lowering (which is not the same as tolower()
     which will lower top-bit-set characters) to ensure that we have the
     same behaviour when comparing filenames as when we hash them.

   - factor out the object fixups, so we are applying all fixups to
     directory objects in the same way, independent of the disk format.

   - factor out the object name fixup (into the previously factored out
     function) to ensure that filenames are appropriately translated -
     for example, adfs allows '/' in filenames, which being the Unix
     path separator, need to be translated to a different character,
     which is normally '.' (DOS 8.3 filenames represent the . as a / on
     adfs, so this is the expected reverse translation.)

   - remove filename truncation; Al asked about this and apparently the
     decision is to remove it. In any case, adfs's truncation was buggy,
     so this rids us of that bug by removing the truncation feature.

   - we now have only one location which adds the "filetype" suffix to
     the filename, so there's no point that code being out of line.

   - since we translate '/' into '.', an adfs filename of "/" or "//"
     would end up being translated to "." and ".." which have special
     meanings. In this case, change the first character to "^" to avoid
     these special directory names being abused"

* tag 'for-rc-adfs' of git://git.armlinux.org.uk/~rmk/linux-arm:
  fs/adfs: fix filename fixup handling for "/" and "//" names
  fs/adfs: move append_filetype_suffix() into adfs_object_fixup()
  fs/adfs: remove truncated filename hashing
  fs/adfs: factor out filename fixup
  fs/adfs: factor out object fixups
  fs/adfs: factor out filename case lowering
  fs/adfs: factor out filename comparison
2019-06-06 11:02:54 -07:00
Maxime Chevallier
d37acd5aa9 net: mvpp2: Use strscpy to handle stat strings
Use a safe strscpy call to copy the ethtool stat strings into the
relevant buffers, instead of a memcpy that will be accessing
out-of-bound data.

Fixes: 118d6298f6 ("net: mvpp2: add ethtool GOP statistics")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-06 10:38:42 -07:00
Zhu Yanjun
85cb928787 net: rds: fix memory leak in rds_ib_flush_mr_pool
When the following tests last for several hours, the problem will occur.

Server:
    rds-stress -r 1.1.1.16 -D 1M
Client:
    rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M -T 30

The following will occur.

"
Starting up....
tsks   tx/s   rx/s  tx+rx K/s    mbi K/s    mbo K/s tx us/c   rtt us cpu
%
  1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
  1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
  1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
  1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
"
>From vmcore, we can find that clean_list is NULL.

>From the source code, rds_mr_flushd calls rds_ib_mr_pool_flush_worker.
Then rds_ib_mr_pool_flush_worker calls
"
 rds_ib_flush_mr_pool(pool, 0, NULL);
"
Then in function
"
int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
                         int free_all, struct rds_ib_mr **ibmr_ret)
"
ibmr_ret is NULL.

In the source code,
"
...
list_to_llist_nodes(pool, &unmap_list, &clean_nodes, &clean_tail);
if (ibmr_ret)
        *ibmr_ret = llist_entry(clean_nodes, struct rds_ib_mr, llnode);

/* more than one entry in llist nodes */
if (clean_nodes->next)
        llist_add_batch(clean_nodes->next, clean_tail, &pool->clean_list);
...
"
When ibmr_ret is NULL, llist_entry is not executed. clean_nodes->next
instead of clean_nodes is added in clean_list.
So clean_nodes is discarded. It can not be used again.
The workqueue is executed periodically. So more and more clean_nodes are
discarded. Finally the clean_list is NULL.
Then this problem will occur.

Fixes: 1bc144b625 ("net, rds, Replace xlist in net/rds/xlist.h with llist")
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-06 10:32:16 -07:00
David S. Miller
8d037f92de Merge branch 'ipv6-fix-EFAULT-on-sendto-with-icmpv6-and-hdrincl'
Olivier Matz says:

====================
ipv6: fix EFAULT on sendto with icmpv6 and hdrincl

The following code returns EFAULT (Bad address):

  s = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
  setsockopt(s, SOL_IPV6, IPV6_HDRINCL, 1);
  sendto(ipv6_icmp6_packet, addr);   /* returns -1, errno = EFAULT */

The problem is fixed in the second patch. The first one aligns the
code to ipv4, to avoid a race condition in the second patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-06 10:29:21 -07:00
Olivier Matz
b9aa52c4cb ipv6: fix EFAULT on sendto with icmpv6 and hdrincl
The following code returns EFAULT (Bad address):

  s = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
  setsockopt(s, SOL_IPV6, IPV6_HDRINCL, 1);
  sendto(ipv6_icmp6_packet, addr);   /* returns -1, errno = EFAULT */

The IPv4 equivalent code works. A workaround is to use IPPROTO_RAW
instead of IPPROTO_ICMPV6.

The failure happens because 2 bytes are eaten from the msghdr by
rawv6_probe_proto_opt() starting from commit 19e3c66b52 ("ipv6
equivalent of "ipv4: Avoid reading user iov twice after
raw_probe_proto_opt""), but at that time it was not a problem because
IPV6_HDRINCL was not yet introduced.

Only eat these 2 bytes if hdrincl == 0.

Fixes: 715f504b11 ("ipv6: add IPV6_HDRINCL option for raw sockets")
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-06 10:29:21 -07:00
Olivier Matz
59e3e4b526 ipv6: use READ_ONCE() for inet->hdrincl as in ipv4
As it was done in commit 8f659a03a0 ("net: ipv4: fix for a race
condition in raw_sendmsg") and commit 20b50d7997 ("net: ipv4: emulate
READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()") for ipv4, copy the
value of inet->hdrincl in a local variable, to avoid introducing a race
condition in the next commit.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-06 10:29:21 -07:00
Bob Peterson
638803d456 Revert "gfs2: Replace gl_revokes with a GLF flag"
Commit 73118ca8ba introduced a glock reference counting bug in
gfs2_trans_remove_revoke.  Given that, replacing gl_revokes with a GLF flag is
no longer useful, so revert that commit.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-06 16:29:26 +02:00
Dave Martin
ebcc5928c5 arm64: Silence gcc warnings about arch ABI drift
Since GCC 9, the compiler warns about evolution of the
platform-specific ABI, in particular relating for the marshaling of
certain structures involving bitfields.

The kernel is a standalone binary, and of course nobody would be
so stupid as to expose structs containing bitfields as function
arguments in ABI.  (Passing a pointer to such a struct, however
inadvisable, should be unaffected by this change.  perf and various
drivers rely on that.)

So these warnings do more harm than good: turn them off.

We may miss warnings about future ABI drift, but that's too bad.
Future ABI breaks of this class will have to be debugged and fixed
the traditional way unless the compiler evolves finer-grained
diagnostics.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-06-06 13:28:45 +01:00
Helge Deller
527a1d1ede parisc: Fix crash due alternative coding for NP iopdir_fdc bit
According to the found documentation, data cache flushes and sync
instructions are needed on the PCX-U+ (PA8200, e.g. C200/C240)
platforms, while PCX-W (PA8500, e.g. C360) platforms aparently don't
need those flushes when changing the IO PDIR data structures.

We have no documentation for PCX-W+ (PA8600) and PCX-W2 (PA8700) CPUs,
but Carlo Pisani reported that his C3600 machine (PA8600, PCX-W+) fails
when the fdc instructions were removed. His firmware didn't set the NIOP
bit, so one may assume it's a firmware bug since other C3750 machines
had the bit set.

Even if documentation (as mentioned above) states that PCX-W (PA8500,
e.g.  J5000) does not need fdc flushes, Sven could show that an Adaptec
29320A PCI-X SCSI controller reliably failed on a dd command during the
first five minutes in his J5000 when fdc flushes were missing.

Going forward, we will now NOT replace the fdc and sync assembler
instructions by NOPS if:
a) the NP iopdir_fdc bit was set by firmware, or
b) we find a CPU up to and including a PCX-W+ (PA8600).

This fixes the HPMC crashes on a C240 and C36XX machines. For other
machines we rely on the firmware to set the bit when needed.

In case one finds HPMC issues, people could try to boot their machines
with the "no-alternatives" kernel option to turn off any alternative
patching.

Reported-by: Sven Schnelle <svens@stackframe.org>
Reported-by: Carlo Pisani <carlojpisani@gmail.com>
Tested-by: Sven Schnelle <svens@stackframe.org>
Fixes: 3847dab774 ("parisc: Add alternative coding infrastructure")
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 5.0+
2019-06-06 14:25:22 +02:00
John David Anglin
116d753308 parisc: Use lpa instruction to load physical addresses in driver code
Most I/O in the kernel is done using the kernel offset mapping.
However, there is one API that uses aliased kernel address ranges:

> The final category of APIs is for I/O to deliberately aliased address
> ranges inside the kernel.  Such aliases are set up by use of the
> vmap/vmalloc API.  Since kernel I/O goes via physical pages, the I/O
> subsystem assumes that the user mapping and kernel offset mapping are
> the only aliases.  This isn't true for vmap aliases, so anything in
> the kernel trying to do I/O to vmap areas must manually manage
> coherency.  It must do this by flushing the vmap range before doing
> I/O and invalidating it after the I/O returns.

For this reason, we should use the hardware lpa instruction to load the
physical address of kernel virtual addresses in the driver code.

I believe we only use the vmap/vmalloc API with old PA 1.x processors
which don't have a sba, so we don't hit this problem.

Tested on c3750, c8000 and rp3440.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
2019-06-06 14:12:22 +02:00
Krzysztof Kozlowski
ec13c82d26 parisc: configs: Remove useless UEVENT_HELPER_PATH
Remove the CONFIG_UEVENT_HELPER_PATH because:
1. It is disabled since commit 1be01d4a57 ("driver: base: Disable
   CONFIG_UEVENT_HELPER by default") as its dependency (UEVENT_HELPER) was
   made default to 'n',
2. It is not recommended (help message: "This should not be used today
   [...] creates a high system load") and was kept only for ancient
   userland,
3. Certain userland specifically requests it to be disabled (systemd
   README: "Legacy hotplug slows down the system and confuses udev").

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Helge Deller <deller@gmx.de>
2019-06-06 14:12:20 +02:00
John David Anglin
63923d2c38 parisc: Use implicit space register selection for loading the coherence index of I/O pdirs
We only support I/O to kernel space. Using %sr1 to load the coherence
index may be racy unless interrupts are disabled. This patch changes the
code used to load the coherence index to use implicit space register
selection. This saves one instruction and eliminates the race.

Tested on rp3440, c8000 and c3750.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
2019-06-06 14:12:18 +02:00
George G. Davis
2b55d83e9a ARM64: trivial: s/TIF_SECOMP/TIF_SECCOMP/ comment typo fix
Fix a s/TIF_SECOMP/TIF_SECCOMP/ comment typo

Cc: Jiri Kosina <trivial@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org
Signed-off-by: George G. Davis <george_davis@mentor.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-06-06 10:40:05 +01:00
Hangbin Liu
4970b42d5c Revert "fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied"
This reverts commit e9919a24d3.

Nathan reported the new behaviour breaks Android, as Android just add
new rules and delete old ones.

If we return 0 without adding dup rules, Android will remove the new
added rules and causing system to soft-reboot.

Fixes: e9919a24d3 ("fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Yaro Slav <yaro330@gmail.com>
Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 17:54:46 -07:00
Nikita Danilov
930b9a0543 net: aquantia: fix wol configuration not applied sometimes
WoL magic packet configuration sometimes does not work due to
couple of leakages found.

Mainly there was a regression introduced during readx_poll refactoring.

Next, fw request waiting time was too small. Sometimes that
caused sleep proxy config function to return with an error
and to skip WoL configuration.
At last, WoL data were passed to FW from not clean buffer.
That could cause FW to accept garbage as a random configuration data.

Fixes: 6a7f227731 ("net: aquantia: replace AQ_HW_WAIT_FOR with readx_poll_timeout_atomic")
Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 17:39:43 -07:00
Vivien Didelot
0ee4e76937 ethtool: fix potential userspace buffer overflow
ethtool_get_regs() allocates a buffer of size ops->get_regs_len(),
and pass it to the kernel driver via ops->get_regs() for filling.

There is no restriction about what the kernel drivers can or cannot do
with the open ethtool_regs structure. They usually set regs->version
and ignore regs->len or set it to the same size as ops->get_regs_len().

But if userspace allocates a smaller buffer for the registers dump,
we would cause a userspace buffer overflow in the final copy_to_user()
call, which uses the regs.len value potentially reset by the driver.

To fix this, make this case obvious and store regs.len before calling
ops->get_regs(), to only copy as much data as requested by userspace,
up to the value returned by ops->get_regs_len().

While at it, remove the redundant check for non-null regbuf.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 17:15:27 -07:00
Neil Horman
0a8dd9f67c Fix memory leak in sctp_process_init
syzbot found the following leak in sctp_process_init
BUG: memory leak
unreferenced object 0xffff88810ef68400 (size 1024):
  comm "syz-executor273", pid 7046, jiffies 4294945598 (age 28.770s)
  hex dump (first 32 bytes):
    1d de 28 8d de 0b 1b e3 b5 c2 f9 68 fd 1a 97 25  ..(........h...%
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000a02cebbd>] kmemleak_alloc_recursive include/linux/kmemleak.h:55
[inline]
    [<00000000a02cebbd>] slab_post_alloc_hook mm/slab.h:439 [inline]
    [<00000000a02cebbd>] slab_alloc mm/slab.c:3326 [inline]
    [<00000000a02cebbd>] __do_kmalloc mm/slab.c:3658 [inline]
    [<00000000a02cebbd>] __kmalloc_track_caller+0x15d/0x2c0 mm/slab.c:3675
    [<000000009e6245e6>] kmemdup+0x27/0x60 mm/util.c:119
    [<00000000dfdc5d2d>] kmemdup include/linux/string.h:432 [inline]
    [<00000000dfdc5d2d>] sctp_process_init+0xa7e/0xc20
net/sctp/sm_make_chunk.c:2437
    [<00000000b58b62f8>] sctp_cmd_process_init net/sctp/sm_sideeffect.c:682
[inline]
    [<00000000b58b62f8>] sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1384
[inline]
    [<00000000b58b62f8>] sctp_side_effects net/sctp/sm_sideeffect.c:1194
[inline]
    [<00000000b58b62f8>] sctp_do_sm+0xbdc/0x1d60 net/sctp/sm_sideeffect.c:1165
    [<0000000044e11f96>] sctp_assoc_bh_rcv+0x13c/0x200
net/sctp/associola.c:1074
    [<00000000ec43804d>] sctp_inq_push+0x7f/0xb0 net/sctp/inqueue.c:95
    [<00000000726aa954>] sctp_backlog_rcv+0x5e/0x2a0 net/sctp/input.c:354
    [<00000000d9e249a8>] sk_backlog_rcv include/net/sock.h:950 [inline]
    [<00000000d9e249a8>] __release_sock+0xab/0x110 net/core/sock.c:2418
    [<00000000acae44fa>] release_sock+0x37/0xd0 net/core/sock.c:2934
    [<00000000963cc9ae>] sctp_sendmsg+0x2c0/0x990 net/sctp/socket.c:2122
    [<00000000a7fc7565>] inet_sendmsg+0x64/0x120 net/ipv4/af_inet.c:802
    [<00000000b732cbd3>] sock_sendmsg_nosec net/socket.c:652 [inline]
    [<00000000b732cbd3>] sock_sendmsg+0x54/0x70 net/socket.c:671
    [<00000000274c57ab>] ___sys_sendmsg+0x393/0x3c0 net/socket.c:2292
    [<000000008252aedb>] __sys_sendmsg+0x80/0xf0 net/socket.c:2330
    [<00000000f7bf23d1>] __do_sys_sendmsg net/socket.c:2339 [inline]
    [<00000000f7bf23d1>] __se_sys_sendmsg net/socket.c:2337 [inline]
    [<00000000f7bf23d1>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2337
    [<00000000a8b4131f>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:3

The problem was that the peer.cookie value points to an skb allocated
area on the first pass through this function, at which point it is
overwritten with a heap allocated value, but in certain cases, where a
COOKIE_ECHO chunk is included in the packet, a second pass through
sctp_process_init is made, where the cookie value is re-allocated,
leaking the first allocation.

Fix is to always allocate the cookie value, and free it when we are done
using it.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: syzbot+f7e9153b037eac9b1df8@syzkaller.appspotmail.com
CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 17:11:47 -07:00
Zhu Yanjun
b50e058746 net: rds: fix memory leak when unload rds_rdma
When KASAN is enabled, after several rds connections are
created, then "rmmod rds_rdma" is run. The following will
appear.

"
BUG rds_ib_incoming (Not tainted): Objects remaining
in rds_ib_incoming on __kmem_cache_shutdown()

Call Trace:
 dump_stack+0x71/0xab
 slab_err+0xad/0xd0
 __kmem_cache_shutdown+0x17d/0x370
 shutdown_cache+0x17/0x130
 kmem_cache_destroy+0x1df/0x210
 rds_ib_recv_exit+0x11/0x20 [rds_rdma]
 rds_ib_exit+0x7a/0x90 [rds_rdma]
 __x64_sys_delete_module+0x224/0x2c0
 ? __ia32_sys_delete_module+0x2c0/0x2c0
 do_syscall_64+0x73/0x190
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
"
This is rds connection memory leak. The root cause is:
When "rmmod rds_rdma" is run, rds_ib_remove_one will call
rds_ib_dev_shutdown to drop the rds connections.
rds_ib_dev_shutdown will call rds_conn_drop to drop rds
connections as below.
"
rds_conn_path_drop(&conn->c_path[0], false);
"
In the above, destroy is set to false.
void rds_conn_path_drop(struct rds_conn_path *cp, bool destroy)
{
        atomic_set(&cp->cp_state, RDS_CONN_ERROR);

        rcu_read_lock();
        if (!destroy && rds_destroy_pending(cp->cp_conn)) {
                rcu_read_unlock();
                return;
        }
        queue_work(rds_wq, &cp->cp_down_w);
        rcu_read_unlock();
}
In the above function, destroy is set to false. rds_destroy_pending
is called. This does not move rds connections to ib_nodev_conns.
So destroy is set to true to move rds connections to ib_nodev_conns.
In rds_ib_unregister_client, flush_workqueue is called to make rds_wq
finsh shutdown rds connections. The function rds_ib_destroy_nodev_conns
is called to shutdown rds connections finally.
Then rds_ib_recv_exit is called to destroy slab.

void rds_ib_recv_exit(void)
{
        kmem_cache_destroy(rds_ib_incoming_slab);
        kmem_cache_destroy(rds_ib_frag_slab);
}
The above slab memory leak will not occur again.

>From tests,
256 rds connections
[root@ca-dev14 ~]# time rmmod rds_rdma

real    0m16.522s
user    0m0.000s
sys     0m8.152s
512 rds connections
[root@ca-dev14 ~]# time rmmod rds_rdma

real    0m32.054s
user    0m0.000s
sys     0m15.568s

To rmmod rds_rdma with 256 rds connections, about 16 seconds are needed.
And with 512 rds connections, about 32 seconds are needed.
>From ftrace, when one rds connection is destroyed,

"
 19)               |  rds_conn_destroy [rds]() {
 19)   7.782 us    |    rds_conn_path_drop [rds]();
 15)               |  rds_shutdown_worker [rds]() {
 15)               |    rds_conn_shutdown [rds]() {
 15)   1.651 us    |      rds_send_path_reset [rds]();
 15)   7.195 us    |    }
 15) + 11.434 us   |  }
 19)   2.285 us    |    rds_cong_remove_conn [rds]();
 19) * 24062.76 us |  }
"
So if many rds connections will be destroyed, this function
rds_ib_destroy_nodev_conns uses most of time.

Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 17:08:14 -07:00
Xin Long
b7999b0772 ipv6: fix the check before getting the cookie in rt6_get_cookie
In Jianlin's testing, netperf was broken with 'Connection reset by peer',
as the cookie check failed in rt6_check() and ip6_dst_check() always
returned NULL.

It's caused by Commit 93531c6743 ("net/ipv6: separate handling of FIB
entries from dst based routes"), where the cookie can be got only when
'c1'(see below) for setting dst_cookie whereas rt6_check() is called
when !'c1' for checking dst_cookie, as we can see in ip6_dst_check().

Since in ip6_dst_check() both rt6_dst_from_check() (c1) and rt6_check()
(!c1) will check the 'from' cookie, this patch is to remove the c1 check
in rt6_get_cookie(), so that the dst_cookie can always be set properly.

c1:
  (rt->rt6i_flags & RTF_PCPU || unlikely(!list_empty(&rt->rt6i_uncached)))

Fixes: 93531c6743 ("net/ipv6: separate handling of FIB entries from dst based routes")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 17:00:29 -07:00
Xin Long
0a90478b93 ipv4: not do cache for local delivery if bc_forwarding is enabled
With the topo:

    h1 ---| rp1            |
          |     route  rp3 |--- h3 (192.168.200.1)
    h2 ---| rp2            |

If rp1 bc_forwarding is set while rp2 bc_forwarding is not, after
doing "ping 192.168.200.255" on h1, then ping 192.168.200.255 on
h2, and the packets can still be forwared.

This issue was caused by the input route cache. It should only do
the cache for either bc forwarding or local delivery. Otherwise,
local delivery can use the route cache for bc forwarding of other
interfaces.

This patch is to fix it by not doing cache for local delivery if
all.bc_forwarding is enabled.

Note that we don't fix it by checking route cache local flag after
rt_cache_valid() in "local_input:" and "ip_mkroute_input", as the
common route code shouldn't be touched for bc_forwarding.

Fixes: 5cbf777cfd ("route: add support for directed broadcast forwarding")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 16:59:21 -07:00
Linus Torvalds
156c05917e linux-kselftest-5.2-rc4
This Kselftest update for Linux 5.2-rc4 consists of
 
 - Alex Shi's fixes to cgroup tests
 - Alakesh Haloi's fix to userfaultfd compiler warning
 - Naresh Kamboju's fix to vm install to include test script to run
   the test.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAlz36zAACgkQCwJExA0N
 QxyEVxAAhvjL+l9QQpexV91Cl6nu5OIhElsqobbAzQjcKCE5SENVhHTh5rJCuX1W
 yWyl5CdS6XnUWxT2mxsSCJxSBQUFBqsIZy4qLzSORFi2zYL11ICJU/6kJ7WGMdhp
 tASpD4txNzjzKecslCgTXBupyEaGAnDmzE4YeNy09AUbpFcYPr++wqlgc58C9u6k
 uGsc0YVQhsxuK+1gCtoOHjxIiY5RXBegb6v2krpQFMhR/AOrnhunMyhpLSTxrTnq
 jEaOvKI/ug4i4VZUdrd0Z0dcZMbE1Iqyvc7BgIip93VQjvgZGpL+0gRolTCboLmc
 ufk6GuDw++jeU/AGMp7qva2x0L0JTVuGzQQ28HUOcwugBFd5bI8LtgwDCI+gQIB8
 haNS6uWnWjl2ezmdXqChz0nA9J/RcsBeYPxhVqW9s6Rj2CYqLAmkL3pW063/7ayG
 dBwBPad0JSSZiMkf8R26pbJEcaCJFgpINYobnQ3/SbBQ36ripE0lzwKiyOXp4y+v
 BHcrtiNOSwaFUiy+kd9/0og1uvSebdgtRCqvkukIMJpMPIJlC0m7t4X21fzwAYH4
 K6euOYUABvy81EBlhkJKwwQhukoL3OSVuayumtHvzI1lXxh/fXFtQdYdVxMpRs0H
 IjTdw+mIpwrIiGI4fUyX6fkkDnopYapzU/ETl7aLCaSAGzC07oc=
 =9KtO
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kselftest fixes from Shuah Khan:

 - fixes to cgroup tests (Alex Shi)

 - fix to userfaultfd compiler warning (Alakesh Haloi)

 - fix to vm install to include test script to run the test (Naresh
   Kamboju)

* tag 'linux-kselftest-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: vm: install test_vmalloc.sh for run_vmtests
  userfaultfd: selftest: fix compiler warning
  kselftest/cgroup: fix incorrect test_core skip
  kselftest/cgroup: fix unexpected testing failure on test_core
  kselftest/cgroup: fix unexpected testing failure on test_memcontrol
2019-06-05 13:09:55 -07:00
Krzesimir Nowak
1884c06657 tools: bpftool: Fix JSON output when lookup fails
In commit 9a5ab8bf1d ("tools: bpftool: turn err() and info() macros
into functions") one case of error reporting was special cased, so it
could report a lookup error for a specific key when dumping the map
element. What the code forgot to do is to wrap the key and value keys
into a JSON object, so an example output of pretty JSON dump of a
sockhash map (which does not support looking up its values) is:

[
    "key": ["0x0a","0x41","0x00","0x02","0x1f","0x78","0x00","0x00"
    ],
    "value": {
        "error": "Operation not supported"
    },
    "key": ["0x0a","0x41","0x00","0x02","0x1f","0x78","0x00","0x01"
    ],
    "value": {
        "error": "Operation not supported"
    }
]

Note the key-value pairs inside the toplevel array. They should be
wrapped inside a JSON object, otherwise it is an invalid JSON. This
commit fixes this, so the output now is:

[{
        "key": ["0x0a","0x41","0x00","0x02","0x1f","0x78","0x00","0x00"
        ],
        "value": {
            "error": "Operation not supported"
        }
    },{
        "key": ["0x0a","0x41","0x00","0x02","0x1f","0x78","0x00","0x01"
        ],
        "value": {
            "error": "Operation not supported"
        }
    }
]

Fixes: 9a5ab8bf1d ("tools: bpftool: turn err() and info() macros into functions")
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Krzesimir Nowak <krzesimir@kinvolk.io>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-05 22:08:26 +02:00
Linus Torvalds
db309f2aed pidfd fixes for v5.2-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE7btrcuORLb1XUhEwjrBW1T7ssS0FAlz3v20ACgkQjrBW1T7s
 sS1rbw//clR7YbczqO1Og5W3Bpg2JdPPQjXyfomO9gqPsOuOaRFQ4vmhvrCnavN1
 4SdXm3QzXZ30fnhgtoPzrNgPrZBU4DzGq7V8G6NCWxu0wsWJrYOy++FK6WpW3ngB
 AFRRWCwuI9u3xEu/0xXgd2UtM2Wy9rmu5PNa/BDkIYx33F56Lz0aIQrzrYKX3a+W
 DroRiPpNTrrThFyFUH1pPs0rZHWDY9l90drq7QxZB5+7irktHHKiywL71N4gy7Wj
 Hje7P5pc3Zkj2qNKT1im/ccSWdApOrTlzDrIx5GLJpZCycVlCwcGRF8F1+l6g/Pg
 AV3ABMo2k5SLQ+q/3PzlCFhmIvPL/ucly+l7KbYrwwb0Zn+QKuwc+Z8vFJt3daeQ
 BZPrxWse3Iwjg2S/b4tyrbxowS6SmPGQ7Dmk62q7nffgsb161uypz1/p/dLmgL9b
 W1bY12bZHB/nr0smTewQk1N15dvxnsViFa1oyAdJjtngbA448nCQnZklFIaEl4Mo
 NngcgKSQrp4B7G+htcPCi5Jda1GE4blhed/fwDN9G4mbCBgnhG+GB8ABx46/tlpV
 A+dt0Bm88lNZETqNBQHFqE0nOaNzzdCGteHCx+61ZhX1eaFNRb3AaLsrGs9PR4zY
 PVrvwmMksnTZ5Wc3YYeSHbC7IlkMdB8tbi+HFbZapPucu+33VFA=
 =KySr
 -----END PGP SIGNATURE-----

Merge tag 'pidfd-fixes-v5.2-rc4' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux

Pull pidfd fixes from Christian Brauner:
 "The contains two small patches to the pidfd samples and test binaries
  respectively.

  They were lacking appropriate ifdefines for __NR_pidfd_send_signal and
  could hence lead to compilation errors when that was not defined.

  This was spotted on mips independently by Guenter Roeck (who was kind
  enough to send a fix for the samples binary) and Arnd who spotted it
  in linux-next.

  Apart from these two patches, there's also a patch to update the
  comments for the pidfd_send_signal() syscall which were slightly
  wrong/inconsistenly worded"

* tag 'pidfd-fixes-v5.2-rc4' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
  tests: fix pidfd-test compilation
  signal: improve comments
  samples: fix pidfd-metadata compilation
2019-06-05 13:03:36 -07:00
Linus Torvalds
47358b6475 pstore fixes for v5.2-rc4
- Avoid NULL deref when unloading/reloading ramoops module (Pi-Hsun Shih)
 - Run ramoops without crash dump region
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlz3NB8WHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJjGyEACklx2W3qjE51oCWpRuN9B29As0
 XzuWrQr15WzD2zAtdG4wc6/OI3Gfu+/xXb9YClqXFN8TqGEwyGIcz0kOOExvzJN2
 bdseq8gA4JkL8NK3LKjGMXCvUBQiCGUKfGa4xXL8I2NfyZkykUqRa0PVkkfNYOEf
 q6zjPz73BDRvpZUw7+50sKDJalcicwOzn3GMXw7C43qDuuychpwzLTL5ZmFrQ3oX
 qJqz7mIfsP5DpJk8SUTZl+W4eZ6/ianfML883ia9Zg8AP6ix/iET0iQHXw59DbOZ
 XeFmXBudou+JNAjqlDbGppBwJOu3iHXFKh7eJre2W2swkdah/V8CvYo36qdJ9zHP
 zs4/Wt/yloWYZqtY4UWsMhs47ryvm8iC2Ki//OPTZh30fIeqGAcVknbFbu1EHron
 autOEy8DiKH5I76BGGaR78We6AVt04HXTT0kFcDgczv3MLhfOpHLoL4w4fM0NvNq
 3CSDEkr6dsTQPCPUoApBo3rfbiVROzgXdDLLLxULWphtL6rAvvn/FmAPQsC7OdN3
 TdZQ0AjMtiQO32TFfm9badadDXW2QjXJF91TQBqtGacR+ipiXSnImeZC24VCdXyT
 pO9U/rbrU3tds3+Qu1WNh87IvEWOjzC/sjDKSd/ClZqk9F0KVGGSxc9YXgxNzLIR
 gC0luMlt7acj4Jzkog==
 =g26W
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore fixes from Kees Cook:

 - Avoid NULL deref when unloading/reloading ramoops module (Pi-Hsun
   Shih)

 - Run ramoops without crash dump region

* tag 'pstore-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore/ram: Run without kernel crash dump region
  pstore: Set tfm to NULL on free_buf_for_compression
2019-06-05 12:42:26 -07:00
David S. Miller
e7a9fe7b0d Merge branch 's390-qeth-fixes'
Julian Wiedmann says:

====================
s390/qeth: fixes 2019-06-05

one more shot...  now with patch 2 fixed up so that it uses the
dst entry returned from dst_check().

From the v1 cover letter:

Please apply the following set of qeth fixes to -net.

- The first two patches fix issues in the L3 driver's cast type
  selection for transmitted skbs.
- Alexandra adds a sanity check when retrieving VLAN information from
  neighbour address events.
- The last patch adds some missing error handling for qeth's new
  multiqueue code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 11:48:57 -07:00
Julian Wiedmann
bd966839bd s390/qeth: handle error when updating TX queue count
netif_set_real_num_tx_queues() can return an error, deal with it.

Fixes: 73dc2daf11 ("s390/qeth: add TX multiqueue support for OSA devices")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 11:48:57 -07:00
Alexandra Winter
335726195e s390/qeth: fix VLAN attribute in bridge_hostnotify udev event
Enabling sysfs attribute bridge_hostnotify triggers a series of udev events
for the MAC addresses of all currently connected peers. In case no VLAN is
set for a peer, the device reports the corresponding MAC addresses with
VLAN ID 4096. This currently results in attribute VLAN=4096 for all
non-VLAN interfaces in the initial series of events after host-notify is
enabled.

Instead, no VLAN attribute should be reported in the udev event for
non-VLAN interfaces.

Only the initial events face this issue. For dynamic changes that are
reported later, the device uses a validity flag.

This also changes the code so that it now sets the VLAN attribute for
MAC addresses with VID 0. On Linux, no qeth interface will ever be
registered with VID 0: Linux kernel registers VID 0 on all network
interfaces initially, but qeth will drop .ndo_vlan_rx_add_vid for VID 0.
Peers with other OSs could register MACs with VID 0.

Fixes: 9f48b9db9a ("qeth: bridgeport support - address notifications")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 11:48:57 -07:00
Julian Wiedmann
0cd6783d3c s390/qeth: check dst entry before use
While qeth_l3 uses netif_keep_dst() to hold onto the dst, a skb's dst
may still have been obsoleted (via dst_dev_put()) by the time that we
end up using it. The dst then points to the loopback interface, which
means the neighbour lookup in qeth_l3_get_cast_type() determines a bogus
cast type of RTN_BROADCAST.
For IQD interfaces this causes us to place such skbs on the wrong
HW queue, resulting in TX errors.

Fix-up the various call sites to first validate the dst entry with
dst_check(), and fall back accordingly.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 11:48:57 -07:00
Julian Wiedmann
72c87976c5 s390/qeth: handle limited IPv4 broadcast in L3 TX path
When selecting the cast type of a neighbourless IPv4 skb (eg. on a raw
socket), qeth_l3 falls back to the packet's destination IP address.
For this case we should classify traffic sent to 255.255.255.255 as
broadcast.
This fixes DHCP requests, which were misclassified as unicast
(and for IQD interfaces thus ended up on the wrong HW queue).

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-05 11:48:57 -07:00
Christian Brauner
1fcd0eb356
tests: fix pidfd-test compilation
Define __NR_pidfd_send_signal if it isn't to prevent a potential
compilation error.

To make pidfd-test compile on all arches, irrespective of whether
or not syscall numbers are assigned, define the syscall number to -1.
If it isn't defined this will cause the kernel to return -ENOSYS.

Fixes: 575a0ae974 ("selftests: add tests for pidfd_send_signal()")
Signed-off-by: Christian Brauner <christian@brauner.io>
2019-06-05 15:06:32 +02:00
Christian Brauner
c732327f04
signal: improve comments
Improve the comments for pidfd_send_signal().
First, the comment still referred to a file descriptor for a process as a
"task file descriptor" which stems from way back at the beginning of the
discussion. Replace this with "pidfd" for consistency.
Second, the wording for the explanation of the arguments to the syscall
was a bit inconsistent, e.g. some used the past tense some used present
tense. Make the wording more consistent.

Signed-off-by: Christian Brauner <christian@brauner.io>
2019-06-05 15:06:07 +02:00
Guenter Roeck
7c33277b9a
samples: fix pidfd-metadata compilation
Define __NR_pidfd_send_signal if it isn't to prevent a compilation error.

To make pidfd-metadata compile on all arches, irrespective of whether
or not syscall numbers are assigned, define the syscall number to -1.
If it isn't defined this will cause the kernel to return -ENOSYS.

Fixes: 43c6afee48 ("samples: show race-free pidfd metadata access")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Christian Brauner <christian@brauner.io>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
[christian@brauner.io: tweak commit message]
Signed-off-by: Christian Brauner <christian@brauner.io>
2019-06-05 15:06:07 +02:00
Anders Roxell
f31e98bfae arm64: arch_timer: mark functions as __always_inline
If CONFIG_FUNCTION_GRAPH_TRACER is enabled function
arch_counter_get_cntvct() is marked as notrace. However, function
__arch_counter_get_cntvct is marked as inline. If
CONFIG_OPTIMIZE_INLINING is set that will make the two functions
tracable which they shouldn't.

Rework so that functions __arch_counter_get_* are marked with
__always_inline so they will be inlined even if CONFIG_OPTIMIZE_INLINING
is turned on.

Fixes: 0ea415390c ("clocksource/arm_arch_timer: Use arch_timer_read_counter to access stable counters")
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-06-05 13:24:06 +01:00
Florian Fainelli
262afe92fa arm64: smp: Moved cpu_logical_map[] to smp.h
asm/smp.h is included by linux/smp.h and some drivers, in particular
irqchip drivers can access cpu_logical_map[] in order to perform SMP
affinity tasks. Make arm64 consistent with other architectures here.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-06-05 13:09:11 +01:00
Dave Martin
78ed70bf3a arm64: cpufeature: Fix missing ZFR0 in __read_sysreg_by_encoding()
In commit 06a916feca ("arm64: Expose SVE2 features for
userspace"), new hwcaps are added that are detected via fields in
the SVE-specific ID register ID_AA64ZFR0_EL1.

In order to check compatibility of secondary cpus with the hwcaps
established at boot, the cpufeatures code uses
__read_sysreg_by_encoding() to read this ID register based on the
sys_reg field of the arm64_elf_hwcaps[] table.

This leads to a kernel splat if an hwcap uses an ID register that
__read_sysreg_by_encoding() doesn't explicitly handle, as now
happens when exercising cpu hotplug on an SVE2-capable platform.

So fix it by adding the required case in there.

Fixes: 06a916feca ("arm64: Expose SVE2 features for userspace")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-06-05 13:05:28 +01:00
Hangbin Liu
25a7991c84 selftests/bpf: move test_lirc_mode2_user to TEST_GEN_PROGS_EXTENDED
test_lirc_mode2_user is included in test_lirc_mode2.sh test and should
not be run directly.

Fixes: 6bdd533cee ("bpf: add selftest for lirc_mode2 type program")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-06-05 12:26:46 +02:00
Paolo Abeni
fdf71426e7 net: fix indirect calls helpers for ptype list hooks.
As Eric noted, the current wrapper for ptype func hook inside
__netif_receive_skb_list_ptype() has no chance of avoiding the indirect
call: we enter such code path only for protocols other than ipv4 and
ipv6.

Instead we can wrap the list_func invocation.

v1 -> v2:
 - use the correct fix tag

Fixes: f5737cbadb ("net: use indirect calls helpers for ptype hook")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-04 20:16:22 -07:00
Miaohe Lin
ceae266bf0 net: ipvlan: Fix ipvlan device tso disabled while NETIF_F_IP_CSUM is set
There's some NICs, such as hinic, with NETIF_F_IP_CSUM and NETIF_F_TSO
on but NETIF_F_HW_CSUM off. And ipvlan device features will be
NETIF_F_TSO on with NETIF_F_IP_CSUM and NETIF_F_IP_CSUM both off as
IPVLAN_FEATURES only care about NETIF_F_HW_CSUM. So TSO will be
disabled in netdev_fix_features.
For example:
Features for enp129s0f0:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on

Fixes: a188222b6e ("net: Rename NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-04 20:01:15 -07:00
Tim Beale
82ba25c6de udp: only choose unbound UDP socket for multicast when not in a VRF
By default, packets received in another VRF should not be passed to an
unbound socket in the default VRF. This patch updates the IPv4 UDP
multicast logic to match the unicast VRF logic (in compute_score()),
as well as the IPv6 mcast logic (in __udp_v6_is_mcast_sock()).

The particular case I noticed was DHCP discover packets going
to the 255.255.255.255 address, which are handled by
__udp4_lib_mcast_deliver(). The previous code meant that running
multiple different DHCP server or relay agent instances across VRFs
did not work correctly - any server/relay agent in the default VRF
received DHCP discover packets for all other VRFs.

Fixes: 6da5b0f027 ("net: ensure unbound datagram socket to be chosen when not in a VRF")
Signed-off-by: Tim Beale <timbeale@catalyst.net.nz>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-04 18:34:03 -07:00
David S. Miller
2b66552eb2 Merge branch 'net-tls-redo-the-RX-resync-locking'
Jakub Kicinski says:

====================
net/tls: redo the RX resync locking

Take two of making sure we don't use a NULL netdev pointer
for RX resync.  This time using a bit and an open coded
wait loop.

v2:
 - fix build warning (DaveM).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-04 13:34:38 -07:00
Jakub Kicinski
e52972c11d net/tls: replace the sleeping lock around RX resync with a bit lock
Commit 38030d7cb7 ("net/tls: avoid NULL-deref on resync during device removal")
tried to fix a potential NULL-dereference by taking the
context rwsem.  Unfortunately the RX resync may get called
from soft IRQ, so we can't use the rwsem to protect from
the device disappearing.  Because we are guaranteed there
can be only one resync at a time (it's called from strparser)
use a bit to indicate resync is busy and make device
removal wait for the bit to get cleared.

Note that there is a leftover "flags" field in struct
tls_context already.

Fixes: 4799ac81e5 ("tls: Add rx inline crypto offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-04 13:34:37 -07:00
Jakub Kicinski
27393f8c6e Revert "net/tls: avoid NULL-deref on resync during device removal"
This reverts commit 38030d7cb7.
Unfortunately the RX resync may get called from soft IRQ,
so we can't take the rwsem to protect from the device
disappearing.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-04 13:34:37 -07:00
Vladimir Oltean
f4cfcfbdf0 net: dsa: sja1105: Fix link speed not working at 100 Mbps and below
The hardware values for link speed are held in the sja1105_speed_t enum.
However they do not increase in the order that sja1105_get_speed_cfg was
iterating over them (basically from SJA1105_SPEED_AUTO - 0 - to
SJA1105_SPEED_1000MBPS - 1 - skipping the other two).

Another bug is that the code in sja1105_adjust_port_config relies on the
fact that an invalid link speed is detected by sja1105_get_speed_cfg and
returned as -EINVAL.  However storing this into an enum that only has
positive members will cast it into an unsigned value, and it will miss
the negative check.

So take the simplest approach and remove the sja1105_get_speed_cfg
function and replace it with a simple switch-case statement.

Fixes: 8aa9ebccae ("net: dsa: Introduce driver for NXP SJA1105 5-port L2 switch")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-04 11:51:57 -07:00
Russell King
7731676332 net: phylink: avoid reducing support mask
Avoid reducing the support mask as a result of the interface type
selected for SFP modules, or when setting the link settings through
ethtool - this should only change when the supported link modes of
the hardware combination change.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-04 11:43:24 -07:00
Russell King
28e74a7cfd net: sfp: read eeprom in maximum 16 byte increments
Some SFP modules do not like reads longer than 16 bytes, so read the
EEPROM in chunks of 16 bytes at a time.  This behaviour is not specified
in the SFP MSAs, which specifies:

 "The serial interface uses the 2-wire serial CMOS E2PROM protocol
  defined for the ATMEL AT24C01A/02/04 family of components."

and

 "As long as the SFP+ receives an acknowledge, it shall serially clock
  out sequential data words. The sequence is terminated when the host
  responds with a NACK and a STOP instead of an acknowledge."

We must avoid breaking a read across a 16-bit quantity in the diagnostic
page, thankfully all 16-bit quantities in that page are naturally
aligned.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-03 15:16:37 -07:00
Xin Long
67c0aaa1ea selftests: set sysctl bc_forwarding properly in router_broadcast.sh
sysctl setting bc_forwarding for $rp2 is needed when ping_test_from h2,
otherwise the bc packets from $rp2 won't be forwarded. This patch is to
add this setting for $rp2.

Also, as ping_test_from does grep "$from" only, which could match some
unexpected output, some test case doesn't really work, like:

  # ping_test_from $h2 198.51.200.255 198.51.200.2
    PING 198.51.200.255 from 198.51.100.2 veth3: 56(84) bytes of data.
    64 bytes from 198.51.100.1: icmp_seq=1 ttl=64 time=0.336 ms

When doing grep $form (198.51.200.2), the output could still match.
So change to grep "bytes from $from" instead.

Fixes: 40f98b9af9 ("selftests: add a selftest for directed broadcast forwarding")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-03 15:15:01 -07:00
Sean Wang
880c2d4b2f net: ethernet: mediatek: Use NET_IP_ALIGN to judge if HW RX_2BYTE_OFFSET is enabled
Should only enable HW RX_2BYTE_OFFSET function in the case NET_IP_ALIGN
equals to 2.

Signed-off-by: Mark Lee <mark-mc.lee@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-03 15:04:07 -07:00
Sean Wang
9e4f56f1a7 net: ethernet: mediatek: Use hw_feature to judge if HWLRO is supported
Should hw_feature as hardware capability flags to check if hardware LRO
got support.

Signed-off-by: Mark Lee <mark-mc.lee@mediatek.com>
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-03 15:04:07 -07:00
Linus Torvalds
788a024921 ARC fixes for 5.2-rc4
- Fix for userspace trying to access kernel vaddr space
 
  - HSDK platform DT updates
 
  - Cleanup some build warnings
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJc9YpvAAoJEGnX8d3iisJe/cAQAKe0nM3w7hPGwKpk8QnVOrSM
 Hto0QR+mEQkyrad1lBgFSM6pQgp6ziVr54dXQz646+vItUPysHkxBQ5bAb6oLTce
 KpcjSwztO2AxTnIyhLGTm0pmUpzHvW/Dd+HV5a0sSK1lQ/Ph8LuqCljvjx7U0yMb
 niH2g9WJLxc4Ry6KaFRwKKUmRVUeof9TfNLRdz2+r46vZ7dnQvkLTCI4WcpUcrWx
 Dh4J2E6v6t8wTAYm+Ev6CtKYIE3LM2MbFeAHL5XDz8VTQB585pXMaLvDOzK6ANUz
 6I8eUskx8L6WJd4qqE4ZmvTl0FXQGcONUSd4z6+Cj6NgmxHwMqdiw6Xv2Qo5SIGl
 vW3RVzdO8cuQj9Lfcydrng9XCh5rNI7yRXX7CNTZPY7hEPeaC1g91CHWVvvjTWes
 yd7pQf95K5ZokfdGNRqslngqdlxET8yBz2uKEc560BaWghtekIaPwTdBPY150jvr
 vALbrjFF8SBh8yH4ShMpYl0SkNeb1RjU7V4qDhukosQnhGqF0FWauW5ONd5ZREFr
 E7vQsicOqHx0loOABgeq4yvqH6aR/GvIUlulWJtoqaF7DSXxCiMzZTUyd1iavDEa
 8xrV8kdbr1EJIOAih0A1YKANDw9z9W/16+Xz0ifRwPW6a3wMaDHPT2Z273LurdFd
 JUGZi4aA06xVVXvsLrCV
 =N0Dk
 -----END PGP SIGNATURE-----

Merge tag 'arc-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:

 - Fix for userspace trying to access kernel vaddr space

 - HSDK platform DT updates

 - Cleanup some build warnings

* tag 'arc-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: [plat-hsdk] Get rid of inappropriate PHY settings
  ARC: [plat-hsdk]: Add support of Vivante GPU
  ARC: [plat-hsdk]: enable creg-gpio controller
  ARC: [plat-hsdk]: Add missing FIFO size entry in GMAC node
  ARC: [plat-hsdk]: Add missing multicast filter bins number to GMAC node
  ARC: mm: SIGSEGV userspace trying to access kernel virtual memory
  ARC: fix build warnings
2019-06-03 14:45:48 -07:00