Currently, SCTP code defines its own timeval functions (since timeval
is rarely used inside the kernel by others), namely tv_lt() and
TIMEVAL_ADD() macros, that operate on SCTP cookie expiration.
We might as well remove all those, and operate directly on ktime
structures for a couple of reasons: ktime is available on all archs;
complexity of ktime calculations depending on the arch is less than
(reduces to a simple arithmetic operations on archs with
BITS_PER_LONG == 64 or CONFIG_KTIME_SCALAR) or equal to timeval
functions (other archs); code becomes more readable; macros can be
thrown out.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add two ktime helper functions that i) convert a given msec value to
a ktime structure and ii) that adds a msec value to a ktime structure.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We do neither ship a test_frame.h, nor will this be compatible with
the 2.5 out-of-tree lksctp kernel test suite anyway. So remove this
artefact.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Should not allow negative num_vfs
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Warning prints when there are command timeout to help debugging future
failures.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not safe to use sscanf.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since this variable is now part of a structure and not allocated dynamically,
this test is irrelevant now.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Print a warning when a TX timeout is detected
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RX rings were cleaned while there was still possible RX traffic completion
handling.
Change the sequance of events so that the port is closed and the QPs are being
stopped before RX cleanup.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The port vlan table size is 126 (used for IBoE) so after 126 we will
not have space and the user need to see it only in debug print and not
error.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To avoid a race between the open function and everything that happens after
register_netdev() move it to be the last operation called.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are no counters allocated to the eth device when the port is down, so
this query is meaningless at that time.
It also leads to querying incorrect counters (since the counter_index is not
valid when the device port is down).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wrong condition was used when calling iounmap.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the tokenized ip address is re-set on an interface we depend on the
arrival of a new router advertisment to call addrconf_verify to clean
up the old address (which valid_lft is now set to 0). Old addresses can
linger around for a longer time if e.g. the source of router advertisments
vanishes.
So, call addrconf_verify immediately after setting the new tokenized
address to get rid of the old tokenized addresses.
Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The reason behind this change is that as soon as we delete
the last ipv6 address of an interface we also lose the
/proc/sys/net/ipv6/conf/<interface> directory. This seems to be a
usability problem for me.
I don't see any reason why we should shutdown ipv6 on that interface in
such cases.
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch splits the timers for duplicate address detection and router
solicitations apart. The router solicitations timer goes into inet6_dev
and the dad timer stays in inet6_ifaddr.
The reason behind this patch is to reduce the number of unneeded router
solicitations send out by the host if additional link-local addresses
are created. Currently we send out RS for every link-local address on
an interface.
If the RS timer fires we pick a source address with ipv6_get_lladdr. This
change could hurt people adding additional link-local addresses and
specifying these addresses in the radvd clients section because we
no longer guarantee that we use every ll address as source address in
router solicitations.
Cc: Flavio Leitner <fleitner@redhat.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
mlx4 exclusively uses order-2 allocations in RX path, which are
likely to fail under memory pressure.
We therefore drop frames more than needed.
This patch tries order-3, order-2, order-1 and finally order-0
allocations to keep good performance, yet allow allocations if/when
memory gets fragmented.
By using larger pages, and avoiding unnecessary get_page()/put_page()
on compound pages, this patch improves performance as well, lowering
false sharing on struct page.
Also use GFP_KERNEL allocations in initialization path, as allocating 12
MB (390 order-3 pages) can easily fail with GFP_ATOMIC.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings says:
====================
1. Make EEH recovery work when using legacy interrupts, from Alexandre
Rames.
2. Enable accelerated RFS for VLAN-tagged flows, from Andy Lutomirski.
3. Improve performance for non-TCP (and particularly UDP) traffic, which
regressed in 3.10 when we switched to always allocating paged RX
buffers. Partly by Jon Cooper.
4. Some minor bug fixes to IOMMU detection, timestamping capabilities,
and IRQ cleanup on the probe failure path.
I've dropped the RX skb cache, which improved some benchmarks but
perhaps needs some reworking to be more generally useful.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 82dc3c63c6
"net: introduce NAPI_POLL_WEIGHT" network drivers receive a warning
when they use napi weight higher than NAPI_POLL_WEIGHT. This patch
reduces QETH_NAPI_WEIGHT from 128 to 64 (NAPI_POLL_WEIGHT).
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the initial MTU size is changed prior to any activity on the device
(e.g. by attaching a z/VM vNIC already configured in Linux to a guestLAN),
we call dev_kfree_skb_irq(NULL) which results in a kernel panic.
Adding a proper check for NULL pointers to address this issue.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
blkt settings (or LAN idle settings) for an OSA Express card
determine when and how often an OSA Express card tells the
operating system about new incoming packets. The semantic of
these settings has changed starting with OSA Express3. Currently
the qeth standard settings apply to OSA Express2 and older
generations of OSA Express cards, while new generations of OSA
Express cards require extra coding of their reasonable default.
To cover future OSA Express generations the qeth default standard
blkt setting is now the desired setting for OSA generations
starting with OSA Express3, while the fixed set of older OSA
Express cards receives its blkt settings explicitly.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Reviewed-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Increase the default MTU for real OSA devices in layer 2 mode
to 1500 Bytes for increased compatibility.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If someone is interested to dump something they may consider to use
print_hex_dump() or print_hex_dump_bytes() kernel helpers.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To improve power consumption in idle associated mode FW may lower
RX power. This low linearity mode is acceptable for listening low rate
RX such as beacons and groupcast. The driver enables LPRX only if PM
is enabled and associated AP's beacon TX rate is 1Mbps or 6Mbps.
LPRX RSSI threshold is used to limit a range where LPRX is applied.
Signed-off-by: Alexander Bondar <alexander.bondar@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
A few places use 'pcie_trans' which is a bit non-standard,
use 'trans_pcie' there as well.
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
A few places use just 'q', use 'rxq' there like all
other places.
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The PCIe code has an array of buffer descriptors (RXBs) that have pages
and DMA mappings attached. In regular use, the array isn't used and the
buffers are either on the hardware receive queue or the rx_free/rx_used
lists for recycling.
Occasionally, during module unload, we'd see a warning from this:
WARNING: at lib/list_debug.c:32 __list_add+0x91/0xa0()
list_add corruption. prev->next should be next (c31c98cc), but was c31c80bc. (prev=c31c80bc).
Pid: 519, comm: rmmod Tainted: G W O 3.4.24-dev #3
Call Trace:
[<c10335b2>] warn_slowpath_common+0x72/0xa0
[<c1033683>] warn_slowpath_fmt+0x33/0x40
[<c12e31d1>] __list_add+0x91/0xa0
[<fdf2083c>] iwl_pcie_rxq_free_rbs+0xcc/0xe0 [iwlwifi]
[<fdf21b3f>] iwl_pcie_rx_free+0x3f/0x210 [iwlwifi]
[<fdf2dd7a>] iwl_trans_pcie_free+0x2a/0x90 [iwlwifi]
The reason for this seems to be that in iwl_pcie_rxq_free_rbs() we use
the array to free all buffers (the hardware receive queue isn't in use
any more at this point). The function also adds all buffers to rx_used
because it's also used during initialisation (when no freeing happens.)
This can cause the warning because it may add entries to the list that
are already on it. Luckily, this is harmless because it can only happen
when the entire data structure is freed anyway, since during init both
lists are initialized from scratch.
Disentangle this code and treat init and free separately. During init
we just need to put them onto the list after freeing all buffers (for
switching between 4k/8k buffers); during free no list manipulations
are necessary at all.
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
In case that an AP/GO interface is started while there is a
station/P2P client associated, need to make sure that the AP/GO
beacon time is far enough from the station's one in oder to allow
the station to receive the DTIM beacons and the following traffic
etc.
To resolve this, when the AP is started, check if there is an
active station interface, and guarantee that the AP/GO TBTT is far
enough from the station one.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add prints visible to the user when entering and exiting
thrermal throttling, because so users can tell that the
NIC is getting too hot (and throughput will decrease.)
Signed-off-by: eytan lifshitz <eytan.lifshitz@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
1x1 products will need a special LUT.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch solves several sparse issues as well as an unneeded semicolon
found via coccinelle.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit f88c91ddba ("ipv6: statically link
register_inet6addr_notifier()" added following sparse warnings :
net/ipv6/addrconf_core.c:83:5: warning: symbol
'register_inet6addr_notifier' was not declared. Should it be static?
net/ipv6/addrconf_core.c:89:5: warning: symbol
'unregister_inet6addr_notifier' was not declared. Should it be static?
net/ipv6/addrconf_core.c:95:5: warning: symbol
'inet6addr_notifier_call_chain' was not declared. Should it be static?
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct tcp_fastopen_context has a field named tfm, which is a pointer
to a crypto_cipher structure.
It currently has a __rcu annotation, which is not needed at all.
tcp_fastopen_ctx is the pointer fetched by rcu_dereference(), but once
we have a pointer to current tcp_fastopen_context, we do not use/need
rcu_dereference() to access tfm.
This fixes a lot of sparse errors like the following :
net/ipv4/tcp_fastopen.c:21:31: warning: incorrect type in argument 1 (different address spaces)
net/ipv4/tcp_fastopen.c:21:31: expected struct crypto_cipher *tfm
net/ipv4/tcp_fastopen.c:21:31: got struct crypto_cipher [noderef] <asn:4>*tfm
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, there is no good possibility to debug netlink traffic that
is being exchanged between kernel and user space. Therefore, this patch
implements a netlink virtual device, so that netlink messages will be
made visible to PF_PACKET sockets. Once there was an approach with a
similar idea [1], but it got forgotten somehow.
I think it makes most sense to accept the "overhead" of an extra netlink
net device over implementing the same functionality from PF_PACKET
sockets once again into netlink sockets. We have BPF filters that can
already be easily applied which even have netlink extensions, we have
RX_RING zero-copy between kernel- and user space that can be reused,
and much more features. So instead of re-implementing all of this, we
simply pass the skb to a given PF_PACKET socket for further analysis.
Another nice benefit that comes from that is that no code needs to be
changed in user space packet analyzers (maybe adding a dissector, but
not more), thus out of the box, we can already capture pcap files of
netlink traffic to debug/troubleshoot netlink problems.
Also thanks goes to Thomas Graf, Flavio Leitner, Jesper Dangaard Brouer.
[1] http://marc.info/?l=linux-netdev&m=113813401516110
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similarly to the networking receive path with ptype_all taps, we add
the possibility to register netdevices that are for ARPHRD_NETLINK to
the netlink subsystem, so that those can be used for netlink analyzers
resp. debuggers. We do not offer a direct callback function as out-of-tree
modules could do crap with it. Instead, a netdevice must be registered
properly and only receives a clone, managed by the netlink layer. Symbols
are exported as GPL-only.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This small patch adds the definition of ARPHRD_NETLINK which can for
example be used by netlink monitoring devices as device type. So that
sockaddr_ll can pick it up and based on that choose the correct packet
dissector.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This restores commits:
c573972c111a5904342cda2e2c2149
which initially accidently went into 'net', were
reverted there, and then properly placed into 'net-next'.
But the next net --> net-next merge accidently wiped them
out again.
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device::iommu_group field may be set even if no IOMMU is in use.
iommu_present() is still a better indicator, although it doesn't tell
us whether *our* device is affected.
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The lifetime of an irq_cpu_rmap is odd: we have to allocate it before
installing IRQ handlers and free it before removing the IRQ handlers.
As a result of this asymmetry, it was omitted from some failure paths.
On another failure path, we could try to remove IRQ handlers we
had not yet installed.
Move the irq_cpu_rmap allocation and freeing alongside IRQ handler
installation and removal, in efx_nic_{init,fini}_interrupts().
Count the number of IRQ handlers successfully installed and only
remove those on the failure path.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
GRO can handle non-TCP packets and pass them up without coalescing,
but it has to do some extra work to parse the packet which we can
bypass using the hardware parse result. (This condition yields a
false negative for TCP/IPv6 packets received by Falcon, but its
performance is already poor in that case due to lack of checksum
offload.)
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
As far as I know, the hardware doesn't support matching on both IP
fields and vlan tag, but it can at least match on the IP fields.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
The kernel can generate software receive timestamps and we should
report those for all ports regardless of hardware capabilities.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
PCI legacy interrupts are level-triggered, and we cannot mask them up
on an isolated device. Instead, disable the IRQ at the controller
until we have recovered.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Various parts of the HW code are applicable for
both v2.0 and v2.1.
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>