The ifinfomsg is in there (thanks kaber@ for foreseeing this long time ago),
so take the given ifidex and register netdev with it.
Ben noticed, that this code path previously ignored ifmp->ifi_index and
userland could be passing in garbage. Thus it may now fail occasionally
because the value clashes with an existing interface.
To address this it's assumed that if the caller specifies the ifindex for
the veth master device, then it's aware of this possibility and should
explicitly specify (or set to 0 for auto-assignment) the peer's ifindex as
well. With this the compatibility with old tools not setting ifindex is
preserved.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the RTM_NEWLINK results in -EOPNOTSUPP if the ifinfomsg->ifi_index
is not zero. I propose to allow requesting ifindices on link creation. This
is required by the checkpoint-restore to correctly restore a net namespace
(i.e. -- a container).
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric noticed, that when there will be devices with equal indices, some
hash functions that use them will become less effective as they could.
Fix this in advance by mixing the net_device address into the hash value
instead of the device index.
This is true for arp and ndisc hash fns. The netlabel, can and llc ones
are also ifindex-based, but that three are init_net-only, thus will not
be affected.
Many thanks to David and Eric for the hash32_ptr implementation!
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Various /proc/net files sometimes report crazy timer values, expressed
in clock_t units.
This happens when an expired timer delta (expires - jiffies) is passed
to jiffies_to_clock_t().
This function has an overflow in :
return div_u64((u64)x * TICK_NSEC, NSEC_PER_SEC / USER_HZ);
commit cbbc719fcc (time: Change jiffies_to_clock_t() argument type
to unsigned long) only got around the problem.
As we cant output negative values in /proc/net/tcp without breaking
various tools, I suggest adding a jiffies_delta_to_clock_t() wrapper
that caps the negative delta to a 0 value.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: hank <pyu@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__fls(x) is a bit faster than fls(x), granted we know x is non null.
As Ben Hutchings pointed out, fls(x) = __fls(x) + 1
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Avoid dirtying neighbour's confirmed field.
TCP workloads hits this cache line for each incoming ACK.
Lets write n->confirmed only if there is a jiffie change.
2) Optimize neigh_hh_output() for the common Ethernet case, were
hh_len is less than 16 bytes. Replace the memcpy() call
by two inlined 64bit load/stores on x86_64.
Bench results using udpflood test, with -C option (MSG_CONFIRM flag
added to sendto(), to reproduce the n->confirmed dirtying on UDP)
24 threads doing 1.000.000 UDP sendto() on dummy device, 4 runs.
before : 2.247s, 2.235s, 2.247s, 2.318s
after : 1.884s, 1.905s, 1.891s, 1.895s
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixing the cpsw device tree example to make it simpler to copy pastable to dts
file and use it directly.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
device tree implementation for davinci mdio driver
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While playing with CoDel and ECN marking, I discovered a
non optimal behavior of receiver of CE (Congestion Encountered)
segments.
In pathological cases, sender has reduced its cwnd to low values,
and receiver delays its ACK (by 40 ms).
While RFC 3168 6.1.3 (The TCP Receiver) doesn't explicitly recommend
to send immediate ACKS, we believe its better to not delay ACKS, because
a CE segment should give same signal than a dropped segment, and its
quite important to reduce RTT to give ECE/CWR signals as fast as
possible.
Note we already call tcp_enter_quickack_mode() from TCP_ECN_check_ce()
if we receive a retransmit, for the same reason.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While doing TCP ECN tests, I discovered GRO was reordering packets if it
receives one packet with CE set, while previous packets in same NAPI run
have ECT(0) for the same flow :
09:25:25.857620 IP (tos 0x2,ECT(0), ttl 64, id 27893, offset 0, flags
[DF], proto TCP (6), length 4396)
172.30.42.19.54550 > 172.30.42.13.44139: Flags [.], seq
233801:238145, ack 1, win 115, options [nop,nop,TS val 3397779 ecr
1990627], length 4344
09:25:25.857626 IP (tos 0x3,CE, ttl 64, id 27892, offset 0, flags [DF],
proto TCP (6), length 1500)
172.30.42.19.54550 > 172.30.42.13.44139: Flags [.], seq
232353:233801, ack 1, win 115, options [nop,nop,TS val 3397779 ecr
1990627], length 1448
09:25:25.857638 IP (tos 0x0, ttl 64, id 34581, offset 0, flags [DF],
proto TCP (6), length 64)
172.30.42.13.44139 > 172.30.42.19.54550: Flags [.], cksum 0xac8f
(incorrect -> 0xca69), ack 232353, win 1271, options [nop,nop,TS val
1990627 ecr 3397779,nop,nop,sack 1 {233801:238145}], length 0
We have two problems here :
1) GRO reorders packets
If NIC gave packet1, then packet2, which happen to be from "different
flows" GRO feeds stack with packet2, then packet1. I have yet to
understand how to solve this problem.
2) GRO is not ECN friendly
Delivering packets out of order makes TCP stack not as fast as it could
be.
In this patch I suggest we make the tos test not part of the 'same_flow'
determination, but part of the 'should flush' logic
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce IP latencies by placing hot MIB IP fields in a single cache line.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Avoid two instructions to reload dev->nd_net->mib.ip_statistics pointer,
unsing a temp variable, in ip_rcv(), ip_output() paths for example.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
performance profiles show a high cost in the IN_DEV_ROUTE_LOCALNET()
call done in ip_route_input_slow(), because of multiple dereferences,
even if cache lines are clean and available in cpu caches.
Since we already have the 'net' pointer, introduce
IN_DEV_NET_ROUTE_LOCALNET() macro avoiding two dereferences
(dev_net(in_dev->dev))
Also change the tests to use IN_DEV_NET_ROUTE_LOCALNET() only if saddr
or/and daddr are loopback addresse.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use net_hash_mix(net) instead of hash_ptr(net, 8), and use
hash_32() instead of using a serie of XOR
Define IN4_ADDR_HSIZE_SHIFT for clarity
__ip_dev_find() can perform the net_eq() call only if ifa_local
matches the key, to avoid unneeded dereferences.
remove inline attributes
# size net/ipv4/devinet.o.before net/ipv4/devinet.o
text data bss dec hex filename
17471 2545 2048 22064 5630 net/ipv4/devinet.o.before
17335 2545 2048 21928 55a8 net/ipv4/devinet.o
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to avoid false drop_monitor indications, we should
call consume_skb() if skb_clone() was successful.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds device tree support for cpsw driver
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cpsw is dependent on davinci_cpdma and davinci_mdio, so adding SOC support for
dependent modules
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add 64 bit stats to ppp driver. The 64 bit stats include tx_bytes,
rx_bytes, tx_packets and rx_packets. Other stats are still 32 bit.
The 64 bit stats can be retrieved via the ndo_get_stats operation. The
SIOCGPPPSTATS ioctl is still 32 bit stats only.
Signed-off-by: Kevin Groeneveld <kgroeneveld@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to what bonding has. This allows to set queue_id for port so
this port will be used when skb with matching skb->queue_mapping is
going to be transmitted.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed types might be needed in NL communication from time to time
(I need s32 in team driver), so add them.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix build error on cris (not tested, no toolchain here):
drivers/net/cris/eth_v10.c: error: too many arguments to function 'e100rxtx_interrupt'
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: linux-cris-kernel@axis.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Hello,
looking at http://sourceforge.net/apps/mediawiki/mbm/index.php?title=Main_Page#Supported_devices, there are branded Ericsson devices from Dell and Toshiba.
The to-be-added vendor IDs are 0x413c for Dell and 0x0930 for Toshiba.
Please find attached a patch to add these vendor IDs.
Signed-off-by: Peter Meiser <meiser@gmx-topmail.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to wait for send_completion msg before put_rndis_request() at
the end of rndis_filter_halt_device(). Otherwise, netvsc_send_completion()
may reference freed memory which is overwritten, and cause panic.
Reported-by: Long Li <longli@microsoft.com>
Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Port1=Eth, Port2=IB restriction is no longer required.
Having RoCE, there will always rdma port initialized over ConnectX
physical port, no matter whether the link layer is IB or Ethernet.
So we always have dual port IB device.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removing the ring->blocked flag, it is redundant and leads to a race:
We close the TX queue and then set the "blocked" flag.
Between those 2 operations the completion function can check the "blocked"
flag, sees that it is 0, and wouldn't open the TX queue.
Using netif_tx_queue_stopped to check the state of the queue to avoid this race.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Should NOT check SMAC=DMAC when:
1. loopback is turned on
2. validate_loopback is true.
Fixed it accordingly.
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gact_rand array is accessed by gact->tcfg_ptype whose value
is assumed to less than MAX_RAND, but any range checks are
not performed.
So add a check in tcf_gact_init(). And in tcf_gact(), we can
reduce a branch.
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Missing a CR in printk causes 2 messages printed in one line.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AR1111 is same as AR9485. The h/w
difference between them is quite insignificant,
Felix suggests only very few baseband features
may not be available in AR1111. The h/w code for
AR9485 is already present, so AR1111 should
work fine with the addition of its PID/VID.
Cc: stable@vger.kernel.org [2.6.39+]
Cc: Felix Bitterli <felixb@qca.qualcomm.com>
Reported-by: Tim Bentley <Tim.Bentley@Gmail.com>
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Tested-by: Tim Bentley <Tim.Bentley@Gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
brcmsmac cannot call freq_reg_info() during channel changes as it does
not hold cfg80211_lock, and as a result it generates a lockdep warning.
freq_reg_info() is being used to determine whether OFDM is allowed on
the current channel, so we can avoid the errant call by using the new
IEEE80211_CHAN_NO_OFDM for this purpose instead.
Reported-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The if_sdio_card structure was never being freed, and neither
was the command structure used for association.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch is going to fix the resuming failed from S3/S4
for rt3290 chip.
Signed-off-by: Woody Hung <Woody.Hung@mediatek.com>
Cc: Kevin Chou <kevin.chou@mediatek.com>
Signed-off-by: Chen, Chien-Chia <machen@suse.com>
Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
On an OLPC XO-1.5 we have seen the following situation:
- the system starts going into suspend
- no wake params are set, so the mmc layer removes the card
- during remove, we send a command to the card
- that command fails, causing if_sdio's reset method to try and remove
the mmc card in attempt to reset it
- the mmc layer is not happy about being asked to remove a card that
it is already removing, and the kernel crashes
While the MMC layer could possibly be taught to behave better here,
it also seems sensible for libertas not to try and reset a card if
we're in the process of removing it anyway.
Signed-off-by: Daniel Drake <dsd@laptop.org>
Acked-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Restore the default state to the "beacon_found" flag when
the channel flags are restored. Otherwise, we can end up
with a channel that we can no longer transmit on even when
we can see beacons on that channel.
Signed-off-by: Paul Stewart <pstew@chromium.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Currently the only way for wireless drivers to tell whether or not OFDM
is allowed on the current channel is to check the regulatory
information. However, this requires hodling cfg80211_mutex, which is not
visible to the drivers.
Other regulatory restrictions are provided as flags in the channel
definition, so let's do similarly with OFDM. This patch adds a new flag,
IEEE80211_CHAN_NO_OFDM, to tell drivers that OFDM on a channel is not
allowed. This flag is set on any channels for which regulatory indicates
that OFDM is prohibited.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Tested-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The current firmware version used by the device driver
is 7.12.0
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Divy Le Ray <divy@chelsio.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
In bnx2x_mcast_enqueue_cmd() we'll leak the memory allocated to
'new_cmd' if we hit the deafault case of the 'switch (cmd)'.
Add a 'kfree(new_cmd)' to that case to avoid the leak.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Acked-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After SA is setup, one timer is armed to detect soft/hard expiration,
however the timer handler uses xtime to do the math. This makes hard
expiration occurs first before soft expiration after setting new date
with big interval. As a result new child SA is deleted before rekeying
the new one.
Signed-off-by: Fan Du <fdu@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cache the device gso_max_segs in sock::sk_gso_max_segs and use it to
limit the size of TSO skbs. This avoids the need to fall back to
software GSO for local TCP senders.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently an skb requiring TSO may not fit within a minimum-size TX
queue. The TX queue selected for the skb may stall and trigger the TX
watchdog repeatedly (since the problem skb will be retried after the
TX reset). This issue is designated as CVE-2012-3412.
Set the maximum number of TSO segments for our devices to 100. This
should make no difference to behaviour unless the actual MSS is less
than about 700. Increase the minimum TX queue size accordingly to
allow for 2 worst-case skbs, so that there will definitely be space
to add an skb after we wake a queue.
To avoid invalidating existing configurations, change
efx_ethtool_set_ringparam() to fix up values that are too small rather
than returning -EINVAL.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A peer (or local user) may cause TCP to use a nominal MSS of as little
as 88 (actual MSS of 76 with timestamps). Given that we have a
sufficiently prodigious local sender and the peer ACKs quickly enough,
it is nevertheless possible to grow the window for such a connection
to the point that we will try to send just under 64K at once. This
results in a single skb that expands to 861 segments.
In some drivers with TSO support, such an skb will require hundreds of
DMA descriptors; a substantial fraction of a TX ring or even more than
a full ring. The TX queue selected for the skb may stall and trigger
the TX watchdog repeatedly (since the problem skb will be retried
after the TX reset). This particularly affects sfc, for which the
issue is designated as CVE-2012-3412.
Therefore:
1. Add the field net_device::gso_max_segs holding the device-specific
limit.
2. In netif_skb_features(), if the number of segments is too high then
mask out GSO features to force fall back to software GSO.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull MIPS updates from Ralf Baechle:
"The lion share of this pull request are fixes for clk-related breakage
caused by other changes during this merge window. For some platforms
the fix was as simple as selecting HAVE_CLK, for others like the
Loongson 2 significant restructuring was required.
The remainder are changes required to get the Lantiq code to work
again."
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: Loongson 2: Sort out clock managment.
MIPS: Loongson 1: more clk support and add select HAVE_CLK
MIPS: txx9: Fix redefinition of clk_* by adding select HAVE_CLK
MIPS: BCM63xx: Fix redefinition of clk_* by adding select HAVE_CLK
MIPS: AR7: Fix redefinition of clk_* by adding select HAVE_CLK
MIPS: Lantiq: Platform specific CLK fixup
MIPS: Lantiq: Add device_tree_init function
MIPS: Lantiq: Fix interface clock and PCI control register offset