sh_eth_drv_probe() does cast from 'void *' when assigning to the 'pd' variable
which is automatic anyway. Turn the assignment into initializer, while removing
the cast...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sh_mdio_init() uses the bare numbers instead of the PHY interface bits, despite
these are declared in sh_eth.h as 'enum PIR_BIT'...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The packetsocket fanout test uses a packet ring. Use TPACKET_V2
instead of TPACKET_V1 to work around a known 32/64 bit issue in
the older ring that manifests on sparc64.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netlink_diag can be built as a module, just like it's done in
unix sockets.
The core dumping message carries the basic info about netlink sockets:
family, type and protocol, portis, dst_group, dst_portid, state.
Groups can be received as an optional parameter NETLINK_DIAG_GROUPS.
Netlink sockets cab be filtered by protocols.
The socket inode number and cookie is reserved for future per-socket info
retrieving. The per-protocol filtering is also reserved for future by
requiring the sdiag_protocol to be zero.
The file /proc/net/netlink doesn't provide enough information for
dumping netlink sockets. It doesn't provide dst_group, dst_portid,
groups above 32.
v2: fix NETLINK_DIAG_MAX. Now it's equal to the last constant.
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move a few declarations in a header.
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Follow the common pattern and define *_DIAG_MAX like:
[...]
__XXX_DIAG_MAX,
};
Because everyone is used to do:
struct nlattr *attrs[XXX_DIAG_MAX+1];
nla_parse([...], XXX_DIAG_MAX, [...]
Reported-by: Thomas Graf <tgraf@suug.ch>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz says:
====================
Here's a batch of mlx4 driver fixes for 3.9, mostly SRIOV/Flow-steering
related. Series done against the net tree as of commit 5a3da1f
"inet: limit length of fragment queue hash table bucket lists
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
VF QPs must not be released when they have steering rules attached to them.
For that end, introduce a reference count field to the QP object in the
SRIOV resource tracker which is incremented/decremented when steering rules
are attached/detached to it. QPs can be released by VF only when their
ref count is zero.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the resource tracker code paths was wrongly using int and not u64
for resource tracking IDs, fix it.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the ethtool flow steering rules cleanup to be carried out before
releasing the RX QPs.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On the resource tracker cleanup flow, the DMFS rules must be deleted before we
destroy the QPs, else the HW may attempt doing packet steering to non existent QPs.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the mask is wrongly set in the MAP_EQ wrapper, fix that.
Without the fix any EQ number above 511 is mapped to one below 511.
Signed-off-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The GRO_DROP return code is handled by the core network layer.
The current kernel approach is to factorize this kind of statistics into
the upper layers, instead of having all the drivers maintaining them.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Warning message:
warning: 'budget_per_q' may be used uninitialized in this function
budget_per_q won't be used uninitialized since the only time
it doesn't get initialized is when entering gfar_poll with
num_act_queues == 0, meaning rstat_rxf == 0, in which case
budget_per_q is not utilized (as it has no meaning).
Inititalize budget_per_q to 0 though to suppress this compile
warning.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The error check in cpsw_probe_dt() has an '&&' where an '||' is
meant to be. This causes a NULL pointer dereference when incomplet DT
data is passed to the driver ('phy_id' property for cpsw_emac1
missing).
Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Freeing netdev without free_netdev() leads to net, tx leaks.
And it may lead to dereferencing freed pointer.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
The cgroup code has been surrounded by ifdef CONFIG_NET_CLS_CGROUP
and CONFIG_NETPRIO_CGROUP.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements F-RTO (foward RTO recovery):
When the first retransmission after timeout is acknowledged, F-RTO
sends new data instead of old data. If the next ACK acknowledges
some never-retransmitted data, then the timeout was spurious and the
congestion state is reverted. Otherwise if the next ACK selectively
acknowledges the new data, then the timeout was genuine and the
loss recovery continues. This idea applies to recurring timeouts
as well. While F-RTO sends different data during timeout recovery,
it does not (and should not) change the congestion control.
The implementaion follows the three steps of SACK enhanced algorithm
(section 3) in RFC5682. Step 1 is in tcp_enter_loss(). Step 2 and
3 are in tcp_process_loss(). The basic version is not supported
because SACK enhanced version also works for non-SACK connections.
The new implementation is functionally in parity with the old F-RTO
implementation except the one case where it increases undo events:
In addition to the RFC algorithm, a spurious timeout may be detected
without sending data in step 2, as long as the SACK confirms not
all the original data are dropped. When this happens, the sender
will undo the cwnd and perhaps enter fast recovery instead. This
additional check increases the F-RTO undo events by 5x compared
to the prior implementation on Google Web servers, since the sender
often does not have new data to send for HTTP.
Note F-RTO may detect spurious timeout before Eifel with timestamps
does so.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consolidate all of TCP CA_Loss state processing in
tcp_fastretrans_alert() into a new function called tcp_process_loss().
This is to prepare the new F-RTO implementation in the next patch.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch series refactor the F-RTO feature (RFC4138/5682).
This is to simplify the loss recovery processing. Existing F-RTO
was developed during the experimental stage (RFC4138) and has
many experimental features. It takes a separate code path from
the traditional timeout processing by overloading CA_Disorder
instead of using CA_Loss state. This complicates CA_Disorder state
handling because it's also used for handling dubious ACKs and undos.
While the algorithm in the RFC does not change the congestion control,
the implementation intercepts congestion control in various places
(e.g., frto_cwnd in tcp_ack()).
The new code implements newer F-RTO RFC5682 using CA_Loss processing
path. F-RTO becomes a small extension in the timeout processing
and interfaces with congestion control and Eifel undo modules.
It lets congestion control (module) determines how many to send
independently. F-RTO only chooses what to send in order to detect
spurious retranmission. If timeout is found spurious it invokes
existing Eifel undo algorithms like DSACK or TCP timestamp based
detection.
The first patch removes all F-RTO code except the sysctl_tcp_frto is
left for the new implementation. Since CA_EVENT_FRTO is removed, TCP
westwood now computes ssthresh on regular timeout CA_EVENT_LOSS event.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a minimal stand-alone user space helper, that allows for debugging or
verification of emitted BPF JIT images. This is in particular useful for
emitted opcode debugging, since minor bugs in the JIT compiler can be fatal.
The disassembler is architecture generic and uses libopcodes and libbfd.
How to get to the disassembly, example:
1) `echo 2 > /proc/sys/net/core/bpf_jit_enable`
2) Load a BPF filter (e.g. `tcpdump -p -n -s 0 -i eth1 host 192.168.20.0/24`)
3) Run e.g. `bpf_jit_disasm -o` to disassemble the most recent JIT code output
`bpf_jit_disasm -o` will display the related opcodes to a particular instruction
as well. Example for x86_64:
$ ./bpf_jit_disasm
94 bytes emitted from JIT compiler (pass:3, flen:9)
ffffffffa0356000 + <x>:
0: push %rbp
1: mov %rsp,%rbp
4: sub $0x60,%rsp
8: mov %rbx,-0x8(%rbp)
c: mov 0x68(%rdi),%r9d
10: sub 0x6c(%rdi),%r9d
14: mov 0xe0(%rdi),%r8
1b: mov $0xc,%esi
20: callq 0xffffffffe0d01b71
25: cmp $0x86dd,%eax
2a: jne 0x000000000000003d
2c: mov $0x14,%esi
31: callq 0xffffffffe0d01b8d
36: cmp $0x6,%eax
[...]
5c: leaveq
5d: retq
$ ./bpf_jit_disasm -o
94 bytes emitted from JIT compiler (pass:3, flen:9)
ffffffffa0356000 + <x>:
0: push %rbp
55
1: mov %rsp,%rbp
48 89 e5
4: sub $0x60,%rsp
48 83 ec 60
8: mov %rbx,-0x8(%rbp)
48 89 5d f8
c: mov 0x68(%rdi),%r9d
44 8b 4f 68
10: sub 0x6c(%rdi),%r9d
44 2b 4f 6c
[...]
5c: leaveq
c9
5d: retq
c3
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
This is a big pull request for new features intended for the 3.10
stream...
Regarding mac80211, Johannes says:
"First, I merged mac80211/master to avoid some conflicts. This brings in
a bunch of fixes you're already familiar with. For real -next material,
I have a whole bunch of minstrel work, minstrel_ht from Felix and legacy
minstrel from Thomas (Huehn). The other Thomas (Pedersen) did a number
of changes in mesh to allow userspace peering management even when the
mesh isn't secured. Stanislaw changes suspend/resume to always
disconnect the networks. This is typically already done by
network-manager so won't make a huge difference for most users, but
fixes a number problems, particularly with USB drivers that can easily
disconnect while suspended. Ilan has a small change to allow mac80211
drivers to differentiate remain-on-channel reasons, and Jouni extends
nl80211 to allow fast roaming with full-MAC devices. I have a fairly
large number of patches as well, many of them fairly simple cleanups,
but also allowing split wiphy dumps and adding back the full wiphy
information in nl80211, station entry change checking and more VHT work
including VHT capability overrides (mostly for testing purposes)."
And for iwlwifi, Johannes says:
"Here, I also merged iwlwifi-fixes to avoid conflicts, and otherwise have
various cleanups and improvements on the MVM driver, along with a few
throughout the driver. Other than Bluetooth Coexistence from Emmanuel
there's no over-arching theme, so listing them would pretty much
reproduce the shortlog."
Regarding NFC, Samuel says:
"The 2 features we have with this one are:
- An LLCP Service Name Lookup (SNL) netlink interface for querying LLCP
service availability from user space.
Along the way, Thierry also improved the existing SNL interface for
aggregating SNL responses.
- An initial LLCP socket options implementation, for setting the Receive
Window (RW) and the Maximum Information Unit Extension (MIUX) per socket.
This is need for the LLCP validation tests.
We also have a microread MEI build failure here: I am not sending this one to
3.9 because the MEI bus code is not there yet, so it won't break for anyone
else than me."
And for ath6kl, Kalle says:
"I added tracing support to ath6kl, along with a new Kconfig option. Now
there's also a workaround to reset USB devices when the firmware upload
fails, this happened when host was warm rebooted. There are also quite a
few small fixes or cleanup."
On top of all that, there is the usual bundle of driver updates
with new features, new hardware support and the like mixed-in.
The ath9k, b43, brcmfmac, mwifiex, rt2800, and wil6210 drivers
are all well-represented, and a few other drivers are hit as well.
I also pulled-in the wireless fixes tree in order to resolve some
pending merge conflicts.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Use netdev_alloc_sk_ip_align in the case where packet is copied.
This handles case where NET_IP_ALIGN == 0 as well as adding required header
padding.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
I present to you another batch of fixes intended for the 3.9 stream...
On the bluetooth bits, Gustavo says:
"I put together 3 fixes intended for 3.9, there are support for two
new devices and a NULL dereference fix in the SCO code."
Amitkumar Karwar fixes a command queueing race in mwifiex.
Bing Zhao provides a pair of mwifiex related to cleaning-up before
a shutdown.
Felix Fietkau provides an ath9k fix for a regression caused by an
earlier calibration fix, and another ath9k fix to avoid race conditions
that unnecessarily lead to chip resets.
Jussi Kivilinna prevents and skbuff leak in rtlwifi.
Stanislaw Gruszka corrects a length paramater for a DMA buffer mapping
operation in iwlegacy.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Drivers should reserve some headroom in skb used in receive path,
to avoid future head reallocation.
One possible way to do that is to use dev_alloc_skb() instead
of alloc_skb(), so that NET_SKB_PAD bytes are reserved.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, if you did an "ifconfig down" or similar on one core, and
the kernel had CONFIG_XFRM enabled, every core would be interrupted to
check its percpu flow list for items that could be garbage collected.
With this change, we generate a mask of cores that actually have any
percpu items, and only interrupt those cores. When we are trying to
isolate a set of cpus from interrupts, this is important to do.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Revised bnx2x implementation of PCI Express Advanced Error Recovery -
stop and free driver resources according to the AER flow (instead of the
currently implemented `hope-for-the-best' release approach), and do not make
any assumptions on the HW state after slot reset.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fec_poll_controller() was not declared. It should be static.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
emac_poll_controller() was not declared. It should be static.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
module_platform_driver macro removes some boilerplate and
simplifies the code.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Acked-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
module_platform_driver macro removes some boilerplate and
simplifies the code.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
module_platform_driver macro removes some boilerplate and
simplifies the code.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
module_platform_driver macro removes some boilerplate and
simplifies the code.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
module_platform_driver macro removes some boilerplate and
simplifies the code.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Process connector can now also detect coredumping events.
Main aim of patch is get notified at start of coredumping, instead of
having to wait for it to finish and then being notified through EXIT
event.
Could be used for instance by process-managers that want to get
notified as soon as possible about process failures, and not
necessarily beeing notified after coredump, which could be in the
order of minutes depending on size of coredump, piping and so on.
Signed-off-by: Jesper Derehag <jderehag@hotmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only place where gfar_configure_coalescing is called
with an actual bitmask (other than 0xff) is in gfar_poll
(on the hot path). So make gfar_configure_coalescing()
static for the buffer processing path, and export
gfar_configure_coalescing_all() for the remaining cases
that require to set coalescing for all the queues at once
(on the slow path).
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For Multi Q Multi Group (MQ_MG_MODE) mode, the Rx/Tx colescing registers [rt]xic
are aliased with the [rt]xic0 registers (coalescing setting regs for Q0). This
avoids programming twice in a row the coalescing registers for the Rx/Tx hw Q0.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split the napi budget fairly among the active queues only, instead
of dividing it by the total number of Rx queues assigned to the
given interrupt group.
Use the h/w indication field RXFi in rstat (receive status register)
to identify the active rx queues from the current interrupt group
(i.e. receive event occured on ring i, if ring i is part of the current
interrupt group). This indication field in rstat, RXFi i=0..7,
allows us to find out on which queues of the same interrupt group
do we have incomming traffic once we entered the polling routine for
the given interrupt group. After servicing the ring i, the corresponding
bit RXFi will be written with 1 to clear the active queue indication for
that ring.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are 2 issues with the current napi poll routine, with regards
to tx ring cleanup:
1) for multi-queue devices (MQ_MG_MODE), should tx_bit_map != rx_bit_map,
which is possible (and supported in h/w) if the DT property "fsl,tx-bit-map"
holds a different value than rx_bit_map, the current polling routine will
service the wrong Tx queues in this case (i.e. the interrupt group will
receive interrupts from tx queues that it will not service)
2) Tx cleanup completion consumes napi budget, whereas the napi budget
should be reserved for Rx work only.
The patch fixes these issues and provides a clean napi polling routine.
Napi poll completion is reached when all the Rx queues have been
serviced and there is no Tx work to do.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is very useful to do dynamic truncation of packets. In particular,
we're interested to push the necessary header bytes to the user space and
cut off user payload that should probably not be transferred for some reasons
(e.g. privacy, speed, or others). With the ancillary extension PAY_OFFSET,
we can load it into the accumulator, and return it. E.g. in bpfc syntax ...
ld #poff ; { 0x20, 0, 0, 0xfffff034 },
ret a ; { 0x16, 0, 0, 0x00000000 },
... as a filter will accomplish this without having to do a big hackery in
a BPF filter itself. Follow-up JIT implementations are welcome.
Thanks to Eric Dumazet for suggesting and discussing this during the
Netfilter Workshop in Copenhagen.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__skb_get_poff() returns the offset to the payload as far as it could
be dissected. The main user is currently BPF, so that we can dynamically
truncate packets without needing to push actual payload to the user
space and instead can analyze headers only.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>