Conflicts:
drivers/net/ethernet/rocker/rocker.c
The rocker commit was two overlapping changes, one to rename
the ->vport member to ->pport, and another making the bitmask
expression use '1ULL' instead of plain '1'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
A small batch with accumulated updates in nf-next, mostly IPVS updates,
they are:
1) Add 64-bits stats counters to IPVS, from Julian Anastasov.
2) Move NETFILTER_XT_MATCH_ADDRTYPE out of NETFILTER_ADVANCED as docker
seem to require this, from Anton Blanchard.
3) Use boolean instead of numeric value in set_match_v*(), from
coccinelle via Fengguang Wu.
4) Allows rescheduling of new connections in IPVS when port reuse is
detected, from Marcelo Ricardo Leitner.
5) Add missing bits to support arptables extensions from nft_compat,
from Arturo Borrero.
Patrick is preparing a large batch to enhance the set infrastructure,
named expressions among other things, that should follow up soon after
this batch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support to arptables extensions from nft_compat.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Currently, all real users of rhashtable default their grow and shrink
decision functions to rht_grow_above_75() and rht_shrink_below_30(),
so that there's currently no need to have this explicitly selectable.
It can/should be generic and private inside rhashtable until a real
use case pops up. Since we can make this private, we'll save us this
additional indirection layer and can improve insertion/deletion time
as well.
Reference: http://patchwork.ozlabs.org/patch/443040/
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, when TCP/SCTP port reusing happens, IPVS will find the old
entry and use it for the new one, behaving like a forced persistence.
But if you consider a cluster with a heavy load of small connections,
such reuse will happen often and may lead to a not optimal load
balancing and might prevent a new node from getting a fair load.
This patch introduces a new sysctl, conn_reuse_mode, that allows
controlling how to proceed when port reuse is detected. The default
value will allow rescheduling of new connections only if the old entry
was in TIME_WAIT state for TCP or CLOSED for SCTP.
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Although it is clear that textsearch state is intentionally passed to
skb_find_text() as uninitialized argument, it was never used by the
callers. Therefore, we can simplify skb_find_text() by making it
local variable.
Signed-off-by: Bojan Prtvar <prtvar.b@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains updates for your net tree, they are:
1) Fix removal of destination in IPVS when the new mixed family support
is used, from Alexey Andriyanov via Simon Horman.
2) Fix module refcount undeflow in nft_compat when reusing a match /
target.
3) Fix iptables-restore when the recent match is used with a new hitcount
that exceeds threshold, from Florian Westphal.
4) Fix stack corruption in xt_socket due to using stack storage to save
the inner IPv6 header, from Eric Dumazet.
I'll follow up soon with another batch with more fixes that are still
cooking.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
As soon as extract_icmp6_fields() returns, its local storage (automatic
variables) is deallocated and can be overwritten.
Lets add an additional parameter to make sure storage is valid long
enough.
While we are at it, adds some const qualifiers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: b64c9256a9 ("tproxy: added IPv6 support to the socket match")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
given:
-A INPUT -m recent --update --seconds 30 --hitcount 4
and
iptables-save > foo
then
iptables-restore < foo
will fail with:
kernel: xt_recent: hitcount (4) is larger than packets to be remembered (4) for table DEFAULT
Even when the check is fixed, the restore won't work if the hitcount is
increased to e.g. 6, since by the time checkentry runs it will find the
'old' incarnation of the table.
We can avoid this by increasing the maximum threshold silently; we only
have to rm all the current entries of the table (these entries would
not have enough room to handle the increased hitcount).
This even makes (not-very-useful)
-A INPUT -m recent --update --seconds 30 --hitcount 4
-A INPUT -m recent --update --seconds 30 --hitcount 42
work.
Fixes: abc86d0f99 (netfilter: xt_recent: relax ip_pkt_list_tot restrictions)
Tracked-down-by: Chris Vine <chris@cvine.freeserve.co.uk>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains two small Netfilter updates for your
net-next tree, they are:
1) Add ebtables support to nft_compat, from Arturo Borrero.
2) Fix missing validation of the SET_ID attribute in the lookup
expressions, from Patrick McHardy.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net/netfilter/xt_set.c:196:9-10: WARNING: return of 0/1 in function 'set_match_v3' with return type bool
net/netfilter/xt_set.c:242:9-10: WARNING: return of 0/1 in function 'set_match_v4' with return type bool
Return statements in functions returning bool should use
true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci
CC: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Docker needs NETFILTER_XT_MATCH_ADDRTYPE, so move it out from behind
NETFILTER_ADVANCED and make it default to a module.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
IPVS stats are limited to 2^(32-10) conns/s and packets/s,
2^(32-5) bytes/s. It is time to use 64 bits:
* Change all conn/packet kernel counters to 64-bit and update
them in u64_stats_update_{begin,end} section
* In kernel use struct ip_vs_kstats instead of the user-space
struct ip_vs_stats_user and use new func ip_vs_export_stats_user
to export it to sockopt users to preserve compatibility with
32-bit values
* Rename cpu counters "ustats" to "cnt"
* To netlink users provide additionally 64-bit stats:
IPVS_SVC_ATTR_STATS64 and IPVS_DEST_ATTR_STATS64. Old stats
remain for old binaries.
* We can use ip_vs_copy_stats in ip_vs_stats_percpu_show
Thanks to Chris Caputo for providing initial patch for ip_vs_est.c
Signed-off-by: Chris Caputo <ccaputo@alt.net>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
The current code prevents any operation with a mixed-family dest
unless IP_VS_CONN_F_TUNNEL flag is set. The problem is that it's impossible
for the client to follow this rule, because ip_vs_genl_parse_dest does
not even read the destination conn_flags when cmd = IPVS_CMD_DEL_DEST
(need_full_dest = 0).
Also, not every client can pass this flag when removing a dest. ipvsadm,
for example, does not support the "-i" command line option together with
the "-d" option.
This change disables any checks for mixed-family on IPVS_CMD_DEL_DEST command.
Signed-off-by: Alexey Andriyanov <alan@al-an.info>
Fixes: bc18d37f67 ("ipvs: Allow heterogeneous pools now that we support them")
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Conflicts:
drivers/net/vxlan.c
drivers/vhost/net.c
include/linux/if_vlan.h
net/core/dev.c
The net/core/dev.c conflict was the overlap of one commit marking an
existing function static whilst another was adding a new function.
In the include/linux/if_vlan.h case, the type used for a local
variable was changed in 'net', whereas the function got rewritten
to fix a stacked vlan bug in 'net-next'.
In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
overlapped with an endainness fix for VHOST 1.0 in 'net'.
In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
in 'net-next' whereas in 'net' there was a bug fix to pass in the
correct network namespace pointer in calls to this function.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch gets rid of the manual rhashtable walk in nft_hash
which touches rhashtable internals that should not be exposed.
It does so by using the rhashtable iterator primitives.
Note that I'm leaving nft_hash_destroy alone since it's only
invoked on shutdown and it shouldn't be affected by changes
to rhashtable internals (or at least not what I'm planning to
change).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch extends nft_compat to support ebtables extensions.
ebtables verdict codes are translated to the ones used by the nf_tables engine,
so we can properly use ebtables target extensions from nft_compat.
This patch extends previous work by Giuseppe Longo <giuseppelng@gmail.com>.
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
commit f5a41847ac ("ipvs: move ip_route_me_harder for ICMP")
from 2.6.37 introduced ip_route_me_harder() call for responses to
local clients, so that we can provide valid rt_src after SNAT.
It was used by TCP to provide valid daddr for ip_send_reply().
After commit 0a5ebb8000 ("ipv4: Pass explicit daddr arg to
ip_send_reply()." from 3.0 this rerouting is not needed anymore
and should be avoided, especially in LOCAL_IN.
Fixes 3.12.33 crash in xfrm reported by Florian Wiessner:
"3.12.33 - BUG xfrm_selector_match+0x25/0x2f6"
Reported-by: Smart Weblications GmbH - Florian Wiessner <f.wiessner@smart-weblications.de>
Tested-by: Smart Weblications GmbH - Florian Wiessner <f.wiessner@smart-weblications.de>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
The user can crash the kernel if it uses any of the existing NAT
expressions from the wrong hook, so add some code to validate this
when loading the rule.
This patch introduces nft_chain_validate_hooks() which is based on
an existing function in the bridge version of the reject expression.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.
This makes the very common pattern of
if (genlmsg_end(...) < 0) { ... }
be a whole bunch of dead code. Many places also simply do
return nlmsg_end(...);
and the caller is expected to deal with it.
This also commonly (at least for me) causes errors, because it is very
common to write
if (my_function(...))
/* error condition */
and if my_function() does "return nlmsg_end()" this is of course wrong.
Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb->len there.
Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did
- return nlmsg_end(...);
+ nlmsg_end(...);
+ return 0;
I could have preserved all the function's return values by returning
skb->len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with <= 0 in dump functionality, but that could just
be changed to < 0 with no change in behaviour, so I opted for the more
efficient version.
One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for <0 or <=0 and thus broke out of the loop every single time.
I've preserved this since it will (I think) have caused the messages to
userspace to be formatted differently with just a single message for
every SKB returned to userspace. It's possible that this isn't needed
for the tools that actually use this, but I don't even know what they
are so couldn't test that changing this behaviour would be acceptable.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
netfilter updates for net-next
The following patchset contains netfilter updates for net-next, just a
bunch of cleanups and small enhancement to selectively flush conntracks
in ctnetlink, more specifically the patches are:
1) Rise default number of buckets in conntrack from 16384 to 65536 in
systems with >= 4GBytes, patch from Marcelo Leitner.
2) Small refactor to save one level on indentation in xt_osf, from
Joe Perches.
3) Remove unnecessary sizeof(char) in nf_log, from Fabian Frederick.
4) Another small cleanup to remove redundant variable in nfnetlink,
from Duan Jiong.
5) Fix compilation warning in nfnetlink_cthelper on parisc, from
Chen Gang.
6) Fix wrong format in debugging for ctseqadj, from Gao feng.
7) Selective conntrack flushing through the mark for ctnetlink, patch
from Kristian Evensen.
8) Remove nf_ct_conntrack_flush_report() exported symbol now that is
not required anymore after the selective flushing patch, again from
Kristian.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/xen-netfront.c
Minor overlapping changes in xen-netfront.c, mostly to do
with some buffer management changes alongside the split
of stats into TX and RX.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
netfilter/ipvs fixes for net
The following patchset contains netfilter/ipvs fixes, they are:
1) Small fix for the FTP helper in IPVS, a diff variable may be left
unset when CONFIG_IP_VS_IPV6 is set. Patch from Dan Carpenter.
2) Fix nf_tables port NAT in little endian archs, patch from leroy
christophe.
3) Fix race condition between conntrack confirmation and flush from
userspace. This is the second reincarnation to resolve this problem.
4) Make sure inner messages in the batch come with the nfnetlink header.
5) Relax strict check from nfnetlink_bind() that may break old userspace
applications using all 1s group mask.
6) Schedule removal of chains once no sets and rules refer to them in
the new nf_tables ruleset flush command. Reported by Asbjoern Sloth
Toennesen.
Note that this batch comes later than usual because of the short
winter holidays.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The only user of nf_ct_conntrack_flush_report() was ctnetlink_del_conntrack().
After adding support for flushing connections with a given mark, this function
is no longer called.
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch adds support for selective flushing of conntrack mappings.
By adding CTA_MARK and CTA_MARK_MASK to a delete-message, the mark (and
mask) is checked before a connection is deleted while flushing.
Configuring the flush is moved out of ctnetlink_del_conntrack(), and
instead of calling nf_conntrack_flush_report(), we always call
nf_ct_iterate_cleanup(). This enables us to only make one call from the
new ctnetlink_flush_conntrack() and makes it easy to add more filter
parameters.
Filtering is done in the ctnetlink_filter_match()-function, which is
also called from ctnetlink_dump_table(). ctnetlink_dump_filter has been
renamed ctnetlink_filter, to indicated that it is no longer only used
when dumping conntrack entries.
Moreover, reject mark filters with -EOPNOTSUPP if no ct mark support is
available.
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Relax the checking that was introduced in 97840cb ("netfilter:
nfnetlink: fix insufficient validation in nfnetlink_bind") when the
subscription bitmask is used. Existing userspace code code may request
to listen to all of the existing netlink groups by setting an all to one
subscription group bitmask. Netlink already validates subscription via
setsockopt() for us.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Make sure there is enough room for the nfnetlink header in the
netlink messages that are part of the batch. There is a similar
check in netlink_rcv_skb().
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit 5195c14c8b ("netfilter: conntrack: fix race in
__nf_conntrack_confirm against get_next_corpse") aimed to resolve the
race condition between the confirmation (packet path) and the flush
command (from control plane). However, it introduced a crash when
several packets race to add a new conntrack, which seems easier to
reproduce when nf_queue is in place.
Fix this race, in __nf_conntrack_confirm(), by removing the CT
from unconfirmed list before checking the DYING bit. In case
race occured, re-add the CT to the dying list
This patch also changes the verdict from NF_ACCEPT to NF_DROP when
we lose race. Basically, the confirmation happens for the first packet
that we see in a flow. If you just invoked conntrack -F once (which
should be the common case), then this is likely to be the first packet
of the flow (unless you already called flush anytime soon in the past).
This should be hard to trigger, but better drop this packet, otherwise
we leave things in inconsistent state since the destination will likely
reply to this packet, but it will find no conntrack, unless the origin
retransmits.
The change of the verdict has been discussed in:
https://www.marc.info/?l=linux-netdev&m=141588039530056&w=2
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
new_start_seq and new_end_seq are network byte order,
print the host byte order in debug message and print
seq number as the type of unsigned int.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@soleta.eu>
The related code can be simplified, and also can avoid related warnings
(with allmodconfig under parisc):
CC [M] net/netfilter/nfnetlink_cthelper.o
net/netfilter/nfnetlink_cthelper.c: In function ‘nfnl_cthelper_from_nlattr’:
net/netfilter/nfnetlink_cthelper.c:97:9: warning: passing argument 1 o ‘memcpy’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-array-qualifiers]
memcpy(&help->data, nla_data(attr), help->helper->data_len);
^
In file included from include/linux/string.h:17:0,
from include/uapi/linux/uuid.h:25,
from include/linux/uuid.h:23,
from include/linux/mod_devicetable.h:12,
from ./arch/parisc/include/asm/hardware.h:4,
from ./arch/parisc/include/asm/processor.h:15,
from ./arch/parisc/include/asm/spinlock.h:6,
from ./arch/parisc/include/asm/atomic.h:21,
from include/linux/atomic.h:4,
from ./arch/parisc/include/asm/bitops.h:12,
from include/linux/bitops.h:36,
from include/linux/kernel.h:10,
from include/linux/list.h:8,
from include/linux/module.h:9,
from net/netfilter/nfnetlink_cthelper.c:11:
./arch/parisc/include/asm/string.h:8:8: note: expected ‘void *’ but argument is of type ‘const char (*)[]’
void * memcpy(void * dest,const void *src,size_t count);
^
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@soleta.eu>
Actually after netlink_skb_clone() is called, the nskb and
skb will point to the same thing, but they are used just like
they are different, sometimes this is confusing, so i think
there is no necessary to keep nskb anymore.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@soleta.eu>
Introduces an array of spinlocks to protect bucket mutations. The number
of spinlocks per CPU is configurable and selected based on the hash of
the bucket. This allows for parallel insertions and removals of entries
which do not share a lock.
The patch also defers expansion and shrinking to a worker queue which
allows insertion and removal from atomic context. Insertions and
deletions may occur in parallel to it and are only held up briefly
while the particular bucket is linked or unzipped.
Mutations of the bucket table pointer is protected by a new mutex, read
access is RCU protected.
In the event of an expansion or shrinking, the new bucket table allocated
is exposed as a so called future table as soon as the resize process
starts. Lookups, deletions, and insertions will briefly use both tables.
The future table becomes the main table after an RCU grace period and
initial linking of the old to the new table was performed. Optimization
of the chains to make use of the new number of buckets follows only the
new table is in use.
The side effect of this is that during that RCU grace period, a bucket
traversal using any rht_for_each() variant on the main table will not see
any insertions performed during the RCU grace period which would at that
point land in the future table. The lookup will see them as it searches
both tables if needed.
Having multiple insertions and removals occur in parallel requires nelems
to become an atomic counter.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The removal function of nft_hash currently stores a reference to the
previous element during lookup which is used to optimize removal later
on. This was possible because a lock is held throughout calling
rhashtable_lookup() and rhashtable_remove().
With the introdution of deferred table resizing in parallel to lookups
and insertions, the nftables lock will no longer synchronize all
table mutations and the stored pprev may become invalid.
Removing this optimization makes removal slightly more expensive on
average but allows taking the resize cost out of the insert and
remove path.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is in preparation to introduce per bucket spinlocks. It
extends all iterator macros to take the bucket table and bucket
index. It also introduces a new rht_dereference_bucket() to
handle protected accesses to buckets.
It introduces a barrier() to the RCU iterators to the prevent
the compiler from caching the first element.
The lockdep verifier is introduced as stub which always succeeds
and properly implement in the next patch when the locks are
introduced.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hash the key inside of rhashtable_lookup_compare() like
rhashtable_lookup() does. This allows to simplify the hashing
functions and keep them private.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Netlink families can exist in multiple namespaces, and for the most
part multicast subscriptions are per network namespace. Thus it only
makes sense to have bind/unbind notifications per network namespace.
To achieve this, pass the network namespace of a given client socket
to the bind/unbind functions.
Also do this in generic netlink, and there also make sure that any
bind for multicast groups that only exist in init_net is rejected.
This isn't really a problem if it is accepted since a client in a
different namespace will never receive any notifications from such
a group, but it can confuse the family if not rejected (it's also
possible to silently (without telling the family) accept it, but it
would also have to be ignored on unbind so families that take any
kind of action on bind/unbind won't do unnecessary work for invalid
clients like that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure this fetches 16-bits port data from the register.
Remove casting to make sparse happy, not needed anymore.
Signed-off-by: leroy christophe <christophe.leroy@c-s.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
sizeof(char) is always 1.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Invert logic in test to use continue.
This routine already uses continue, use it a bit more to
minimize > 80 column long lines and unnecessary indentation.
No change in compiled object file.
Other miscellanea:
o Remove trailing whitespace
o Realign arguments to multiline statement
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Manually bumping either nf_conntrack_buckets or nf_conntrack_max has
become a common task as our Linux servers tend to serve more and more
clients/applications, so let's adjust nf_conntrack_buckets this to a
more updated value.
Now for systems with more than 4GB of memory, nf_conntrack_buckets
becomes 65536 instead of 16384, resulting in nf_conntrack_max=256k
entries.
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Simon Horman says:
====================
Second round of IPVS Updates for v3.19
please consider these IPVS updates for v3.19 or alternatively v3.20.
The single patch in this series fixes a long standing bug that
has not caused any trouble and thus is not being prioritised as a fix.
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull networking updates from David Miller:
1) New offloading infrastructure and example 'rocker' driver for
offloading of switching and routing to hardware.
This work was done by a large group of dedicated individuals, not
limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu
2) Start making the networking operate on IOV iterators instead of
modifying iov objects in-situ during transfers. Thanks to Al Viro
and Herbert Xu.
3) A set of new netlink interfaces for the TIPC stack, from Richard
Alpe.
4) Remove unnecessary looping during ipv6 routing lookups, from Martin
KaFai Lau.
5) Add PAUSE frame generation support to gianfar driver, from Matei
Pavaluca.
6) Allow for larger reordering levels in TCP, which are easily
achievable in the real world right now, from Eric Dumazet.
7) Add a variable of napi_schedule that doesn't need to disable cpu
interrupts, from Eric Dumazet.
8) Use a doubly linked list to optimize neigh_parms_release(), from
Nicolas Dichtel.
9) Various enhancements to the kernel BPF verifier, and allow eBPF
programs to actually be attached to sockets. From Alexei
Starovoitov.
10) Support TSO/LSO in sunvnet driver, from David L Stevens.
11) Allow controlling ECN usage via routing metrics, from Florian
Westphal.
12) Remote checksum offload, from Tom Herbert.
13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
driver, from Thomas Lendacky.
14) Add MPLS support to openvswitch, from Simon Horman.
15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
Klassert.
16) Do gro flushes on a per-device basis using a timer, from Eric
Dumazet. This tries to resolve the conflicting goals between the
desired handling of bulk vs. RPC-like traffic.
17) Allow userspace to ask for the CPU upon what a packet was
received/steered, via SO_INCOMING_CPU. From Eric Dumazet.
18) Limit GSO packets to half the current congestion window, from Eric
Dumazet.
19) Add a generic helper so that all drivers set their RSS keys in a
consistent way, from Eric Dumazet.
20) Add xmit_more support to enic driver, from Govindarajulu
Varadarajan.
21) Add VLAN packet scheduler action, from Jiri Pirko.
22) Support configurable RSS hash functions via ethtool, from Eyal
Perry.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
Fix race condition between vxlan_sock_add and vxlan_sock_release
net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
net/mlx4: Add support for A0 steering
net/mlx4: Refactor QUERY_PORT
net/mlx4_core: Add explicit error message when rule doesn't meet configuration
net/mlx4: Add A0 hybrid steering
net/mlx4: Add mlx4_bitmap zone allocator
net/mlx4: Add a check if there are too many reserved QPs
net/mlx4: Change QP allocation scheme
net/mlx4_core: Use tasklet for user-space CQ completion events
net/mlx4_core: Mask out host side virtualization features for guests
net/mlx4_en: Set csum level for encapsulated packets
be2net: Export tunnel offloads only when a VxLAN tunnel is created
gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
net: fec: only enable mdio interrupt before phy device link up
net: fec: clear all interrupt events to support i.MX6SX
net: fec: reset fep link status in suspend function
net: sock: fix access via invalid file descriptor
net: introduce helper macro for_each_cmsghdr
...
Pull VFS changes from Al Viro:
"First pile out of several (there _definitely_ will be more). Stuff in
this one:
- unification of d_splice_alias()/d_materialize_unique()
- iov_iter rewrite
- killing a bunch of ->f_path.dentry users (and f_dentry macro).
Getting that completed will make life much simpler for
unionmount/overlayfs, since then we'll be able to limit the places
sensitive to file _dentry_ to reasonably few. Which allows to have
file_inode(file) pointing to inode in a covered layer, with dentry
pointing to (negative) dentry in union one.
Still not complete, but much closer now.
- crapectomy in lustre (dead code removal, mostly)
- "let's make seq_printf return nothing" preparations
- assorted cleanups and fixes
There _definitely_ will be more piles"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
copy_from_iter_nocache()
new helper: iov_iter_kvec()
csum_and_copy_..._iter()
iov_iter.c: handle ITER_KVEC directly
iov_iter.c: convert copy_to_iter() to iterate_and_advance
iov_iter.c: convert copy_from_iter() to iterate_and_advance
iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
iov_iter.c: convert iov_iter_zero() to iterate_and_advance
iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
iov_iter.c: iterate_and_advance
iov_iter.c: macros for iterating over iov_iter
kill f_dentry macro
dcache: fix kmemcheck warning in switch_names
new helper: audit_file()
nfsd_vfs_write(): use file_inode()
ncpfs: use file_inode()
kill f_dentry uses
lockd: get rid of ->f_path.dentry->d_sb
...
The app_tcp_pkt_out() function expects "*diff" to be set and ends up
using uninitialized data if CONFIG_IP_VS_IPV6 is turned on.
The same issue is there in app_tcp_pkt_in(). Thanks to Julian Anastasov
for noticing that.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Since commit f886497212 ("ipv4: fix dst race in sk_dst_get()")
DST_NOCACHE dst_entries get freed by RCU. So there is no need to get a
reference on them when we are in rcu protected sections.
Cc: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>