The shared packet statistics are a potential source of slow down
on bridged traffic. Convert to per-cpu array, but only keep those
statistics which change per-packet.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without CONFIG_BRIDGE_IGMP_SNOOPING,
BR_INPUT_SKB_CB(skb)->mrouters_only is not appropriately
initialized, so we can see garbage.
A clear option to fix this is to set it even without that
config, but we cannot optimize out the branch.
Let's introduce a macro that returns value of mrouters_only
and let it return 0 without CONFIG_BRIDGE_IGMP_SNOOPING.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Michael Braun <michael-dev@fami-braun.de>
bridge: Fix br_forward crash in promiscuous mode
It's a linux-next kernel from 2010-03-12 on an x86 system and it
OOPs in the bridge module in br_pass_frame_up (called by
br_handle_frame_finish) because brdev cannot be dereferenced (its set to
a non-null value).
Adding some BUG_ON statements revealed that
BR_INPUT_SKB_CB(skb)->brdev == br-dev
(as set in br_handle_frame_finish first)
only holds until br_forward is called.
The next call to br_pass_frame_up then fails.
Digging deeper it seems that br_forward either frees the skb or passes
it to NF_HOOK which will in turn take care of freeing the skb. The
same is holds for br_pass_frame_ip. So it seems as if two independent
skb allocations are required. As far as I can see, commit
b33084be19 ("bridge: Avoid unnecessary
clone on forward path") removed skb duplication and so likely causes
this crash. This crash does not happen on 2.6.33.
I've therefore modified br_forward the same way br_flood has been
modified so that the skb is not freed if skb0 is going to be used
and I can confirm that the attached patch resolves the issue for me.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since all callers of br_mdb_ip_get need to check whether the
hash table is NULL, this patch moves the check into the function.
This fixes the two callers (query/leave handler) that didn't
check it.
Reported-by: Michael Braun <michael-dev@fami-braun.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (108 commits)
bridge: ensure to unlock in error path in br_multicast_query().
drivers/net/tulip/eeprom.c: fix bogus "(null)" in tulip init messages
sky2: Avoid rtnl_unlock without rtnl_lock
ipv6: Send netlink notification when DAD fails
drivers/net/tg3.c: change the field used with the TG3_FLAG_10_100_ONLY constant
ipconfig: Handle devices which take some time to come up.
mac80211: Fix memory leak in ieee80211_if_write()
mac80211: Fix (dynamic) power save entry
ipw2200: use kmalloc for large local variables
ath5k: read eeprom IQ calibration values correctly for G mode
ath5k: fix I/Q calibration (for real)
ath5k: fix TSF reset
ath5k: use fixed antenna for tx descriptors
libipw: split ieee->networks into small pieces
mac80211: Fix sta_mtx unlocking on insert STA failure path
rt2x00: remove KSEG1ADDR define from rt2x00soc.h
net: add ColdFire support to the smc91x driver
asix: fix setting mac address for AX88772
ipv6 ip6_tunnel: eliminate unused recursion field from ip6_tnl{}.
net: Fix dev_mc_add()
...
Constify struct sysfs_ops.
This is part of the ops structure constification
effort started by Arjan van de Ven et al.
Benefits of this constification:
* prevents modification of data that is shared
(referenced) by many other structure instances
at runtime
* detects/prevents accidental (but not intentional)
modification attempts on archs that enforce
read-only kernel data at runtime
* potentially better optimized code as the compiler
can assume that the const data cannot be changed
* the compiler/linker move const data into .rodata
and therefore exclude them from false sharing
Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Thanks to Paul McKenny for pointing out that it is incorrect to use
synchronize_rcu_bh to ensure that pending callbacks have completed.
Instead we should use rcu_barrier_bh.
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As Paul McKenney correctly pointed out, __br_mdb_ip_get needs
to use the RCU list walking primitive in order to work correctly
on platforms where data-dependency ordering is not guaranteed.
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
We dereference "port" on the lines immediately before and immediately
after the test so port should hopefully never be null here.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
br_multicast calls ip_send_check(), so it should depend on INET.
built-in:
br_multicast.c:(.text+0x88cf4): undefined reference to `ip_send_check'
or modular:
ERROR: "ip_send_check" [net/bridge/bridge.ko] undefined!
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following build error when IGMP_SNOOPING is not enabled.
In file included from net/bridge/br.c:24:
net/bridge/br_private.h: In function 'br_multicast_is_router':
net/bridge/br_private.h:361: error: 'struct net_bridge' has no member named 'multicast_router'
net/bridge/br_private.h:362: error: 'struct net_bridge' has no member named 'multicast_router'
net/bridge/br_private.h:363: error: 'struct net_bridge' has no member named 'multicast_router_timer'
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows the user to the IGMP parameters related to the
snooping function of the bridge. This includes various time
values and retransmission limits.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows the user to control the hash elasticity/max
parameters. The elasticity setting does not take effect until
the next new multicast group is added. At which point it is
checked and if after rehashing it still can't be satisfied then
snooping will be disabled.
The max setting on the other hand takes effect immediately. It
must be a power of two and cannot be set to a value less than the
current number of multicast group entries. This is the only way
to shrink the multicast hash.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows the user to disable IGMP snooping completely
through a sysfs toggle. It also allows the user to reenable
snooping when it has been automatically disabled due to hash
collisions. If the collisions have not been resolved however
the system will refuse to reenable snooping.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows the user to forcibly enable/disable ports as
having multicast routers attached. A port with a multicast router
will receive all multicast traffic.
The value 0 disables it completely. The default is 1 which lets
the system automatically detect the presence of routers (currently
this is limited to picking up queries), and 2 means that the port
will always receive all multicast traffic.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch finally hooks up the multicast snooping module to the
data path. In particular, all multicast packets passing through
the bridge are fed into the module and switched by it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch hooks up the bridge start/stop and add/delete/disable
port functions to the new multicast module.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds code to perform selective multicast forwarding.
We forward multicast traffic to a set of ports plus all multicast
router ports. In order to avoid duplications among these two
sets of ports, we order all ports by the numeric value of their
pointers. The two lists are then walked in lock-step to eliminate
duplicates.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the core functionality of IGMP snooping support
without actually hooking it up. So this patch should be a no-op
as far as the bridge's external behaviour is concerned.
All the new code and data is controlled by the Kconfig option
BRIDGE_IGMP_SNOOPING. A run-time toggle is also available.
The multicast switching is done using an hash table that is
lockless on the read-side through RCU. On the write-side the
new multicast_lock is used for all operations. The hash table
supports dynamic growth/rehashing.
The hash table will be rehashed if any chain length exceeds a
preset limit. If rehashing does not reduce the maximum chain
length then snooping will be disabled.
These features may be added in future (in no particular order):
* IGMPv3 source support
* Non-querier router detection
* IPv6
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the main loop body in br_flood into the function
may_deliver. The code that clones an skb and delivers it is moved
into the deliver_clone function.
This allows this to be reused by the future multicast forward
function.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
this patch makes BR_INPUT_SKB_CB available on the xmit path so
that we could avoid passing the br pointer around for the purpose
of collecting device statistics.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the packet is delivered to the local bridge device we may
end up cloning it unnecessarily if no bridge port can receive
the packet in br_flood.
This patch avoids this by moving the skb_clone into br_flood.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows tail-call on the call to br_pass_frame_up
in br_handle_frame_finish. This is now possible because of the
previous patch to call br_pass_frame_up last.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
At the moment we deliver to the local bridge port via the function
br_pass_frame_up before all other ports. There is no requirement
for this.
For the purpose of IGMP snooping, it would be more convenient if
we did the local port last. Therefore this patch rearranges the
bridge input processing so that the local bridge port gets to see
the packet last (if at all).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the required handlers to convert 32 bit
ebtables mark match and match target structs to 64bit layout.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
ebt_limit structure is larger on 64 bit systems due
to "long" type used in the (kernel-only) data section.
Setting .compatsize is enough in this case, these values
have no meaning in userspace.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
ebtables can be compiled to perform userspace-side padding of
structures. In that case, all the structures are already in the
'native' format expected by the kernel.
This tries to determine what format the userspace program is
using.
For most set/getsockopts, this can be done by checking
the len argument for sizeof(compat_ebt_replace) and
re-trying the native handler on error.
In case of EBT_SO_GET_ENTRIES, the native handler is tried first,
it will error out early when checking the *len argument
(the compat version has to defer this check until after
iterating over the kernel data set once, to adjust for all
the structure size differences).
As this would cause error printks, remove those as well, as
recommended by Bart de Schuymer.
Signed-off-by: Florian Westphal <fw@strlen.de>
Main code for 32 bit userland ebtables binary with 64 bit kernels
support.
Tested on x86_64 kernel only, using 64bit ebtables binary
for output comparision.
At least ebt_mark, m_mark and ebt_limit need CONFIG_COMPAT hooks, too.
remaining problem:
The ebtables userland makefile has:
ifeq ($(shell uname -m),sparc64)
CFLAGS+=-DEBT_MIN_ALIGN=8 -DKERNEL_64_USERSPACE_32
endif
struct ebt_replace, ebt_entry_match etc. then contain userland-side
padding, i.e. even if we are called from a 32 bit userland, the
structures may already be in the right format.
This problem is addressed in a follow-up patch.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
allows to call do_update_counters() from upcoming CONFIG_COMPAT
code instead of copy&pasting the same code.
Signed-off-by: Florian Westphal <fw@strlen.de>
once CONFIG_COMPAT support is added to ebtables, the new
copy_counters_to_user function can be called instead of duplicating
code.
Also remove last use of MEMPRINT, as requested by Bart De Schuymer.
Signed-off-by: Florian Westphal <fw@strlen.de>
once CONFIG_COMPAT support is merged this allows
to call do_replace_finish() after doing the CONFIG_COMPAT conversion
instead of copy & pasting this.
Signed-off-by: Florian Westphal <fw@strlen.de>
This will cause trouble once CONFIG_COMPAT support is added to ebtables.
xt_compat_*_offset() calculate the kernel/userland structure size delta
using:
XT_ALIGN(size) - COMPAT_XT_ALIGN(size)
If the match/target sizes are aligned at registration time,
delta is always zero.
Should have zero effect for existing systems: xtables uses
XT_ALIGN() whenever it deals with match/target sizes.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
next_offset must be > 0, otherwise this loops forever.
The offset also contains the size of the ebt_entry structure
itself, so anything smaller is invalid.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch removes the unused age_list member from the net_bridge
structure.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ->net to match destructor list like ->net in constructor list.
Make sure it's set in ebtables/iptables/ip6tables, this requires to
propagate netns up to *_unregister_table().
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Some complex match modules (like xt_hashlimit/xt_recent) want netns
information at constructor and destructor time. We propably can play
games at match destruction time, because netns can be passed in object,
but I think it's cleaner to explicitly pass netns.
Add ->net, make sure it's set from ebtables/iptables/ip6tables code.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
__net_init/__net_exit are apparently not going away, so use them
to full extent.
In some cases __net_init was removed, because it was called from
__net_exit code.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
normal users are currently allowed to set/modify ebtables rules.
Restrict it to processes with CAP_NET_ADMIN.
Note that this cannot be reproduced with unmodified ebtables binary
because it uses SOCK_RAW.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
mac80211: fix reorder buffer release
iwmc3200wifi: Enable wimax core through module parameter
iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
iwmc3200wifi: Coex table command does not expect a response
iwmc3200wifi: Update wiwi priority table
iwlwifi: driver version track kernel version
iwlwifi: indicate uCode type when fail dump error/event log
iwl3945: remove duplicated event logging code
b43: fix two warnings
ipw2100: fix rebooting hang with driver loaded
cfg80211: indent regulatory messages with spaces
iwmc3200wifi: fix NULL pointer dereference in pmkid update
mac80211: Fix TX status reporting for injected data frames
ath9k: enable 2GHz band only if the device supports it
airo: Fix integer overflow warning
rt2x00: Fix padding bug on L2PAD devices.
WE: Fix set events not propagated
b43legacy: avoid PPC fault during resume
b43: avoid PPC fault during resume
tcp: fix a timewait refcnt race
...
Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
CTL_UNNUMBERED removed) in
kernel/sysctl_check.c
net/ipv4/sysctl_net_ipv4.c
net/ipv6/addrconf.c
net/sctp/sysctl.c
Not including net/atm/
Compiled tested x86 allyesconfig only
Added a > 80 column line or two, which I ignored.
Existing checkpatch plaints willfully, cheerfully ignored.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>