When using nanosleep() in an userspace application we get a
ratelimit warning
NOHZ: local_softirq_pending 08
for 10 times.
This patch replaces netif_rx() with netif_rx_ni() which has
to be used from process/softirq context.
The process/softirq context will be called from fakelb driver.
See linux-kernel commit 481a819 for similar fix.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds empty br_mdb_init() and br_mdb_uninit() definitions in
br_private.h to avoid build failure when CONFIG_BRIDGE_IGMP_SNOOPING is not set.
These methods were moved from br_multicast.c to br_netlink.c by
commit 3ec8e9f085
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 63233159fd:
bridge: Do not unregister all PF_BRIDGE rtnl operations
introduced a bug where a removal of a single bridge from a
multi-bridge system would remove MDB netlink handlers.
The handlers should only be removed once all bridges are gone, but
since we don't keep track of the number of bridge interfaces, it's
simpler to do it when the bridge module is unloaded. To make it
consistent, move the registration code into module initialization
code path.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
The following batch contains Netfilter fixes for 3.8-rc1. They are
a mixture of old bugs that have passed unnoticed (I'll pass these to
stable) and more fresh ones from the previous merge window, they are:
* Fix for MAC address in 6in4 tunnels via NFLOG that results in ulogd
showing up wrong address, from Bob Hockney.
* Fix a comment in nf_conntrack_ipv6, from Florent Fourcot.
* Fix a leak an error path in ctnetlink while creating an expectation,
from Jesper Juhl.
* Fix missing ICMP time exceeded in the IPv6 defragmentation code, from
Haibo Xi.
* Fix inconsistent handling of routing changes in MASQUERADE for the
new connections case, from Andrew Collins.
* Fix a missing skb_reset_transport in ip[6]t_REJECT that leads to
crashes in the ixgbe driver (since it seems to access the transport
header with TSO enabled), from Mukund Jampala.
* Recover obsoleted NOTRACK target by including it into the CT and spot
a warning via printk about being obsoleted. Many people don't check the
scheduled to be removal file under Documentation, so we follow some
less agressive approach to kill this in a year or so. Spotted by Florian
Westphal, patch from myself.
* Fix race condition in xt_hashlimit that allows to create two or more
entries, from myself.
* Fix crash if the CT is used due to the recently added facilities to
consult the dying and unconfirmed conntrack lists, from myself.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ip6gre_xmit2() incorrectly sets transport header to inner payload
instead of GRE header. It seems copy-and-pasted from ipip.c.
Set transport header to gre header.
(In ipip case the transport header is the inner ip header, so that's
correct.)
Found by inspection. In practice the incorrect transport header
doesn't matter because the skb usually is sent to another net_device
or socket, so the transport header isn't referenced.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipgre_tunnel_xmit() incorrectly sets transport header to inner payload
instead of GRE header. It seems copy-and-pasted from ipip.c.
So set transport header to gre header.
(In ipip case the transport header is the inner ip header, so that's
correct.)
Found by inspection. In practice the incorrect transport header
doesn't matter because the skb usually is sent to another net_device
or socket, so the transport header isn't referenced.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an else to only print the incompatible protocol message
when version hasn't been established.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0b088e00 ("RDS: Use page_remainder_alloc() for recv bufs")
added uses of sg_dma_len() and sg_dma_address(). This makes
RDS DOA with the qib driver.
IB ulps should use ib_sg_dma_len() and ib_sg_dma_address
respectively since some HCAs overload ib_sg_dma* operations.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 96e0bf4b51 (tcp: Discard segments that ack data not yet
sent) John Dykstra enforced a check against ack sequences.
In commit 354e4aa391 (tcp: RFC 5961 5.2 Blind Data Injection Attack
Mitigation) I added more safety tests.
But we missed fact that these tests are not performed if ACK bit is
not set.
RFC 793 3.9 mandates TCP should drop a frame without ACK flag set.
" fifth check the ACK field,
if the ACK bit is off drop the segment and return"
Not doing so permits an attacker to only guess an acceptable sequence
number, evading stronger checks.
Many thanks to Zhiyun Qian for bringing this issue to our attention.
See :
http://web.eecs.umich.edu/~zhiyunq/pub/ccs12_TCP_sequence_number_inference.pdf
Reported-by: Zhiyun Qian <zhiyunq@umich.edu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: John Dykstra <john.dykstra1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
batadv_iv_ogm_emit_send_time() attempts to calculates a random integer
in the range of 'orig_interval +- BATADV_JITTER' by the below lines.
msecs = atomic_read(&bat_priv->orig_interval) - BATADV_JITTER;
msecs += (random32() % 2 * BATADV_JITTER);
But it actually gets 'orig_interval' or 'orig_interval - BATADV_JITTER'
because '%' and '*' have same precedence and associativity is
left-to-right.
This adds the parentheses at the appropriate position so that it matches
original intension.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Antonio Quartulli <ordex@autistici.org>
Cc: Marek Lindner <lindner_marek@yahoo.de>
Cc: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Cc: Antonio Quartulli <ordex@autistici.org>
Cc: b.a.t.m.a.n@lists.open-mesh.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a leak in one of the error paths of
ctnetlink_create_expect if no helper and no timeout is specified.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
recent_net_exit() is called before recent_mt_destroy() in the
destroy path of network namespaces. Make sure there are no entries
in the parent proc entry xt_recent before removing it.
Signed-off-by: Vitaly E. Lavrov <lve@guap.ru>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
recent_net_exit() is called before recent_mt_destroy() in the
destroy path of network namespaces. Make sure there are no entries
in the parent proc entry xt_recent before removing it.
Signed-off-by: Vitaly E. Lavrov <lve@guap.ru>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Two packets may race to create the same entry in the hashtable,
double check if this packet lost race. This double checking only
happens in the path of the packet that creates the hashtable for
first time.
Note that, with this patch, no packet drops occur if the race happens.
Reported-by: Feng Gao <gfree.wind@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Sedat reported the following commit caused a regression:
commit 9650388b5c
Author: Eric Dumazet <edumazet@google.com>
Date: Fri Dec 21 07:32:10 2012 +0000
ipv4: arp: fix a lockdep splat in arp_solicit
This is due to the 6th parameter of arp_send() needs to be NULL
for the broadcast case, the above commit changed it to an all-zero
array by mistake.
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal reported that the removal of the NOTRACK target
(9655050 netfilter: remove xt_NOTRACK) is breaking some existing
setups.
That removal was scheduled for removal since long time ago as
described in Documentation/feature-removal-schedule.txt
What: xt_NOTRACK
Files: net/netfilter/xt_NOTRACK.c
When: April 2011
Why: Superseded by xt_CT
Still, people may have not notice / may have decided to stick to an
old iptables version. I agree with him in that some more conservative
approach by spotting some printk to warn users for some time is less
agressive.
Current iptables 1.4.16.3 already contains the aliasing support
that makes it point to the CT target, so upgrading would fix it.
Still, the policy so far has been to avoid pushing our users to
upgrade.
As a solution, this patch recovers the NOTRACK target inside the CT
target and it now spots a warning.
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Fixed integer overflow in function htb_dequeue
Signed-off-by: Stefan Hasko <hasko.stevo@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CONFIG_HOTPLUG is always enabled now, so remove the unused code that was
trying to be compiled out when this option was disabled, in the
networking core.
Cc: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When netdev_set_master faild in br_add_if, we should
call br_netpoll_disable to do some cleanup jobs,such
as free the memory of struct netpoll which allocated
in br_netpoll_enable.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using a seqlock for devnet_rename_seq is not a good idea,
as device_rename() can sleep.
As we hold RTNL, we dont need a protection for writers,
and only need a seqcount so that readers can catch a change done
by a writer.
Bug added in commit c91f6df2db (sockopt: Change getsockopt() of
SO_BINDTODEVICE to return an interface name)
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Once skb_realloc_headroom() is called, tiph might point to freed memory.
Cache tiph->ttl value before the reallocation, to avoid unexpected
behavior.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipgre_tunnel_xmit() parses network header as IP unconditionally.
But transmitting packets are not always IP packet. For example such packet
can be sent by packet socket with sockaddr_ll.sll_protocol set.
So make the function check if skb->protocol is IP.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull nfsd update from Bruce Fields:
"Included this time:
- more nfsd containerization work from Stanislav Kinsbursky: we're
not quite there yet, but should be by 3.9.
- NFSv4.1 progress: implementation of basic backchannel security
negotiation and the mandatory BACKCHANNEL_CTL operation. See
http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues
for remaining TODO's
- Fixes for some bugs that could be triggered by unusual compounds.
Our xdr code wasn't designed with v4 compounds in mind, and it
shows. A more thorough rewrite is still a todo.
- If you've ever seen "RPC: multiple fragments per record not
supported" logged while using some sort of odd userland NFS client,
that should now be fixed.
- Further work from Jeff Layton on our mechanism for storing
information about NFSv4 clients across reboots.
- Further work from Bryan Schumaker on his fault-injection mechanism
(which allows us to discard selective NFSv4 state, to excercise
rarely-taken recovery code paths in the client.)
- The usual mix of miscellaneous bugs and cleanup.
Thanks to everyone who tested or contributed this cycle."
* 'for-3.8' of git://linux-nfs.org/~bfields/linux: (111 commits)
nfsd4: don't leave freed stateid hashed
nfsd4: free_stateid can use the current stateid
nfsd4: cleanup: replace rq_resused count by rq_next_page pointer
nfsd: warn on odd reply state in nfsd_vfs_read
nfsd4: fix oops on unusual readlike compound
nfsd4: disable zero-copy on non-final read ops
svcrpc: fix some printks
NFSD: Correct the size calculation in fault_inject_write
NFSD: Pass correct buffer size to rpc_ntop
nfsd: pass proper net to nfsd_destroy() from NFSd kthreads
nfsd: simplify service shutdown
nfsd: replace boolean nfsd_up flag by users counter
nfsd: simplify NFSv4 state init and shutdown
nfsd: introduce helpers for generic resources init and shutdown
nfsd: make NFSd service structure allocated per net
nfsd: make NFSd service boot time per-net
nfsd: per-net NFSd up flag introduced
nfsd: move per-net startup code to separated function
nfsd: pass net to __write_ports() and down
nfsd: pass net to nfsd_set_nrthreads()
...
Pull Ceph update from Sage Weil:
"There are a few different groups of commits here. The largest is
Alex's ongoing work to enable the coming RBD features (cloning,
striping). There is some cleanup in libceph that goes along with it.
Cyril and David have fixed some problems with NFS reexport (leaking
dentries and page locks), and there is a batch of patches from Yan
fixing problems with the fs client when running against a clustered
MDS. There are a few bug fixes mixed in for good measure, many of
which will be going to the stable trees once they're upstream.
My apologies for the late pull. There is still a gremlin in the rbd
map/unmap code and I was hoping to include the fix for that as well,
but we haven't been able to confirm the fix is correct yet; I'll send
that in a separate pull once it's nailed down."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (68 commits)
rbd: get rid of rbd_{get,put}_dev()
libceph: register request before unregister linger
libceph: don't use rb_init_node() in ceph_osdc_alloc_request()
libceph: init event->node in ceph_osdc_create_event()
libceph: init osd->o_node in create_osd()
libceph: report connection fault with warning
libceph: socket can close in any connection state
rbd: don't use ENOTSUPP
rbd: remove linger unconditionally
rbd: get rid of RBD_MAX_SEG_NAME_LEN
libceph: avoid using freed osd in __kick_osd_requests()
ceph: don't reference req after put
rbd: do not allow remove of mounted-on image
libceph: Unlock unprocessed pages in start_read() error path
ceph: call handle_cap_grant() for cap import message
ceph: Fix __ceph_do_pending_vmtruncate
ceph: Don't add dirty inode to dirty list if caps is in migration
ceph: Fix infinite loop in __wake_requests
ceph: Don't update i_max_size when handling non-auth cap
bdi_register: add __printf verification, fix arg mismatch
...
In kick_requests(), we need to register the request before we
unregister the linger request. Otherwise the unregister will
reset the request's osd pointer to NULL.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
The red-black node in the ceph osd request structure is initialized
in ceph_osdc_alloc_request() using rbd_init_node(). We do need to
initialize this, because in __unregister_request() we call
RB_EMPTY_NODE(), which expects the node it's checking to have
been initialized. But rb_init_node() is apparently overkill, and
may in fact be on its way out. So use RB_CLEAR_NODE() instead.
For a little more background, see this commit:
4c199a93 rbtree: empty nodes have no color"
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
The red-black node node in the ceph osd event structure is not
initialized in create_osdc_create_event(). Because this node can
be the subject of a RB_EMPTY_NODE() call later on, we should ensure
the node is initialized properly for that.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
The red-black node node in the ceph osd structure is not initialized
in create_osd(). Because this node can be the subject of a
RB_EMPTY_NODE() call later on, we should ensure the node is
initialized properly for that. Add a call to RB_CLEAR_NODE()
initialize it.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
When a connection's socket disconnects, or if there's a protocol
error of some kind on the connection, a fault is signaled and
the connection is reset (closed and reopened, basically). We
currently get an error message on the log whenever this occurs.
A ceph connection will attempt to reestablish a socket connection
repeatedly if a fault occurs. This means that these error messages
will get repeatedly added to the log, which is undesirable.
Change the error message to be a warning, so they don't get
logged by default.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Latinoware 2012.
There's a slightly non-trivial merge in virtio-net, as we cleaned up the
virtio add_buf interface while DaveM accepted the mq virtio-net patches.
You can see my solution in my pending-rebases branch, if that helps, but I
know you love merging:
https://git.kernel.org/?p=linux/kernel/git/rusty/linux.git;a=commit;h=12e4e64fa66a4c812e4855de32abdb4d819526fe
Cheers,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJQz/vKAAoJENkgDmzRrbjx+eYQAK/egj9T8Nnth6mkzdbCFSO7
Bciga2hDiudGCiGojTRGPRSc0VP9LgfvPbY2pxX+R9CfEqR+a8q/rRQhCS79ZwPB
/mJy3HNiCx418HZxgwNtk6vPe0PjJm6SsjbXeB9hB+PQLCbdwA0BjpG6xjF/jitP
noPqhhXreeQgYVxAKoFPvff/Byu2GlNnDdVMQxWRmo8hTKlTCzl0T/7BHRxthhJj
iOrXTFzrT/osPT0zyqlngT03T4wlBvL2Bfw8d/kuRPEZ71dpIctWeH2KzdwXVCrz
hFQGxAz4OWvW3xrNwj7c6O3SWj4VemUMjQqeA/PtRiOEI5gM0Y/Bit47dWL4wM/O
OWUKFHzq4DFs8MmwXBgDDXl5xOjOBH9Ik4FZayn3Y7COT/B8CjFdOC2MdDGmZ9yd
NInumg7FqP+u12g+9Vq8S/b0cfoQm4qFe8VHiPJu+jRmCZglyvLjk7oq/QwW8Gaq
Pkzit1Ey0DWo2KvZ4D/nuXJCuhmzN/AJ10M48lLYZhtOIVg9gsa0xjhfgq4FnvSK
xFCf3rcWnlGIXcOYh/hKU25WaCLzBuqMuSK35A72IujrQOL7OJTk4Oqote3Z3H9B
08XJmyW6SOZdfw17X4Im1jbyuLek///xQJ9Jw/tya7j9lBt8zjJ+FmLPs4mLGEOm
WJv9uZPs+QbIMNky2Lcb
=myDR
-----END PGP SIGNATURE-----
Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull virtio update from Rusty Russell:
"Some nice cleanups, and even a patch my wife did as a "live" demo for
Latinoware 2012.
There's a slightly non-trivial merge in virtio-net, as we cleaned up
the virtio add_buf interface while DaveM accepted the mq virtio-net
patches."
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (27 commits)
virtio_console: Add support for remoteproc serial
virtio_console: Merge struct buffer_token into struct port_buffer
virtio: add drv_to_virtio to make code clearly
virtio: use dev_to_virtio wrapper in virtio
virtio-mmio: Fix irq parsing in command line parameter
virtio_console: Free buffers from out-queue upon close
virtio: Convert dev_printk(KERN_<LEVEL> to dev_<level>(
virtio_console: Use kmalloc instead of kzalloc
virtio_console: Free buffer if splice fails
virtio: tools: make it clear that virtqueue_add_buf() no longer returns > 0
virtio: scsi: make it clear that virtqueue_add_buf() no longer returns > 0
virtio: rpmsg: make it clear that virtqueue_add_buf() no longer returns > 0
virtio: net: make it clear that virtqueue_add_buf() no longer returns > 0
virtio: console: make it clear that virtqueue_add_buf() no longer returns > 0
virtio: make virtqueue_add_buf() returning 0 on success, not capacity.
virtio: console: don't rely on virtqueue_add_buf() returning capacity.
virtio_net: don't rely on virtqueue_add_buf() returning capacity.
virtio-net: remove unused skb_vnet_hdr->num_sg field
virtio-net: correct capacity math on ring full
virtio: move queue_index and num_free fields into core struct virtqueue.
...
Pull networking fixes from David Miller:
1) Really fix tuntap SKB use after free bug, from Eric Dumazet.
2) Adjust SKB data pointer to point past the transport header before
calling icmpv6_notify() so that the headers are in the state which
that function expects. From Duan Jiong.
3) Fix ambiguities in the new tuntap multi-queue APIs. From Jason
Wang.
4) mISDN needs to use del_timer_sync(), from Konstantin Khlebnikov.
5) Don't destroy mutex after freeing up device private in mac802154,
fix also from Konstantin Khlebnikov.
6) Fix INET request socket leak in TCP and DCCP, from Christoph Paasch.
7) SCTP HMAC kconfig rework, from Neil Horman.
8) Fix SCTP jprobes function signature, otherwise things explode, from
Daniel Borkmann.
9) Fix typo in ipv6-offload Makefile variable reference, from Simon
Arlott.
10) Don't fail USBNET open just because remote wakeup isn't supported,
from Oliver Neukum.
11) be2net driver bug fixes from Sathya Perla.
12) SOLOS PCI ATM driver bug fixes from Nathan Williams and David
Woodhouse.
13) Fix MTU changing regression in 8139cp driver, from John Greene.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits)
solos-pci: ensure all TX packets are aligned to 4 bytes
solos-pci: add firmware upgrade support for new models
solos-pci: remove superfluous debug output
solos-pci: add GPIO support for newer versions on Geos board
8139cp: Prevent dev_close/cp_interrupt race on MTU change
net: qmi_wwan: add ZTE MF880
drivers/net: Use of_match_ptr() macro in smsc911x.c
drivers/net: Use of_match_ptr() macro in smc91x.c
ipv6: addrconf.c: remove unnecessary "if"
bridge: Correctly encode addresses when dumping mdb entries
bridge: Do not unregister all PF_BRIDGE rtnl operations
use generic usbnet_manage_power()
usbnet: generic manage_power()
usbnet: handle PM failure gracefully
ksz884x: fix receive polling race condition
qlcnic: update driver version
qlcnic: fix unused variable warnings
net: fec: forbid FEC_PTP on SoCs that do not support
be2net: fix wrong frag_idx reported by RX CQ
be2net: fix be_close() to ensure all events are ack'ed
...
the value of err is always negative if it goes to errout, so we don't need to
check the value of err.
Signed-off-by: Cong Ding <dinggnu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When dumping mdb table, set the addresses the kernel returns
based on the address protocol type.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bridge fdb and link rtnl operations are registered in
core/rtnetlink. Bridge mdb operations are registred
in bridge/mdb. When removing bridge module, do not
unregister ALL PF_BRIDGE ops since that would remove
the ops from rtnetlink as well. Do remove mdb ops when
bridge is destroyed.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull (again) user namespace infrastructure changes from Eric Biederman:
"Those bugs, those darn embarrasing bugs just want don't want to get
fixed.
Linus I just updated my mirror of your kernel.org tree and it appears
you successfully pulled everything except the last 4 commits that fix
those embarrasing bugs.
When you get a chance can you please repull my branch"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
userns: Fix typo in description of the limitation of userns_install
userns: Add a more complete capability subset test to commit_creds
userns: Require CAP_SYS_ADMIN for most uses of setns.
Fix cap_capable to only allow owners in the parent user namespace to have caps.
Features include:
- Full audit of BUG_ON asserts in the NFS, SUNRPC and lockd client code
Remove altogether where possible, and replace with WARN_ON_ONCE and
appropriate error returns where not.
- NFSv4.1 client adds session dynamic slot table management. There is
matching server side code that has been submitted to Bruce for
consideration. Together, this code allows the server to dynamically
manage the amount of memory it allocates to the duplicate request
cache for each client. It will constantly resize those caches to
reserve more memory for clients that are hot while shrinking caches
for those that are quiescent.
In addition, there are assorted bugfixes for the generic NFS write code,
fixes to deal with the drop_nlink() warnings, and yet another fix for
NFSv4 getacl.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQz8VNAAoJEGcL54qWCgDy7iYQAKbr7AAZOcZPoJigzakZ7nMi
UKYulGbFais2Llwzw1e+U5RzmorTSbvl7/m8eS7pDf3auYw/t4xtXjKSGZUNxaE1
q2hNKgVwodMbScYdkZXvKKNckS93oPDttrmEyzjKanqey+1E3HSklvOvikN0ihte
B/G1OtA7Qpcr92bPrLK+PjDqarCBUI4g42dYbZOBrZnXKTRtzUqsuKPu7WjpPiof
SHE5b1Emt7oUxgcijWGcvYCQ8voZdeSCnSksH3DgvORlutwdhUD3Yg8KyEfFZdyc
6C59ozXRLiHkV3c+jMhJzDkQXR9bYHrnK3tlq4G8v1NdJxRktQliZeqecRvip/Wz
rAxfE6fnPDEvKsCpZb3+5yTAt+aZwzEhRg1fFC9qfGOp+oRa+CWw5kJCyIFHwJu6
4LOlubQAf6rnIsja1L8D0FdeqHUa1+wy61On5kgVYS5JGtoBsQHpa1zTwdOxPmsR
2XTMYGNCEabvpKpO9+5xQbUzkFExPTesw47ygXiUuDT/snaarpV3/f05SSCaWZkX
R8QsGEOXTIh8/S+UxARGpc7H6xi1PdBM5nBziHVzjEdHgZRF4wGFaJe2CirMjSJO
Df5GEd5Z/8VCGWs+1w7HD5EaQ2n0wbt5daCE80Y2jRBr7NMYnY+ciF8/GktLpHsn
Zq1bXGOdr3UZ92LXuzL9
=G3N9
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Features include:
- Full audit of BUG_ON asserts in the NFS, SUNRPC and lockd client
code. Remove altogether where possible, and replace with
WARN_ON_ONCE and appropriate error returns where not.
- NFSv4.1 client adds session dynamic slot table management. There
is matching server side code that has been submitted to Bruce for
consideration.
Together, this code allows the server to dynamically manage the
amount of memory it allocates to the duplicate request cache for
each client. It will constantly resize those caches to reserve
more memory for clients that are hot while shrinking caches for
those that are quiescent.
In addition, there are assorted bugfixes for the generic NFS write
code, fixes to deal with the drop_nlink() warnings, and yet another
fix for NFSv4 getacl."
* tag 'nfs-for-3.8-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (106 commits)
SUNRPC: continue run over clients list on PipeFS event instead of break
NFS: Don't use SetPageError in the NFS writeback code
SUNRPC: variable 'svsk' is unused in function bc_send_request
SUNRPC: Handle ECONNREFUSED in xs_local_setup_socket
NFSv4.1: Deal effectively with interrupted RPC calls.
NFSv4.1: Move the RPC timestamp out of the slot.
NFSv4.1: Try to deal with NFS4ERR_SEQ_MISORDERED.
NFS: nfs_lookup_revalidate should not trust an inode with i_nlink == 0
NFS: Fix calls to drop_nlink()
NFS: Ensure that we always drop inodes that have been marked as stale
nfs: Remove unused list nfs4_clientid_list
nfs: Remove duplicate function declaration in internal.h
NFS: avoid NULL dereference in nfs_destroy_server
SUNRPC handle EKEYEXPIRED in call_refreshresult
SUNRPC set gss gc_expiry to full lifetime
nfs: fix page dirtying in NFS DIO read codepath
nfs: don't zero out the rest of the page if we hit the EOF on a DIO READ
NFSv4.1: Be conservative about the client highest slotid
NFSv4.1: Handle NFS4ERR_BADSLOT errors correctly
nfs: don't extend writes to cover entire page if pagecache is invalid
...
As reported by Chen Gang <gang.chen@asianux.com>, we should ensure there
is enough space when formatting the sysfs buffers.
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise an out of bounds read could happen.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull user namespace changes from Eric Biederman:
"While small this set of changes is very significant with respect to
containers in general and user namespaces in particular. The user
space interface is now complete.
This set of changes adds support for unprivileged users to create user
namespaces and as a user namespace root to create other namespaces.
The tyranny of supporting suid root preventing unprivileged users from
using cool new kernel features is broken.
This set of changes completes the work on setns, adding support for
the pid, user, mount namespaces.
This set of changes includes a bunch of basic pid namespace
cleanups/simplifications. Of particular significance is the rework of
the pid namespace cleanup so it no longer requires sending out
tendrils into all kinds of unexpected cleanup paths for operation. At
least one case of broken error handling is fixed by this cleanup.
The files under /proc/<pid>/ns/ have been converted from regular files
to magic symlinks which prevents incorrect caching by the VFS,
ensuring the files always refer to the namespace the process is
currently using and ensuring that the ptrace_mayaccess permission
checks are always applied.
The files under /proc/<pid>/ns/ have been given stable inode numbers
so it is now possible to see if different processes share the same
namespaces.
Through the David Miller's net tree are changes to relax many of the
permission checks in the networking stack to allowing the user
namespace root to usefully use the networking stack. Similar changes
for the mount namespace and the pid namespace are coming through my
tree.
Two small changes to add user namespace support were commited here adn
in David Miller's -net tree so that I could complete the work on the
/proc/<pid>/ns/ files in this tree.
Work remains to make it safe to build user namespaces and 9p, afs,
ceph, cifs, coda, gfs2, ncpfs, nfs, nfsd, ocfs2, and xfs so the
Kconfig guard remains in place preventing that user namespaces from
being built when any of those filesystems are enabled.
Future design work remains to allow root users outside of the initial
user namespace to mount more than just /proc and /sys."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (38 commits)
proc: Usable inode numbers for the namespace file descriptors.
proc: Fix the namespace inode permission checks.
proc: Generalize proc inode allocation
userns: Allow unprivilged mounts of proc and sysfs
userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file
procfs: Print task uids and gids in the userns that opened the proc file
userns: Implement unshare of the user namespace
userns: Implent proc namespace operations
userns: Kill task_user_ns
userns: Make create_new_namespaces take a user_ns parameter
userns: Allow unprivileged use of setns.
userns: Allow unprivileged users to create new namespaces
userns: Allow setting a userns mapping to your current uid.
userns: Allow chown and setgid preservation
userns: Allow unprivileged users to create user namespaces.
userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped
userns: fix return value on mntns_install() failure
vfs: Allow unprivileged manipulation of the mount namespace.
vfs: Only support slave subtrees across different user namespaces
vfs: Add a user namespace reference from struct mnt_namespace
...
A connection's socket can close for any reason, independent of the
state of the connection (and without irrespective of the connection
mutex). As a result, the connectino can be in pretty much any state
at the time its socket is closed.
Handle those other cases at the top of con_work(). Pull this whole
block of code into a separate function to reduce the clutter.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
In __unregister_linger_request(), the request is being removed
from the osd client's req_linger list only when the request
has a non-null osd pointer. It should be done whether or not
the request currently has an osd.
This is most likely a non-issue because I believe the request
will always have an osd when this function is called.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
There are SUNRPC clients, which program doesn't have pipe_dir_name. These
clients can be skipped on PipeFS events, because nothing have to be created or
destroyed. But instead of breaking in case of such a client was found, search
for suitable client over clients list have to be continued. Otherwise some
clients could not be covered by PipeFS event handler.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: stable@vger.kernel.org [>= v3.4]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If an osd has no requests and no linger requests, __reset_osd()
will just remove it with a call to __remove_osd(). That drops
a reference to the osd, and therefore the osd may have been free
by the time __reset_osd() returns. That function offers no
indication this may have occurred, and as a result the osd will
continue to be used even when it's no longer valid.
Change__reset_osd() so it returns an error (ENODEV) when it
deletes the osd being reset. And change __kick_osd_requests() so it
returns immediately (before referencing osd again) if __reset_osd()
returns *any* error.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
In __unregister_request(), there is a call to list_del_init()
referencing a request that was the subject of a call to
ceph_osdc_put_request() on the previous line. This is not
safe, because the request structure could have been freed
by the time we reach the list_del_init().
Fix this by reversing the order of these lines.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-off-by: Sage Weil <sage@inktank.com>
In (0c36b48 netfilter: nfnetlink_log: fix mac address for 6in4 tunnels)
the include file that defines ARPD_SIT was missing. This passed unnoticed
during my tests (I did not hit this problem here).
net/netfilter/nfnetlink_log.c: In function '__build_packet_message':
net/netfilter/nfnetlink_log.c:494:25: error: 'ARPHRD_SIT' undeclared (first use in this function)
net/netfilter/nfnetlink_log.c:494:25: note: each undeclared identifier is reported only once for
+each function it appears in
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull security subsystem updates from James Morris:
"A quiet cycle for the security subsystem with just a few maintenance
updates."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
Smack: create a sysfs mount point for smackfs
Smack: use select not depends in Kconfig
Yama: remove locking from delete path
Yama: add RCU to drop read locking
drivers/char/tpm: remove tasklet and cleanup
KEYS: Use keyring_alloc() to create special keyrings
KEYS: Reduce initial permissions on keys
KEYS: Make the session and process keyrings per-thread
seccomp: Make syscall skipping and nr changes more consistent
key: Fix resource leak
keys: Fix unreachable code
KEYS: Add payload preparsing opportunity prior to key instantiate or update