linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-23 04:34:11 +08:00

Author	SHA1	Message	Date
Harald Welte	020b4c12db	[NETFILTER]: Move ipv4 specific code from net/core/netfilter.c to net/ipv4/netfilter.c Netfilter cleanup - Move ipv4 code from net/core/netfilter.c to net/ipv4/netfilter.c - Move ipv6 netfilter code from net/ipv6/ip6_output.c to net/ipv6/netfilter.c Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:35:01 -07:00
Harald Welte	089af26c70	[NETFILTER]: Rename skb_ip_make_writable() to skb_make_writable() There is nothing IPv4-specific in it. In fact, it was already used by IPv6, too... Upcoming nfnetlink_queue code will use it for any kind of packet. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:34:40 -07:00
Patrick McHardy	373ac73595	[NETFILTER]: C99 initizalizers for NAT protocols Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:33:34 -07:00
David S. Miller	86e65da9c1	[NET]: Remove explicit initializations of skb->input_dev Instead, set it in one place, namely the beginning of netif_receive_skb(). Based upon suggestions from Jamal Hadi Salim. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:33:26 -07:00
Adrian Bunk	0742fd53a3	[IPV4]: possible cleanups This patch contains the following possible cleanups: - make needlessly global code static - #if 0 the following unused global function: - xfrm4_state.c: xfrm4_state_fini - remove the following unneeded EXPORT_SYMBOL's: - ip_output.c: ip_finish_output - ip_output.c: sysctl_ip_default_ttl - fib_frontend.c: ip_dev_find - inetpeer.c: inet_peer_idlock - ip_options.c: ip_options_compile - ip_options.c: ip_options_undo - net/core/request_sock.c: sysctl_max_syn_backlog Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:33:20 -07:00
David S. Miller	f2ccd8fa06	[NET]: Kill skb->real_dev Bonding just wants the device before the skb_bond() decapsulation occurs, so simply pass that original device into packet_type->func() as an argument. It remains to be seen whether we can use this same exact thing to get rid of skb->input_dev as well. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:32:25 -07:00
Arnaldo Carvalho de Melo	83e3609eba	[REQSK]: Move the syn_table destroy from tcp_listen_stop to reqsk_queue_destroy Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:32:11 -07:00
Harald Welte	080774a243	[NETFILTER]: Add ctnetlink subsystem Add ctnetlink subsystem for userspace-access to ip_conntrack table. This allows reading and updating of existing entries, as well as creating new ones (and new expect's) via nfnetlink. Please note the 'strange' byte order: nfattr (tag+length) are in host byte order, while the payload is always guaranteed to be in network byte order. This allows a simple userspace process to encapsulate netlink messages into arch-independent udp packets by just processing/swapping the headers and not knowing anything about the actual payload. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:31:49 -07:00
Stephen Hemminger	6f1cf16582	[NET]: Remove HIPPI private from skbuff.h This removes the private element from skbuff, that is only used by HIPPI. Instead it uses skb->cb[] to hold the additional data that is needed in the output path from hard_header to device driver. PS: The only qdisc that might potentially corrupt this cb[] is if netem was used over HIPPI. I will take care of that by fixing netem to use skb->stamp. I don't expect many users of netem over HIPPI Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:31:42 -07:00
Patrick McHardy	b0573dea1f	[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options Allows overriding of sysctl_{wmem,rmrm}_max Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:31:35 -07:00
Harald Welte	f9e815b376	[NETFITLER]: Add nfnetlink layer. Introduce "nfnetlink" (netfilter netlink) layer. This layer is used as transport layer for all userspace communication of the new upcoming netfilter subsystems, such as ctnetlink, nfnetlink_queue and some day even the mythical pkttables ;) Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:31:29 -07:00
Harald Welte	ac3247baf8	[NETFILTER]: connection tracking event notifiers This adds a notifier chain based event mechanism for ip_conntrack state changes. As opposed to the previous implementations in patch-o-matic, we do no longer need a field in the skb to achieve this. Thanks to the valuable input from Patrick McHardy and Rusty on the idea of a per_cpu implementation. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:31:24 -07:00
Patrick McHardy	abc3bc5804	[NET]: Kill skb->tc_classid Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:31:18 -07:00
David S. Miller	8728b834b2	[NET]: Kill skb->list Remove the "list" member of struct sk_buff, as it is entirely redundant. All SKB list removal callers know which list the SKB is on, so storing this in sk_buff does nothing other than taking up some space. Two tricky bits were SCTP, which I took care of, and two ATM drivers which Francois Romieu <romieu@fr.zoreil.com> fixed up. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>	2005-08-29 15:31:14 -07:00
Harald Welte	6869c4d8e0	[NETFILTER]: reduce netfilter sk_buff enlargement As discussed at netconf'05, we're trying to save every bit in sk_buff. The patch below makes sk_buff 8 bytes smaller. I did some basic testing on my notebook and it seems to work. The only real in-tree user of nfcache was IPVS, who only needs a single bit. Unfortunately I couldn't find some other free bit in sk_buff to stuff that bit into, so I introduced a separate field for them. Maybe the IPVS guys can resolve that to further save space. Initially I wanted to shrink pkt_type to three bits (PACKET_HOST and alike are only 6 values defined), but unfortunately the bluetooth code overloads pkt_type :( The conntrack-event-api (out-of-tree) uses nfcache, but Rusty just came up with a way how to do it without any skb fields, so it's safe to remove it. - remove all never-implemented 'nfcache' code - don't have ipvs code abuse 'nfcache' field. currently get's their own compile-conditional skb->ipvs_property field. IPVS maintainers can decide to move this bit elswhere, but nfcache needs to die. - remove skb->nfcache field to save 4 bytes - move skb->nfctinfo into three unused bits to save further 4 bytes Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:31:04 -07:00
Harald Welte	bf3a46aa9b	[NETFILTER]: convert nfmark and conntrack mark to 32bit As discussed at netconf'05, we convert nfmark and conntrack-mark to be 32bits even on 64bit architectures. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-29 15:29:31 -07:00
Patrick McHardy	06c7427021	[FIB_TRIE]: Don't ignore negative results from fib_semantic_match When a semantic match occurs either success, not found or an error (for matching unreachable routes/blackholes) is returned. fib_trie ignores the errors and looks for a different matching route. Treat results other than "no match" as success and end lookup. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-23 22:06:09 -07:00
David S. Miller	c1cc168442	[ROSE]: Fix typo in rose_route_frame() locking fix. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-23 14:55:32 -07:00
David S. Miller	dc16aaf29d	[ROSE]: Fix missing unlocks in rose_route_frame() Noticed by Coverity checker. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-23 10:50:09 -07:00
David S. Miller	d5d283751e	[TCP]: Document non-trivial locking path in tcp_v{4,6}_get_port(). This trips up a lot of folks reading this code. Put an unlikely() around the port-exhaustion test for good measure. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-23 10:49:54 -07:00
David S. Miller	89ebd197eb	[TCP]: Unconditionally clear TCP_NAGLE_PUSH in skb_entail(). Intention of this bit is to force pushing of the existing send queue when TCP_CORK or TCP_NODELAY state changes via setsockopt(). But it's easy to create a situation where the bit never clears. For example, if the send queue starts empty: 1) set TCP_NODELAY 2) clear TCP_NODELAY 3) set TCP_CORK 4) do small write() The current code will leave TCP_NAGLE_PUSH set after that sequence. Unconditionally clearing the bit when new data is added via skb_entail() solves the problem. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-23 10:13:06 -07:00
Thomas Graf	0fbbeb1ba4	[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt() qdisc_create_dflt() is missing to destroy the newly allocated default qdisc if the initialization fails resulting in leaks of all kinds. The only caller in mainline which may trigger this bug is sch_tbf.c in tbf_create_dflt_qdisc(). Note: qdisc_create_dflt() doesn't fulfill the official locking requirements of qdisc_destroy() but since the qdisc could never be seen by the outside world this doesn't matter and it can stay as-is until the locking of pkt_sched is cleaned up. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-23 10:12:44 -07:00
Vlad Yasevich	d2287f8441	[SCTP]: Add SENTINEL to SCTP MIB stats Add SNMP_MIB_SENTINEL to the definition of the sctp_snmp_list so that the output routine in proc correctly terminates. This was causing some problems running on ia64 systems. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-23 10:12:04 -07:00
Ralf Baechle	01d7dd0e9f	[AX25]: UID fixes o Brown paperbag bug - ax25_findbyuid() was always returning a NULL pointer as the result. Breaks ROSE completly and AX.25 if UID policy set to deny. o While the list structure of AX.25's UID to callsign mapping table was properly protected by a spinlock, it's elements were not refcounted resulting in a race between removal and usage of an element. Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-23 10:11:45 -07:00
Ralf Baechle	53b924b31f	[NET]: Fix socket bitop damage The socket flag cleanups that went into 2.6.12-rc1 are basically oring the flags of an old socket into the socket just being created. Unfortunately that one was just initialized by sock_init_data(), so already has SOCK_ZAPPED set. As the result zapped sockets are created and all incoming connection will fail due to this bug which again was carefully replicated to at least AX.25, NET/ROM or ROSE. In order to keep the abstraction alive I've introduced sock_copy_flags() to copy the socket flags from one sockets to another and used that instead of the bitwise copy thing. Anyway, the idea here has probably been to copy all flags, so sock_copy_flags() should be the right thing. With this the ham radio protocols are usable again, so I hope this will make it into 2.6.13. Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-23 10:11:30 -07:00
Patrick McHardy	66a79a19a7	[NETFILTER]: Fix HW checksum handling in ip_queue/ip6_queue The checksum needs to be filled in on output, after mangling a packet ip_summed needs to be reset. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-23 10:10:35 -07:00
Dave Johnson	1344a41637	[IPV4]: Fix negative timer loop with lots of ipv4 peers. From: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com> Found this bug while doing some scaling testing that created 500K inet peers. peer_check_expire() in net/ipv4/inetpeer.c isn't using inet_peer_gc_mintime correctly and will end up creating an expire timer with less than the minimum duration, and even zero/negative if enough active peers are present. If >65K peers, the timer will be less than inet_peer_gc_mintime, and with >70K peers, the timer duration will reach zero and go negative. The timer handler will continue to schedule another zero/negative timer in a loop until peers can be aged. This can continue for at least a few minutes or even longer if the peers remain active due to arriving packets while the loop is occurring. Bug is present in both 2.4 and 2.6. Same patch will apply to both just fine. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-23 10:10:15 -07:00
Herbert Xu	c3a20692ca	[RPC]: Kill bogus kmap in krb5 While I was going through the crypto users recently, I noticed this bogus kmap in sunrpc. It's totally unnecessary since the crypto layer will do its own kmap before touching the data. Besides, the kmap is throwing the return value away. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-23 10:09:53 -07:00
Dmitry Yusupov	14869c3886	[TCP]: Do TSO deferral even if tail SKB can go out now. If the tail SKB fits into the window, it is still benefitical to defer until the goal percentage of the window is available. This give the application time to feed more data into the send queue and thus results in larger TSO frames going out. Patch from Dmitry Yusupov <dima@neterion.com>. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-23 10:09:27 -07:00
Patrick McHardy	7e71af49d4	[NETFILTER]: Fix HW checksum handling in TCPMSS target Most importantly, remove bogus BUG() in receive path. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-20 17:40:41 -07:00
Patrick McHardy	f93592ff4f	[NETFILTER]: Fix HW checksum handling in ECN target Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-20 17:39:15 -07:00
Patrick McHardy	fd841326d7	[NETFILTER]: Fix ECN target TCP marking An incorrect check made it bail out before doing anything. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-20 17:38:40 -07:00
Herbert Xu	6fc8b9e7c6	[IPCOMP]: Fix false smp_processor_id warning This patch fixes a false-positive from debug_smp_processor_id(). The processor ID is only used to look up crypto_tfm objects. Any processor ID is acceptable here as long as it is one that is iterated on by for_each_cpu(). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-18 14:36:59 -07:00
Patrick McHardy	cb94c62c25	[IPV4]: Fix DST leak in icmp_push_reply() Based upon a bug report and initial patch by Ollie Wild. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-18 14:05:44 -07:00
Jay Vosburgh	001dd250c1	[TOKENRING]: Use interrupt-safe locking with rif_lock. Change operations on rif_lock from spin_{un}lock_bh to spin_{un}lock_irq{save,restore} equivalents. Some of the rif_lock critical sections are called from interrupt context via tr_type_trans->tr_add_rif_info. The TR NIC drivers call tr_type_trans from their packet receive handlers. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-18 14:04:51 -07:00
Paul E. McKenney	1f07247de5	[DECNET]: Fix RCU race condition in dn_neigh_construct(). Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-17 12:05:27 -07:00
Patrick McHardy	bfd272b1ca	[IPV6]: Fix SKB leak in ip6_input_finish() Changing it to how ip_input handles should fix it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-17 12:04:22 -07:00
Herbert Xu	35d59efd10	[TCP]: Fix bug #5070 : kernel BUG at net/ipv4/tcp_output.c:864 1) We send out a normal sized packet with TSO on to start off. 2) ICMP is received indicating a smaller MTU. 3) We send the current sk_send_head which needs to be fragmented since it was created before the ICMP event. The first fragment is then sent out. At this point the remaining fragment is allocated by tcp_fragment. However, its size is padded to fit the L1 cache-line size therefore creating tail-room up to 124 bytes long. This fragment will also be sitting at sk_send_head. 4) tcp_sendmsg is called again and it stores data in the tail-room of of the fragment. 5) tcp_push_one is called by tcp_sendmsg which then calls tso_fragment since the packet as a whole exceeds the MTU. At this point we have a packet that has data in the head area being fed to tso_fragment which bombs out. My take on this is that we shouldn't ever call tcp_fragment on a TSO socket for a packet that is yet to be transmitted since this creates a packet on sk_send_head that cannot be extended. So here is a patch to change it so that tso_fragment is always used in this case. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-17 12:03:59 -07:00
Patrick McHardy	97077c4a98	[IPV6]: Fix raw socket hardware checksum failures When packets hit raw sockets the csum update isn't done yet, do it manually. Packets can also reach rawv6_rcv on the output path through ip6_call_ra_chain, in this case skb->ip_summed is CHECKSUM_NONE and this codepath isn't executed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-17 12:03:32 -07:00
Trond Myklebust	58fcb8df0b	[PATCH] NFS: Ensure ACL xdr code doesn't overflow. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-08-16 08:52:11 -07:00
Matt Mackall	d7b9dfc8ea	[NETPOLL]: remove unused variable Remove unused variable Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-11 19:28:05 -07:00
Matt Mackall	53fb95d3c1	[NETPOLL]: fix initialization/NAPI race This fixes a race during initialization with the NAPI softirq processing by using an RCU approach. This race was discovered when refill_skbs() was added to the setup code. Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-11 19:27:43 -07:00
Ingo Molnar	2652076507	[NETPOLL]: pre-fill skb pool we could do one thing (see the patch below): i think it would be useful to fill up the netlogging skb queue straight at initialization time. Especially if netpoll is used for dumping alone, the system might not be in a situation to fill up the queue at the point of crash, so better be a bit more prepared and keep the pipeline filled. [ I've modified this to be called earlier - mpm ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-11 19:26:42 -07:00
Matt Mackall	0db1d6fc1e	[NETPOLL]: add retry timeout Add limited retry logic to netpoll_send_skb Each time we attempt to send, decrement our per-device retry counter. On every successful send, we reset the counter. We delay 50us between attempts with up to 20000 retries for a total of 1 second. After we've exhausted our retries, subsequent failed attempts will try only once until reset by success. Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-11 19:25:54 -07:00
Matt Mackall	f0d3459d07	[NETPOLL]: netpoll_send_skb simplify Minor netpoll_send_skb restructuring Restructure to avoid confusing goto and move some bits out of the retry loop. Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-11 19:25:11 -07:00
Jeff Moyer	a636e13579	[NETPOLL]: deadlock bugfix This fixes an obvious deadlock in the netpoll code. netpoll_rx takes the npinfo->rx_lock. netpoll_rx is also the only caller of arp_reply (through __netpoll_rx). As such, it is not necessary to take this lock. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-11 19:23:50 -07:00
Jeff Moyer	11513128bb	[NETPOLL]: rx_flags bugfix Initialize npinfo->rx_flags. The way it stands now, this will have random garbage, and so will incur a locking penalty even when an rx_hook isn't registered and we are not active in the netpoll polling code. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-11 19:23:04 -07:00
Herbert Xu	b5da623ae9	[TCP]: Adjust {p,f}ackets_out correctly in tcp_retransmit_skb() Well I've only found one potential cause for the assertion failure in tcp_mark_head_lost. First of all, this can only occur if cnt > 1 since tp->packets_out is never zero here. If it did hit zero we'd have much bigger problems. So cnt is equal to fackets_out - reordering. Normally fackets_out is less than packets_out. The only reason I've found that might cause fackets_out to exceed packets_out is if tcp_fragment is called from tcp_retransmit_skb with a TSO skb and the current MSS is greater than the MSS stored in the TSO skb. This might occur as the result of an expiring dst entry. In that case, packets_out may decrease (line 1380-1381 in tcp_output.c). However, fackets_out is unchanged which means that it may in fact exceed packets_out. Previously tcp_retrans_try_collapse was the only place where packets_out can go down and it takes care of this by decrementing fackets_out. So we should make sure that fackets_out is reduced by an appropriate amount here as well. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-10 18:32:36 -07:00
Steven Whitehouse	001ab02a8c	[DECNET]: Use sk_stream_error function rather than DECnet's own Signed-off-by: Steven Whitehouse <steve@chygwyn.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-10 11:32:57 -07:00
Andrew Morton	d64d387372	[NET]: Fix memory leak in sys_{send,recv}msg() w/compat From: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com> sendmsg()/recvmsg() syscalls from o32/n32 apps to a 64bit kernel will cause a kernel memory leak if iov_len > UIO_FASTIOV for each syscall! This is because both sys_sendmsg() and verify_compat_iovec() kmalloc a new iovec structure. Only the one from sys_sendmsg() is free'ed. I wrote a simple test program to confirm this after identifying the problem: http://davej.org/programs/testsendmsg.c Note that the below fix will break solaris_sendmsg()/solaris_recvmsg() as it also calls verify_compat_iovec() but expects it to malloc internally. [ I fixed that. -DaveM ] Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-09 15:29:19 -07:00
David S. Miller	3501466941	[SUNRPC]: Fix nsec --> usec conversion. We need to divide, not multiply. While we're here, use NSEC_PER_USEC instead of a magic constant. Based upon a report from Josip Loncaric and a patch by Andrew Morton. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-09 14:57:12 -07:00
Linus Torvalds	92e52b2e82	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2005-08-08 16:06:01 -07:00
Heikki Orsila	ca9334523c	[IPV4]: Debug cleanup Here's a small patch to cleanup NETDEBUG() use in net/ipv4/ for Linux kernel 2.6.13-rc5. Also weird use of indentation is changed in some places. Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-08 14:26:52 -07:00
Harald Welte	8b83bc77bf	[PATCH] don't try to do any NAT on untracked connections With the introduction of 'rustynat' in 2.6.11, the old tricks of preventing NAT of 'untracked' connections (e.g. NOTRACK target in 'raw' table) are no longer sufficient. The ip_conntrack_untracked.status \|= IPS_NAT_DONE_MASK effectively prevents iteration of the 'nat' table, but doesn't prevent nat_packet() to be executed. Since nr_manips is gone in 'rustynat', nat_packet() now implicitly thinks that it has to do NAT on the packet. This patch fixes that problem by explicitly checking for ip_conntrack_untracked in ip_nat_fn(). Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-08-08 11:48:28 -07:00
Herbert Xu	6fc0b4a7a7	[IPSEC]: Restrict socket policy loading to CAP_NET_ADMIN. The interface needs much redesigning if we wish to allow normal users to do this in some way. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-06 06:33:15 -07:00
Marcel Holtmann	576c7d858f	[Bluetooth] Add direction and timestamp to stack internal events This patch changes the direction to incoming and adds the timestamp to all stack internal events. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2005-08-06 12:36:54 +02:00
Marcel Holtmann	66e8b6c31b	[Bluetooth] Remove unused functions and cleanup symbol exports This patch removes the unused bt_dump() function and it also removes its BT_DMP macro. It also unexports the hci_dev_get(), hci_send_cmd() and hci_si_event() functions. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2005-08-06 12:36:51 +02:00
Marcel Holtmann	dcc365d8f2	[Bluetooth] Revert session reference counting fix The fix for the reference counting problem of the signal DLC introduced a race condition which leads to an oops. The reason for it is not fully understood by now and so revert this fix, because the reference counting problem is not crashing the RFCOMM layer and its appearance it rare. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2005-08-06 12:36:42 +02:00
David S. Miller	b7656e7f29	[IPV4]: Fix memory leak during fib_info hash expansion. When we grow the tables, we forget to free the olds ones up. Noticed by Yan Zheng. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-08-05 04:12:48 -07:00
Herbert Xu	b68e9f8572	[PATCH] tcp: fix TSO cwnd caching bug tcp_write_xmit caches the cwnd value indirectly in cwnd_quota. When tcp_transmit_skb reduces the cwnd because of tcp_enter_cwr, the cached value becomes invalid. This patch ensures that the cwnd value is always reread after each tcp_transmit_skb call. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-08-04 21:43:14 -07:00
David S. Miller	846998ae87	[PATCH] tcp: fix TSO sizing bugs MSS changes can be lost since we preemptively initialize the tso_segs count for an SKB before we %100 commit to sending it out. So, by the time we send it out, the tso_size information can be stale due to PMTU events. This mucks up all of the logic in our send engine, and can even result in the BUG() triggering in tcp_tso_should_defer(). Another problem we have is that we're storing the tp->mss_cache, not the SACK block normalized MSS, as the tso_size. That's wrong too. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-08-04 21:43:14 -07:00
Denis Lunev	f0098f7863	[NET] Fix too aggressive backoff in dst garbage collection The bug is evident when it is seen once. dst gc timer was backed off, when gc queue is not empty. But this means that timer quickly backs off, if at least one destination remains in use. Normally, the bug is invisible, because adding new dst entry to queue cancels the backoff. But it shots deadly with destination cache overflow when new destinations are not released for long time f.e. after an interface goes down. The fix is to cancel backoff when something was released. Signed-off-by: Denis Lunev <den@sw.ru> Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-30 17:47:25 -07:00
Alexey Kuznetsov	db44575f6f	[NET]: fix oops after tunnel module unload Tunnel modules used to obtain module refcount each time when some tunnel was created, which meaned that tunnel could be unloaded only after all the tunnels are deleted. Since killing old MOD_*_USE_COUNT macros this protection has gone. It is possible to return it back as module_get/put, but it looks more natural and practically useful to force destruction of all the child tunnels on module unload. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-30 17:46:44 -07:00
Harald Welte	1f494c0e04	[NETFILTER] Inherit masq_index to slave connections masq_index is used for cleanup in case the interface address changes (such as a dialup ppp link with dynamic addreses). Without this patch, slave connections are not evicted in such a case, since they don't inherit masq_index. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-30 17:44:07 -07:00
Baruch Even	d1b04c081e	[NET]: Spelling mistakes threshoulds -> thresholds Just simple spelling mistake fixes. Signed-Off-By: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-30 17:41:59 -07:00
David S. Miller	6192b54b84	[NET]: Fix busy waiting in dev_close(). If the current task has signal_pending(), the loop we have to wait for the __LINK_STATE_RX_SCHED bit to clear becomes a pure busy-loop. Fixed by using msleep() instead of the hand-crafted version. Noticed by Andrew Morton. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-28 12:12:58 -07:00
Linus Torvalds	839c5d2511	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2005-07-27 16:37:59 -07:00
Jesper Juhl	77933d7276	[PATCH] clean up inline static vs static inline `gcc -W' likes to complain if the static keyword is not at the beginning of the declaration. This patch fixes all remaining occurrences of "inline static" up with "static inline" in the entire kernel tree (140 occurrences in 47 files). While making this change I came across a few lines with trailing whitespace that I also fixed up, I have also added or removed a blank line or two here and there, but there are no functional changes in the patch. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-27 16:26:20 -07:00
Olaf Hering	44456d37b5	[PATCH] turn many #if $undefined_string into #ifdef $undefined_string turn many #if $undefined_string into #ifdef $undefined_string to fix some warnings after -Wno-def was added to global CFLAGS Signed-off-by: Olaf Hering <olh@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-27 16:26:08 -07:00
Matt Mackall	5e43db7730	[NET]: Move in_aton from net/ipv4/utils.c to net/core/utils.c Move in_aton to allow netpoll and pktgen to work without the rest of the IPv4 stack. Fix whitespace and add comment for the odd placement. Delete now-empty net/ipv4/utils.c Re-enable netpoll/netconsole without CONFIG_INET Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-27 15:24:42 -07:00
Nick Sillik	7cee432a22	[NETFILTER]: Fix -Wunder error in ip_conntrack_core.c Signed-off-by: Nick Sillik <n.sillik@temple.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-27 14:46:03 -07:00
Kyle Moffett	a77be819f9	[NET]: Fix setsockopt locking bug On Sparc, SO_DONTLINGER support resulted in sock_reset_flag being called without lock_sock(). Signed-off-by: Kyle Moffett <mrmacman_g4@mac.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-27 14:22:30 -07:00
Hans-Juergen Tappe (SYSGO AG)	eaa1c5d059	[IPV4]: Fix Kconfig syntax error From: "Hans-Juergen Tappe (SYSGO AG)" <hjt@sysgo.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-27 13:00:04 -07:00
Herbert Xu	a4f1bac625	[XFRM]: Fix possible overflow of sock->sk_policy Spotted by, and original patch by, Balazs Scheidler. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-26 15:43:17 -07:00
Patrick McHardy	7686ee1ad9	[EMATCH]: Remove feature ifdefs in meta ematch. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-24 19:44:23 -07:00
Cal Peake	227510c7f1	[IPV6]: fix implicit declaration of function `xfrm6_tunnel_unregister' Signed-off-by: Cal Peake <cp@absolutedigital.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-24 19:30:06 -07:00
David S. Miller	261688d01e	[PKT_SCHED]: em_meta: Kill TCF_META_ID_{INDEV,SECURITY,TCVERDICT} More unusable TCF_META_* match types that need to get eliminated before 2.6.13 goes out the door. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Thomas Graf <tgraf@suug.ch>	2005-07-22 14:43:52 -07:00
Patrick McHardy	d3984a6b6a	[NETFILTER]: Fix ip6t_LOG MAC format I broke this in the patch that consolidated MAC logging. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-22 12:52:47 -07:00
Patrick McHardy	74bb421da7	[NETFILTER]: Use correct byteorder in ICMP NAT Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-22 12:51:38 -07:00
Patrick McHardy	21f930e4ab	[NETFILTER]: Wait until all references to ip_conntrack_untracked are dropped on unload Fixes a crash when unloading ip_conntrack. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-22 12:51:03 -07:00
Patrick McHardy	d04b4f8c1c	[NETFILTER]: Fix potential memory corruption in NAT code (aka memory NAT) The portptr pointing to the port in the conntrack tuple is declared static, which could result in memory corruption when two packets of the same protocol are NATed at the same time and one conntrack goes away. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-22 12:50:29 -07:00
Patrick McHardy	4c1217deeb	[NETFILTER]: Fix deadlock in ip6_queue Already fixed in ip_queue, ip6_queue was missed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-22 12:49:30 -07:00
David S. Miller	28e212fb36	[PKT_SCHED]: Kill TCF_META_ID_REALDEV from meta ematch. It won't exist any longer when we shrink the SKB in 2.6.14, and we should kill this off before anyone in userspace starts using it. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Thomas Graf <tgraf@suug.ch>	2005-07-22 11:47:25 -07:00
Rusty Russell	4acdbdbe50	[NETFILTER]: ip_conntrack_expect_related must not free expectation If a connection tracking helper tells us to expect a connection, and we're already expecting that connection, we simply free the one they gave us and return success. The problem is that NAT helpers (eg. FTP) have to allocate the expectation first (to see what port is available) then rewrite the packet. If that rewrite fails, they try to remove the expectation, but it was freed in ip_conntrack_expect_related. This is one example of a larger problem: having registered the expectation, the pointer is no longer ours to use. Reference counting is needed for ctnetlink anyway, so introduce it now. To have a single "put" path, we need to grab the reference to the connection on creation, rather than open-coding it in the caller. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-21 13:14:46 -07:00
David S. Miller	b72f6eccb0	[NET]: Fix tc_verd thinko in skb_clone() It was overwriting the computer n->tc_verd value over and over with skb->tc_verd, by mistake. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-19 14:13:54 -07:00
Patrick McHardy	0303770deb	[NET]: Make ipip/ip6_tunnel independant of XFRM Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-19 14:03:34 -07:00
Stephen Hemminger	c877efb207	[IPV4]: Fix up lots of little whitespace indentation stuff in fib_trie. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-19 14:01:51 -07:00
Adrian Bunk	eb3f8f5e22	[NET]: BRIDGE_EBT_ARPREPLY must depend on INET BRIDGE_EBT_ARPREPLY=y and INET=n results in the following compile error: net/built-in.o: In function `ebt_target_reply': ebt_arpreply.c:(.text+0x68fb9): undefined reference to `arp_send' make: *** [.tmp_vmlinux1] Error 1 Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-19 14:00:13 -07:00
Patrick McHardy	abaacad9bc	[IPV4]: Don't select XFRM for ip_gre Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-19 13:59:17 -07:00
Patrick McHardy	6aef4fdfea	[NET]: Only build flow.o if CONFIG_XFRM=y Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-19 13:58:40 -07:00
Jesper Juhl	88e9fa8a54	[ATM]: Trivial spelling fix patch for net/Kconfig Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-19 13:56:53 -07:00
Chas Williams	322361b371	[ATM]: allow bind() on point-to-multpoint svcs (from Martin Whitaker <martin_whitaker@ntlworld.com>) Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-19 13:54:44 -07:00
David S. Miller	3f1c81ff10	[EMATCH]: Kill TCF_META_ID_TCCLASSID reference from meta ematch as well. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-18 17:10:55 -07:00
Adrian Bunk	6876f95f20	[IPV4]: fix IP_FIB_HASH kconfig warning This patch fixes the following kconfig warning: net/ipv4/Kconfig:92:warning: defaults for choice values not supported Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-18 13:55:19 -07:00
Randy Dunlap	54208991e1	[NET]: Kconfig: NETCONSOLE and NETPOLL together Put NETCONSOLE and NETPOLL options together since they are related. This cuts down on the hassle of flipping back and forth between the Networking menu and the Network drivers menu to change their config settings. Tested with menuconfig, gconfig, and xconfig. gconfig has a small problem with this. I think that it's a bug in gconfig and I will take it up with Romain Lievin. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-18 13:45:12 -07:00
Sridhar Samudrala	d1ad1ff299	[SCTP]: Fix potential null pointer dereference while handling an icmp error Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-18 13:44:10 -07:00
Christophe Lucas	ee71a29eb5	[SCTP]: Audit return code of create_proc_* From: Christophe Lucas <clucas@rotomalug.org> Audit return of create_proc_* functions. Signed-off-by: Christophe Lucas <clucas@rotomalug.org> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-18 13:38:07 -07:00
Victor Fusco	37da647d99	[NETLINK]: Fix "nocast type" warnings From: Victor Fusco <victor@cetuc.puc-rio.br> Fix the sparse warning "implicit cast to nocast type" Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-18 13:35:43 -07:00
Thomas Graf	452f299da3	[PKT_SCHED]: Reduce branch mispredictions in pfifo_fast_dequeue The current call to __qdisc_dequeue_head leads to a branch misprediction for every loop iteration, the fact that the most common priority is 2 makes this even worse. This issue has been brought up by Eric Dumazet <dada1@cosmosbay.com> but unlike his solution which was to manually unroll the loop, this approach preserves the possibility to increase the number of bands at compile time. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-18 13:30:53 -07:00
Thomas Graf	d7c7ed4dbc	[PKT_SCHED]: Remove debugging leftover from textsearch ematch Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-18 13:29:49 -07:00
Tommy Christensen	f4637b55ba	[VLAN]: Fix early vlan adding leads to not functional device OK, I can see what's happening here. eth0 doesn't detect link-up until after a few seconds, so when the vlan interface is opened immediately after eth0 has been opened, it inherits the link-down state. Subsequently the vlan interface is never properly activated and are thus unable to transmit any packets. dev->state bits are not supposed to be manipulated directly. Something similar is probably needed for the netif_device_present() bit, although I don't know how this is meant to work for a virtual device. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-12 12:13:49 -07:00
Alexey Dobriyan	ab611487d8	[NET]: __be'ify *_type_trans() tr_type_trans(), hippi_type_trans() left as-is. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-12 12:08:43 -07:00
Phil Oester	84531c24f2	[NETFILTER]: Revert nf_reset change Revert the nf_reset change that caused so much trouble, drop conntrack references manually before packets are queued to packet sockets. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-12 11:57:52 -07:00
Sam Ravnborg	6a2e9b738c	[NET]: move config options out to individual protocols Move the protocol specific config options out to the specific protocols. With this change net/Kconfig now starts to become readable and serve as a good basis for further re-structuring. The menu structure is left almost intact, except that indention is fixed in most cases. Most visible are the INET changes where several "depends on INET" are replaced with a single ifdef INET / endif pair. Several new files were created to accomplish this change - they are small but serve the purpose that config options are now distributed out where they belongs. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-11 21:13:56 -07:00
Sam Ravnborg	d5950b4355	[NET]: add a top-level Networking menu to config Create a new top-level menu named "Networking" thus moving net related options and protocol selection way from the drivers menu and up on the top-level where they belong. To implement this all architectures has to source "net/Kconfig" before drivers//Kconfig in their Kconfig file. This change has been implemented for all architectures. Device drivers for ordinary NIC's are still to be found in the Device Drivers section, but Bluetooth, IrDA and ax25 are located with their corresponding menu entries under the new networking menu item. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-11 21:03:49 -07:00
Olaf Kirch	0b7f22aab4	[IPV4]: Prevent oops when printing martian source In some cases, we may be generating packets with a source address that qualifies as martian. This can happen when we're in the middle of setting up the network, and netfilter decides to reject a packet with an RST. The IPv4 routing code would try to print a warning and oops, because locally generated packets do not have a valid skb->mac.raw pointer at this point. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-11 21:01:42 -07:00
Julian Anastasov	af9debd461	[IPVS]: Add and reorder bh locks after moving to keventd. An addition to the last ipvs changes that move update_defense_level/si_meminfo to keventd: - ip_vs_random_dropentry now runs in process context and should use _bh locks to protect from softirqs - update_defense_level still needs _bh locks after si_meminfo is called, for the same purpose Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-11 20:59:57 -07:00
Jesper Juhl	f5b8adb4f5	[NET]: Trivial spelling fix patch for net/Kconfig Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-11 20:59:03 -07:00
Alexey Dobriyan	3182cd84f0	[SCTP]: __nocast annotations Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-11 20:57:47 -07:00
David S. Miller	79af02c253	[SCTP]: Use struct list_head for chunk lists, not sk_buff_head. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-08 21:47:49 -07:00
David S. Miller	9c05989bb2	[IPV6]: Fix warning in ip6_mc_msfilter. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-08 21:44:39 -07:00
David L Stevens	84b42baef7	[IPV4]: fix IPv4 leave-group group matching This patch fixes the multicast group matching for IP_DROP_MEMBERSHIP, similar to the IP_ADD_MEMBERSHIP fix in a prior patch. Groups are identifiedby <group address,interface> and including the interface address in the match will fail if a leave-group is done by address when the join was done by index, or if different addresses on the same interface are used in the join and leave. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-08 17:48:38 -07:00
David L Stevens	9951f036fe	[IPV4]: (INCLUDE,empty)/leave-group equivalence for full-state MSF APIs & errno fix 1) Adds (INCLUDE, empty)/leave-group equivalence to the full-state multicast source filter APIs (IPv4 and IPv6) 2) Fixes an incorrect errno in the IPv6 leave-group (ENOENT should be EADDRNOTAVAIL) Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-08 17:47:28 -07:00
David L Stevens	917f2f105e	[IPV4]: multicast API "join" issues 1) In the full-state API when imsf_numsrc == 0 errno should be "0", but returns EADDRNOTAVAIL 2) An illegal filter mode change errno should be EINVAL, but returns EADDRNOTAVAIL 3) Trying to do an any-source option without IP_ADD_MEMBERSHIP errno should be EINVAL, but returns EADDRNOTAVAIL 4) Adds comments for the less obvious error return values Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-08 17:45:16 -07:00
David L Stevens	8cdaaa15da	[IPV4]: multicast API "join" issues 1) Changes IP_ADD_SOURCE_MEMBERSHIP and MCAST_JOIN_SOURCE_GROUP to ignore EADDRINUSE errors on a "courtesy join" -- prior membership or not is ok for these. 2) Adds "leave group" equivalence of (INCLUDE, empty) filters in the delta-based API. Without this, mixing delta-based API calls that end in an (INCLUDE, empty) filter would not allow a subsequent regular IP_ADD_MEMBERSHIP. It also frees socket buffer memory that isn't needed for both the multicast group record and source filter. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-08 17:39:23 -07:00
David L Stevens	ca9b907d14	[IPV4]: multicast API "join" issues This patch corrects a few problems with the IP_ADD_MEMBERSHIP socket option: 1) The existing code makes an attempt at reference counting joins when using the ip_mreqn/imr_ifindex interface. Joining the same group on the same socket is an error, whatever the API. This leads to unexpected results when mixing ip_mreqn by index with ip_mreqn by address, ip_mreq, or other API's. For example, ip_mreq followed by ip_mreqn of the same group will "work" while the same two reversed will not. Fixed to always return EADDRINUSE on a duplicate join and removed the (now unused) reference count in ip_mc_socklist. 2) The group-search list in ip_mc_join_group() is comparing a full ip_mreqn structure and all of it must match for it to find the group. This doesn't correctly match a group that was joined with ip_mreq or ip_mreqn with an address (with or without an index). It also doesn't match groups that are joined by different addresses on the same interface. All of these are the same multicast group, which is identified by group address and interface index. Fixed the check to correctly match groups so we don't get duplicate group entries on the ip_mc_socklist. 3) The old code allocates a multicast address before searching for duplicates requiring it to free in various error cases. This patch moves the allocate until after the search and igmp_max_memberships check, so never a need to allocate, then free an entry. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-08 17:38:07 -07:00
Alexey Kuznetsov	4c866aa798	[IPV4]: Apply sysctl_icmp_echo_ignore_broadcasts to ICMP_TIMESTAMP as well. This was the full intention of the original code. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-08 17:34:46 -07:00
Victor Fusco	86a76caf87	[NET]: Fix sparse warnings From: Victor Fusco <victor@cetuc.puc-rio.br> Fix the sparse warning "implicit cast to nocast type" Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-08 14:57:47 -07:00
David S. Miller	b03efcfb21	[NET]: Transform skb_queue_len() binary tests into skb_queue_empty() This is part of the grand scheme to eliminate the qlen member of skb_queue_head, and subsequently remove the 'list' member of sk_buff. Most users of skb_queue_len() want to know if the queue is empty or not, and that's trivially done with skb_queue_empty() which doesn't use the skb_queue_head->qlen member and instead uses the queue list emptyness as the test. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-08 14:57:23 -07:00
KAMBAROV, ZAUR	7e8d7e3c9e	[PATCH] coverity: sunrpc/xprt task null check In __xprt_lock_write() we check to see if `task' is NULL, but in other places we just go and dereference it. `task' shouldn't be NULL anyway, so remove this test. This defect was found automatically by Coverity Prevent, a static analysis tool. Signed-off-by: Zaur Kambarov <zkambarov@coverity.com> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-07-07 18:23:47 -07:00
David S. Miller	908a75c17a	[TCP]: Never TSO defer under periods of congestion. Congestion window recover after loss depends upon the fact that if we have a full MSS sized frame at the head of the send queue, we will send it. TSO deferral can defeat the ACK clocking necessary to exit cleanly from recovery. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:43:58 -07:00
Thomas Graf	63d886c96b	[PKT_SCHED]: Blackhole queueing discipline Useful in combination with classful qdiscs to drop or temporary disable certain flows, e.g. one could block specific ds flows with dsmark. Unlike the noop qdisc it can be controlled by the user and statistic accounting is done. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:29:16 -07:00
David S. Miller	c1b4a7e695	[TCP]: Move to new TSO segmenting scheme. Make TSO segment transmit size decisions at send time not earlier. The basic scheme is that we try to build as large a TSO frame as possible when pulling in the user data, but the size of the TSO frame output to the card is determined at transmit time. This is guided by tp->xmit_size_goal. It is always set to a multiple of MSS and tells sendmsg/sendpage how large an SKB to try and build. Later, tcp_write_xmit() and tcp_push_one() chop up the packet if necessary and conditions warrant. These routines can also decide to "defer" in order to wait for more ACKs to arrive and thus allow larger TSO frames to be emitted. A general observation is that TSO elongates the pipe, thus requiring a larger congestion window and larger buffering especially at the sender side. Therefore, it is important that applications 1) get a large enough socket send buffer (this is accomplished by our dynamic send buffer expansion code) 2) do large enough writes. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:24:38 -07:00
David S. Miller	0d9901df62	[TCP]: Break out send buffer expansion test. This makes it easier to understand, and allows easier tweaking of the heuristic later on. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:21:10 -07:00
David S. Miller	cb83199a29	[TCP]: Do not call tcp_tso_acked() if no work to do. In tcp_clean_rtx_queue(), if the TSO packet is not even partially acked, do not waste time calling tcp_tso_acked(). Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:20:55 -07:00
David S. Miller	a56476962e	[TCP]: Kill bogus comment above tcp_tso_acked(). Everything stated there is out of data. tcp_trim_skb() does adjust the available socket send buffer space and skb->truesize now. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:20:41 -07:00
David S. Miller	b4e26f5ea0	[TCP]: Fix send-side cpu utiliziation regression. Only put user data purely to pages when doing TSO. The extra page allocations cause two problems: 1) Add the overhead of the page allocations themselves. 2) Make us do small user copies when we get to the end of the TCP socket cache page. It is still beneficial to purely use pages for TSO, so we will do it for that case. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:20:27 -07:00
David S. Miller	aa93466bdf	[TCP]: Eliminate redundant computations in tcp_write_xmit(). tcp_snd_test() is run for every packet output by a single call to tcp_write_xmit(), but this is not necessary. For one, the congestion window space needs to only be calculated one time, then used throughout the duration of the loop. This cleanup also makes experimenting with different TSO packetization schemes much easier. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:20:09 -07:00
David S. Miller	7f4dd0a943	[TCP]: Break out tcp_snd_test() into it's constituent parts. tcp_snd_test() does several different things, use inline functions to express this more clearly. 1) It initializes the TSO count of SKB, if necessary. 2) It performs the Nagle test. 3) It makes sure the congestion window is adhered to. 4) It makes sure SKB fits into the send window. This cleanup also sets things up so that things like the available packets in the congestion window does not need to be calculated multiple times by packet sending loops such as tcp_write_xmit(). Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:19:54 -07:00
David S. Miller	55c97f3e99	[TCP]: Fix __tcp_push_pending_frames() 'nonagle' handling. 'nonagle' should be passed to the tcp_snd_test() function as 'TCP_NAGLE_PUSH' if we are checking an SKB not at the tail of the write_queue. This is because Nagle does not apply to such frames since we cannot possibly tack more data onto them. However, while doing this __tcp_push_pending_frames() makes all of the packets in the write_queue use this modified 'nonagle' value. Fix the bug and simplify this function by just calling tcp_write_xmit() directly if sk_send_head is non-NULL. As a result, we can now make tcp_data_snd_check() just call tcp_push_pending_frames() instead of the specialized __tcp_data_snd_check(). Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:19:38 -07:00
David S. Miller	a2e2a59c93	[TCP]: Fix redundant calculations of tcp_current_mss() tcp_write_xmit() uses tcp_current_mss(), but some of it's callers, namely __tcp_push_pending_frames(), already has this value available already. While we're here, fix the "cur_mss" argument to be "unsigned int" instead of plain "unsigned". Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:19:23 -07:00
David S. Miller	92df7b518d	[TCP]: tcp_write_xmit() tabbing cleanup Put the main basic block of work at the top-level of tabbing, and mark the TCP_CLOSE test with unlikely(). Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:19:06 -07:00
David S. Miller	a762a98007	[TCP]: Kill extra cwnd validate in __tcp_push_pending_frames(). The tcp_cwnd_validate() function should only be invoked if we actually send some frames, yet __tcp_push_pending_frames() will always invoke it. tcp_write_xmit() does the call for us, so the call here can simply be removed. Also, tcp_write_xmit() can be marked static. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:18:51 -07:00
David S. Miller	f44b527177	[TCP]: Add missing skb_header_release() call to tcp_fragment(). When we add any new packet to the TCP socket write queue, we must call skb_header_release() on it in order for the TSO sharing checks in the drivers to work. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:18:34 -07:00
David S. Miller	84d3e7b957	[TCP]: Move __tcp_data_snd_check into tcp_output.c It reimplements portions of tcp_snd_check(), so it we move it to tcp_output.c we can consolidate it's logic much easier in a later change. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:18:18 -07:00
David S. Miller	f6302d1d78	[TCP]: Move send test logic out of net/tcp.h This just moves the code into tcp_output.c, no code logic changes are made by this patch. Using this as a baseline, we can begin to untangle the mess of comparisons for the Nagle test et al. We will also be able to reduce all of the redundant computation that occurs when outputting data packets. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:18:03 -07:00
David S. Miller	fc6415bcb0	[TCP]: Fix quick-ack decrementing with TSO. On each packet output, we call tcp_dec_quickack_mode() if the ACK flag is set. It drops tp->ack.quick until it hits zero, at which time we deflate the ATO value. When doing TSO, we are emitting multiple packets with ACK set, so we should decrement tp->ack.quick that many segments. Note that, unlike this case, tcp_enter_cwr() should not take the tcp_skb_pcount(skb) into consideration. That function, one time, readjusts tp->snd_cwnd and moves into TCP_CA_CWR state. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:17:45 -07:00
David S. Miller	c65f7f00c5	[TCP]: Simplify SKB data portion allocation with NETIF_F_SG. The ideal and most optimal layout for an SKB when doing scatter-gather is to put all the headers at skb->data, and all the user data in the page array. This makes SKB splitting and combining extremely simple, especially before a packet goes onto the wire the first time. So, when sk_stream_alloc_pskb() is given a zero size, make sure there is no skb_tailroom(). This is achieved by applying SKB_DATA_ALIGN() to the header length used here. Next, make select_size() in TCP output segmentation use a length of zero when NETIF_F_SG is true on the outgoing interface. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:17:25 -07:00
David Chau	52609c0b56	[NET]: improve readability of dev_set_promiscuity() in net/core/dev.c A trivial patch to improve the readability of dev_set_promiscuity() in net/core/dev.c. New code does exactly the same thing as original code. Signed-off-by: David Chau <ddcc@mit.edu> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:11:06 -07:00
Robert Olsson	2f36895aa7	[IPV4]: More broken memory allocation fixes for fib_trie Below a patch to preallocate memory when doing resize of trie (inflate halve) If preallocations fails it just skips the resize of this tnode for this time. The oops we got when killing bgpd (with full routing) is now gone. Patrick memory patch is also used. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:02:40 -07:00
Thomas Graf	db1322b801	[DECNET]: Fix memset overflow on 64bit archs while dumping decnet routing rules Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:01:25 -07:00
Eric Dumazet	bb1d23b026	[IPV4]: Bug fix in rt_check_expire() - rt_check_expire() fixes (an overflow occured if size of the hash was >= 65536) reminder of the bugfix: The rt_check_expire() has a serious problem on machines with large route caches, and a standard HZ value of 1000. With default values, ie ip_rt_gc_interval = 60*HZ = 60000 ; the loop count : for (t = ip_rt_gc_interval << rt_hash_log; t >= 0; overflows (t is a 31 bit value) as soon rt_hash_log is >= 16 (65536 slots in route cache hash table). In this case, rt_check_expire() does nothing at all Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:00:32 -07:00
Eric Dumazet	424c4b70cc	[IPV4]: Use the fancy alloc_large_system_hash() function for route hash table - rt hash table allocated using alloc_large_system_hash() function. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:58:19 -07:00
Eric Dumazet	22c047ccbc	[NET]: Hashed spinlocks in net/ipv4/route.c - Locking abstraction - Spinlocks moved out of rt hash table : Less memory (50%) used by rt hash table. it's a win even on UP. - Sizing of spinlocks table depends on NR_CPUS Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:55:24 -07:00
Patrick McHardy	f0e36f8cee	[IPV4]: Handle large allocations in fib_trie Inflating a node a couple of times makes it exceed the 128k kmalloc limit. Use __get_free_pages for allocations > PAGE_SIZE, as in fib_hash. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Robert Olsson <Robert.Olsson@data.slu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:44:55 -07:00
Herbert Xu	e2ed4052aa	[IPV6]: Makes IPv6 rcv registration happen last during initialisation. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:41:20 -07:00
Herbert Xu	30e224d76f	[IPV4]: Fix crash in ip_rcv while booting related to netconsole Makes IPv4 ip_rcv registration happen last in af_inet. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:40:10 -07:00
Thomas Graf	023e09a767	[PKT_SCHED]: Report rate estimator configuration errors during qdisc allocation Current behaviour is to not report an error if a rate estimator is created together with a qdisc and the configuration of the rate estimator is bogus. This leads to unexpected behaviour because the user is not notified. New behaviour is to report the error and let the whole qdisc creation operation fail so the user is able to fix his mistake. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:15:53 -07:00
Thomas Graf	3d54b82fdf	[PKT_SCHED]: Cleanup qdisc creation and alignment macros Adds qdisc_alloc() to share code between qdisc_create() and qdisc_create_dflt(). Hides the qdisc alignment behind macros and makes use of them. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:15:09 -07:00
Thomas Graf	e176fe8954	[NET]: Remove unused security member in sk_buff Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:12:44 -07:00
Patrick McHardy	3154e540e3	[NET]: net/core/filter.c: make len cover the entire packet As suggested by Herbert Xu: Since we don't require anything to be in the linear packet range anymore make len cover the entire packet. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:10:40 -07:00
Patrick McHardy	0b05b2a49e	[NET]: Consolidate common code in net/core/filter.c Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:10:21 -07:00
Patrick McHardy	6935d46c2d	[NET]: Remove redundant code in net/core/filter.c skb_header_pointer handles linear and non-linear data, no need to handle linear data again. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:08:57 -07:00
Patrick McHardy	9666dae510	[NETFILTER]: Fix connection tracking bug in 2.6.12 In 2.6.12 we started dropping the conntrack reference when a packet leaves the IP layer. This broke connection tracking on a bridge, because bridge-netfilter defers calling some NF_IP_* hooks to the bridge layer for locally generated packets going out a bridge, where the conntrack reference is no longer available. This patch keeps the reference in this case as a temporary solution, long term we will remove the defered hook calling. No attempt is made to drop the reference in the bridge-code when it is no longer needed, tc actions could already have sent the packet anywhere. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 16:04:44 -07:00
Denis Vlasenko	ff593c592a	[NET]: Micro optimization in eth_header() Signed-off-by: Denis Vlasenko <vda@ilport.com.ua> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 15:49:06 -07:00
YOSHIFUJI Hideaki	7fe40f73d7	[IPV6]: remove more unused IPV6_AUTHHDR things. Remove two more unused IPV6_AUTHHDR option things, which I failed to remove them last time, plus, mark IPV6_AUTHHDR obsolete. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 15:46:24 -07:00
Neil Horman	fb3d89498d	[IPVS]: Close race conditions on ip_vs_conn_tab list modification In an smp system, it is possible for an connection timer to expire, calling ip_vs_conn_expire while the connection table is being flushed, before ct_write_lock_bh is acquired. Since the list iterator loop in ip_vs_con_flush releases and re-acquires the spinlock (even though it doesn't re-enable softirqs), it is possible for the expiration function to modify the connection list, while it is being traversed in ip_vs_conn_flush. The result is that the next pointer gets set to NULL, and subsequently dereferenced, resulting in an oops. Signed-off-by: Neil Horman <nhorman@redhat.com> Acked-by: JulianAnastasov Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 15:40:02 -07:00
Robert Olsson	f835e471b5	[IPV4]: Broken memory allocation in fib_trie This should help up the insertion... but the resize is more crucial. and complex and needs some thinking. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 15:00:39 -07:00
Vlad Yasevich	2f85a42964	[SCTP] Make init & delayed sack timeouts configurable by user. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 13:24:23 -07:00
Maxime Bizon	7a1af5d7bb	[IPV4]: ipconfig.c: fix dhcp timeout behaviour I think there is a small bug in ipconfig.c in case IPCONFIG_DHCP is set and dhcp is used. When a DHCPOFFER is received, ip address is kept until we get DHCPACK. If no ack is received, ic_dynamic() returns negatively, but leaves the offered ip address in ic_myaddr. This makes the main loop in ip_auto_config() break and uses the maybe incomplete configuration. Not sure if it's the best way to do, but the following trivial patch correct this. Signed-off-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 13:21:12 -07:00
Dietmar Eggemann	2c2910a401	[IPV4]: Snmpv2 Mib IP counter ipInAddrErrors support I followed Thomas' proposal to see every martian destination as a case where the ipInAddrErrors counter has to be incremented. There are two advantages by doing so: (1) The relation between the ipInReceive counter and all the other ipInXXX counters is more accurate in the case the RTN_UNICAST code check fails and (2) it makes the code in ip_route_input_slow easier. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 13:06:23 -07:00
YOSHIFUJI Hideaki	ae9cda5d65	[IPV6]: Don't dump temporary addresses twice Each IPv6 Temporary Address (w/ CONFIG_IPV6_PRIVACY) is dumped twice to netlink. Because temporary addresses are listed in idev->addr_list, there's no need to dump idev->tempaddr separately. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 13:00:30 -07:00
Patrick McHardy	8a47077a0b	[NETLINK]: Missing padding fields in dumped structures Plug holes with padding fields and initialized them to zero. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 12:56:45 -07:00
Patrick McHardy	9ef1d4c7c7	[NETLINK]: Missing initializations in dumped data Mostly missing initialization of padding fields of 1 or 2 bytes length, two instances of uninitialized nlmsgerr->msg of 16 bytes length. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 12:55:30 -07:00
Patrick McHardy	b3563c4fbf	[NETLINK]: Clear padding in netlink messages Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 12:54:43 -07:00
Harald Welte	4095ebf1e6	[NETFILTER]: ipt_CLUSTERIP: fix ARP mangling This patch adds mangling of ARP requests (in addition to replies), since ARP caches are made from snooping both requests and replies. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 12:49:30 -07:00
David S. Miller	85c1937b26	[EBTABLES]: Fix thinkos in ebt_log.c When converting over the skb_header_pointer(), I converted parts of this module incorrectly. Kill the 'u' union in ebt_log() and all the bogus references to it. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 12:39:40 -07:00
pageexec	4da62fc70d	[IPVS]: Fix for overflows From: <pageexec@freemail.hu> $subject was fixed in 2.4 already, 2.6 needs it as well. The impact of the bugs is a kernel stack overflow and privilege escalation from CAP_NET_ADMIN via the IP_VS_SO_SET_STARTDAEMON/IP_VS_SO_GET_DAEMON ioctls. People running with 'root=all caps' (i.e., most users) are not really affected (there's nothing to escalate), but SELinux and similar users should take it seriously if they grant CAP_NET_ADMIN to other users. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-26 16:00:19 -07:00
David S. Miller	d470e3b483	[NETLINK]: Fix two socket hashing bugs. 1) netlink_release() should only decrement the hash entry count if the socket was actually hashed. This was causing hash->entries to underflow, which resulting in all kinds of troubles. On 64-bit systems, this would cause the following conditional to erroneously trigger: err = -ENOMEM; if (BITS_PER_LONG > 32 && unlikely(hash->entries >= UINT_MAX)) goto err; 2) netlink_autobind() needs to propagate the error return from netlink_insert(). Otherwise, callers will not see the error as they should and thus try to operate on a socket with a zero pid, which is very bad. However, it should not propagate -EBUSY. If two threads race to autobind the socket, that is fine. This is consistent with the autobind behavior in other protocols. So bug #1 above, combined with this one, resulted in hangs on netlink_sendmsg() calls to the rtnetlink socket. We'd try to do the user sendmsg() with the socket's pid set to zero, later we do a socket lookup using that pid (via the value we stashed away in NETLINK_CB(skb).pid), but that won't give us the user socket, it will give us the rtnetlink socket. So when we try to wake up the receive queue, we dive back into rtnetlink_rcv() which tries to recursively take the rtnetlink semaphore. Thanks to Jakub Jelink for providing backtraces. Also, thanks to Herbert Xu for supplying debugging patches to help track this down, and also finding a mistake in an earlier version of this fix. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-26 15:31:51 -07:00
Robert Olsson	64053beeb5	[PKTGEN]: Fix random packet sizes causing panic Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-26 15:27:10 -07:00
Adrian Bunk	60fe740320	[TCP]: Let TCP_CONG_ADVANCED default to n It doesn't seem to make much sense to let an "If unsure, say N." option default to y. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-26 15:21:15 -07:00
David S. Miller	6c3607676c	[IPV4]: Fix thinko in TCP_CONG_BIC default. Since it is tristate when we offer it as a choice, we should definte it also as tristate when forcing it as the default. Otherwise kconfig warns. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-26 15:20:20 -07:00
Linus Torvalds	2031d0f586	Merge Christoph's freeze cleanup patch	2005-06-25 17:16:53 -07:00
Christoph Lameter	3e1d1d28d9	[PATCH] Cleanup patch for process freezing 1. Establish a simple API for process freezing defined in linux/include/sched.h: frozen(process) Check for frozen process freezing(process) Check if a process is being frozen freeze(process) Tell a process to freeze (go to refrigerator) thaw_process(process) Restart process frozen_process(process) Process is frozen now 2. Remove all references to PF_FREEZE and PF_FROZEN from all kernel sources except sched.h 3. Fix numerous locations where try_to_freeze is manually done by a driver 4. Remove the argument that is no longer necessary from two function calls. 5. Some whitespace cleanup 6. Clear potential race in refrigerator (provides an open window of PF_FREEZE cleared before setting PF_FROZEN, recalc_sigpending does not check PF_FROZEN). This patch does not address the problem of freeze_processes() violating the rule that a task may only modify its own flags by setting PF_FREEZE. This is not clean in an SMP environment. freeze(process) is therefore not SMP safe! Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 17:10:13 -07:00
David S. Miller	c54d7e03c3	[SUNRPC]: Fix {s,}size_t printf format strings in xprt.c Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-24 19:57:07 -07:00
David S. Miller	a6484045fd	[TCP]: Do not present confusing congestion control options by default. Create TCP_CONG_ADVANCED option, akin to IP_ADVANCED_ROUTER, which when disabled will bypass all of the congestion control Kconfig options and leave the user with a safe default. That safe default is currently BIC-TCP with new Reno as a fallback. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-24 18:07:51 -07:00
David S. Miller	bb298ca3ce	[IPV4]: Move FIB lookup algorithm choice under IP_ADVANCED_ROUTING Most users need not be concerned with a complex choice of what FIB lookup algorithm to use. So give them the safe default of IP_FIB_HASH if IP_ADVANCED_ROUTING is disabled. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-24 17:50:53 -07:00
David S. Miller	f7704347a7	[PKT_SCHED]: Make TEXTSEARCH* options only selected. Do not present these confusing new options to the user unless he picked some facility that makes use of it, such as NET_EMATCH_TEXT. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-24 17:39:03 -07:00
Linus Torvalds	59a49e3871	Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	2005-06-24 00:31:46 -07:00
Adrian Bunk	52c1da3953	[PATCH] make various thing static Another rollup of patches which give various symbols static scope Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:43 -07:00
NeilBrown	5ba266d632	[PATCH] knfsd: nfsd4: fix probe_callback rpc_create_client was modified recently to do its own (synchronous) NULL ping of the server. We'd rather do that on our own, asynchronously, so that we don't have to block the nfsd thread doing the probe, and so that setclientid handling (hence, client mounts) can proceed normally whether the callback is succesful or not. (We can still function fine without the callback channel--we just won't be able to give out delegations till it's verified to work.) Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:30 -07:00
David S. Miller	f2d368fa3e	[PKT_SCHED]: Make NET_EMATCH_TEXT select TEXTSEARCh Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 23:55:41 -07:00
Thomas Graf	d675c989ed	[PKT_SCHED]: Packet classification based on textsearch (ematch) Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 21:00:58 -07:00
Thomas Graf	3fc7e8a6d8	[NET]: skb_find_text() - Find a text pattern in skb data Finds a pattern in the skb data according to the specified textsearch configuration. Use textsearch_next() to retrieve subsequent occurrences of the pattern. Returns the offset to the first occurrence or UINT_MAX if no match was found. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 21:00:17 -07:00
Thomas Graf	677e90eda3	[NET]: Zerocopy sequential reading of skb data Implements sequential reading for both linear and non-linear skb data at zerocopy cost. The data is returned in chunks of arbitary length, therefore random access is not possible. Usage: from := 0 to := 128 state := undef data := undef len := undef consumed := 0 skb_prepare_seq_read(skb, from, to, &state) while (len = skb_seq_read(consumed, &data, &state)) != 0 do /* do something with 'data' of length 'len' / if abort then / abort read if we don't wait for * skb_seq_read() to return 0 / skb_abort_seq_read(&state) return endif / not necessary to consume all of 'len' */ consumed += len done Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:59:51 -07:00
Stephen Hemminger	5f8ef48d24	[TCP]: Allow choosing TCP congestion control via sockopt. Allow using setsockopt to set TCP congestion control to use on a per socket basis. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:37:36 -07:00
Stephen Hemminger	51b0bdedb8	[NET]: Separate two usages of netdev_max_backlog. Separate out the two uses of netdev_max_backlog. One controls the upper bound on packets processed per softirq, the new name for this is netdev_budget; the other controls the limit on packets queued via netif_rx. Increase the max_backlog default to account for faster processors. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:14:40 -07:00
Stephen Hemminger	31aa02c53c	[NET]: Eliminate netif_rx massive packet drops. Eliminate the throttling behaviour when the netif receive queue fills because it behaves badly when using high speed networks under load. The throttling cause multiple packet drops that cause TCP to go into slow start mode. The same effective patch has been part of BIC TCP and H-TCP as well as part of Web100. The existing code drops 100's of packets when the queue fills; this changes it to individual packet drop-tail. Signed-off-by: Stephen Hemmminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:12:48 -07:00
Stephen Hemminger	34008d8c63	[NET]: Remove obsolete netif_rx congestion sensing mechanism. Remove the congestion sensing mechanism from netif_rx, and always return either full or empty. Almost no driver checks the return value from netif_rx, and those that do only use it for debug messages. The original design of netif_rx was to do flow control based on the receive queue, but NAPI has supplanted this and no driver uses the feedback. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:10:00 -07:00
Stephen Hemminger	c1ebcdb8c4	[NET]: Remove obsolete fastroute stats. Remove last vestiages of fastroute code that is no longer used. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:08:59 -07:00
John Heffner	0e57976b63	[TCP]: Add Scalable TCP congestion control module. This patch implements Tom Kelly's Scalable TCP congestion control algorithm for the modular framework. The algorithm has some nice scaling properties, and has been used a fair bit in research, though is known to have significant fairness issues, so it's not really suitable for general purpose use. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:29:07 -07:00
Baruch Even	a7868ea68d	[TCP]: Add H-TCP congestion control module. H-TCP is a congestion control algorithm developed at the Hamilton Institute, by Douglas Leith and Robert Shorten. It is extending the standard Reno algorithm with mode switching is thus a relatively simple modification. H-TCP is defined in a layered manner as it is still a research platform. The basic form includes the modification of beta according to the ratio of maxRTT to min RTT and the alpha=2factor(1-beta) relation, where factor is dependant on the time since last congestion. The other layers improve convergence by adding appropriate factors to alpha. The following patch implements the H-TCP algorithm in it's basic form. Signed-Off-By: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:28:11 -07:00
Stephen Hemminger	b87d8561d8	[TCP]: Add TCP Vegas congestion control module. TCP Vegas code modified for the new TCP infrastructure. Vegas now uses microsecond resolution timestamps for better estimation of performance over higher speed links. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:27:19 -07:00
Daniele Lacamera	835b3f0c0d	[TCP]: Add TCP Hybla congestion control module. TCP Hybla congestion avoidance. - "In heterogeneous networks, TCP connections that incorporate a terrestrial or satellite radio link are greatly disadvantaged with respect to entirely wired connections, because of their longer round trip times (RTTs). To cope with this problem, a new TCP proposal, the TCP Hybla, is presented and discussed in the paper[1]. It stems from an analytical evaluation of the congestion window dynamics in the TCP standard versions (Tahoe, Reno, NewReno), which suggests the necessary modifications to remove the performance dependence on RTT.[...]"[1] [1]: Carlo Caini, Rosario Firrincieli, "TCP Hybla: a TCP enhancement for heterogeneous networks", International Journal of Satellite Communications and Networking Volume 22, Issue 5 , Pages 547 - 566. September 2004. Signed-off-by: Daniele Lacamera (root at danielinux.net)net Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:26:34 -07:00
John Heffner	a628d29b56	[TCP]: Add High Speed TCP congestion control module. Sally Floyd's high speed TCP congestion control. This is useful for comparison and research. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:24:58 -07:00
Stephen Hemminger	8727076289	[TCP]: Add TCP Westwood congestion control module. This is the existing 2.6.12 Westwood code moved from tcp_input to the new congestion framework. A lot of the inline functions have been eliminated to try and make it clearer. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:24:09 -07:00
Stephen Hemminger	83803034f4	[TCP]: Add TCP BIC congestion control module. TCP BIC congestion control reworked to use the new congestion control infrastructure. This version is more up to date than the BIC code in 2.6.12; it incorporates enhancements from BICTCP 1.1, to handle low latency links. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:23:25 -07:00
Stephen Hemminger	056ede6cfa	[TCP]: Report congestion control algorithm in tcp_diag. Enhancement to the tcp_diag interface used by the iproute2 ss command to report the tcp congestion control being used by a socket. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:21:28 -07:00
Stephen Hemminger	7c99c909fa	[TCP]: Change tcp_diag to use the existing __RTA_PUT() macro. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:20:36 -07:00
Stephen Hemminger	317a76f9a4	[TCP]: Add pluggable congestion control algorithm infrastructure. Allow TCP to have multiple pluggable congestion control algorithms. Algorithms are defined by a set of operations and can be built in or modules. The legacy "new RENO" algorithm is used as a starting point and fallback. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:19:55 -07:00

... 2 3 4 5 6 ...

581 Commits