korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-18 17:54:13 +08:00

Author	SHA1	Message	Date
Jarek Poplawski	7cd0a63872	pkt_sched: Change misleading code in class delete. While looking for a possible reason of bugzilla report on HTB oops: http://bugzilla.kernel.org/show_bug.cgi?id=12858 I found the code in htb_delete calling htb_destroy_class on zero refcount is very misleading: it can suggest this is a common path, and destroy is called under sch_tree_lock. Actually, this can never happen like this because before deletion cops->get() is done, and after delete a class is still used by tclass_notify. The class destroy is always called from cops->put(), so without sch_tree_lock. This doesn't mean much now (since 2.6.27) because all vulnerable calls were moved from htb_destroy_class to htb_delete, but there was a bug in older kernels. The same change is done for other classful scheds, which, it seems, didn't have similar locking problems here. Reported-by: m0sia <m0sia@m0sia.ru> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-03-15 20:00:19 -07:00
David S. Miller	508827ff0a	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/tokenring/tmspci.c drivers/net/ucc_geth_mii.c	2009-03-05 02:06:47 -08:00
Jarek Poplawski	a883bf564e	pkt_sched: act_police: Fix a rate estimator test. A commit `c1b56878fb` "tc: policing requires a rate estimator" introduced a test which invalidates previously working configs, based on examples from iproute2: doc/actions/actions-general. This is too rigorous: a rate estimator is needed only when police's "avrate" option is used. Reported-by: Joao Correia <joaomiguelcorreia@gmail.com> Diagnosed-by: John Dykstra <john.dykstra1@gmail.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-03-04 17:38:10 -08:00
David S. Miller	aa4abc9bcc	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl-tx.c net/8021q/vlan_core.c net/core/dev.c	2009-03-01 21:35:16 -08:00
Jarek Poplawski	1844f74794	pkt_sched: sch_drr: Fix oops in drr_change_class. drr_change_class lacks a check for NULL of tca[TCA_OPTIONS], so oops is possible. Reported-by: Denys Fedoryschenko <denys@visp.net.lb> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-27 02:42:38 -08:00
Jarek Poplawski	149490f131	pkt_sched: sch_multiq: Change errno on non-multiqueue devices use. Current "RTNETLINK answers: Invalid argument" warning, while trying to add multiq qdisc to non-multiqueue device, isn't very helpful and some of these devs can be changed btw., so let's use a better errno. With feedback from Stephen Hemminger <shemminger@vyatta.com> Reported-by: Badalian Vyacheslav <slavon@bigtelecom.ru> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-10 00:11:21 -08:00
Jarek Poplawski	1224736d97	pkt_sched: sch_htb: Use workqueue to schedule after too many events. Patrick McHardy <kaber@trash.net> suggested using a workqueue instead of hrtimers to trigger netif_schedule() when there is a problem with setting exact time of this event: 'The differnce - yeah, it shouldn't make much, mainly wake up the qdisc earlier (but not too early) after "too many events" occured _and_ no further enqueue events wake up the qdisc anyways.' Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-01 01:13:22 -08:00
Jarek Poplawski	e82181de5e	pkt_sched: sch_htb: Warn on too many events. Let's get some info on possible config problems. This patch brings back an old warning, but is printed only once now. With feedback from Patrick McHardy <kaber@trash.net> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-01 01:13:05 -08:00
Jarek Poplawski	b00355db3f	pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving warning handler. Patrick McHardy <kaber@trash.net> suggested: > How about making this flag and the warning message (in a out-of-line > function) globally available? Other qdiscs (f.i. HFSC) can't deal with > inner non-work-conserving qdiscs as well. This patch uses qdisc->flags field of "suspected" child qdisc. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-01 01:12:42 -08:00
Jarek Poplawski	a73be04065	pkt_sched: sch_htb: Break all htb_do_events() after 2 jiffies Currently htb_do_events() breaks events recounting for a level after 2 jiffies, but there is no reason to repeat this for next levels and increase delays even more (with softirqs disabled). htb_dequeue_tree() can add to this too, btw. In such a case q->now time is invalid anyway. Thanks to Patrick McHardy for spotting an error around earlier version of this patch. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-12 21:54:40 -08:00
Jarek Poplawski	c085134719	pkt_sched: sch_htb: Consider used jiffies in htb_do_events() Next event time should consider jiffies used for recounting. Otherwise qdisc_watchdog_schedule() triggers hrtimer immediately with the event in the past, and may cause very high ksoftirqd cpu usage (if highres is on). There is also removed checking "event" for zero in htb_dequeue(): it's always true in this place. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-12 21:54:16 -08:00
Linus Torvalds	5fbbf5f648	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (84 commits) wimax: fix kernel-doc for debufs_dentry member of struct wimax_dev net: convert pegasus driver to net_device_ops bnx2x: Prevent eeprom set when driver is down net: switch kaweth driver to netdevops pcnet32: round off carrier watch timer i2400m/usb: wrap USB power saving in #ifdef CONFIG_PM wimax: testing for rfkill support should also test for CONFIG_RFKILL_MODULE wimax: fix kconfig interactions with rfkill and input layers wimax: fix '#ifndef CONFIG_BUG' layout to avoid warning r6040: bump release number to 0.20 r6040: warn about MAC address being unset r6040: check PHY status when bringing interface up r6040: make printks consistent with DRV_NAME gianfar: Fixup use of BUS_ID_SIZE mlx4_en: Returning real Max in get_ringparam mlx4_en: Consider inline packets on completion netdev: bfin_mac: enable bfin_mac net dev driver for BF51x qeth: convert to net_device_ops vlan: add neigh_setup dm9601: warn on invalid mac address ...	2009-01-08 14:25:41 -08:00
Fernando Carrijo	c19a28e119	remove lots of double-semicolons Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Theodore Ts'o <tytso@mit.edu> Acked-by: Mark Fasheh <mfasheh@suse.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: James Morris <jmorris@namei.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-08 08:31:14 -08:00
Stephen Hemminger	61294e2e27	sch_teql: convert to net_device_ops Convert this driver to net_device_ops. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-06 10:45:57 -08:00
Jarek Poplawski	6f57321422	pkt_sched: cls_u32: Fix locking in u32_change() New nodes are inserted in u32_change() under rtnl_lock() with wmb(), so without tcf_tree_lock() like in other classifiers (e.g. cls_fw). This isn't enough without rmb() on the read side, but on the other hand adding such barriers doesn't give any savings, so the lock is added instead. Reported-by: m0sia <m0sia@plotinka.ru> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-05 18:14:19 -08:00
David S. Miller	c276e098d3	Revert "net: Fix for initial link state in 2.6.28" This reverts commit `22604c8668`. We can't fix this issue in this way, because we now can try to take the dev_base_lock rwlock as a writer in software interrupt context and that is not allowed without major surgery elsewhere. This initial link state problem needs to be solved in some other way. Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-05 16:01:51 -08:00
Michael Marineau	22604c8668	net: Fix for initial link state in 2.6.28 From: Michael Marineau <mike@marineau.org> Commit `b47300168e` "Do not fire linkwatch events until the device is registered." was made as a workaround for drivers that call netif_carrier_off before registering the device. Unfortunately this causes these drivers to incorrectly report their link status as IF_OPER_UNKNOWN which can falsely set the IFF_RUNNING flag when the interface is first brought up. This issue was previously pointed out[1] but was dismissed saying that IFF_RUNNING is not related to the link status. From my digging IFF_RUNNING, as reported to userspace, is based on the link state. It is set based on __LINK_STATE_START and IF_OPER_UP or IF_OPER_UNKNOWN. See [2], [3], and [4]. (Whether or not the kernel has IFF_RUNNING set in flags is not reported to user space so it may well be independent of the link, I don't know if and when it may get set.) The end result varies slightly depending on the driver. The two I tested were e1000e and b44. With e1000e if the system is booted without a network cable attached the interface will falsely report RUNNING when it is brought up causing NetworkManager to attempt to start it and eventually time out. With b44 when the system is booted with a network cable attached and brought up with dhcpcd it will time out the first time. The attached patch will still set the operstate variable correctly to IF_OPER_UP/DOWN/etc when linkwatch_fire_event is called but then return rather than skipping the linkwatch_fire_event call entirely as the previous fix did. (sorry it isn't inline, I don't have a patch friendly email client at the moment) Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-04 17:18:51 -08:00
Li Zefan	68ce9c0e34	cls_cgroup: clean up Kconfig cls_cgroup can't be compiled as a module, since it's not supported by cgroup. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-29 19:40:46 -08:00
Li Zefan	8e8ba85417	cls_cgroup: clean up for cgroup part - It's better to use container_of() instead of casting cgroup_subsys_state * to cgroup_cls_state *. - Add helper function task_cls_state(). - Rename net_cls_state() to cgrp_cls_state(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-29 19:40:45 -08:00
Li Zefan	2f068bf871	cls_cgroup: fix an oops when removing a cgroup When removing a cgroup, an oops was triggered immediately. The cause is wrong kfree() in cgrp_destroy(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-29 19:40:44 -08:00
Linus Torvalds	0191b625ca	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits) net: Allow dependancies of FDDI & Tokenring to be modular. igb: Fix build warning when DCA is disabled. net: Fix warning fallout from recent NAPI interface changes. gro: Fix potential use after free sfc: If AN is enabled, always read speed/duplex from the AN advertising bits sfc: When disabling the NIC, close the device rather than unregistering it sfc: SFT9001: Add cable diagnostics sfc: Add support for multiple PHY self-tests sfc: Merge top-level functions for self-tests sfc: Clean up PHY mode management in loopback self-test sfc: Fix unreliable link detection in some loopback modes sfc: Generate unique names for per-NIC workqueues 802.3ad: use standard ethhdr instead of ad_header 802.3ad: generalize out mac address initializer 802.3ad: initialize ports LACPDU from const initializer 802.3ad: remove typedef around ad_system 802.3ad: turn ports is_individual into a bool 802.3ad: turn ports is_enabled into a bool 802.3ad: make ntt bool ixgbe: Fix set_ringparam in ixgbe to use the same memory pools. ... Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due to the conversion to %pI (in this networking merge) and the addition of doing IPv6 addresses (from the earlier merge of CIFS).	2008-12-28 12:49:40 -08:00
James Morris	cbacc2c7f0	Merge branch 'next' into for-linus	2008-12-25 11:40:09 +11:00
Jarek Poplawski	05a8c1cbfe	pkt_sched: Remove smp_wmb() in qdisc_watchdog() While implementing a TCQ_F_THROTTLED flag there was used an smp_wmb() in qdisc_watchdog(), but since this flag is practically used only in sch_netem(), and since it's not even clear what reordering is avoided here (TCQ_F_THROTTLED vs. __QDISC_STATE_SCHED?) it seems the barrier could be safely removed. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-22 19:44:13 -08:00
Jarek Poplawski	7f3ff4f63f	pkt_sched: Annotate uninitialized var in sfq_enqueue() Some gcc versions warn that ret may be used uninitialized in sfq_enqueue(). It's a false positive, so let's annotate this. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-21 20:14:48 -08:00
David S. Miller	eb14f01959	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/e1000e/ich8lan.c	2008-12-15 20:03:50 -08:00
Jesper Dangaard Brouer	eb9b851b98	SCHED: netem: Correct documentation comment in code. The netem simulator is no longer limited by Linux timer resolution HZ. Not since Patrick McHardy changed the QoS system to use hrtimer. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-15 00:39:17 -08:00
Jarek Poplawski	512bb43eb5	pkt_sched: sch_htb: Optimize WARN_ONs in htb_dequeue_tree() etc. We can skip WARN_ON() in htb_dequeue_tree() because there should always be a similar warning from htb_lookup_leaf() earlier. The first WARN_ON() in htb_lookup_leaf() is changed to BUG_ON() because most likely this should end with an oops anyway. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-09 22:35:02 -08:00
Jarek Poplawski	1b5c0077e1	pkt_sched: sch_htb: Optimize htb_find_next_upper() htb_id_find_next_upper() is usually called to find a class with next id after some previously removed class, so let's move a check for equality to the end: it's the least likely here. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-09 22:34:40 -08:00
James Morris	ec98ce480a	Merge branch 'master' into next Conflicts: fs/nfsd/nfs4recover.c Manually fixed above to use new creds API functions, e.g. nfs4_save_creds(). Signed-off-by: James Morris <jmorris@namei.org>	2008-12-04 17:16:36 +11:00
Jarek Poplawski	59e4220a11	pkt_sched: sch_htb: Replace HTB_ACCNT() macro with inlines Replace HTB_ACCNT() macro with inlines to make it more readable. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-03 21:17:27 -08:00
Jarek Poplawski	23cb913d25	pkt_sched: sch_htb: Remove L2T() L2T() is currently used only in one place (and has one spurious parameter, btw), so let's: 'get rid of L2T completely, and just use "qdisc_l2t(rate, size)" directly.' - quote & feedback from David S. Miller. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-03 21:16:58 -08:00
Jarek Poplawski	c19f7a34f7	pkt_sched: sch_htb: Clean htb_class prio and quantum fields While implementing htb_parent_to_leaf(), backup prio and quantum fields were added to struct htb_class to preserve these values for inner classes in case of their return to leaf. This patch cleans this up a bit by removing the union leaf duplicates. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-03 21:09:45 -08:00
Jarek Poplawski	633fe66ed8	pkt_sched: sch_htb: Remove htb_sched nwc_hit field Remove practically unused struct htb_sched nwc_hit field. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-03 21:09:10 -08:00
Jarek Poplawski	4164d661b8	pkt_sched: sch_htb: Remove htb_class aprio field Remove practically unused struct htb_class aprio field. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-03 21:08:44 -08:00
Hannes Eder	6113b748fb	pkt_sched: fix sparse warning Impact: make global function static Fix the following sparse warning: net/sched/sch_api.c:192:14: warning: symbol 'qdisc_match_from_root' was not declared. Should it be static? Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-28 03:06:46 -08:00
Jarek Poplawski	244e6c2d07	pkt_sched: gen_estimator: Optimize gen_estimator_active() Since all other gen_estimator functions use bstats and rate_est params together, and searching for them is optimized now, let's use this also in gen_estimator_active(). The return type of gen_estimator_active() is changed to bool, and gen_find_node() parameters to const, btw. In tcf_act_police_locate() a check for ACT_P_CREATED is added before calling gen_estimator_active(). Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-26 15:24:32 -08:00
Stephen Hemminger	c1b56878fb	tc: policing requires a rate estimator Found that while trying average rate policing, it was possible to request average rate policing without a rate estimator. This results in no policing which is harmless but incorrect. Since policing could be setup in two steps, need to check in the kernel. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-25 21:14:06 -08:00
Stephen Hemminger	71bcb09a57	tc: check for errors in gen_rate_estimator creation The functions gen_new_estimator and gen_replace_estimator can return errors, but they were being ignored. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-25 21:13:31 -08:00
Stephen Hemminger	0e991ec6a0	tc: propogate errors from tcf_hash_create Allow tcf_hash_create to return different errors on estimator failure. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-25 21:13:25 -08:00
Ingo Molnar	dc0a0011cf	pkt_sched: fix warning in net/sched/sch_hfsc.c this warning: net/sched/sch_hfsc.c: In function ‘hfsc_enqueue’: net/sched/sch_hfsc.c:1577: warning: ‘err’ may be used uninitialized in this function triggers because GCC does not recognize the (correct) error flow between hfsc_classify(), 'cl' and 'err'. Annotate it. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-25 16:50:02 -08:00
Jarek Poplawski	f6486d40b3	pkt_sched: sch_api: Remove qdisc_list_lock After implementing qdisc->ops->peek() there is no more calling qdisc_tree_decrease_qlen() without rtnl_lock(), so qdisc_list_lock added by commit: `f6e0b239a2` "pkt_sched: Fix qdisc list locking" can be removed. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-25 13:56:06 -08:00
Patrick McHardy	3f0947c3ff	pkt_sched: sch_drr: fix drr_dequeue loop() Jarek Poplawski points out: If all child qdiscs of sch_drr are non-work-conserving (e.g. sch_tbf) drr_dequeue() will busy-loop waiting for skbs instead of leaving the job for a watchdog. Checking for list_empty() in each loop isn't necessary either, because this can never be true except the first time. Using non-work-conserving qdiscs as children of DRR makes no sense, simply bail out in that case. Reported-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-24 15:46:08 -08:00
Jarek Poplawski	98aa9c80f1	pkt_sched: sch_drr: Fix qlen in drr_drop() Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-21 04:37:27 -08:00
David S. Miller	6ab33d5171	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/ixgbe/ixgbe_main.c include/net/mac80211.h net/phonet/af_phonet.c	2008-11-20 16:44:00 -08:00
Patrick McHardy	47a1a1d4be	pkt_sched: remove unnecessary xchg() in packet classifiers The use of xchg() hasn't been necessary since 2.2.something when proper locking was added to packet schedulers. In the case of classifiers they mostly weren't even necessary before that since they're mainly used to assign a NULL pointer to the filter root in the ->destroy path; the root is destroyed immediately after that. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-20 04:14:28 -08:00
Patrick McHardy	b94c8afcba	pkt_sched: remove unnecessary xchg() in packet schedulers The use of xchg() hasn't been necessary since 2.2.something when proper locking was added to packet schedulers. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-20 04:11:36 -08:00
Patrick McHardy	13d2a1d2b0	pkt_sched: add DRR scheduler Add classful DRR scheduler as a more flexible replacement for SFQ. The main difference to the algorithm described in "Efficient Fair Queueing using Deficit Round Robin" is that this implementation doesn't drop packets from the longest queue on overrun because its classful and limits are handled by each individual child qdisc. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-20 04:10:00 -08:00
Patrick McHardy	3aa4614da7	pkt_sched: fix missing check for packet overrun in qdisc_dump_stab() nla_nest_start() might return NULL, causing a NULL pointer dereference. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-20 04:07:14 -08:00
Stephen Hemminger	d314774cf2	netdev: network device operations infrastructure This patch changes the network device internal API to move administrative operations out of the network device structure and into a separate structure. This patch involves some hackery to maintain compatibility between the new and old model, so all 300+ drivers don't have to be changed at once. For drivers that aren't converted yet, the netdevice_ops virt function list still resides in the net_device structure. For old protocols, the new net_device_ops are copied out to the old net_device pointers. After the transition is completed the nag message can be changed to a WARN_ON, and the compatibility code can be made configurable. Some function pointers aren't moved: * destructor can't be in net_device_ops because it may need to be referenced after the module is unloaded. * neighbor setup is manipulated in a couple of places that need special consideration * hard_start_xmit is in the fast path for transmit. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-19 21:32:24 -08:00
David S. Miller	b47300168e	net: Do not fire linkwatch events until the device is registered. Several device drivers try to do things like netif_carrier_off() before register_netdev() is invoked. This is bogus, but too many drivers do this to fix them all up in one go. Reported-by: Folkert van Heusden <folkert@vanheusden.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-19 15:33:54 -08:00
Alexey Dobriyan	4d24b52ac5	ematch: simpler tcf_em_unregister() Simply delete ops from list and let list debugging do the job. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-16 23:01:49 -08:00
Jarek Poplawski	f30ab418a1	pkt_sched: Remove qdisc->ops->requeue() etc. After implementing qdisc->ops->peek() and changing sch_netem into classless qdisc there are no more qdisc->ops->requeue() users. This patch removes this method with its wrappers (qdisc_requeue()), and also unused qdisc->requeue structure. There are a few minor fixes of warnings (htb_enqueue()) and comments btw. The idea to kill ->requeue() and a similar patch were first developed by David S. Miller. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-13 22:56:30 -08:00
David Howells	d76b0d9b2d	CRED: Use creds in file structs Attach creds to file structs and discard f_uid/f_gid. file_operations::open() methods (such as hppfs_open()) should use file->f_cred rather than current_cred(). At the moment file->f_cred will be current_cred() at this point. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:25 +11:00
Thomas Graf	f400923735	pkt_sched: Control group classifier The classifier should cover the most common use case and will work without any special configuration. The principle of the classifier is to directly access the task_struct via get_current(). In order for this to work, classification requests from softirqs must be ignored. This is not a problem because the vast majority of packets in softirq context are not assigned to a task anyway. For this to work, a mechanism is needed to trace softirq context. This repost goes back to the method of relying on the number of nested bh disable calls for the sake of not adding too much complexity and the option to come up with something more reliable if actually needed. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-07 22:56:00 -08:00
Stephen Hemminger	265eb67fb4	netem: eliminate unneeded return values All these individual parsing functions never return an error, so they can be void. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-03 21:13:26 -08:00
Jarek Poplawski	67305ebc99	pkt_sched: sch_generic: Kfree gso_skb in qdisc_reset() Since gso_skb is re-used for qdisc_peek_dequeued(), and this skb is counted in the qdisc->q.qlen, it has to be kfreed during qdisc_reset() when qlen is zeroed. With help from David S. Miller <davem@davemloft.net> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-03 02:52:50 -08:00
Jarek Poplawski	8ba25dad0a	sch_netem: Replace ->requeue() method with open code After removing netem classful functionality we are sure its inner qdisc is tfifo, so we can replace qdisc->ops->requeue() method with open code. After this patch there are no more ops->requeue() users. The idea of this patch is by Patrick McHardy. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-02 00:36:03 -07:00
Jarek Poplawski	0220146411	sch_netem: Remove classful functionality Patrick McHardy noticed that: "a lot of the functionality of netem requires the inner tfifo anyways and rate-limiting is usually done on top of netem. So I would suggest so either hard-wire the tfifo qdisc or at least make the assumption that inner qdiscs are work-conserving.", and later: "- a lot of other qdiscs still don't work as inner qdiscs of netem [...]". So, according to his suggestion, this patch removes classful options of netem. The main reason of this change is to remove ops->requeue() method, which is currently used only by netem. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-02 00:35:24 -07:00
Jarek Poplawski	77be155cba	pkt_sched: Add peek emulation for non-work-conserving qdiscs. This patch adds qdisc_peek_dequeued() wrapper to emulate peek method with qdisc->dequeue() and storing "peeked" skb in qdisc->gso_skb until dequeuing. This is mainly for compatibility reasons not to break some strange configs because peeking is expected for non-work-conserving parent qdiscs to query work-conserving child qdiscs. This implementation requires using qdisc_dequeue_peeked() wrapper instead of directly calling qdisc->dequeue() for all qdiscs ever queried with qdisc->ops->peek() or qdisc_peek_dequeued(). Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-31 00:47:01 -07:00
Jarek Poplawski	03c05f0d4b	pkt_sched: Use qdisc->ops->peek() instead of ->dequeue() & ->requeue() Use qdisc->ops->peek() instead of ->dequeue() & ->requeue() pair. After this patch the only remaining user of qdisc->ops->requeue() is netem_enqueue(). Based on ideas of Herbert Xu, Patrick McHardy and David S. Miller. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-31 00:46:19 -07:00
Jarek Poplawski	8e3af97899	pkt_sched: Add qdisc->ops->peek() implementation. Add qdisc->ops->peek() implementation for work-conserving qdiscs. With feedback from Patrick McHardy. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-31 00:45:55 -07:00
Jarek Poplawski	99c0db2679	pkt_sched: sch_generic: Add generic qdisc->ops->peek() implementation. With feedback from Patrick McHardy. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-31 00:45:27 -07:00
Patrick McHardy	48a8f519e0	pkt_sched: Add ->peek() methods for fifo, prio and SFQ qdiscs. From: Patrick McHardy <kaber@trash.net> Just as a demonstration how easy adding a peek operation to the work-conserving qdiscs actually is. It doesn't need to keep or change any internal state in many cases thanks to the guarantee that the packet will either be dequeued or, if another packet arrives, the upper qdisc will immediately ->peek again to reevaluate the state. (This is only slightly modified Patrick's patch.) Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-31 00:44:18 -07:00
Thomas Gleixner	268a3dcfea	Merge branch 'timers/range-hrtimers' into v28-range-hrtimers-for-linus-v2 Conflicts: kernel/time/tick-sched.c Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2008-10-22 09:48:06 +02:00
Jarek Poplawski	9f3ffae0db	pkt_sched: sch_generic: Fix oops in sch_teql After these commands: # modprobe sch_teql # tc qdisc add dev eth0 root teql0 # tc qdisc del dev eth0 root we get an oops in teql_destroy() when spin_lock is taken from a null qdisc_sleeping pointer. It's because at the moment teql0 dev haven't been activated yet, and a qdisc_root_sleeping() is pointing to noop qdisc's netdev_queue with qdisc_sleeping uninitialized. This patch fixes this both for noop and noqueue netdev_queues to avoid similar problems in the future. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-19 23:37:47 -07:00
Arjan van de Ven	651dab4264	Merge commit 'linus/master' into merge-linus Conflicts: arch/x86/kvm/i8254.c	2008-10-17 09:20:26 -07:00
Johannes Berg	95a5afca4a	net: Remove CONFIG_KMOD from net/ (towards removing CONFIG_KMOD entirely) Some code here depends on CONFIG_KMOD to not try to load protocol modules or similar, replace by CONFIG_MODULES where more than just request_module depends on CONFIG_KMOD and also use try_then_request_module in ebtables. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-16 15:24:51 -07:00
Jarek Poplawski	53e9150349	pkt_sched: Update qdisc requeue stats in dev_requeue_skb() After the last change of requeuing there is no info about such incidents in tc stats. This patch updates the counter, but we should consider this should differ from previous stats because of additional checks preventing to repeat this. On the other hand, previous stats didn't include requeuing of gso_segmented skbs. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-08 11:36:22 -07:00
Jan Engelhardt	916a917dfe	netfilter: xtables: provide invoked family value to extensions By passing in the family through which extensions were invoked, a bit of data space can be reclaimed. The "family" member will be added to the parameter structures and the check functions be adjusted. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:20 +02:00
Jan Engelhardt	a2df1648ba	netfilter: xtables: move extension arguments into compound structure (6/6) This patch does this for target extensions' destroy functions. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:19 +02:00
Jan Engelhardt	af5d6dc200	netfilter: xtables: move extension arguments into compound structure (5/6) This patch does this for target extensions' checkentry functions. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:19 +02:00
Jan Engelhardt	7eb3558655	netfilter: xtables: move extension arguments into compound structure (4/6) This patch does this for target extensions' target functions. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:19 +02:00
Jan Engelhardt	367c679007	netfilter: xtables: do centralized checkentry call (1/2) It used to be that {ip,ip6,etc}_tables called extension->checkentry themselves, but this can be moved into the xtables core. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-10-08 11:35:17 +02:00
Jarek Poplawski	6252352d16	pkt_sched: Simplify dev_requeue_skb and dequeue_skb qdisc->requeue was planned to universally replace all requeuing code, but at the top level we never requeue more than one skb, so qdisc-> gso_skb is enough for this. qdisc->requeue would be used on the lower levels only for one level deep requeuing (like in sch_hfsc) after finishing all the changes. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-06 10:41:50 -07:00
Jarek Poplawski	554794de79	pkt_sched: Fix handling of gso skbs on requeuing Jay Cliburn noticed and diagnosed a bug triggered in dev_gso_skb_destructor() after last change from qdisc->gso_skb to qdisc->requeue list. Since gso_segmented skbs can't be queued to another list this patch brings back qdisc->gso_skb for them. Reported-by: Jay Cliburn <jcliburn@gmail.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-06 09:54:39 -07:00
Jarek Poplawski	ebf059821e	pkt_sched: Check the state of tx_queue in dequeue_skb() Check in dequeue_skb() the state of tx_queue for requeued skb to save on locking and re-requeuing, and possibly remove the current check in qdisc_run(). Based on the idea of Alexander Duyck. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-22 22:16:23 -07:00
David S. Miller	f0876520b0	pkt_sched: Always use q->requeue in dev_requeue_skb(). There is no reason to call into the complicated qdiscs just to remember the last SKB where we found the device blocked. The SKB is outside of the qdiscs realm at this point. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-22 22:15:58 -07:00
David S. Miller	242f8bfefe	pkt_sched: Make qdisc->gso_skb a list. The idea is that we can use this to get rid of ->requeue(). Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-22 22:15:30 -07:00
Harvey Harrison	d48abfecea	net: em_cmp.c use unaligned access helpers Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-22 19:20:51 -07:00
Arnaldo Carvalho de Melo	6067804047	net: Use hton[sl]() instead of __constant_hton[sl]() where applicable Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 22:20:49 -07:00
Alexander Duyck	a574420ff4	multiq: requeue should rewind the current_band Currently dequeueing a packet and requeueing the same packet will cause a different packet to be pulled on the next dequeue. This change forces requeue to rewind the current_band. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-20 22:07:34 -07:00
Alexander Duyck	f07d150129	multiq: Further multiqueue cleanup This patch resolves a few issues found with multiq including wording suggestions and a problem seen in the allocation of queues. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-12 17:57:23 -07:00
Alexander Duyck	ca9b0e27e0	pkt_action: add new action skbedit This new action will have the ability to change the priority and/or queue_mapping fields on an sk_buff. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-12 16:30:20 -07:00
Alexander Duyck	92651940ab	pkt_sched: Add multiqueue scheduler support This patch is intended to add a qdisc to support the new tx multiqueue architecture by providing a band for each hardware queue. By doing this it is possible to support a different qdisc per physical hardware queue. This qdisc uses the skb->queue_mapping to select which band to place the traffic onto. It then uses a round robin w/ a check to see if the subqueue is stopped to determine which band to dequeue the packet from. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-12 16:29:34 -07:00
Arjan van de Ven	5337407c67	warn: Turn the netdev timeout WARN_ON() into a WARN() this patch turns the netdev timeout WARN_ON_ONCE() into a WARN_ONCE(), so that the device and driver names are inside the warning message. This helps automated tools like kerneloops.org to collect the data and do statistics, as well as making it more likely that humans cut-n-paste the important message as part of a bugreport. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-08 16:17:42 -07:00
Arjan van de Ven	23dd7bb09b	hrtimer: convert net::sched_cbq to the new hrtimer apis In order to be able to do range hrtimers we need to use accessor functions to the "expire" member of the hrtimer struct. This patch converts sched_cbq to these accessors. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>	2008-09-05 21:35:11 -07:00
Thomas Graf	2c10b32bf5	netlink: Remove compat API for nested attributes Removes all _nested_compat() functions from the API. The prio qdisc no longer requires them and netem has its own format anyway. Their existence is only confusing. Resend: Also remove the wrapper macro. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-02 17:30:27 -07:00
Jarek Poplawski	102396ae65	pkt_sched: Fix locking of qdisc_root with qdisc_root_sleeping_lock() Use qdisc_root_sleeping_lock() instead of qdisc_root_lock() where appropriate. The only difference is while dev is deactivated, when currently we can use a sleeping qdisc with the lock of noop_qdisc. This shouldn't be dangerous since after deactivation root lock could be used only by gen_estimator code, but looks wrong anyway. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-29 14:27:52 -07:00
Jarek Poplawski	f6f9b93f16	pkt_sched: Fix gen_estimator locks While passing a qdisc root lock to gen_new_estimator() and gen_replace_estimator() dev could be deactivated or even before grafting proper root qdisc as qdisc_sleeping (e.g. qdisc_create), so using qdisc_root_lock() is not enough. This patch adds qdisc_root_sleeping_lock() for this, plus additional checks, where necessary. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-27 02:25:17 -07:00
Jarek Poplawski	f7a54c13c7	pkt_sched: Use rcu_assign_pointer() to change dev_queue->qdisc These pointers are RCU protected, so proper primitives should be used. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-27 02:22:07 -07:00
Jarek Poplawski	666d9bbedf	pkt_sched: Fix dev_graft_qdisc() locking During dev_graft_qdisc() dev is deactivated, so qdisc_root_lock() returns wrong lock of noop_qdisc instead of qdisc_sleeping. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-27 02:15:20 -07:00
Jarek Poplawski	f6e0b239a2	pkt_sched: Fix qdisc list locking Since some qdiscs call qdisc_tree_decrease_qlen() (so qdisc_lookup()) without rtnl_lock(), adding and deleting from a qdisc list needs additional locking. This patch adds global spinlock qdisc_list_lock and wrapper functions for modifying the list. It is considered as a temporary solution until hfsc_dequeue(), netem_dequeue() and tbf_dequeue() (or qdisc_tree_decrease_qlen()) are redone. With feedback from Herbert Xu and David S. Miller. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-22 03:31:39 -07:00
Jarek Poplawski	2540e0511e	pkt_sched: Fix qdisc_watchdog() vs. dev_deactivate() race dev_deactivate() can skip rescheduling of a qdisc by qdisc_watchdog() or other timer calling netif_schedule() after dev_queue_deactivate(). We prevent this by checking aliveness before scheduling the timer. Since during deactivation the root qdisc is available only as qdisc_sleeping, an additional accessor qdisc_root_sleeping() is created. With feedback from Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-21 05:11:14 -07:00
David S. Miller	f3b9605d74	Revert "pkt_sched: Add BH protection for qdisc_stab_lock." This reverts commit `1cfa26661a`. qdisc_destroy() runs fully under RTNL again and not from softint any longer, so this change is no longer needed. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 22:33:05 -07:00
Ilpo Järvinen	e5befbd952	pkt_sched: remove bogus block (cleanup) ...Last block local var got just deleted. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 22:30:01 -07:00
David S. Miller	4d8863a29c	pkt_sched: Don't hold qdisc lock over qdisc_destroy(). Based upon reports by Denys Fedoryshchenko, and feedback and help from Jarek Poplawski and Herbert Xu. We always either: 1) Never made an external reference to this qdisc. or 2) Did a dev_deactivate() which purged all asynchronous references. So do not lock the qdisc when we call qdisc_destroy(), it's illegal anyways as when we drop the lock this is free'd memory. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 21:06:19 -07:00
Jarek Poplawski	25bfcd5a78	pkt_sched: Add lockdep annotation for qdisc locks Qdisc locks are initialized in the same function, qdisc_alloc(), so lockdep can't distinguish tx qdisc lock from rx and reports "possible recursive locking detected" when both these locks are taken eg. while using act_mirred with ifb. This looks like a false positive. Anyway, after this patch these locks will be reported more exactly. Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 21:06:09 -07:00
David S. Miller	8608db031b	pkt_sched: Never schedule non-root qdiscs. Based upon initial discovery and patch by Jarek Poplawski. The qdisc watchdogs can be attached to any qdisc, not just the root, so make sure we schedule the correct one. CBQ has a similar bug. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 21:05:56 -07:00
David S. Miller	69747650c8	pkt_sched: Fix return value corruption in HTB and TBF. Based upon a bug report by Josip Rodin. Packet schedulers should only return NET_XMIT_DROP iff the packet really was dropped. If the packet does reach the device after we return NET_XMIT_DROP then TCP can crash because it depends upon the enqueue path return values being accurate. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-18 00:39:41 -07:00
David S. Miller	4cf7cb280e	sch_prio: Use NET_XMIT_SUCCESS instead of "0" constant. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-17 22:45:17 -07:00
Jussi Kivilinna	0d40b6e564	sch_prio: Use return value from inner qdisc requeue Use return value from inner qdisc requeue when value returned isn't NET_XMIT_SUCCESS, instead of always returning NET_XMIT_DROP. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-17 22:43:56 -07:00
David S. Miller	1e0d5a5747	pkt_sched: No longer destroy qdiscs from RCU. We can now kill them synchronously with all of the previous dev_deactivate() cures. This makes netdev destruction and shutdown saner as the qdiscs hold references to the device. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-17 22:31:26 -07:00
Jarek Poplawski	3a76e3716b	pkt_sched: Grab correct lock in notify_and_destroy(). From: Jarek Poplawski <jarkao2@gmail.com> When we are destroying non-root qdiscs, we need to lock the root of the qdisc tree not the qdisc itself. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-17 22:02:11 -07:00
David S. Miller	4335cd2da1	pkt_sched: Simplify dev_deactivate() polling loop. The condition under which the previous qdisc has no more references after we've attached &noop_qdisc is that both RUNNING and SCHED are seen clear while holding the root lock. So just make specifically that check in the polling loop, instead of this overly complex "check without then check with lock held" sequence. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-17 21:58:07 -07:00
David S. Miller	a9312ae893	pkt_sched: Add 'deactivated' state. This new state lets dev_deactivate() mark a qdisc as having been deactivated. dev_queue_xmit() and ing_filter() check for this bit and do not try to process the qdisc if the bit is set. dev_deactivate() polls the qdisc after setting the bit, waiting for both __QDISC_STATE_RUNNING and __QDISC_STATE_SCHED to clear. This isn't perfect yet, but subsequent changesets will make it so. This part is just one piece of the puzzle. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-17 21:51:03 -07:00
Jarek Poplawski	323c048836	pkt_sched: Fix unlocking in tc_ctl_tfilter() Fix a bug with spin_lock_bh() inserted instead of spin_unlock_bh() by some recent patch. Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-14 17:01:10 -07:00
David S. Miller	b9a3b1102b	pkt_sched: Fix queue quiescence testing in dev_deactivate(). Based upon discussions with Jarek P. and Herbert Xu. First, we're testing the wrong qdisc. We just reset the device queue qdiscs to &noop_qdisc and checking its state is completely pointless here. We want to wait until the previous qdisc that was sitting at the ->qdisc pointer is not busy any more. And that would be ->qdisc_sleeping. Because of how we propagate the sampled qdisc pointer down into qdisc_run and friends via per-cpu ->output_queue and netif_schedule, we also have to wait for the __QDISC_STATE_SCHED bit to clear. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-13 15:18:38 -07:00
Jarek Poplawski	26b284de54	pkt_sched: Fix oops in htb_delete. Recent changes introduced a bug in htb_delete(): cl->parent->children counter update misses checking cl->parent for NULL, which is used for root classes, so deleting them causes an oops. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-13 15:16:43 -07:00
Jamal Hadi Salim	36723873b6	net-sched: fix Action flushing return code Flushing must consistently return ENOMEM on failure of any allocation Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-13 02:41:45 -07:00
Jamal Hadi Salim	f97017cdef	net-sched: Fix actions flushing Flushing of actions has been broken since we changed the semantics of netlink parsed tb[X] to mean X is an attribute type. This makes the flushing work. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-13 02:41:22 -07:00
Jarek Poplawski	1cfa26661a	pkt_sched: Add BH protection for qdisc_stab_lock. Since qdisc_stab_lock is used in qdisc_put_stab(), which is called in BH context from __qdisc_destroy() RCU callback, softirq safe locking is needed. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-11 18:11:06 -07:00
David S. Miller	8123b421e8	pkt_sched: Fix ingress deletion and filter attachment. Based upon bug reports by Stephen Hemminger. We still had some cases using ->qdisc instead of ->qdisc_sleeping. Also, qdisc_lookup() should return ingress qdiscs. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-08 23:23:39 -07:00
Jamal Hadi Salim	76aab2c1ea	pkt_sched: Fix actions referencing When an action is added several times with the same exact index it gets deleted on every even-numbered attempt. This fixes that issue. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-07 20:37:22 -07:00
David S. Miller	827ebd6410	pkt_sched: Fix qdisc config when link is down. Bug reported by Stephen Hemminger. We need to fetch the root from ->qdisc_sleeping not ->qdisc. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-07 20:26:40 -07:00
David S. Miller	ee7af8264d	pkt_sched: Fix "parent is root" test in qdisc_create(). As noticed by Stephen Hemminger, the root qdisc is denoted by TC_H_ROOT, not zero. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-06 23:35:59 -07:00
Jarek Poplawski	c27f339af9	net_sched: Add qdisc __NET_XMIT_BYPASS flag Patrick McHardy <kaber@trash.net> noticed that it would be nice to handle NET_XMIT_BYPASS by NET_XMIT_SUCCESS with an internal qdisc flag __NET_XMIT_BYPASS and to remove the mapping from dev_queue_xmit(). David Miller <davem@davemloft.net> spotted a serious bug in the first version of this patch. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-04 22:39:11 -07:00
Jarek Poplawski	378a2f090f	net_sched: Add qdisc __NET_XMIT_STOLEN flag Patrick McHardy <kaber@trash.net> noticed: "The other problem that affects all qdiscs supporting actions is TC_ACT_QUEUED/TC_ACT_STOLEN getting mapped to NET_XMIT_SUCCESS even though the packet is not queued, corrupting upper qdiscs' qlen counters." and later explained: "The reason why it translates it at all seems to be to not increase the drops counter. Within a single qdisc this could be avoided by other means easily, upper qdiscs would still increase the counter when we return anything besides NET_XMIT_SUCCESS though. This means we need a new NET_XMIT return value to indicate this to the upper qdiscs. So I'd suggest to introduce NET_XMIT_STOLEN, return that to upper qdiscs and translate it to NET_XMIT_SUCCESS in dev_queue_xmit, similar to NET_XMIT_BYPASS." David Miller <davem@davemloft.net> noticed: "Maybe these NET_XMIT_* values being passed around should be a set of bits. They could be composed of base meanings, combined with specific attributes. So you could say "NET_XMIT_DROP \| __NET_XMIT_NO_DROP_COUNT" The attributes get masked out by the top-level ->enqueue() caller, such that the base meanings are the only thing that make their way up into the stack. If it's only about communication within the qdisc tree, let's simply code it that way." This patch is trying to realize these ideas. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-04 22:31:03 -07:00
David S. Miller	5fb662297b	pkt_sched: Use qdisc_lock() on already sampled root qdisc. Based upon a bug report by Jeff Kirsher. Don't use qdisc_root_lock() in these cases as the root qdisc could have been changed, and we'd thus lock the wrong object. Tested by Emil S Tantilov who confirms that this seems to fix the problem. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-02 20:02:43 -07:00
David S. Miller	c3f26a269c	netdev: Fix lockdep warnings in multiqueue configurations. When support for multiple TX queues was added, the netif_tx_lock() routines were converted to iterate over all TX queues and grab each queue's spinlock. This causes heartburn for lockdep and it's not a healthy thing to do with lots of TX queues anyways. So modify this to use a top-level lock and a "frozen" state for the individual TX queues. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 16:58:50 -07:00
David S. Miller	8d50b53d66	pkt_sched: Fix OOPS on ingress qdisc add. Bug report from Steven Jan Springl: Issuing the following command causes a kernel oops: tc qdisc add dev eth0 handle ffff: ingress The problem mostly stems from all of the special case handling of ingress qdiscs. So, to fix this, do the grafting operation the same way we do for TX qdiscs. Which means that dev_activate() and dev_deactivate() now do the "qdisc_sleeping <--> qdisc" transitions on dev->rx_queue too. Future simplifications are possible now, mainly because it is impossible for dev_queue->{qdisc,qdisc_sleeping} to be NULL. There are NULL checks all over to handle the ingress qdisc special case that used to exist before this commit. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-30 02:44:25 -07:00
Linus Torvalds	4836e30078	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (39 commits) [PATCH] fix RLIM_NOFILE handling [PATCH] get rid of corner case in dup3() entirely [PATCH] remove remaining namei_{32,64}.h crap [PATCH] get rid of indirect users of namei.h [PATCH] get rid of __user_path_lookup_open [PATCH] f_count may wrap around [PATCH] dup3 fix [PATCH] don't pass nameidata to __ncp_lookup_validate() [PATCH] don't pass nameidata to gfs2_lookupi() [PATCH] new (local) helper: user_path_parent() [PATCH] sanitize __user_walk_fd() et.al. [PATCH] preparation to __user_walk_fd cleanup [PATCH] kill nameidata passing to permission(), rename to inode_permission() [PATCH] take noexec checks to very few callers that care Re: [PATCH 3/6] vfs: open_exec cleanup [patch 4/4] vfs: immutable inode checking cleanup [patch 3/4] fat: dont call notify_change [patch 2/4] vfs: utimes cleanup [patch 1/4] vfs: utimes: move owner check into inode_change_ok() [PATCH] vfs: use kstrdup() and check failing allocation ...	2008-07-26 20:23:44 -07:00
Al Viro	516e0cc564	[PATCH] f_count may wrap around make it atomic_long_t; while we are at it, get rid of useless checks in affs, hfs and hpfs - ->open() always has it equal to 1, ->release() - to 0. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-07-26 20:53:40 -04:00
David S. Miller	cdec7e50a4	Revert "pkt_sched: sch_sfq: dump a real number of flows" This reverts commit `f867e6af94`. Based upon discussions between Jarek and Patrick McHardy, the field being set is more a config parameter than a statistic. And we should add a true statistic to provide this information if we really want it. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-26 02:28:09 -07:00
Ilpo Järvinen	547b792cac	net: convert BUG_TRAP to generic WARN_ON Removes legacy reinvent-the-wheel type thing. The generic machinery integrates much better to automated debugging aids such as kerneloops.org (and others), and is unambiguous due to better naming. Non-intuitively BUG_TRAP() is actually equal to WARN_ON() rather than BUG_ON() though some might actually be promoted to BUG_ON() but I left that to future. I could make at least one BUILD_BUG_ON conversion. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-25 21:43:18 -07:00
David S. Miller	cffe1c5d7a	pkt_sched: Fix locking in shutdown_scheduler_queue() Qdisc locks need to be held with BH disabled. Tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-25 01:25:04 -07:00
Jarek Poplawski	f867e6af94	pkt_sched: sch_sfq: dump a real number of flows Dump the "flows" number according to the number of active flows instead of repeating the "limit". Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-23 21:34:27 -07:00
Adrian Bunk	a94f779f9d	pkt_sched: make qdisc_class_hash_alloc() static This patch makes the needlessly global qdisc_class_hash_alloc() static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-22 14:20:11 -07:00
Arjan van de Ven	6579e57b31	net: Print the module name as part of the watchdog message As suggested by Dave: This patch adds a function to get the driver name from a struct net_device, and consequently uses this in the watchdog timeout handler to print as part of the message. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 13:31:48 -07:00
David S. Miller	d3678b463d	Revert "pkt_sched: Make default qdisc nonshared-multiqueue safe." This reverts commit `a0c80b80e0`. After discussions with Jamal and Herbert on netdev, we should provide at least minimal prioritization at the qdisc level even in multiqueue situations. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 10:10:50 -07:00
Daniel Lezcano	c3ee84163e	pkt_sched: Remove unused variable skb in dev_deactivate_queue function. Removed unused variable 'skb' in the dev_deactivate_queue function Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-21 09:18:07 -07:00
David S. Miller	3a682fbd73	pkt_sched: Fix build with NET_SCHED disabled. The stab bits can't be referenced unless the full packet scheduler layer is enabled. Reported by Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-20 18:13:01 -07:00
Jussi Kivilinna	175f9c1bba	net_sched: Add size table for qdiscs Add size table functions for qdiscs and calculate packet size in qdisc_enqueue(). Based on patch by Patrick McHardy http://marc.info/?l=linux-netdev&m=115201979221729&w=2 Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-20 00:08:47 -07:00
Jussi Kivilinna	0abf77e55a	net_sched: Add accessor function for packet length for qdiscs Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-20 00:08:27 -07:00
Jussi Kivilinna	5f86173bdf	net_sched: Add qdisc_enqueue wrapper Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-20 00:08:04 -07:00
David S. Miller	30ee42be00	pkt_sched: Fix noqueue_qdisc initialization. Like noop_qdisc, it needs a dummy backpointer and explicit qdisc->q.lock initialization. Based upon a report by Stephen Hemminger. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 23:00:11 -07:00
David S. Miller	3072367300	pkt_sched: Manage qdisc list inside of root qdisc. Idea is from Patrick McHardy. Instead of managing the list of qdiscs on the device level, manage it in the root qdisc of a netdev_queue. This solves all kinds of visibility issues during qdisc destruction. The way to iterate over all qdiscs of a netdev_queue is to visit the netdev_queue->qdisc, and then traverse its list. The only special case is to ignore builtin qdiscs at the root when dumping or doing a qdisc_lookup(). That was not needed previously because builtin qdiscs were not added to the device's qdisc_list. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 22:50:15 -07:00
David S. Miller	72b25a913e	pkt_sched: Get rid of u32_list. The u32_list is just an indirect way of maintaining a reference to a U32 node on a per-qdisc basis. Just add an explicit node pointer for u32 to struct Qdisc and do away with this global list. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-18 20:54:17 -07:00
David S. Miller	a0c80b80e0	pkt_sched: Make default qdisc nonshared-multiqueue safe. Instead of 'pfifo_fast' we have just plain 'fifo_fast'. No priority queues, just a straight FIFO. This is necessary in order to legally have a separate qdisc per queue in multi-TX-queue setups, and thus get full parallelization. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:33 -07:00
David S. Miller	99194cff39	pkt_sched: Add multiqueue handling to qdisc_graft(). Move the destruction of the old queue into qdisc_graft(). When operating on a root qdisc (ie. "parent == NULL"), apply the operation to all queues. The caller has grabbed a single implicit reference for this graft, therefore when we apply the change to more than one queue we must grab additional qdisc references. Otherwise, we are operating on a class of a specific parent qdisc, and therefore no multiqueue handling is necessary. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:30 -07:00
David S. Miller	8387400092	pkt_sched: Kill netdev_queue lock. We can simply use the qdisc->q.lock for all of the qdisc tree synchronization. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:30 -07:00
David S. Miller	c7e4f3bbb4	pkt_sched: Kill qdisc_lock_tree and qdisc_unlock_tree. No longer used. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:29 -07:00
David S. Miller	53049978df	pkt_sched: Make qdisc grafting locking more specific. Lock the root of the qdisc being operated upon. All explicit references to qdisc_tree_lock() are now gone. The only remaining uses are via the sch_tree_{lock,unlock}() and tcf_tree_{lock,unlock}() macros. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:27 -07:00
David S. Miller	ead81cc5fc	netdevice: Move qdisc_list back into net_device proper. And give it its own lock. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:26 -07:00
David S. Miller	15b458fa65	pkt_sched: Kill qdisc_lock_tree usage in cls_route.c It just wants the qdisc tree to be synchronized, so grabbing qdisc_root_lock() is sufficient. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:25 -07:00
David S. Miller	55dbc640c3	pkt_sched: Remove qdisc_lock_tree usage in cls_api.c It just wants the qdisc tree for the filter to be synchronized. So just BH lock qdisc_root_lock(q) instead. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:24 -07:00
David S. Miller	17715e62a5	pkt_sched: Use per-queue locking in shutdown_scheduler_queue. This eliminates another qdisc_lock_tree user. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:23 -07:00
David S. Miller	8a34c5dc3a	pkt_sched: Perform bulk of qdisc destruction in RCU. This allows less strict control of access to the qdisc attached to a netdev_queue. It is even allowed to enqueue into a qdisc which is in the process of being destroyed. The RCU handler will toss out those packets. We will need this to handle sharing of a qdisc amongst multiple TX queues. In such a setup the lock has to be shared, so will be inside of the qdisc itself. At which point the netdev_queue lock cannot be used to hard synchronize access to the ->qdisc pointer. One operation we have to keep inside of qdisc_destroy() is the list deletion. It is the only piece of state visible after the RCU quiesce period, so we have to undo it early and under the appropriate locking. The operations in the RCU handler do not need any locking because the qdisc tree is no longer visible to anything at that point. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:22 -07:00
David S. Miller	16361127eb	pkt_sched: dev_init_scheduler() does not need to lock qdisc tree. We are registering the device, there is no way anyone can get at this object's qdiscs yet in any meaningful way. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:21 -07:00
David S. Miller	37437bb2e1	pkt_sched: Schedule qdiscs instead of netdev_queue. When we have shared qdiscs, packets come out of the qdiscs for multiple transmit queues. Therefore it doesn't make any sense to schedule the transmit queue when logically we cannot know ahead of time the TX queue of the SKB that the qdisc->dequeue() will give us. Just for sanity I added a BUG check to make sure we never get into a state where the noop_qdisc is scheduled. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:20 -07:00
David S. Miller	7698b4fcab	pkt_sched: Add and use qdisc_root() and qdisc_root_lock(). When code wants to lock the qdisc tree state, the logical operation it's doing is locking the top-level qdisc that sits at the root of the netdev_queue. Add qdisc_root_lock() to represent this and convert the easiest cases. In order for this to work out in all cases, we have to hook up the noop_qdisc to a dummy netdev_queue. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:19 -07:00
David S. Miller	e2627c8c22	pkt_sched: Make QDISC_RUNNING a qdisc state. Currently it is associated with a netdev_queue, but when we have qdisc sharing that no longer makes any sense. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:18 -07:00
David S. Miller	d3b753db7c	pkt_sched: Move gso_skb into Qdisc. We liberate any dangling gso_skb during qdisc destruction. It really only matters for the root qdisc. But when qdiscs can be shared by multiple netdev_queue objects, we can't have the gso_skb in the netdev_queue any more. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:18 -07:00
David S. Miller	fd2ea0a79f	net: Use queue aware tests throughout. This effectively "flips the switch" by making the core networking and multiqueue-aware drivers use the new TX multiqueue structures. Non-multiqueue drivers need no changes. The interfaces they use such as netif_stop_queue() degenerate into an operation on TX queue zero. So everything "just works" for them. Code that really wants to do "X" to all TX queues now invokes a routine that does so, such as netif_tx_wake_all_queues(), netif_tx_stop_all_queues(), etc. pktgen and netpoll required a little bit more surgery than the others. In particular the pktgen changes, whilst functional, could be largely improved. The initial check in pktgen_xmit() will sometimes check the wrong queue, which is mostly harmless. The thing to do is probably to invoke fill_packet() earlier. The bulk of the netpoll changes is to make the code operate solely on the TX queue indicated by the SKB queue mapping. Setting of the SKB queue mapping is entirely confined inside of net/core/dev.c:dev_pick_tx(). If we end up needing any kind of special semantics (drops, for example) it will be implemented here. Finally, we now have a "real_num_tx_queues" which is where the driver indicates how many TX queues are actually active. With IGB changes from Jeff Kirsher. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:07 -07:00
David S. Miller	1d8ae3fdeb	pkt_sched: Remove RR scheduler. This actually fixes a bug added by the RR scheduler changes. The ->bands and ->prio2band parameters were being set outside of the sch_tree_lock() and thus could result in strange behavior and inconsistencies. It might be possible, in the new design (where there will be one qdisc per device TX queue) to allow similar functionality via a TX hash algorithm for RR but I really see no reason to export this aspect of how these multiqueue cards actually implement the scheduling of the individual DMA TX rings and the single physical MAC/PHY port. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:04 -07:00
David S. Miller	e8a0464cc9	netdev: Allocate multiple queues for TX. alloc_netdev_mq() now allocates an array of netdev_queue structures for TX, based upon the queue_count argument. Furthermore, all accesses to the TX queues are now vectored through the netdev_get_tx_queue() and netdev_for_each_tx_queue() interfaces. This makes it easy to grep the tree for all things that want to get to a TX queue of a net device. Problem spots which are not really multiqueue aware yet, and only work with one queue, can easily be spotted by grepping for all netdev_get_tx_queue() calls that pass in a zero index. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-17 19:21:00 -07:00
Patrick McHardy	72d9794f44	net-sched: cls_flow: add perturbation support Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 20:36:32 -07:00
David S. Miller	79d16385c7	netdev: Move atomic queue state bits into netdev_queue. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 23:14:46 -07:00
David S. Miller	c773e847ea	netdev: Move _xmit_lock and xmit_lock_owner into netdev_queue. Accesses are mostly structured such that when there are multiple TX queues the code transformations will be a little bit simpler. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 23:13:53 -07:00
David S. Miller	eb6aafe3f8	pkt_sched: Make qdisc_run take a netdev_queue. This allows us to use this calling convention all the way down into qdisc_restart(). Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 23:12:38 -07:00
David S. Miller	86d804e10a	netdev: Make netif_schedule() routines work with netdev_queue objects. Only plain netif_schedule() remains taking a net_device, mostly as a compatibility item while we transition the rest of these interfaces. Everything else calls netif_schedule_queue() or __netif_schedule(), both of which take a netdev_queue pointer. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 23:11:25 -07:00
David S. Miller	970565bbad	netdev: Move gso_skb into netdev_queue. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 23:10:33 -07:00
David S. Miller	74d58a0c1d	pkt_sched: Make netem queue agnostic. It just wants the root qdisc given an arbitrary qdisc, and that is simply qdisc->dev_queue->qdisc Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Stephen Hemminger <shemminger@vyatta.com>	2008-07-08 22:57:51 -07:00
David S. Miller	68dfb42798	pkt_sched: Kill stats_lock member of struct Qdisc. It is always equal to qdisc->dev_queue->lock Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 22:57:31 -07:00
David S. Miller	816f3258e7	netdev: Kill qdisc_ingress, use netdev->rx_queue.qdisc instead. Now that our qdisc management is bi-directional, per-queue, and fully orthogonal, there is no reason to have a special ingress qdisc pointer in struct net_device. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 22:49:00 -07:00
David S. Miller	b0e1e6462d	netdev: Move rest of qdisc state into struct netdev_queue Now qdisc, qdisc_sleeping, and qdisc_list also live there. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 17:42:10 -07:00
David S. Miller	555353cfa1	netdev: The ingress_lock member is no longer needed. Every qdisc is associated with a queue, and in the case of ingress qdiscs that will now be netdev->rx_queue so using that queue's lock is the thing to do. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 17:33:13 -07:00
David S. Miller	dc2b48475a	netdev: Move queue_lock into struct netdev_queue. The lock is now an attribute of the device queue. One thing to notice is that "suspicious" places emerge which will need specific training about multiple queue handling. They are so marked with explicit "netdev->rx_queue" and "netdev->tx_queue" references. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 17:18:23 -07:00
David S. Miller	5ce2d488fe	pkt_sched: Remove 'dev' member of struct Qdisc. It can be obtained via the netdev_queue. So create a helper routine, qdisc_dev(), to make the transformations nicer looking. qdisc_alloc() now no longer needs a net_device pointer argument. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 17:06:30 -07:00
David S. Miller	bb949fbd18	netdev: Create netdev_queue abstraction. A netdev_queue is an entity managed by a qdisc. Currently there is one RX and one TX queue, and a netdev_queue merely contains a backpointer to the net_device. The Qdisc struct is augmented with a netdev_queue pointer as well. Eventually the 'dev' Qdisc member will go away and we will have the resulting hierarchy: net_device --> netdev_queue --> Qdisc Also, qdisc_alloc() and qdisc_create_dflt() now take a netdev_queue pointer argument. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 16:55:56 -07:00
David S. Miller	e65d22e180	pkt_sched: Remove comment reference to old style TX locking. We haven't had netdev->tbusy in many years :) Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 16:46:01 -07:00
Patrick McHardy	fb0305ce1b	net-sched: consolidate default fifo qdisc setup Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-05 23:40:21 -07:00
Patrick McHardy	aee18a8cf2	net-sched: sch_htb: remove write-only qdisc filter_cnt The filter_cnt is supposed to count filter references to a class. Since the qdisc can't be the target of a filter, it doesn't need a filter_cnt. In fact the counter is never decreased since cls_api considers a return value of zero a failure and doesn't unbind again. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-05 23:23:27 -07:00
Patrick McHardy	4207759939	net-sched: sch_htb: remove child and sibling lists Now that the qdisc isn't destroyed in hierarchical order anymore, the only user of the child lists left is htb_parent_last_child(). This can be easily changed to use a counter of children to save a few bytes. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-05 23:22:53 -07:00
Patrick McHardy	f4c1f3e0c5	net-sched: sch_htb: use dynamic class hash helpers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-05 23:22:35 -07:00
Patrick McHardy	fbd8f1379a	net-sched: sch_htb: move hash and sibling list removal to htb_delete Hash list removal currently happens twice (once in htb_delete, once in htb_destroy_class), which makes it harder to use the dynamically sized class hash without adding special cases for HTB. The reason is that qdisc destruction destroys classes in hierarchical order, which is not necessary if filters are destroyed in a separate iteration during qdisc destruction. Adjust qdisc destruction to follow the same scheme as other hierarchical qdiscs by first performing a filter destruction pass, then destroying all classes in hash order. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-05 23:22:19 -07:00
Patrick McHardy	d77fea2eb9	net-sched: sch_cbq: use dynamic class hash helpers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-05 23:22:05 -07:00
Patrick McHardy	be0d39d52c	net-sched: sch_hfsc: use dynamic class hash helpers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-05 23:21:47 -07:00
Patrick McHardy	6fe1c7a555	net-sched: add dynamically sized qdisc class hash helpers Currently all qdiscs that allow creating classes use a fixed-size hash table of size 16 to hash the classes. This causes a large bottleneck when using thousands of classes and unbound filters. Add helpers for dynamically sized class hashes to fix this. The following patches will convert the qdiscs to use them. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-05 23:21:31 -07:00
David S. Miller	ea2aca084b	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/net/wan/hdlc_fr.c drivers/net/wireless/iwlwifi/iwl-4965.c drivers/net/wireless/iwlwifi/iwl3945-base.c	2008-07-05 23:08:07 -07:00
Patrick McHardy	a4aebb83cf	net-sched: fix filter destruction in atm/hfsc qdisc destruction Filters need to be destroyed before beginning to destroy classes since the destination class needs to still be alive to unbind the filter. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-01 19:53:09 -07:00
Patrick McHardy	ff31ab56c0	net-sched: change tcf_destroy_chain() to clear start of filter list Pass double tcf_proto pointers to tcf_destroy_chain() to make it clear the start of the filter list for more consistency. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-01 19:52:38 -07:00
David S. Miller	1b63ba8a86	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl4965-base.c	2008-06-28 01:19:40 -07:00
Adrian Bunk	ede16af4cd	pkt_sched: Remove CONFIG_NET_SCH_RR Commit `d62733c8e4` ([SCHED]: Qdisc changes and sch_rr added for multiqueue) added a NET_SCH_RR option that was unused since the code went unconditionally into sch_prio. Reported-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-27 19:54:05 -07:00
WANG Cong	01e123d79a	pkt_sched: ERR_PTR() usually encodes a negative errno, not positive. Note, in the following patch, 'err' is initialized as: int err = -ENOBUFS; Signed-off-by: WANG Cong <wcong@critical-links.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-27 19:51:35 -07:00
David S. Miller	caea902f72	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/rt2x00/Kconfig drivers/net/wireless/rt2x00/rt2x00usb.c net/sctp/protocol.c	2008-06-16 18:25:48 -07:00
Jesper Dangaard Brouer	47083fc073	pkt_sched: Change HTB_HYSTERESIS to a runtime parameter htb_hysteresis. Add a htb_hysteresis parameter to htb_sch.ko and by sysfs magic make it runtime adjustable via /sys/module/sch_htb/parameters/htb_hysteresis mode 640. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Acked-by: Martin Devera <devik@cdi.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-16 16:39:32 -07:00
Jesper Dangaard Brouer	f9ffcedddb	pkt_sched: HTB scheduler, change default hysteresis mode to off. The HTB hysteresis mode reduces the CPU load, but at the cost of scheduling accuracy. On ADSL links (512 kbit/s upstream), this inaccuracy introduces significant jitter, enough to disturb VoIP. For details see my master's thesis (http://www.adsl-optimizer.dk/thesis/), chapter 7, section 7.3.1, pp 69-70. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Acked-by: Martin Devera <devik@cdi.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-16 16:38:33 -07:00
Adrian Bunk	0b04082995	net: remove CVS keywords This patch removes CVS keywords that weren't updated for a long time from comments. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-11 21:00:38 -07:00
Thomas Graf	bc3ed28caa	netlink: Improve returned error codes Make nlmsg_trim(), nlmsg_cancel(), genlmsg_cancel(), and nla_nest_cancel() void functions. Return -EMSGSIZE instead of -1 if the provided message buffer is not big enough. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-03 16:36:54 -07:00
Patrick McHardy	f2df824948	net_sched: cls_api: fix return value for non-existent classifiers cls_api should return ENOENT when the requested classifier doesn't exist. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-20 14:34:46 -07:00
Jamal Hadi Salim	9d1045ad68	net_cls_act: act_simple dont ignore realloc code reallocation of the policy data was being ignored. It could fail. Simplify so that there is no need for reallocating. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-06 00:10:24 -07:00
Jamal Hadi Salim	fa1b1cff3d	net_cls_act: Make act_simple use of netlink policy. Convert to netlink helpers by using netlink policy validation. As a side effect fixes a leak. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-05 00:22:35 -07:00
Jarek Poplawski	3ba08b00e0	sch_htb: remove from event queue in htb_parent_to_leaf() There is lack of removing a class from the event queue while changing from parent to leaf which can cause corruption of this rb tree. This patch fixes a bug introduced by my patch: "sch_htb: turn intermediate classes into leaves" commit: `160d5e10f8`. Many thanks to Jan 'yanek' Bortl for finding a way to reproduce this rare bug and narrowing the test case, which made possible proper diagnosing. This patch is recommended for all kernels starting from 2.6.20. Reported-and-tested-by: Jan 'yanek' Bortl <yanek@ya.bofh.cz> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-03 20:46:29 -07:00
Arjan van de Ven	b4192bbd85	net: Add a WARN_ON_ONCE() to the transmit timeout function WARN_ON_ONCE() gives a stack trace including the full module list. Having this in the kernel dump for the timeout case in the generic netdev watchdog will help us see quicker which driver is involved. It also allows us to collect statistics and patterns in terms of which drivers have this event occurring. Suggested by Andrew Morton Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-02 16:21:07 -07:00
Jarek Poplawski	980c478ddb	sch_sfq: use del_timer_sync() in sfq_destroy() Let's delete timer reliably in sfq_destroy(). Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-29 03:29:03 -07:00
David S. Miller	1e42198609	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6	2008-04-17 23:56:30 -07:00
Patrick McHardy	f5ba2d3217	[PKT_SCHED]: Fix datalen check in tcf_simp_init(). datalen is unsigned so it can never be less than zero, but that's ok because the attribute passed to nla_len() has been validated and therefore a negative return value is impossible. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-17 23:19:55 -07:00
Jarek Poplawski	066a3b5b23	[NET_SCHED] sch_api: fix qdisc_tree_decrease_qlen() loop TC_H_MAJ(parentid) for root classes is the same as for ingress, and if ingress qdisc is created qdisc_lookup() returns its pointer (without ingress NULL is returned). After this all qdisc_lookups give the same, and we get an endless loop. (I don't know how this could hide for so long - it should trigger with every leaf class deleted if its qdisc isn't empty.) After this fix qdisc_lookup() is omitted both for ingress and root parents, but looking for root is only wasting a little time here... Many thanks to Enrico Demarin for finding a test for catching this bug, which probably bothered quite a lot of admins. Reported-by: Enrico Demarin <enrico@superclick.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-14 15:10:42 -07:00
David S. Miller	df39e8ba56	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/ehea/ehea_main.c drivers/net/wireless/iwlwifi/Kconfig drivers/net/wireless/rt2x00/rt61pci.c net/ipv4/inet_timewait_sock.c net/ipv6/raw.c net/mac80211/ieee80211_sta.c	2008-04-14 02:30:23 -07:00
Jarek Poplawski	e56cfad132	[NET_SCHED] cls_u32: refcounting fix for u32_delete() Deleting of nonroot hnodes mostly doesn't work in u32_delete(): refcnt == 1 is expected, but such hnodes' refcnts are initialized with 0 and charged only with "link" nodes. Now they'll start with 1 like usual. Thanks to Patrick McHardy for an improving suggestion. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-12 18:37:13 -07:00
David S. Miller	e1ec1b8ccd	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/s2io.c	2008-04-02 22:35:23 -07:00
Herbert Xu	2ba2506ca7	[NET]: Add preemption point in qdisc_run The qdisc_run loop is currently unbounded and runs entirely in a softirq. This is bad as it may create an unbounded softirq run. This patch fixes this by calling need_resched and breaking out if necessary. It also adds a break out if the jiffies value changes since that would indicate we've been transmitting for too long which starves other softirqs. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 16:25:26 -07:00
YOSHIFUJI Hideaki	3b1e0a655f	[NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS. Introduce per-sock inlines: sock_net(), sock_net_set() and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-26 04:39:55 +09:00
David S. Miller	06802a819a	Merge branch 'master' of ../net-2.6/ Conflicts: net/ipv6/ndisc.c	2008-03-23 22:54:03 -07:00
Martin Devera	8f3ea33a50	sch_htb: fix "too many events" situation HTB is an event-driven algorithm and part of its work is to apply scheduled events at proper times. It tried to defend itself from livelock by processing only a limited number of events per dequeue. Because of faster computers some users already hit this hardcoded limit. This patch limits processing to up to 2 jiffies (why not 1 jiffy? because it might stop prematurely when only a fraction of a jiffy remains). Signed-off-by: Martin Devera <devik@cdi.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-23 22:00:38 -07:00
David S. Miller	577f99c1d0	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/rt2x00/rt2x00dev.c net/8021q/vlan_dev.c	2008-03-18 00:37:55 -07:00
Al Viro	0382b9c354	[PKT_SCHED]: annotate cls_u32 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-17 22:46:46 -07:00
Eric Dumazet	ee6b967301	[IPV4]: Add 'rtable' field in struct sk_buff to alias 'dst' and avoid casts (Anonymous) unions can help us to avoid ugly casts. A common cast is the (struct rtable *)skb->dst one. Defining a union like: union { struct dst_entry *dst; struct rtable *rtable; }; permits the use of skb->rtable in its place. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-05 18:30:47 -08:00
David S. Miller	30ddb159ff	[PKT_SCHED] ematch: Fix build warning. Commit `954415e33e` ("[PKT_SCHED] ematch: tcf_em_destroy robustness") removed a cast on em->data when passing it to kfree(), but em->data is an integer type that can hold pointers as well as other values so the cast is necessary. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-10 03:48:15 -08:00
Jarek Poplawski	21347456ab	[NET_SCHED] sch_htb: htb_requeue fix htb_requeue() enqueues skbs for which htb_classify() returns NULL. This is wrong because such skbs could be handled by NET_CLS_ACT code, and the decision could be different than earlier in htb_enqueue(). So htb_requeue() is changed to work and look more like htb_enqueue(). Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-09 23:44:00 -08:00
Stephen Hemminger	954415e33e	[PKT_SCHED] ematch: tcf_em_destroy robustness Make the code in tcf_em_tree_destroy more robust and cleaner: * Don't need to cast pointer to kfree() or avoid passing NULL. * After freeing the tree, clear the pointer to avoid possible problems from repeated free. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-09 23:26:53 -08:00
Stephen Hemminger	ed7af3b350	[PKT_SCHED]: deinline functions in meta match A couple of functions in meta match don't need to be inline. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-09 23:26:17 -08:00
Stephen Hemminger	268bcca1e7	[PKT_SCHED] ematch: oops from uninitialized variable (resend) Setting up a meta match causes a kernel OOPS because of uninitialized elements in tree. [ 37.322381] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [ 37.322381] IP: [<ffffffff883fc717>] :em_meta:em_meta_destroy+0x17/0x80 [ 37.322381] Call Trace: [ 37.322381] [<ffffffff803ec83d>] tcf_em_tree_destroy+0x2d/0xa0 [ 37.322381] [<ffffffff803ecc8c>] tcf_em_tree_validate+0x2dc/0x4a0 [ 37.322381] [<ffffffff803f06d2>] nla_parse+0x92/0xe0 [ 37.322381] [<ffffffff883f9672>] :cls_basic:basic_change+0x202/0x3c0 [ 37.322381] [<ffffffff802a3917>] kmem_cache_alloc+0x67/0xa0 [ 37.322381] [<ffffffff803ea221>] tc_ctl_tfilter+0x3b1/0x580 [ 37.322381] [<ffffffff803dffd0>] rtnetlink_rcv_msg+0x0/0x260 [ 37.322381] [<ffffffff803ee944>] netlink_rcv_skb+0x74/0xa0 [ 37.322381] [<ffffffff803dffc8>] rtnetlink_rcv+0x18/0x20 [ 37.322381] [<ffffffff803ee6c3>] netlink_unicast+0x263/0x290 [ 37.322381] [<ffffffff803cf276>] __alloc_skb+0x96/0x160 [ 37.322381] [<ffffffff803ef014>] netlink_sendmsg+0x274/0x340 [ 37.322381] [<ffffffff803c7c3b>] sock_sendmsg+0x12b/0x140 [ 37.322381] [<ffffffff8024de90>] autoremove_wake_function+0x0/0x30 [ 37.322381] [<ffffffff8024de90>] autoremove_wake_function+0x0/0x30 [ 37.322381] [<ffffffff803c7c3b>] sock_sendmsg+0x12b/0x140 [ 37.322381] [<ffffffff80288611>] zone_statistics+0xb1/0xc0 [ 37.322381] [<ffffffff803c7e5e>] sys_sendmsg+0x20e/0x360 [ 37.322381] [<ffffffff803c7411>] sockfd_lookup_light+0x41/0x80 [ 37.322381] [<ffffffff8028d04b>] handle_mm_fault+0x3eb/0x7f0 [ 37.322381] [<ffffffff8020c2fb>] system_call_after_swapgs+0x7b/0x80 Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-09 03:47:19 -08:00
Stephen Hemminger	04f217aca4	[TC]: oops in em_meta If userspace passes an unknown match index into em_meta, then em_meta_change will return an error and the data for the match will not be set. This then causes a null pointer dereference when the cleanup is done in the error path via tcf_em_tree_destroy. Since the tree structure comes from kzalloc, it is initialized to NULL. Discovered when testing a new version of tc command against an accidental older kernel. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-07 18:13:00 -08:00
Patrick McHardy	9ec138101f	[NET_SCHED]: cls_flow: support classification based on VLAN tag Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-05 16:21:04 -08:00
Patrick McHardy	4f25049106	[NET_SCHED]: cls_flow: fix key mask validity check Since we're using fls(), we need to check whether the value is non-zero first. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-05 16:19:59 -08:00
Patrick McHardy	0ea9d70df8	[NET_SCHED]: em_meta: fix compile warning net/sched/em_meta.c: In function 'meta_int_vlan_tag': net/sched/em_meta.c:179: warning: 'tag' may be used uninitialized in this function Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-05 16:19:33 -08:00
Stephen Hemminger	3113e88c3c	[PKT_SCHED]: vlan tag match Provide a way to use tc filters on vlan tag even if tag is buried in skb due to hardware acceleration. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-05 03:20:13 -08:00
Rami Rosen	0aead54347	[NET_SCHED]: Add #ifdef CONFIG_NET_EMATCH in net/sched/cls_flow.c (latest git broken build) The 2.6 latest git build was broken when using the following configuration options: CONFIG_NET_EMATCH=n CONFIG_NET_CLS_FLOW=y with the following error: net/sched/cls_flow.c: In function 'flow_dump': net/sched/cls_flow.c:598: error: 'struct tcf_ematch_tree' has no member named 'hdr' make[2]: *** [net/sched/cls_flow.o] Error 1 make[1]: *** [net/sched] Error 2 make: *** [net] Error 2 see the recent post by Li Zefan: http://www.spinics.net/lists/netdev/msg54434.html The reason for this build failure is that struct tcf_ematch_tree (net/pkt_cls.h) is empty when CONFIG_NET_EMATCH is not defined. When CONFIG_NET_EMATCH is defined, the tcf_ematch_tree structure indeed holds a struct tcf_ematch_tree_hdr (hdr) as flow_dump() expects. This patch adds #ifdef CONFIG_NET_EMATCH in flow_dump to avoid this. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-05 02:56:48 -08:00
Patrick McHardy	e5dfb81518	[NET_SCHED]: Add flow classifier Add new "flow" classifier, which is meant to extend the SFQ hashing capabilities without hard-coding new hash functions and also allows deterministic mappings of keys to classes, replacing some out of tree iptables patches like IPCLASSIFY (maps IPs to classes), IPMARK (maps IPs to marks, with fw filters to classes), ... Some examples: - Classic SFQ hash: tc filter add ... flow hash \ keys src,dst,proto,proto-src,proto-dst divisor 1024 - Classic SFQ hash, but using information from conntrack to work properly in combination with NAT: tc filter add ... flow hash \ keys nfct-src,nfct-dst,proto,nfct-proto-src,nfct-proto-dst divisor 1024 - Map destination IPs of 192.168.0.0/24 to classids 1-257: tc filter add ... flow map \ key dst addend -192.168.0.0 divisor 256 - alternatively: tc filter add ... flow map \ key dst and 0xff - similar, but reverse ordered: tc filter add ... flow map \ key dst and 0xff xor 0xff Perturbation is currently not supported because we can't reliably kill the timer on destruction. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:36 -08:00
Patrick McHardy	94de78d195	[NET_SCHED]: sch_sfq: make internal queues visible as classes Add support for dumping statistics and make internal queues visible as classes. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:35 -08:00
Patrick McHardy	7d2681a6ff	[NET_SCHED]: sch_sfq: add support for external classifiers Add support for external classifiers to allow using different flow hash functions similar to ESFQ. When no classifier is attached the built-in hash is used as before. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:34 -08:00
Patrick McHardy	5239008b0d	[NET_SCHED]: Constify struct tcf_ext_map Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:34 -08:00
Roel Kluin	cc8fd14dca	[PKT_SCHED] sch_teql.c: Duplicate IFF_BROADCAST in FMASK, remove 2nd. Signed-off-by: Roel Kluin <12o3l@tiscali.nl> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:29 -08:00
Patrick McHardy	72eb7bd269	[NET_SCHED]: sch_ingress: remove netfilter support Since the old policer code is gone, TC actions are needed for policing. The ingress qdisc can get packets directly from netif_receive_skb() in case TC actions are enabled or through netfilter otherwise, but since without TC actions there is no policer the only thing it actually does is count packets. Remove the netfilter support and always require TC actions. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:28:25 -08:00
Patrick McHardy	7a9c1bd409	[NET_SCHED]: Use nla_policy for attribute validation in ematches Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:24 -08:00
Patrick McHardy	53b2bf3f8a	[NET_SCHED]: Use nla_policy for attribute validation in actions Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:23 -08:00
Patrick McHardy	6fa8c0144b	[NET_SCHED]: Use nla_policy for attribute validation in classifiers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:23 -08:00
Patrick McHardy	27a3421e48	[NET_SCHED]: Use nla_policy for attribute validation in packet schedulers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:22 -08:00
Patrick McHardy	5feb5e1aaa	[NET_SCHED]: sch_api: introduce constant for rate table size Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:21 -08:00
Patrick McHardy	1587bac49f	[NET_SCHED]: Use typeful attribute parsing helpers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:21 -08:00
Patrick McHardy	24beeab539	[NET_SCHED]: Use typeful attribute construction helpers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:20 -08:00
Patrick McHardy	57e1c487a4	[NET_SCHED]: Use NLA_PUT_STRING for string dumping Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:19 -08:00
Patrick McHardy	4b3550ef53	[NET_SCHED]: Use nla_nest_start/nla_nest_end Use nla_nest_start/nla_nest_end for dumping nested attributes. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:18 -08:00
Patrick McHardy	cee63723b3	[NET_SCHED]: Propagate nla_parse return value nla_parse() returns more detailed errno codes, propagate them back on error. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:18 -08:00
Patrick McHardy	ab27cfb85c	[NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:17 -08:00
Patrick McHardy	c96c9471dd	[NET_SCHED]: act_api: use nlmsg_parse Convert open-coded nlmsg_parse to use the real function. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:16 -08:00
Patrick McHardy	6d834e04e5	[NET_SCHED]: act_api: fix netlink API conversion bug Fix two invalid attribute accesses, indices start at 1 with the new netlink API. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:15 -08:00
Patrick McHardy	b03f467200	[NET_SCHED]: sch_netem: use nla_parse_nested_compat Replace open coded equivalent of nla_parse_nested_compat(). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:15 -08:00
Patrick McHardy	f5e5cb7553	[NET_SCHED]: sch_atm: fix format string warning Fix format string warning introduced by the netlink API conversion: net/sched/sch_atm.c:250: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'int'. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:14 -08:00
Patrick McHardy	7ba699c604	[NET_SCHED]: Convert actions from rtnetlink to new netlink API Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:11 -08:00
Patrick McHardy	add93b610a	[NET_SCHED]: Convert classifiers from rtnetlink to new netlink API Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:11 -08:00
Patrick McHardy	1e90474c37	[NET_SCHED]: Convert packet schedulers from rtnetlink to new netlink API Convert packet schedulers to use the netlink API. Unfortunately a gradual conversion is not possible without breaking compilation in the middle or adding lots of casts, so this patch converts them all in one step. The patch has been mostly generated automatically with some minor edits to at least allow separate conversion of classifiers and actions. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:10 -08:00
Patrick McHardy	2eb9d75c72	[NET_SCHED]: mark classifier ops __read_mostly Additionally remove unnecessary NULL initializations of the next pointer. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:08 -08:00
Patrick McHardy	62e3ba1b55	[NET_SCHED]: Move EXPORT_SYMBOL next to exported symbol Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:11:07 -08:00
Stephen Hemminger	aa767bfea4	[PKT_SCHED] net classifier: style cleanup's Classifier code cleanup. Get rid of printk wrapper, and fix whitespace and other style stuff reported by checkpatch Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:42 -08:00
Stephen Hemminger	786a90366f	[PKT_SCHED] sch_atm: style cleanup ATM scheduler clean house: * get rid of printk and qdisc_priv() wrapper * split some assignment in if() statements * whitespace and line breaks. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:41 -08:00
Stephen Hemminger	9d127fbdd2	[PKT_SCHED] dsmark: checkpatch warning cleanup Get rid of all style things checkpatch warns about, indentation and whitespace. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:40 -08:00
Stephen Hemminger	4c30719f4f	[PKT_SCHED] dsmark: handle cloned and non-linear skb's Make dsmark work properly with non-linear and cloned skb's Before modifying the header, it needs to check that skb header is writeable. Note: this makes the assumption, that if it queues a good skb then a good skb will come out of the embedded qdisc. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:40 -08:00
David S. Miller	5b0ac72bc5	[PKT_SCHED] dsmark: Use hweight32() instead of convoluted loop. Based upon a patch by Stephen Hemminger and suggestions from Patrick McHardy. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:39 -08:00
Stephen Hemminger	81da99ed71	[PKT_SCHED] dsmark: get rid of wrappers Remove extraneous macro wrappers for printk and qdisc_priv. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:38 -08:00
Patrick McHardy	13a0a096e5	[NET_SCHED]: kill obsolete NET_CLS_POLICE option The code is already gone for about half a year, the config option has been kept around to select the replacement options for easier upgrades. This seems long enough, people upgrading from older kernels will have to reconfigure a lot anyway. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:37 -08:00
Patrick McHardy	891687649a	[NET_SCHED]: sch_ingress: remove useless printk The printk about ingress qdisc registration error can't be triggered under normal circumstances. Since register_qdisc only fails for two identical registrations, the only way to trigger it is by loading the sch_ingress modules multiple times under different names, in which case we already return -EEXIST to userspace. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:22 -08:00
Patrick McHardy	1389356735	[NET_SCHED]: sch_ingress: avoid a few #ifdefs Move the repeating "ifndef CONFIG_NET_CLS_ACT/ifdef CONFIG_NETFILTER" ifdefs into a single condition. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:22 -08:00
Patrick McHardy	645a1e39e4	[NET_SCHED]: sch_ingress: move dependencies to Kconfig Instead of complaining at scheduler initialization time, check the dependencies in Kconfig. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:21 -08:00
Patrick McHardy	c6ee877f2e	[NET_SCHED]: sch_ingress: remove unnecessary ops - ->reset is optional - sch_api provides identical defaults for ->dequeue/->requeue - ->drop can't happen since ingress never has a parent qdisc Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:20 -08:00
Patrick McHardy	e037834758	[NET_SCHED]: sch_ingress: return proper error code in ingress_graft() Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:20 -08:00
Patrick McHardy	c21d4d5dd2	[NET_SCHED]: sch_ingress: remove unused inner qdisc Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:19 -08:00
Patrick McHardy	cb53c04891	[NET_SCHED]: sch_ingress: remove qdisc_priv() wrapper Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:18 -08:00
Patrick McHardy	a47812211b	[NET_SCHED]: sch_ingress: remove excessive debugging Remove excessive debugging statements and some "future use" stuff. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:18 -08:00
Patrick McHardy	58f4df423e	[NET_SCHED]: sch_ingress: formatting fixes Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:17 -08:00
Stephen Hemminger	6f9e98f7a9	[PKT_SCHED] SFQ: whitespace cleanup Add whitespace around operators, and add a few blank lines to improve readability. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:16 -08:00
Stephen Hemminger	d46f8dd87d	[PKT_SCHED] SFQ: use net_random SFQ doesn't need true random numbers, it is only using them to salt a hash. Therefore it is better to use net_random() and avoid any possible problems with depleting the entropy pool. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:15 -08:00
Stephen Hemminger	d3e994830d	[PKT_SCHED] SFQ: timer is deferrable The perturbation timer used for re-keying can be deferred, it doesn't need to be deterministic. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:08:15 -08:00
Ilpo Järvinen	d88c305a03	[PKT_SCHED] HTB: htb_classid is dead static inline Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:01:59 -08:00
Eric Dumazet	9a429c4983	[NET]: Add some acquires/releases sparse annotations. Add __acquires() and __releases() annotations to suppress some sparse warnings. example of warnings : net/ipv4/udp.c:1555:14: warning: context imbalance in 'udp_seq_start' - wrong count at exit net/ipv4/udp.c:1571:13: warning: context imbalance in 'udp_seq_stop' - unexpected unlock Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 15:00:31 -08:00
Patrick McHardy	1999414a4e	[NETFILTER]: Mark hooks __read_mostly Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:07 -08:00
Patrick McHardy	41c5b31703	[NETFILTER]: Use nf_register_hooks for multiple registrations Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:06 -08:00
Patrick McHardy	be0ea7d5da	[NETFILTER]: Convert old checksum helper names Kill the defines again, convert to the new checksum helper names and remove the dependency of NET_ACT_NAT on NETFILTER. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:55:15 -08:00
Denis V. Lunev	97c53cacf0	[NET]: Make rtnetlink infrastructure network namespace aware (v3) After this patch none of the netlink callback support anything except the initial network namespace but the rtnetlink infrastructure now handles multiple network namespaces. Changes from v2: - IPv6 addrlabel processing Changes from v1: - no need for special rtnl_unlock handling - fixed IPv6 ndisc Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:54:25 -08:00
Denis V. Lunev	b854272b3c	[NET]: Modify all rtnetlink methods to only work in the initial namespace (v2) Before I can enable rtnetlink to work in all network namespaces I need to be certain that something won't break. So this patch deliberately disables all of the rtnetlink methods in everything except the initial network namespace. After the methods have been audited this extra check can be disabled. Changes from v1: - added IPv6 addrlabel protection Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2008-01-28 14:54:24 -08:00
Eric Dumazet	20fea08b5f	[NET]: Move Qdisc_class_ops and Qdisc_ops in appropriate sections. Qdisc_class_ops are const, and Qdisc_ops are mostly read. Using "const" and "__read_mostly" qualifiers helps to reduce false sharing. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:58 -08:00
Patrick McHardy	6e23ae2a48	[NETFILTER]: Introduce NF_INET_ hook values The IPv4 and IPv6 hook values are identical, yet some code tries to figure out the "correct" value by looking at the address family. Introduce NF_INET_* values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__ section for userspace compatibility. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:55 -08:00
Pavel Emelyanov	b24b8a247f	[NET]: Convert init_timer into setup_timer Many-many code in the kernel initialized the timer->function and timer->data together with calling init_timer(timer). There is already a helper for this. Use it for networking code. The patch is HUGE, but makes the code 130 lines shorter (98 insertions(+), 228 deletions(-)). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:35 -08:00
Joe Perches	9a94b35184	[PKT_SCHED]: Spelling fixes Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-12-20 14:02:40 -08:00
Peter P Waskiewicz Jr	5f1a485d59	[PKT_SCHED]: Check subqueue status before calling hard_start_xmit The only qdiscs that check subqueue state before dequeue'ing are PRIO and RR. The other qdiscs, including the default pfifo_fast qdisc, will allow traffic bound for subqueue 0 through to hard_start_xmit. The check for netif_queue_stopped() is done above in pkt_sched.h, so it is unnecessary for qdisc_restart(). However, if the underlying driver is multiqueue capable, and only sets queue states on subqueues, this will allow packets to enter the driver when it's currently unable to process packets, resulting in expensive requeues and driver entries. This patch re-adds the check for the subqueue status before calling hard_start_xmit, so we can try and avoid the driver entry when the queues are stopped. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-13 20:40:55 -08:00
Radu Rendec	b226801676	[PKT_SCHED] CLS_U32: Use ffs() instead of C code on hash mask to get first set bit. Computing the rank of the first set bit in the hash mask (for using later in u32_hash_fold()) was done with plain C code. Using ffs() instead makes the code more readable and improves performance (since ffs() is better optimized in assembler). Using the conditional operator on hash mask before applying ntohl() also saves one ntohl() call if mask is 0. Signed-off-by: Radu Rendec <radu.rendec@ines.ro> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 21:54:50 -08:00
Radu Rendec	543821c6f5	[PKT_SCHED] CLS_U32: Fix endianness problem with u32 classifier hash masks. While trying to implement u32 hashes in my shaping machine I ran into a possible bug in the u32 hash/bucket computing algorithm (net/sched/cls_u32.c). The problem occurs only with hash masks that extend over the octet boundary, on little endian machines (where htonl() actually does something). Let's say that I would like to use 0x3fc0 as the hash mask. This means 8 contiguous "1" bits starting at b6. With such a mask, the expected (and logical) behavior is to hash any address in, for instance, 192.168.0.0/26 in bucket 0, then any address in 192.168.0.64/26 in bucket 1, then 192.168.0.128/26 in bucket 2 and so on. This is exactly what would happen on a big endian machine, but on little endian machines, what would actually happen with the current implementation is 0x3fc0 being reversed (into 0xc03f0000) by htonl() in the userspace tool and then applied to 192.168.x.x in the u32 classifier. When shifting right by 16 bits (rank of first "1" bit in the reversed mask) and applying the divisor mask (0xff for divisor 256), what would actually remain is 0x3f applied on the "168" octet of the address. One could say this can be easily worked around by taking endianness into account in userspace and supplying an appropriate mask (0xfc03) that would be turned into contiguous "1" bits when reversed (0x03fc0000). But the actual problem is the network address (inside the packet) not being converted to host order, but used as a host-order value when computing the bucket. Let's say the network address is written as n31 n30 ... n0, with n0 being the least significant bit. When used directly (without any conversion) on a little endian machine, it becomes n7 ... n0 n8 ..n15 etc in the machine's registers. Thus bits n7 and n8 would no longer be adjacent and 192.168.64.0/26 and 192.168.128.0/26 would no longer be consecutive. 
The fix is to apply ntohl() on the hmask before computing fshift, and in u32_hash_fold() convert the packet data to host order before shifting down by fshift. With helpful feedback from Jamal Hadi Salim and Jarek Poplawski. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:11:45 -08:00
Evgeniy Polyakov	4f9f8311a0	[PKT_SCHED]: Fix OOPS when removing devices from a teql queuing discipline teql_reset() is called from deactivate and the qdisc is already set to noop, but a subsequent teql_xmit does not know about it, dereferences the private data as a teql qdisc, and thus oopses. not catch it first :) Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-07 04:09:17 -08:00
Jamal Hadi Salim	a057ae3c10	[NET_CLS_ACT]: Use skb_act_clone clean skb_clone of any signs of CONFIG_NET_CLS_ACT and have mirred use skb_act_clone() Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-26 02:47:54 -07:00
Pavel Emelyanov	0034622693	[PKT_SCHED]: Fix sch_prio.c build with CONFIG_NETDEVICES_MULTIQUEUE Fix one more user of netif_subqueue_stopped. To check for the queue id one must use the __netif_subqueue_stopped call. This slipped out of my sight when I made the: `668f895a85` [NET]: Hide the queue_mapping field inside netif_subqueue_stopped commit :( Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-23 21:27:53 -07:00
Pavel Emelyanov	668f895a85	[NET]: Hide the queue_mapping field inside netif_subqueue_stopped Many places get the queue_mapping field from skb to pass it to the netif_subqueue_stopped() which will be 0 in any case. Make the helper that works with sk_buff Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-22 02:59:56 -07:00
Pavel Emelyanov	4e3ab47a54	[NET]: Make and use skb_get_queue_mapping Make the helper for getting the field, symmetrical to the "set" one. Return 0 if CONFIG_NETDEVICES_MULTIQUEUE=n Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-22 02:59:56 -07:00
Robert P. J. Day	3a4fa0a25d	Fix misspellings of "system", "controller", "interrupt" and "necessary". Fix the various misspellings of "system", "controller", "interrupt" and "[un]necessary". Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-19 23:10:43 +02:00
Herbert Xu	ce0e32e65f	[NET]: Fix possible dev_deactivate race condition The function dev_deactivate is supposed to only return when all outstanding transmissions have completed. Unfortunately it is possible for store operations in the driver's transmit function to only become visible after dev_deactivate returns. This patch fixes this by taking the queue lock after we see the end of the queue run. This ensures that all effects of any previous transmit calls are visible. If however we detect that there is another queue run occurring, then we'll warn about it because this should never happen as we have pointed dev->qdisc to noop_qdisc within the same queue lock earlier in the function. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-18 22:37:58 -07:00
Randy Dunlap	85ef3e5cad	[NET]: QoS/Sched as menuconfig Convert "QoS and/or fair queueing" to menuconfig. This makes it easy for someone to disable all sub-options with one config symbol. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-18 21:56:38 -07:00
Jeff Garzik	bfaae0f04c	[NET]: fix carrier-on bug? While looking at a net driver with the following construct, if (!netif_carrier_ok(dev)) netif_carrier_on(dev); it struck me that the netif_carrier_ok() check was redundant, since netif_carrier_on() checks bit __LINK_STATE_NOCARRIER anyway. This is the same reason why netif_queue_stopped() need not be called prior to netif_wake_queue(). This is true, but there is however an unwanted side effect from assuming that netif_carrier_on() can be called multiple times: it touches the watchdog, regardless of pre-existing carrier state. The fix: move watchdog-up inside the bit-cleared code path. Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 23:26:43 -07:00
Herbert Xu	3db05fea51	[NETFILTER]: Replace sk_buff ** with sk_buff * With all the users of the double pointers removed, this patch mops up by finally replacing all occurrences of sk_buff ** in the netfilter API by sk_buff *. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:29 -07:00
Patrick McHardy	3c0cfc1358	[NET_SCHED]: Show timer resolution instead of clock resolution in /proc/net/psched The fourth parameter of /proc/net/psched is supposed to show the timer resolution and is used by HTB userspace to calculate the necessary burst rate. Currently we show the clock resolution, which results in a too low burst rate when the two differ. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:55:59 -07:00
Stephen Hemminger	cfcabdcc2d	[NET]: sparse warning fixes Fix a bunch of sparse warnings. Mostly about 0 used as NULL pointer, and shadowed variable declarations. One notable case was that hash size should have been unsigned. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:54:48 -07:00
Herbert Xu	b421995235	[PKT_SCHED]: Add stateless NAT Stateless NAT is useful in controlled environments where restrictions are placed on through traffic such that we don't need connection tracking to correctly NAT protocol-specific data. In particular, this is of interest when the number of flows or the number of addresses being NATed is large, or if connection tracking information has to be replicated and where it is not practical to do so. Previously we had stateless NAT functionality which was integrated into the IPv4 routing subsystem. This was a great solution as long as the NAT worked on a subnet to subnet basis such that the number of NAT rules was relatively small. The reason is that for SNAT the routing based system had to perform a linear scan through the rules. If the number of rules is large then major renovations would have to take place in the routing subsystem to make this practical. For the time being, the least intrusive way of achieving this is to use the u32 classifier written by Alexey Kuznetsov along with the actions infrastructure implemented by Jamal Hadi Salim. The following patch is an attempt at this problem by creating a new nat action that can be invoked from u32 hash tables which would allow large number of stateless NAT rules that can be used/updated in constant time. The actual NAT code is mostly based on the previous stateless NAT code written by Alexey. In future we might be able to utilise the protocol NAT code from netfilter to improve support for other protocols. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:53:11 -07:00
Stephen Hemminger	3b04ddde02	[NET]: Move hardware header operations out of netdevice. Since hardware header operations are part of the protocol class not the device instance, make them into a separate object and save memory. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:52 -07:00
Stephen Hemminger	0c4e85813d	[NET]: Wrap netdevice hardware header creation. Add inline for common usage of hardware header creation, and fix bug in IPV6 mcast where the assumption about negative return is an errno. Negative return from hard_header means not enough space was available,(ie -N bytes). Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:50 -07:00
Jamal Hadi Salim	8236632fb3	[NET_SCHED]: explict hold dev tx lock For N cpus, with full throttle traffic on all N CPUs, funneling traffic to the same ethernet device, the device's queue lock is contended by all N CPUs constantly. The TX lock is only contended by a max of 2 CPUS. In the current mode of operation, after all the work of entering the dequeue region, we may end up aborting the path if we are unable to get the tx lock and go back to contend for the queue lock. As N goes up, this gets worse. The changes in this patch result in a small increase in performance with a 4CPU (2xdual-core) with no irq binding. Both e1000 and tg3 showed similar behavior; Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:52:15 -07:00
Ralf Baechle	10d024c1b2	[NET]: Nuke SET_MODULE_OWNER macro. It's been a useless no-op for long enough in 2.6 so I figured it's time to remove it. The number of people that could object because they're maintaining unified 2.4 and 2.6 drivers is probably rather small. [ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ] Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:51:13 -07:00
Jesper Dangaard Brouer	e9bef55d3d	[NET_SCHED]: Cleanup L2T macros and handle oversized packets Change L2T (length to time) macros, in all rate based schedulers, to call a common function qdisc_l2t() that does the rate table lookup. This function handles if the packet size lookup is larger than the rate table, which often occurs with TSO enabled. Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:20 -07:00
Eric W. Biederman	881d966b48	[NET]: Make the device list and device lookups per namespace. This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions: dev_getbyhwaddr dev_getfirsthwbytype dev_get_by_flags dev_get_by_name __dev_get_by_name dev_get_by_index __dev_get_by_index dev_ioctl dev_ethtool dev_load wireless_process_ioctl were modified to take a network namespace argument, and deal with it. vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument. So basically anthing in the core of the network stack that was affected to by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use &init_net the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces. For now the ifindex generator is left global. Fundametally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far. At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:10 -07:00
Eric W. Biederman	457c4cbc5a	[NET]: Make /proc/net per network namespace This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to be handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:49:06 -07:00
Stephen Hemminger	bea3348eef	[NET]: Make NAPI polling independent of struct net_device objects. Several devices have multiple independant RX queues per net device, and some have a single interrupt doorbell for several queues. In either case, it's easier to support layouts like that if the structure representing the poll is independant from the net device itself. The signature of the ->poll() call back goes from: int foo_poll(struct net_device *dev, int *budget) to int foo_poll(struct napi_struct *napi, int budget) The caller is returned the number of RX packets processed (or the number of "NAPI credits" consumed if you want to get abstract). The callee no longer messes around bumping dev->quota, *budget, etc. because that is all handled in the caller upon return. The napi_struct is to be embedded in the device driver private data structures. Furthermore, it is the driver's responsibility to disable all NAPI instances in it's ->stop() device close handler. Since the napi_struct is privatized into the driver's private data structures, only the driver knows how to get at all of the napi_struct instances it may have per-device. With lots of help and suggestions from Rusty Russell, Roland Dreier, Michael Chan, Jeff Garzik, and Jamal Hadi Salim. Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra, Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan. [ Ported to current tree and all drivers converted. Integrated Stephen's follow-on kerneldoc additions, and restored poll_list handling to the old style to fix mutual exclusion issues. -DaveM ] Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-10 16:47:45 -07:00
Stephen Hemminger	bf1b803b01	[PKT_SCHED] cls_u32: error code isn't been propogated properly Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-07 23:57:45 -07:00


826 Commits