At the moment, a lot of load balancing code that is irrelevant to non
SMP systems gets included during non SMP builds.
This patch addresses this issue and reduces the binary size on non
SMP systems:
text data bss dec hex filename
10983 28 1192 12203 2fab sched.o.before
10739 28 1192 11959 2eb7 sched.o.after
Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
At the moment, balance_tasks() provides low level functionality for both
move_tasks() and move_one_task() (indirectly) via the load_balance()
function (in the sched_class interface) which also provides dual
functionality. This dual functionality complicates the interfaces and
internal mechanisms and makes the run time overhead of operations that
are called with two run queue locks held.
This patch addresses this issue and reduces the overhead of these
operations.
Signed-off-by: Peter Williams <pwil3058@bigpond.net.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- replace "cont" with "cgrp" in a few places in the CFS cgroup code,
- use write_uint rather than write for cpu.shares write function
Signed-off-by: Paul Menage <menage@google.com>
Acked-by : Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
profile=sleep only works if CONFIG_SCHEDSTATS is set. This patch notes
the limitation in Documentation/kernel-parameters.txt and prints a
warning at boot-time if profile=sleep is used without CONFIG_SCHEDSTAT.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
A full register dump along with stack backtrace would make the
"scheduling while atomic" message more helpful. Use show_regs() instead
of dump_stack() for this. We already know we're atomic in here (that is
why this function was called) so show_regs()'s atomicity expectations
are guaranteed.
Also, modify the output of the "BUG: scheduling while atomic:" header a
bit to keep task->comm and task->pid together and preempt_count() after
them.
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
clean up sched_domain_debug().
this also shrinks the code a bit:
text data bss dec hex filename
50474 4306 480 55260 d7dc sched.o.before
50404 4306 480 55190 d796 sched.o.after
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jeff Dike noticed that wait_for_completion_interruptible()'s prototype
had a mismatched fastcall.
Fix this by removing the fastcall attributes from all the completion APIs.
Found-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
commit 029190c515 (cpuset
sched_load_balance flag) was not tested SCHED_DEBUG enabled as
committed as it dereferences NULL when used and it reordered
the sysctl registration to cause it to never show any domains
or their tunables.
Fixes:
1) restore arch_init_sched_domains ordering
we can't walk the domains before we build them
presently we register cpus with empty directories (no domain
directories or files).
2) make unregister_sched_domain_sysctl do nothing when already unregistered
detach_destroy_domains is now called one set of cpus at a time
unregister_syctl dereferences NULL if called with a null.
While the the function would always dereference null if called
twice, in the previous code it was always called once and then
was followed a register. So only the hidden bug of the
sysctl_root_table not being allocated followed by an attempt to
free it would have shown the error.
3) always call unregister and register in partition_sched_domains
The code is "smart" about unregistering only needed domains.
Since we aren't guaranteed any calls to unregister, always
unregister. Without calling register on the way out we
will not have a table or any sysctl tree.
4) warn if register is called without unregistering
The previous table memory is lost, leaving pointers to the
later freed memory in sysctl and leaking the memory of the
tables.
Before this patch on a 2-core 4-thread box compiled for SMT and NUMA,
the domains appear empty (there are actually 3 levels per cpu). And as
soon as two domains a null pointer is dereferenced (unreliable in this
case is stack garbage):
bu19a:~# ls -R /proc/sys/kernel/sched_domain/
/proc/sys/kernel/sched_domain/:
cpu0 cpu1 cpu2 cpu3
/proc/sys/kernel/sched_domain/cpu0:
/proc/sys/kernel/sched_domain/cpu1:
/proc/sys/kernel/sched_domain/cpu2:
/proc/sys/kernel/sched_domain/cpu3:
bu19a:~# mkdir /dev/cpuset
bu19a:~# mount -tcpuset cpuset /dev/cpuset/
bu19a:~# cd /dev/cpuset/
bu19a:/dev/cpuset# echo 0 > sched_load_balance
bu19a:/dev/cpuset# mkdir one
bu19a:/dev/cpuset# echo 1 > one/cpus
bu19a:/dev/cpuset# echo 0 > one/sched_load_balance
Unable to handle kernel paging request for data at address 0x00000018
Faulting instruction address: 0xc00000000006b608
NIP: c00000000006b608 LR: c00000000006b604 CTR: 0000000000000000
REGS: c000000018d973f0 TRAP: 0300 Not tainted (2.6.23-bml)
MSR: 9000000000009032 <EE,ME,IR,DR> CR: 28242442 XER: 00000000
DAR: 0000000000000018, DSISR: 0000000040000000
TASK = c00000001912e340[1987] 'bash' THREAD: c000000018d94000 CPU: 2
..
NIP [c00000000006b608] .unregister_sysctl_table+0x38/0x110
LR [c00000000006b604] .unregister_sysctl_table+0x34/0x110
Call Trace:
[c000000018d97670] [c000000007017270] 0xc000000007017270 (unreliable)
[c000000018d97720] [c000000000058710] .detach_destroy_domains+0x30/0xb0
[c000000018d977b0] [c00000000005cf1c] .partition_sched_domains+0x1bc/0x230
[c000000018d97870] [c00000000009fdc4] .rebuild_sched_domains+0xb4/0x4c0
[c000000018d97970] [c0000000000a02e8] .update_flag+0x118/0x170
[c000000018d97a80] [c0000000000a1768] .cpuset_common_file_write+0x568/0x820
[c000000018d97c00] [c00000000009d95c] .cgroup_file_write+0x7c/0x180
[c000000018d97cf0] [c0000000000e76b8] .vfs_write+0xe8/0x1b0
[c000000018d97d90] [c0000000000e810c] .sys_write+0x4c/0x90
[c000000018d97e30] [c00000000000852c] syscall_exit+0x0/0x40
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The patch is big. Really big. You just won't believe how vastly hugely
mindbogglingly big it is. I mean you may think it's a long way down the
road to the chemist, but that's just peanuts to how big the patch from
2.6.23 is.
But it's all good.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a port type definition for the Freescale UART driver ports (mcf.c).
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove old definitions of the timer function pointers.
Add definitions of the common hardware timer functions.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add platform support structure for use with new ColdFire UART driver.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark the m68knommu setup_arch() function as __init.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Removed header includes not needed.
Remove use of old m68knommu timer function pointers.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Clean up 68EZ328 timer support code. Removed header includes not needed.
Remove use of old m68knommu timer function pointers.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Clean up 68360 timer support code. Removed header includes not needed.
Remove use of old m68knommu timer function pointers. Use common function
naming for 68328 timer functions.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use common function naming for 68328 timer functions to make them
consistent with the various other hardware m68knommu timers.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Clean up 68328 timer support code. Removed header includes not needed.
Remove use of old m68knommu timer function pointers.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'irq-upstream' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6:
[SPARC, XEN, NET/CXGB3] use irq_handler_t where appropriate
drivers/char/riscom8: clean up irq handling
isdn/sc: irq handler clean
isdn/act2000: fix major bug. clean irq handler.
char/pcmcia/synclink_cs: trim trailing whitespace
drivers/char/ip2: separate polling and irq-driven work entry points
drivers/char/ip2: split out irq core logic into separate function
[NETDRVR] lib82596, netxen: delete pointless tests from irq handler
Eliminate pointless casts from void* in a few driver irq handlers.
[PARPORT] Remove unused 'irq' argument from parport irq functions
[PARPORT] Kill useful 'irq' arg from parport_{generic_irq,ieee1284_interrupt}
[PARPORT] Consolidate code copies into a single generic irq handler
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (39 commits)
Remove Andrew Morton from list of net driver maintainers.
bonding: Acquire correct locks in alb for promisc change
bonding: Convert more locks to _bh, acquire rtnl, for new locking
bonding: Convert locks to _bh, rework alb locking for new locking
bonding: Convert miimon to new locking
bonding: Convert balance-rr transmit to new locking
Convert bonding timers to workqueues
Update MAINTAINERS to reflect my (jgarzik's) current efforts.
pasemi_mac: fix typo
defxx.c: dfx_bus_init() is __devexit not __devinit
s390 MAINTAINERS
remove header_ops bug in qeth driver
sky2: crash on remove
MIPSnet: Delete all the useless debugging printks.
AR7 ethernet: small post-merge cleanups and fixes
mv643xx_eth: Hook up mv643xx_get_sset_count
mv643xx_eth: Remove obsolete checksum offload comment
mv643xx_eth: Merge drivers/net/mv643xx_eth.h into mv643xx_eth.c
mv643xx_eth: Remove unused register defines
mv643xx_eth: Clean up mv643xx_eth.h
...
Set bits 0, 4, 5 and 7 of PCI configuration register 0x40 in the
quirk. This has the following effects and is recommended by the
vendor.
* Force enable of IDE channels (used to be left alone as BIOS
configured)
* Change initial phase behavior of PIO cycle such that the host pulls
down the bus instead of tristating it. Vendor recommends this
setting.
The above settings are better for the current generation of
controllers and needed for the upcoming next generation.
Tested on JMB363.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Ethan Hsiao <ethanhsiao@jmicron.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Another one doing spurious NCQ completions. Blacklist it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Did a complete audit of these and found we have another error case.
ata_bus_softreset calls ata_check_status which means that it tries to do
an ioread8 on the port blindly and check versus 0xFF for an error.
It should of course be using the ap->ops method for this via chk_status,
and this bug causes a wrog status call on the NS87415 at least.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tackle the relatively sane complaints of checkpatch --file.
The vast majority is indentation and whitespace changes, the rest are
* #include fixes
* printk KERN_xxx prefix addition
* BSS/initializer cleanups
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Update ALB mode monitor to hold correct locks (RTNL and nothing
else) when calling dev_set_promiscuity.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Convert more lock acquisitions to _bh flavor to avoid deadlock
with workqueue activity and add acquisition of RTNL in appropriate places.
Affects ALB mode, as well as core bonding functions and sysfs.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Convert locking-related activity to new & improved system.
Convert some lock acquisitions to _bh and rework parts of ALB mode, both
to avoid deadlocks with workqueue activity.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Convert mii (link state) monitor to acquire correct locks for
failover events. In particular, failovers generally require RTNL at a low
level (when manipulating device MAC addresses, for example) and no other
locks. The high level monitor is responsible for acquiring a known set
of locks, RTNL, the bond->lock for read and the slave_lock for write, and
the low level failover processing can then release appropriate locks as
needed. This patch provides the high level portion.
As it is undesirable to acquire RTNL for every monitor pass (which
may occur as often as every 10 ms), the miimon has been converted to
do conditional locking. A first pass inspects all slaves to determine
if any action is required, and if so, a second pass (after acquring RTNL)
is done to perform any actions (doing a complete rescan, as the situation
may have changed when all locks were released).
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Change locking in balance-rr transmit processing to use a free
running counter to determine which slave to transmit on. Instead, a
free-running counter is maintained, and modulo arithmetic used to select
a slave for transmit.
This removes lock operations from the TX path, and eliminates
a deadlock introduced by the conversion to work queues.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Convert bonding timers to workqueues. This converts the various
monitor functions to run in periodic work queues instead of timers. This
patch introduces the framework and convers the calls, but does not resolve
various locking issues, and does not stand alone.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Remove net driver entries (they fall under the more general 'net driver
maintainer') umbrella.
Remove entries for older drivers that either no longer exist, are about
to be removed, or I no longer care about.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Add missing &:
drivers/net/pasemi_mac.c: In function 'pasemi_mac_clean_rx':
drivers/net/pasemi_mac.c:553: warning: passing argument 1 of 'prefetch'
makes pointer from integer without a cast
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The dfx_bus_uninit() call is called from dfx_unregister() which is
__devexit and which is ultimately the ->remove call for the device.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
adding Frank Blaschka to s390 networking maintainers
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Remove qeth bug caused by commit:
[NET]: Move hardware header operations out of netdevice.
This is the second part of the qeth header_ops patch, since
first patch sent 10/19 has been insufficient.
Nevertheless first patch is still valid and should be kept.
Signed-off-by: Ursula Braun <braunu@de.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Fix off-by one in remove logic that just got introduced.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>