linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-19 02:04:19 +08:00

Author	SHA1	Message	Date
Akinobu Mita	3cce4856ff	[PATCH] fix create_write_pipe() error check The return value of create_write_pipe()/create_read_pipe() should be checked by IS_ERR(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-28 17:26:50 -08:00
Jan Beulich	ff0a538d8b	[PATCH] x86-64: work around gcc4 issue with -Os in Dwarf2 stack unwind This fixes a problem with gcc4 mis-compiling the stack unwind code under -Os, which resulted in 'stuck' messages whenever an assembly routine was encountered. (The second hunk is trivial cleanup.) Signed-off-by: Jan Beulich <jbeulich@novell.com>	2006-11-28 20:12:59 +01:00
Arjan van de Ven	cfd3ef2346	[PATCH] lockdep: spin_lock_irqsave_nested() Introduce spin_lock_irqsave_nested(); implementation from: http://lkml.org/lkml/2006/6/1/122 Patch from: http://lkml.org/lkml/2006/9/13/258 [akpm@osdl.org: two compile fixes] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Jiri Kosina <jikos@jikos.cz> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-25 13:28:34 -08:00
Akinobu Mita	753ca4f312	[PATCH] fix copy_process() error check The return value of copy_process() should be checked by IS_ERR(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-25 13:28:34 -08:00
Linus Torvalds	b42172fc7b	Don't call "note_interrupt()" with irq descriptor lock held This reverts commit `f72fa70760`, and solves the problem that it tried to fix by simply making "__do_IRQ()" call the note_interrupt() function without the lock held, the way everybody else does. It should be noted that all interrupt handling code must never allow the descriptor actors to be entered "recursively" (that's why we do all the magic IRQ_PENDING stuff in the first place), so there actually is exclusion at that much higher level, even in the absense of locking. Acked-by: Vivek Goyal <vgoyal@in.ibm.com> Acked-by:Pavel Emelianov <xemul@openvz.org> Cc: Andrew Morton <akpm@osdl.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Adrian Bunk <bunk@stusta.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-22 09:32:06 -08:00
David Howells	c4028958b6	WorkStruct: make allyesconfig Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:57:56 +00:00
David Howells	65f27f3844	WorkStruct: Pass the work_struct pointer instead of context data Pass the work_struct pointer to the work function rather than context data. The work function can use container_of() to work out the data. For the cases where the container of the work_struct may go away the moment the pending bit is cleared, it is made possible to defer the release of the structure by deferring the clearing of the pending bit. To make this work, an extra flag is introduced into the management side of the work_struct. This governs auto-release of the structure upon execution. Ordinarily, the work queue executor would release the work_struct for further scheduling or deallocation by clearing the pending bit prior to jumping to the work function. This means that, unless the driver makes some guarantee itself that the work_struct won't go away, the work function may not access anything else in the work_struct or its container lest they be deallocated.. This is a problem if the auxiliary data is taken away (as done by the last patch). However, if the pending bit is not cleared before jumping to the work function, then the work function may access the work_struct and its container with no problems. But then the work function must itself release the work_struct by calling work_release(). In most cases, automatic release is fine, so this is the default. Special initiators exist for the non-auto-release case (ending in _NAR). Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:55:48 +00:00
David Howells	365970a1ea	WorkStruct: Merge the pending bit into the wq_data pointer Reclaim a word from the size of the work_struct by folding the pending bit and the wq_data pointer together. This shouldn't cause misalignment problems as all pointers should be at least 4-byte aligned. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:54:49 +00:00
David Howells	6bb49e5965	WorkStruct: Typedef the work function prototype Define a type for the work function prototype. It's not only kept in the work_struct struct, it's also passed as an argument to several functions. This makes it easier to change it. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:54:45 +00:00
David Howells	52bad64d95	WorkStruct: Separate delayable and non-delayable events. Separate delayable work items from non-delayable work items be splitting them into a separate structure (delayed_work), which incorporates a work_struct and the timer_list removed from work_struct. The work_struct struct is huge, and this limits it's usefulness. On a 64-bit architecture it's nearly 100 bytes in size. This reduces that by half for the non-delayable type of event. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:54:01 +00:00
Ingo Molnar	1ff5683043	[PATCH] lockdep: fix static keys in module-allocated percpu areas lockdep got confused by certain locks in modules: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. Call Trace: [<ffffffff8026f40d>] dump_trace+0xaa/0x3f2 [<ffffffff8026f78f>] show_trace+0x3a/0x60 [<ffffffff8026f9d1>] dump_stack+0x15/0x17 [<ffffffff802abfe8>] __lock_acquire+0x724/0x9bb [<ffffffff802ac52b>] lock_acquire+0x4d/0x67 [<ffffffff80267139>] rt_spin_lock+0x3d/0x41 [<ffffffff8839ed3f>] :ip_conntrack:__ip_ct_refresh_acct+0x131/0x174 [<ffffffff883a1334>] :ip_conntrack:udp_packet+0xbf/0xcf [<ffffffff8839f9af>] :ip_conntrack:ip_conntrack_in+0x394/0x4a7 [<ffffffff8023551f>] nf_iterate+0x41/0x7f [<ffffffff8025946a>] nf_hook_slow+0x64/0xd5 [<ffffffff802369a2>] ip_rcv+0x24e/0x506 [...] Steven Rostedt found the bug: static_obj() check did not take PERCPU_ENOUGH_ROOM into account, so in-module DEFINE_PER_CPU-area locks were triggering this message. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-17 11:10:37 -08:00
Zhang, Yanmin	b86432b42e	[PATCH] some irq_chip variables point to NULL I got an oops when booting 2.6.19-rc5-mm1 on my ia64 machine. Below is the log. Oops 11012296146944 [1] Modules linked in: binfmt_misc dm_mirror dm_multipath dm_mod thermal processor f an container button sg eepro100 e100 mii Pid: 0, CPU 0, comm: swapper psr : 0000121008022038 ifs : 800000000000040b ip : [<a0000001000e1411>] Not tainted ip is at __do_IRQ+0x371/0x3e0 unat: 0000000000000000 pfs : 000000000000040b rsc : 0000000000000003 rnat: 656960155aa56aa5 bsps: a00000010058b890 pr : 656960155aa55a65 ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f csd : 0000000000000000 ssd : 0000000000000000 b0 : a0000001000e1390 b6 : a0000001005beac0 b7 : e00000007f01aa00 f6 : 000000000000000000000 f7 : 0ffe69090000000000000 f8 : 1000a9090000000000000 f9 : 0ffff8000000000000000 f10 : 1000a908ffffff6f70000 f11 : 1003e0000000000000909 r1 : a000000100fbbff0 r2 : 0000000000010002 r3 : 0000000000010001 r8 : fffffffffffbffff r9 : a000000100bd8060 r10 : a000000100dd83b8 r11 : fffffffffffeffff r12 : a000000100bcbbb0 r13 : a000000100bc4000 r14 : 0000000000010000 r15 : 0000000000010000 r16 : a000000100c01aa8 r17 : a000000100d2c350 r18 : 0000000000000000 r19 : a000000100d2c300 r20 : a000000100c01a88 r21 : 0000000080010100 r22 : a000000100c01ac0 r23 : a0000001000108e0 r24 : e000000477980004 r25 : 0000000000000000 r26 : 0000000000000000 r27 : e00000000913400c r28 : e0000004799ee51c r29 : e0000004778b87f0 r30 : a000000100d2c300 r31 : a00000010005c7e0 Call Trace: [<a000000100014600>] show_stack+0x40/0xa0 sp=a000000100bcb760 bsp=a000000100bc4f40 [<a000000100014f00>] show_regs+0x840/0x880 sp=a000000100bcb930 bsp=a000000100bc4ee8 [<a000000100037fb0>] die+0x250/0x320 sp=a000000100bcb930 bsp=a000000100bc4ea0 [<a00000010005e5f0>] ia64_do_page_fault+0x8d0/0xa20 sp=a000000100bcb950 bsp=a000000100bc4e50 [<a00000010000caa0>] ia64_leave_kernel+0x0/0x290 sp=a000000100bcb9e0 bsp=a000000100bc4e50 [<a0000001000e1410>] __do_IRQ+0x370/0x3e0 sp=a000000100bcbbb0 bsp=a000000100bc4df0 [<a000000100011f50>] ia64_handle_irq+0x170/0x220 sp=a000000100bcbbb0 bsp=a000000100bc4dc0 [<a00000010000caa0>] ia64_leave_kernel+0x0/0x290 sp=a000000100bcbbb0 bsp=a000000100bc4dc0 [<a000000100012390>] ia64_pal_call_static+0x90/0xc0 sp=a000000100bcbd80 bsp=a000000100bc4d78 [<a000000100015630>] default_idle+0x90/0x160 sp=a000000100bcbd80 bsp=a000000100bc4d58 [<a000000100014290>] cpu_idle+0x1f0/0x440 sp=a000000100bcbe20 bsp=a000000100bc4d18 [<a000000100009980>] rest_init+0xc0/0xe0 sp=a000000100bcbe20 bsp=a000000100bc4d00 [<a0000001009f8ea0>] start_kernel+0x6a0/0x6c0 sp=a000000100bcbe20 bsp=a000000100bc4ca0 [<a0000001000089f0>] __end_ivt_text+0x6d0/0x6f0 sp=a000000100bcbe30 bsp=a000000100bc4c00 <0>Kernel panic - not syncing: Aiee, killing interrupt handler! The root cause is that some irq_chip variables, especially ia64_msi_chip, initiate their memeber end to point to NULL. __do_IRQ doesn't check if irq_chip->end is null and just calls it after processing the interrupt. As irq_chip->end is called at many places, so I fix it by reinitiating irq_chip->end to dummy_irq_chip.end, e.g., a noop function. Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-16 11:43:37 -08:00
Linus Torvalds	9a3a04ac38	Revert "[PATCH] fix Data Acess error in dup_fd" This reverts commit `0130b0b32e`. Sergey Vlasov points out (and Vadim Lobanov concurs) that the bug it was supposed to fix must be some unrelated memory corruption, and the "fix" actually causes more problems: "However, the new code does not look safe in all cases. If some other task has opened more files while dup_fd() released oldf->file_lock, the new code will update open_files to the new larger value. But newf was allocated with the old smaller value of open_files, therefore subsequent accesses to newf may try to write into unallocated memory." so revert it. Cc: Sharyathi Nagesh <sharyath@in.ibm.com> Cc: Sergey Vlasov <vsu@altlinux.ru> Cc: Vadim Lobanov <vlobanov@speakeasy.net> Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-14 15:20:51 -08:00
Andrew Morton	8b126b7753	[PATCH] setup_irq(): better mismatch debugging When we get a mismatch between handlers on the same IRQ, all we get is "IRQ handler type mismatch for IRQ n". Let's print the name of the presently-registered handler with which we got the mismatch. Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-14 09:09:26 -08:00
Pavel Emelianov	f72fa70760	[PATCH] Fix misrouted interrupts deadlocks While testing kernel on machine with "irqpoll" option I've caught such a lockup: __do_IRQ() spin_lock(&desc->lock); desc->chip->ack(); /* IRQ is ACKed / note_interrupt() misrouted_irq() handle_IRQ_event() if (...) local_irq_enable_in_hardirq(); / interrupts are enabled from now / ... __do_IRQ() / same IRQ we've started from / spin_lock(&desc->lock); / LOCKUP / Looking at misrouted_irq() code I've found that a potential deadlock like this can also take place: 1CPU: __do_IRQ() spin_lock(&desc->lock); / irq = A / misrouted_irq() for (i = 1; i < NR_IRQS; i++) { spin_lock(&desc->lock); / irq = B / if (desc->status & IRQ_INPROGRESS) { 2CPU: __do_IRQ() spin_lock(&desc->lock); / irq = B / misrouted_irq() for (i = 1; i < NR_IRQS; i++) { spin_lock(&desc->lock); / irq = A */ if (desc->status & IRQ_INPROGRESS) { As the second lock on both CPUs is taken before checking that this irq is being handled in another processor this may cause a deadlock. This issue is only theoretical. I propose the attached patch to fix booth problems: when trying to handle misrouted IRQ active desc->lock may be unlocked. Acked-by: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-13 07:40:43 -08:00
Sharyathi Nagesh	0130b0b32e	[PATCH] fix Data Acess error in dup_fd On running the Stress Test on machine for more than 72 hours following error message was observed. 0:mon> e cpu 0x0: Vector: 300 (Data Access) at [c00000007ce2f7f0] pc: c000000000060d90: .dup_fd+0x240/0x39c lr: c000000000060d6c: .dup_fd+0x21c/0x39c sp: c00000007ce2fa70 msr: 800000000000b032 dar: ffffffff00000028 dsisr: 40000000 current = 0xc000000074950980 paca = 0xc000000000454500 pid = 27330, comm = bash 0:mon> t [c00000007ce2fa70] c000000000060d28 .dup_fd+0x1d8/0x39c (unreliable) [c00000007ce2fb30] c000000000060f48 .copy_files+0x5c/0x88 [c00000007ce2fbd0] c000000000061f5c .copy_process+0x574/0x1520 [c00000007ce2fcd0] c000000000062f88 .do_fork+0x80/0x1c4 [c00000007ce2fdc0] c000000000011790 .sys_clone+0x5c/0x74 [c00000007ce2fe30] c000000000008950 .ppc_clone+0x8/0xc The problem is because of race window. When if(expand) block is executed in dup_fd unlocking of oldf->file_lock give a window for fdtable in oldf to be modified. So actual open_files in oldf may not match with open_files variable. Cc: Vadim Lobanov <vlobanov@speakeasy.net> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-13 07:40:43 -08:00
Eric W. Biederman	d99f160ac5	[PATCH] sysctl: allow a zero ctl_name in the middle of a sysctl table Since it is becoming clear that there are just enough users of the binary sysctl interface that completely removing the binary interface from the kernel will not be an option for foreseeable future, we need to find a way to address the sysctl maintenance issues. The basic problem is that sysctl requires one central authority to allocate sysctl numbers, or else conflicts and ABI breakage occur. The proc interface to sysctl does not have that problem, as names are not densely allocated. By not terminating a sysctl table until I have neither a ctl_name nor a procname, it becomes simple to add sysctl entries that don't show up in the binary sysctl interface. Which allows people to avoid allocating a binary sysctl value when not needed. I have audited the kernel code and in my reading I have not found a single sysctl table that wasn't terminated by a completely zero filled entry. So this change in behavior should not affect anything. I think this mechanism eases the pain enough that combined with a little disciple we can solve the reoccurring sysctl ABI breakage. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-06 01:46:23 -08:00
Eric W. Biederman	0e009be8a0	[PATCH] Improve the removed sysctl warnings Don't warn about libpthread's access to kernel.version. When it receives -ENOSYS it will read /proc/sys/kernel/version. If anything else shows up print the sysctl number string. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Cal Peake <cp@absolutedigital.net> Cc: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-06 01:46:23 -08:00
Peter Zijlstra	64efade11c	[PATCH] lockdep: fix delayacct locking bug Make the delayacct lock irqsave; this avoids the possible deadlock where an interrupt is taken while holding the delayacct lock which needs to take the delayacct lock. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-06 01:46:23 -08:00
Gautham R Shenoy	4b96b1a10c	[PATCH] Fix the spurious unlock_cpu_hotplug false warnings Cpu-hotplug locking has a minor race case caused because of setting the variable "recursive" to NULL after releasing the cpu_bitmask_lock in the function unlock_cpu_hotplug,instead of doing so before releasing the cpu_bitmask_lock. This was the cause of most of the recent false spurious lock_cpu_unlock warnings. This should fix the problem reported by Martin Lorenz reported in http://lkml.org/lkml/2006/10/29/127. Thanks to Srinivasa DS for pointing it out. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-06 01:46:22 -08:00
Linus Torvalds	10b1fbdb0a	Make sure "user->sigpending" count is in sync The previous commit (`45c18b0bb5`, aka "Fix unlikely (but possible) race condition on task->user access") fixed a potential oops due to __sigqueue_alloc() getting its "user" pointer out of sync with switch_user(), and accessing a user pointer that had been de-allocated on another CPU. It still left another (much less serious) problem, where a concurrent __sigqueue_alloc and swich_user could cause sigqueue_alloc to do signal pending reference counting for a _different_ user than the one it then actually ended up using. No oops, but we'd end up with the wrong signal accounting. Another case of Oleg's eagle-eyes picking up the problem. This is trivially fixed by just making sure we load whichever "user" structure we decide to use (it doesn't matter _which_ one we pick, we just need to pick one) just once. Acked-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Andrew Morton <akpm@osdl.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-04 13:03:00 -08:00
Linus Torvalds	45c18b0bb5	Fix unlikely (but possible) race condition on task->user access There's a possible race condition when doing a "switch_uid()" from one user to another, which could race with another thread doing a signal allocation and looking at the old thread ->user pointer as it is freed. This explains an oops reported by Lukasz Trabinski: http://permalink.gmane.org/gmane.linux.kernel/462241 We fix this by delaying the (reference-counted) freeing of the user structure until the thread signal handler lock has been released, so that we know that the signal allocation has either seen the new value or has properly incremented the reference count of the old one. Race identified by Oleg Nesterov. Cc: Lukasz Trabinski <lukasz@wsisiz.edu.pl> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Andrew Morton <akpm@osdl.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-04 10:06:02 -08:00
Stephen Rothwell	3fd5939798	[PATCH] Create compat_sys_migrate_pages This is needed on bigendian 64bit architectures. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-03 12:27:59 -08:00
Rafael J. Wysocki	b918f6e62c	[PATCH] swsusp: debugging Add a swsusp debugging mode. This does everything that's needed for a suspend except for actually suspending. So we can look in the log messages and work out a) what code is being slow and b) which drivers are misbehaving. (1) # echo testproc > /sys/power/disk # echo disk > /sys/power/state This should turn off the non-boot CPU, freeze all processes, wait for 5 seconds and then thaw the processes and the CPU. (2) # echo test > /sys/power/disk # echo disk > /sys/power/state This should turn off the non-boot CPU, freeze all processes, shrink memory, suspend all devices, wait for 5 seconds, resume the devices etc. Cc: Pavel Machek <pavel@ucw.cz> Cc: Stefan Seyfried <seife@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-03 12:27:58 -08:00
Andrew Morton	19c6b6ed3f	[PATCH] schedule removal of FUTEX_FD Apparently FUTEX_FD is unfixably racy and nothing uses it (or if it does, it shouldn't). Add a warning printk, give any remaining users six months to migrate off it. Cc: Ulrich Drepper <drepper@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-03 12:27:58 -08:00
Andrew Morton	f46c483357	[PATCH] Add printk_timed_ratelimit() printk_ratelimit() has global state which makes it not useful for callers which wish to perform ratelimiting at a particular frequency. Add a printk_timed_ratelimit() which utilises caller-provided state storage to permit more flexibility. This function can in fact be used for things other than printk ratelimiting and is perhaps poorly named. Cc: Ulrich Drepper <drepper@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-11-03 12:27:58 -08:00
Oleg Nesterov	4a279ff1ea	[PATCH] taskstats: fix sub-threads accounting If there are no listeners, taskstats_exit_send() just returns because taskstats_exit_alloc() didn't allocate *tidstats. This is wrong, each sub-thread should do fill_tgid_exit() on exit, otherwise its ->delays is not recorded in ->signal->stats and lost. Q: We don't send TASKSTATS_TYPE_AGGR_TGID when single-threaded process exits. Is it good? How can the listener figure out that it was actually a process exit, not sub-thread? Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Balbir Singh <balbir@in.ibm.com> Acked-by: Shailabh Nagar <nagar@watson.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-31 08:07:00 -08:00
Oleg Nesterov	f0ec1aaf54	[PATCH] xacct_add_tsk: fix pure theoretical ->mm use-after-free Paranoid fix. The task can free its ->mm after the 'if (p->mm)' check. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-30 12:08:41 -08:00
Randy Dunlap	0e2d57fc6e	[PATCH] ndiswrapper: don't set the module->taints flags For ndiswrapper, don't set the module->taints flags, just set the kernel global tainted flag. This should allow ndiswrapper to continue to use GPL symbols. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Florin Malita <fmalita@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-30 12:08:40 -08:00
Oleg Nesterov	3d8334def5	[PATCH] taskstats: fix sk_buff size calculation prepare_reply() adds GENL_HDRLEN to the payload (genlmsg_total_size()), but then it does genlmsg_put()->nlmsg_put(). This means we forget to reserve a room for 'struct nlmsghdr'. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Thomas Graf <tgraf@suug.ch> Cc: Andrew Morton <akpm@osdl.org> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-29 12:07:37 -08:00
Oleg Nesterov	d46a3d0d07	[PATCH] taskstats: fix sk_buff leak 'return genlmsg_cancel()' in taskstats_user_cmd/taskstats_exit_send potentially leaks a skb. Unless we pass 'rep_skb' to the netlink layer we own sk_buff. This means we should always do kfree_skb() on failure. [ Thomas acked and pointed out missing return value in original version ] Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Thomas Graf <tgraf@suug.ch> Cc: Andrew Morton <akpm@osdl.org> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-29 12:07:37 -08:00
Alan Stern	057647fc47	[PATCH] workqueue: update kerneldoc This patch (as812) changes the kerneldoc comments explaining the return values from queue_work(), queue_delayed_work(), and queue_delayed_work_on(). The updated comments explain more accurately the meaning of the return code and avoid suggesting that a 0 value means the routine was unsuccessful. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:55 -07:00
Satoru Takeuchi	8fa1d7d3b2	[PATCH] cpu-hotplug: release `workqueue_mutex' properly on CPU hot-remove _cpu_down() acquires `workqueue_mutex' on its process, but doen't release it if __cpu_disable() fails. Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:55 -07:00
Jim Houston	bb1d860551	[PATCH] time_adjust cleared before use I notice that the code which implements adjtime clears the time_adjust value before using it. The attached patch makes the obvious fix. Acked-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Jim Houston <jim.houston@ccur.com> Cc: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:55 -07:00
Oleg Nesterov	d7c3f5f231	[PATCH] fill_tgid: cleanup delays accounting fill_tgid() should skip not only an already exited group leader. If the task has ->exit_state != 0 it already did exit_notify(), so it also did fill_tgid_exit()->delayacct_add_tsk(->signal->stats) and we should skip it to avoid a double accounting. This patch doesn't close the race completely, but it cleanups the code. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:55 -07:00
Oleg Nesterov	a98b609426	[PATCH] taskstats: don't use tasklist_lock Remove tasklist_lock from taskstats.c. find_task_by_pid() is rcu-safe. ->siglock allows us to traverse subthread without tasklist. Q: delay accounting looks wrong to me. If sub-thread has already called taskstats_exit_send() but didn't call release_task(self) yet it will be accounted twice. The window is big. No? Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:54 -07:00
Oleg Nesterov	b8534d7bd8	[PATCH] taskstats: kill ->taskstats_lock in favor of ->siglock signal_struct is (mostly) protected by ->sighand->siglock, I think we don't need ->taskstats_lock to protect ->stats. This also allows us to simplify the locking in fill_tgid(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:54 -07:00
Oleg Nesterov	093a8e8aec	[PATCH] taskstats_tgid_free: fix usage taskstats_tgid_free() is called on copy_process's error path. This is wrong. IF (clone_flags & CLONE_THREAD) We should not clear ->signal->taskstats, current uses it, it probably has a valid accumulated info. ELSE taskstats_tgid_init() set ->signal->taskstats = NULL, there is nothing to free. Move the callsite to __exit_signal(). We don't need any locking, entire thread group is exiting, nobody should have a reference to soon to be released ->signal. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:54 -07:00
Oleg Nesterov	05d5bcd60e	[PATCH] bacct_add_tsk: fix unsafe and wrong parent/group_leader dereference 1. ts = timespec_sub(uptime, current->group_leader->start_time); It is possible that current != tsk. Probably it was supposed to be 'tsk->group_leader->start_time. But why we are reading group_leader's start_time ? This accounting is per thread, not per procees, I changed this to 'tsk->start_time. Please corect me. 2. stats->ac_ppid = (tsk->parent) ? tsk->parent->pid : 0; tsk->parent never == NULL, and it is unsafe to dereference it. Both the task and it's parent may exit after the caller unlocks tasklist_lock, the memory could be unmapped (DEBUG_SLAB). (And we should use ->real_parent->tgid in fact). Q: I don't understand the 'if (thread_group_leader(tsk))' check. Why it is needed ? Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Acked-by: Jay Lan <jlan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:54 -07:00
Oleg Nesterov	fca178c0c6	[PATCH] fill_tgid: fix task_struct leak and possible oops 1. fill_tgid() forgets to do put_task_struct(first). 2. release_task(first) can happen after fill_tgid() drops tasklist_lock, it is unsafe to dereference first->signal. This is a temporary fix, imho the locking should be reworked. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@sgi.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:54 -07:00
Stephen Rothwell	5fa3839a64	[PATCH] Constify compat_get_bitmap argument This means we can call it when the bitmap we want to fetch is declared const. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:54 -07:00
Jan Dittmer	1d4d262769	[PATCH] Add missing space in module.c for taintskernel Obvious fix. Signed-off-by: Jan Dittmer <jdi@l4x.org> Acked-by: Florin Malita <fmalita@gmail.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-28 11:30:53 -07:00
Jan Beulich	690a973f48	[PATCH] x86-64: Speed up dwarf2 unwinder This changes the dwarf2 unwinder to do a binary search for CIEs instead of a linear work. The linker is unfortunately not able to build a proper lookup table at link time, instead it creates one at runtime as soon as the bootmem allocator is usable (so you'll continue using the linear lookup for the first [hopefully] few calls). The code should be ready to utilize a build-time created table once a fixed linker becomes available. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>	2006-10-21 18:37:01 +02:00
Alexey Dobriyan	e05d722e45	[PATCH] kernel/nsproxy.c: use kmemdup() Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-20 10:26:44 -07:00
Randy Dunlap	d6f8ff7381	[PATCH] cad_pid sysctl with PROC_FS=n If CONFIG_PROC_FS=n: kernel/sysctl.c:148: warning: 'proc_do_cad_pid' used but never defined kernel/built-in.o:(.data+0x1228): undefined reference to `proc_do_cad_pid' make: *** [.tmp_vmlinux1] Error 1 Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-20 10:26:38 -07:00
Borislav Petkov	91fcdd4e03	[PATCH] readjust comments of task_timeslice for kernel doc Signed-off-by: Borislav Petkov <petkov@math.uni-muenster.de> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-20 10:26:37 -07:00
Linus Torvalds	43f82216f0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: fm801-gp - handle errors from pci_enable_device() Input: gameport core - handle errors returned by device_bind_driver() Input: serio core - handle errors returned by device_bind_driver() Lockdep: fix compile error in drivers/input/serio/serio.c Input: serio - add lockdep annotations Lockdep: add lockdep_set_class_and_subclass() and lockdep_set_subclass() Input: atkbd - supress "too many keys" error message Input: i8042 - supress ACK/NAKs when blinking during panic Input: add missing exports to fix modular build	2006-10-17 08:56:43 -07:00
Neil Brown	bd5349cfd2	[PATCH] Convert cpu hotplug notifiers to use raw_notifier instead of blocking_notifier The use of blocking notifier by _cpu_up and _cpu_down in cpu.c has two problem. 1/ An interaction with the workqueue notifier causes lockdep to spit a warning. 2/ A notifier could conceivable be added or removed while _cpu_up or _cpu_down are in process. As each notifier is called twice (prepare then commit/abort) this could be unhealthy. To fix to we simply take cpu_add_remove_lock while adding or removing notifiers to/from the list. This makes the 'blocking' usage unnecessary as all accesses to cpu_chain are now protected by cpu_add_remove_lock. So change "blocking" to "raw" in all relevant places. This fixes 1. Credit: Andrew Morton Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> (reporter) Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-17 08:18:48 -07:00
Peter Zijlstra	bea493a031	[PATCH] rt-mutex: fixup rt-mutex debug code BUG: warning at kernel/rtmutex-debug.c:125/rt_mutex_debug_task_free() (Not tainted) [<c04051e3>] show_trace_log_lvl+0x58/0x16a [<c04057f0>] show_trace+0xd/0x10 [<c0405900>] dump_stack+0x19/0x1b [<c043f03d>] rt_mutex_debug_task_free+0x35/0x6a [<c04224c0>] free_task+0x15/0x24 [<c042378c>] copy_process+0x12bd/0x1324 [<c0423835>] do_fork+0x42/0x113 [<c04021dd>] sys_fork+0x19/0x1b [<c0403fb7>] syscall_call+0x7/0xb In copy_process(), dup_task_struct() also duplicates the ->pi_lock, ->pi_waiters and ->pi_blocked_on members. rt_mutex_debug_task_free() called from free_task() validates these members. However free_task() can be invoked before these members are reset for the new task. Move the initialization code before the first bail that can hit free_task(). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-17 08:18:48 -07:00
Ingo Molnar	a460e745e8	[PATCH] genirq: clean up irq-flow-type naming Introduce desc->name and eliminate the handle_irq_name() hack. Add set_irq_chip_and_handler_name() to set the flow type and name at once. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Matthew Wilcox <willy@debian.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-17 08:18:45 -07:00
Andrew Morton	c60099bfe3	[PATCH] swsusp: fix memory leaks My fancy new swsusp IO code had a big memory leak. It's somewhat invisible because the whole mem_map[] gets overwritten after resume, but it can cause us to get low on memory during the actual suspend process. Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-17 08:18:44 -07:00
Thomas Gleixner	ac08c26492	[PATCH] posix-cpu-timers: prevent signal delivery starvation The integer divisions in the timer accounting code can round the result down to 0. Adding 0 is without effect and the signal delivery stops. Clamp the division result to minimum 1 to avoid this. Problem was reported by Seongbae Park <spark@google.com>, who provided also an inital patch. Roland sayeth: I have had some more time to think about the problem, and to reproduce it using Toyo's test case. For the record, if my understanding of the problem is correct, this happens only in one very particular case. First, the expiry time has to be so soon that in cputime_t units (usually 1s/HZ ticks) it's < nthreads so the division yields zero. Second, it only affects each thread that is so new that its CPU time accumulation is zero so now+0 is still zero and ->it__expires winds up staying zero. For the VIRT and PROF clocks when cputime_t is tick granularity (or the SCHED clock on configurations where sched_clock's value only advances on clock ticks), this is not hard to arrange with new threads starting up and blocking before they accumulate a whole tick of CPU time. That's what happens in Toyo's test case. Note that in general it is fine for that division to round down to zero, and set each thread's expiry time to its "now" time. The problem only arises with thread's whose "now" value is still zero, so that now+0 winds up 0 and is interpreted as "not set" instead of ">= now". So it would be a sufficient and more precise fix to just use max(ticks, 1) inside the loop when setting each it__expires value. But, it does no harm to round the division up to one and always advance every thread's expiry time. If the thread didn't already fire timers for the expiry time of "now", there is no expectation that it will do so before the next tick anyway. So I followed Thomas's patch in lifting the max out of the loops. This patch also covers the reload cases, which are harder to write a test for (and I didn't try). I've tested it with Toyo's case and it fixes that. [toyoa@mvista.com: fix: min_t -> max_t] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Daniel Walker <dwalker@mvista.com> Cc: Toyo Abe <toyoa@mvista.com> Cc: john stultz <johnstul@us.ibm.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Seongbae Park <spark@google.com> Cc: Peter Mattis <pmattis@google.com> Cc: Rohit Seth <rohitseth@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-17 08:18:43 -07:00
john stultz	3f4a0b917c	[PATCH] i386 Time: Avoid PIT SMP lockups Avoid possible PIT livelock issues seen on SMP systems (and reported by Andi), by not allowing it as a clocksource on SMP boxes. However, since the PIT may no longer be present, we have to properly handle the cases where SMP systems have TSC skew and fall back from the TSC. Since the PIT isn't there, it would "fall back" to the TSC again. So this changes the jiffies rating to 1, and the TSC-bad rating value to 0. Thus you will get the following behavior priority on i386 systems: tsc [if present & stable] hpet [if present] cyclone [if present] acpi_pm [if present] pit [if UP] jiffies Rather then the current more complicated: tsc [if present & stable] hpet [if present] cyclone [if present] acpi_pm [if present] pit [if cpus < 4] tsc [if present & unstable] jiffies Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-17 08:18:42 -07:00
Ingo Molnar	ca268c691d	[PATCH] lockdep: increase max allowed recursion depth In general, lockdep warnings are intended to be non-fatal, so I have put in various practical limits on internal data structure failure modes. We haven't had a /single/ lockdep-internal crash ever since lockdep went upstream [the unwinder crashes are outside of lockdep], and that's largely due to the good internal checks it does. Recursion within the dependency graph is currently limited to 20, that's probably not enough on some many-CPU boxes - this patch doubles it to 40. I have written the lockdep functions to have as small stackframes as possible, so 40 should be OK too. (The practical recursion limit should be somewhere between 100 and 200 entries. If we hit that then I'll change the algorithm to be iteration-based. Graph walking logic is so easy to program via recursion, so i'd like to keep recursion as long as possible.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-17 08:18:42 -07:00
Randy Dunlap	39af114377	[PATCH] fix epoll_pwait when EPOLL=n Fixes http://bugzilla.kernel.org/show_bug.cgi?id=7371 sys_epoll_pwait needs to be listed as a conditional (weak) entry point for CONFIG_EPOLL=n. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-16 09:14:05 -07:00
Ingo Molnar	256a6b4136	[PATCH] lockdep: fix printk recursion logic Bug reported and fixed by Tilman Schmidt <tilman@imap.cc>: if lockdep is enabled then log messages make it to /var/log/messages belatedly. The reason is a missed wakeup of klogd. Initially there was only a lockdep_internal() protection against lockdep recursion within vprintk() - it grew the 'outer' lockdep_off()/on() protection only later on. But that lockdep_off() made the release_console_sem() within vprintk() always happen under the lockdep_internal() condition, causing the bug. The right solution to remove the inner protection against recursion here - the outer one is enough. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Tilman Schmidt <tilman@imap.cc> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:24 -07:00
Alexey Dobriyan	3dc3099a9b	[PATCH] lockdep: use BUILD_BUG_ON Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:24 -07:00
Reinette Chatre	01a3ee2b20	[PATCH] bitmap: parse input from kernel and user buffers lib/bitmap.c:bitmap_parse() is a library function that received as input a user buffer. This seemed to have originated from the way the write_proc function of the /proc filesystem operates. This has been reworked to not use kmalloc and eliminates a lot of get_user() overhead by performing one access_ok before using __get_user(). We need to test if we are in kernel or user space (is_user) and access the buffer differently. We cannot use __get_user() to access kernel addresses in all cases, for example in architectures with separate address space for kernel and user. This function will be useful for other uses as well; for example, taking input for /sysfs instead of /proc, so it was changed to accept kernel buffers. We have this use for the Linux UWB project, as part as the upcoming bandwidth allocator code. Only a few routines used this function and they were changed too. Signed-off-by: Reinette Chatre <reinette.chatre@linux.intel.com> Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com> Cc: Paul Jackson <pj@sgi.com> Cc: Joe Korty <joe.korty@ccur.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:22 -07:00
Nick Piggin	beed33a816	[PATCH] sched: likely profiling This likely profiling is pretty fun. I found a few possible problems in sched.c. This patch may be not measurable, but when I did measure long ago, nooping (un)likely cost a couple of % on scheduler heavy benchmarks, so it all adds up. Tweak some branch hints: - the 2nd 64 bits in the bitmask is likely to be populated, because it contains the first 28 bits (nearly 3/4) of the normal priorities. (ratio of 669669:691 ~= 1000:1). - it isn't unlikely that context switching switches to another process. it might be very rapidly switching to and from the idle process (ratio of 475815:419004 and 471330:423544). Let the branch predictor decide. - preempt_enable seems to be very often called in a nested preempt_disable or with interrupts disabled (ratio of 3567760:87965 ~= 40:1) Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Daniel Walker <dwalker@mvista.com> Cc: Hua Zhong <hzhong@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:22 -07:00
Florin Malita	fa3ba2e81e	[PATCH] fix Module taint flags listing in Oops/panic Module taint flags listing in Oops/panic has a couple of issues: * taint_flags() doesn't null-terminate the buffer after printing the flags * per-module taints are only set if the kernel is not already tainted (with that particular flag) => only the first offending module gets its taint info correctly updated Some additional changes: * 'license_gplok' is no longer needed - equivalent to !(taints & TAINT_PROPRIETARY_MODULE) - so we can drop it from struct module * exporting module taint info via /proc/module: pwc 88576 0 - Live 0xf8c32000 evilmod 6784 1 pwc, Live 0xf8bbf000 (PF) Signed-off-by: Florin Malita <fmalita@gmail.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:21 -07:00
Christoph Lameter	469340236a	[PATCH] mm: kevent threads: use MPOL_DEFAULT Switch the memory policy of the kevent threads to MPOL_DEFAULT while leaving the kzalloc of the workqueue structure on interleave. This means that all code executed in the context of the kevent thread is allocating node local. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Alok Kataria <alok.kataria@calsoftinc.com> Cc: Andi Kleen <ak@suse.de> Cc: <pj@sgi.com> Cc: <shai@scalex86.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:19 -07:00
Rafael J. Wysocki	97c7801cd5	[PATCH] swsusp: Use suspend_console Add suspend_console() and resume_console() to the suspend-to-disk code paths so that the users of netconsole can use swsusp with it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:14 -07:00
Peter Zijlstra	4dfbb9d8c6	Lockdep: add lockdep_set_class_and_subclass() and lockdep_set_subclass() This annotation makes it possible to assign a subclass on lock init. This annotation is meant to reduce the _nested() annotations by assigning a default subclass. One could do without this annotation and rely on lockdep_set_class() exclusively, but that would require a manual stack of struct lock_class_key objects. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>	2006-10-11 01:45:14 -04:00
Al Viro	1af9892811	[PATCH] cpuset ANSI prototype Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Paul Jackson <pj@sgi.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-10 15:37:23 -07:00
Al Viro	ba2397efe1	[PATCH] make kernel/relay.c __user-clean Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-10 15:37:22 -07:00
Al Viro	ba46df984b	[PATCH] __user annotations: futex Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-10 15:37:22 -07:00
Rafael J. Wysocki	5c339d4541	[PATCH] swsusp: Make userland suspend work on SMP again Unfortunately one of the recent changes in swsusp has broken the userland suspend on SMP. Fix it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-07 10:51:14 -07:00
Frederik Deweerdt	e317c8ccaa	[PATCH] ixp4xxdefconfig arm fixes With the following patch, the ixp4xxdefconfig builds correctly. I'll test some more configs if I get some time. Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-06 12:11:08 -07:00
Andrew Morton	4899b8b16b	[PATCH] kauditd_thread warning fix Squash this warning: kernel/audit.c: In function 'kauditd_thread': kernel/audit.c:367: warning: no return statement in function returning non-void We might as test kthread_should_stop(), although it's not very pointful at present. The code which starts this thread looks racy - the kernel could start multiple threads. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-06 08:53:39 -07:00
David Howells	7d12e780e0	IRQ: Maintain regs pointer globally rather than passing to IRQ handlers Maintain a per-CPU global "struct pt_regs " variable which can be used instead of passing regs around manually through all ~1800 interrupt handlers in the Linux kernel. The regs pointer is used in few places, but it potentially costs both stack space and code to pass it around. On the FRV arch, removing the regs parameter from all the genirq function results in a 20% speed up of the IRQ exit path (ie: from leaving timer_interrupt() to leaving do_IRQ()). Where appropriate, an arch may override the generic storage facility and do something different with the variable. On FRV, for instance, the address is maintained in GR28 at all times inside the kernel as part of general exception handling. Having looked over the code, it appears that the parameter may be handed down through up to twenty or so layers of functions. Consider a USB character device attached to a USB hub, attached to a USB controller that posts its interrupts through a cascaded auxiliary interrupt controller. A character device driver may want to pass regs to the sysrq handler through the input layer which adds another few layers of parameter passing. I've build this code with allyesconfig for x86_64 and i386. I've runtested the main part of the code on FRV and i386, though I can't test most of the drivers. I've also done partial conversion for powerpc and MIPS - these at least compile with minimal configurations. This will affect all archs. Mostly the changes should be relatively easy. Take do_IRQ(), store the regs pointer at the beginning, saving the old one: struct pt_regs old_regs = set_irq_regs(regs); And put the old one back at the end: set_irq_regs(old_regs); Don't pass regs through to generic_handle_irq() or __do_IRQ(). In timer_interrupt(), this sort of change will be necessary: - update_process_times(user_mode(regs)); - profile_tick(CPU_PROFILING, regs); + update_process_times(user_mode(get_irq_regs())); + profile_tick(CPU_PROFILING); I'd like to move update_process_times()'s use of get_irq_regs() into itself, except that i386, alone of the archs, uses something other than user_mode(). Some notes on the interrupt handling in the drivers: () input_dev() is now gone entirely. The regs pointer is no longer stored in the input_dev struct. () finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does something different depending on whether it's been supplied with a regs pointer or not. (*) Various IRQ handler function pointers have been moved to type irq_handler_t. Signed-Off-By: David Howells <dhowells@redhat.com> (cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)	2006-10-05 15:10:12 +01:00
David Howells	da482792a6	IRQ: Typedef the IRQ handler function type Typedef the IRQ handler function type. Signed-Off-By: David Howells <dhowells@redhat.com> (cherry picked from 1356d1e5fd256997e3d3dce0777ab787d0515c7a commit)	2006-10-05 13:28:27 +01:00
David Howells	57a58a9435	IRQ: Typedef the IRQ flow handler function type Typedef the IRQ flow handler function type. Signed-Off-By: David Howells <dhowells@redhat.com> (cherry picked from 8e973fbdf5716b93a0a8c0365be33a31ca0fa351 commit)	2006-10-05 13:28:06 +01:00
Linus Torvalds	fefd26b3b8	Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/configh * master.kernel.org:/pub/scm/linux/kernel/git/davej/configh: Remove all inclusions of <linux/config.h> Manually resolved trivial path conflicts due to removed files in the sound/oss/ subdirectory.	2006-10-04 09:59:57 -07:00
Linus Torvalds	18e6756a6b	Merge branch 'audit.b32' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b32' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: [PATCH] message types updated [PATCH] name_count array overrun [PATCH] PPID filtering fix [PATCH] arch filter lists with < or > should not be accepted	2006-10-04 08:15:55 -07:00
Oleg Nesterov	20e9751bd9	[PATCH] rcu: simplify/improve batch tuning Kill a hard-to-calculate 'rsinterval' boot parameter and per-cpu rcu_data.last_rs_qlen. Instead, it adds adds a flag rcu_ctrlblk.signaled, which records the fact that one of CPUs has sent a resched IPI since the last rcu_start_batch(). Roughly speaking, we need two rcu_start_batch()s in order to move callbacks from ->nxtlist to ->donelist. This means that when ->qlen exceeds qhimark and continues to grow, we should send a resched IPI, and then do it again after we gone through a quiescent state. On the other hand, if it was already sent, we don't need to do it again when another CPU detects overflow of the queue. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:31 -07:00
Josh Triplett	4b6c2cca6e	[PATCH] rcu: add sched torture type to rcutorture Implement torture testing for the "sched" variant of RCU, which uses preempt_disable, preempt_enable, and synchronize_sched. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:31 -07:00
Josh Triplett	11a147013e	[PATCH] rcu: add rcu_bh_sync torture type to rcutorture Use the newly-generic synchronous deferred free function to implement torture testing for rcu_bh using synchronize_rcu_bh rather than the asynchronous call_rcu_bh. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:31 -07:00
Josh Triplett	20d2e4283a	[PATCH] rcu: add rcu_sync torture type to rcutorture Use the newly-generic synchronous deferred free function to implement torture testing for RCU using synchronize_rcu rather than the asynchronous call_rcu. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:31 -07:00
Josh Triplett	e303373658	[PATCH] rcu: refactor srcu_torture_deferred_free to work for any implementation Make srcu_torture_deferred_free use cur_ops->sync() so it will work for any implementation. Move and rename it in preparation for use in the ops of other implementations. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:31 -07:00
Josh Triplett	b772e1dd4b	[PATCH] RCU: add fake writers to rcutorture rcutorture currently has one writer and an arbitrary number of readers. To better exercise some of the code paths in RCU implementations, add fake writer threads which call the synchronize function for the RCU variant in a loop, with a delay between calls to arrange for different numbers of writers running in parallel. [bunk@stusta.de: cleanup] Acked-by: Paul McKenney <paulmck@us.ibm.com> Cc: Dipkanar Sarma <dipankar@in.ibm.com> Signed-off-by: Josh Triplett <josh@freedesktop.org> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:31 -07:00
Josh Triplett	75cfef32f2	[PATCH] rcu: Fix sign bug making rcu_random always return the same sequence rcu_random uses a counter rrs_count to occasionally mix data from get_random_bytes into the state of its pseudorandom generator. However, the rrs_counter gets declared as an unsigned long, and rcu_random checks for --rrs_count < 0, so this code will never mix any real random data into the state, and will thus always return the same sequence of random numbers. Also, change the return value of rcu_random from long to unsigned long, to avoid potential issues caused by the use of the % operator, which can return negative values for negative left operands. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:31 -07:00
Josh Triplett	2860aaba4d	[PATCH] rcu: Avoid kthread_stop on invalid pointer if rcutorture reader startup fails rcu_torture_init kmallocs the array of reader threads, then creates each one with kthread_run, cleaning up with rcu_torture_cleanup if this fails. rcu_torture_cleanup calls kthread_stop on any non-NULL pointer in the array; however, any readers after the one that failed to start up will have invalid pointers, not null pointers. Avoid this by using kzalloc instead. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:30 -07:00
Josh Triplett	3c29e03d91	[PATCH] rcu: Mention rcu_bh in description of rcutorture's torture_type parameter The comment for rcutorture's torture_type parameter only lists the RCU variants rcu and srcu, but not rcu_bh; add rcu_bh to the list. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:30 -07:00
Josh Triplett	ff2c93a537	[PATCH] rcu: Add MODULE_AUTHOR to rcutorture module Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:30 -07:00
Alan Stern	e6a92013ba	[PATCH] SRCU: report out-of-memory errors Currently the init_srcu_struct() routine has no way to report out-of-memory errors. This patch (as761) makes it return -ENOMEM when the per-cpu data allocation fails. The patch also makes srcu_init_notifier_head() report a BUG if a notifier head can't be initialized. Perhaps it should return -ENOMEM instead, but in the most likely cases where this might occur I don't think any recovery is possible. Notifier chains generally are not created dynamically. [akpm@osdl.org: avoid statement-with-side-effect in macro] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:30 -07:00
Alan Stern	eabc069401	[PATCH] Add SRCU-based notifier chains This patch (as751) adds a new type of notifier chain, based on the SRCU (Sleepable Read-Copy Update) primitives recently added to the kernel. An SRCU notifier chain is much like a blocking notifier chain, in that it must be called in process context and its callout routines are allowed to sleep. The difference is that the chain's links are protected by the SRCU mechanism rather than by an rw-semaphore, so calling the chain has extremely low overhead: no memory barriers and no cache-line bouncing. On the other hand, unregistering from the chain is expensive and the chain head requires special runtime initialization (plus cleanup if it is to be deallocated). SRCU notifiers are appropriate for notifiers that will be called very frequently and for which unregistration occurs very seldom. The proposed "task notifier" scheme qualifies, as may some of the network notifiers. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Acked-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:30 -07:00
Paul E. McKenney	b2896d2e75	[PATCH] srcu-3: add SRCU operations to rcutorture Adds SRCU operations to rcutorture and updates rcutorture documentation. Also increases the stress imposed by the rcutorture test. [bunk@stusta.de: make needlessly global code static] Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Cc: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:30 -07:00
Paul E. McKenney	621934ee7e	[PATCH] srcu-3: RCU variant permitting read-side blocking Updated patch adding a variant of RCU that permits sleeping in read-side critical sections. SRCU is as follows: o Each use of SRCU creates its own srcu_struct, and each srcu_struct has its own set of grace periods. This is critical, as it prevents one subsystem with a blocking reader from holding up SRCU grace periods for other subsystems. o The SRCU primitives (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu()) all take a pointer to a srcu_struct. o The SRCU primitives must be called from process context. o srcu_read_lock() returns an int that must be passed to the matching srcu_read_unlock(). Realtime RCU avoids the need for this by storing the state in the task struct, but SRCU needs to allow a given code path to pass through multiple SRCU domains -- storing state in the task struct would therefore require either arbitrary space in the task struct or arbitrary limits on SRCU nesting. So I kicked the state-storage problem up to the caller. Of course, it is not permitted to call synchronize_srcu() while in an SRCU read-side critical section. o There is no call_srcu(). It would not be hard to implement one, but it seems like too easy a way to OOM the system. (Hey, we have enough trouble with call_rcu(), which does -not- permit readers to sleep!!!) So, if you want it, please tell me why... [josht@us.ibm.com: sparse notation] Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Josh Triplett <josh@freedesktop.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:30 -07:00
Eric W. Biederman	1f80025e62	[PATCH] msi: simplify msi sanity checks by adding with generic irq code Currently msi.c is doing sanity checks that make certain before an irq is destroyed it has no more users. By adding irq_has_action I can perform the test is a generic way, instead of relying on a msi specific data structure. By performing the core check in dynamic_irq_cleanup I ensure every user of dynamic irqs has a test present and we don't free resources that are in use. In msi.c this allows me to kill the attrib.state member of msi_desc and all of the assciated code to maintain it. To keep from freeing data structures when irq cleanup code is called to soon changing dyanamic_irq_cleanup is insufficient because there are msi specific data structures that are also not safe to free. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Tony Luck <tony.luck@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Greg KH <greg@kroah.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:29 -07:00
Eric W. Biederman	3a16d71362	[PATCH] genirq: irq: add a dynamic irq creation API With the msi support comes a new concept in irq handling, irqs that are created dynamically at run time. Currently the msi code allocates irqs backwards. First it allocates a platform dependent routing value for an interrupt the ``vector'' and then it figures out from the vector which irq you are on. This msi backwards allocator suffers from two basic problems. The allocator suffers because it is trying to do something that is architecture specific in a generic way making it brittle, inflexible, and tied to tightly to the architecture implementation. The alloctor also suffers from it's very backwards nature as it has tied things together that should have no dependencies. To solve the basic dynamic irq allocation problem two new architecture specific functions are added: create_irq and destroy_irq. create_irq takes no input and returns an unused irq number, that won't be reused until it is returned to the free poll with destroy_irq. The irq then can be used for any purpose although the only initial consumer is the msi code. destroy_irq takes an irq number allocated with create_irq and returns it to the free pool. Making this functionality per architecture increases the simplicity of the irq allocation code and increases it's flexibility. dynamic_irq_init() and dynamic_irq_cleanup() are added to automate the irq_desc initializtion that should happen for dynamic irqs. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rajesh Shah <rajesh.shah@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:27 -07:00
Eric W. Biederman	e7b946e98a	[PATCH] genirq: irq: add moved_masked_irq Currently move_native_irq disables and renables the irq we are migrating to ensure we don't take that irq when we are actually doing the migration operation. Disabling the irq needs to happen but sometimes doing the work is move_native_irq is too late. On x86 with ioapics the irq move sequences needs to be: edge_triggered: mask irq. move irq. unmask irq. ack irq. level_triggered: mask irq. ack irq. move irq. unmask irq. We can easily perform the edge triggered sequence, with the current defintion of move_native_irq. However the level triggered case does not map well. For that I have added move_masked_irq, to allow me to disable the irqs around both the ack and the move. Q: Why have we not seen this problem earlier? A: The only symptom I have been able to reproduce is that if we change the vector before acknowleding an irq the wrong irq is acknowledged. Since we currently are not reprogramming the irq vector during migration no problems show up. We have to mask the irq before we acknowledge the irq or else we could hit a window where an irq is asserted just before we acknowledge it. Edge triggered irqs do not have this problem because acknowledgements do not propogate in the same way. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rajesh Shah <rajesh.shah@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:26 -07:00
Eric W. Biederman	a24ceab4f4	[PATCH] genirq: irq: convert the move_irq flag from a 32bit word to a single bit The primary aim of this patchset is to remove maintenances problems caused by the irq infrastructure. The two big issues I address are an artificially small cap on the number of irqs, and that MSI assumes vector == irq. My primary focus is on x86_64 but I have touched other architectures where necessary to keep them from breaking. - To increase the number of irqs I modify the code to look at the (cpu, vector) pair instead of just looking at the vector. With a large number of irqs available systems with a large irq count no longer need to compress their irq numbers to fit. Removing a lot of brittle special cases. For acpi guys the result is that irq == gsi. - Addressing the fact that MSI assumes irq == vector takes a few more patches. But suffice it to say when I am done none of the generic irq code even knows what a vector is. In quick testing on a large Unisys x86_64 machine we stumbled over at least one driver that assumed that NR_IRQS could always fit into an 8 bit number. This driver is clearly buggy today. But this has become a class of bugs that it is now much easier to hit. This patch: This is a minor space optimization. In practice I don't think this has any affect because of our alignment constraints and the other fields but there is not point in chewing up an uncessary word and since we already read the flag field this should improve the cache hit ratio of the irq handler. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rajesh Shah <rajesh.shah@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-04 07:55:26 -07:00
Steve Grubb	ac9910ce01	[PATCH] name_count array overrun Hi, This patch removes the rdev logging from the previous patch The below patch closes an unbounded use of name_count. This can lead to oopses in some new file systems. Signed-off-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-10-04 08:31:21 -04:00
Alexander Viro	419c58f11f	[PATCH] PPID filtering fix On Thu, Sep 28, 2006 at 04:03:06PM -0400, Eric Paris wrote: > After some looking I did not see a way to get into audit_log_exit > without having set the ppid. So I am dropping the set from there and > only doing it at the beginning. > > Please comment/ack/nak as soon as possible. Ehh... That's one hell of an overhead to be had ;-/ Let's be lazy. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-10-04 08:31:19 -04:00
Eric Paris	4b8a311bb1	[PATCH] arch filter lists with < or > should not be accepted Currently the kernel audit system represents arch's as numbers and will gladly accept comparisons between archs using >, <, >=, <= when the only thing that makes sense is = or !=. I'm told that the next revision of auditctl will do this checking but this will provide enforcement in the kernel even for old userspace. A simple command to show the issue would be to run auditctl -d entry,always -F arch>i686 -S chmod with this patch the kernel will reject this with -EINVAL Please comment/ack/nak as soon as possible. -Eric kernel/auditfilter.c \| 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-10-04 08:31:16 -04:00
Dave Jones	038b0a6d8d	Remove all inclusions of <linux/config.h> kbuild explicitly includes this at build time. Signed-off-by: Dave Jones <davej@redhat.com>	2006-10-04 03:38:54 -04:00
Josh Triplett	4802211cfd	rcutorture: Fix incorrect description of default for nreaders parameter The comment for the nreaders parameter of rcutorture gives the default as 4ncpus, but the value actually defaults to 2ncpus; fix the comment. Signed-off-by: Josh Triplett <josh@freedesktop.org> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-10-03 23:26:16 +02:00
Rolf Eike Beer	9f5d785e93	remove duplicate "until" from kernel/workqueue.c s/until until/until/ Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-10-03 23:07:31 +02:00
Uwe Zeisberger	f30c226954	fix file specification in comments Many files include the filename at the beginning, serveral used a wrong one. Signed-off-by: Uwe Zeisberger <Uwe_Zeisberger@digi.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-10-03 23:01:26 +02:00
Christoph Lameter	ce164428c4	[PATCH] scheduler: NUMA aware placement of sched_group_allnodes When the per cpu sched domains are build then they also need to be placed on the node where the cpu resides otherwise we will have frequent off node accesses which will slow down the system. Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:04:07 -07:00
Satoru Takeuchi	0feaece977	[PATCH] sched: fixing wrong comment for find_idlest_cpu() Fixing wrong comment for find_idlest_cpu(). Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:04:07 -07:00
Siddha, Suresh B	89c4710ee9	[PATCH] sched: cleanup sched_group cpu_power setup Up to now sched group's cpu_power for each sched domain is initialized independently. This made the setup code ugly as the new sched domains are getting added. Make the sched group cpu_power setup code generic, by using domain child field and new domain flag in sched_domain. For most of the sched domains(except NUMA), sched group's cpu_power is now computed generically using the domain properties of itself and of the child domain. sched groups in NUMA domains are setup little differently and hence they don't use this generic mechanism. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:04:06 -07:00
Siddha, Suresh B	1a84887080	[PATCH] sched: introduce child field in sched_domain Introduce the child field in sched_domain struct and use it in sched_balance_self(). We will also use this field in cleaning up the sched group cpu_power setup(done in a different patch) code. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:04:06 -07:00
Dave Jones	7473264643	[PATCH] sched: don't print migration cost when only 1 CPU If only a single CPU is present, printing this doesn't make much sense. Signed-off-by: Dave Jones <davej@redhat.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:04:06 -07:00
Siddha, Suresh B	a616058b78	[PATCH] sched: remove unnecessary sched group allocations Remove dynamic sched group allocations for MC and SMP domains. These allocations can easily fail on big systems(1024 or so CPUs) and we can live with out these dynamic allocations. [akpm@osdl.org: build fix] Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:04:06 -07:00
Nick Piggin	5c1e176781	[PATCH] sched: force /sbin/init off isolated cpus Force /sbin/init off isolated cpus (unless every CPU is specified as an isolcpu). Users seem to think that the isolated CPUs shouldn't have much running on them to begin with. That's fair enough: intuitive, I guess. It also means that the cpu affinity masks of tasks will not include isolcpus by default, which is also more intuitive, perhaps. /sbin/init is spawned from the boot CPU's idle thread, and /sbin/init starts the rest of userspace. So if the boot CPU is specified to be an isolcpu, then prior to this patch, all of userspace will be run there. (throw in a couple of plausible devinit -> cpuinit conversions I spotted while we're here). Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Dimitri Sivanich <sivanich@sgi.com> Acked-by: Paul Jackson <pj@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:04:06 -07:00
Randy Dunlap	e1ca66d1b9	[PATCH] kernel-doc for kernel/resource.c Add kernel-doc function headers in kernel/resource.c and use them in DocBook. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:03:41 -07:00
Randy Dunlap	eed34d0fc5	[PATCH] kernel-doc for kernel/dma.c Add kernel-doc function headers in kernel/dma.c and use it in DocBook. Clean up kernel-doc in mca_dma.h (the colon (':') represents a section header). Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:03:41 -07:00
Franck Bui-Huu	ffc5089196	[PATCH] Create kallsyms_lookup_size_offset() Some uses of kallsyms_lookup() do not need to find out the name of a symbol and its module's name it belongs. This is specially true in arch specific code, which needs to unwind the stack to show the back trace during oops (mips is an example). In this specific case, we just need to retreive the function's size and the offset of the active intruction inside it. Adds a new entry "kallsyms_lookup_size_offset()" This new entry does exactly the same as kallsyms_lookup() but does not require any buffers to store any names. It returns 0 if it fails otherwise 1. Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-03 08:03:41 -07:00
David Howells	3f2e05e90e	[PATCH] BLOCK: Revert patch to hack around undeclared sigset_t in linux/compat.h Revert Andrew Morton's patch to temporarily hack around the lack of a declaration of sigset_t in linux/compat.h to make the block-disablement patches build on IA64. This got accidentally pushed to Linus and should be fixed in a different manner. Also make linux/compat.h #include asm/signal.h to gain a definition of sigset_t so that it can externally declare sigset_from_compat(). This has been compile-tested for i386, x86_64, ia64, mips, mips64, frv, ppc and ppc64 and run-tested on frv. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 08:03:31 -07:00
Cedric Le Goater	9ec52099e4	[PATCH] replace cad_pid by a struct pid There are a few places in the kernel where the init task is signaled. The ctrl+alt+del sequence is one them. It kills a task, usually init, using a cached pid (cad_pid). This patch replaces the pid_t by a struct pid to avoid pid wrap around problem. The struct pid is initialized at boot time in init() and can be modified through systctl with /proc/sys/kernel/cad_pid [ I haven't found any distro using it ? ] It also introduces a small helper routine kill_cad_pid() which is used where it seemed ok to use cad_pid instead of pid 1. [akpm@osdl.org: cleanups, build fix] Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:25 -07:00
Oleg Nesterov	1a657f78dc	[PATCH] introduce get_task_pid() to fix unsafe get_pid() proc_pid_make_inode: ei->pid = get_pid(task_pid(task)); I think this is not safe. get_pid() can be preempted after checking "pid != NULL". Then the task exits, does detach_pid(), and RCU frees the pid. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:25 -07:00
Arnd Bergmann	6760856791	[PATCH] introduce kernel_execve The use of execve() in the kernel is dubious, since it relies on the __KERNEL_SYSCALLS__ mechanism that stores the result in a global errno variable. As a first step of getting rid of this, change all users to a global kernel_execve function that returns a proper error code. This function is a terrible hack, and a later patch removes it again after the kernel syscalls are gone. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Andi Kleen <ak@muc.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Hirokazu Takata <takata.hirokazu@renesas.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:23 -07:00
Pavel	5d124e99c2	[PATCH] nsproxy cloning error path fix This patch fixes copy_namespaces()'s error path. when new nsproxy (new_ns) is created pointers to namespaces (ipc, uts) are copied from the old nsproxy. Later in copy_utsname, copy_ipcs, etc. according namespaces are get-ed. On error path needed namespaces are put-ed, so there's no need to put new nsproxy itelf as it woud cause putting namespaces for the second time. Found when incorporating namespaces into OpenVZ kernel. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:22 -07:00
Kirill Korotaev	fcfbd547b1	[PATCH] IPC namespace - sysctls Sysctl tweaks for IPC namespace Signed-off-by: Pavel Emelianiov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:22 -07:00
Kirill Korotaev	73ea41302b	[PATCH] IPC namespace - utils This patch adds basic IPC namespace functionality to IPC utils: - init_ipc_ns - copy/clone/unshare/free IPC ns - /proc preparations Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:22 -07:00
Kirill Korotaev	25b21cb2f6	[PATCH] IPC namespace core This patch set allows to unshare IPCs and have a private set of IPC objects (sem, shm, msg) inside namespace. Basically, it is another building block of containers functionality. This patch implements core IPC namespace changes: - ipc_namespace structure - new config option CONFIG_IPC_NS - adds CLONE_NEWIPC flag - unshare support [clg@fr.ibm.com: small fix for unshare of ipc namespace] [akpm@osdl.org: build fix] Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:22 -07:00
Serge Hallyn	c0b2fc3165	[PATCH] uts: copy nsproxy only when needed The nsproxy was being copied in unshare() when anything was being unshared, even if it was something not referenced from nsproxy. This should end up in some cases with far more memory usage than necessary. Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:22 -07:00
Serge E. Hallyn	071df104f8	[PATCH] namespaces: utsname: implement CLONE_NEWUTS flag Implement a CLONE_NEWUTS flag, and use it at clone and sys_unshare. [clg@fr.ibm.com: IPC unshare fix] [bunk@stusta.de: cleanup] Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:22 -07:00
Serge E. Hallyn	8218c74c02	[PATCH] namespaces: utsname: sysctl Sysctl uts patch. This will need to be done another way, but since sysctl itself needs to be container aware, 'the right thing' is a separate patchset. [akpm@osdl.org: ia64 build fix] [sam.vilain@catalyst.net.nz: cleanup] [sam.vilain@catalyst.net.nz: add proc_do_utsns_string] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:21 -07:00
Serge E. Hallyn	4865ecf131	[PATCH] namespaces: utsname: implement utsname namespaces This patch defines the uts namespace and some manipulators. Adds the uts namespace to task_struct, and initializes a system-wide init namespace. It leaves a #define for system_utsname so sysctl will compile. This define will be removed in a separate patch. [akpm@osdl.org: build fix, cleanup] Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:21 -07:00
Serge E. Hallyn	96b644bdec	[PATCH] namespaces: utsname: use init_utsname when appropriate In some places, particularly drivers and __init code, the init utsns is the appropriate one to use. This patch replaces those with a the init_utsname helper. Changes: Removed several uses of init_utsname(). Hope I picked all the right ones in net/ipv4/ipconfig.c. These are now changed to utsname() (the per-process namespace utsname) in the previous patch (2/7) [akpm@osdl.org: CIFS fix] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:21 -07:00
Serge E. Hallyn	e9ff3990f0	[PATCH] namespaces: utsname: switch to using uts namespaces Replace references to system_utsname to the per-process uts namespace where appropriate. This includes things like uname. Changes: Per Eric Biederman's comments, use the per-process uts namespace for ELF_PLATFORM, sunrpc, and parts of net/ipv4/ipconfig.c [jdike@addtoit.com: UML fix] [clg@fr.ibm.com: cleanup] [akpm@osdl.org: build fix] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:21 -07:00
Cedric Le Goater	fab413a334	[PATCH] namespaces: exit_task_namespaces() invalidates nsproxy exit_task_namespaces() has replaced the former exit_namespace(). It invalidates task->nsproxy and associated namespaces. This is an issue for the (futur) pid namespace which is required to be valid in exit_notify(). This patch moves exit_task_namespaces() after exit_notify() to keep nsproxy valid. Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:21 -07:00
Serge E. Hallyn	1651e14e28	[PATCH] namespaces: incorporate fs namespace into nsproxy This moves the mount namespace into the nsproxy. The mount namespace count now refers to the number of nsproxies point to it, rather than the number of tasks. As a result, the unshare_namespace() function in kernel/fork.c no longer checks whether it is being shared. Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:20 -07:00
Serge E. Hallyn	0437eb594e	[PATCH] nsproxy: move init_nsproxy into kernel/nsproxy.c Move the init_nsproxy definition out of arch/ into kernel/nsproxy.c. This avoids all arches having to be updated. Compiles and boots on s390. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:20 -07:00
Serge E. Hallyn	ab516013ad	[PATCH] namespaces: add nsproxy This patch adds a nsproxy structure to the task struct. Later patches will move the fs namespace pointer into this structure, and introduce a new utsname namespace into the nsproxy. The vserver and openvz functionality, then, would be implemented in large part by virtualizing/isolating more and more resources into namespaces, each contained in the nsproxy. [akpm@osdl.org: build fix] Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:20 -07:00
Adrian Bunk	b1ba4ddde0	[PATCH] make kernel/sysctl.c:_proc_do_string() static This patch makes the needlessly global _proc_do_string() static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:20 -07:00
Sam Vilain	f5dd3d6fad	[PATCH] proc: sysctl: add _proc_do_string helper The logic in proc_do_string is worth re-using without passing in a ctl_table structure (say, we want to calculate a pointer and pass that in instead); pass in the two fields it uses from that structure as explicit arguments. Signed-off-by: Sam Vilain <sam.vilain@catalyst.net.nz> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:20 -07:00
Greg Banks	e16b38f713	[PATCH] cpumask: export cpu_online_map and cpu_possible_map consistently cpumask: ensure that the cpu_online_map and cpu_possible_map bitmasks, and hence all the macros in <linux/cpumask.h> that require them, are available to modules for all supported combinations of architecture and CONFIG_SMP. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:17 -07:00
bibo,mao	99219a3fbc	[PATCH] kretprobe spinlock deadlock patch kprobe_flush_task() possibly calls kfree function during holding kretprobe_lock spinlock, if kfree function is probed by kretprobe that will incur spinlock deadlock. This patch moves kfree function out scope of kretprobe_lock. Signed-off-by: bibo, mao <bibo.mao@intel.com> Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:16 -07:00
bibo,mao	f2aa85a0cc	[PATCH] disallow kprobes on notifier_call_chain When kprobe is re-entered, the re-entered kprobe kernel path will will call atomic_notifier_call_chain function, if this function is kprobed that will incur numerous kprobe recursive fault. This patch disallows kprobes on atomic_notifier_call_chain function. Signed-off-by: bibo, mao <bibo.mao@intel.com> Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:16 -07:00
bibo,mao	62c27be0dd	[PATCH] kprobe whitespace cleanup Whitespace is used to indent, this patch cleans up these sentences by kernel coding style. Signed-off-by: bibo, mao <bibo.mao@intel.com> Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:16 -07:00
Ananth N Mavinakayanahalli	3a872d89ba	[PATCH] Kprobes: Make kprobe modules more portable In an effort to make kprobe modules more portable, here is a patch that: o Introduces the "symbol_name" field to struct kprobe. The symbol->address resolution now happens in the kernel in an architecture agnostic manner. 64-bit powerpc users no longer have to specify the ".symbols" o Introduces the "offset" field to struct kprobe to allow a user to specify an offset into a symbol. o The legacy mechanism of specifying the kprobe.addr is still supported. However, if both the kprobe.addr and kprobe.symbol_name are specified, probe registration fails with an -EINVAL. o The symbol resolution code uses kallsyms_lookup_name(). So CONFIG_KPROBES now depends on CONFIG_KALLSYMS o Apparantly kprobe modules were the only legitimate out-of-tree user of the kallsyms_lookup_name() EXPORT. Now that the symbol resolution happens in-kernel, remove the EXPORT as suggested by Christoph Hellwig o Modify tcp_probe.c that uses the kprobe interface so as to make it work on multiple platforms (in its earlier form, the code wouldn't work, say, on powerpc) Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:16 -07:00
Eric W. Biederman	2425c08b37	[PATCH] usb: fixup usb so it uses struct pid The problem with remembering a user space process by its pid is that it is possible that the process will exit, pid wrap around will occur. Converting to a struct pid avoid that problem, and paves the way for implementing a pid namespace. Also since usb is the only user of kill_proc_info_as_uid rename kill_proc_info_as_uid to kill_pid_info_as_uid and have the new version take a struct pid. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:15 -07:00
Eric W. Biederman	f40f50d3bb	[PATCH] Use struct pspace in next_pidmap and find_ge_pid This updates my proc: readdir race fix (take 3) patch to account for the changes made by: Sukadev Bhattiprolu <sukadev@us.ibm.com> to introduce struct pspace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:15 -07:00
Sukadev Bhattiprolu	3fbc964864	[PATCH] Define struct pspace Define a per-container pid space object. And create one instance of this object, init_pspace, to define the entire pid space. Subsequent patches will provide/use interfaces to create/destroy pid spaces. Its a subset/rework of Eric Biederman's patch http://lkml.org/lkml/2006/2/6/285 . Signed-off-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Kirill Korotaev <dev@sw.ru> Cc: Andrey Savochkin <saw@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:15 -07:00
Sukadev Bhattiprolu	aa5a6662f9	[PATCH] Move pidmap to pspace.h Move struct pidmap and PIDMAP_ENTRIES to a new file, include/linux/pspace.h where it will be used in subsequent patches to define pid spaces. Its a subset of Eric Biederman's patch http://lkml.org/lkml/2006/2/6/285 [akpm@osdl.org: cleanups] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:15 -07:00
Eric W. Biederman	c88be3eb2e	[PATCH] pids coding style use struct pidmap in next_pidmap Use struct pidmap instead of pidmap_t. This updates my proc: readdir race fix (take 3) patch to account for the changes made by: Sukadev Bhattiprolu <sukadev@us.ibm.com> to kill pidmap_t. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:15 -07:00
Sukadev Bhattiprolu	6a1f3b8455	[PATCH] pids: coding style: use struct pidmap Use struct pidmap instead of pidmap_t. Its a subset of Eric Biederman's patch http://lkml.org/lkml/2006/2/6/271. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:14 -07:00
Eric W. Biederman	609d7fa956	[PATCH] file: modify struct fown_struct to use a struct pid File handles can be requested to send sigio and sigurg to processes. By tracking the destination processes using struct pid instead of pid_t we make the interface safe from all potential pid wrap around problems. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:14 -07:00
Eric W. Biederman	bbf73147e2	[PATCH] pid: export the symbols needed to use struct pid * pids aren't something that drivers should care about. However there are a lot of helper layers in the kernel that do care, and are built as modules. Before I can convert them to using struct pid instead of pid_t I need to export the appropriate symbols so they can continue to be built. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:13 -07:00
Eric W. Biederman	c4b92fc112	[PATCH] pid: implement signal functions that take a struct pid * Currently the signal functions all either take a task or a pid_t argument. This patch implements variants that take a struct pid *. After all of the users have been update it is my intention to remove the variants that take a pid_t as using pid_t can be more work (an extra hash table lookup) and difficult to get right in the presence of multiple pid namespaces. There are two kinds of functions introduced in this patch. The are the general use functions kill_pgrp and kill_pid which take a priv argument that is ultimately used to create the appropriate siginfo information, Then there are _kill_pgrp_info, kill_pgrp_info, kill_pid_info the internal implementation helpers that take an explicit siginfo. The distinction is made because filling out an explcit siginfo is tricky, and will be even more tricky when pid namespaces are introduced. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:13 -07:00
Eric W. Biederman	0804ef4b0d	[PATCH] proc: readdir race fix (take 3) The problem: An opendir, readdir, closedir sequence can fail to report process ids that are continually in use throughout the sequence of system calls. For this race to trigger the process that proc_pid_readdir stops at must exit before readdir is called again. This can cause ps to fail to report processes, and it is in violation of posix guarantees and normal application expectations with respect to readdir. Currently there is no way to work around this problem in user space short of providing a gargantuan buffer to user space so the directory read all happens in on system call. This patch implements the normal directory semantics for proc, that guarantee that a directory entry that is neither created nor destroyed while reading the directory entry will be returned. For directory that are either created or destroyed during the readdir you may or may not see them. Furthermore you may seek to a directory offset you have previously seen. These are the guarantee that ext[23] provides and that posix requires, and more importantly that user space expects. Plus it is a simple semantic to implement reliable service. It is just a matter of calling readdir a second time if you are wondering if something new has show up. These better semantics are implemented by scanning through the pids in numerical order and by making the file offset a pid plus a fixed offset. The pid scan happens on the pid bitmap, which when you look at it is remarkably efficient for a brute force algorithm. Given that a typical cache line is 64 bytes and thus covers space for 64*8 == 200 pids. There are only 40 cache lines for the entire 32K pid space. A typical system will have 100 pids or more so this is actually fewer cache lines we have to look at to scan a linked list, and the worst case of having to scan the entire pid bitmap is pretty reasonable. If we need something more efficient we can go to a more efficient data structure for indexing the pids, but for now what we have should be sufficient. In addition this takes no additional locks and is actually less code than what we are doing now. Also another very subtle bug in this area has been fixed. It is possible to catch a task in the middle of de_thread where a thread is assuming the thread of it's thread group leader. This patch carefully handles that case so if we hit it we don't fail to return the pid, that is undergoing the de_thread dance. Thanks to KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> for providing the first fix, pointing this out and working on it. [oleg@tv-sign.ru: fix it] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Jean Delvare <jdelvare@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:12 -07:00
Randy Dunlap	2bc2d61a96	[PATCH] list module taint flags in Oops/panic When listing loaded modules during an oops or panic, also list each module's Tainted flags if non-zero (P: Proprietary or F: Forced load only). If a module is did not taint the kernel, it is just listed like usbcore but if it did taint the kernel, it is listed like wizmodem(PF) Example: [ 3260.121718] Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: [ 3260.121729] [<ffffffff8804c099>] :dump_test:proc_dump_test+0x99/0xc8 [ 3260.121742] PGD fe8d067 PUD 264a6067 PMD 0 [ 3260.121748] Oops: 0002 [1] SMP [ 3260.121753] CPU 1 [ 3260.121756] Modules linked in: dump_test(P) snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device ide_cd generic ohci1394 snd_hda_intel snd_hda_codec snd_pcm snd_timer snd ieee1394 snd_page_alloc piix ide_core arcmsr aic79xx scsi_transport_spi usblp [ 3260.121785] Pid: 5556, comm: bash Tainted: P 2.6.18-git10 #1 [Alternatively, I can look into listing tainted flags with 'lsmod', but that won't help in oopsen/panics so much.] [akpm@osdl.org: cleanup] Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-02 07:57:12 -07:00
Andi Kleen	d025c9db7f	[PATCH] Support piping into commands in /proc/sys/kernel/core_pattern Using the infrastructure created in previous patches implement support to pipe core dumps into programs. This is done by overloading the existing core_pattern sysctl with a new syntax: \|program When the first character of the pattern is a '\|' the kernel will instead threat the rest of the pattern as a command to run. The core dump will be written to the standard input of that program instead of to a file. This is useful for having automatic core dump analysis without filling up disks. The program can do some simple analysis and save only a summary of the core dump. The core dump proces will run with the privileges and in the name space of the process that caused the core dump. I also increased the core pattern size to 128 bytes so that longer command lines fit. Most of the changes comes from allowing core dumps without seeks. They are fairly straight forward though. One small incompatibility is that if someone had a core pattern previously that started with '\|' they will get suddenly new behaviour. I think that's unlikely to be a real problem though. Additional background: > Very nice, do you happen to have a program that can accept this kind of > input for crash dumps? I'm guessing that the embedded people will > really want this functionality. I had a cheesy demo/prototype. Basically it wrote the dump to a file again, ran gdb on it to get a backtrace and wrote the summary to a shared directory. Then there was a simple CGI script to generate a "top 10" crashes HTML listing. Unfortunately this still had the disadvantage to needing full disk space for a dump except for deleting it afterwards (in fact it was worse because over the pipe holes didn't work so if you have a holey address map it would require more space). Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as cores (at least it worked with zsh's =(cat core) syntax), so it would be likely possible to do it without temporary space with a simple wrapper that calls it in the right way. I ran out of time before doing that though. The demo prototype scripts weren't very good. If there is really interest I can dig them out (they are currently on a laptop disk on the desk with the laptop itself being in service), but I would recommend to rewrite them for any serious application of this and fix the disk space problem. Also to be really useful it should probably find a way to automatically fetch the debuginfos (I cheated and just installed them in advance). If nobody else does it I can probably do the rewrite myself again at some point. My hope at some point was that desktops would support it in their builtin crash reporters, but at least the KDE people I talked too seemed to be happy with their user space only solution. Alan sayeth: I don't believe that piping as such as neccessarily the right model, but the ability to intercept and processes core dumps from user space is asked for by many enterprise users as well. They want to know about, capture, analyse and process core dumps, often centrally and in automated form. [akpm@osdl.org: loff_t != unsigned long] Signed-off-by: Andi Kleen <ak@suse.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:33 -07:00
Andi Kleen	e239ca5405	[PATCH] Create call_usermodehelper_pipe() A new member in the ever growing family of call_usermode* functions is born. The new call_usermodehelper_pipe() function allows to pipe data to the stdin of the called user mode progam and behaves otherwise like the normal call_usermodehelp() (except that it always waits for the child to finish) Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:33 -07:00
Dave Hansen	d8c76e6f45	[PATCH] r/o bind mount prepwork: inc_nlink() helper This is mostly included for parity with dec_nlink(), where we will have some more hooks. This one should stay pretty darn straightforward for now. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:30 -07:00
Jay Lan	db5fed26b2	[PATCH] csa accounting taskstats update ChangeLog: Feedbacks from Andrew Morton: - define TS_COMM_LEN to 32 - change acct_stimexpd field of task_struct to be of cputime_t, which is to be used to save the tsk->stime of last timer interrupt update. - a new Documentation/accounting/taskstats-struct.txt to describe fields of taskstats struct. Feedback from Balbir Singh: - keep the stime of a task to be zero when both stime and utime are zero as recoreded in task_struct. Misc: - convert accumulated RSS/VM from platform dependent pages-ticks to MBytes-usecs in the kernel Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Chris Sturtivant <csturtiv@sgi.com> Cc: Tony Ernst <tee@sgi.com> Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:29 -07:00
Jay Lan	8f0ab51479	[PATCH] csa: convert CONFIG tag for extended accounting routines There were a few accounting data/macros that are used in CSA but are #ifdef'ed inside CONFIG_BSD_PROCESS_ACCT. This patch is to change those ifdef's from CONFIG_BSD_PROCESS_ACCT to CONFIG_TASK_XACCT. A few defines are moved from kernel/acct.c and include/linux/acct.h to kernel/tsacct.c and include/linux/tsacct_kern.h. Signed-off-by: Jay Lan <jlan@sgi.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Chris Sturtivant <csturtiv@sgi.com> Cc: Tony Ernst <tee@sgi.com> Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:29 -07:00
Jay Lan	9acc185351	[PATCH] csa: Extended system accounting over taskstats Add extended system accounting handling over taskstats interface. A CONFIG_TASK_XACCT flag is created to enable the extended accounting code. Signed-off-by: Jay Lan <jlan@sgi.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Chris Sturtivant <csturtiv@sgi.com> Cc: Tony Ernst <tee@sgi.com> Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:29 -07:00
Jay Lan	f3cef7a994	[PATCH] csa: basic accounting over taskstats Add some basic accounting fields to the taskstats struct, add a new kernel/tsacct.c to handle basic accounting data handling upon exit. A handle is added to taskstats.c to invoke the basic accounting data handling. Signed-off-by: Jay Lan <jlan@sgi.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Chris Sturtivant <csturtiv@sgi.com> Cc: Tony Ernst <tee@sgi.com> Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net> Cc: "Michal Piotrowski" <michal.k.k.piotrowski@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:29 -07:00
Balbir Singh	0ae646845b	[PATCH] Fix taskstats size calculation (use the new genetlink utility functions) The addition of the CSA patch pushed the size of struct taskstats to 256 bytes. This exposed a problem with prepare_reply(), we were not allocating space for the netlink and genetlink header. It worked earlier because alloc_skb() would align the skb to SMP_CACHE_BYTES, which added some additonal bytes. Signed-off-by: Balbir Singh <balbir@in.ibm.com> Cc: Jamal Hadi <hadi@cyberus.ca> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Thomas Graf <tgraf@suug.ch> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:29 -07:00
Atsushi Nemoto	8ef386092d	[PATCH] kill wall_jiffies With 2.6.18-rc4-mm2, now wall_jiffies will always be the same as jiffies. So we can kill wall_jiffies completely. This is just a cleanup and logically should not change any real behavior except for one thing: RTC updating code in (old) ppc and xtensa use a condition "jiffies - wall_jiffies == 1". This condition is never met so I suppose it is just a bug. I just remove that condition only instead of kill the whole "if" block. [heiko.carstens@de.ibm.com: s390 build fix and cleanup] Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: Andi Kleen <ak@muc.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Hirokazu Takata <takata.hirokazu@renesas.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:27 -07:00
Adrian Bunk	70bc42f90a	[PATCH] kernel/time/ntp.c: possible cleanups This patch contains the following possible cleanups: - make the following needlessly global function static: - ntp_update_frequency() - make the following needlessly global variables static: - time_state - time_offset - time_constant - time_reftime - remove the following read-only global variable: - time_precision Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:27 -07:00
Roman Zippel	f199239373	[PATCH] ntp: convert to the NTP4 reference model This converts the kernel ntp model into a model which matches the nanokernel reference implementations. The previous patches already increased the resolution and precision of the computations, so that this conversion becomes quite simple. <linux@horizon.com> explains: The original NTP kernel interface was defined in units of microseconds. That's what Linux implements. As computers have gotten faster and can now split microseconds easily, a new kernel interface using nanosecond units was defined ("the nanokernel", confusing as that name is to OS hackers), and there's an STA_NANO bit in the adjtimex() status field to tell the application which units it's using. The current ntpd supports both, but Linux loses some possible timing resolution because of quantization effects, and the ntpd hackers would really like to be able to drop the backwards compatibility code. Ulrich Windl has been maintaining a patch set to do the conversion for years, but it's hard to keep in sync. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:27 -07:00
Roman Zippel	04b617e71e	[PATCH] ntp: convert time_freq to nsec value This converts time_freq to a scaled nsec value and adds around 6bit of extra resolution. This pushes the time_freq to its 32bit limits so the calculatons have to be done with 64bit. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:27 -07:00
Roman Zippel	97eebe138c	[PATCH] ntp: remove time_tolerance time_tolerance isn't changed at all in the kernel, so simply remove it, this simplifies the next patch, as it avoids a number of conversions. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:26 -07:00
Roman Zippel	8f807f8d21	[PATCH] ntp: add time_adjust to tick length This folds update_ntp_one_tick() into second_overflow() and adds time_adjust to the tick length, this makes time_next_adjust unnecessary. This slightly changes the adjtime() behaviour, instead of applying it to the next tick, it's applied to the next second. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:26 -07:00
Roman Zippel	3d3675cc3d	[PATCH] ntp: prescale time_offset This converts time_offset into a scaled per tick value. This avoids now completely the crude compensation in second_overflow(). Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:26 -07:00
Roman Zippel	dc6a43e46f	[PATCH] ntp: add time_freq to tick length This adds the frequency part to ntp_update_frequency(). Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:26 -07:00
Roman Zippel	ab8783b688	[PATCH] ntp: add time_adj to tick length This makes time_adj local to second_overflow() and integrates it into the tick length instead of adding it everytime. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:26 -07:00
Roman Zippel	b0ee75561b	[PATCH] ntp: add ntp_update_frequency This introduces ntp_update_frequency() and deinlines ntp_clear() (as it's not performance critical). ntp_update_frequency() calculates the base tick length using tick_usec and adds a base adjustment, in case the frequency doesn't divide evenly by HZ. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:26 -07:00
john stultz	4c7ee8de95	[PATCH] NTP: Move all the NTP related code to ntp.c Move all the NTP related code to ntp.c [akpm@osdl.org: cleanups, build fix] Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:26 -07:00
Martin Schwidefsky	ef6edc9746	[PATCH] Directed yield: cpu_relax variants for spinlocks and rw-locks On systems running with virtual cpus there is optimization potential in regard to spinlocks and rw-locks. If the virtual cpu that has taken a lock is known to a cpu that wants to acquire the same lock it is beneficial to yield the timeslice of the virtual cpu in favour of the cpu that has the lock (directed yield). With CONFIG_PREEMPT="n" this can be implemented by the architecture without common code changes. Powerpc already does this. With CONFIG_PREEMPT="y" the lock loops are coded with _raw_spin_trylock, _raw_read_trylock and _raw_write_trylock in kernel/spinlock.c. If the lock could not be taken cpu_relax is called. A directed yield is not possible because cpu_relax doesn't know anything about the lock. To be able to yield the lock in favour of the current lock holder variants of cpu_relax for spinlocks and rw-locks are needed. The new _raw_spin_relax, _raw_read_relax and _raw_write_relax primitives differ from cpu_relax insofar that they have an argument: a pointer to the lock structure. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:21 -07:00
Cal Peake	756184b7d7	[PATCH] CodingStyle cleanup for kernel/sys.c Fix up kernel/sys.c to be consistent with CodingStyle and the rest of the file. Signed-off-by: Cal Peake <cp@absolutedigital.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:20 -07:00
Arjan van de Ven	5c87579e65	[PATCH] maximum latency tracking infrastructure Add infrastructure to track "maximum allowable latency" for power saving policies. The reason for adding this infrastructure is that power management in the idle loop needs to make a tradeoff between latency and power savings (deeper power save modes have a longer latency to running code again). The code that today makes this tradeoff just does a rather simple algorithm; however this is not good enough: There are devices and use cases where a lower latency is required than that the higher power saving states provide. An example would be audio playback, but another example is the ipw2100 wireless driver that right now has a very direct and ugly acpi hook to disable some higher power states randomly when it gets certain types of error. The proposed solution is to have an interface where drivers can * announce the maximum latency (in microseconds) that they can deal with * modify this latency * give up their constraint and a function where the code that decides on power saving strategy can query the current global desired maximum. This patch has a user of each side: on the consumer side, ACPI is patched to use this, on the producer side the ipw2100 driver is patched. A generic maximum latency is also registered of 2 timer ticks (more and you lose accurate time tracking after all). While the existing users of the patch are x86 specific, the infrastructure is not. I'd like to ask the arch maintainers of other architectures if the infrastructure is generic enough for their use (assuming the architecture has such a tradeoff as concept at all), and the sound/multimedia driver owners to look at the driver facing API to see if this is something they can use. [akpm@osdl.org: cleanups] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jesse Barnes <jesse.barnes@intel.com> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-01 00:39:19 -07:00
David Howells	9361401eb7	[PATCH] BLOCK: Make it possible to disable the block layer [try #6 ] Make it possible to disable the block layer. Not all embedded devices require it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require the block layer to be present. This patch does the following: () Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev support. () Adds dependencies on CONFIG_BLOCK to any configuration item that controls an item that uses the block layer. This includes: () Block I/O tracing. () Disk partition code. () All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS. () The SCSI layer. As far as I can tell, even SCSI chardevs use the block layer to do scheduling. Some drivers that use SCSI facilities - such as USB storage - end up disabled indirectly from this. () Various block-based device drivers, such as IDE and the old CDROM drivers. () MTD blockdev handling and FTL. () JFFS - which uses set_bdev_super(), something it could avoid doing by taking a leaf out of JFFS2's book. () Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and linux/elevator.h contingent on CONFIG_BLOCK being set. sector_div() is, however, still used in places, and so is still available. () Also made contingent are the contents of linux/mpage.h, linux/genhd.h and parts of linux/fs.h. () Makes a number of files in fs/ contingent on CONFIG_BLOCK. () Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK. () set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK is not enabled. () fs/no-block.c is created to hold out-of-line stubs and things that are required when CONFIG_BLOCK is not set: () Default blockdev file operations (to give error ENODEV on opening). () Makes some /proc changes: () /proc/devices does not list any blockdevs. () /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK. () Makes some compat ioctl handling contingent on CONFIG_BLOCK. () If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if given command other than Q_SYNC or if a special device is specified. () In init/do_mounts.c, no reference is made to the blockdev routines if CONFIG_BLOCK is not defined. This does not prohibit NFS roots or JFFS2. () The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return error ENOSYS by way of cond_syscall if so). () The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if CONFIG_BLOCK is not set, since they can't then happen. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 20:52:31 +02:00
David Howells	07f3f05c1e	[PATCH] BLOCK: Move extern declarations out of fs/.c into header files [try #6 ] Create a new header file, fs/internal.h, for common definitions local to the sources in the fs/ directory. Move extern definitions that should be in header files from fs/.c to fs/internal.h or other main header files where they span directories. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 20:52:18 +02:00
David Howells	0d67a46df0	[PATCH] BLOCK: Remove duplicate declaration of exit_io_context() [try #6 ] Remove the duplicate declaration of exit_io_context() from linux/sched.h. Signed-Off-By: David Howells <dhowells@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2006-09-30 20:31:20 +02:00
Andi Kleen	34596dc9e5	[PATCH] Define vsyscall cache as blob to make clearer that user space shouldn't use it Signed-off-by: Andi Kleen <ak@suse.de>	2006-09-30 01:47:55 +02:00
Andi Kleen	29cbc78b90	[PATCH] x86: Clean up x86 NMI sysctls Use prototypes in headers Don't define panic_on_unrecovered_nmi for all architectures Cc: dzickus@redhat.com Signed-off-by: Andi Kleen <ak@suse.de>	2006-09-30 01:47:55 +02:00
Paul Jackson	181b648036	[PATCH] cpuset: fix obscure attach_task vs exiting race Fix obscure race condition in kernel/cpuset.c attach_task() code. There is basically zero chance of anyone accidentally being harmed by this race. It requires a special 'micro-stress' load and a special timing loop hacks in the kernel to hit in less than an hour, and even then you'd have to hit it hundreds or thousands of times, followed by some unusual and senseless cpuset configuration requests, including removing the top cpuset, to cause any visibly harm affects. One could, with perhaps a few days or weeks of such effort, get the reference count on the top cpuset below zero, and manage to crash the kernel by asking to remove the top cpuset. I found it by code inspection. The race was introduced when 'the_top_cpuset_hack' was introduced, and one piece of code was not updated. An old check for a possibly null task cpuset pointer needed to be changed to a check for a task marked PF_EXITING. The pointer can't be null anymore, thanks to the_top_cpuset_hack (documented in kernel/cpuset.c). But the task could have gone into PF_EXITING state after it was found in the task_list scan. If a task is PF_EXITING in this code, it is possible that its task->cpuset pointer is pointing to the top cpuset due to the_top_cpuset_hack, rather than because the top_cpuset was that tasks last valid cpuset. In that case, the wrong cpuset reference counter would be decremented. The fix is trivial. Instead of failing the system call if the tasks cpuset pointer is null here, fail it if the task is in PF_EXITING state. The code for 'the_top_cpuset_hack' that changes an exiting tasks cpuset to the top_cpuset is done without locking, so could happen at anytime. But it is done during the exit handling, after the PF_EXITING flag is set. So if we verify that a task is still not PF_EXITING after we copy out its cpuset pointer (into 'oldcs', below), we know that 'oldcs' is not one of these hack references to the top_cpuset. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:25 -07:00
Ingo Molnar	03cbc358aa	[PATCH] lockdep core: improve the lock-chain-hash With CONFIG_DEBUG_LOCK_ALLOC turned off i was getting sporadic failures in the locking self-test: ------------> \| Locking API testsuite: ---------------------------------------------------------------------------- \| spin \|wlock \|rlock \|mutex \| wsem \| rsem \| -------------------------------------------------------------------------- A-A deadlock: ok \| ok \| ok \| ok \| ok \| ok \| A-B-B-A deadlock: ok \| ok \| ok \| ok \| ok \| ok \| A-B-B-C-C-A deadlock: ok \| ok \| ok \| ok \| ok \| ok \| A-B-C-A-B-C deadlock: ok \| ok \| ok \| ok \| ok \| ok \| A-B-B-C-C-D-D-A deadlock: ok \|FAILED\| ok \| ok \| ok \| ok \| A-B-C-D-B-D-D-A deadlock: ok \| ok \| ok \| ok \| ok \| ok \| A-B-C-D-B-C-D-A deadlock: ok \| ok \| ok \| ok \| ok \|FAILED\| after much debugging it turned out to be caused by accidental chain-hash key collisions. The current hash is: #define iterate_chain_key(key1, key2) \ (((key1) << MAX_LOCKDEP_KEYS_BITS/2) ^ \ ((key1) >> (64-MAX_LOCKDEP_KEYS_BITS/2)) ^ \ (key2)) where MAX_LOCKDEP_KEYS_BITS is 11. This hash is pretty good as it will shift by 5 bits in every iteration, where every new ID 'mixed' into the hash would have up to 11 bits. But because there was a 6 bits overlap between subsequent IDs and their high bits tended to be similar, there was a chance for accidental chain-hash collision for a low number of locks held. the solution is to shift by 11 bits: #define iterate_chain_key(key1, key2) \ (((key1) << MAX_LOCKDEP_KEYS_BITS) ^ \ ((key1) >> (64-MAX_LOCKDEP_KEYS_BITS)) ^ \ (key2)) This keeps the hash perfect up to 5 locks held, but even above that the hash is still good because 11 bits is a relative prime to the total 64 bits, so a complete match will only occur after 64 held locks (which doesnt happen in Linux). Even after 5 locks held, entropy of the 5 IDs mixed into the hash is already good enough so that overlap doesnt generate a colliding hash ID. with this change the false positives went away. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:25 -07:00
Alan Cox	eb84a20e9e	[PATCH] audit/accounting: tty locking Add tty locking around the audit and accounting code. The whole current->signal-> locking is all deeply strange but it's for someone else to sort out. Add rather than replace the lock for acct.c Signed-off-by: Alan Cox <alan@redhat.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:25 -07:00
Rusty Russell	e5582ca21a	[PATCH] stop_machine.c copyright I had to look back: this code was extracted from the module.c code in 2005. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:24 -07:00
Ian S. Nelson	04b1db9fd7	[PATCH] /sys/modules: allow full length section names I've been using systemtap for some debugging and I noticed that it can't probe a lot of modules. Turns out it's kind of silly, the sections section of /sys/module is limited to 32byte filenames and many of the actual sections are a a bit longer than that. [akpm@osdl.org: rewrite to use dymanic allocation] Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:23 -07:00
Paul Jackson	b1aac8bb82	[PATCH] cpuset: hotunplug cpus and mems in all cpusets The cpuset code handling hot unplug of CPUs or Memory Nodes was incorrect - it could remove a CPU or Node from the top cpuset, while leaving it still in some child cpusets. One basic rule of cpusets is that each cpusets cpus and mems are subsets of its parents. The cpuset hot unplug code violated this rule. So the cpuset hotunplug handler must walk down the tree, removing any removed CPU or Node from all cpusets. However, it is not allowed to make a cpusets cpus or mems become empty. They can only transition from empty to non-empty, not back. So if the last CPU or Node would be removed from a cpuset by the above walk, we scan back up the cpuset hierarchy, finding the nearest ancestor that still has something online, and copy its CPU or Memory placement. Signed-off-by: Paul Jackson <pj@sgi.com> Cc: Nathan Lynch <ntl@pobox.com> Cc: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:21 -07:00
Paul Jackson	38837fc75a	[PATCH] cpuset: top_cpuset tracks hotplug changes to node_online_map Change the list of memory nodes allowed to tasks in the top (root) nodeset to dynamically track what cpus are online, using a call to a cpuset hook from the memory hotplug code. Make this top cpus file read-only. On systems that have cpusets configured in their kernel, but that aren't actively using cpusets (for some distros, this covers the majority of systems) all tasks end up in the top cpuset. If that system does support memory hotplug, then these tasks cannot make use of memory nodes that are added after system boot, because the memory nodes are not allowed in the top cpuset. This is a surprising regression over earlier kernels that didn't have cpusets enabled. One key motivation for this change is to remain consistent with the behaviour for the top_cpuset's 'cpus', which is also read-only, and which automatically tracks the cpu_online_map. This change also has the minor benefit that it fixes a long standing, little noticed, minor bug in cpusets. The cpuset performance tweak to short circuit the cpuset_zone_allowed() check on systems with just a single cpuset (see 'number_of_cpusets', in linux/cpuset.h) meant that simply changing the 'mems' of the top_cpuset had no affect, even though the change (the write system call) appeared to succeed. With the following change, that write to the 'mems' file fails -EACCES, and the 'mems' file stubbornly refuses to be changed via user space writes. Thus no one should be mislead into thinking they've changed the top_cpusets's 'mems' when in affect they haven't. In order to keep the behaviour of cpusets consistent between systems actively making use of them and systems not using them, this patch changes the behaviour of the 'mems' file in the top (root) cpuset, making it read only, and making it automatically track the value of node_online_map. Thus tasks in the top cpuset will have automatic use of hot plugged memory nodes allowed by their cpuset. [akpm@osdl.org: build fix] [bunk@stusta.de: build fix] Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:21 -07:00
Oleg Nesterov	c394cc9fbb	[PATCH] introduce TASK_DEAD state I am not sure about this patch, I am asking Ingo to take a decision. task_struct->state == EXIT_DEAD is a very special case, to avoid a confusion it makes sense to introduce a new state, TASK_DEAD, while EXIT_DEAD should live only in ->exit_state as documented in sched.h. Note that this state is not visible to user-space, get_task_state() masks off unsuitable states. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:21 -07:00
Oleg Nesterov	55a101f8f7	[PATCH] kill PF_DEAD flag After the previous change (->flags & PF_DEAD) <=> (->state == EXIT_DEAD), we don't need PF_DEAD any longer. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:20 -07:00
Oleg Nesterov	29b8849216	[PATCH] set EXIT_DEAD state in do_exit(), not in schedule() schedule() checks PF_DEAD on every context switch and sets ->state = EXIT_DEAD to ensure that the exiting task will be deactivated. Note that this EXIT_DEAD is in fact a "random" value, we can use any bit except normal TASK_XXX values. It is better to set this state in do_exit() along with PF_DEAD flag and remove that check in schedule(). We are safe wrt concurrent try_to_wake_up() (for example ptrace, tkill), it can not change task's ->state: the 'state' argument of try_to_wake_up() can't have EXIT_DEAD bit. And in case when try_to_wake_up() sees a stale value of ->state == TASK_RUNNING it will do nothing. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:20 -07:00
Oleg Nesterov	aaa2a97eb9	[PATCH] sys_get_robust_list(): don't take tasklist_lock use rcu locks for find_task_by_pid(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:18 -07:00
Oleg Nesterov	d359b549bf	[PATCH] futex_find_get_task(): don't take tasklist_lock It is ok to do find_task_by_pid() + get_task_struct() under rcu_read_lock(), we cand drop tasklist_lock. Note that testing of ->exit_state is racy with or without tasklist anyway. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:18 -07:00
Oleg Nesterov	5b160f5ecd	[PATCH] copy_process: cosmetic ->ioprio tweak copy_process: // holds tasklist_lock + ->siglock /* * inherit ioprio */ p->ioprio = current->ioprio; Why? ->ioprio was already copied in dup_task_struct(). I guess this is needed to ensure that the child can't escape sys_ioprio_set(IOPRIO_WHO_{PGRP,USER}), yes? In that case we don't need ->siglock held, and the comment should be updated. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:18 -07:00
Oleg Nesterov	1c573afebc	[PATCH] reparent_to_init(): use has_rt_policy() Remove open-coded has_rt_policy(), no changes in kernel/exit.o Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:18 -07:00
Oleg Nesterov	8dc3e9099e	[PATCH] sched_setscheduler: fix? policy checks I am not sure this patch is correct: I can't understand what the current code does, and I don't know what it was supposed to do. The comment says: * can't change policy, except between SCHED_NORMAL * and SCHED_BATCH: The code: if (((policy != SCHED_NORMAL && p->policy != SCHED_BATCH) && (policy != SCHED_BATCH && p->policy != SCHED_NORMAL)) && But this is equivalent to: if ( (is_rt_policy(policy) && has_rt_policy(p)) && which means something different. We can't _decrease_ the current ->rt_priority with such a check (if rlim[RLIMIT_RTPRIO] == 0). Probably, it was supposed to be: if ( !(policy == SCHED_NORMAL && p->policy == SCHED_BATCH) && !(policy == SCHED_BATCH && p->policy == SCHED_NORMAL) this matches the comment, but strange: it doesn't allow to _drop_ the realtime priority when rlim[RLIMIT_RTPRIO] == 0. I think the right check would be: /* can't set/change rt policy */ if (is_rt_policy(policy) && policy != p->policy && !rlim_rtprio) return -EPERM; Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:17 -07:00
Oleg Nesterov	57a6f51c42	[PATCH] introduce is_rt_policy() helper Imho, makes the code a bit easier to read. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:17 -07:00
Oleg Nesterov	5fe1d75f34	[PATCH] do_sched_setscheduler(): don't take tasklist_lock Use rcu locks instead. sched_setscheduler() now takes ->siglock before reading ->signal->rlim[]. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:17 -07:00
Cal Peake	c9472e0f28	[PATCH] kill extraneous printk in kernel_restart() Get rid of an extraneous printk in kernel_restart(). Signed-off-by: Cal Peake <cp@absolutedigital.net> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:16 -07:00
Björn Steinbrink	111dbe0c8a	[PATCH] Fix ____call_usermodehelper errors being silently ignored If ____call_usermodehelper fails, we're not interested in the child process' exit value, but the real error, so let's stop wait_for_helper from overwriting it in that case. Issue discovered by Benedikt Böhm while working on a Linux-VServer usermode helper. Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:16 -07:00
Alan Cox	54306cf04c	[PATCH] exit: fix crash case If we are going to BUG() not panic() here then we should cover the case of the BUG being compiled out Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:16 -07:00
Roland McGrath	0b4a8a789a	[PATCH] kexec warning fix This fixes a couple of compiler warnings, and adds paranoia checks as well. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:15 -07:00
Atsushi Nemoto	3171a0305d	[PATCH] simplify update_times (avoid jiffies/jiffies_64 aliasing problem) Pass ticks to do_timer() and update_times(), and adjust x86_64 and s390 timer interrupt handler with this change. Currently update_times() calculates ticks by "jiffies - wall_jiffies", but callers of do_timer() should know how many ticks to update. Passing ticks get rid of this redundant calculation. Also there are another redundancy pointed out by Martin Schwidefsky. This cleanup make a barrier added by `5aee405c66` needless. So this patch removes it. As a bonus, this cleanup make wall_jiffies can be removed easily, since now wall_jiffies is always synced with jiffies. (This patch does not really remove wall_jiffies. It would be another cleanup patch) Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Cc: Andi Kleen <ak@muc.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Mikael Starvik <starvik@axis.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Hirokazu Takata <takata.hirokazu@renesas.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Acked-by: "Luck, Tony" <tony.luck@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:15 -07:00
Roland McGrath	27d91e07f9	[PATCH] __dequeue_signal() cleanup This tightens up __dequeue_signal a little. It also avoids doing recalc_sigpending twice in a row, instead doing it once in dequeue_signal. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:15 -07:00
Roland McGrath	b9ecb2bd5d	[PATCH] has_stopped_jobs() cleanup This check has been obsolete since the introduction of TASK_TRACED. Now TASK_STOPPED always means job control stop. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:15 -07:00
Toyo Abe	e4b765551a	[PATCH] posix-timers: Fix the flags handling in posix_cpu_nsleep() When a posix_cpu_nsleep() sleep is interrupted by a signal more than twice, it incorrectly reports the sleep time remaining to the user. Because posix_cpu_nsleep() doesn't report back to the user when it's called from restart function due to the wrong flags handling. This patch, which applies after previous one, moves the nanosleep() function from posix_cpu_nsleep() to do_cpu_nanosleep() and cleans up the flags handling appropriately. Signed-off-by: Toyo Abe <toyoa@mvista.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:15 -07:00
Toyo Abe	1711ef3866	[PATCH] posix-timers: Fix clock_nanosleep() doesn't return the remaining time in compatibility mode The clock_nanosleep() function does not return the time remaining when the sleep is interrupted by a signal. This patch creates a new call out, compat_clock_nanosleep_restart(), which handles returning the remaining time after a sleep is interrupted. This patch revives clock_nanosleep_restart(). It is now accessed via the new call out. The compat_clock_nanosleep_restart() is used for compatibility access. Since this is implemented in compatibility mode the normal path is virtually unaffected - no real performance impact. Signed-off-by: Toyo Abe <toyoa@mvista.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:15 -07:00
Akinobu Mita	07dccf3344	[PATCH] check return value of cpu_callback Spawing ksoftirqd, migration, or watchdog, and calling init_timers_cpu() may fail with small memory. If it happens in initcalls, kernel NULL pointer dereference happens later. This patch makes crash happen immediately in such cases. It seems a bit better than getting kernel NULL pointer dereference later. Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Akinobu Mita <mita@miraclelinux.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:14 -07:00
Paul E. McKenney	a45bce4954	[PATCH] memory ordering in __kfifo primitives Both __kfifo_put() and __kfifo_get() have header comments stating that if there is but one concurrent reader and one concurrent writer, locking is not necessary. This is almost the case, but a couple of memory barriers are needed. Another option would be to change the header comments to remove the bit about locking not being needed, and to change the those callers who currently don't use locking to add the required locking. The attachment analyzes this approach, but the patch below seems simpler. Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Cc: Stelian Pop <stelian@popies.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-09-29 09:18:13 -07:00

... 2 3 4 5 6 ...

1723 Commits