Commit Graph

73606 Commits

Author SHA1 Message Date
Avi Kivity
70433389cc KVM: SVM: Fix SMP with kernel apic
AP processor needs to reset to the SIPI vector, not normal INIT.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-11-08 12:05:36 +02:00
Avi Kivity
1e35d3c4a7 KVM: x86 emulator: fix 'push imm8' emulation
'push imm8' found itself in the wrong switch somehow, so it is never executed.

This fixes Windows 2003 installation.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-11-08 10:42:04 +02:00
Paul Mundt
541c547731 Merge branch 'page_colouring_despair' 2007-11-08 17:01:42 +09:00
Tejun Heo
fffe487d59 pktcdvd: fix BUG caused by sysfs module reference semantics change
pkt_setup_dev() expects module reference to be held on invocation.
This used to be true for sysfs callbacks but not anymore.  Test and
grab module reference around pkt_setup_dev() in
class_pktcdvd_store_add().

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-11-08 08:00:24 +01:00
Paul Mackerras
688016f4e2 Merge branch 'for-2.6.24' of master.kernel.org:/pub/scm/linux/kernel/git/jwboyer/powerpc-4xx into merge 2007-11-08 14:28:14 +11:00
Linas Vepstas
2c84b4076c [POWERPC] EEH: Make sure warning message is printed
Fix old buglet; a warning message should have been printed
when a hardware reset takes too long.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:34 +11:00
Johannes Berg
2e6f40deb7 [POWERPC] Make altivec code in swsusp_32.S depend on CONFIG_ALTIVEC
This makes the altivec code in swsusp_32.S depend on CONFIG_ALTIVEC to
avoid build failures for systems that don't have altivec. I'm not sure
whether the code will actually work for other systems, but it was merged
for just ppc32 rather than powermac a very long time ago.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:34 +11:00
Johannes Berg
67b60518b0 [POWERPC] windfarm: Fix windfarm thread freezer interaction
When I fixed the windfarm freezer interaction first in commit
1ed2ddf380, an earlier patch than the one
I came up with after comments was committed. This has come back to haunt
us now because commit d5d8c5976d changed
the freezer to no long send signals. Fix it by removing the windfarm
thread's signal logic and restoring the original try_to_freeze().

We could simply revert 1ed2ddf380 now
but I feel that the assertion that no signal is delivered to the
windfarm thread needs not be there.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:34 +11:00
Benjamin Herrenschmidt
a792e75d9b [POWERPC] Fix si_addr value on low level hash failures
If the low level MMU hash table insertion returns an error (which
can happen in some rare circumstances when the hypervisor refuses
the insertion of a PTE, typically if you try to access junk via
/dev/mem), the generated signal had an incorrect si_addr value due
to a bug in the assembly, which was loading it as a 32 bits quantity
instead of a 64 bits quantity.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:34 +11:00
Olof Johansson
7e22fa4a1d [POWERPC] Refresh ppc64_defconfig and enable pasemi-related options
Refresh ppc64_defconfig, add PPC_PASEMI and various options that the
common boards there need:

* Chip drivers (iommu, ethernet, IDE, CF, EDAC, MDIO/PHY)
* PCMCIA
* PATA_PCMCIA
* RTC_CLASS
* SATA_MV
* SATA_SIL24
* IP_PNP + NFS_ROOT for diskless booting

+ possibly some other things I might have missed to list

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:34 +11:00
Olof Johansson
0c83ddfeb4 [POWERPC] pasemi: Update defconfig
Update pasemi_defconfig.  Add a few missing options for default devices
on electra boards, enable tickless and hrtimers, etc, etc.

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:33 +11:00
Stephen Rothwell
7992344fde [POWERPC] iSeries: Fix ref counting in vio setup
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:33 +11:00
Li Zefan
aca71ef882 [POWERPC] ] Fix memset size error
The size passing to memset is wrong.

Signed-off-by Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:33 +11:00
Stephen Rothwell
e95c91821f [POWERPC] Fix link errors for allyesconfig
An allyesconfig build creates a .text section that is so big that the
.text.init.refok and .fixup sections are too far away for the relocations
to be fixed up correctly. This patch fixes that by linking all the
relevent text sections for each file together.

Suggested by Paul Mackerras.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:33 +11:00
Stephen Rothwell
18244cfbc3 [POWERPC] iSeries_init_IRQ non-PCI tidy
ppc_md.init_IRQ is not called if it is NULL, so we don't need an empty
routine in the non PCI case.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:33 +11:00
Patrick Mansfield
f2205fbb5a [POWERPC] Change fallocate to match unistd.h on powerpc
Fix the fallocate system call on powerpc to match its unistd.h.

This implies none of these system calls are currently working with the
unistd.h sys call values:
	fallocate
	signalfd
	timerfd
	eventfd
	sync_file_range2

Signed-off-by: Patrick Mansfield <patmans@us.ibm.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:32 +11:00
Linas Vepstas
b37ceefe7c [POWERPC] EEH: Avoid crash on null device
Bugfix: avoid crash if there's no PCI device for a given
openfirmware node.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:32 +11:00
Linas Vepstas
2a50f144fc [POWERPC] EEH: Drivers that need reset trump others
Bugfix: if a driver controlling one part of a multi-function PCI card
has asked for a reset, honor that request above all others.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:32 +11:00
Linas Vepstas
638799b335 [POWERPC] EEH: Clean up comments
Clean up commentary, remove dead code.

Signed-off-by Linas Vepstas <linas@austin.ibm.com>

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:32 +11:00
Paul Mackerras
43875cc0a5 [POWERPC] Fix off-by-one error in setting decrementer on Book E/4xx (v2)
The decrementer in Book E and 4xx processors interrupts on the
transition from 1 to 0, rather than on the 0 to -1 transition as on
64-bit server and 32-bit "classic" (6xx/7xx/7xxx) processors.  At the
moment we subtract 1 from the count of how many decrementer ticks are
required before the next interrupt before putting it into the
decrementer, which is correct for server/classic processors, but could
possibly cause the interrupt to happen too early on Book E and 4xx if
the timebase/decrementer frequency is low.

This fixes the problem by making set_dec subtract 1 from the count for
server and classic processors, instead of having the callers subtract
1.  Since set_dec already had a bunch of ifdefs to handle different
processor types, there is no net increase in ugliness. :)

Note that calling set_dec(0) may not generate an interrupt on some
processors.  To make sure that decrementer_set_next_event always calls
set_dec with an interval of at least 1 tick, we set min_delta_ns of
the decrementer_clockevent to correspond to 2 ticks (2 rather than 1
to compensate for truncations in the conversions between ticks and
ns).

This also removes a redundant call to set the decrementer to
0x7fffffff - it was already set to that earlier in timer_interrupt.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:31 +11:00
will schmidt
465ccab9eb [POWERPC] Fix switch_slb handling of 1T ESID values
Now that we have 1TB segment size support, we need to be using the
GET_ESID_1T macro when comparing ESID values for pc, stack, and
unmapped_base within switch_slb().   A new helper function called
esids_match() contains the logic for deciding when to call GET_ESID
and GET_ESID_1T.

This fixes a duplicate-slb-entry inspired machine-check exception I
was seeing when trying to run java on a power6 partition.

Tested on power6 and power5.

Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:31 +11:00
Tony Breeds
e7bda183d4 [POWERPC] Fix build failure when CONFIG_VIRT_CPU_ACCOUNTING is not defined
Without this patch I get the following build failure
  CC      arch/powerpc/platforms/celleb/setup.o
arch/powerpc/platforms/celleb/setup.c:151: error: 'generic_calibrate_decr' undeclared here (not in a function)

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:31 +11:00
will schmidt
aa39be09df [POWERPC] Include udbg.h when using udbg_printf
This fixes the error
	error: implicit declaration of function "udbg_printf"

We have a few spots where we reference udbg_printf() without #including
udbg.h.  These are within #ifdef DEBUG blocks, so unnoticed until we do
a #define DEBUG or #define DEBUG_LOW nearby.

Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:31 +11:00
Benjamin Herrenschmidt
20474abda6 [POWERPC] Fix cache line vs. block size confusion
We had an historical confusion in the kernel between cache line
and cache block size. The former is an implementation detail of
the L1 cache which can be useful for performance optimisations,
the later is the actual size on which the cache control
instructions operate, which can be different.

For some reason, we had a weird hack reading the right property
on powermac and the wrong one on any other 64 bits (32 bits is
unaffected as it only uses the cputable for cache block size
infos at this stage).

This fixes the booting-without-of.txt documentation to mention
the right properties, and fixes the 64 bits initialization code
to look for the block size first, with a fallback to the line
size if the property is missing.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:30 +11:00
Alexey Dobriyan
fb293ae1c0 [POWERPC] Fix sysctl table check failure on PowerMac
kernel was marked with 0755. Everywhere else it's 0555.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:30 +11:00
Olof Johansson
4bfac36891 [POWERPC] Fix CONFIG_SMP=n build break
Fix two build errors on powerpc allyesconfig + CONFIG_SMP=n:

arch/powerpc/platforms/built-in.o: In function `cpu_affinity_set':
arch/powerpc/platforms/cell/spu_priv1_mmio.c:78: undefined reference to `.iic_get_target_id'
arch/powerpc/platforms/built-in.o: In function `iic_init_IRQ':
arch/powerpc/platforms/cell/interrupt.c:397: undefined reference to `.iic_setup_cpu'

Signed-off-by: Olof Johansson <olof@lixom.net>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:30 +11:00
Scott Wood
aeb4552fad [POWERPC] bootwrapper: Revert ps3 binary flag usage, and remove .bin suffix
The ps3 target produces two images, and the binary one is not the
"primary" image that corresponds to the -o flag; thus, it no longer
uses the generic binary flag.

On platforms which do use the binary flag, it no longer produces a
.bin suffix, so that the output file matches what was passed to the -o flag.

This should fix the zImage ln problems for the ps3 target.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:30 +11:00
Dale Farnsworth
d102c9d5d3 [POWERPC] Fix mv643xx_pci sysfs .read and .write functions
Commit 91a69029 introduced an additional parameter to the .read and .write
methods for sysfs binary attributes.  Two mv64x60_pci functions
were missed in that patch, resulting in these errors:
	/cache/git/linux-2.6/arch/powerpc/sysdev/mv64x60_pci.c:77: warning: initialization from incompatible pointer type
	/cache/git/linux-2.6/arch/powerpc/sysdev/mv64x60_pci.c:78: warning: initialization from incompatible pointer type

Add the missing "struct bin_attribute *" parameter.

Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:29 +11:00
Aurelien Jarno
3a800ff50a [POWERPC] i8259: Add disable method
Since commit 76d2160147, the NE2000 card
is not working anymore on PPC and POWERPC and produces WATCHDOG
timeouts.

The patch below fixes that the same way it has been done on x86, x86_64
and MIPS.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:29 +11:00
Michael Ellerman
1db3e890ae [POWERPC] Read back MSI message in rtas_setup_msi_irqs() so restore works
There are plans afoot to use pci_restore_msi_state() to restore MSI
state after a device reset.  In order for this to work for the RTAS MSI
backend, we need to read back the MSI message from config space after
it has been setup by firmware.

This should be sufficient for restoring the MSI state after a device
reset, however we will need to revisit this for suspend to disk if that
is ever implemented on pseries.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:29 +11:00
Olof Johansson
bdd71eec9b [POWERPC] Fix build break in arch/ppc/syslib/m8260_setup.c
Fix build break and warnings in current mainline git:

arch/ppc/syslib/m8260_setup.c: In function 'm8260_setup_arch':
arch/ppc/syslib/m8260_setup.c:63: error: implicit declaration of function 'identify_ppc_sys_by_name_and_id'
arch/ppc/syslib/m8260_setup.c:64: warning: passing argument 1 of 'in_be32' makes pointer from integer without a cast
arch/ppc/syslib/m8260_setup.c: In function 'm8260_show_cpuinfo':
arch/ppc/syslib/m8260_setup.c:158: warning: format '%08x' expects type 'unsigned int', but argument 5 has type 'long unsigned int'
arch/ppc/syslib/m8260_setup.c:158: warning: format '%d' expects type 'int', but argument 6 has type 'long unsigned int'
arch/ppc/syslib/m8260_setup.c:158: warning: format '%u' expects type 'unsigned int', but argument 7 has type 'long unsigned int'
arch/ppc/syslib/m8260_setup.c:158: warning: format '%u' expects type 'unsigned int', but argument 8 has type 'long unsigned int'
arch/ppc/syslib/m8260_setup.c:158: warning: format '%u' expects type 'unsigned int', but argument 9 has type 'long unsigned int'
make[1]: *** [arch/ppc/syslib/m8260_setup.o] Error 1
make[1]: *** Waiting for unfinished jobs....

Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-11-08 14:15:29 +11:00
Paul Mundt
6d1c76d4e7 sh: Kill off broken snapgear ds1302 code.
This will force the snapgear boards to use the on-chip SH RTC instead,
until the rtc-ds1302 driver is merged. The current code is broken
and hasn't built in some time, so just kill it off and get the board
working again.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-11-08 11:24:33 +09:00
Stephen Smalley
45e5421eb5 SELinux: add more validity checks on policy load
Add more validity checks at policy load time to reject malformed
policies and prevent subsequent out-of-range indexing when in permissive
mode.  Resolves the NULL pointer dereference reported in
https://bugzilla.redhat.com/show_bug.cgi?id=357541.

Signed-off-by:  Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
2007-11-08 08:56:23 +11:00
KaiGai Kohei
6d2b685564 SELinux: fix bug in new ebitmap code.
The "e_iter = e_iter->next;" statement in the inner for loop is primally
bug.  It should be moved to outside of the for loop.

Signed-off-by: KaiGai Kohei <kaigai@kaigai.gr.jp>
Acked-by:  Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
2007-11-08 08:55:10 +11:00
Stephen Rothwell
57002bfb31 SELinux: suppress a warning for 64k pages.
On PowerPC allmodconfig build we get this:

security/selinux/xfrm.c:214: warning: comparison is always false due to limited range of data type

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: James Morris <jmorris@namei.org>
2007-11-08 08:55:04 +11:00
Vlad Yasevich
027f6e1ad3 SCTP: Fix a potential race between timers and receive path.
There is a possible race condition where the timer code will
free the association and the next packet in the queue will also
attempt to free the same association.

The example is, when we receive an ABORT at about the same time
as the retransmission timer fires.  If the timer wins the race,
it will free the association.  Once it releases the lock, the
queue processing will recieve the ABORT and will try to free
the association again.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-07 11:39:27 -05:00
Vlad Yasevich
73d9c4fd1a SCTP: Allow ADD_IP to work with AUTH for backward compatibility.
This patch adds a tunable that will allow ADD_IP to work without
AUTH for backward compatibility.  The default value is off since
the default value for ADD_IP is off as well.  People who need
to use ADD-IP with older implementations take risks of connection
hijacking and should consider upgrading or turning this tunable on.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-07 11:39:27 -05:00
Vlad Yasevich
88799fe5ec SCTP: Correctly disable ADD-IP when AUTH is not supported.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-07 11:39:27 -05:00
Vlad Yasevich
0ed90fb0f6 SCTP: Update RCU handling during the ADD-IP case
After learning more about rcu, it looks like the ADD-IP hadling
doesn't need to call call_rcu_bh.  All the rcu critical sections
use rcu_read_lock, so using call_rcu_bh is wrong here.
Now, restore the local_bh_disable() code blocks and use normal
call_rcu() calls.  Also restore the missing return statement.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-07 11:39:27 -05:00
Vlad Yasevich
b6157d8e03 SCTP: Fix difference cases of retransmit.
Commit d0ce92910b broke several retransmit
cases including fast retransmit.  The reason is that we should
only delay by rto while doing retranmists as a result of a timeout.
Retransmit as a result of path mtu discover, fast retransmit, or
other evernts that should trigger immidiate retransmissions got broken.

Also, since rto is doubled prior to marking of packets elegable for
retransmission, we never marked correct chunks anyway.

The fix is provide a reason for a given retransmission so that we
can mark chunks appropriately and to save the old rto value to do
comparisons against.

All regressions tests passed with this code.

Spotted by Wei Yongjun <yjwei@cn.fujitsu.com>

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-07 11:39:27 -05:00
Wei Yongjun
f3830ccc2e SCTP : Fix to process bundled ASCONF chunk correctly
If ASCONF chunk is bundled with other chunks as the first chunk, when
process the ASCONF parameters, full packet data will be process as the
parameters of the ASCONF chunk, not only the real parameters. So if you
send a ASCONF chunk bundled with other chunks, you will get an unexpect
result.
This problem also exists when ASCONF-ACK chunk is bundled with other chunks.

This patch fix this problem.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-07 11:39:27 -05:00
Wei Yongjun
64b0812b6d SCTP : Fix bad formatted comment in outqueue.c
Just fix the bad format of the comment in outqueue.c.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2007-11-07 11:39:26 -05:00
Russell King
70dfa3f875 [ARM] Allow watchdog drivers to be selected again
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2007-11-07 14:13:35 +00:00
Jens Axboe
8ec680e4c3 ioprio: allow sys_ioprio_set() value of 0 to reset ioprio setting
Normally io priorities follow the CPU nice, unless a specific scheduling
class has been set. Once that is set, there's no way to reset the
behaviour to 'none' so that it follows CPU nice again.

Currently passing in 0 as the ioprio class/value will return -1/EINVAL,
change that to allow resetting of a set scheduling class.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-11-07 13:54:07 +01:00
Oleg Nesterov
0e7be9edb9 cfq_idle_class_timer: add paranoid checks for jiffies overflow
In theory, if the queue was idle long enough, cfq_idle_class_timer may have
a false (and very long) timeout because jiffies can wrap into the past wrt
->last_end_request.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-11-07 13:51:35 +01:00
Patrick McHardy
c3d8d1e30c [NETLINK]: Fix unicast timeouts
Commit ed6dcf4a in the history.git tree broke netlink_unicast timeouts
by moving the schedule_timeout() call to a new function that doesn't
propagate the remaining timeout back to the caller. This means on each
retry we start with the full timeout again.

ipc/mqueue.c seems to actually want to wait indefinitely so this
behaviour is retained.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:15:12 -08:00
Eric Dumazet
230140cffa [INET]: Remove per bucket rwlock in tcp/dccp ehash table.
As done two years ago on IP route cache table (commit
22c047ccbc) , we can avoid using one
lock per hash bucket for the huge TCP/DCCP hash tables.

On a typical x86_64 platform, this saves about 2MB or 4MB of ram, for
litle performance differences. (we hit a different cache line for the
rwlock, but then the bucket cache line have a better sharing factor
among cpus, since we dirty it less often). For netstat or ss commands
that want a full scan of hash table, we perform fewer memory accesses.

Using a 'small' table of hashed rwlocks should be more than enough to
provide correct SMP concurrency between different buckets, without
using too much memory. Sizing of this table depends on
num_possible_cpus() and various CONFIG settings.

This patch provides some locking abstraction that may ease a future
work using a different model for TCP/DCCP table.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:15:11 -08:00
Rumen G. Bogdanovski
efac52762b [IPVS]: Synchronize closing of Connections
This patch makes the master daemon to sync the connection when it is about
to close.  This makes the connections on the backup to close or timeout
according their state.  Before the sync was performed only if the
connection is in ESTABLISHED state which always made the connections to
timeout in the hard coded 3 minutes. However the Andy Gospodarek's patch
([IPVS]: use proper timeout instead of fixed value) effectively did nothing
more than increasing this to 15 minutes (Established state timeout).  So
this patch makes use of proper timeout since it syncs the connections on
status changes to FIN_WAIT (2min timeout) and CLOSE (10sec timeout).
However if the backup misses CLOSE hopefully it did not miss FIN_WAIT.
Otherwise we will just have to wait for the ESTABLISHED state timeout. As
it is without this patch.  This way the number of the hanging connections
on the backup is kept to minimum. And very few of them will be left to
timeout with a long timeout.

This is important if we want to make use of the fix for the real server
overcommit on master/backup fail-over.

Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:15:10 -08:00
Rumen G. Bogdanovski
1e356f9cdf [IPVS]: Bind connections on stanby if the destination exists
This patch fixes the problem with node overload on director fail-over.
Given the scenario: 2 nodes each accepting 3 connections at a time and 2
directors, director failover occurs when the nodes are fully loaded (6
connections to the cluster) in this case the new director will assign
another 6 connections to the cluster, If the same real servers exist
there.

The problem turned to be in not binding the inherited connections to
the real servers (destinations) on the backup director. Therefore:
"ipvsadm -l" reports 0 connections:
root@test2:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  test2.local:5999 wlc
  -> node473.local:5999           Route   1000   0          0
  -> node484.local:5999           Route   1000   0          0

while "ipvs -lnc" is right
root@test2:~# ipvsadm -lnc
IPVS connection entries
pro expire state       source             virtual            destination
TCP 14:56  ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999
192.168.0.51:5999
TCP 14:59  ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999
192.168.0.52:5999

So the patch I am sending fixes the problem by binding the received
connections to the appropriate service on the backup director, if it
exists, else the connection will be handled the old way. So if the
master and the backup directors are synchronized in terms of real
services there will be no problem with server over-committing since
new connections will not be created on the nonexistent real services
on the backup. However if the service is created later on the backup,
the binding will be performed when the next connection update is
received. With this patch the inherited connections will show as
inactive on the backup:

root@test2:~# ipvsadm -l
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  test2.local:5999 wlc
  -> node473.local:5999           Route   1000   0          1
  -> node484.local:5999           Route   1000   0          1

rumen@test2:~$ cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP  C0A800DE:176F wlc
  -> C0A80033:176F      Route   1000   0          1
  -> C0A80032:176F      Route   1000   0          1

Regards,
Rumen Bogdanovski

Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2007-11-07 04:15:09 -08:00
Adrian Bunk
c183783e28 [NET]: Remove Documentation/networking/pt.txt
There's no no point in keeping documentation for a driver that was
removed many years ago.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-11-07 04:15:08 -08:00