Commit Graph

180444 Commits

Author SHA1 Message Date
David Rientjes
ca2107c9d6 x86, numa: Remove configurable node size support for numa emulation
Now that numa=fake=<size>[MG] is implemented, it is possible to remove
configurable node size support.  The command-line parsing was already
broken (numa=fake=*128, for example, would not work) and since fake nodes
are now interleaved over physical nodes, this support is no longer
required.

Signed-off-by: David Rientjes <rientjes@google.com>
LKML-Reference: <alpine.DEB.2.00.1002151343080.26927@chino.kir.corp.google.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-15 14:34:18 -08:00
David Rientjes
8df5bb34de x86, numa: Add fixed node size option for numa emulation
numa=fake=N specifies the number of fake nodes, N, to partition the
system into and then allocates them by interleaving over physical nodes.
This requires knowledge of the system capacity when attempting to
allocate nodes of a certain size: either very large nodes to benchmark
scalability of code that operates on individual nodes, or very small
nodes to find bugs in the VM.

This patch introduces numa=fake=<size>[MG] so it is possible to specify
the size of each node to allocate.  When used, nodes of the size
specified will be allocated and interleaved over the set of physical
nodes.

FAKE_NODE_MIN_SIZE was also moved to the more-appropriate
include/asm/numa_64.h.

Signed-off-by: David Rientjes <rientjes@google.com>
LKML-Reference: <alpine.DEB.2.00.1002151342510.26927@chino.kir.corp.google.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-15 14:34:10 -08:00
David Rientjes
68fd111e02 x86, numa: Fix numa emulation calculation of big nodes
numa=fake=N uses split_nodes_interleave() to partition the system into N
fake nodes.  Each node size must have be a multiple of
FAKE_NODE_MIN_SIZE, otherwise it is possible to get strange alignments.
Because of this, the remaining memory from each node when rounded to
FAKE_NODE_MIN_SIZE is consolidated into a number of "big nodes" that are
bigger than the rest.

The calculation of the number of big nodes is incorrect since it is using
a logical AND operator when it should be multiplying the rounded-off
portion of each node with N.

Signed-off-by: David Rientjes <rientjes@google.com>
LKML-Reference: <alpine.DEB.2.00.1002151342230.26927@chino.kir.corp.google.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-15 14:34:04 -08:00
Haicheng Li
0271f91003 x86, acpi: Map hotadded cpu to correct node.
When hotadd new cpu to system, if its affinitive node is online,
should map the cpu to its own node.  Otherwise, let kernel select one
online node for the new cpu later.

Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
LKML-Reference: <4B6AAA39.6000300@linux.intel.com>
Tested-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10 11:00:43 -08:00
Linus Torvalds
e28cab42f3 Merge branch 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
* 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  i2c-tiny-usb: Fix on big-endian systems
2010-02-10 07:34:46 -08:00
Linus Torvalds
909ccdb4cf Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] Fix struct _lowcore layout.
  [S390] qdio: prevent call trace if CHPID is offline
  [S390] qdio: continue polling for buffer state ERROR
2010-02-10 07:19:07 -08:00
Linus Torvalds
2cbd188388 Merge branch 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: PIT: control word is write-only
  kvmclock: count total_sleep_time when updating guest clock
  Export the symbol of getboottime and mmonotonic_to_bootbased
2010-02-10 07:18:15 -08:00
Linus Torvalds
5993fe31c0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6:
  avr32: clean up memory allocation in at32_add_device_mci
  arch/avr32: Fix build failure for avr32 caused by typo
2010-02-10 07:17:54 -08:00
Linus Torvalds
53910146df Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Fix address masking bug in hpte_need_flush()
2010-02-10 07:16:58 -08:00
Linus Torvalds
5551638acb Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  cifs: fix dentry hash calculation for case-insensitive mounts
  [CIFS] Don't cache timestamps on utimes due to coarse granularity
  [CIFS] Maximum username length check in session setup does not match
  cifs: fix length calculation for converted unicode readdir names
  [CIFS] Add support for TCP_NODELAY
2010-02-10 07:16:44 -08:00
Linus Torvalds
0ea457839d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (29 commits)
  drivers/net: Correct NULL test
  MAINTAINERS: networking drivers - Add git net-next tree
  net/sched: Fix module name in Kconfig
  cxgb3: fix GRO checksum check
  dst: call cond_resched() in dst_gc_task()
  netfilter: nf_conntrack: fix hash resizing with namespaces
  netfilter: xtables: compat out of scope fix
  netfilter: nf_conntrack: restrict runtime expect hashsize modifications
  netfilter: nf_conntrack: per netns nf_conntrack_cachep
  netfilter: nf_conntrack: fix memory corruption with multiple namespaces
  Bluetooth: Keep a copy of each HID device's report descriptor
  pktgen: Fix freezing problem
  igb: make certain to reassign legacy interrupt vectors after reset
  irda: add missing BKL in irnet_ppp ioctl
  irda: unbalanced lock_kernel in irnet_ppp
  ixgbe: Fix return of invalid txq
  ixgbe: Fix ixgbe_tx_map error path
  netxen: protect resource cleanup by rtnl lock
  netxen: fix tx timeout recovery for NX2031 chip
  Bluetooth: Enter active mode before establishing a SCO link.
  ...
2010-02-10 07:15:21 -08:00
David Gibson
77058e1adc powerpc: Fix address masking bug in hpte_need_flush()
Commit f71dc176aa 'Make
hpte_need_flush() correctly mask for multiple page sizes' introduced
bug, which is triggered when a kernel with a 64k base page size is run
on a system whose hardware does not 64k hash PTEs.  In this case, we
emulate 64k pages with multiple 4k hash PTEs, however in
hpte_need_flush() we incorrectly only mask the hardware page size from
the address, instead of the logical page size.  This causes things to
go wrong when we later attempt to iterate through the hardware
subpages of the logical page.

This patch corrects the error.  It has been tested on pSeries bare
metal by Michael Neuling.

Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-02-10 13:58:06 +11:00
Linus Torvalds
ac73fddfc5 Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
  md: fix some lockdep issues between md and sysfs.
  md: fix 'degraded' calculation when starting a reshape.
2010-02-09 17:01:26 -08:00
NeilBrown
ef286f6fa6 md: fix some lockdep issues between md and sysfs.
======
This fix is related to
    http://bugzilla.kernel.org/show_bug.cgi?id=15142
but does not address that exact issue.
======

sysfs does like attributes being removed while they are being accessed
(i.e. read or written) and waits for the access to complete.

As accessing some md attributes takes the same lock that is held while
removing those attributes a deadlock can occur.

This patch addresses 3 issues in md that could lead to this deadlock.

Two relate to calling flush_scheduled_work while the lock is held.
This is probably a bad idea in general and as we use schedule_work to
delete various sysfs objects it is particularly bad.

In one case flush_scheduled_work is called from md_alloc (called by
md_probe) called from do_md_run which holds the lock.  This call is
only present to ensure that ->gendisk is set.  However we can be sure
that gendisk is always set (though possibly we couldn't when that code
was originally written.  This is because do_md_run is called in three
different contexts:
  1/ from md_ioctl.  This requires that md_open has succeeded, and it
     fails if ->gendisk is not set.
  2/ from writing a sysfs attribute.  This can only happen if the
     mddev has been registered in sysfs which happens in md_alloc
     after ->gendisk has been set.
  3/ from autorun_array which is only called by autorun_devices, which
     checks for ->gendisk to be set before calling autorun_array.
So the call to md_probe in do_md_run can be removed, and the check on
->gendisk can also go.


In the other case flush_scheduled_work is being called in do_md_stop,
purportedly to wait for all md_delayed_delete calls (which delete the
component rdevs) to complete.  However there really isn't any need to
wait for them - they have already been disconnected in all important
ways.

The third issue is that raid5->stop() removes some attribute names
while the lock is held.  There is already some infrastructure in place
to delay attribute removal until after the lock is released (using
schedule_work).  So extend that infrastructure to remove the
raid5_attrs_group.

This does not address all lockdep issues related to the sysfs
"s_active" lock.  The rest can be address by splitting that lockdep
context between symlinks and non-symlinks which hopefully will happen.

Signed-off-by: NeilBrown <neilb@suse.de>
2010-02-10 11:26:09 +11:00
Linus Torvalds
3af9cf11b6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: fix p9_client_destroy unconditional calling v9fs_put_trans
  9p: fix memory leak in v9fs_parse_options()
  9p: Fix the kernel crash on a failed mount
  9p: fix option parsing
  9p: Include fsync support for 9p client
  net/9p: fix statsize inside twstat
  net/9p: fail when user specifies a transport which we can't find
  net/9p: fix virtio transport to correctly update status on connect
2010-02-09 11:19:06 -08:00
Marcelo Tosatti
ee73f656a6 KVM: PIT: control word is write-only
PIT control word (address 0x43) is write-only, reads are undefined.

Cc: stable@kernel.org
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-02-09 19:20:15 +02:00
Jason Wang
923de3cf5b kvmclock: count total_sleep_time when updating guest clock
Current kvm wallclock does not consider the total_sleep_time which could cause
wrong wallclock in guest after host suspend/resume. This patch solve
this issue by counting total_sleep_time to get the correct host boot time.

Cc: stable@kernel.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-02-09 19:20:15 +02:00
Jason Wang
c93d89f3db Export the symbol of getboottime and mmonotonic_to_bootbased
Export getboottime and monotonic_to_bootbased in order to let them
could be used by following patch.

Cc: stable@kernel.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-02-09 19:20:15 +02:00
Heiko Carstens
7717aefff3 [S390] Fix struct _lowcore layout.
Offsets and sizes are wrong for 32 bit.
Got broken with 866ba284 "[S390] cleanup lowcore.h".

Reported-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-02-09 09:46:23 +01:00
Jan Glauber
959153d345 [S390] qdio: prevent call trace if CHPID is offline
If a CHPID is offline during a device shutdown the ccw_device_halt|clear
may fail and the qdio device stays in state STOPPED until the shutdown is
finished. If an interrupt occurs before the device is set to INACTIVE
the STOPPED state triggers a WARN_ON in the interrupt handler.
Prevent this WARN_ON by catching the STOPPED state in the interrupt
handler.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-02-09 09:46:23 +01:00
Ursula Braun
4c52228d1b [S390] qdio: continue polling for buffer state ERROR
Inbound traffic handling may hang if next buffer to check is in
state ERROR, polling is stopped and the final check for further
available inbound buffers disregards buffers in state ERROR.
This patch includes state ERROR when checking availability of
more inbound buffers.

Cc: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-02-09 09:46:23 +01:00
David S. Miller
44bfce5c3e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6 2010-02-08 22:45:56 -08:00
Julia Lawall
bcf4d812e6 drivers/net: Correct NULL test
Test the value that was just allocated rather than the previously tested one.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r@
expression *x;
expression e;
identifier l;
@@

if (x == NULL || ...) {
    ... when forall
    return ...; }
... when != goto l;
    when != x = e
    when != &x
*x == NULL
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-08 22:44:18 -08:00
Joe Perches
3af26f58d1 MAINTAINERS: networking drivers - Add git net-next tree
During the rc period, patches that are not bugfixes
should be done using the net-next tree.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-08 22:42:40 -08:00
Jan Luebbe
d4ae20b379 net/sched: Fix module name in Kconfig
The action modules have been prefixed with 'act_', but the Kconfig
description was not changed.

Signed-off-by: Jan Luebbe <jluebbe@debian.org>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-08 22:41:44 -08:00
Divy Le Ray
2d171886b1 cxgb3: fix GRO checksum check
Verify the HW checksum state for frames handed to GRO processing.

Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-08 22:37:24 -08:00
NeilBrown
9eb07c2592 md: fix 'degraded' calculation when starting a reshape.
This code was written long ago when it was not possible to
reshape a degraded array.  Now it is so the current level of
degraded-ness needs to be taken in to account.  Also newly addded
devices should only reduce degradedness if they are deemed to be
in-sync.

In particular, if you convert a RAID5 to a RAID6, and increase the
number of devices at the same time, then the 5->6 conversion will
make the array degraded so the current code will produce a wrong
value for 'degraded' - "-1" to be precise.

If the reshape runs to completion end_reshape will calculate a correct
new value for 'degraded', but if a device fails during the reshape an
incorrect decision might be made based on the incorrect value of
"degraded".

This patch is suitable for 2.6.32-stable and if they are still open,
2.6.31-stable and 2.6.30-stable as well.

Cc: stable@kernel.org
Reported-by: Michael Evans <mjevans1983@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2010-02-09 16:34:29 +11:00
Linus Torvalds
deb0c98c7f Merge branch 'for-2.6.33' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.33' of git://linux-nfs.org/~bfields/linux:
  Revert "nfsd4: fix error return when pseudoroot missing"
2010-02-08 17:08:01 -08:00
Eric Van Hensbergen
8781ff9495 9p: fix p9_client_destroy unconditional calling v9fs_put_trans
restructure client create code to handle error cases better and
only cleanup initialized portions of the stack.

Signed-off-by: Venkateswararao Jujjuri <jvrao@us.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08 18:18:34 -06:00
Linus Torvalds
a5f28ae4df Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  ocfs2/cluster: Make o2net connect messages KERN_NOTICE
  ocfs2/dlm: Fix printing of lockname
  ocfs2: Fix contiguousness check in ocfs2_try_to_merge_extent_map()
  ocfs2/dlm: Remove BUG_ON in dlm recovery when freeing locks of a dead node
  ocfs2: Plugs race between the dc thread and an unlock ast message
  ocfs2: Remove overzealous BUG_ON during blocked lock processing
  ocfs2: Do not downconvert if the lock level is already compatible
  ocfs2: Prevent a livelock in dlmglue
  ocfs2: Fix setting of OCFS2_LOCK_BLOCKED during bast
  ocfs2: Use compat_ptr in reflink_arguments.
  ocfs2/dlm: Handle EAGAIN for compatibility - v2
  ocfs2: Add parenthesis to wrap the check for O_DIRECT.
  ocfs2: Only bug out when page size is larger than cluster size.
  ocfs2: Fix memory overflow in cow_by_page.
  ocfs2/dlm: Print more messages during lock migration
  ocfs2/dlm: Ignore LVBs of locks in the Blocked list
  ocfs2/trivial: Remove trailing whitespaces
  ocfs2: fix a misleading variable name
  ocfs2: Sync max_inline_data_with_xattr from tools.
  ocfs2: Fix refcnt leak on ocfs2_fast_follow_link() error path
2010-02-08 16:05:50 -08:00
Eric Van Hensbergen
bf2d29c64d 9p: fix memory leak in v9fs_parse_options()
If match_strdup() fail this function exits without freeing the options string.

Signed-off-by: Venkateswararao Jujjuri <jvrao@us.ibm.com>
Sigend-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08 17:59:34 -06:00
Aneesh Kumar K.V
fb786100f7 9p: Fix the kernel crash on a failed mount
The patch fix the crash repoted below

[   15.149907] BUG: unable to handle kernel NULL pointer dereference at 00000001
[   15.150806] IP: [<c140b886>] p9_virtio_close+0x18/0x24
.....
....
[   15.150806] Call Trace:
[   15.150806]  [<c1408e78>] ? p9_client_destroy+0x3f/0x163
[   15.150806]  [<c1409342>] ? p9_client_create+0x25f/0x270
[   15.150806]  [<c1063b72>] ? trace_hardirqs_on+0xb/0xd
[   15.150806]  [<c11ed4e8>] ? match_token+0x64/0x164
[   15.150806]  [<c1175e8d>] ? v9fs_session_init+0x2f1/0x3c8
[   15.150806]  [<c109cfc9>] ? kmem_cache_alloc+0x98/0xb8
[   15.150806]  [<c1063b72>] ? trace_hardirqs_on+0xb/0xd
[   15.150806]  [<c1173dd1>] ? v9fs_get_sb+0x47/0x1e8
[   15.150806]  [<c1173dea>] ? v9fs_get_sb+0x60/0x1e8
[   15.150806]  [<c10a2e77>] ? vfs_kern_mount+0x81/0x11a
[   15.150806]  [<c10a2f55>] ? do_kern_mount+0x33/0xbe
[   15.150806]  [<c10b40b9>] ? do_mount+0x654/0x6b3
[   15.150806]  [<c1038949>] ? do_page_fault+0x0/0x284
[   15.150806]  [<c10b28ec>] ? copy_mount_options+0x73/0xd2
[   15.150806]  [<c10b4179>] ? sys_mount+0x61/0x94
[   15.150806]  [<c14284e9>] ? syscall_call+0x7/0xb
....
[   15.203562] ---[ end trace 1dd159357709eb4b ]---
[

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08 17:25:33 -06:00
Eric Dumazet
2fc1b5dd99 dst: call cond_resched() in dst_gc_task()
Kernel bugzilla #15239

On some workloads, it is quite possible to get a huge dst list to
process in dst_gc_task(), and trigger soft lockup detection.

Fix is to call cond_resched(), as we run in process context.

Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Tested-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-08 15:00:39 -08:00
Eric Van Hensbergen
d8c8a9e365 9p: fix option parsing
Options pointer is being moved before calling kfree() which seems
to cause problems.  This uses a separate pointer to track and free
original allocation.

Signed-off-by: Venkateswararao Jujjuri <jvrao@us.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>w
2010-02-08 16:23:23 -06:00
M. Mohan Kumar
7a4439c406 9p: Include fsync support for 9p client
Implement the fsync in the client side by marking stat field values to 'don't touch' so that server may 
interpret it as a request to guarantee that the contents of the associated file are committed to stable 
storage before the Rwstat message is returned.

Without this patch, calling fsync on a 9p file results in "Invalid argument" error. Please check the attached 
C program.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> 
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> 
Acked-by: Venkateswararao Jujjuri (JV) <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08 15:36:48 -06:00
Linus Torvalds
8defcaa6ba Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Fix ondemand to not request targets outside policy limits
  [CPUFREQ] Fix use after free of struct powernow_k8_data
  [CPUFREQ] fix default value for ondemand governor
2010-02-08 13:33:31 -08:00
Sunil Mushran
6efd806634 ocfs2/cluster: Make o2net connect messages KERN_NOTICE
Connect and disconnect messages are more than informational as they are required
during root cause analysis for failures. This patch changes them from KERN_INFO
to KERN_NOTICE.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Faseh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-08 13:02:28 -08:00
Sunil Mushran
86a06abab0 ocfs2/dlm: Fix printing of lockname
The debug call printing the name of the lock resource was chopping
off the last character. This patch fixes the problem.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-08 13:01:31 -08:00
J. Bruce Fields
260c64d235 Revert "nfsd4: fix error return when pseudoroot missing"
Commit f39bde24b2 fixed the error return from PUTROOTFH in the
case where there is no pseudofilesystem.

This is really a case we shouldn't hit on a correctly configured server:
in the absence of a root filehandle, there's no point accepting version
4 NFS rpc calls at all.

But the shared responsibility between kernel and userspace here means
the kernel on its own can't eliminate the possiblity of this happening.
And we have indeed gotten this wrong in distro's, so new client-side
mount code that attempts to negotiate v4 by default first has to work
around this case.

Therefore when commit f39bde24b2 arrived at roughly the same
time as the new v4-default mount code, which explicitly checked only for
the previous error, the result was previously fine mounts suddenly
failing.

We'll fix both sides for now: revert the error change, and make the
client-side mount workaround more robust.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-02-08 15:25:23 -05:00
Eric Van Hensbergen
9d6939dac7 net/9p: fix statsize inside twstat
stat structures contain a size prefix.  In our twstat messages
we were including the size of the size prefix in the prefix, which is not
what the protocol wants, and Inferno servers would complain.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08 14:13:30 -06:00
Eric Van Hensbergen
349d3bb878 net/9p: fail when user specifies a transport which we can't find
If the user specifies a transport and we can't find it, we failed back
to the default trainsport silently.  This patch will make the code
complain more loudly and return an error code.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08 14:13:30 -06:00
Eric Van Hensbergen
562ada6120 net/9p: fix virtio transport to correctly update status on connect
The 9p virtio transport was not updating its connection status correctly
preventing it from being able to mount the server.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-02-08 14:13:30 -06:00
Patrick McHardy
d696c7bdaa netfilter: nf_conntrack: fix hash resizing with namespaces
As noticed by Jon Masters <jonathan@jonmasters.org>, the conntrack hash
size is global and not per namespace, but modifiable at runtime through
/sys/module/nf_conntrack/hashsize. Changing the hash size will only
resize the hash in the current namespace however, so other namespaces
will use an invalid hash size. This can cause crashes when enlarging
the hashsize, or false negative lookups when shrinking it.

Move the hash size into the per-namespace data and only use the global
hash size to initialize the per-namespace value when instanciating a
new namespace. Additionally restrict hash resizing to init_net for
now as other namespaces are not handled currently.

Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-08 11:18:07 -08:00
Alexey Dobriyan
14c7dbe043 netfilter: xtables: compat out of scope fix
As per C99 6.2.4(2) when temporary table data goes out of scope,
the behaviour is undefined:

	if (compat) {
		struct foo tmp;
		...
		private = &tmp;
	}
	[dereference private]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-08 11:17:43 -08:00
Alexey Dobriyan
13ccdfc2af netfilter: nf_conntrack: restrict runtime expect hashsize modifications
Expectation hashtable size was simply glued to a variable with no code
to rehash expectations, so it was a bug to allow writing to it.
Make "expect_hashsize" readonly.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-08 11:17:22 -08:00
Eric Dumazet
5b3501faa8 netfilter: nf_conntrack: per netns nf_conntrack_cachep
nf_conntrack_cachep is currently shared by all netns instances, but
because of SLAB_DESTROY_BY_RCU special semantics, this is wrong.

If we use a shared slab cache, one object can instantly flight between
one hash table (netns ONE) to another one (netns TWO), and concurrent
reader (doing a lookup in netns ONE, 'finding' an object of netns TWO)
can be fooled without notice, because no RCU grace period has to be
observed between object freeing and its reuse.

We dont have this problem with UDP/TCP slab caches because TCP/UDP
hashtables are global to the machine (and each object has a pointer to
its netns).

If we use per netns conntrack hash tables, we also *must* use per netns
conntrack slab caches, to guarantee an object can not escape from one
namespace to another one.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
[Patrick: added unique slab name allocation]
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-02-08 11:16:56 -08:00
Patrick McHardy
9edd7ca0a3 netfilter: nf_conntrack: fix memory corruption with multiple namespaces
As discovered by Jon Masters <jonathan@jonmasters.org>, the "untracked"
conntrack, which is located in the data section, might be accidentally
freed when a new namespace is instantiated while the untracked conntrack
is attached to a skb because the reference count it re-initialized.

The best fix would be to use a seperate untracked conntrack per
namespace since it includes a namespace pointer. Unfortunately this is
not possible without larger changes since the namespace is not easily
available everywhere we need it. For now move the untracked conntrack
initialization to the init_net setup function to make sure the reference
count is not re-initialized and handle cleanup in the init_net cleanup
function to make sure namespaces can exit properly while the untracked
conntrack is in use in other namespaces.

Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-02-08 11:16:26 -08:00
Linus Torvalds
08c4f1b096 Merge branch 'v4l_for_linus' of git://linuxtv.org/fixes
* 'v4l_for_linus' of git://linuxtv.org/fixes:
  V4L/DVB: dvb-core: fix initialization of feeds list in demux filter
  V4L/DVB: dvb_demux: Don't use vmalloc at dvb_dmx_swfilter_packet
  V4L/DVB: Fix the risk of an oops at dvb_dmx_release
2010-02-08 11:07:10 -08:00
Linus Torvalds
2b1f5c3ac3 Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze
* 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: Invalidate dcache before enabling it
2010-02-08 10:10:36 -08:00
Linus Torvalds
9d2bc1a4cc Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/pseries: Fix kexec regression caused by CPPR tracking
2010-02-08 10:10:18 -08:00