Fix memory detection on Voodoo3 cards with SDRAM memory.
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It turns out that event 0x4 merely indicates that a hotkey has been
pressed, not which one. A further query is required in order to determine
the actual keypress. The following patch adds support for that along with
the known keycodes.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
hp-wmi currently changes the RFKill state by altering the struct members
rather than using the dedicated interface, meaning that update events
won't be pushed to userspace. This patch fixes that, along with fixing
the declared type of the WWAN kill switch. It also ensures that rfkill
interfaces are only registered for hardware that exists.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Ivo van Doorn <ivdoorn@gmail.com>
Cc: Dave Young <hidave.darkstar@gmail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update Documentation/filesystems/proc.txt: it describes the file
auto_msgmni introduced to enable/disable msgmni automatic recomputing upon
memory add/remove (see thread http://lkml.org/lkml/2008/7/4/27). Also
added a description for msgmni (this file is only listed in
Documentation/sysctl/kernel.txt).
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Quicklists store pages for each CPU as caches. (Each CPU can cache
node_free_pages/16 pages)
They are used as a page table cache. exit() increases the cache size,
while fork() consumes it.
So, for example, if an apache-style application runs (one parent and many
children model), one CPU will fork() while the other CPUs process
the middleware work and exit().
At that time, the CPU on which the parent runs has no page table
cache at all. The others (on which the children run) have maximum caches.
QList_max = (#ofCPUs - 1) x Free / 16
=> QList_max / (Free + QList_max) = (#ofCPUs - 1) / (16 + #ofCPUs - 1)
So, how much quicklist memory is used in the maximum case?
This is proportional to the number of CPUs because the limit of the per-CPU
quicklist cache doesn't take the number of CPUs into account.
The above calculation means:
Number of CPUs per node            2     4     8     16
==============================  ====  ====  ====  ====
QList_max / (Free + QList_max)  5.8%   16%   30%   48%
Wow! Quicklists can consume about 50% of memory in the worst case.
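The percentages in the table follow directly from the formula above. A minimal
user-space sketch of that arithmetic (the function names are made up for
illustration and are not part of the kernel or of this patch):

```c
#include <stdio.h>

/*
 * Worst-case share of (Free + QList_max) held by quicklists on one node,
 * assuming every CPU but one has filled its cache up to Free / 16.
 * Illustrative helper only, not kernel code.
 */
double quicklist_worst_ratio(int cpus_per_node)
{
	return (double)(cpus_per_node - 1) / (16 + cpus_per_node - 1);
}

/* Print the ratio for the node sizes used in the table. */
void print_quicklist_table(void)
{
	static const int cpus[] = { 2, 4, 8, 16 };
	unsigned int i;

	for (i = 0; i < sizeof(cpus) / sizeof(cpus[0]); i++)
		printf("%2d CPUs/node: %.1f%%\n", cpus[i],
		       100.0 * quicklist_worst_ratio(cpus[i]));
}
```

The ratio approaches 100% as the number of CPUs per node grows, which is why
the limit needs to depend on the CPU count.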
My demonstration program is here
--------------------------------------------------------------------------------
#define _GNU_SOURCE
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sched.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>
#define BUFFSIZE 512
int max_cpu(void) /* get max number of logical cpus from /proc/cpuinfo */
{
	FILE *fd;
	char *ret, buffer[BUFFSIZE];
	int cpu = 1;

	fd = fopen("/proc/cpuinfo", "r");
	if (fd == NULL) {
		perror("fopen(/proc/cpuinfo)");
		exit(EXIT_FAILURE);
	}
	while (1) {
		ret = fgets(buffer, BUFFSIZE, fd);
		if (ret == NULL)
			break;
		if (!strncmp(buffer, "processor", 9))
			cpu = atoi(strchr(buffer, ':') + 2);
	}
	fclose(fd);
	return cpu;
}

void cpu_bind(int cpu) /* bind current process to one cpu */
{
	cpu_set_t mask;
	int ret;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	ret = sched_setaffinity(0, sizeof(mask), &mask);
	if (ret == -1) {
		perror("sched_setaffinity()");
		exit(EXIT_FAILURE);
	}
	sched_yield(); /* not necessary */
}

#define MMAP_SIZE (10 * 1024 * 1024) /* 10 MB */
#define FORK_INTERVAL 1 /* 1 second */

int main(int argc, char *argv[])
{
	int cpu_max, nextcpu;
	long pagesize;
	pid_t pid;

	/* set max number of logical cpu */
	if (argc > 1)
		cpu_max = atoi(argv[1]) - 1;
	else
		cpu_max = max_cpu();

	/* get the page size */
	pagesize = sysconf(_SC_PAGESIZE);
	if (pagesize == -1) {
		perror("sysconf(_SC_PAGESIZE)");
		exit(EXIT_FAILURE);
	}

	/* prepare parent process */
	cpu_bind(0);
	nextcpu = cpu_max;

loop:
	/* select destination cpu for child process by round-robin rule */
	if (++nextcpu > cpu_max)
		nextcpu = 1;

	pid = fork();
	if (pid == 0) { /* child action */
		char *p;
		int i;

		/* consume page tables */
		p = mmap(NULL, MMAP_SIZE, PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap()");
			exit(EXIT_FAILURE);
		}
		i = MMAP_SIZE / pagesize;
		while (i-- > 0) {
			*p = 1;
			p += pagesize;
		}

		/* move to other cpu */
		cpu_bind(nextcpu);
		/*
		printf("a child moved to cpu%d after mmap().\n", nextcpu);
		fflush(stdout);
		*/

		/* return page tables to pgtable_quicklist */
		exit(0);
	} else if (pid > 0) { /* parent action */
		sleep(FORK_INTERVAL);
		waitpid(pid, NULL, WNOHANG);
	}
	goto loop;
}
----------------------------------------
When the above program, which does task migration, runs, my 8GB box spends
800MB of memory on quicklists. This is not a memory leak, but it doesn't
seem good.
% cat /proc/meminfo
MemTotal: 7701568 kB
MemFree: 4724672 kB
(snip)
Quicklists: 844800 kB
because
- My machine spec is
number of numa node: 2
number of cpus: 8 (4CPU x2 node)
total mem: 8GB (4GB x2 node)
free mem: about 5GB
- Then, QList_max = (4 - 1) x 4.7GB / 16 ~= 880MB (about 16% of Free + QList_max).
So, quicklists can use about 800MB.
So, if a machine with the following spec runs that program
CPUs: 64 (8cpu x 8node)
Mem: 1TB (128GB x8node)
Then, quicklists can waste 300GB (= 1TB x 30%). It is too large.
So, I don't like a cache policy which is proportional to the number of CPUs.
My patch changes the number of caches
from:
per-cpu-cache-amount = memory_on_node / 16
to
per-cpu-cache-amount = memory_on_node / 16 / number_of_cpus_on_node.
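The policy change can be sketched in user space as follows. This is only an
illustration of the two formulas above; the function names and the zero-CPU
guard are assumptions of the sketch, not taken from the patch:

```c
/* Fraction of a node's free memory that quicklists may use (1/16, per the text). */
#define FRACTION_OF_NODE_MEM 16

/* Old policy: every CPU may cache node_free_pages / 16 pages. */
unsigned long quicklist_limit_old(unsigned long node_free_pages)
{
	return node_free_pages / FRACTION_OF_NODE_MEM;
}

/* New policy: the same node-wide budget is shared by the CPUs on the node. */
unsigned long quicklist_limit_new(unsigned long node_free_pages,
				  unsigned int cpus_on_node)
{
	if (cpus_on_node == 0)	/* guard for the sketch; not from the patch */
		cpus_on_node = 1;
	return node_free_pages / FRACTION_OF_NODE_MEM / cpus_on_node;
}
```

With the new limit, the sum over all CPUs of a node stays at or below
memory_on_node / 16, regardless of how many CPUs the node has.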
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Keiichiro Tokunaga <tokunaga.keiich@jp.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Tested-by: David Miller <davem@davemloft.net>
Acked-by: Mike Travis <travis@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
During the use of a dev_cgroup, we should guarantee the corresponding
cgroup won't be deleted (i.e. via rmdir). This can be done through
css_get(&dev_cgroup->css), but here we can just get and use the dev_cgroup
under rcu_read_lock.
Also remove the NULL check on dev_cgroup; it won't be NULL since a task
always belongs to a cgroup.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1. Check if virtual resolution fits into memory.
Otherwise, Linux hangs during panning.
2. When selected use all available memory to
maximize yres_virtual to speed up panning
(previously also xres_virtual was increased).
3. Simplify memory restriction calculations.
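The checks in points 1 and 2 boil down to simple arithmetic. A rough sketch,
assuming a packed framebuffer where each line takes xres_virtual times the
bytes per pixel; all names are hypothetical, and any hardware limit on
yres_virtual that the real driver may enforce is not modeled:

```c
#include <stdint.h>

/* Bytes needed for one scanline; bpp is bits per pixel. */
static uint32_t line_length(uint32_t xres_virtual, uint32_t bpp)
{
	return xres_virtual * ((bpp + 7) / 8);
}

/* Return 1 if the virtual screen fits into video memory, 0 otherwise. */
int virtual_res_fits(uint32_t xres_virtual, uint32_t yres_virtual,
		     uint32_t bpp, uint32_t mem_len)
{
	uint64_t needed = (uint64_t)line_length(xres_virtual, bpp) * yres_virtual;

	return needed <= mem_len;
}

/* Largest yres_virtual that still fits, leaving xres_virtual untouched. */
uint32_t max_yres_virtual(uint32_t xres_virtual, uint32_t bpp, uint32_t mem_len)
{
	uint32_t ll = line_length(xres_virtual, bpp);

	return ll ? mem_len / ll : 0;
}
```

For example, with 16MB of video memory and a 1024-pixel-wide 16bpp mode, each
line takes 2048 bytes, so yres_virtual can grow to 8192 lines.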
Signed-off-by: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We don't change pid_ns->child_reaper when the main thread of the
subnamespace init exits. As Robert Rex <robert.rex@exasol.com> pointed
out this is wrong.
Yes, the re-parenting itself works correctly, but if the reparented task
exits it needs ->parent->nsproxy->pid_ns in do_notify_parent(), and if the
main thread is zombie its ->nsproxy was already cleared by
exit_task_namespaces().
Introduce the new function, find_new_reaper(), which finds the new
->parent for the re-parenting and changes ->child_reaper if needed. Kill
the now unneeded exit_child_reaper().
Also move the changing of ->child_reaper from zap_pid_ns_processes() to
find_new_reaper(), this consolidates the games with ->child_reaper and
makes it stable under tasklist_lock.
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=11391
Reported-by: Robert Rex <robert.rex@exasol.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zap_pid_ns_processes() sets pid_ns->child_reaper = NULL, this is wrong.
Yes, we have already killed all tasks in this namespace, and sys_wait4()
doesn't see any child. But this doesn't mean ->children list is empty, we
may have EXIT_DEAD tasks which are not visible to do_wait(). In that case
the subsequent forget_original_parent() will crash the kernel because it
will try to re-parent these tasks to the NULL reaper.
Even if there are no children, it is not good that forget_original_parent()
uses reaper == NULL.
Change the code to set ->child_reaper = init_pid_ns.child_reaper instead.
We could use pid_ns->parent->child_reaper as well, I think this does not
really matter. These EXIT_DEAD tasks are not visible to the new ->parent
after re-parenting, they will silently do release_task() eventually.
Note that we must change ->child_reaper, otherwise
forget_original_parent() will use reaper == father, and in that case we
will hit the (correct) BUG_ON(!list_empty(&father->children)).
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
At91_mci is abusing dma_free_coherent(), which may not be called with IRQs
disabled. I saw "mkfs.ext3" on an MMC card objecting voluminously as each
write completed:
WARNING: at arch/arm/mm/consistent.c:368 dma_free_coherent+0x2c/0x224()
[<c002726c>] (dump_stack+0x0/0x14) from [<c00387d4>] (warn_on_slowpath+0x4c/0x68)
[<c0038788>] (warn_on_slowpath+0x0/0x68) from [<c0028768>] (dma_free_coherent+0x2c/0x224)
r6:00008008 r5:ffc06000 r4:00000000
[<c002873c>] (dma_free_coherent+0x0/0x224) from [<c01918ac>] (at91_mci_irq+0x374/0x420)
[<c0191538>] (at91_mci_irq+0x0/0x420) from [<c0065d9c>] (handle_IRQ_event+0x2c/0x6c)
...
This bug has been around for a LONG time. The MM warning is from late
2005, but the driver merged a year later ... so I'm puzzled why nobody
noticed this before now.
The fix involves noting that this buffer shouldn't be DMA-coherent; it's
just used for normal DMA writes. So replace it with standard kmalloc()
buffering and DMA mapping calls.
This is the quickie fix. A better one would not rely on allocating large
bounce buffers. (Note that dma_alloc_coherent could have failed too, but
that case was ignored... kmalloc is a bit more likely to fail though.)
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Pierre Ossman <drzeus-mmc@drzeus.cx>
Cc: Andrew Victor <linux@maxim.org.za>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recent changes to tighten the check for UARTs that don't correctly
re-assert THRE (01c194d927: "serial 8250:
tighten test for using backup timer") caused problems when such a UART was
opened for the second time - the bug could only successfully be detected
at first initialization. For users of this version of this particular
UART IP it is fatal.
This patch stores the information about the bug in the bugs field of the
port structure when the port is first started up so subsequent opens can
check this bit even if the test for the bug fails.
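The idea of remembering the quirk can be sketched like this. The structure,
flag, and function names are hypothetical stand-ins, not the actual 8250.c
identifiers, and the hardware probe is reduced to a boolean argument:

```c
#include <stdbool.h>

#define BUG_THRE (1u << 0)	/* hypothetical flag: UART fails to re-assert THRE */

struct port_sketch {
	unsigned int bugs;	/* quirks discovered so far; survives re-opens */
};

/*
 * Called on every open. 'test_detected' stands in for the result of the
 * hardware probe, which per the text is only reliable on first start-up.
 * Returns true if the backup timer workaround should be used.
 */
bool startup_needs_backup_timer(struct port_sketch *p, bool test_detected)
{
	if (test_detected)
		p->bugs |= BUG_THRE;	/* remember the quirk for later opens */

	return (p->bugs & BUG_THRE) != 0;
}
```

Once the flag is set, a later open still selects the workaround even when the
probe itself no longer triggers.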
David Brownell: "My own exposure to this is that the UART on DaVinci
hardware, which TI allegedly derived from its original 16550 logic, has
periodically gone from working to unusable with the mainline 8250.c ...
and back and forth a bunch. Currently it's "unusable", a regression from
some previous versions. With this patch from Will, it's usable."
Signed-off-by: Will Newton <will.newton@gmail.com>
Acked-by: Alex Williamson <alex.williamson@hp.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Brownell <david-b@pacbell.net>
Cc: <stable@kernel.org> [2.6.26.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
WARNING: vmlinux.o(.data+0x1f5c0): Section mismatch in reference from the variable contig_page_data to the variable .init.data:bootmem_node_data
The variable contig_page_data references
the variable __initdata bootmem_node_data
If the reference is valid then annotate the
variable with __init* (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Johannes Weiner <hannes@saeurebad.de>
Cc: Sean MacLennan <smaclennan@pikatech.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The exit function neglects to remove debugfs entries, leading to a BUG
on reload.
[akpm@linux-foundation.org: cleanups]
Signed-off-by: Russ Dill <Russ.Dill@gmail.com>
Acked-by: Carlos Corbacho <carlos@strangeworlds.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dio write returns EIO when try_to_release_page fails because bh is
still referenced.
The patch
commit 3f31fddfa2
Author: Mingming Cao <cmm@us.ibm.com>
Date: Fri Jul 25 01:46:22 2008 -0700
jbd: fix race between free buffer and commit transaction
was merged into 2.6.27-rc1, but I noticed that this patch is not enough
to fix the race.
I ran fsstress heavily against 2.6.27-rc1 and found that dio write still
sometimes got EIO during this test.
The patch above fixed race between freeing buffer(dio) and committing
transaction(jbd) but I discovered that there is another race, freeing
buffer(dio) and ext3/4_ordered_writepage.
: background_writeout()
->write_cache_pages()
->ext3_ordered_writepage()
walk_page_buffers() -> take a bh ref
block_write_full_page() -> unlock_page
: <- end_page_writeback
: <- race! (dio write->try_to_release_page fails)
walk_page_buffers() ->release a bh ref
ext3_ordered_writepage holds a bh ref and calls unlock_page while still
holding that ref, so this causes the race and the failure of
try_to_release_page.
To fix this race, I used the approach of falling back to buffered
writes if try_to_release_page() fails on a page.
[akpm@linux-foundation.org: cleanups]
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I have gotten to the root cause of the hugetlb badness I reported back on
August 15th. My system has the following memory topology (note the
overlapping node):
Node 0 Memory: 0x8000000-0x44000000
Node 1 Memory: 0x0-0x8000000 0x44000000-0x80000000
setup_zone_migrate_reserve() scans the address range 0x0-0x8000000 looking
for a pageblock to move onto the MIGRATE_RESERVE list. Finding no
candidates, it happily continues the scan into 0x8000000-0x44000000. When
a pageblock is found, the pages are moved to the MIGRATE_RESERVE list on
the wrong zone. Oops.
setup_zone_migrate_reserve() should skip pageblocks in overlapping nodes.
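A simplified user-space model of the intended behaviour, assuming a flat table
of pfn ranges with their owning node. The types and helpers here are invented
for illustration and do not correspond to the real mm code:

```c
#include <stddef.h>

/* One contiguous range of page frame numbers owned by a node (end exclusive). */
struct pfn_range {
	unsigned long start, end;
	int nid;
};

/* Look up the owning node of a pfn; returns -1 for a hole. */
int pfn_to_nid_sketch(const struct pfn_range *map, size_t n, unsigned long pfn)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (pfn >= map[i].start && pfn < map[i].end)
			return map[i].nid;
	return -1;
}

/*
 * Scan [start, end) in pageblock steps and return the first block that
 * really belongs to 'nid', skipping blocks owned by an overlapping node.
 * Returns 'end' if no candidate exists.
 */
unsigned long first_block_on_node(const struct pfn_range *map, size_t n,
				  unsigned long start, unsigned long end,
				  unsigned long block, int nid)
{
	unsigned long pfn;

	if (block == 0)
		return end;
	for (pfn = start; pfn < end; pfn += block) {
		if (pfn_to_nid_sketch(map, n, pfn) != nid)
			continue;	/* belongs to an overlapping node */
		return pfn;
	}
	return end;
}
```

With the topology above expressed in 4K pfns, a scan for node 1 that has moved
past 0x8000 skips all of node 0's range and resumes at 0x44000.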
Signed-off-by: Adam Litke <agl@us.ibm.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update the location of the NTFS homepage in several files.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fix device tree ... don't forget to set the parent device
* let init/exit code be removed where practical
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
[bart: split it from bigger DaVinci patch, s/hw.parent/hw.dev/]
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
hwif_to_node() incorrectly assumes that hwif->dev always belongs to
a PCI device. This results in ide-cs oopsing in init_irq() after
commit c56c5648a3 accidentally fixed
device tree registration for ide-cs. Fix it by using dev_to_node().
Thanks to Martin Michlmayr and Larry Finger for help with debugging
the issue.
Reported-by: Martin Michlmayr <tbm@cyrius.com>
Tested-by: Martin Michlmayr <tbm@cyrius.com>
Cc: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
The sff_dma_ops struct should be wrapped by BLK_DEV_IDEDMA_SFF instead
of BLK_DEV_IDEDMA_PCI.
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* 'for-linus' of git://neil.brown.name/md:
Fix problem with waiting while holding rcu read lock in md/bitmap.c
Remove invalidate_partition call from do_md_stop.
With the new firmware infrastructure in 2.6.27, some files are generated and shouldn't be
diffed; add these 2 to the "dontdiff" file
Signed-off-by: Arjan van de Ven <arjan@Linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes a typo in commit 2a2a64714d "Disable MWAIT via DMI on broken Compal board".
It allows the nomwait dmi check to actually detect the Acer 5220.
Signed-off-by: Dennis Jansen <dennis.jansen@web.de>
Tested-by: Dennis Jansen <dennis.jansen@web.de>
Acked-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-2.6.27' of git://linux-nfs.org/~bfields/linux:
nfsd: fix buffer overrun decoding NFSv4 acl
sunrpc: fix possible overrun on read of /proc/sys/sunrpc/transports
nfsd: fix compound state allocation error handling
svcrdma: Fix race between svc_rdma_recvfrom thread and the dto_tasklet
Fix operator precedence bug in atari_keyb_init, which caused a failure on CT60
Signed-off-by: Michael Schmitz <schmitz@debian.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A parisc allmodconfig build produces this:
arch/parisc/hpux/fs.c:107: error: 'buffer' undeclared (first use in this function)
Introduced by commit da574983de ("[PATCH]
fix hpux_getdents()").
Helge Deller also reported this in bugzilla 11461:
http://bugzilla.kernel.org/show_bug.cgi?id=11461
and he posted an identical patch.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The recent commit 16d9679f33caf7e683471647d1472bfe133d858 changed
check_hung_task() to filter out the TASK_KILLABLE tasks. We can
move this check to the caller which has to test t->state anyway.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix kernel-doc warning for new function:
Warning(linux-2.6.27-rc5-git2//kernel/resource.c:448): No description found for parameter 'root'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: restore original behavior of /proc/partition when there's no partition
remove blk_register_filter and blk_unregister_filter in gendisk
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc64: setup_valid_addr_bitmap_from_pavail() should be __init
sparc: Fix resource flags for PCI children in OF device tree.
sparc32: Implement smp_call_function_single().
Breaking lines due to some imaginary problem with a long line length is
often stupid and wrong, but never more so than when it splits a string that
is printed out into multiple lines. This really ended up making it much
harder to find where some error strings were printed out, because a
simple 'grep' didn't work.
I'm sure there is tons more of this particular idiocy hiding in other
places, but this particular case hit me once more last week. So fix it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The MEMGETREGIONINFO ioctl() in mtdchar.c was clobbering user memory by
overwriting more than intended, due to the size of struct mtd_erase_region_info
changing in commit 0ecbc81adf ('Support
for auto locking flash on power up').
The fix avoids this by copying struct members one by one with put_user(), as there
is no longer a convenient struct to use the size of as the length argument to
copy_to_user().
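The hazard and the fix can be modeled in plain C. The two structs below are
stand-ins whose field lists are assumptions for the sketch, and ordinary
assignments take the place of put_user():

```c
#include <stdint.h>

/* Layout user space was compiled against (stand-in for the ABI struct). */
struct region_info_user_sketch {
	uint32_t offset;
	uint32_t erasesize;
	uint32_t numblocks;
};

/* Internal struct after gaining an extra member; names are illustrative. */
struct region_info_kernel_sketch {
	uint32_t offset;
	uint32_t erasesize;
	uint32_t numblocks;
	unsigned long *lockmap;	/* new field, must never reach user space */
};

/*
 * Copy only the ABI-visible members, one at a time. Copying
 * sizeof(struct region_info_kernel_sketch) bytes instead would write past
 * the end of the smaller user-side struct.
 */
void region_info_to_user(struct region_info_user_sketch *dst,
			 const struct region_info_kernel_sketch *src)
{
	dst->offset = src->offset;
	dst->erasesize = src->erasesize;
	dst->numblocks = src->numblocks;
}
```

Copying member by member keeps the amount written tied to the user-visible
layout, so later growth of the internal struct cannot leak into the copy.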
Signed-off-by: Zev Weiss <zevweiss@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch fixes a memory leak in an error path.
Reported-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
got rid of compilation warning:
ISO C90 forbids mixed declarations and code
Signed-off-by: Cordelia Sam <cordesam@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The array we kmalloc() here is not large enough.
Thanks to Johann Dahm and David Richter for bug report and testing.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: David Richter <richterd@citi.umich.edu>
Tested-by: Johann Dahm <jdahm@umich.edu>
Vegard Nossum reported
----------------------
> I noticed that something weird is going on with /proc/sys/sunrpc/transports.
> This file is generated in net/sunrpc/sysctl.c, function proc_do_xprt(). When
> I "cat" this file, I get the expected output:
> $ cat /proc/sys/sunrpc/transports
> tcp 1048576
> udp 32768
> But I think that it does not check the length of the buffer supplied by
> userspace to read(). With my original program, I found that the stack was
> being overwritten by the characters above, even when the length given to
> read() was just 1.
David Wagner added (among other things) that copy_to_user could be
probably used here.
Ingo Oeser suggested to use simple_read_from_buffer() here.
The conclusion is that proc_do_xprt doesn't check for userside buffer
size indeed so fix this by using Ingo's suggestion.
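The bounded-copy behaviour that the fix relies on can be approximated in user
space as below. This is an analogue written for illustration, with memcpy()
standing in for the copy to user space; it is not the kernel helper itself,
and the function name is made up:

```c
#include <string.h>
#include <sys/types.h>

/*
 * Copy at most 'count' bytes of 'from' (length 'available'), starting at
 * *ppos, into 'to', then advance *ppos. Returns the number of bytes
 * copied, 0 at end of data, or -1 on a negative offset.
 */
ssize_t bounded_read(void *to, size_t count, off_t *ppos,
		     const void *from, size_t available)
{
	off_t pos = *ppos;

	if (pos < 0)
		return -1;
	if ((size_t)pos >= available || count == 0)
		return 0;
	if (count > available - (size_t)pos)
		count = available - (size_t)pos;	/* never exceed the source */

	memcpy(to, (const char *)from + pos, count);
	*ppos = pos + (off_t)count;
	return (ssize_t)count;
}
```

A read() of length 1 then transfers exactly one byte and leaves the rest of
the caller's buffer untouched, which is the property the original code lacked.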
Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
CC: Ingo Oeser <ioe-lkml@rameria.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Greg Banks <gnb@sgi.com>
Cc: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>