linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-11-20 00:26:39 +08:00

Author	SHA1	Message	Date
Josef Bacik	972fbf7798	ext3: don't try to resize if there are no reserved gdt blocks left When trying to resize a ext3 fs and you run out of reserved gdt blocks, you get an error that doesn't actually tell you what went wrong, it just says that the gdb it picked is not correct, which is the case since you don't have any reserved gdt blocks left. This patch adds a check to make sure you have reserved gdt blocks to use, and if not prints out a more relevant error. Signed-off-by: Josef Bacik <jbacik@redhat.com> Cc: <linux-ext4@vger.kernel.org> Cc: Andreas Dilger <adilger@sun.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:37 -07:00
Hidehiro Kawai	885e353c74	jbd: don't dirty original metadata buffer on abort Currently, original metadata buffers are dirtied when they are unfiled whether the journal has aborted or not. Eventually these buffers will be written-back to the filesystem by pdflush. This means some metadata buffers are written to the filesystem without journaling if the journal aborts. So if both journal abort and system crash happen at the same time, the filesystem would become inconsistent state. Additionally, replaying journaled metadata can overwrite the latest metadata on the filesystem partly. Because, if the journal aborts, journaled metadata are preserved and replayed during the next mount not to lose uncheckpointed metadata. This would also break the consistency of the filesystem. This patch prevents original metadata buffers from being dirtied on abort by clearing BH_JBDDirty flag from those buffers. Thus, no metadata buffers are written to the filesystem without journaling. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:37 -07:00
Hidehiro Kawai	d1645e526a	jbd: abort when failed to log metadata buffers If we failed to write metadata buffers to the journal space and succeeded to write the commit record, stale data can be written back to the filesystem as metadata in the recovery phase. To avoid this, when we failed to write out metadata buffers, abort the journal before writing the commit record. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:36 -07:00
Richard Holden	60c11d2abf	phonedev: remove BKL The phone_device array is covered by the phone_lock mutex in all cases and request_module no longer needs the BKL so we can remove the only remaining instance of the BKL from phonedev. Signed-off-by: Richard Holden <aciddeath@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:36 -07:00
Krzysztof Helt	3e680aae4e	fb: convert lock/unlock_kernel() into local fb mutex Change lock_kernel()/unlock_kernel() to local fb mutex. Each frame buffer instance has its own mutex. The one line try_to_load() function is unrolled to request_module() in two places for readability. [righi.andrea@gmail.com: fb: fix NULL pointer BUG dereference in fb_open()] Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:36 -07:00
Alan Cox	e53677113e	fb: push down the BKL in the ioctl handler Framebuffer is heavily BKL dependant at the moment so just wrap the ioctl handler in the driver as we push down. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Alan Cox <alan@redhat.com> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:36 -07:00
David Brownell	978ccaa8ea	gpiolib: fix oops in gpio_get_value_cansleep() We can get the following oops from gpio_get_value_cansleep() when a GPIO controller doesn't provide a get() callback: Unable to handle kernel paging request for instruction fetch Faulting instruction address: 0x00000000 Oops: Kernel access of bad area, sig: 11 [#1] [...] NIP [00000000] 0x0 LR [c0182fb0] gpio_get_value_cansleep+0x40/0x50 Call Trace: [c7b79e80] [c0183f28] gpio_value_show+0x5c/0x94 [c7b79ea0] [c01a584c] dev_attr_show+0x30/0x7c [c7b79eb0] [c00d6b48] fill_read_buffer+0x68/0xe0 [c7b79ed0] [c00d6c54] sysfs_read_file+0x94/0xbc [c7b79ef0] [c008f24c] vfs_read+0xb4/0x16c [c7b79f10] [c008f580] sys_read+0x4c/0x90 [c7b79f40] [c0013a14] ret_from_syscall+0x0/0x38 It's OK to request the value of any GPIO; most GPIOs are bidirectional, so configuring them as outputs just enables an output driver and doesn't disable the input logic. So the problem is that gpio_get_value_cansleep() isn't making the same sanity check that gpio_get_value() does: making sure this GPIO isn't one of the atypical "no input logic" cases. Reported-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: <stable@kernel.org> [2.6.27.x, 2.6.26.x, 2.6.25.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:36 -07:00
Steven A. Falco	e3274e9150	gpio: modify sysfs gpio export so that "value" displays as 0 or 1 gpiolib can export GPIOs to userspace via sysfs. This patch modifies the gpio_value_show() so that any non-zero value is explicitly printed as "1", rather than whatever numerical value the lower-level driver returns. Signed-off-by: Steve Falco <sfalco@harris.com> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:36 -07:00
David Brownell	c8fc40cd34	rtc-cmos: export second NVRAM bank Teach rtc-cmos about the second bank of registers found on most modern x86 systems, giving access to 128 bytes more NVRAM. This version only sees that extra NVRAM when both register banks are provided as part of one PNP resource. Since BIOS on some systems presents them using two IO resources, and nothing merges them, this can't always show all the NVRAM. (We're supposed to be able to use PNP id PNP0b01 too, but BIOS tables doesn't often seem to use that particular option.) Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:36 -07:00
roel kluin	1f3ccaed13	Altix serial: fix In function sn_sal_switch_to_asynch(): drivers/serial/sn_console.c:713: HZ * SN_SAL_UART_FIFO_DEPTH / SN_SAL_UART_FIFO_SPEED_CPS; After preprocessing (see defines in patch) this becomes HZ * 16 / 9600 / 10 (associativity from left to right), not equivalent to HZ * 16 / 960. Looks-obviously-right-to: Tony Luck <tony.luck@intel.com> Cc: Jes Sorensen <jes@sgi.com> Acked-by: Pat Gefre <pfg@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:36 -07:00
Adrian Bunk	9f561dfcea	make probe_serial_gsc() static This patch makes the needlessly global probe_serial_gsc() static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:36 -07:00
Jiri Slaby	da1cfe1ae4	Char: sx, remove bogus iomap readl/writel are not expected to accept iomap return value. Replace bogus mapping by standard ioremap. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: <R.E.Wolff@BitWizard.nl> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:36 -07:00
Henrik Rydberg	8c9398d1e9	hwmon: applesmc: lighter wait mechanism, drastic improvement The read fail ratio is sensitive to the delay between the first byte written and the first byte read; apparently the sensors cannot be rushed. Increasing the minimum wait time, without changing the total wait time, improves the fail ratio from a 8% chance that any of the sensors fails in one read, down to 0.4%, on a Macbook Air. On a Macbook Pro 3,1, the effect is even more apparent. By reducing the number of status polls, the ratio is further improved to below 0.1%. Finally, increasing the total wait time brings the fail ratio down to virtually zero. Signed-off-by: Henrik Rydberg <rydberg@euromail.se> Tested-by: Bob McElrath <bob@mcelrath.org> Cc: Nicolas Boichat <nicolas@boichat.ch> Cc: "Mark M. Hoffman" <mhoffman@lightlink.com> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:35 -07:00
Henrik Rydberg	07e8dbd3eb	hwmon: applesmc: Add support for Macbook Pro 3 Add temperature sensor support for Macbook Pro 3. Signed-off-by: Henrik Rydberg <rydberg@euromail.se> Cc: Nicolas Boichat <nicolas@boichat.ch> Cc: Riki Oktarianto <rkoktarianto@gmail.com> Cc: Mark M. Hoffman <mhoffman@lightlink.com> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:35 -07:00
Henrik Rydberg	d7549905f1	hwmon: applesmc: Add support for Macbook Pro 4 Adds temperature sensor support for the Macbook Pro 4. Signed-off-by: Henrik Rydberg <rydberg@euromail.se> Cc: Nicolas Boichat <nicolas@boichat.ch> Cc: Riki Oktarianto <rkoktarianto@gmail.com> Cc: Mark M. Hoffman <mhoffman@lightlink.com> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:35 -07:00
Andrew Morton	7b5e3cb28f	drivers/hwmon/applesmc.c: remove unneeded casts dmi_system_id.driver_data is already void*. Cc: Henrik Rydberg <rydberg@euromail.se> Cc: Nicolas Boichat <nicolas@boichat.ch> Cc: Riki Oktarianto <rkoktarianto@gmail.com> Cc: Mark M. Hoffman <mhoffman@lightlink.com> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:35 -07:00
Henrik Rydberg	f5274c972b	hwmon: applesmc: add support for Macbook Air This patch adds accelerometer, backlight and temperature sensor support for the Macbook Air. Signed-off-by: Henrik Rydberg <rydberg@euromail.se> Cc: Nicolas Boichat <nicolas@boichat.ch> Cc: Riki Oktarianto <rkoktarianto@gmail.com> Cc: Mark M. Hoffman <mhoffman@lightlink.com> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:35 -07:00
Henrik Rydberg	8bd1a12a51	hwmon: applesmc: allow for variable ALV0 and ALV1 package length On some recent Macbooks, the package length for the light sensors ALV0 and ALV1 has changed from 6 to 10. This patch allows for a variable package length encompassing both variants. Signed-off-by: Henrik Rydberg <rydberg@euromail.se> Cc: Nicolas Boichat <nicolas@boichat.ch> Cc: Riki Oktarianto <rkoktarianto@gmail.com> Cc: Mark M. Hoffman <mhoffman@lightlink.com> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:35 -07:00
Henrik Rydberg	02fcbd144d	hwmon: applesmc: prolong status wait The time to wait for a status change while reading or writing to the SMC ports is a balance between read reliability and system performance. The current setting yields rougly three errors in a thousand when simultaneously reading three different temperature values on a Macbook Air. This patch increases the setting to a value yielding roughly one error in ten thousand, with no noticable system performance degradation. Signed-off-by: Henrik Rydberg <rydberg@euromail.se> Cc: Nicolas Boichat <nicolas@boichat.ch> Cc: Riki Oktarianto <rkoktarianto@gmail.com> Cc: Mark M. Hoffman <mhoffman@lightlink.com> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:35 -07:00
Henrik Rydberg	84d2d7f2ee	hwmon: applesmc: fix the 'wait status failed: c != 8' problem On many Macbooks since mid 2007, the Pro, C2D and Air models, applesmc fails to read some or all SMC ports. This problem has various effects, such as flooded logfiles, malfunctioning temperature sensors, accelerometers failing to initialize, and difficulties getting backlight functionality to work properly. The root of the problem seems to be the command protocol. The current code sends out a command byte, then repeatedly polls for an ack before continuing to send or recieve data. From experiments leading to this patch, it seems the command protocol never quite worked or changed so that one now sends a command byte, waits a little bit, polls for an ack, and if it fails, repeats the whole thing by sending the command byte again. This patch implements a send_command function according to the new interpretation of the protocol, and should work also for earlier models. Signed-off-by: Henrik Rydberg <rydberg@euromail.se> Cc: Nicolas Boichat <nicolas@boichat.ch> Cc: Riki Oktarianto <rkoktarianto@gmail.com> Cc: Mark M. Hoffman <mhoffman@lightlink.com> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:35 -07:00
Henrik Rydberg	05224091af	hwmon: applesmc: specified number of bytes to read should match actual At one single place in the code, the specified number of bytes to read and the actual number of bytes read differ by one. This one-liner patch fixes that inconsistency. Signed-off-by: Henrik Rydberg <rydberg@euromail.se> Cc: Nicolas Boichat <nicolas@boichat.ch> Cc: Riki Oktarianto <rkoktarianto@gmail.com> Cc: Mark M. Hoffman <mhoffman@lightlink.com> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:35 -07:00
Jim Cromie	865c295360	hwmon/pc87360 separate alarm files: add therm-min/max/crit-alarms Adds therm-min/max/crit-alarm callbacks, sensor-device-attribute declarations, and refs to those new decls in the macro used to initialize the therm_group (of sysfs files) The thermistors use voltage channels to measure; so they don't have a fault-alarm, but unlike the other voltages, they do have an overtemp, which we call crit (by convention). [akpm@linux-foundation.org: cleanup] Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Cc: Jean Delvare <khali@linux-fr.org> Cc: "Mark M. Hoffman" <mhoffman@lightlink.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:35 -07:00
Jim Cromie	8ca136741e	hwmon/pc87360 separate alarm files: add dev_dbg help temp and vin status register values may be set by chip specifications, set again by bios, or by this previously loaded driver. Debug output nicely displays modprobe init=\d actions. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Cc: Jean Delvare <khali@linux-fr.org> Cc: "Mark M. Hoffman" <mhoffman@lightlink.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:35 -07:00
Jim Cromie	2a32ec2500	hwmon/pc87360 separate alarm files: define LDNI_MAX const Driver handles 3 logical devices in fixed length array. Give this a define-d constant. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Cc: Jean Delvare <khali@linux-fr.org> Cc: "Mark M. Hoffman" <mhoffman@lightlink.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:35 -07:00
Jim Cromie	b267e8cdc6	hwmon/pc87360 separate alarm files: add temp-min/max/crit/fault-alarms Adds temp-min/max/crit/fault-alarm callbacks, sensor-device-attribute declarations, and refs to those new decls in the macro used to initialize the temp_group (of sysfs files) [akpm@linux-foundation.org: cleanups] Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Cc: Jean Delvare <khali@linux-fr.org> Cc: "Mark M. Hoffman" <mhoffman@lightlink.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:35 -07:00
Jim Cromie	492e9657d1	hwmon/pc87360 separate alarm files: add in-min/max-alarms Adds vin-min/max-alarm callbacks, sensor-device-attribute declarations, and refs to those new decls in the macro used to initialize the vin_group (of sysfs files) [akpm@linux-foundation.org: cleanups] Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Cc: Jean Delvare <khali@linux-fr.org> Cc: "Mark M. Hoffman" <mhoffman@lightlink.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:35 -07:00
Jim Cromie	28f74e7177	hwmon/pc87360 separate alarm files: define some constants Bring hwmon/pc87360 into agreement with Documentation/hwmon/sysfs-interface. Patchset adds separate limit alarms for voltages and temps, it also adds temp[123]_fault files. On my Soekris, temps 1,2 are unused/unconnected, so temp[123]_fault = 1,1,0 respectively. This agrees with /usr/bin/sensors, which has always shown them as OPEN. Temps 4,5,6 are thermistor based, and dont have a fault bit in their status register. This patch: 2 different kinds of constants added: - CHAN_ALM_* constants for (later) vin, temp alarm callbacks. - CHAN_* conversion constants, used in _init_device, partly for RW1C bits Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Cc: Jean Delvare <khali@linux-fr.org> Cc: "Mark M. Hoffman" <mhoffman@lightlink.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:35 -07:00
Ameya Palande	4d235ba6c2	intel-iommu: typo fix and correct word in the comment Fix for a typo and and replacing incorrect word in the comment. Signed-off-by: Ameya Palande <2ameya@gmail.com> Cc: "Ashok Raj" <ashok.raj@intel.com> Cc: "Shaohua Li" <shaohua.li@intel.com> Cc: "Anil S Keshavamurthy" <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:34 -07:00
WANG Cong	c3b9f5afc7	kernel/configs.c: remove useless comments These comments are useless, remove them. Signed-off-by: WANG Cong <wangcong@zeuux.org> Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:34 -07:00
Eric Piel	5c62484116	HP-WMI: additional keycode (or typo) On my HP 2510, pressing the (i) button generates an unknown keycode: 0x213b. So here is a patch adding support for it. However, as it seems there is already support for a similar button connected to 0x231b as keycode, I wonder if it could be a typo in the driver? Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net> Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:34 -07:00
Andi Kleen	2a80a3783d	Fix documentation of sysrq-q I fell into the trap recently that it only dumps hrtimers instead of all timers. Fix the documentation. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:34 -07:00
WANG Cong	966c8079c0	uml: fix a compile error Fix arch/um/sys-i386/signal.c: In function 'copy_sc_from_user': arch/um/sys-i386/signal.c:182: warning: dereferencing 'void *' pointer arch/um/sys-i386/signal.c:182: error: request for member '_fxsr_env' in something not a structure or union Signed-off-by: WANG Cong <wangcong@zeuux.org> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:34 -07:00
Huang Weiyi	d12a6f7f91	arch/m68k/bvme6000/rtc.c: remove duplicated include Removed duplicated include file <linux/smp_lock.h> in arch/m68k/bvme6000/rtc.c. Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:34 -07:00
Matt Helsley	bde5ab6558	container freezer: document the cgroup freezer subsystem. Describe why we need the freezer subsystem and how to use it in a documentation file. Since the cgroups.txt file is focused on the subsystem-agnostic portions of cgroups make a directory and move the old cgroups.txt file at the same time. Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: containers@lists.linux-foundation.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:34 -07:00
Matt Helsley	1aece34833	container freezer: rename check_if_frozen() check_if_frozen() sounds like it should return something when in fact it's just updating the freezer state. Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:34 -07:00
Matt Helsley	81dcf33c2a	container freezer: make freezer state names less generic Rename cgroup freezer states to be less generic to avoid any name collisions while also better describing what each state is. Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:34 -07:00
Matt Helsley	957a4eeaf4	container freezer: prevent frozen tasks or cgroups from changing Don't let frozen tasks or cgroups change. This means frozen tasks can't leave their current cgroup for another cgroup. It also means that tasks cannot be added to or removed from a cgroup in the FROZEN state. We enforce these rules by checking for frozen tasks and cgroups in the can_attach() function. Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:34 -07:00
Matt Helsley	5a06915c6d	container freezer: skip frozen cgroups during power management resume When a system is resumed after a suspend, it will also unfreeze frozen cgroups. This patchs modifies the resume sequence to skip the tasks which are part of a frozen control group. Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Acked-by: Serge E. Hallyn <serue@us.ibm.com> Tested-by: Matt Helsley <matthltc@us.ibm.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:34 -07:00
Matt Helsley	dc52ddc0e6	container freezer: implement freezer cgroup subsystem This patch implements a new freezer subsystem in the control groups framework. It provides a way to stop and resume execution of all tasks in a cgroup by writing in the cgroup filesystem. The freezer subsystem in the container filesystem defines a file named freezer.state. Writing "FROZEN" to the state file will freeze all tasks in the cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in the cgroup. Reading will return the current state. * Examples of usage : # mkdir /containers/freezer # mount -t cgroup -ofreezer freezer /containers # mkdir /containers/0 # echo $some_pid > /containers/0/tasks to get status of the freezer subsystem : # cat /containers/0/freezer.state RUNNING to freeze all tasks in the container : # echo FROZEN > /containers/0/freezer.state # cat /containers/0/freezer.state FREEZING # cat /containers/0/freezer.state FROZEN to unfreeze all tasks in the container : # echo RUNNING > /containers/0/freezer.state # cat /containers/0/freezer.state RUNNING This is the basic mechanism which should do the right thing for user space task in a simple scenario. It's important to note that freezing can be incomplete. In that case we return EBUSY. This means that some tasks in the cgroup are busy doing something that prevents us from completely freezing the cgroup at this time. After EBUSY, the cgroup will remain partially frozen -- reflected by freezer.state reporting "FREEZING" when read. The state will remain "FREEZING" until one of these things happens: 1) Userspace cancels the freezing operation by writing "RUNNING" to the freezer.state file 2) Userspace retries the freezing operation by writing "FROZEN" to the freezer.state file (writing "FREEZING" is not legal and returns EIO) 3) The tasks that blocked the cgroup from entering the "FROZEN" state disappear from the cgroup's set of tasks. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: export thaw_process] Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Acked-by: Serge E. Hallyn <serue@us.ibm.com> Tested-by: Matt Helsley <matthltc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:34 -07:00
Matt Helsley	8174f1503f	container freezer: make refrigerator always available Now that the TIF_FREEZE flag is available in all architectures, extract the refrigerator() and freeze_task() from kernel/power/process.c and make it available to all. The refrigerator() can now be used in a control group subsystem implementing a control group freezer. Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Acked-by: Serge E. Hallyn <serue@us.ibm.com> Tested-by: Matt Helsley <matthltc@us.ibm.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:33 -07:00
Matt Helsley	83224b0837	container freezer: add TIF_FREEZE flag to all architectures This patch series introduces a cgroup subsystem that utilizes the swsusp freezer to freeze a group of tasks. It's immediately useful for batch job management scripts. It should also be useful in the future for implementing container checkpoint/restart. The freezer subsystem in the container filesystem defines a cgroup file named freezer.state. Reading freezer.state will return the current state of the cgroup. Writing "FROZEN" to the state file will freeze all tasks in the cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in the cgroup. * Examples of usage : # mkdir /containers/freezer # mount -t cgroup -ofreezer freezer /containers # mkdir /containers/0 # echo $some_pid > /containers/0/tasks to get status of the freezer subsystem : # cat /containers/0/freezer.state RUNNING to freeze all tasks in the container : # echo FROZEN > /containers/0/freezer.state # cat /containers/0/freezer.state FREEZING # cat /containers/0/freezer.state FROZEN to unfreeze all tasks in the container : # echo RUNNING > /containers/0/freezer.state # cat /containers/0/freezer.state RUNNING This patch: The first step in making the refrigerator() available to all architectures, even for those without power management. The purpose of such a change is to be able to use the refrigerator() in a new control group subsystem which will implement a control group freezer. [akpm@linux-foundation.org: fix sparc] Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Acked-by: Pavel Machek <pavel@suse.cz> Acked-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Nigel Cunningham <nigel@tuxonice.net> Tested-by: Matt Helsley <matthltc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:33 -07:00
Brice Goglin	5e9a0f023b	mm: extract do_pages_move() out of sys_move_pages() To prepare the chunking, move the sys_move_pages() code that is used when nodes!=NULL into do_pages_move(). And rename do_move_pages() into do_move_page_to_node_array(). Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:33 -07:00
Brice Goglin	2f007e74bb	mm: don't vmalloc a huge page_to_node array for do_pages_stat() do_pages_stat() does not need any page_to_node entry for real. Just pass the pointers to the user-space page address array and to the user-space status array, and have do_pages_stat() traverse the former and fill the latter directly. Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:33 -07:00
Brice Goglin	e78bbfa826	mm: stop returning -ENOENT from sys_move_pages() if nothing got migrated A patchset reworking sys_move_pages(). It removes the possibly large vmalloc by using multiple chunks when migrating large buffers. It also dramatically increases the throughput for large buffers since the lookup in new_page_node() is now limited to a single chunk, causing the quadratic complexity to have a much slower impact. There is no need to use any radix-tree-like structure to improve this lookup. sys_move_pages() duration on a 4-quadcore-opteron 2347HE (1.9Gz), migrating between nodes #2 and #3: length move_pages (us) move_pages+patch (us) 4kB 126 98 40kB 198 168 400kB 963 937 4MB 12503 11930 40MB 246867 11848 Patches #1 and #4 are the important ones: 1) stop returning -ENOENT from sys_move_pages() if nothing got migrated 2) don't vmalloc a huge page_to_node array for do_pages_stat() 3) extract do_pages_move() out of sys_move_pages() 4) rework do_pages_move() to work on page_sized chunks 5) move_pages: no need to set pp->page to ZERO_PAGE(0) by default This patch: There is no point in returning -ENOENT from sys_move_pages() if all pages were already on the right node, while we return 0 if only 1 page was not. Most application don't know where their pages are allocated, so it's not an error to try to migrate them anyway. Just return 0 and let the status array in user-space be checked if the application needs details. It will make the upcoming chunked-move_pages() support much easier. Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:33 -07:00
Nathan Fontenot	de7f0cba96	memory hotplug: release memory regions in PAGES_PER_SECTION chunks During hotplug memory remove, memory regions should be released on a PAGES_PER_SECTION size chunks. This mirrors the code in add_memory where resources are requested on a PAGES_PER_SECTION size. Attempting to release the entire memory region fails because there is not a single resource for the total number of pages being removed. Instead the resources for the pages are split in PAGES_PER_SECTION size chunks as requested during memory add. Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:32 -07:00
Andrea Righi	7a6560e025	documentation: clarify dirty_ratio and dirty_background_ratio description The current documentation of dirty_ratio and dirty_background_ratio is a bit misleading. In the documentation we say that they are "a percentage of total system memory", but the current page writeback policy, intead, is to apply the percentages to the dirtyable memory, that means free pages + reclaimable pages. Better to be more explicit to clarify this concept. Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:32 -07:00
Shaohua Li	9f1b16a51e	memory_probe: fix wrong sysfs file attribute This attribute just has a write operation. [akpm@linux-foundation.org: use S_IWUSR as suggested by Randy] Signed-off-by: Shaohua Li <shaohua.li@intel.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:32 -07:00
Gerald Schaefer	1125b4e394	setup_per_zone_pages_min(): take zone->lock instead of zone->lru_lock This replaces zone->lru_lock in setup_per_zone_pages_min() with zone->lock. There seems to be no need for the lru_lock anymore, but there is a need for zone->lock instead, because that function may call move_freepages() via setup_zone_migrate_reserve(). Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:32 -07:00
KOSAKI Motohiro	4b2e38ad70	hugepage: support ZERO_PAGE() Presently hugepage doesn't use zero page at all because zero page is only used for coredumping and hugepage can't core dump. However we have now implemented hugepage coredumping. Therefore we should implement the zero page of hugepage. Implementation note: o Why do we only check VM_SHARED for zero page? normal page checked as .. static inline int use_zero_page(struct vm_area_struct *vma) { if (vma->vm_flags & (VM_LOCKED \| VM_SHARED)) return 0; return !vma->vm_ops \|\| !vma->vm_ops->fault; } First, hugepages are never mlock()ed. We aren't concerned with VM_LOCKED. Second, hugetlbfs is a pseudo filesystem, not a real filesystem and it doesn't have any file backing. Thus ops->fault checking is meaningless. o Why don't we use zero page if !pte. !pte indicate {pud, pmd} doesn't exist or some error happened. So we shouldn't return zero page if any error occurred. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Kawai Hidehiro <hidehiro.kawai.ez@hitachi.com> Cc: Mel Gorman <mel@skynet.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:32 -07:00
KOSAKI Motohiro	e575f111dc	coredump_filter: add hugepage dumping Presently hugepage's vma has a VM_RESERVED flag in order not to be swapped. But a VM_RESERVED vma isn't core dumped because this flag is often used for some kernel vmas (e.g. vmalloc, sound related). Thus hugepages are never dumped and it can't be debugged easily. Many developers want hugepages to be included into core-dump. However, We can't read generic VM_RESERVED area because this area is often IO mapping area. then these area reading may change device state. it is definitly undesiable side-effect. So adding a hugepage specific bit to the coredump filter is better. It will be able to hugepage core dumping and doesn't cause any side-effect to any i/o devices. In additional, libhugetlb use hugetlb private mapping pages as anonymous page. Then, hugepage private mapping pages should be core dumped by default. Then, /proc/[pid]/core_dump_filter has two new bits. - bit 5 mean hugetlb private mapping pages are dumped or not. (default: yes) - bit 6 mean hugetlb shared mapping pages are dumped or not. (default: no) I tested by following method. % ulimit -c unlimited % ./crash_hugepage 50 % ./crash_hugepage 50 -p % ls -lh % gdb ./crash_hugepage core % % echo 0x43 > /proc/self/coredump_filter % ./crash_hugepage 50 % ./crash_hugepage 50 -p % ls -lh % gdb ./crash_hugepage core #include <stdlib.h> #include <stdio.h> #include <unistd.h> #include <sys/mman.h> #include <string.h> #include "hugetlbfs.h" int main(int argc, char** argv){ char* p; int ch; int mmap_flags = MAP_SHARED; int fd; int nr_pages; while((ch = getopt(argc, argv, "p")) != -1) { switch (ch) { case 'p': mmap_flags &= ~MAP_SHARED; mmap_flags \|= MAP_PRIVATE; break; default: /* nothing/ break; } } argc -= optind; argv += optind; if (argc == 0){ printf("need # of pages\n"); exit(1); } nr_pages = atoi(argv[0]); if (nr_pages < 2) { printf("nr_pages must >2\n"); exit(1); } fd = hugetlbfs_unlinked_fd(); p = mmap(NULL, nr_pages gethugepagesize(), PROT_READ\|PROT_WRITE, mmap_flags, fd, 0); sleep(2); (p + gethugepagesize()) = 1; / COW / sleep(2); / crash! / (int*)0 = 1; return 0; } Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Kawai Hidehiro <hidehiro.kawai.ez@hitachi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: William Irwin <wli@holomorphy.com> Cc: Adam Litke <agl@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:52:32 -07:00

1 2 3 4 5 ...

115969 Commits