After a newly plugged CPU sets the cpu_online bit it enables
interrupts and goes idle. The cpu which brought up the new cpu waits
for the cpu_online bit and when it observes it, it sets the cpu_active
bit for this cpu. The cpu_active bit is the relevant one for the
scheduler to consider the cpu as a viable target.
With forced threaded interrupt handlers which imply forced threaded
softirqs we observed the following race:
cpu 0 cpu 1
bringup(cpu1);
set_cpu_online(smp_processor_id(), true);
local_irq_enable();
while (!cpu_online(cpu1));
timer_interrupt()
-> wake_up(softirq_thread_cpu1);
-> enqueue_on(softirq_thread_cpu1, cpu0);
^^^^
cpu_notify(CPU_ONLINE, cpu1);
-> sched_cpu_active(cpu1)
-> set_cpu_active((cpu1, true);
When an interrupt happens before the cpu_active bit is set by the cpu
which brought up the newly onlined cpu, then the scheduler refuses to
enqueue the woken thread which is bound to that newly onlined cpu on
that newly onlined cpu due to the not yet set cpu_active bit and
selects a fallback runqueue. Not really an expected and desirable
behaviour.
So far this has only been observed with forced hard/softirq threading,
but in theory this could happen without forced threaded hard/softirqs
as well. It's probably unobservable as it would take a massive
interrupt storm on the newly onlined cpu which causes the softirq loop
to wake up the softirq thread and an even longer delay of the cpu
which waits for the cpu_online bit.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@kernel.org # 2.6.39
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86/amd-iommu: Fix boot crash with hidden PCI devices
x86/amd-iommu: Use only per-device dma_ops
x86/amd-iommu: Fix 3 possible endless loops
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/nv40: fall back to paged dma object for the moment
drm/nouveau: fix leak of gart mm node
drm/nouveau: fix vram page mapping when crossing page table boundaries
drm/nv17-nv40: Fix modesetting failure when pitch == 4096px (fdo bug 35901).
drm/nouveau: don't create accel engine objects when noaccel=1
drm/nvc0: recognise 0xdX chipsets as NV_C0
drm/i915: Add a no lvds quirk for the Asus EeeBox PC EB1007
drm/i915: Share the common force-audio property between connectors
drm/i915: Remove unused enum "chip_family"
drm/915: fix relaxed tiling on gen2: tile height
drm/i915/crt: Explicitly return false if connected to a digital monitor
drm/i915: Replace ironlake_compute_wm0 with g4x_compute_wm0
drm/i915: Only print out the actual number of fences for i915_error_state
drm/i915: s/addr & ~PAGE_MASK/offset_in_page(addr)/
drm: i915: correct return status in intel_hdmi_mode_valid()
drm/i915: fix regression after clock gating init split
drm/i915: fix if statement in ivybridge irq handler
* 'drm-radeon-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms/atom: fix PHY init
drm/radeon/kms: add missing Evergreen texture formats to the CS parser
drm/radeon/kms: viewport height has to be even
drm/radeon/kms: remove duplicate reg from r600 safe regs
drm/radeon/kms: add support for Llano Fusion APUs
drm/radeon/kms: add llano pci ids
drm/radeon/kms: fill in asic struct for llano
drm/radeon/kms: add family ids for llano APUs
drm/radeon: fix oops in ttm reserve when pageflipping (v2)
drm/radeon/kms: clean up the radeon kms Kconfig
drm/radeon/kms: fix thermal sensor reading on juniper
drm/radeon/kms: add missing case for cayman thermal sensor
drm/radeon/kms: add blit support for cayman (v2)
drm/radeon/kms/blit: workaround some hw issues on evergreen+
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
timers: Consider slack value in mod_timer()
clockevents: Handle empty cpumask gracefully
* 'kvm-updates/3.0' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Initialize kvm before registering the mmu notifier
KVM: x86: use proper port value when checking io instruction permission
KVM: add missing void __user * cast to access_ok() call
We had to say goodbye when David passed away recently. David had a
huge impact on our community, both personally in the lives of the
people he worked with, and technically in the design and maintenance
of several subsystems. He is greatly missed.
He also leaves behind a number of much loved subsystems now orphaned.
This patch updates the MAINTAINERS file for the areas that David was
responsible for and adds an entry for him to the CREDITS file.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Harry Wei <harryxiyou@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
_sdata needs to be declared in the linker script now as of commit
a2d063ac21 ("extable, core_kernel_data(): Make sure all archs define
_sdata")
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
die_if_no_fixup() shouldn't use get_user() as it doesn't call set_fs() to
indicate that it wants to probe a kernel address. Instead it should use
probe_kernel_read().
This fixes the problem of gdb seeing SIGILL rather than SIGTRAP when hitting
the KGDB special breakpoint upon SysRq+g being seen. The problem was that
die_if_no_fixup() was failing to read the opcode of the instruction that caused
the exception, and thus not fixing up the exception.
This caused gdb to get a S04 response to the $? request in its remote protocol
rather than S05 - which would then cause it to continue with $C04 rather than
$c in an attempt to pass the signal onto the inferior process. The kernel,
however, does not support $Cnn, and so objects by returning an E22 response,
indicating an error. gdb does not expect this and prints:
warning: Remote failure reply: E22
and then returns to the gdb command prompt unable to continue.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One of the kernel debugger cacheflush variants escaped proper testing. Two of
the labels are wrong, being derived from the code that was copied to construct
the variant.
The first label results in the following assembler message:
AS arch/mn10300/mm/cache-dbg-flush-by-reg.o
arch/mn10300/mm/cache-dbg-flush-by-reg.S: Assembler messages:
arch/mn10300/mm/cache-dbg-flush-by-reg.S:123: Error: symbol `debugger_local_cache_flushinv_no_dcache' is already defined
And the second label results in the following linker message:
arch/mn10300/mm/built-in.o:(.text+0x1d39): undefined reference to `mn10300_local_icache_inv_range_reg_end'
arch/mn10300/mm/built-in.o:(.text+0x1d39): relocation truncated to fit: R_MN10300_PCREL16 against undefined symbol `mn10300_local_icache_inv_range_reg_end'
To test this file the following configuration pieces must be set:
CONFIG_AM34=y
CONFIG_MN10300_CACHE_WBACK=y
CONFIG_MN10300_DEBUGGER_CACHE_FLUSH_BY_REG=y
CONFIG_MN10300_CACHE_MANAGE_BY_REG=y
CONFIG_AM34_HAS_CACHE_SNOOP=n
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6: (21 commits)
ARM: OMAP4: MMC: increase delay for pbias
arm: omap2plus: move NAND_BLOCK_SIZE out of boards
omap4: hwmod: Enable the keypad
omap3: Free Beagle rev gpios when they are read, so others can read them later
arm: omap3: beagle: Ensure msecure is mux'd to be able to set the RTC
omap: rx51: Don't power up speaker amplifier at bootup
omap: rx51: Set regulator V28_A always on
ARM: OMAP4: MMC: no regulator off during probe for eMMC
arm: omap2plus: fix ads7846 pendown gpio request
ARM: OMAP2: Add missing iounmap in omap4430_phy_init
ARM: omap4: Pass core and wakeup mux tables to omap4_mux_init
ARM: omap2+: mux: Allow board mux settings to be NULL
OMAP4: fix return value of omap4_l3_init
OMAP: iovmm: fix SW flags passed by user
arch/arm/mach-omap1/dma.c: Invert calls to platform_device_put and platform_device_del
OMAP2+: mux: fix compilation warnings
OMAP: SRAM: Fix warning: format '%08lx' expects type 'long unsigned int'
arm: omap3: cm-t3517: fix section mismatch warning
OMAP2+: Fix 9 section mismatch(es) warnings from mach-omap2/built-in.o
ARM: OMAP2: Add missing include of linux/gpio.h
...
* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
vfs: make unlink() and rmdir() return ENOENT in preference to EROFS
lmLogOpen() broken failure exit
usb: remove bad dput after dentry_unhash
more conservative S_NOSEC handling
If user space attempts to remove a non-existent file or directory, and
the file system is mounted read-only, return ENOENT instead of EROFS.
Either error code is arguably valid/correct, but ENOENT is a more
specific error message.
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Callers of lmLogOpen() expect it to return -E... on failure exits, which
is what it returns, except for the case of blkdev_get_by_dev() failure.
It that case lmLogOpen() return the error with the wrong sign...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Sergey reported a CONFIG_PROVE_RCU warning in push_rt_task where
set_task_cpu() was called with both relevant rq->locks held, which
should be sufficient for running tasks since holding its rq->lock
will serialize against sched_move_task().
Update the comments and fix the task_group() lockdep test.
Reported-and-tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1307115427.2353.3456.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The main lock_is_held() user is lockdep_assert_held(), avoid false
assertions in lockdep_off() sections by unconditionally reporting the
lock is taken.
[ the reason this is important is a lockdep_assert_held() in ttwu()
which triggers a warning under lockdep_off() as in printk() which
can trigger another wakeup and lock up due to spinlock
recursion, as reported and heroically debugged by Arne Jansen ]
Reported-and-tested-by: Arne Jansen <lists@die-jansens.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1307398759.2497.966.camel@laptop
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some PCIe cards ship with a PCI-PCIe bridge which is not
visible as a PCI device in Linux. But the device-id of the
bridge is present in the IOMMU tables which causes a boot
crash in the IOMMU driver.
This patch fixes by removing these cards from the IOMMU
handling. This is a pure -stable fix, a real fix to handle
this situation appriatly will follow for the next merge
window.
Cc: stable@kernel.org # > 2.6.32
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* 'nouveau/drm-nouveau-fixes' of /ssd/git/drm-nouveau-next:
drm/nv40: fall back to paged dma object for the moment
drm/nouveau: fix leak of gart mm node
drm/nouveau: fix vram page mapping when crossing page table boundaries
drm/nv17-nv40: Fix modesetting failure when pitch == 4096px (fdo bug 35901).
drm/nouveau: don't create accel engine objects when noaccel=1
drm/nvc0: recognise 0xdX chipsets as NV_C0
* 'keithp/drm-intel-fixes' of /ssd/git/drm-next:
drm/i915: Add a no lvds quirk for the Asus EeeBox PC EB1007
drm/i915: Share the common force-audio property between connectors
drm/i915: Remove unused enum "chip_family"
drm/915: fix relaxed tiling on gen2: tile height
drm/i915/crt: Explicitly return false if connected to a digital monitor
drm/i915: Replace ironlake_compute_wm0 with g4x_compute_wm0
drm/i915: Only print out the actual number of fences for i915_error_state
drm/i915: s/addr & ~PAGE_MASK/offset_in_page(addr)/
drm: i915: correct return status in intel_hdmi_mode_valid()
drm/i915: fix regression after clock gating init split
drm/i915: fix if statement in ivybridge irq handler
PCI(E)GART isn't quite stable it seems, fall back to old method until I get
the time to sort it out properly.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Should hopefully get modesetting at least from this, it appears these are
GF119 chipsets. Accel will come eventually, once I order a board.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Unfortunatly there are systems where the AMD IOMMU does not
cover all devices. This breaks with the current driver as it
initializes the global dma_ops variable. This patch limits
the AMD IOMMU to the devices listed in the IVRS table fixing
DMA for devices not covered by the IOMMU.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
The driver contains several loops counting on an u16 value
where the exit-condition is checked against variables that
can have values up to 0xffff. In this case the loops will
never exit. This patch fixed 3 such loops.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
KVM is not available for 31 bit but the KVM defines cause warnings:
arch/s390/include/asm/pgtable.h: In function 'ptep_test_and_clear_user_dirty':
arch/s390/include/asm/pgtable.h:817: warning: integer constant is too large for 'unsigned long' type
arch/s390/include/asm/pgtable.h:818: warning: integer constant is too large for 'unsigned long' type
arch/s390/include/asm/pgtable.h: In function 'ptep_test_and_clear_user_young':
arch/s390/include/asm/pgtable.h:837: warning: integer constant is too large for 'unsigned long' type
arch/s390/include/asm/pgtable.h:838: warning: integer constant is too large for 'unsigned long' type
Add 31 bit versions of the KVM defines to remove the warnings.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Replace the s390 specific rcu page-table freeing code with the
generic variant. This requires to duplicate the definition for the
struct mmu_table_batch as s390 does not use the generic tlb flush
code.
While we are at it remove the restriction that page table fragments
can not be reused after a single fragment has been freed with rcu
and split out allocation and freeing of page tables with pgstes.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The qdio SBAL entry flag is made-up of four different values that are
independent of one another. Some of the bits are reserved by the
hardware and should not be changed by qdio. Currently all four values
are overwritten since the SBAL entry flag is defined as an u32.
Split the SBAL entry flag into four u8's as defined by the hardware
and don't touch the reserved bits.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
commit 9ff4cfb3fc ([S390] kvm-390: Let
kernel exit SIE instruction on work) fixed a problem of commit
commit cd3b70f5d4 ([S390] virtualization
aware cpu measurement) but uncovered another one.
If a kvm guest accesses guest real memory that doesnt exist, the
page fault handler calls the sie hook, which then rewrites
the return psw from sie_inst to either sie_exit or sie_reenter.
On return, the page fault handler will then detect the wrong access
as a kernel fault causing a kernel oops in sie_reenter or sie_exit.
We have to add these two addresses to the exception table to allow
graceful exits.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Since fb_info is now refcounted and thus may get freed at any time it
gets unregistered module unloading will try to unregister framebuffer
as stored in platform data on probe though this pointer may
be stale.
Cleanup platform data on framebuffer release.
CC: stable@kernel.org
Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Al Viro observes that in the hugetlb case, handle_mm_fault() may return
a value of the kind ENOSPC when its caller is expecting a value of the
kind VM_FAULT_SIGBUS: fix alloc_huge_page()'s failure returns.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: usb - turn off de-emphasis in s/pdif for cm6206
ALSA: asihpi: Use angle brackets for system includes
ALSA: fm801: add error handling if auto-detect fails
ALSA: hda - Check pin support EAPD in ad198x_power_eapd_write
ALSA: hda - Fix HP and Front pins of ad1988/ad1989 in ad198x_power_eapd()
ALSA: 6fire: Don't leak firmware in error path
ASoC: Fix wm_hubs input PGA ZC bits
ASoC: Fix dapm_is_shared_kcontrol so everything isn't shared
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/staging:
hwmon: (max6642): Better chip detection schema
hwmon: (coretemp) Further relax temperature range checks
hwmon: (coretemp) Fix TjMax detection for older CPUs
hwmon: (coretemp) Relax target temperature range check
hwmon: (max6642) Rename temp_fault sysfs attribute to temp2_fault
It doesn't make sense to ever see a half-initialized kvm structure on
mmu notifier callbacks. Previously, 85722cda changed the ordering to
ensure that the mmu_lock was initialized before mmu notifier
registration, but there is still a race where the mmu notifier could
come in and try accessing other portions of struct kvm before they are
intialized.
Solve this by moving the mmu notifier registration to occur after the
structure is completely initialized.
Google-Bug-Id: 452199
Signed-off-by: Mike Waychison <mikew@google.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Commit f6511935f4 moved the permission check for io instructions
to the ->check_perm callback. It failed to copy the port value from RDX
register for string and "in,out ax,dx" instructions.
Fix it by reading RDX register at decode stage when appropriate.
Fixes FC8.32 installation.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The PHY was not initialized correctly after
ac89af1e10 since
the function bailed early as an encoder was not
assigned. The encoder isn't necessary for PHY init
so just assign to 0 for init so that the table
is executed.
Reported-by: Ari Savolainen <ari.m.savolainen@gmail.com>
Tested-by: Ari Savolainen <ari.m.savolainen@gmail.com>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Currently, both the WM8903 and TPS6586x chips attempt to register with
gpiolib using the same GPIO numbers. This causes the audio driver to
fail to initialize.
To solve this, add a define to board-harmony.h for the TPS6586x, and make
board-harmony-power.c use this define, instead of directly referencing
TEGRA_NR_GPIOS.
This fixes a regression introduced by commit
6f168f2fa6.
ARM: tegra: harmony: initialize the TPS65862 PMIC
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Colin Cross <ccross@android.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (25 commits)
btrfs: fix uninitialized variable warning
btrfs: add helper for fs_info->closing
Btrfs: add mount -o inode_cache
btrfs: scrub: add explicit plugging
btrfs: use btrfs_ino to access inode number
Btrfs: don't save the inode cache if we are deleting this root
btrfs: false BUG_ON when degraded
Btrfs: don't save the inode cache in non-FS roots
Btrfs: make sure we don't overflow the free space cache crc page
Btrfs: fix uninit variable in the delayed inode code
btrfs: scrub: don't reuse bios and pages
Btrfs: leave spinning on lookup and map the leaf
Btrfs: check for duplicate entries in the free space cache
Btrfs: don't try to allocate from a block group that doesn't have enough space
Btrfs: don't always do readahead
Btrfs: try not to sleep as much when doing slow caching
Btrfs: kill BTRFS_I(inode)->block_group
Btrfs: don't look at the extent buffer level 3 times in a row
Btrfs: map the node block when looking for readahead targets
Btrfs: set range_start to the right start in count_range_bits
...