2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-19 18:53:52 +08:00
Commit Graph

71 Commits

Author SHA1 Message Date
Matt Fleming
65694c5aad x86/PCI: Map PCI setup data with ioremap() so it can be in highmem
f9a37be0f0 ("x86: Use PCI setup data") added support for using PCI ROM
images from setup_data.  This used phys_to_virt(), which is not valid for
highmem addresses, and can cause a crash when booting a 32-bit kernel via
the EFI boot stub.

pcibios_add_device() assumes that the physical addresses stored in
setup_data are accessible via the direct kernel mapping, and that calling
phys_to_virt() is valid.  This isn't guaranteed to be true on x86 where the
direct mapping range is much smaller than on x86-64.

Calling phys_to_virt() on a highmem address results in the following:

 BUG: unable to handle kernel paging request at 39a3c198
 IP: [<c262be0f>] pcibios_add_device+0x2f/0x90
 ...
 Call Trace:
  [<c2370c73>] pci_device_add+0xe3/0x130
  [<c274640b>] pci_scan_single_device+0x8b/0xb0
  [<c2370d08>] pci_scan_slot+0x48/0x100
  [<c2371904>] pci_scan_child_bus+0x24/0xc0
  [<c262a7b0>] pci_acpi_scan_root+0x2c0/0x490
  [<c23b7203>] acpi_pci_root_add+0x312/0x42f
  ...

The solution is to use ioremap() instead of phys_to_virt() to map the
setup data into the kernel address space.

[bhelgaas: changelog]
Tested-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: stable@vger.kernel.org	# v3.8+
2013-06-05 10:50:04 -06:00
Jiang Liu
89016506b6 x86/PCI: Implement pcibios_{add|remove}_bus() hooks
Implement pcibios_{add|remove}_bus() hooks for x86 platforms.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Myron Stowe <myron.stowe@redhat.com>
2013-04-12 15:38:25 -06:00
Linus Torvalds
556f12f602 PCI changes for the v3.9 merge window:
Host bridge hotplug
     - Major overhaul of ACPI host bridge add/start (Rafael Wysocki, Yinghai Lu)
     - Major overhaul of PCI/ACPI binding (Rafael Wysocki, Yinghai Lu)
     - Split out ACPI host bridge and ACPI PCI device hotplug (Yinghai Lu)
     - Stop caching _PRT and make independent of bus numbers (Yinghai Lu)
 
   PCI device hotplug
     - Clean up cpqphp dead code (Sasha Levin)
     - Disable ARI unless device and upstream bridge support it (Yijing Wang)
     - Initialize all hot-added devices (not functions 0-7) (Yijing Wang)
 
   Power management
     - Don't touch ASPM if disabled (Joe Lawrence)
     - Fix ASPM link state management (Myron Stowe)
 
   Miscellaneous
     - Fix PCI_EXP_FLAGS accessor (Alex Williamson)
     - Disable Bus Master in pci_device_shutdown (Konstantin Khlebnikov)
     - Document hotplug resource and MPS parameters (Yijing Wang)
     - Add accessor for PCIe capabilities (Myron Stowe)
     - Drop pciehp suspend/resume messages (Paul Bolle)
     - Make pci_slot built-in only (not a module) (Jiang Liu)
     - Remove unused PCI/ACPI bind ops (Jiang Liu)
     - Removed used pci_root_bus (Bjorn Helgaas)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJRKS3hAAoJEFmIoMA60/r8xxoP/j1CS4oCZAnBIVT9fKBkis+/
 CENcfHIUKj6J9iMfJEVvqBELvqaLqtpeNwAGMcGPxV7VuT3K1QumChfaTpRDP0HC
 VDRmrjcmfenEK+YPOG7acsrTyqk2wjpLOyu9MKRxtC5u7tF6376KQpkEFpO4haL4
 eUHTxfE76OkrPBSvx3+PUSf6jqrvrNbjX8K6HdDVVlm3sVAQKmYJU/Wphv2NPOqa
 CAMyCzEGybFjr8hDRwvWgr+06c718GMwQUbnrPdHXAe7lMNMrN/XVBmU9ABN3Aas
 icd3lrDs+yPObgcO/gT8+sAZErCtdJ9zuHGYHdYpRbIQj/5JT4TMk7tw/Bj7vKY9
 Mqmho9GR5YmYTRN9f1r+2n5AQ/KYWXJDrRNOnt5/ys5BOM3vwJ7WJ902zpSwtFQp
 nLX+oD/hLfzpnoIQGDuBAoAXp2Kam3XWRgVvG78buRNrPj+kUzimk14a8qQeY+CB
 El6UKuwi5Uv/qgs1gAqqjmZmsAkon2DnsRZa6Fl8NTkDlis7LY4gp9OU38ySFpB+
 PhCmRyCZmDDqTVtwj6XzR3nPQ5LBSbvsTfgMxYMIUSXHa06tyb2q5p4mEIas0OmU
 RKaP5xQqZuTgD8fbdYrx0xgSrn7JHt/j/X//Qs6unlLCWhlpm3LjJZKxyw2FwBGr
 o4Lci+PiBh3MowCrju9D
 =ER3b
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI changes from Bjorn Helgaas:
 "Host bridge hotplug
    - Major overhaul of ACPI host bridge add/start (Rafael Wysocki, Yinghai Lu)
    - Major overhaul of PCI/ACPI binding (Rafael Wysocki, Yinghai Lu)
    - Split out ACPI host bridge and ACPI PCI device hotplug (Yinghai Lu)
    - Stop caching _PRT and make independent of bus numbers (Yinghai Lu)

  PCI device hotplug
    - Clean up cpqphp dead code (Sasha Levin)
    - Disable ARI unless device and upstream bridge support it (Yijing Wang)
    - Initialize all hot-added devices (not functions 0-7) (Yijing Wang)

  Power management
    - Don't touch ASPM if disabled (Joe Lawrence)
    - Fix ASPM link state management (Myron Stowe)

  Miscellaneous
    - Fix PCI_EXP_FLAGS accessor (Alex Williamson)
    - Disable Bus Master in pci_device_shutdown (Konstantin Khlebnikov)
    - Document hotplug resource and MPS parameters (Yijing Wang)
    - Add accessor for PCIe capabilities (Myron Stowe)
    - Drop pciehp suspend/resume messages (Paul Bolle)
    - Make pci_slot built-in only (not a module) (Jiang Liu)
    - Remove unused PCI/ACPI bind ops (Jiang Liu)
    - Removed used pci_root_bus (Bjorn Helgaas)"

* tag 'pci-v3.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (51 commits)
  PCI/ACPI: Don't cache _PRT, and don't associate them with bus numbers
  PCI: Fix PCI Express Capability accessors for PCI_EXP_FLAGS
  ACPI / PCI: Make pci_slot built-in only, not a module
  PCI/PM: Clear state_saved during suspend
  PCI: Use atomic_inc_return() rather than atomic_add_return()
  PCI: Catch attempts to disable already-disabled devices
  PCI: Disable Bus Master unconditionally in pci_device_shutdown()
  PCI: acpiphp: Remove dead code for PCI host bridge hotplug
  PCI: acpiphp: Create companion ACPI devices before creating PCI devices
  PCI: Remove unused "rc" in virtfn_add_bus()
  PCI: pciehp: Drop suspend/resume ENTRY messages
  PCI/ASPM: Don't touch ASPM if forcibly disabled
  PCI/ASPM: Deallocate upstream link state even if device is not PCIe
  PCI: Document MPS parameters pci=pcie_bus_safe, pci=pcie_bus_perf, etc
  PCI: Document hpiosize= and hpmemsize= resource reservation parameters
  PCI: Use PCI Express Capability accessor
  PCI: Introduce accessor to retrieve PCIe Capabilities Register
  PCI: Put pci_dev in device tree as early as possible
  PCI: Skip attaching driver in device_add()
  PCI: acpiphp: Keep driver loaded even if no slots found
  ...
2013-02-25 21:18:18 -08:00
Greg Kroah-Hartman
a18e3690a5 X86: drivers: remove __dev* attributes.
CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
markings need to be removed.

This change removes the use of __devinit, __devexit_p, __devinitconst,
and __devexit from these drivers.

Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.

Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Drake <dsd@laptop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-03 15:57:04 -08:00
Bjorn Helgaas
b7869ba17c x86/PCI: Remove unused pci_root_bus
pci_root_bus is unused, so remove all references to it.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-01-03 15:28:34 -07:00
Myron Stowe
1278998f8f PCI: Work around Stratus ftServer broken PCIe hierarchy (fix DMI check)
Commit 284f5f9 was intended to disable the "only_one_child()" optimization
on Stratus ftServer systems, but its DMI check is wrong.  It looks for
DMI_SYS_VENDOR that contains "ftServer", when it should look for
DMI_SYS_VENDOR containing "Stratus" and DMI_PRODUCT_NAME containing
"ftServer".

Tested on Stratus ftServer 6400.

Reported-by: Fadeeva Marina <astarta@rat.ru>
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=51331
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org	# v3.5+
2012-12-26 10:39:23 -07:00
Bjorn Helgaas
1cb73f8c47 Merge branch 'pci/mjg-pci-roms-from-efi' into next
* pci/mjg-pci-roms-from-efi:
  PCI: Use phys_addr_t for physical ROM address
2012-12-10 16:20:12 -07:00
Bjorn Helgaas
dbd3fc3345 PCI: Use phys_addr_t for physical ROM address
Use phys_addr_t rather than "void *" for physical memory address.
This removes casts and fixes a "cast from pointer to integer of different
size" warning on ppc44x_defconfig.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-12-10 11:24:42 -07:00
Bjorn Helgaas
72e1e868ca Merge branch 'pci/mjg-pci-roms-from-efi' into next
* pci/mjg-pci-roms-from-efi:
  x86: Use PCI setup data
  PCI: Add support for non-BAR ROMs
  PCI: Add pcibios_add_device
  EFI: Stash ROMs if they're not in the PCI BAR
2012-12-06 14:37:32 -07:00
Matthew Garrett
f9a37be0f0 x86: Use PCI setup data
EFI can provide PCI ROMs out of band via boot services, which may not be
available after boot. Add support for using the data handed off to us by
the boot stub or bootloader.

[bhelgaas: added Seth's boot_params section mismatch fix]
[bhelgaas: drop "boot_params.hdr.version < 0x0209" test]
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
2012-12-05 14:38:26 -07:00
Taku Izumi
642c92da36 PCI: Don't pass pci_dev to pci_ext_cfg_avail()
pci_ext_cfg_avail() doesn't use the "struct pci_dev *" passed to
it, and there's no requirement that a host bridge even be represented
by a pci_dev.  This drops the pci_ext_cfg_avail() parameter.

[bhelgaas: changelog]
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-11-07 09:43:28 -07:00
Myron Stowe
15fa325beb x86/PCI: adjust section annotations for pcibios_setup()
Make pcibios_setup() consistently use the "__init" section annotation.

Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-07-05 15:09:14 -06:00
Bjorn Helgaas
0cbaa57d82 Merge branch 'topic/stratus' into next 2012-05-07 09:23:27 -06:00
Bjorn Helgaas
284f5f9dba PCI: work around Stratus ftServer broken PCIe hierarchy
A PCIe downstream port is a P2P bridge.  Its secondary interface is
a link that should lead only to device 0 (unless ARI is enabled)[1], so
we don't probe for non-zero device numbers.

Some Stratus ftServer systems have a PCIe downstream port (02:00.0) that
leads to both an upstream port (03:00.0) and a downstream port (03:01.0),
and 03:01.0 has important devices below it:

  [0000:02]-+-00.0-[03-3c]--+-00.0-[04-09]--...
                            \-01.0-[0a-0d]--+-[USB]
                                            +-[NIC]
                                            +-...

Previously, we didn't enumerate device 03:01.0, so USB and the network
didn't work.  This patch adds a DMI quirk to scan all device numbers,
not just 0, below a downstream port.

Based on a patch by Prarit Bhargava.

[1] PCIe spec r3.0, sec 7.3.1

CC: Myron Stowe <mstowe@redhat.com>
CC: Don Dutile <ddutile@redhat.com>
CC: James Paradis <james.paradis@stratus.com>
CC: Matthew Wilcox <matthew.r.wilcox@intel.com>
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
CC: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-04-30 15:21:02 -06:00
Yinghai Lu
c57ca65a6e x86/PCI: merge pcibios_scan_root() and pci_scan_bus_on_node()
pcibios_scan_root() and pci_scan_bus_on_node() were almost identical,
so this patch merges them.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-04-30 14:52:43 -06:00
Bjorn Helgaas
2cd6975a4f x86/PCI: convert to pci_create_root_bus() and pci_scan_root_bus()
x86 has two kinds of PCI root bus scanning:

(1) ACPI-based, using _CRS resources.  This used pci_create_bus(), not
    pci_scan_bus(), because ACPI hotplug needed to split the
    pci_bus_add_devices() into a separate host bridge .start() method.

    This patch parses the _CRS resources earlier, so we can build a list of
    resources and pass it to pci_create_root_bus().

    Note that as before, we parse the _CRS even if we aren't going to use
    it so we can print it for debugging purposes.

(2) All other, which used either default resources (ioport_resource and
    iomem_resource) or information read from the hardware via amd_bus.c or
    similar.  This used pci_scan_bus().

    This patch converts x86_pci_root_bus_res_quirks() (previously called
    from pcibios_fixup_bus()) to x86_pci_root_bus_resources(), which builds
    a list of resources before we call pci_scan_root_bus().

    We also use x86_pci_root_bus_resources() if we have ACPI but are
    ignoring _CRS.

CC: Yinghai Lu <yinghai.lu@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2012-01-06 12:11:14 -08:00
Bjorn Helgaas
46fbade05c x86/PCI: use pci_scan_bus() instead of pci_scan_bus_parented()
This doesn't change any functionality, but it makes a subsequent patch
slightly simpler.

pci_scan_bus(NULL, ...) and pci_scan_bus_parented() are identical except
that pci_scan_bus() also calls pci_bus_add_devices():

  pci_scan_bus_parented
    pci_create_bus
    pci_scan_child_bus

  pci_scan_bus
    pci_create_bus
    pci_scan_child_bus
    pci_bus_add_devices

All callers of pcibios_scan_root() call pci_bus_add_devices() explicitly,
and we don't pass a parent device, so we might as well use pci_scan_bus().

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2012-01-06 12:11:13 -08:00
Jan Beulich
72da0b07b1 x86: constify PCI raw ops structures
As with any other such change, the goal is to prevent inadvertent
writes to these structures (assuming DEBUG_RODATA is enabled), and to
separate data (possibly frequently) written to from such never getting
modified.

Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2011-10-14 09:05:28 -07:00
Narendra_K@Dell.com
9b373ed18f x86/PCI: Preserve existing pci=bfsort whitelist for Dell systems
Commit 6e8af08dfa enables pci=bfsort on
future Dell systems. But the identification string 'Dell System' matches
on already existing whitelist, which do not have SMBIOS type 0xB1,
causing pci=bfsort not being set on existing whitelist.

This patch fixes the regression by moving the type 0xB1 check beyond the
existing whitelist so that existing whitelist is walked before.

Signed-off-by: Shyam Iyer <shyam_iyer@dell.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2011-07-22 08:44:28 -07:00
Narendra_K@Dell.com
6e8af08dfa PCI: enable pci=bfsort by default on future Dell systems
This patch enables pci=bfsort by default on future Dell systems.
It reads SMBIOS type 0xB1 vendor specific record and sets pci=bfsort
accordingly.

Offset  Name    Length  Value   Description

04      Flags0  Word    Varies  Bits 9-10
                                - 10:9 = 00  Unknown
                                - 10:9 = 01  Breadth First
                                - 10:9 = 10  Depth First
                                - 10:9 = 11  Reserved

1. Any time pci=bfsort has to be enabled on a system, we need to add the
   model number of the system to the white list. With this patch, that
   is not required.

2. Typically, model number has to be added to the white list when the
   system is under development. With this change, that is not required.

Signed-off-by: Jordan Hargrave <jordan_hargrave@dell.com>
Signed-off-by: Narendra K <narendra_k@dell.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2011-01-14 08:55:41 -08:00
Alex Nixon
44de3395a4 x86/PCI: Clean up pci_cache_line_size
Separate out x86 cache_line_size initialisation code into its own
function (so it can be shared by Xen later in this patch series)

[ Impact: cleanup ]

Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: x86@kernel.org
2010-10-18 10:49:30 -04:00
Mike Habeck
7bd1c365fd x86/PCI: Add option to not assign BAR's if not already assigned
The Linux kernel assigns BARs that a BIOS did not assign, most likely
to handle broken BIOSes that didn't enumerate the devices correctly.
On UV the BIOS purposely doesn't assign I/O BARs for certain devices/
drivers we know don't use them (examples, LSI SAS, Qlogic FC, ...).
We purposely don't assign these I/O BARs because I/O Space is a very
limited resource.  There is only 64k of I/O Space, and in a PCIe
topology that space gets divided up into 4k chucks (this is due to
the fact that a pci-to-pci bridge's I/O decoder is aligned at 4k)...
Thus a system can have at most 16 cards with I/O BARs: (64k / 4k = 16)

SGI needs to scale to >16 devices with I/O BARs.  So by not assigning
I/O BARs on devices we know don't use them, we can do that (iff the
kernel doesn't go and assign these BARs that the BIOS purposely didn't
assign).

This patch will not assign a resource to a device BAR if that BAR was
not assigned by the BIOS, and the kernel cmdline option 'pci=nobar'
was specified.   This patch is closely modeled after the 'pci=norom'
option that currently exists in the tree.

Signed-off-by: Mike Habeck <habeck@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-07-30 09:29:12 -07:00
Thomas Gleixner
d19f61f098 x86/PCI: Convert pci_config_lock to raw_spinlock
pci_config_lock must be a real spinlock in preempt-rt. Convert it to
raw_spinlock. No change for !RT kernels.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-05-11 12:01:09 -07:00
Tejun Heo
5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Linus Torvalds
322aafa664 Merge branch 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits)
  x86, mrst: Fix whitespace breakage in apb_timer.c
  x86, mrst: Fix APB timer per cpu clockevent
  x86, mrst: Remove X86_MRST dependency on PCI_IOAPIC
  x86, olpc: Use pci subarch init for OLPC
  x86, pci: Add arch_init to x86_init abstraction
  x86, mrst: Add Kconfig dependencies for Moorestown
  x86, pci: Exclude Moorestown PCI code if CONFIG_X86_MRST=n
  x86, numaq: Make CONFIG_X86_NUMAQ depend on CONFIG_PCI
  x86, pci: Add sanity check for PCI fixed bar probing
  x86, legacy_irq: Remove duplicate vector assigment
  x86, legacy_irq: Remove left over nr_legacy_irqs
  x86, mrst: Platform clock setup code
  x86, apbt: Moorestown APB system timer driver
  x86, mrst: Add vrtc platform data setup code
  x86, mrst: Add platform timer info parsing code
  x86, mrst: Fill in PCI functions in x86_init layer
  x86, mrst: Add dummy legacy pic to platform setup
  x86/PCI: Moorestown PCI support
  x86, ioapic: Add dummy ioapic functions
  x86, ioapic: Early enable ioapic for timer irq
  ...

Fixed up semantic conflict of new clocksources due to commit
17622339af ("clocksource: add argument to resume callback").
2010-03-07 15:59:39 -08:00
Bjorn Helgaas
7bc5e3f2be x86/PCI: use host bridge _CRS info by default on 2008 and newer machines
The main benefit of using ACPI host bridge window information is that
we can do better resource allocation in systems with multiple host bridges,
e.g., http://bugzilla.kernel.org/show_bug.cgi?id=14183

Sometimes we need _CRS information even if we only have one host bridge,
e.g., https://bugs.launchpad.net/ubuntu/+source/linux/+bug/341681

Most of these systems are relatively new, so this patch turns on
"pci=use_crs" only on machines with a BIOS date of 2008 or newer.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-23 09:43:42 -08:00
Thomas Gleixner
b72d0db9dd x86: Move pci init function to x86_init
The PCI initialization in pci_subsys_init() is a mess. pci_numaq_init,
pci_acpi_init, pci_visws_init and pci_legacy_init are called and each
implementation checks and eventually modifies the global variable
pcibios_scanned.

x86_init functions allow us to do this more elegant. The pci.init
function pointer is preset to pci_legacy_init. numaq, acpi and visws
can modify the pointer in their early setup functions. The functions
return 0 when they did the full initialization including bus scan. A
non zero return value indicates that pci_legacy_init needs to be
called either because the selected function failed or wants the
generic bus scan in pci_legacy_init to happen (e.g. visws).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80CFE@orsmsx508.amr.corp.intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-19 16:12:29 -08:00
Dave Jones
76b1a87b21 x86/PCI: Use generic cacheline sizing instead of per-vendor tests.
Instead of the PCI code needing to have code to determine the
cacheline size of each processor, use the data the cpu identification
code should have already determined during early boot.

(The vendor checks are also incomplete, and don't take into account
 modern CPUs)

I've been carrying a variant of this code in Fedora for a while,
that prints debug information.  There are a number of cases where we
are currently setting the PCI cacheline size to 32 bytes, when the CPU
cacheline size is 64 bytes.  With this patch, we set them both the same.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 08:47:12 -08:00
Jesse Barnes
ac1aa47b13 PCI: determine CLS more intelligently
Till now, CLS has been determined either by arch code or as
L1_CACHE_BYTES.  Only x86 and ia64 set CLS explicitly and x86 doesn't
always get it right.  On most configurations, the chance is that
firmware configures the correct value during boot.

This patch makes pci_init() determine CLS by looking at what firmware
has configured.  It scans all devices and if all non-zero values
agree, the value is used.  If none is configured or there is a
disagreement, pci_dfl_cache_line_size is used.  arch can set the dfl
value (via PCI_CACHE_LINE_BYTES or pci_dfl_cache_line_size) or
override the actual one.

ia64, x86 and sparc64 updated to set the default cls instead of the
actual one.

While at it, declare pci_cache_line_size and pci_dfl_cache_line_size
in pci.h and drop private declarations from arch code.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Miller <davem@davemloft.net>
Acked-by: Greg KH <gregkh@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-11-04 08:47:10 -08:00
Jesse Barnes
76baeebf7d x86/PCI: make 32 bit NUMA node array int, not unsigned char
We use -1 to indicate no node affinity, so we need a signed type here or
all sorts of bad things happen, like crashes in dev_attr_show as
reported by Ingo:

[  158.058140] warning: `dbus-daemon' uses 32-bit capabilities (legacy support in use)
[  159.370562] BUG: unable to handle kernel NULL pointer dereference at (null)
[  159.372694] IP: [<ffffffff8143b722>] bitmap_scnprintf+0x72/0xd0
[  159.372694] PGD 71d3e067 PUD 7052e067 PMD 0
[  159.372694] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[  159.372694] last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/local_cpus
[  159.372694] CPU 0
[  159.372694] Pid: 7364, comm: irqbalance Not tainted 2.6.31-tip #8043 System Product Name
[  159.372694] RIP: 0010:[<ffffffff8143b722>]  [<ffffffff8143b722>] bitmap_scnprintf+0x72/0xd0
[  159.372694] RSP: 0018:ffff8800712a1e38  EFLAGS: 00010246
[  159.372694] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  159.372694] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff880077dc5000
[  159.372694] RBP: ffff8800712a1e68 R08: 0000000000000001 R09: 0000000000000001
[  159.372694] R10: ffffffff8215c47c R11: 0000000000000000 R12: 0000000000000000
[  159.372694] R13: 0000000000000000 R14: 0000000000000ffe R15: ffff880077dc5000
[  159.372694] FS:  00007f5f578f76f0(0000) GS:ffff880007000000(0000) knlGS:0000000000000000
[  159.372694] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  159.372694] CR2: 0000000000000000 CR3: 0000000071a77000 CR4: 00000000000006f0
[  159.372694] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  159.372694] DR3: ffffffff835109dc DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  159.372694] Process irqbalance (pid: 7364, threadinfo ffff8800712a0000, task ffff880070773000)
[  159.372694] Stack:
[  159.372694]  2222222222222222 ffff880077dc5000 fffffffffffffffb ffff88007d366b40
[  159.372694] <0> ffff8800712a1f48 ffff88007d3840a0 ffff8800712a1e88 ffffffff8146332b
[  159.372694] <0> fffffffffffffff4 ffffffff82450718 ffff8800712a1ea8 ffffffff815a9a1f
[  159.372694] Call Trace:
[  159.372694]  [<ffffffff8146332b>] local_cpus_show+0x3b/0x60
[  159.372694]  [<ffffffff815a9a1f>] dev_attr_show+0x2f/0x60
[  159.372694]  [<ffffffff8118ee6f>] sysfs_read_file+0xbf/0x1d0
[  159.372694]  [<ffffffff8112afe9>] vfs_read+0xc9/0x180
[  159.372694]  [<ffffffff8112c365>] sys_read+0x55/0x90
[  159.372694]  [<ffffffff810114f2>] system_call_fastpath+0x16/0x1b
[  159.372694] Code: 41 b9 01 00 00 00 44 8d 46 03 49 63 fc 0f 49 d3 c1 f8 1f 4c 01 ff c1 e8 1a c1 fa 06 41 c1 e8 02 8d 0c 03 48 63 d2 83 e1 3f 29 c1 <49> 8b 44 d5 00 48 c7 c2 8c 37 16 82 48 d3 e8 89 f1 44 89 f6 49
[  159.372694] RIP  [<ffffffff8143b722>] bitmap_scnprintf+0x72/0xd0
[  159.372694]  RSP <ffff8800712a1e38>
[  159.372694] CR2: 0000000000000000
[  159.600828] ---[ end trace 35550c356e84e60c ]---

Reported-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-18 09:13:57 -07:00
Jesse Barnes
2547089ca2 x86/PCI: initialize PCI bus node numbers early
The current mp_bus_to_node array is initialized only by AMD specific
code, since AMD platforms have registers that can be used for
determining mode numbers.  On new Intel platforms it's necessary to
initialize this array as well though, otherwise all PCI node numbers
will be 0, when in fact they should be -1 (indicating that I/O isn't
tied to any particular node).

So move the mp_bus_to_node code into the common PCI code, and
initialize it early with a default value of -1.  This may be overridden
later by arch code (e.g. the AMD code).

With this change, PCI consistent memory and other node specific
allocations (e.g. skbuff allocs) should occur on the "current" node.
If, for performance reasons, applications want to be bound to specific
nodes, they should open their devices only after being pinned to the
CPU where they'll run, for maximum locality.

Acked-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Jesse Brandeburg <jesse.brandeburg@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-09-09 13:29:21 -07:00
Linus Torvalds
236e946b53 Revert "PCI: use ACPI _CRS data by default"
This reverts commit 9e9f46c44e.

Quoting from the commit message:

 "At this point, it seems to solve more problems than it causes, so let's
  try using it by default.  It's an easy revert if it ends up causing
  trouble."

And guess what? The _CRS code causes trouble.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-24 16:23:03 -07:00
Jesse Barnes
9e9f46c44e PCI: use ACPI _CRS data by default
At this point, it seems to solve more problems than it causes, so let's try using it by default.  It's an easy revert if it ends up causing trouble.

Reviewed-by: Yinghai Lu <yhlu.kernel@gmail.com>
Acked-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-06-11 12:04:17 -07:00
Yinghai Lu
0e94ecd098 x86/PCI: set_pci_bus_resources_arch_default cleanups
Rename set_pci_bus_resources_arch_default to x86_pci_root_bus_res_quirks, move
the weak version from common.c to i386.c, and before calling, make sure it's a
root bus.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-04-22 14:47:46 -07:00
Matthew Wilcox
0bb1be3e30 x86/PCI: Move set_pci_bus_resources_arch_default into arch/x86
Commit 30a18d6c3f introduced a new
function to set the PCI bus resources.  Unfortunately, neither the
author, nor the committers seemed to know that we already have somewhere
to do that -- pcibios_fixup_bus().  This patch moves the hook (used only
by the K8 code) into x86-specific code where it should have been in the
first place.

Cc: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-04-22 13:57:36 -07:00
Jan Beulich
821508d4ef x86: move a few device initialization objects into .devinit.rodata
Impact: debuggability and micro-optimization

Putting whatever is possible into the (final) .rodata section increases
the likelihood of catching memory corruption bugs early, and reduces
false cache line sharing.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49B909A5.76E4.0078.0@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-12 13:12:19 +01:00
Rafael J. Wysocki
16cf0ebc35 x86/PCI: Do not use interrupt links for devices using MSI-X
pcibios_enable_device() and pcibios_disable_device() don't handle
IRQs for devices that have MSI enabled and it should treat the
devices with MSI-X enabled in the same way.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:13:25 -08:00
Andrew Patterson
0ef5f8f615 ACPI/PCI: PCI extended config _OSC support called when root bridge added
The _OSC capability OSC_EXT_PCI_CONFIG_SUPPORT is set when the root
bridge is added with pci_acpi_osc_support() if we can access PCI
extended config space.

This adds the function pci_ext_cfg_avail which returns true if we can
access PCI extended config space (offset greater than 0xff). It
currently only returns false if arch=x86 and raw_pci_ext_ops is not set
(which might happen if pci=nommcfg is set on the kernel command-line).

Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:28 -08:00
Jaswinder Singh Rajput
824877111c x86, pci: move arch/x86/pci/pci.h to arch/x86/include/asm/pci_x86.h
Impact: cleanup

Now that arch/x86/pci/pci.h is used in a number of other places as well,
move the lowlevel x86 pci definitions into the architecture include files.
(not to be confused with the existing arch/x86/include/asm/pci.h file,
which provides public details about x86 PCI)

Tested on: X86_32_UP, X86_32_SMP and X86_64_SMP

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-29 18:17:36 +01:00
Stefan Assmann
41b9eb264c x86, pci: introduce config option for pci reroute quirks (was: [PATCH 0/3] Boot IRQ quirks for Broadcom and AMD/ATI)
This is against linux-2.6-tip, branch pci-ioapic-boot-irq-quirks.

From: Stefan Assmann <sassmann@suse.de>
Subject: Introduce config option for pci reroute quirks

The config option X86_REROUTE_FOR_BROKEN_BOOT_IRQS is introduced to
enable (or disable) the redirection of the interrupt handler to the boot
interrupt line by default. Depending on the existence of interrupt
masking / threaded interrupt handling in the kernel (vanilla, rt, ...)
and the maturity of the rerouting patch, users can enable or disable the
redirection by default.

This means that the reroute quirk can be applied to any kernel without
changing it.

Interrupt sharing could be increased if this option is enabled. However this
option is vital for threaded interrupt handling, as done by the RT kernel.
It should simplify the consolidation with the RT kernel.

The option can be overridden by either pci=ioapicreroute or
pci=noioapicreroute.

Signed-off-by: Stefan Assmann <sassmann@suse.de>
Signed-off-by: Olaf Dabrunz <od@suse.de>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Jon Masters <jonathan@jonmasters.org>
Cc: Ihno Krumreich <ihno@suse.de>
Cc: Sven Dietrich <sdietrich@suse.de>
Cc: Daniel Gollub <dgollub@suse.de>
Cc: Felix Foerster <ffoerster@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 19:31:19 +02:00
Ingo Molnar
3e370b29d3 Merge branch 'linus' into x86/pci-ioapic-boot-irq-quirks
Conflicts:

	drivers/pci/quirks.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 19:31:12 +02:00
Linus Torvalds
dc7c65db28 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (72 commits)
  Revert "x86/PCI: ACPI based PCI gap calculation"
  PCI: remove unnecessary volatile in PCIe hotplug struct controller
  x86/PCI: ACPI based PCI gap calculation
  PCI: include linux/pm_wakeup.h for device_set_wakeup_capable
  PCI PM: Fix pci_prepare_to_sleep
  x86/PCI: Fix PCI config space for domains > 0
  Fix acpi_pm_device_sleep_wake() by providing a stub for CONFIG_PM_SLEEP=n
  PCI: Simplify PCI device PM code
  PCI PM: Introduce pci_prepare_to_sleep and pci_back_from_sleep
  PCI ACPI: Rework PCI handling of wake-up
  ACPI: Introduce new device wakeup flag 'prepared'
  ACPI: Introduce acpi_device_sleep_wake function
  PCI: rework pci_set_power_state function to call platform first
  PCI: Introduce platform_pci_power_manageable function
  ACPI: Introduce acpi_bus_power_manageable function
  PCI: make pci_name use dev_name
  PCI: handle pci_name() being const
  PCI: add stub for pci_set_consistent_dma_mask()
  PCI: remove unused arch pcibios_update_resource() functions
  PCI: fix pci_setup_device()'s sprinting into a const buffer
  ...

Fixed up conflicts in various files (arch/x86/kernel/setup_64.c,
arch/x86/pci/irq.c, arch/x86/pci/pci.h, drivers/acpi/sleep/main.c,
drivers/pci/pci.c, drivers/pci/pci.h, include/acpi/acpi_bus.h) from x86
and ACPI updates manually.
2008-07-16 17:25:46 -07:00
Matthew Wilcox
beef3129b3 x86/PCI: Fix PCI config space for domains > 0
John Keller reports that PCI config space access is broken on machines
with more than one domain.  conf1 accesses only work for domain 0, so make sure
we check the domain number in the raw routines before trying conf1.

Reported-by: John Keller <jpk@sgi.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-07-14 14:23:28 -07:00
Ingo Molnar
dbbcfb2211 Merge branch 'linus' into x86/pci-ioapic-boot-irq-quirks
Conflicts:

	arch/x86/mm/ioremap.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-13 22:52:27 +02:00
Robert Richter
8dd779b19c x86/pci: removing subsys_initcall ordering dependencies
So far subsys_initcalls has been executed in this order depending on
the object order in the Makefile:

arch/x86/pci/visws.c:subsys_initcall(pcibios_init);
arch/x86/pci/numa.c:subsys_initcall(pci_numa_init);
arch/x86/pci/acpi.c:subsys_initcall(pci_acpi_init);
arch/x86/pci/legacy.c:subsys_initcall(pci_legacy_init);
arch/x86/pci/irq.c:subsys_initcall(pcibios_irq_init);
arch/x86/pci/common.c:subsys_initcall(pcibios_init);

This patch removes the ordering dependency. There is now only one
subsys_initcall function that contains subsystem initialization code
with a defined order.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-09 11:45:03 +02:00
Stefan Assmann
9197979b51 x86, pci: introduce pci=ioapicreroute kernel cmdline option
Introduce pci=ioapicreroute kernel cmdline option to enable rerouting of boot
interrupts to the primary io-apic.

Signed-off-by: Stefan Assmann <sassmann@suse.de>
Signed-off-by: Olaf Dabrunz <od@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 17:50:51 +02:00
Stefan Assmann
a9322f6488 x86, pci: introduce pci=noioapicquirk kernel cmdline option
Introduce pci=noioapicquirk kernel cmdline option to disable all boot
interrupt quirks

Signed-off-by: Stefan Assmann <sassmann@suse.de>
Signed-off-by: Olaf Dabrunz <od@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08 17:50:49 +02:00
Jesse Barnes
739db07f82 Revert "PCI: Correct last two HP entries in the bfsort whitelist"
This reverts commit a167607255.  It duplicates
the change from 8d64c781f0 and only one should be
applied, otherwise some of the Dell quirks are lost.

Thanks to Tony Camuso for catching this.

Acked-by: Tony Camuso <tcamuso@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-07-07 11:31:53 -07:00
Yinghai Lu
e3f2baebf4 PCI/x86: early dump pci conf space v2
Allows us to dump PCI space before any kernel changes have been made.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-10 10:59:52 -07:00
Gary Hade
bb71ad8802 PCI: boot parameter to avoid expansion ROM memory allocation
Contention for scarce PCI memory resources has been growing
due to an increasing number of PCI slots in large multi-node
systems.  The kernel currently attempts by default to
allocate memory for all PCI expansion ROMs so there has
also been an increasing number of PCI memory allocation
failures seen on these systems.  This occurs because the
BIOS either (1) provides insufficient PCI memory resource
for all the expansion ROMs or (2) provides adequate PCI
memory resource for expansion ROMs but provides the
space in kernel unexpected BIOS assigned P2P non-prefetch
windows.

The resulting PCI memory allocation failures may be benign
when related to memory requests for expansion ROMs themselves
but in some cases they can occur when attempting to allocate
space for more critical BARs.  This can happen when a successful
expansion ROM allocation request consumes memory resource
that was intended for a non-ROM BAR.  We have seen this
happen during PCI hotplug of an adapter that contains a
P2P bridge where successful memory allocation for an
expansion ROM BAR on device behind the bridge consumed
memory that was intended for a non-ROM BAR on the P2P bridge.
In all cases the allocation failure messages can be very
confusing for users.

This patch provides a new 'pci=norom' kernel boot parameter
that can be used to disable the default PCI expansion ROM memory
resource allocation.  This provides a way to avoid the above
described issues on systems that do not contain PCI devices
for which drivers or user-level applications depend on the
default PCI expansion ROM memory resource allocation behavior.

Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-06-10 10:59:50 -07:00