The Alternate Routing-ID Interpretation capability allows a single device
to have up to 256 functions. They can be populated sparsely, so the
current technique of scanning every eighth function is not guaranteed
to find them all. By introducing a 'next_fn' function pointer, we can
use the linked list of functions in the ARI capability to scan all the
functions which exist.
We can then speed up the pci_scan_slot by skipping the scan of subsequent
devfns for PCIe devices which are the direct children of Root Ports or
Downstream Ports. These devices are only permitted to implement device
0, unless they are ARI devices, in which case they'll be scanned by the
ARI code above.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We are missing these when building the pci_dev from scratch off
the Open Firmware device-tree
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Commit ae21ee65e8 "PCI: acs p2p upsteram
forwarding enabling" doesn't actually enable ACS.
Add a function to pci core to allow an IOMMU to request that ACS
be enabled. The existing mechanism of using iommu_found() in the pci
core to know when ACS should be enabled doesn't actually work due to
initialization order; iommu has only been detected not initialized.
Have Intel and AMD IOMMUs request ACS, and Xen does as well during early
init of dom0.
Cc: Allen Kay <allen.m.kay@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Use pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in PCI core code. This avoids unnecessary search in PCI
configuration space.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
There are a lot of codes that searches PCI express capability offset
in the PCI configuration space using pci_find_capability(). Caching it
in the struct pci_dev will reduce unncecessary search. This patch adds
an additional 'pcie_cap' fields into struct pci_dev, which is
initialized at pci device scan time (in set_pcie_port_type()).
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This makes PCI resource management messages more consistent and adds a few
new messages to aid debugging.
Whenever we assign resources to a device, update a BAR, or change a
bridge aperture, it's worth noting it.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Since we have a struct device, we might as well use dev_printk. Note that
both pr_debug() and dev_dbg() are completely compiled out unless DEBUG or
DYNAMIC_DEBUG is defined.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Jesse accidentally applied v1 [1] of the patchset instead of v2 [2]. This
is the diff between v1 and v2.
The changes in this patch are:
- tidied vsprintf stack buffer to shrink and compute size more
accurately
- use %pR for decoding and %pr for "raw" (with type and flags) instead
of adding %pRt and %pRf
[1] http://lkml.org/lkml/2009/10/6/491
[2] http://lkml.org/lkml/2009/10/13/441
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Change to populate the subsystem vendor and subsytem device IDs for
PCI-PCI bridges that implement the PCI Subsystem Vendor ID capability.
Previously bridges left subsystem vendor IDs unpopulated.
Signed-off-by: Gabe Black <gabe.black@ni.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Feedback from Hidetoshi Seto and Kenji Kaneshige incorporated. This
correctly handles PCI-X bridges, PCIe root ports and endpoints, and
prints debug messages when invalid/reserved types are found in the
HEST. PCI devices not in domain/segment 0 are not represented in
HEST, thus will be ignored.
Today, the PCIe Advanced Error Reporting (AER) driver attaches itself
to every PCIe root port for which BIOS reports it should, via ACPI
_OSC.
However, _OSC alone is insufficient for newer BIOSes. Part of ACPI
4.0 is the new APEI (ACPI Platform Error Interfaces) which is a way
for OS and BIOS to handshake over which errors for which components
each will handle. One table in ACPI 4.0 is the Hardware Error Source
Table (HEST), where BIOS can define that errors for certain PCIe
devices (or all devices), should be handled by BIOS ("Firmware First
mode"), rather than be handled by the OS.
Dell PowerEdge 11G server BIOS defines Firmware First mode in HEST, so
that it may manage such errors, log them to the System Event Log, and
possibly take other actions. The aer driver should honor this, and
not attach itself to devices noted as such.
Furthermore, Kenji Kaneshige reminded us to disallow changing the AER
registers when respecting Firmware First mode. Platform firmware is
expected to manage these, and if changes to them are allowed, it could
break that firmware's behavior.
The HEST parsing code may be replaced in the future by a more
feature-rich implementation. This patch provides the minimum needed
to prevent breakage until that implementation is available.
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
When probing for ROM BAR size, we should not change bits 1:10 in this
BAR, because these bits are marked as "reserved for future use" in PCI
spec, so changing them might have side effects.
No such issue for I/O or memory, as there is an implementation note in
PCI spec which explicitly allows writing 0xfffffffff there.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch is predicated on Jeremy's patch in include/xen/xen.h. It'll
prevent ACS init unless the platform has both an IOMMU and we're running
as dom0.
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Note: dom0 checking in v4 has been separated out into 2/2.
This patch enables P2P upstream forwarding in ACS capable PCIe switches.
It solves two potential problems in virtualization environment where a PCIe
device is assigned to a guest domain using a HW iommu such as VT-d:
1) Unintentional failure caused by guest physical address programmed
into the device's DMA that happens to match the memory address range
of other downstream ports in the same PCIe switch. This causes the PCI
transaction to go to the matching downstream port instead of go to the
root complex to get translated by VT-d as it should be.
2) Malicious guest software intentionally attacks another downstream
PCIe device by programming the DMA address into the assigned device
that matches memory address range of the downstream PCIe port.
We are in process of implementing device filtering software in KVM/XEN
management software to allow device assignment of PCIe devices behind a PCIe
switch only if it has ACS capability and with the P2P upstream forwarding bits
enabled. This patch is intended to work for both KVM and Xen environments.
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Reviewed-by: Mathew Wilcox <willy@linux.intel.com>
Reviewed-by: Chris Wright <chris@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Some PCI devices fail if their standard configuration registers are
restored twice in a row. Prevent this from happening by making
pci_restore_state() clear the saved_state flag of the device right
after the device's standard configuration registers have been
populated with the previously saved values.
Simplify PCI PM callbacks by removing the direct clearing of
state_saved from them, as it shouldn't be necessary any more (except
in pci_pm_thaw(), where it has to be cleared, so that the values saved
during the "freeze" phase of hibernation are not used later by mistake).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
In general a BIOS may goof or we may hotplug in a hotplug controller.
In either case the kernel needs to reserve resources for plugging
in more devices in the future instead of creating a minimal resource
assignment.
We already do this for cardbus bridges I am just adding a variant
for pcie bridges.
v2: Make testing for pcie hotplug bridges based on a flag.
So far we only set the flag for pcie but a header_quirk
could easily be added for the non-standard pci hotplug
bridges.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We already print it out for pci bridges, so also print it out for pci devices.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This was #define'd as 0 on all platforms, so let's get rid of it.
This change makes pci_scan_slot() slightly easier to read.
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Tony Luck <tony.luck@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Russell King <linux@arm.linux.org.uk>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Use pci_is_root_bus() in pci_read_bridge_bases() to check if the pci
bus is root, for code consistency.
Reviewed-by: Alex Chiang <achiang@hp.com>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We should not assign 64bit ranges to PCI devices that only take 32bit
prefetchable addresses.
Try to set IORESOURCE_MEM_64 in 64bit resource of pci_device/pci_bridge
and make the bus resource only have that bit set when all devices under
it support 64bit prefetchable memory. Use that flag to allocate
resources from that range.
Reported-by: Yannick <yannick.roehlly@free.fr>
Reviewed-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The device class may be changed after the fixup, so re-read the class
value from pci_dev when configuring the device. Otherwise some devices
such as JMicron SATA controller won't work.
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Tested-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Commit 30a18d6c3f introduced a new
function to set the PCI bus resources. Unfortunately, neither the
author, nor the committers seemed to know that we already have somewhere
to do that -- pcibios_fixup_bus(). This patch moves the hook (used only
by the K8 code) into x86-specific code where it should have been in the
first place.
Cc: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
pci_rescan_bus was annotated as __devinit, which is wrong,
because it will never be part of device initialization.
Howevever, we can't simply drop the annotation, because then we
get section warnings about calling pci_scan_child_bus (which is
correctly marked as __devinit).
pci_rescan_bus will only get built when CONFIG_HOTPLUG is set,
meaning that __devinit is a nop, so we know that pci_scan_child_bus
has not been freed.
Annotate as __ref to silence modpost.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
New pci_cfg_space_size() needs invalid pdev->class, put it in the
right place in the pci_setup_device().
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This interface allows the user to force a rescan of all PCI buses
in system, and rediscover devices that have been removed earlier.
pci_bus_attrs implementation from Trent Piepho.
Thanks to Vegard Nossum for discovering locking issues with the
sysfs interface.
Cc: Trent Piepho <xyzzy@speakeasy.org>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This API is used by the PCI core to rescan a bus and rediscover
newly added devices.
Over time, it is expected that the various PCI hotplug drivers
will migrate to this interface and away from the old
pci_do_scan_bus() interface.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
While scanning bridges, we stop our scan if we encounter a bus
that we've seen before, to work around some buggy chipsets. This
is a good idea, but prevents us from fully scanning the PCI bus
at a future time (to find newly hot-added devices, for example).
Change the logic so that we skip _re-adding_ an existing bus
that we've seen before, but also allow the scan to descend to
all child buses.
Now that we're potentially scanning our child buses again, we
also need to be sure not to attempt re-initializing their BARs
so we avoid that.
This patch lays the groundwork to allow the user to issue a
rescan of the PCI bus at any time.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
pci_scan_slot() has been rewritten to be less complex and will now
return the number of *new* devices found.
Existing callers need not worry because they already assume that
they can't call pci_scan_slot() on an already-scanned slot.
Thus, there is no semantic change for existing callers: returning
newly found devices (this patch) is exactly equal to returning all
found devices (before this patch).
This patch adds some more groundwork to allow us to rescan the
PCI bus during runtime to discover newly added devices.
Signed-off-by: Trent Piepho <xyzzy@speakeasy.org>
Reviewed-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
pci_scan_single_device is supposed to add newly discovered
devices to pci_bus->devices, but doesn't check to see if the
device has already been added. This can cause problems if we ever
want to use this interface to rescan the PCI bus.
If the device is already added, just return it.
Signed-off-by: Trent Piepho <xyzzy@speakeasy.org>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Move the device setup stuff into pci_setup_device() which will be used
to setup the Virtual Function later.
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reserve the bus number range used by the Virtual Function when
pcibios_assign_all_busses() returns true.
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
If a device has the SR-IOV capability, initialize it (set the ARI
Capable Hierarchy in the lowest numbered PF if necessary; calculate
the System Page Size for the VF MMIO, probe the VF Offset, Stride
and BARs). A lock for the VF bus allocation is also initialized if
a PF is the lowest numbered PF.
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Many host bridges support a 4k config space, so check them directy
instead of using quirks to add them.
We only need to do this extra check for host bridges at this point,
because only host bridges are known to have extended address space
without also having a PCI-X/PCI-E caps. Other devices with this
property could be done with quirks (if there are any).
As a bonus, we can remove the quirks for AMD host bridges with family
10h and 11h since they're not needed any more.
With this patch, we can get correct pci cfg size of new Intel CPUs/IOHs
with host bridges.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Current pci_read_bridge_bases() has an assumption that pci_bus->self
is NULL on the pci root bus (It checks pci_bus->self to see if the pci
bus is root bus). But is might not true on some platforms. We must
check pci_bus->parent instead.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
[IA64] fix typo in cpumask_of_pcibus()
x86: fix x86_32 builds for summit and es7000 arch's
cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs
cpumask: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
cpumask: use cpumask_var_t in acpi-cpufreq.c
cpumask: use work_on_cpu in acpi/cstate.c
cpumask: convert struct cpufreq_policy to cpumask_var_t
cpumask: replace CPUMASK_ALLOC etc with cpumask_var_t
x86: cleanup remaining cpumask_t ops in smpboot code
cpumask: update pci_bus_show_cpuaffinity to use new cpumask API
cpumask: update local_cpus_show to use new cpumask API
ia64: cpumask fix for is_affinity_mask_valid()
When PCI devices are initialized, we check whether they support PCI PM
caps and set the device can_wakeup flag if so. However, some devices
may have platform provided wakeup events rather than PCI PME signals, so
we need to set can_wakeup in that case too. Doing so should allow
wakeups from many more devices, especially on cost constrained systems.
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Joseph Chan <JosephChan@via.com.tw>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Allow pci_alloc_child_bus() to allocate buses without bridge devices.
Some SR-IOV devices can occupy more than one bus number, but there is no
explicit bridges because that have internal routing mechanism.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Export __pci_read_base() so it can be used by whole PCI subsystem.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch moves all definitions of the PCI resource names to an 'enum',
and also replaces some hard-coded resource variables with symbol
names. This change eases introduction of device specific resources.
Reviewed-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Since interrupts will soon be disabled at PCI resume time, we need to
pre-allocate memory to save/restore PCI config space (or use GFP_ATOMIC,
but this is safer).
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch is part of a larger patch series which will remove
the "char bus_id[20]" name string from struct device. The device
name is managed in the kobject anyway, and without any size
limitation, and just needlessly copied into "struct device".
To set and read the device name dev_name(dev) and dev_set_name(dev)
must be used. If your code uses static kobjects, which it shouldn't
do, "const char *init_name" can be used to statically provide the
name the registered device should have. At registration time, the
init_name field is cleared, to enforce the use of dev_name(dev) to
access the device name at a later time.
We need to get rid of all occurrences of bus_id in the entire tree
to be able to enable the new interface. Please apply this patch,
and possibly convert any remaining remaining occurrences of bus_id.
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-Off-By: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Impact: use new cpumask API to reduce stack usage
Replace the local cpumask_t variable with a pointer to the
const cpumask that needs to be printed.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: change calling convention of existing cpumask APIs
Most cpumask functions started with cpus_: these have been replaced by
cpumask_ ones which take struct cpumask pointers as expected.
These four functions don't have good replacement names; fortunately
they're rarely used, so we just change them over.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: mingo@redhat.com
Cc: tony.luck@intel.com
Cc: ralf@linux-mips.org
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: cl@linux-foundation.org
Cc: srostedt@redhat.com
This cleanup removes the resource assignment in pci_read_bridge_bases()
since it has taken care by pci_alloc_child_bus() when allocating the bus:
/* Set up default resource pointers and names.. */
for (i = 0; i < PCI_BRIDGE_RES_NUM; i++) {
child->resource[i] = &bridge->resource[PCI_BRIDGE_RESOURCES+i];
child->resource[i]->name = child->name;
}
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Some firmware fail to properly configure P2P bridges, leaving them
with invalid bus numbers. In some cases, this happens on some embedded
4xx boards as the result of the kernel allocating different bus space
than the firmware does to host bridges while not setting
pcibios_assign_all_busses() for various reasons. In other cases, it can
just be bogus firmware.
This adds some sanity checking to the PCI probing code. If a bridge is
found whose primary bus number doesn't match the bus it's sitting on,
or whose secondary bus number not strictly above it's primary bus
number, then the bridge bus numbers are deconfigured in the first pass
of pci_scan_bridge() to be re-assigned in the second pass.
Tested-by: "Ayman El-Khashab" <AymanE@tanisys.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This adds the ability to mmap legacy IO space to the legacy_io files
in sysfs on platforms that support it. This will allow to clean up
X to use this instead of /dev/mem for legacy IO accesses such as
those performed by Int10.
While at it I moved pci_create/remove_legacy_files() to pci-sysfs.c
where I think they belong, thus making more things statis in there
and cleaned up some spurrious prototypes in the ia64 pci.h file
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch uniformizes PCI probing debug boot messages with dev_printk()
intead of manual printk()
It changes adress range output from [%llx, %llx] to [%#llx-%#llx], like
in pci_request_region().
For example, it goes from the mixed-style:
PCI: 0000:00:1b.0 reg 10 64bit mmio: [f4280000, f4283fff]
pci 0000:00:1b.0: PME# supported from D0 D3hot D3cold
to uniform:
pci 0000:00:1b.0: reg 10 64bit mmio: [0xf4280000-0xf4283fff]
pci 0000:00:1b.0: PME# supported from D0 D3hot D3cold
This patch has been runtime tested, boot log messages diffed, everything
looks OK.
Acked-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Vincent Legoll <vincent.legoll@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch adds support for PCI Express Alternative Routing-ID
Interpretation (ARI) capability.
The ARI capability extends the Function Number field of the PCI Express
Endpoint by reusing the Device Number which is otherwise hardwired to 0.
With ARI, an Endpoint can have up to 256 functions.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch centralizes the initialization and release functions of
various PCI capabilities in probe.c, which makes the introduction
of new capability support functions cleaner in the future.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Since patch 6ac665c63d my infiniband
controller hasn't worked. This is because it has 64-bit prefetchable
memory, which was mistakenly being taken to be 32-bit memory. The
resource flags in this case are PCI_BASE_ADDRESS_MEM_TYPE_64 |
PCI_BASE_ADDRESS_MEM_PREFETCH.
This patch checks only for the PCI_BASE_ADDRESS_MEM_TYPE_64 bit; thus
whether the region is prefetchable or not is ignored. This fixes my
Infiniband.
Reviewed-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This is a cleanup that changes all PCI configuration space size
representations to the macros (PCI_CFG_SPACE_SIZE and
PCI_CFG_SPACE_EXP_SIZE). And the macros are also moved from
drivers/pci/probe.c to drivers/pci/pci.h.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The introduction of struct pci_slot (f46753c5e3)
added a struct pci_slot pointer to struct pci_dev, but we forgot to
associate the two.
Connect the two structs together; the interesting portions of the object
lifetimes are:
- when a new pci_slot is created, connect it to the appropriate
pci_dev's. A single pci_slot may be associated with multiple
pci_dev's, e.g. any multi-function PCI device.
- when a pci_slot is released, look for all the pci_dev's it was
associated with, and set their pci_slot pointers to NULL
- when a pci_dev is created, look for slots to associate with.
Note -- when a pci_dev is released, we don't need to do any bookkeeping,
since pci_slot's do not have pointers to pci_dev's.
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Use "[%04x:%04x]" for PCI vendor/device IDs to follow the format
used by lspci(8).
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This converts things in drivers/pci to use %pR to printout the
content of a struct resource instead of hand-casted %llx or
other variants.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The PCI core wants to reorder the devices in the bus list. So move this
functionality out of the pci core and into the driver core so that
anyone else can also do this if needed. This also lets us change how
struct device is attached to drivers in the future without messing with
the PCI core.
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Print out for device BAR values before the kernel tries to update them.
Also make related output use KERN_DEBUG.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The cleaned up resource code in probe.c introduced some warnings:
drivers/pci/probe.c: In function 'pci_read_bridge_bases':
drivers/pci/probe.c:386: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'resource_size_t'
drivers/pci/probe.c:386: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'resource_size_t'
drivers/pci/probe.c:398: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'resource_size_t'
drivers/pci/probe.c:398: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'resource_size_t'
drivers/pci/probe.c:434: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'resource_size_t'
drivers/pci/probe.c:434: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'resource_size_t'
So fix them up.
Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Check the return value of device_create_bin_file in pci_create_bus and
unwind if necessary. Don't propagate error to caller, as failure to create
these files shouldn't prevent PCI from being initialised. Instead, just
log a warning.
Cc: Sven Wegener <sven.wegener@stealer.net>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Disable ASPM on pre-1.1 PCIe devices, as many of them don't implement it
correctly.
Tested-by: Jack Howarth <howarth@bromo.msbb.uc.edu>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
If the kernel is configured to support 64-bit resources on a 32-bit
machine, we can support 64-bit BARs properly. Just change the condition
to check sizeof(resource_size_t) instead of BITS_PER_LONG.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Factor out the code to read one BAR from the loop in pci_read_bases into
a new function, __pci_read_base. The new code is slightly more
readable, better commented and removes the ifdef.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* Introduce function acpi_pm_device_sleep_wake() for enabling and
disabling the system wake-up capability of devices that are power
manageable by ACPI.
* Introduce function acpi_bus_can_wakeup() allowing other (dependent)
subsystems to check if ACPI is able to enable the system wake-up
capability of given device.
* Introduce callback .sleep_wake() in struct pci_platform_pm_ops and
for the ACPI PCI 'driver' make it use acpi_pm_device_sleep_wake().
* Introduce callback .can_wakeup() in struct pci_platform_pm_ops and
for the ACPI 'driver' make it use acpi_bus_can_wakeup().
* Move the PME# handlig code out of pci_enable_wake() and split it
into two functions, pci_pme_capable() and pci_pme_active(),
allowing the caller to check if given device is capable of
generating PME# from given power state and to enable/disable the
device's PME# functionality, respectively.
* Modify pci_enable_wake() to use the new ACPI callbacks and the new
PME#-related functions.
* Drop the generic .platform_enable_wakeup() callback that is not
used any more.
* Introduce device_set_wakeup_capable() that will set the
power.can_wakeup flag of given device.
* Rework PCI device PM initialization so that, if given device is
capable of generating wake-up events, either natively through the
PME# mechanism, or with the help of the platform, its
power.can_wakeup flag is set and its power.should_wakeup flag is
unset as appropriate.
* Make ACPI set the power.can_wakeup flag for devices found to be
wake-up capable by it.
* Make the ACPI wake-up code enable/disable GPEs for devices that
have the wakeup.flags.prepared flag set (which means that their
wake-up power has been enabled).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This changes pci_setup_device to handle pci_name() now returning a
constant string.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Make pci_setup_device() write the bus ID directly into the allotted storage,
rather than using pci_name() as the address as that now returns a const
pointer.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Convert printks to use dev_printk().
I converted pr_debug() to dev_dbg(). Both use KERN_DEBUG and are enabled
only when DEBUG is defined.
I converted printk(KERN_DEBUG) to dev_printk(KERN_DEBUG), not to dev_dbg(),
because dev_dbg() is only enabled when DEBUG is defined.
I converted DBG(KERN_INFO) (only in setup-bus.c) to dev_info(). The DBG()
name makes it sound like debug, but it's been enabled forever, so dev_info()
preserves the previous behavior.
I tried to make the resource assignment formats more consistent, e.g.,
"BAR %d: got res [%#llx-%#llx] bus [%#llx-%#llx] flags %#lx\n"
instead of sometimes using "start-end" and sometimes using "size@start".
I'm not attached to one or the other; I'd just like them consistent.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Currently, /sys/bus/pci/slots/ only exposes hotplug attributes when a
hotplug driver is loaded, but PCI slots have attributes such as address,
speed, width, etc. that are not related to hotplug at all.
Introduce pci_slot as the primary data structure and kobject model.
Hotplug attributes described in hotplug_slot become a secondary
structure associated with the pci_slot.
This patch only creates the infrastructure that allows the separation of
PCI slot attributes and hotplug attributes. In this patch, the PCI
hotplug core remains the only user of this infrastructure, and thus,
/sys/bus/pci/slots/ will still only become populated when a hotplug
driver is loaded.
A later patch in this series will add a second user of this new
infrastructure and demonstrate splitting the task of exposing pci_slot
attributes from hotplug_slot attributes.
- Make pci_slot the primary sysfs entity. hotplug_slot becomes a
subsidiary structure.
o pci_create_slot() creates and registers a slot with the PCI core
o pci_slot_add_hotplug() gives it hotplug capability
- Change the prototype of pci_hp_register() to take the bus and
slot number (on parent bus) as parameters.
- Remove all the ->get_address methods since this functionality is
now handled by pci_slot directly.
[achiang@hp.com: rpaphp-correctly-pci_hp_register-for-empty-pci-slots]
Tested-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: make headers_check happy]
[akpm@linux-foundation.org: nuther build fix]
[akpm@linux-foundation.org: fix typo in #include]
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Cc: Greg KH <greg@kroah.com>
Cc: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
If a device supports #PME and can generate PME events from D0, we may see
superfluous events before a driver is loaded (drivers should only enable PME as
needed), preventing suspend from working if the corresponding GPE was enabled.
Likewise, if the ACPI device has the _PRW object, the _PSW/_DSW object will be
called in order to disable the wakeup functionality. But when it is allowed to
wake up the sleeping state, OSPM will enable it again.
So we should disable PME in the course of scanning PCI devices and enable it
again only when PME events are actually required to be generated from the
requested PCI state (for example, D3_hot or D3_cold). It is also safe to
disable PME again when the PME is disabled for the PCI devices.
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Li Shaohua <shaohua.li@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
so let pci_cfg_space_size call it directly without flag.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
scan AMD opteron io/mmio routing to make sure every pci root bus get correct
resource range. Thus later pci scan could assign correct resource to device
with unassigned resource.
this can fix a system without _CRS for multi pci root bus.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
in the device_add, we try to use use parent numa_node.
need to make sure pci root bus's bridge device numa_node is set.
then we could use device->numa_node direclty for all device.
and don't need to call pcibus_to_node().
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
reuse pci_cfg_space_size but skip check pci express and pci-x CAP ID.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
WARNING: drivers/pci/built-in.o(.text+0x150f): Section mismatch in reference from the function pci_scan_single_device() to the function .devinit.text:pci_scan_device()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
WARNING: drivers/pci/built-in.o(.text+0xc4c): Section mismatch in reference from the function pci_add_new_bus() to the function .devinit.text:pci_alloc_child_bus()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Done per Linus' request and suggestions. Linus has explained that
better than I'll be able to explain:
On Thu, Mar 27, 2008 at 10:12:10AM -0700, Linus Torvalds wrote:
> Actually, before we go any further, there might be a less intrusive
> alternative: add just a couple of flags to the resource flags field (we
> still have something like 8 unused bits on 32-bit), and use those to
> implement a generic "resource_alignment()" routine.
>
> Two flags would do it:
>
> - IORESOURCE_SIZEALIGN: size indicates alignment (regular PCI device
> resources)
>
> - IORESOURCE_STARTALIGN: start field is alignment (PCI bus resources
> during probing)
>
> and then the case of both flags zero (or both bits set) would actually be
> "invalid", and we would also clear the IORESOURCE_STARTALIGN flag when we
> actually allocate the resource (so that we don't use the "start" field as
> alignment incorrectly when it no longer indicates alignment).
>
> That wouldn't be totally generic, but it would have the nice property of
> automatically at least add sanity checking for that whole "res->start has
> the odd meaning of 'alignment' during probing" and remove the need for a
> new field, and it would allow us to have a generic "resource_alignment()"
> routine that just gets a resource pointer.
Besides, I removed IORESOURCE_BUS_HAS_VGA flag which was unused for ages.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Vital Product Data (VPD) may be exposed by PCI devices in several
ways. It is generally unsafe to read this information through the
existing interfaces to user-land because of stateful interfaces.
This adds:
- abstract operations for VPD access (struct pci_vpd_ops)
- VPD state information in struct pci_dev (struct pci_vpd)
- an implementation of the VPD access method specified in PCI 2.2
(in access.c)
- a 'vpd' binary file in sysfs directories for PCI devices with VPD
operations defined
It adds a probe for PCI 2.2 VPD in pci_scan_device() and release of
VPD state in pci_release_dev().
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
PCI Express ASPM defines a protocol for PCI Express components in the D0
state to reduce Link power by placing their Links into a low power state
and instructing the other end of the Link to do likewise. This
capability allows hardware-autonomous, dynamic Link power reduction
beyond what is achievable by software-only controlled power management.
However, The device should be configured by software appropriately.
Enabling ASPM will save power, but will introduce device latency.
This patch adds ASPM support in Linux. It introduces a global policy for
ASPM, a sysfs file /sys/module/pcie_aspm/parameters/policy can control
it. The interface can be used as a boot option too. Currently we have
below setting:
-default, BIOS default setting
-powersave, highest power saving mode, enable all available ASPM
state and clock power management
-performance, highest performance, disable ASPM and clock power
management
By default, the 'default' policy is used currently.
In my test, power difference between powersave mode and performance mode
is about 1.3w in a system with 3 PCIE links.
Note: some devices might not work well with aspm, either because chipset
issue or device issue. The patch provide API (pci_disable_link_state),
driver can disable ASPM for specific device.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The PCI bus names included in /proc/iomem and /proc/ioports are
of the form 'PCI Bus #XX' where XX is the bus number. This patch
changes the naming to 'PCI Bus XXXX:YY' where XXXX is the domain
number and YY is the bus number. For example, PCI bus 14 in
domain 0 will show as 'PCI Bus 0000:14' instead of 'PCI Bus #14'.
This change makes the naming consistent with other architectures
such as ia64 where multiple PCI domain support has been around
longer.
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch finally removes the global list of PCI devices. We are
relying entirely on the list held in the driver core now, and do not
need a separate "shadow" list as no one uses it.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This lets us check if the device is really added to the driver core or
not, which is what we need when walking some of the bus lists. The flag
is there in anticipation of getting rid of the other PCI device list,
which is what we used to check in this situation.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* Cleaned up references to cpumask_scnprintf() and added new
cpulist_scnprintf() interfaces where appropriate.
* Fix some small bugs (or code efficiency improvments) for various uses
of cpumask_scnprintf.
* Clean up some checkpatch errors.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix following warning:
WARNING: vmlinux.o(.text+0x47bdb1): Section mismatch in reference from the function pci_scan_child_bus() to the function .devinit.text:pcibios_fixup_bus()
We had plenty of functions that could be annotated __devinit but due to
the former restriction that exported symbols could not be annotated
they were not so. So annotate these function and fix the references
from the pci/hotplug/* code to silence the resuting warnings.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This adds PCI's accessor for segment_boundary_mask in device_dma_parameters.
The default segment_boundary is set to 0xffffffff, same to the block layer's
default value (and the scsi mid layer uses the same value).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Greg KH <greg@kroah.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds struct device_dma_parameters in struct pci_dev and properly
sets up a pointer in struct device.
The default max_segment_size is set to 64K, same to the block layer's
default value.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Mostly-acked-by: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following warnings were issued during build of
drivers/pci with an allyesconfig build:
WARNING: o-x86_64/drivers/pci/built-in.o(.text+0xdaf): Section mismatch in reference from the function pci_add_new_bus() to the function .devinit.text:pci_alloc_child_bus()
WARNING: o-x86_64/drivers/pci/built-in.o(.text+0x15e2): Section mismatch in reference from the function pci_scan_single_device() to the function .devinit.text:pci_scan_device()
WARNING: o-x86_64/drivers/pci/built-in.o(.text+0x1b0c5): Section mismatch in reference from the function pci_bus_assign_resources() to the function .devinit.text:pci_setup_bridge()
WARNING: o-x86_64/drivers/pci/built-in.o(.text+0x1b32d): Section mismatch in reference from the function pci_bus_size_bridges() to the function .devinit.text:pci_bus_size_cardbus()
Investigating each case closer it looked like all
referred functions are only used in the init phase
or during hotplug.
So to avoid wasting too much memory in the non-hotplug
case the simpler fix was to allow the fuctions to
use code/data from the __devinit sections.
This was done in all four case by adding the __ref
annotation.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix following warnings:
WARNING: o-x86_64/drivers/pci/built-in.o(.text+0xb054): Section mismatch in reference from the function cpci_configure_slot() to the function .devinit.text:pci_do_scan_bus()
WARNING: o-x86_64/drivers/pci/built-in.o(.text+0x153ab): Section mismatch in reference from the function shpchp_configure_device() to the function .devinit.text:pci_do_scan_bus()
WARNING: o-x86_64/drivers/pci/built-in.o(__ksymtab+0xc0): Section mismatch in reference from the variable __ksymtab_pci_do_scan_bus to the function .devinit.text:pci_do_scan_bus()
PCI hotplug were the only user of pci_do_scan_bus()
so moving this function to a separate file that is build
only when we enable CONFIG_HOTPLUG_PCI.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This reverts commit 6c723d5bd8.
It caused build errors on non-x86 platforms, config file confusion, and
even some boot errors on some x86-64 boxes. All around, not quite ready
for prime-time :(
Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This moves the pci_bus class device to be a real struct device and at
the same time, place it in the device tree in the correct location.
Note, the old "bridge" symlink is now gone, but this was a non-standard
link and no userspace program used it. If you need to determine the
device that the bus is on, follow the standard device symlink, or walk
up the device tree.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
PCI Express ASPM defines a protocol for PCI Express components in the D0
state to reduce Link power by placing their Links into a low power state
and instructing the other end of the Link to do likewise. This
capability allows hardware-autonomous, dynamic Link power reduction
beyond what is achievable by software-only controlled power management.
However, The device should be configured by software appropriately.
Enabling ASPM will save power, but will introduce device latency.
This patch adds ASPM support in Linux. It introduces a global policy for
ASPM, a sysfs file /sys/module/pcie_aspm/parameters/policy can control
it. The interface can be used as a boot option too. Currently we have
below setting:
-default, BIOS default setting
-powersave, highest power saving mode, enable all available ASPM
state
and clock power management
-performance, highest performance, disable ASPM and clock power
management
By default, the 'default' policy is used currently.
In my test, power difference between powersave mode and performance mode
is about 1.3w in a system with 3 PCIE links.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There's already a prototype for pci_scan_child_bus() at the correct place in
pci.h, so there's no reason for an additional one.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This allows an easier way to get to the device klist associated with a
struct bus_type (you have three to choose from...) This will make it
easier to move these fields to be dynamic in a future patch.
The only user of this is the PCI core which horribly abuses this
interface to rearrange the order of the pci devices. This should be
done using the existing bus device walking functions, but that's left
for future patches.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It appears that some PCI-E bridges do the wrong thing in the presense of
CRS Software Visibility and MMCONFIG. In particular, it looks like an
ATI bridge (device ID 7936) will return 0001 in the vendor ID field of
any bridged devices indefinitely.
Not enabling CRS SV avoids the problem, and as we currently do not
really make good use of the feature anyway (we just time out rather than
do any threaded discovery as suggested by the CRS specs), we're better
off just not enabling it.
This should fix a slew of problem reports with random devices (generally
graphics adapters or fairly high-performance networking cards, since it
only affected PCI-E) not getting properly recognized on these AMD systems.
If we really want to use CRS-SV, we may end up eventually needing a
whitelist of systems where this should be enabled, along with some kind
of "pcibios_enable_crs()" query to call the system-specific code.
Suggested-by: Loic Prylli <loic@myri.com>
Tested-by: Kai Ruhnau <kai@tragetaschen.dyndns.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Restore PCI expansion ROM P2P prefetch window creation.
This patch reverts previous "Avoid creating P2P prefetch
window for expansion ROMs" change due to regressions that
were spotted on some systems.
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This reverts commit fd6e732186, which
helped up things on MIPS, but was wrong for everything else. As Ralf
Baechle puts it:
"It seems the whole MIPS resource managment is complicated enough (out
of necessity) that only a few people actually grok it. Ioports being
actually memory mapped on MIPS only makes the confusion worse, sigh."
Requested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Alan Cox <alan@redhat.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When devices are under a p2p bridge, upstream transactions get replaced by the
device id of the bridge as it owns the PCIE transaction. Hence its necessary
to setup translations on behalf of the bridge as well. Due to this limitation
all devices under a p2p share the same domain in a DMAR.
We just cache the type of device, if its a native PCIe device
or not for later use.
[akpm@linux-foundation.org: BUG_ON -> WARN_ON+recover]
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Modify PCI Bridge Control ISA flag for clarity
This patch changes PCI_BRIDGE_CTL_NO_ISA to PCI_BRIDGE_CTL_ISA
and modifies it's clarifying comment and locations where used.
The change reduces the chance of future confusion since it makes
the set/unset meaning of the bit the same in both the bridge
control register and bridge_ctl field of the pci_bus struct.
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Acked-by: Linas Vepstas <linas@austin.ibm.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Avoid creating P2P prefetch window for expansion ROMs
Because of the future possibility that P2P prefetch windows will contain
address ranges above 4GB some BIOSes are providing space in the P2P
non-prefetch windows for expansion ROMs. This is due to expansion ROM
BAR 32-bit limitation. When expansion ROM BARs without BIOS assigned
address(es) are currently found behind a P2P bridge, the kernel attempts
to create a P2P prefetch window for them even though space for them has
already been provided in the non-prefetch window. _CRS on some systems
with certain resource conservation conscious BIOSes may not provide the
extra 1MB or more memory resource needed for the expansion ROM motivated
prefetch window causing resource allocation errors.
This change corrects the problem by removing IORESOURCE_PREFETCH from
the expansion ROM flags initialization. It also removes
IORESOURCE_CACHEABLE which seems inappropriate if only non-cacheable
memory is available.
Signed-off-by: Gary Hade <gary.hade@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Skip ISA ioresource alignment on some systems
To conserve limited PCI i/o resource on some IBM multi-node systems, the
BIOS allocates (via _CRS) and expects the kernel to use addresses in
ranges currently excluded by pcibios_align_resource() [i386/pci/i386.c].
This change allows the kernel to use the currently excluded address
ranges on the IBM x3800, x3850, and x3950.
Signed-off-by: Gary Hade <gary.hade@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I got the following error on MIPS Cobalt.
PCI: Unable to reserve I/O region #1:8@f00001f0 for device 0000:00:09.1
pata_via 0000:00:09.1: failed to request/iomap BARs for port 0 (errno=-16)
PCI: Unable to reserve I/O region #3:8@f0000170 for device 0000:00:09.1
pata_via 0000:00:09.1: failed to request/iomap BARs for port 1 (errno=-16)
pata_via 0000:00:09.1: no available native port
The legacy mode IDE resources set the following order.
pci_setup_device()
Legacy mode ATA controllers have fixed addresses.
IDE resources: 0x1F0-0x1F7, 0x3F6, 0x170-0x177, 0x376
|
V
pcibios_fixup_bus()
MIPS Cobalt PCI bus regions have the -0x10000000 offset from PCI resources.
pcibios_fixup_bus() fix PCI bus regions.
0x1F0 - 0x10000000 = 0xF00001F0
|
V
ata_pci_init_one()
PCI: Unable to reserve I/O region #1:8@f00001f0 for device 0000:00:09.1
In some architectures, PCI bus regions have the offset from PCI resources.
For this reason, pci_setup_device() should set PCI bus regions to
dev->resource[].
[akpm@linux-foundation.org: use struct initialiser]
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg KH <greg@kroah.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On MIPS with PCI && !HOTPLUG, I'm currently getting the following modpost
warning:
MODPOST vmlinux.o
WARNING: vmlinux.o(.text+0x1ce128): Section mismatch: reference to .init.text:pci_read_bridge_bases (between 'pcibios_fixup_bus' and 'pcibios_enable_device')
On MIPS I have the call chains pci_scan_child_bus -> pcibios_fixup_bus ->
pci_read_bridge_bases. pci_scan_child_bus can't be __devinit because it
it is an exported symbol, thus pcibios_fixup_bus and pci_read_bridge_bases
can't be either.
For some reason I don't see this issue on x86; I blame compiler differences.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Adrian Bunk wrote:
> Alois Nešpor wrote
>> PCI: Bus #0b (-#0e) is hidden behind transparent bridge #0a (-#0b) (try 'pci=assign-busses')
>> Please report the result to linux-kernel to fix this permanently"
>>
>> dmesg:
>> "Yenta: Raising subordinate bus# of parent bus (#0a) from #0b to #0e"
>> without pci=assign-busses and nothing with pci=assign-busses.
>
> Bernhard?
Ok, lets kill the message. As Alois Nešpor also saw, that's fixed up by Yenta,
so PCI does not have to warn about it. PCI could still warn about it if
is_cardbus is 0 in that instance of pci_scan_bridge(), but so far I have
not seen a report where this would have been the case so I think we can
spare the kernel of that check (removes ~300 lines of asm) unless debugging
is done.
History: The whole check was added in the days before we had the fixup
for this in Yenta and pci=assign-busses was the only way to get CardBus
cards detected on many (not all) of the machines which give this warning.
In theory, there could be cases when this warning would be triggered and
it's not cardbus, then the warning should still apply, but I think this
should only be the case when working on a completely broken PCI setup,
but one may have already enabled the debug code in drivers/pci and the
patched check would then trigger.
I do not sign this off yet because it's completely untested so far, but
everyone is free to test it (with the #ifdef DEBUG replaced by #if 1 and
pr_debug( changed to printk(.
We may also dump the whole check (remove everything within the #ifdef from
the source) if that's perferred.
On Alois Nešpor's machine this would then (only when debugging) this message:
"PCI: Bus #0b (-#0e) is partially hidden behind transparent bridge #0a (-#0b)"
"partially" should be in the message on his machine because #0b of #0b-#0e
is reachable behind #0a-#0b, but not #0c-#0e.
But that differentiation is now moot anyway because the fixup in Yenta takes
care of it as far as I could see so far, which means that unless somebody
is debugging a totally broken PCI setup, this message is not needed anymore,
not even for debugging PCI.
Ok, here the patch with the following changes:
* Refined to say that the bus is only partially hidden when the parent
bus numbers are not totally way off (outside of) the child bus range
* remove the reference to pci=assign-busses and the plea to report it
We could add a pure source code-only comment to keep a reference to
pci=assign-busses the in case when this is triggered by someone who
is debugging the cause of this message and looking the way to solve it.
From: Bernhard Kaindl <bk@suse.de>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Before calling init_hwif_default, ide_unregister gets lock ide_lock and
disables irq. init_hwif_default calls ide_default_io_base which calls
pci_get_device and later pci_get_subsys tries to apply for semaphore
pci_bus_sem and goes to sleep.
Mostly, pci_get_device should be called when irq is turned on.
ide_default_io_base just needs find if list pci_devices is empty.
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: Greg KH <greg@kroah.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (34 commits)
PCI: Only build PCI syscalls on architectures that want them
PCI: limit pci_get_bus_and_slot to domain 0
PCI: hotplug: acpiphp: avoid acpiphp "cannot get bridge info" PCI hotplug failure
PCI: hotplug: acpiphp: remove hot plug parameter write to PCI host bridge
PCI: hotplug: acpiphp: fix slot poweroff problem on systems without _PS3
PCI: hotplug: pciehp: wait for 1 second after power off slot
PCI: pci_set_power_state(): check for PM capabilities earlier
PCI: cpci_hotplug: Convert to use the kthread API
PCI: add pci_try_set_mwi
PCI: pcie: remove SPIN_LOCK_UNLOCKED
PCI: ROUND_UP macro cleanup in drivers/pci
PCI: remove pci_dac_dma_... APIs
PCI: pci-x-pci-express-read-control-interfaces cleanups
PCI: Fix typo in include/linux/pci.h
PCI: pci_ids, remove double or more empty lines
PCI: pci_ids, add atheros and 3com_2 vendors
PCI: pci_ids, reorder some entries
PCI: i386: traps, change VENDOR to DEVICE
PCI: ATM: lanai, change VENDOR to DEVICE
PCI: Change all drivers to use pci_device->revision
...
sysfs is now completely out of driver/module lifetime game. After
deletion, a sysfs node doesn't access anything outside sysfs proper,
so there's no reason to hold onto the attribute owners. Note that
often the wrong modules were accounted for as owners leading to
accessing removed modules.
This patch kills now unnecessary attribute->owner. Note that with
this change, userland holding a sysfs node does not prevent the
backing module from being unloaded.
For more info regarding lifetime rule cleanup, please read the
following message.
http://article.gmane.org/gmane.linux.kernel/510293
(tweaked by Greg to not delete the field just yet, to make it easier to
merge things properly.)
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently there are 97 occurrences where drivers need the pci
revision ID. We can do this once for all devices. Even the pci
subsystem needs the revision several times for quirks. The extra
u8 member pads out nicely in the pci_dev struct.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Back in commit 8c4b2cf9af, Bernhard said
that he would fix up all instances of when this message happens. So
point people at him instead of the linux-kernel list which can not fix
things up.
Cc: Bernhard Kaindl <bk@suse.de>
Cc: Dave Jones <davej@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Miles Lane <miles.lane@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The msi descriptors are linked together with what looks a lot like
a linked list, but isn't a struct list_head list. Make it one.
The only complication is that previously we walked a list of irqs, and
got the descriptor for each with get_irq_msi(). Now we have a list of
descriptors and need to get the irq out of it, so it needs to be in the
actual struct msi_desc. We use 0 to indicate no irq is setup.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Convert code that allocs a struct pci_dev to use alloc_pci_dev().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There are currently several places in the kernel where we kmalloc()
a struct pci_dev and start initialising it. It'd be preferable to
have an allocator so we can ensure the pci_dev is correctly initialised
in one place.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Functions marked __devinit will be removed after kernel init. But being
exported they are potentially called by a module much later.
So the safer choice seems to be to keep the function even in the non
CONFIG_HOTPLUG case.
This silence the follwoing section mismatch warnings:
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_bus_add_device from __ksymtab_gpl between '__ksymtab_pci_bus_add_device' (at offset 0x20) and '__ksymtab_pci_walk_bus'
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_create_bus from __ksymtab_gpl between '__ksymtab_pci_create_bus' (at offset 0x40) and '__ksymtab_pci_stop_bus_device'
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_bus_max_busnr from __ksymtab_gpl between '__ksymtab_pci_bus_max_busnr' (at offset 0xc0) and '__ksymtab_pci_assign_resource_fixed'
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_claim_resource from __ksymtab_gpl between '__ksymtab_pci_claim_resource' (at offset 0xe0) and '__ksymtab_pcie_port_bus_type'
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_bus_add_devices from __ksymtab between '__ksymtab_pci_bus_add_devices' (at offset 0x70) and '__ksymtab_pci_bus_alloc_resource'
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_scan_bus_parented from __ksymtab between '__ksymtab_pci_scan_bus_parented' (at offset 0x90) and '__ksymtab_pci_root_buses'
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_bus_assign_resources from __ksymtab between '__ksymtab_pci_bus_assign_resources' (at offset 0x4d0) and '__ksymtab_pci_bus_size_bridges'
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_bus_size_bridges from __ksymtab between '__ksymtab_pci_bus_size_bridges' (at offset 0x4e0) and '__ksymtab_pci_setup_cardbus'
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The change to force legacy mode IDE channels' resources to fixed non-zero
values confuses (at least some versions of) X, because the values reported
by the kernel and those readable from PCI config space aren't consistent
anymore. Therefore, this patch arranges for the respective BARs to also
get updated if possible.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
pci_scan_msi_device() doesn't do anything anymore, so remove it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
For pci mem resource that size is bigger than 4G, the sz returned by
pc_size will be 0.
So that resource is skipped, and register contained hi address will be
treated as another 32bit resource. We need to use sz64 and pci_sz64 for
64 bit resource for clear logical. Typical usages for this: Opteron
system with co-processor and the co-processor could take more than 4G
RAM as pre-fetchable mem resource.
Signed-off-by: Yinghai Lu <yinghai.lu@amd.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Don't have macros between a function's kernel-doc block and the function
definition. This is not valid for kernel-doc.
Warning(/var/linsrc/linux-2.6.20-rc1-git8//drivers/pci/probe.c:653): No description found for parameter 'IORESOURCE_PCI_FIXED'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since commit 368c73d4f6 the kernel will try
to update the non-writeable BAR registers 0..3 of PIIX4 IDE adapters if
pci_assign_unassigned_resources() is used to do full resource assignment of
the bus. This fails because in the PIIX4 these BAR registers have
implicitly assumed values and read back as zero; it used to work because
the kernel used to just write zero to that register the read back value did
match what was written.
The fix is a new resource flag IORESOURCE_PCI_FIXED used to mark a resource
as non-movable. This will also be useful to keep other import system
resources from being moved around - for example system consoles on PCI
busses.
[akpm@osdl.org: cleanup]
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
For node-aware skb allocations we need information about the node in struct
net_device or struct device. Davem suggested to put it into struct device
which this patch does.
In particular:
- struct device gets a new int numa_node member if CONFIG_NUMA is set
- there are two new helpers, dev_to_node and set_dev_node to
transparently deal with the non-numa case
- for pci devices the node-info is set to the value we get from
pcibus_to_node.
Note that for some architectures pcibus_to_node doesn't work yet at the time
we call it currently. This is harmless and will just mean skb allocations
aren't node-local on this architectures until the implementation of
pcibus_to_node on these architectures have been updated (There are patches for
x86 and x86_64 floating around)
[akpm@osdl.org: cleanup]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The number of permutations of crap we do is amazing and almost all of it
has the wrong effect in 2.6.
At the heart of this is the PCI SFF magic which says that compatibility
mode PCI IDE controllers use ISA IRQ routing and hard coded addresses
not the BAR values. The old quirks variously clears them, sets them,
adjusts them and then IDE ignores the result.
In order to drive all this garbage out and to do it portably we need to
handle the SFF rules directly and properly. Because we know the device
BAR 0-3 are not used in compatibility mode we load them with the values
that are implied (and indeed which many controllers actually
thoughtfully put there in this mode anyway).
This removes special cases in the IDE layer and libata which now knows
that bar 0/1/2/3 always contain the correct address. It means our
resource allocation map is accurate from boot, not "mostly accurate"
after ide is loaded, and it shoots lots of code. There is also lots more
code and magic constant knowledge to shoot once this is in and settled.
Been in my test tree for a while both with drivers/ide and with libata.
Wants some -mm shakedown in case I've missed something dumb or there are
corner cases lurking.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Problem:
New Dell PowerEdge servers have 2 embedded ethernet ports, which are
labeled NIC1 and NIC2 on the chassis, in the BIOS setup screens, and
in the printed documentation. Assuming no other add-in ethernet ports
in the system, Linux 2.4 kernels name these eth0 and eth1
respectively. Many people have come to expect this naming. Linux 2.6
kernels name these eth1 and eth0 respectively (backwards from
expectations). I also have reports that various Sun and HP servers
have similar behavior.
Root cause:
Linux 2.4 kernels walk the pci_devices list, which happens to be
sorted in breadth-first order (or pcbios_find_device order on i386,
which most often is breadth-first also). 2.6 kernels have both the
pci_devices list and the pci_bus_type.klist_devices list, the latter
is what is walked at driver load time to match the pci_id tables; this
klist happens to be in depth-first order.
On systems where, for physical routing reasons, NIC1 appears on a
lower bus number than NIC2, but NIC2's bridge is discovered first in
the depth-first ordering, NIC2 will be discovered before NIC1. If the
list were sorted breadth-first, NIC1 would be discovered before NIC2.
A PowerEdge 1955 system has the following topology which easily
exhibits the difference between depth-first and breadth-first device
lists.
-[0000:00]-+-00.0 Intel Corporation 5000P Chipset Memory Controller Hub
+-02.0-[0000:03-08]--+-00.0-[0000:04-07]--+-00.0-[0000:05-06]----00.0-[0000:06]----00.0 Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (labeled NIC2, 2.4 kernel name eth1, 2.6 kernel name eth0)
+-1c.0-[0000:01-02]----00.0-[0000:02]----00.0 Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (labeled NIC1, 2.4 kernel name eth0, 2.6 kernel name eth1)
Other factors, such as device driver load order and the presence of
PCI slots at various points in the bus hierarchy further complicate
this problem; I'm not trying to solve those here, just restore the
device order, and thus basic behavior, that 2.4 kernels had.
Solution:
The solution can come in multiple steps.
Suggested fix#1: kernel
Patch below optionally sorts the two device lists into breadth-first
ordering to maintain compatibility with 2.4 kernels. It adds two new
command line options:
pci=bfsort
pci=nobfsort
to force the sort order, or not, as you wish. It also adds DMI checks
for the specific Dell systems which exhibit "backwards" ordering, to
make them "right".
Suggested fix#2: udev rules from userland
Many people also have the expectation that embedded NICs are always
discovered before add-in NICs (which this patch does not try to do).
Using the PCI IRQ Routing Table provided by system BIOS, it's easy to
determine which PCI devices are embedded, or if add-in, which PCI slot
they're in. I'm working on a tool that would allow udev to name
ethernet devices in ascending embedded, slot 1 .. slot N order,
subsort by PCI bus/dev/fn breadth-first. It'll be possible to use it
independent of udev as well for those distributions that don't use
udev in their installers.
Suggested fix#3: system board routing rules
One can constrain the system board layout to put NIC1 ahead of NIC2
regardless of breadth-first or depth-first discovery order. This adds
a significant level of complexity to board routing, and may not be
possible in all instances (witness the above systems from several
major manufacturers). I don't want to encourage this particular train
of thought too far, at the expense of not doing #1 or #2 above.
Feedback appreciated. Patch tested on a Dell PowerEdge 1955 blade
with 2.6.18.
You'll also note I took some liberty and temporarily break the klist
abstraction to simplify and speed up the sort algorithm. I think
that's both safe and appropriate in this instance.
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The pci channel state is currently uninitialized, thus there are two ways
of indicating that "everything's OK": 0 and 1. This is a bit of a burden.
If a devce driver wants to check if the pci channel is in a working or a
disconnected state, the driver writer must perform checks similar to
if((pdev->error_state != 0) &&
(pdev->error_state != pci_channel_io_normal)) {
whatever();
}
which is rather akward. The first check is needed because stuct pci_dev is
inited to all-zeros. The scond is needed because the error recovery will
set the state to pci_channel_io_normal (which is not zero).
This patch fixes this awkwardness.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
pci_walk_bus has a race with pci_destroy_dev. When cb is called
in pci_walk_bus, pci_destroy_dev might unlink the dev pointed by next.
Later on in the next loop, pointer next becomes NULL and cause
kernel panic.
Below patch against 2.6.17-rc4 fixes it by changing pci_bus_lock (spin_lock)
to pci_bus_sem (rw_semaphore).
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When we detect a 64-bit pre-set address in a BAR on a 32-bit platform,
we disable it and treat it as if it had been unset, thus allowing the
general address assignment code to assign a new address to it when the
device is enabled. This can happen either if the firmware assigns
64-bit addresses; additionally, some cards have been found "in the
wild" which do not come out of reset with all the BAR registers set to
zero.
Unfortunately, the patch that implemented this tested the low part of
the address instead of the high part of the address. This patch fixes
that.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[pci] Ignore pre-set 64-bit BARs on 32-bit platforms
Currently, Linux always rejects a device which has a pre-set 64-bit
address on a 32-bit platform. On systems which do not do PCI
initialization in firmware, this causes some devices which don't
correctly power up with all BARs zero to fail.
This patch makes the kernel automatically zero out such an address
(thus treating it as if it had not been set at all, meaning it will
assign an address if necessary).
I have done this only for devices, not bridges. It seems potentially
hazardous to do for bridges.
Signed-off-by: H. Peter Anvin <hpa@c2micro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
this patch converts drivers/pci to kzalloc usage.
Compile tested with allyes config.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
"In some cases, especially on modern laptops with a lot of PCI and cardbus
bridges, we're unable to assign correct secondary/subordinate bus numbers
to all cardbus bridges due to BIOS limitations unless we are using
"pci=assign-busses" boot option." -- Ivan Kokshaysky (from a patch comment)
Without it, Cardbus cards inserted are never seen by PCI because the parent
PCI-PCI Bridge of the Cardbus bridge will not pass and translate Type 1 PCI
configuration cycles correctly and the system will fail to find and
initialise the PCI devices in the system.
Reference: PCI-PCI Bridges: PCI Configuration Cycles and PCI Bus Numbering:
http://www.science.unitn.it/~fiorella/guidelinux/tlk/node72.html
The reason for this is that:
``All PCI busses located behind a PCI-PCI bridge must reside between the
secondary bus number and the subordinate bus number (inclusive).''
"pci=assign-busses" makes pcibios_assign_all_busses return 1 and this
turns on PCI renumbering during PCI probing.
Alan suggested to use DMI automatically set assign-busses on problem systems.
The only question for me was where to put it. I put it directly before
scanning PCI bus into pcibios_scan_root() because it's called from legacy,
acpi and numa and so it can be one place for all systems and configurations
which may need it.
AMD64 Laptops are also affected and fixed by assign-busses, and the code is
also incuded from arch/x86_64/pci/ that place will also work for x86_64
kernels, I only ifdef'-ed the x86-only Laptop in this example.
Affected and known or assumed to be fixed with it are (found by googling):
* ASUS Z71V and L3s
* Samsung X20
* Compaq R3140us and all Compaq R3000 series laptops with TI1620 Controller,
also Compaq R4000 series (from a kernel.org bugreport)
* HP zv5000z (AMD64 3700+, known that fixup_parent_subordinate_busnr fixes it)
* HP zv5200z
* IBM ThinkPad 240
* An IBM ThinkPad (1.8 GHz Pentium M) debugged by Pavel Machek
gives the correspondig message which detects the possible problem.
* MSI S260 / Medion SIM 2100 MD 95600
The patch also expands the "try pci=assign-busses" warning so testers will
help us to update the DMI table.
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It turns out AMD 8131 quirk only affects MSI for devices behind the 8131 bridge.
Handle this by adding a flags field in pci_bus, inherited from parent to child.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
> On Mon, Feb 13, 2006 at 05:13:21PM -0800, David S. Miller wrote:
> >
> > In drivers/pci/probe.c:pci_scan_bridge(), if this is not the first
> > pass (pass != 0) we don't restore the PCI_BRIDGE_CONTROL_REGISTER and
> > thus leave PCI_BRIDGE_CTL_MASTER_ABORT off:
> >
> > int __devinit pci_scan_bridge(struct pci_bus *bus, struct pci_dev * dev, int max, int pass)
> > {
> > ...
> > /* Disable MasterAbortMode during probing to avoid reporting
> > of bus errors (in some architectures) */
> > pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &bctl);
> > pci_write_config_word(dev, PCI_BRIDGE_CONTROL,
> > bctl & ~PCI_BRIDGE_CTL_MASTER_ABORT);
> > ...
> > if ((buses & 0xffff00) && !pcibios_assign_all_busses() && !is_cardbus) {
> > unsigned int cmax, busnr;
> > /*
> > * Bus already configured by firmware, process it in the first
> > * pass and just note the configuration.
> > */
> > if (pass)
> > return max;
> > ...
> > }
> >
> > pci_write_config_word(dev, PCI_BRIDGE_CONTROL, bctl);
> > ...
> >
> > This doesn't seem intentional.
Agreed, looks like an accident. The patch [1] originally came from Kip
Walker (Broadcom back then) between 2.6.0-test3 and 2.6.0-test4. As I
recall it was supposed to fix an issue with with PCI aborts being
signalled by the PCI bridge of the Broadcom BCM1250 family of SOCs when
probing behind pci_scan_bridge. It is undeseriable to disable
PCI_BRIDGE_CTL_MASTER_ABORT in pci_{read,write)_config_* and the
behaviour wasn't considered a bug in need of a workaround, so this was
put in probe.c.
I don't have an affected system at hand, so can't really test but I
propose something like the below patch.
[1] http://www.linux-mips.org/git?p=linux.git;a=commit;h=599457e0cb702a31a3247ea6a5d9c6c99c4cf195
[PCI] Avoid leaving MASTER_ABORT disabled permanently when returning from pci_scan_bridge.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
After you find the maximum value of the subordinate buses below the child
bus, you must fix the parent's subordinate bus number again, otherwise
it may be too small.
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The powerpc PCI code sets up the PCI tree without doing config space
accesses in most cases, from the firmware tree. However, it still wants
to call pci_cfg_space_size() under some conditions, thus it needs to
be made non-static (though I don't see a point to export it to modules).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add a warning if a child bus may be inaccessible because the
parent bridge has wrong secondary or subordinate bus numbers.
Note that this may or may not happen on "transparent" bridges,
as can be seen in bug #5557.
Also, if we do not fix up the assignment of bus numbers, try to
make use of the bus number space available.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I've implemented a quirk to take advantage of the 1KB I/O space
granularity option on the Intel P64H2 PCI Bridge. I had to change
probe.c because it sets the resource start and end to be aligned on 4k
boundaries (after the quirk sets them to 1k boundaries). I've tested
this patch on a Unisys ES7000-600 both with and without the 1KB option
enabled. I also tested this on a 2 processor Dell box that doesn't have
a P64H2 to make sure there were no negative affects there.
Signed-off-by: Dan Yeisley <dan.yeisley@unisys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Call pci_read_irq() for bridges too, so that the pin value
is stored for bridges that require interrupts.
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Store the value of the INTERRUPT_PIN in the pci_dev structure
so that it can be retrieved later.
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
PCI: add descriptions for missing function parameters.
Eliminate all kernel-doc warnings here.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I believe the change that broke things is introduction of
pci_fixup_parent_subordinate_busnr().
The patch here does two things:
- hunk #1 should fix the problems you've seen when you boot without
additional "pci" kernel options;
- hunk #2 supposedly fixes boot with "pci=assign-busses" option which
otherwise hangs Acer TM81xx machines as reported.
Please try this with and without "pci=assign-busses". If it boots,
I'd like to see 'lspci -vvx' for both cases.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This function expects an unsigned 32-bit type as its third argument:
static u32 pci_size(u32 base, u32 maxbase, u32 mask)
However, given these definitions:
#define PCI_BASE_ADDRESS_MEM_MASK (~0x0fUL)
#define PCI_ROM_ADDRESS_MASK (~0x7ffUL)
these two calls in drivers/pci/probe.c are problematic for architectures
for which a UL is not equivalent to a u32:
sz = pci_size(l, sz, PCI_BASE_ADDRESS_MEM_MASK);
sz = pci_size(l, sz, PCI_ROM_ADDRESS_MASK);
Hence the below compile warning when building for ARCH=ppc64:
drivers/pci/probe.c: In function `pci_read_bases':
/.../probe.c:168: warning: large integer implicitly truncated to unsigned type
/.../probe.c:218: warning: large integer implicitly truncated to unsigned type
Here is a simple fix.
Signed-off-by: Amos Waterland <apw@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
pcibus_to_cpumask expands into more than just an initialiser so gcc
moans about code before variable declarations.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes some small rearrangements of the PCI probing code in
order to make it possible for arch code to set up the PCI tree
without needing to duplicate code from the PCI layer unnecessarily.
PPC64 will use this to set up the PCI tree from the Open Firmware
device tree, which we need to do on logically-partitioned pSeries
systems.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
- support PCI PM CAP version 3 (as defined in PCI PM Interface Spec v1.2)
- pci/probe.c sets the PM state initially to 4 which is D3cold. add a
PCI_UNKNOWN
- minor cleanups
Signed-off-by: Daniel Ritz <daniel.ritz@gmx.ch>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The setup-bus code doesn't work correctly for configurations
with more than one display adapter in the same PCI domain.
This stuff actually is a leftover of an early 2.4 PCI setup code
and apparently it stopped working after some "bridge_ctl" changes.
So the best thing we can do is just to remove it and rely on the fact
that any firmware *has* to configure VGA port forwarding for the boot
display device properly.
But then we need to ensure that the bus->bridge_ctl will always
contain valid information collected at the probe time, therefore
the following change in pci_scan_bridge() is needed.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>