summaryrefslogtreecommitdiff
path: root/drivers/pci
AgeCommit message (Collapse)Author
2016-12-07PCI: tegra: Implement PCA enable workaroundThierry Reding
Tegra210's PCIe controller has a bug that requires the PCA (performance counter) feature to be enabled. If this isn't done, accesses to device configuration space will hang the chip for tens of seconds. Implement the workaround. Based on commit 514e19138af2 ("pci: tegra: implement PCA enable workaround") from U-Boot by Stephen Warren <swarren@nvidia.com>. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
2016-12-07PCI: tegra: Use new pci_register_host_bridge() interfaceArnd Bergmann
Tegra is one of the remaining platforms that still use the traditional pci_common_init_dev() interface for probing PCI host bridges. This demonstrates how to convert it to the pci_register_host interface I just added in a previous patch. This leads to a more linear probe sequence that can handle errors better because we avoid callbacks into the driver, and it makes the driver architecture independent. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
2016-12-07PCI: Export host bridge registration interfaceThierry Reding
Allow PCI host bridge drivers to use the new host bridge interfaces to register their host bridge. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
2016-12-07PCI: Allow driver-specific data in host bridgeThierry Reding
Provide a way to allocate driver-specific data along with a PCI host bridge structure. The bridge's ->private field points to this data. Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
2016-12-07PCI: Add pci_register_host_bridge() interfaceArnd Bergmann
Make the existing pci_host_bridge structure a proper device that is usable by PCI host drivers in a more standard way. In addition to the existing pci_scan_bus(), pci_scan_root_bus(), pci_scan_root_bus_msi(), and pci_create_root_bus() interfaces, this unfortunately means having to add yet another interface doing basically the same thing, and add some extra code in the initial step. However, this time it's more likely to be extensible enough that we won't have to do another one again in the future, and we should be able to reduce code much more as a result. The main idea is to pull the allocation of 'struct pci_host_bridge' out of the registration, and let individual host drivers and architecture code fill the members before calling the registration function. There are a number of things we can do based on this: * Use a single memory allocation for the driver-specific structure and the generic PCI host bridge * consolidate the contents of driver-specific structures by moving them into pci_host_bridge * Add a consistent interface for removing a PCI host bridge again when unloading a host driver module * Replace the architecture specific __weak pcibios_*() functions with callbacks in a pci_host_bridge device * Move common boilerplate code from host drivers into the generic function, based on contents of the structure * Extend pci_host_bridge with additional members when needed without having to add arguments to pci_scan_*(). * Move members of struct pci_bus into pci_host_bridge to avoid having lots of identical copies. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
2016-12-06PCI: Add MCFG quirks for X-Gene host controllerDuc Dang
PCIe controllers in X-Gene SoCs are not ECAM compliant: software needs to configure additional controller's register to address device at bus:dev:function. Add a quirk to discover controller MMIO register space and configure controller registers to select and address the target secondary device. The quirk will only be applied for X-Gene PCIe MCFG table with OEM revison 1, 2, 3 or 4 (PCIe controller v1 and v2 on X-Gene SoCs). Tested-by: Jon Masters <jcm@redhat.com> Signed-off-by: Duc Dang <dhdang@apm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-12-06PCI: Add MCFG quirks for Cavium ThunderX pass1.x host controllerTomasz Nowicki
ThunderX pass1.x requires to emulate the EA headers for on-chip devices hence it has to use custom pci_thunder_ecam_ops for accessing PCI config space (pci-thunder-ecam.c). Add new entries to MCFG quirk array where it can be applied while probing ACPI based PCI host controller. ThunderX pass1.x is using the same way for accessing off-chip devices (so-called PEM) as silicon pass-2.x so we need to add PEM quirk entries too. Quirk is considered for ThunderX silicon pass1.x only which is identified via MCFG revision 2. ThunderX pass 1.x requires the following accessors: NUMA node 0 PCI segments 0- 3: pci_thunder_ecam_ops (MCFG quirk) NUMA node 0 PCI segments 4- 9: thunder_pem_ecam_ops (MCFG quirk) NUMA node 1 PCI segments 10-13: pci_thunder_ecam_ops (MCFG quirk) NUMA node 1 PCI segments 14-19: thunder_pem_ecam_ops (MCFG quirk) [bhelgaas: change Makefile/ifdefs so quirk doesn't depend on CONFIG_PCI_HOST_THUNDER_ECAM] Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-12-06PCI: Add MCFG quirks for Cavium ThunderX pass2.x host controllerTomasz Nowicki
ThunderX PCIe controller to off-chip devices (so-called PEM) is not fully compliant with ECAM standard. It uses non-standard configuration space accessors (see thunder_pem_ecam_ops) and custom configuration space granulation (see bus_shift = 24). In order to access configuration space and probe PEM as ACPI-based PCI host controller we need to add MCFG quirk infrastructure. This involves: 1. A new thunder_pem_acpi_init() init function to locate PEM-specific register ranges using ACPI. 2. Export PEM thunder_pem_ecam_ops structure so it is visible to MCFG quirk code. 3. New quirk entries for each PEM segment. Each contains platform IDs, mentioned thunder_pem_ecam_ops and CFG resources. Quirk is considered for ThunderX silicon pass2.x only which is identified via MCFG revision 1. ThunderX pass 2.x requires the following accessors: NUMA Node 0 PCI segments 0- 3: pci_generic_ecam_ops (ECAM-compliant) NUMA Node 0 PCI segments 4- 9: thunder_pem_ecam_ops (MCFG quirk) NUMA Node 1 PCI segments 10-13: pci_generic_ecam_ops (ECAM-compliant) NUMA Node 1 PCI segments 14-19: thunder_pem_ecam_ops (MCFG quirk) [bhelgaas: adapt to use acpi_get_rc_resources(), update Makefile/ifdefs so quirk doesn't depend on CONFIG_PCI_HOST_THUNDER_PEM] Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-12-06PCI: thunder-pem: Factor out resource lookupBjorn Helgaas
Pull the register resource lookup out of thunder_pem_init() so we can easily add a corresponding lookup using ACPI. No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-12-06PCI: Add MCFG quirks for HiSilicon Hip05/06/07 host controllersDongdong Liu
The PCIe controller in Hip05/Hip06/Hip07 SoCs is not completely ECAM-compliant. It is non-ECAM only for the RC bus config space; for any other bus underneath the root bus it does support ECAM access. Add specific quirks for PCI config space accessors. This involves: 1. New initialization call hisi_pcie_init() to obtain RC base addresses from PNP0C02 at the root of the ACPI namespace (under \_SB). 2. New entry in common quirk array. [bhelgaas: move to pcie-hisi.c and change Makefile/ifdefs so quirk doesn't depend on CONFIG_PCI_HISI] Signed-off-by: Dongdong Liu <liudongdong3@huawei.com> Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-12-06PCI: Add MCFG quirks for Qualcomm QDF2432 host controllerChristopher Covington
The Qualcomm Technologies QDF2432 SoC does not support accesses smaller than 32 bits to the PCI configuration space. Register the appropriate quirk. [bhelgaas: add QCOM_ECAM32 macro, ifdef for ACPI and PCI_QUIRKS] Signed-off-by: Christopher Covington <cov@codeaurora.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-12-06PCI/ACPI: Provide acpi_get_rc_resources() for ARM64 platformDongdong Liu
The acpi_get_rc_resources() is used to get the RC register address that can not be described in MCFG. It takes the _HID & segment to look for and outputs the RC address resource. Use PNP0C02 devices to describe such RC address resource. Use _UID to match segment to tell which root bus the PNP0C02 resource belongs to. [bhelgaas: add dev argument, wrap in #ifdef CONFIG_PCI_QUIRKS] Signed-off-by: Dongdong Liu <liudongdong3@huawei.com> Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-12-06Merge branches 'arm/mediatek', 'arm/smmu', 'x86/amd', 's390', 'core' and ↵Joerg Roedel
'arm/exynos' into next
2016-12-01Merge tag 'pci-v4.9-fixes-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "PCI fixes: - Fix Read Completion Boundary setting, which fixes a boot failure on IBM x3850 with Mellanox MT27500 ConnectX-3 - Update some MAINTAINERS entries and email addresses" * tag 'pci-v4.9-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX) PCI: Export pcie_find_root_port PCI: designware-plat: Update author email PCI: designware: Change maintainer to Joao Pinto MAINTAINERS: Add devicetree binding to PCI i.MX6 entry MAINTAINERS: Update Richard Zhu's email address
2016-11-30Merge branch 'for-joerg/arm-smmu/updates' of ↵Joerg Roedel
git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into arm/smmu
2016-11-29PCI: Add comments about ROM BAR updatingBjorn Helgaas
pci_update_resource() updates a hardware BAR so its address matches the kernel's struct resource UNLESS it's a disabled ROM BAR. We only update those when we enable the ROM. It's not obvious from the code why ROM BARs should be handled specially. Apparently there are Matrox devices with defective ROM BARs that read as zero when disabled. That means that if pci_enable_rom() reads the disabled BAR, sets PCI_ROM_ADDRESS_ENABLE (without re-inserting the address), and writes it back, it would enable the ROM at address zero. Add comments and references to explain why we can't make the code look more rational. The code changes are from 755528c860b0 ("Ignore disabled ROM resources at setup") and 8085ce084c0f ("[PATCH] Fix PCI ROM mapping"). Link: https://lkml.org/lkml/2005/8/30/138 Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
2016-11-29PCI: Decouple IORESOURCE_ROM_ENABLE and PCI_ROM_ADDRESS_ENABLEBjorn Helgaas
Remove the assumption that IORESOURCE_ROM_ENABLE == PCI_ROM_ADDRESS_ENABLE. PCI_ROM_ADDRESS_ENABLE is the ROM enable bit defined by the PCI spec, so if we're reading or writing a BAR register value, that's what we should use. IORESOURCE_ROM_ENABLE is a corresponding bit in struct resource flags. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
2016-11-29PCI: Remove pci_resource_bar() and pci_iov_resource_bar()Bjorn Helgaas
pci_std_update_resource() only deals with standard BARs, so we don't have to worry about the complications of VF BARs in an SR-IOV capability. Compute the BAR address inline and remove pci_resource_bar(). That makes pci_iov_resource_bar() unused, so remove that as well. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
2016-11-29PCI: Don't update VF BARs while VF memory space is enabledBjorn Helgaas
If we update a VF BAR while it's enabled, there are two potential problems: 1) Any driver that's using the VF has a cached BAR value that is stale after the update, and 2) We can't update 64-bit BARs atomically, so the intermediate state (new lower dword with old upper dword) may conflict with another device, and an access by a driver unrelated to the VF may cause a bus error. Warn about attempts to update VF BARs while they are enabled. This is a programming error, so use dev_WARN() to get a backtrace. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
2016-11-29PCI: Separate VF BAR updates from standard BAR updatesBjorn Helgaas
Previously pci_update_resource() used the same code path for updating standard BARs and VF BARs in SR-IOV capabilities. Split the VF BAR update into a new pci_iov_update_resource() internal interface, which makes it simpler to compute the BAR address (we can get rid of pci_resource_bar() and pci_iov_resource_bar()). This patch: - Renames pci_update_resource() to pci_std_update_resource(), - Adds pci_iov_update_resource(), - Makes pci_update_resource() a wrapper that calls the appropriate one, No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
2016-11-29PCI: hv: Allocate physically contiguous hypercall params bufferLong Li
hv_do_hypercall() assumes that we pass a segment from a physically contiguous buffer. A buffer allocated on the stack may not work if CONFIG_VMAP_STACK=y is set. Use kmalloc() to allocate this buffer. Reported-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com>
2016-11-29ACPI: Implement acpi_dma_configureLorenzo Pieralisi
On DT based systems, the of_dma_configure() API implements DMA configuration for a given device. On ACPI systems an API equivalent to of_dma_configure() is missing which implies that it is currently not possible to set-up DMA operations for devices through the ACPI generic kernel layer. This patch fills the gap by introducing acpi_dma_configure/deconfigure() calls that for now are just wrappers around arch_setup_dma_ops() and arch_teardown_dma_ops() and also updates ACPI and PCI core code to use the newly introduced acpi_dma_configure/acpi_dma_deconfigure functions. Since acpi_dma_configure() is used to configure DMA operations, the function initializes the dma/coherent_dma masks to sane default values if the current masks are uninitialized (also to keep the default values consistent with DT systems) to make sure the device has a complete default DMA set-up. The DMA range size passed to arch_setup_dma_ops() is sized according to the device coherent_dma_mask (starting at address 0x0), mirroring the DT probing path behaviour when a dma-ranges property is not provided for the device being probed; this changes the current arch_setup_dma_ops() call parameters in the ACPI probing case, but since arch_setup_dma_ops() is a NOP on all architectures but ARM/ARM64 this patch does not change the current kernel behaviour on them. Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> [pci] Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Tomasz Nowicki <tn@semihalf.com> Tested-by: Hanjun Guo <hanjun.guo@linaro.org> Tested-by: Tomasz Nowicki <tn@semihalf.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Tomasz Nowicki <tn@semihalf.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-11-29PCI: Update BARs using property bits appropriate for typeBjorn Helgaas
The BAR property bits (0-3 for memory BARs, 0-1 for I/O BARs) are supposed to be read-only, but we do save them in res->flags and include them when updating the BAR. Mask the I/O property bits with ~PCI_BASE_ADDRESS_IO_MASK (0x3) instead of PCI_REGION_FLAG_MASK (0xf) to make it obvious that we can't corrupt bits 2-3 of I/O addresses. Use PCI_ROM_ADDRESS_MASK for ROM BARs. This means we'll only check the top 21 bits (instead of the 28 bits we used to check) of a ROM BAR to see if the update was successful. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-11-28PCI: Ignore BAR updates on virtual functionsBjorn Helgaas
VF BARs are read-only zero, so updating VF BARs will not have any effect. See the SR-IOV spec r1.1, sec 3.4.1.11. We already ignore these updates because of 70675e0b6a1a ("PCI: Don't try to restore VF BARs"); this merely restructures it slightly to make it easier to split updates for standard and SR-IOV BARs. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
2016-11-23PCI: Do any VF BAR updates before enabling the BARsGavin Shan
Previously we enabled VFs and enable their memory space before calling pcibios_sriov_enable(). But pcibios_sriov_enable() may update the VF BARs: for example, on PPC PowerNV we may change them to manage the association of VFs to PEs. Because 64-bit BARs cannot be updated atomically, it's unsafe to update them while they're enabled. The half-updated state may conflict with other devices in the system. Call pcibios_sriov_enable() before enabling the VFs so any BAR updates happen while the VF BARs are disabled. [bhelgaas: changelog] Tested-by: Carol Soto <clsoto@us.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-11-23PCI: iproc: Fix incorrect MSI address alignmentRay Jui
In the code to handle PAXB v2 based MSI steering, the logic aligns the MSI register address to the size of supported inbound mapping range. This is incorrect since it rounds "up" the starting address to the next aligned address, but what we want is the starting address to be rounded "down" to the aligned address. This patch fixes the issue and allows MSI writes to be properly steered to the GIC. Fixes: 4b073155fbd3 ("PCI: iproc: Add support for the next-gen PAXB controller") Signed-off-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-11-23PCI: qcom: Add support for MSM8996 PCIe controllerSrinivas Kandagatla
Add support for the MSM8996/APQ8096 PCIe controller. MSM8996 supports Gen 1/2, one lane, 3 PCIe root complexes with support for MSI and legacy interrupts, and it conforms to PCI Express Base 2.1 specification. Add a post_init callback to qcom_pcie_ops, as the PCIe pipe clocks are only setup after the phy is powered on. It also adds an ltssm_enable callback as it is very much different from other supported SoCs in the driver. Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Stanimir Varbanov <svarbanov@mm-sol.com>
2016-11-23PCI: cpqphp: Add missing call to pci_disable_device()Quentin Lambert
Most error branches following the call to pci_enable_device() contain a call to pci_disable_device(). Add these calls where they are missing. This issue was found with Hector. Signed-off-by: Quentin Lambert <lambert.quentin@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-11-23PCI: iproc: Add support for the next-gen PAXB controllerRay Jui
Add support for the next generation of the iProc PAXB host controller, used in Stingray. Signed-off-by: Oza Oza <oza.oza@broadcom.com> Signed-off-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
2016-11-23PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)Johannes Thumshirn
Per PCIe spec r3.0, sec 2.3.1.1, the Read Completion Boundary (RCB) determines the naturally aligned address boundaries on which a Read Request may be serviced with multiple Completions: - For a Root Complex, RCB is 64 bytes or 128 bytes This value is reported in the Link Control Register Note: Bridges and Endpoints may implement a corresponding command bit which may be set by system software to indicate the RCB value for the Root Complex, allowing the Bridge/Endpoint to optimize its behavior when the Root Complex’s RCB is 128 bytes. - For all other system elements, RCB is 128 bytes Per sec 7.8.7, if a Root Port only supports a 64-byte RCB, the RCB of all downstream devices must be clear, indicating an RCB of 64 bytes. If the Root Port supports a 128-byte RCB, we may optionally set the RCB of downstream devices so they know they can generate larger Completions. Some BIOSes supply an _HPX that tells us to set RCB, even though the Root Port doesn't have RCB set, which may lead to Malformed TLP errors if the Endpoint generates completions larger than the Root Port can handle. The IBM x3850 X6 with BIOS version -[A8E120CUS-1.30]- 08/22/2016 supplies such an _HPX and a Mellanox MT27500 ConnectX-3 device fails to initialize: mlx4_core 0000:41:00.0: command 0xfff timed out (go bit not cleared) mlx4_core 0000:41:00.0: device is going to be reset mlx4_core 0000:41:00.0: Failed to obtain HW semaphore, aborting mlx4_core 0000:41:00.0: Fail to reset HCA ------------[ cut here ]------------ kernel BUG at drivers/net/ethernet/mellanox/mlx4/catas.c:193! After 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration") and 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices with a link"), we apply _HPX settings to *all* devices, not just those hot-added after boot. Before 7a1562d4f2d0, we didn't touch the Mellanox RCB, and the device worked. After 7a1562d4f2d0, we set its RCB to 128, and it failed. Set the RCB to 128 iff the Root Port supports a 128-byte RCB. Otherwise, set RCB to 64 bytes. This effectively ignores what _HPX tells us about RCB. Note that this change only affects _HPX handling. If we have no _HPX, this does nothing with RCB. [bhelgaas: changelog, clear RCB if not set for Root Port] Fixes: 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration") Fixes: 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices with a link") Link: https://bugzilla.kernel.org/show_bug.cgi?id=187781 Tested-by: Frank Danapfel <fdanapfe@redhat.com> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Myron Stowe <myron.stowe@redhat.com> CC: stable@vger.kernel.org # v3.18+
2016-11-23PCI: Export pcie_find_root_portJohannes Thumshirn
Export pcie_find_root_port() so we can use it outside of PCIe-AER error injection. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-11-23PCI: Support INTx masking on ConnectX-4 with firmware x.14.1100+Noa Osherovich
Mellanox devices were marked as having INTx masking ability broken. As a result, the VFIO driver fails to start when more than one device function is passed-through to a VM if both have the same INTx pin. Prior to Connect-IB, Mellanox devices exposed to the operating system one PCI function per all ports. Starting from Connect-IB, the devices are function-per-port. When passing the second function to a VM, VFIO will fail to start. Exclude ConnectX-4, ConnectX4-Lx and Connect-IB from the list of Mellanox devices marked as having broken INTx masking: - ConnectX-4 and ConnectX4-LX firmware version is checked. If INTx masking is supported, we unmark the broken INTx masking. - Connect-IB does not support INTx currently so will not cause any problem. [bhelgaas: call pci_disable_device() always, after iounmap()] Fixes: 11e42532ada3 ("PCI: Assume all Mellanox devices have broken INTx masking") Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
2016-11-23PCI: Convert Mellanox broken INTx quirks to be for listed devices onlyNoa Osherovich
Change Mellanox's broken_intx_masking() quirk from an "all Mellanox devices" to a quirk for listed devices only. [bhelgaas: remove #defines, reorder to keep other quirks together] Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
2016-11-23PCI: Convert broken INTx masking quirks from HEADER to FINALNoa Osherovich
Convert all quirk_broken_intx_masking() quirks from HEADER to FINAL. The quirk sets dev->broken_intx_masking, which is only used by pci_intx_mask_supported(), which is not needed until after FINAL quirks have been run. [bhelgaas: changelog] Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
2016-11-22PCI/xgene-msi: Convert to hotplug state machineSebastian Andrzej Siewior
Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: linux-pci@vger.kernel.org Cc: Duc Dang <dhdang@apm.com> Cc: rt@linuxtronix.de Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20161117183541.8588-8-bigeasy@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-21PCI: Warn on possible RW1C corruption for sub-32 bit config writesBjorn Helgaas
Hardware that supports only 32-bit config writes is not spec-compliant. For example, if software performs a 16-bit write, we must do a 32-bit read, merge in the 16 bits we intend to write, followed by a 32-bit write. If the 16 bits we *don't* intend to write happen to have any RW1C (write-one- to-clear) bits set, we just inadvertently cleared something we shouldn't have. Add a rate-limited warning when we do sub-32 bit config writes. Remove similar probe-time warnings from some of the affected host bridge drivers. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Enthusiastically-Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Shawn Lin <shawn.lin@rock-chips.com> # rockchip Acked-by: Thierry Reding <treding@nvidia.com>
2016-11-21PCI: Create revision file in sysfsEmil Velikov
Currently the revision isn't available via sysfs/libudev thus if one wants to know the value one needs to read through the config file, which can be quite time-consuming because it wakes/powers up the device. There are at least two userspace components which could make use the new file: libpciaccess and libdrm. The former wakes up _every_ PCI device, which can be observed via glxinfo when using Mesa 10.0+ drivers. The latter, in association with Mesa 13.0, can lead to 2-3 second delays while starting firefox, thunderbird or chromium. Link: https://bugs.freedesktop.org/show_bug.cgi?id=98502 Tested-by: Mauro Santos <registo.mailling@gmail.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch CC: Greg KH <gregkh@linuxfoundation.org>
2016-11-17PCI: pciehp: Add runtime PM support for PCIe hotplug portsLukas Wunner
Linux 4.8 added support for runtime suspending PCIe ports to D3hot with commit 006d44e49a25 ("PCI: Add runtime PM support for PCIe ports"), but excluded hotplug ports. Those are now afforded runtime PM by the present commit. Hotplug ports require a few extra considerations: - The configuration space of the port remains accessible in D3hot, so all the functions to read or modify the Slot Status and Slot Control registers need not be modified. Even turning on slot power doesn't seem to require the port to be in D0, at least the PCIe spec doesn't say so and I confirmed that by testing with a Thunderbolt controller. - However D0 is required to access devices on the secondary bus. This happens in pciehp_check_link_status() and pciehp_configure_device() (both called from board_added()) and in pciehp_unconfigure_device() (called from remove_board()), so acquire a runtime PM ref for their invocation. - The hotplug port stays active as long as it has active children. If all hotplugged devices below the port runtime suspend, the port is allowed to runtime suspend as well. Plug and unplug detection continues to work in D3hot. - Hotplug interrupts are delivered in-band, so while the hotplug port itself is allowed to go to D3hot, its parent ports must stay in D0 for interrupts to come through. Add a corresponding restriction to pci_dev_check_d3cold(). - Runtime PM may only be allowed if the hotplug port is handled natively by the OS. On ACPI systems, the port may alternatively be handled by the firmware and things break if the OS puts the port into D3 behind the firmware's back: E.g. Thunderbolt hotplug ports on non-Macs are handled by Intel's firmware in System Management Mode and the firmware is known to access devices on the port's secondary bus without checking first if the port is in D0: https://bugzilla.kernel.org/show_bug.cgi?id=53811 Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> CC: Mika Westerberg <mika.westerberg@linux.intel.com>
2016-11-17ACPI / hotplug / PCI: Make device_is_managed_by_native_pciehp() publicLukas Wunner
We're about to add runtime PM of hotplug ports, but we need to restrict it to ports that are handled natively by the OS: If they're handled by the firmware (which is the case for Thunderbolt on non-Macs), things would break if the OS put the ports into D3hot behind the firmware's back. To determine if a hotplug port is handled natively, one has to walk up from the port to the root bridge and check the cached _OSC Control Field for the value of the "PCI Express Native Hot Plug control" bit. There's already a function to do that, device_is_managed_by_native_pciehp(), but it's private to drivers/pci/hotplug/acpiphp_glue.c and only compiled in if CONFIG_HOTPLUG_PCI_ACPI is enabled. Make it public and move it to drivers/pci/pci-acpi.c, so that it is available in the more general CONFIG_ACPI case. The function contains a check if the device in question is a hotplug port and returns false if it's not. The caller we're going to add doesn't need this as it only calls the function if it actually *is* a hotplug port. Move the check out of the function into the single existing caller. Rename it to pciehp_is_native() and add some kerneldoc and polish. No functional change intended. Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-17ACPI / hotplug / PCI: Use cached copy of PCI_EXP_SLTCAP_HPC bitLukas Wunner
We cache the PCI_EXP_SLTCAP_HPC bit in pci_dev->is_hotplug_bridge on device probe, so there's no need to read it again when adding the ACPI hotplug context. Here's the call chain to prove that no ordering issue is introduced: pci_scan_child_bus [drivers/pci/probe.c] pci_scan_slot pci_scan_single_device pci_scan_device pci_setup_device set_pcie_hotplug_bridge [is_hotplug_bridge bit is set here] pci_scan_bridge pci_add_new_bus pci_alloc_child_bus pcibios_add_bus [arch/(x86|arm64|ia64)/...] acpi_pci_add_bus [drivers/pci/pci-acpi.c] acpiphp_enumerate_slots [drivers/pci/hotplug/acpiphp_glue.c] acpiphp_add_context device_is_managed_by_native_pciehp [is_hotplug_bridge bit is queried here] No functional change intended. Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-17PCI: Unfold conditions to block runtime PM on PCIe portsLukas Wunner
The conditions to block D3 on parent ports are currently condensed into a single expression in pci_dev_check_d3cold(). Upcoming commits will add further conditions for hotplug ports, making this expression fairly large and impenetrable. Unfold the conditions to maintain readability when they are amended. No functional change intended. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: Mika Westerberg <mika.westerberg@linux.intel.com> CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-17PCI: Consolidate conditions to allow runtime PM on PCIe portsLukas Wunner
The conditions to allow runtime PM on PCIe ports are currently spread across two different files: The condition relating to hotplug ports is located in portdrv_pci.c whereas all other conditions are located in pci.c. Consolidate all conditions in a single place in pci.c, thus making it easier to follow the logic and amend conditions down the road. Note that the condition relating to hotplug ports is inserted *before* the condition relating to the "pcie_port_pm=force" command line option, so runtime PM is not afforded to hotplug ports even if this option is given. That's exactly how the code behaved up until now. If this is not desired, the ordering of the conditions can simply be reversed. No functional change intended. Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-17PCI: Activate runtime PM on a PCIe port only if it can suspendLukas Wunner
Currently pcie_portdrv_probe() activates runtime PM on a PCIe port even if it will never actually suspend because the BIOS is too old or the "pcie_port_pm=off" option was specified on the kernel command line. A few CPU cycles can be saved by not activating runtime PM at all in these cases, because rpm_idle() and rpm_suspend() will bail out right at the beginning when calling rpm_check_suspend_allowed(), instead of carrying out various locking and assignments, invoking rpm_callback(), getting back -EBUSY and rolling everything back. The conditions checked in pci_bridge_d3_possible() are all static, they never change during uptime of the system, hence it's safe to call this to determine if runtime PM should be activated. No functional change intended. Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-17PCI: Speed up algorithm in pci_bridge_d3_update()Lukas Wunner
After a device has been added, removed or had its D3cold attributes changed, we recheck whether its parent bridge may runtime suspend to D3hot with pci_bridge_d3_update(). The most naive algorithm would be to iterate over the bridge's children and check if any of them are blocking D3. The function already tries to be a bit smarter than that by first checking the device that was changed. If this device already blocks D3 on the bridge, then walking over all the other children can be skipped. A drawback of this approach is that if the device is *not* blocking D3, it will be checked a second time by pci_walk_bus(). But that's cheap and is outweighed by the performance gain of potentially skipping pci_walk_bus() altogether. The algorithm can be optimized further by taking into account if D3 is currently allowed for the bridge, as shown in the following truth table: (a) remove && bridge_d3: D3 is currently allowed for the bridge and removing one of its children won't change that. No action necessary. (b) remove && !bridge_d3: D3 may now be allowed for the bridge if the removed child was the only one blocking it. Check all its siblings to verify that. (c) !remove && bridge_d3: D3 may now be disallowed but this can only be caused by the added/changed child, not any of its siblings. Check only that single device. (d) !remove && !bridge_d3: D3 may now be allowed for the bridge if the changed child was the only one blocking it. Check all its siblings to verify that. By checking beforehand if the changed child is blocking D3, we may be able to skip checking its siblings. Currently we do not special-case option (a) and in case of option (c) we gratuitously call pci_walk_bus(). Speed up the algorithm by adding these optimizations. Reword the comments a bit in an attempt to improve clarity. No functional change intended. Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-17PCI: Autosense device removal in pci_bridge_d3_update()Lukas Wunner
The algorithm to update the flag indicating whether a bridge may go to D3 makes a few optimizations based on whether the update was caused by the removal of a device on the one hand, versus the addition of a device or the change of its D3cold flags on the other hand. The information whether the update pertains to a removal is currently passed in by the caller, but the function may as well determine that itself by examining the device in question, thereby allowing for a considerable simplification and reduction of the code. Out of several options to determine removal, I've chosen the function device_is_registered() because it's cheap: It merely returns the dev->kobj.state_in_sysfs flag. That flag is set through device_add() when the root bus is scanned and cleared through device_remove(). The call to pci_bridge_d3_update() happens after each of these calls, respectively, so the ordering is correct. No functional change intended. Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-17PCI: Don't acquire ref on parent in pci_bridge_d3_update()Lukas Wunner
This function is always called with an existing pci_dev struct, which holds a reference on the pci_bus struct it resides on, which in turn holds a reference on pci_bus->bridge, which is the pci_dev's parent. Hence there's no need to acquire an additional ref on the parent. More specifically, the pci_dev exists until pci_destroy_dev() drops the final reference on it, so all calls to pci_bridge_d3_update() must be finished before that. It is arguably the caller's responsibility to ensure that it doesn't call pci_bridge_d3_update() with a pci_dev that might suddenly disappear, but in any case the existing callers are all safe: - The call in pci_destroy_dev() happens before the call to put_device(). - The call in pci_bus_add_device() is synchronized with pci_destroy_dev() using pci_lock_rescan_remove(). - The calls to pci_d3cold_disable() from the xhci and nouveau drivers are safe because a ref on the pci_dev is held as long as it's bound to a driver. - The calls to pci_d3cold_enable() / pci_d3cold_disable() when modifying the sysfs "d3cold_allowed" entry are also safe because kernfs_drain() waits for existing sysfs users to finish before removing the entry, and pci_destroy_dev() is called way after that. No functional change intended. Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-17PCI: iproc: Add inbound DMA mapping supportRay Jui
Add support for inbound DMA mapping. The range of the inbound mapping is configured by the optional device tree property 'dma-ranges'. While inbound mapping is done automatically in the ASIC on most iProc-based SoCs, newer ASICs (e.g., Stingray) require inbound mapping to be configured explicitly in software. [bhelgaas: fold in fixes to avoid 32-bit division in iproc_pcie_ib_write() and uninitialized return value in iproc_pcie_setup_ib() from Arnd Bergmann <arnd@arndb.de>] Signed-off-by: Oza Oza <oza.oza@broadcom.com> Signed-off-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
2016-11-17PCI: iproc: Make outbound mapping code more genericRay Jui
Improve the iProc PCIe outbound mapping code by making it more generic and removing redundant device tree properties 'brcm,pcie-ob-window-size' and 'brcm,pcie-ob-oarr-size'. The driver is still backward compatible to device tree binaries with the two properties specified. The driver now automatically configures the correct mapping window size and number of mapping windows based on the value of device tree property 'ranges' and the capability of of the iProc PCIe controller. Signed-off-by: Oza Oza <oza.oza@broadcom.com> Signed-off-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
2016-11-17PCI: iproc: Add PAXC v2 supportRay Jui
Add support for the second generation of the iProc PCIe PAXC host controller. Signed-off-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Anup Patel <anup.patel@broadcom.com> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
2016-11-16PCI: hv: Delete the device earlier from hbus->children for hot-removeDexuan Cui
After we send a PCI_EJECTION_COMPLETE message to the host, the host will immediately send us a PCI_BUS_RELATIONS message with relations->device_count == 0, so pci_devices_present_work(), running on another thread, can find the being-ejected device, mark the hpdev->reported_missing to true, and run list_move_tail()/list_del() for the device -- this races hv_eject_device_work() -> list_del(). Move the list_del() in hv_eject_device_work() to an earlier place, i.e., before we send PCI_EJECTION_COMPLETE, so later the pci_devices_present_work() can't see the device. Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jake Oshins <jakeo@microsoft.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com> CC: Haiyang Zhang <haiyangz@microsoft.com> CC: Vitaly Kuznetsov <vkuznets@redhat.com>