summaryrefslogtreecommitdiff
path: root/arch/powerpc/include
AgeCommit message (Collapse)Author
2014-01-15powerpc/powernv: Call OPAL sync before kexec'ingVasant Hegde
Its possible that OPAL may be writing to host memory during kexec (like dump retrieve scenario). In this situation we might end up corrupting host memory. This patch makes OPAL sync call to make sure OPAL stops writing to host memory before kexec'ing. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15powerpc/eeh: Handle multiple EEH errorsGavin Shan
For one PCI error relevant OPAL event, we possibly have multiple EEH errors for that. For example, multiple frozen PEs detected on different PHBs. Unfortunately, we didn't cover the case. The patch enumarates the return value from eeh_ops::next_error() and change eeh_handle_special_event() and eeh_ops::next_error() to handle all existing EEH errors. As Ben pointed out, we needn't list_for_each_entry_safe() since we are not deleting any PHB from the hose_list and the EEH serialized lock should be held while purging EEH events. The patch covers those suggestions as well. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15powerpc/thp: Fix crash on mremapAneesh Kumar K.V
This patch fix the below crash NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440 LR [c0000000000439ac] .hash_page+0x18c/0x5e0 ... Call Trace: [c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable) [437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0 [437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58 On ppc64 we use the pgtable for storing the hpte slot information and store address to the pgtable at a constant offset (PTRS_PER_PMD) from pmd. On mremap, when we switch the pmd, we need to withdraw and deposit the pgtable again, so that we find the pgtable at PTRS_PER_PMD offset from new pmd. We also want to move the withdraw and deposit before the set_pmd so that, when page fault find the pmd as trans huge we can be sure that pgtable can be located at the offset. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15Merge remote-tracking branch 'scott/next' into nextBenjamin Herrenschmidt
Freescale updates from Scott: << Highlights include 32-bit booke relocatable support, e6500 hardware tablewalk support, various e500 SPE fixes, some new/revived boards, and e6500 deeper idle and altivec powerdown modes. >>
2014-01-15powerpc: Don't corrupt transactional state when using FP/VMX in kernelPaul Mackerras
Currently, when we have a process using the transactional memory facilities on POWER8 (that is, the processor is in transactional or suspended state), and the process enters the kernel and the kernel then uses the floating-point or vector (VMX/Altivec) facility, we end up corrupting the user-visible FP/VMX/VSX state. This happens, for example, if a page fault causes a copy-on-write operation, because the copy_page function will use VMX to do the copy on POWER8. The test program below demonstrates the bug. The bug happens because when FP/VMX state for a transactional process is stored in the thread_struct, we store the checkpointed state in .fp_state/.vr_state and the transactional (current) state in .transact_fp/.transact_vr. However, when the kernel wants to use FP/VMX, it calls enable_kernel_fp() or enable_kernel_altivec(), which saves the current state in .fp_state/.vr_state. Furthermore, when we return to the user process we return with FP/VMX/VSX disabled. The next time the process uses FP/VMX/VSX, we don't know which set of state (the current register values, .fp_state/.vr_state, or .transact_fp/.transact_vr) we should be using, since we have no way to tell if we are still in the same transaction, and if not, whether the previous transaction succeeded or failed. Thus it is necessary to strictly adhere to the rule that if FP has been enabled at any point in a transaction, we must keep FP enabled for the user process with the current transactional state in the FP registers, until we detect that it is no longer in a transaction. Similarly for VMX; once enabled it must stay enabled until the process is no longer transactional. In order to keep this rule, we add a new thread_info flag which we test when returning from the kernel to userspace, called TIF_RESTORE_TM. This flag indicates that there is FP/VMX/VSX state to be restored before entering userspace, and when it is set the .tm_orig_msr field in the thread_struct indicates what state needs to be restored. The restoration is done by restore_tm_state(). The TIF_RESTORE_TM bit is set by new giveup_fpu/altivec_maybe_transactional helpers, which are called from enable_kernel_fp/altivec, giveup_vsx, and flush_fp/altivec_to_thread instead of giveup_fpu/altivec. The other thing to be done is to get the transactional FP/VMX/VSX state from .fp_state/.vr_state when doing reclaim, if that state has been saved there by giveup_fpu/altivec_maybe_transactional. Having done this, we set the FP/VMX bit in the thread's MSR after reclaim to indicate that that part of the state is now valid (having been reclaimed from the processor's checkpointed state). Finally, in the signal handling code, we move the clearing of the transactional state bits in the thread's MSR a bit earlier, before calling flush_fp_to_thread(), so that we don't unnecessarily set the TIF_RESTORE_TM bit. This is the test program: /* Michael Neuling 4/12/2013 * * See if the altivec state is leaked out of an aborted transaction due to * kernel vmx copy loops. * * gcc -m64 htm_vmxcopy.c -o htm_vmxcopy * */ /* We don't use all of these, but for reference: */ int main(int argc, char *argv[]) { long double vecin = 1.3; long double vecout; unsigned long pgsize = getpagesize(); int i; int fd; int size = pgsize*16; char tmpfile[] = "/tmp/page_faultXXXXXX"; char buf[pgsize]; char *a; uint64_t aborted = 0; fd = mkstemp(tmpfile); assert(fd >= 0); memset(buf, 0, pgsize); for (i = 0; i < size; i += pgsize) assert(write(fd, buf, pgsize) == pgsize); unlink(tmpfile); a = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0); assert(a != MAP_FAILED); asm __volatile__( "lxvd2x 40,0,%[vecinptr] ; " // set 40 to initial value TBEGIN "beq 3f ;" TSUSPEND "xxlxor 40,40,40 ; " // set 40 to 0 "std 5, 0(%[map]) ;" // cause kernel vmx copy page TABORT TRESUME TEND "li %[res], 0 ;" "b 5f ;" "3: ;" // Abort handler "li %[res], 1 ;" "5: ;" "stxvd2x 40,0,%[vecoutptr] ; " : [res]"=r"(aborted) : [vecinptr]"r"(&vecin), [vecoutptr]"r"(&vecout), [map]"r"(a) : "memory", "r0", "r3", "r4", "r5", "r6", "r7"); if (aborted && (vecin != vecout)){ printf("FAILED: vector state leaked on abort %f != %f\n", (double)vecin, (double)vecout); exit(1); } munmap(a, size); close(fd); printf("PASSED!\n"); return 0; } Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15powerpc: Reclaim two unused thread_info flag bitsPaul Mackerras
TIF_PERFMON_WORK and TIF_PERFMON_CTXSW are completely unused. They appear to be related to the old perfmon2 code, which has been superseded by the perf_event infrastructure. This removes their definitions so that the bits can be used for other purposes. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15Move precessing of MCE queued event out from syscall exit path.Mahesh Salgaonkar
Huge Dickins reported an issue that b5ff4211a829 "powerpc/book3s: Queue up and process delayed MCE events" breaks the PowerMac G5 boot. This patch fixes it by moving the mce even processing away from syscall exit, which was wrong to do that in first place, and using irq work framework to delay processing of mce event. Reported-by: Hugh Dickins <hughd@google.com Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15powerpc: Fix the setup of CPU-to-Node mappings during CPU onlineSrivatsa S. Bhat
On POWER platforms, the hypervisor can notify the guest kernel about dynamic changes in the cpu-numa associativity (VPHN topology update). Hence the cpu-to-node mappings that we got from the firmware during boot, may no longer be valid after such updates. This is handled using the arch_update_cpu_topology() hook in the scheduler, and the sched-domains are rebuilt according to the new mappings. But unfortunately, at the moment, CPU hotplug ignores these updated mappings and instead queries the firmware for the cpu-to-numa relationships and uses them during CPU online. So the kernel can end up assigning wrong NUMA nodes to CPUs during subsequent CPU hotplug online operations (after booting). Further, a particularly problematic scenario can result from this bug: On POWER platforms, the SMT mode can be switched between 1, 2, 4 (and even 8) threads per core. The switch to Single-Threaded (ST) mode is performed by offlining all except the first CPU thread in each core. Switching back to SMT mode involves onlining those other threads back, in each core. Now consider this scenario: 1. During boot, the kernel gets the cpu-to-node mappings from the firmware and assigns the CPUs to NUMA nodes appropriately, during CPU online. 2. Later on, the hypervisor updates the cpu-to-node mappings dynamically and communicates this update to the kernel. The kernel in turn updates its cpu-to-node associations and rebuilds its sched domains. Everything is fine so far. 3. Now, the user switches the machine from SMT to ST mode (say, by running ppc64_cpu --smt=1). This involves offlining all except 1 thread in each core. 4. The user then tries to switch back from ST to SMT mode (say, by running ppc64_cpu --smt=4), and this involves onlining those threads back. Since CPU hotplug ignores the new mappings, it queries the firmware and tries to associate the newly onlined sibling threads to the old NUMA nodes. This results in sibling threads within the same core getting associated with different NUMA nodes, which is incorrect. The scheduler's build-sched-domains code gets thoroughly confused with this and enters an infinite loop and causes soft-lockups, as explained in detail in commit 3be7db6ab (powerpc: VPHN topology change updates all siblings). So to fix this, use the numa_cpu_lookup_table to remember the updated cpu-to-node mappings, and use them during CPU hotplug online operations. Further, we also need to ensure that all threads in a core are assigned to a common NUMA node, irrespective of whether all those threads were online during the topology update. To achieve this, we take care not to use cpu_sibling_mask() since it is not hotplug invariant. Instead, we use cpu_first_sibling_thread() and set up the mappings manually using the 'threads_per_core' value for that particular platform. This helps us ensure that we don't hit this bug with any combination of CPU hotplug and SMT mode switching. Cc: stable@vger.kernel.org Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15powerpc/eeh: Hotplug improvementGavin Shan
When EEH error comes to one specific PCI device before its driver is loaded, we will apply hotplug to recover the error. During the plug time, the PCI device will be probed and its driver is loaded. Then we wrongly calls to the error handlers if the driver supports EEH explicitly. The patch intends to fix by introducing flag EEH_DEV_NO_HANDLER and set it before we remove the PCI device. In turn, we can avoid wrongly calls the error handlers of the PCI device after its driver loaded. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15powerpc/eeh: Call opal_pci_reinit() on powernv for restoring config spaceGavin Shan
The patch implements the EEH operation backend restore_config() for PowerNV platform. That relies on OPAL API opal_pci_reinit() where we reinitialize the error reporting properly after PE or PHB reset. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15powerpc/eeh: Add restore_config operationGavin Shan
After reset on the specific PE or PHB, we never configure AER correctly on PowerNV platform. We needn't care it on pSeries platform. The patch introduces additional EEH operation eeh_ops:: restore_config() so that we have chance to configure AER correctly for PowerNV platform. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15powerpc: Delete non-required instances of include <linux/init.h>Paul Gortmaker
None of these files are actually using any __init type directives and hence don't need to include <linux/init.h>. Most are just a left over from __devinit and __cpuinit removal, or simply due to code getting copied from one driver to the next. The one instance where we add an include for init.h covers off a case where that file was implicitly getting it from another header which itself didn't need it. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-13Merge tag 'v3.13-rc8' into core/lockingIngo Molnar
Refresh the tree with the latest fixes, before applying new changes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-12powerpc/512x: clk: support MPC5121/5123/5125 SoC variantsGerhard Sittig
improve the common clock support code for MPC512x - expand the CCM register set declaration with MPC5125 related registers (which reside in the previously "reserved" area) - tell the MPC5121, MPC5123, and MPC5125 SoC variants apart, and derive the availability of components and their clocks from the detected SoC (MBX, AXE, VIU, SPDIF, PATA, SATA, PCI, second FEC, second SDHC, number of PSC components, type of NAND flash controller, interpretation of the CPMF bitfield, PSC/CAN mux0 stage input clocks, output clocks on SoC pins) - add backwards compatibility (allow operation against a device tree which lacks clock related specs) for MPC5125 FECs, too telling SoC variants apart and adjusting the clock tree's generation occurs at runtime, a common generic binary supports all of the chips the MPC5125 approach to the NFC clock (one register with two counters for the high and low periods of the clock) is not implemented, as there are no users and there is no common implementation which supports this kind of clock -- the new implementation would be unused and could not get verified, so it shall wait until there is demand Signed-off-by: Gerhard Sittig <gsi@denx.de> Acked-by: Mike Turquette <mturquette@linaro.org> Signed-off-by: Anatolij Gustschin <agust@denx.de>
2014-01-12clk: mpc5xxx: switch to COMMON_CLK, retire PPC_CLOCKGerhard Sittig
the setup before the change was - arch/powerpc/Kconfig had the PPC_CLOCK option, off by default - depending on the PPC_CLOCK option the arch/powerpc/kernel/clock.c file was built, which implements the clk.h API but always returns -ENOSYS unless a platform registers specific callbacks - the MPC52xx platform selected PPC_CLOCK but did not register any callbacks, thus all clk.h API calls keep resulting in -ENOSYS errors (which is OK, all peripheral drivers deal with the situation) - the MPC512x platform selected PPC_CLOCK and registered specific callbacks implemented in arch/powerpc/platforms/512x/clock.c, thus provided real support for the clock API - no other powerpc platform did select PPC_CLOCK the situation after the change is - the MPC512x platform implements the COMMON_CLK interface, and thus the PPC_CLOCK approach in arch/powerpc/platforms/512x/clock.c has become obsolete - the MPC52xx platform still lacks genuine support for the clk.h API while this is not a change against the previous situation (the error code returned from COMMON_CLK stubs differs but every call still results in an error) - with all references gone, the arch/powerpc/kernel/clock.c wrapper and the PPC_CLOCK option have become obsolete, as did the clk_interface.h header file the switch from PPC_CLOCK to COMMON_CLK is done for all platforms within the same commit such that multiplatform kernels (the combination of 512x and 52xx within one executable) keep working Cc: Mike Turquette <mturquette@linaro.org> Cc: Anatolij Gustschin <agust@denx.de> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Gerhard Sittig <gsi@denx.de> Signed-off-by: Anatolij Gustschin <agust@denx.de>
2014-01-12arch: Introduce smp_load_acquire(), smp_store_release()Peter Zijlstra
A number of situations currently require the heavyweight smp_mb(), even though there is no need to order prior stores against later loads. Many architectures have much cheaper ways to handle these situations, but the Linux kernel currently has no portable way to make use of them. This commit therefore supplies smp_load_acquire() and smp_store_release() to remedy this situation. The new smp_load_acquire() primitive orders the specified load against any subsequent reads or writes, while the new smp_store_release() primitive orders the specifed store against any prior reads or writes. These primitives allow array-based circular FIFOs to be implemented without an smp_mb(), and also allow a theoretical hole in rcu_assign_pointer() to be closed at no additional expense on most architectures. In addition, the RCU experience transitioning from explicit smp_read_barrier_depends() and smp_wmb() to rcu_dereference() and rcu_assign_pointer(), respectively resulted in substantial improvements in readability. It therefore seems likely that replacing other explicit barriers with smp_load_acquire() and smp_store_release() will provide similar benefits. It appears that roughly half of the explicit barriers in core kernel code might be so replaced. [Changelog by PaulMck] Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Michael Ellerman <michael@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Victor Kaplansky <VICTORK@il.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20131213150640.908486364@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-10powerpc/85xx: handle the eLBC error interrupt if it exists in dtsShaohui Xie
On P1020, P1021, P1022, and P1023, eLBC event interrupts are routed to internal interrupt 3 while ELBC error interrupts are routed to internal interrupt 0. We need to call request_irq for each. Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> [scottwood@freescale.com: reworded commit message and fixed author] Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-09powerpc/e6500: TLB miss handler with hardware tablewalk supportScott Wood
There are a few things that make the existing hw tablewalk handlers unsuitable for e6500: - Indirect entries go in TLB1 (though the resulting direct entries go in TLB0). - It has threads, but no "tlbsrx." -- so we need a spinlock and a normal "tlbsx". Because we need this lock, hardware tablewalk is mandatory on e6500 unless we want to add spinlock+tlbsx to the normal bolted TLB miss handler. - TLB1 has no HES (nor next-victim hint) so we need software round robin (TODO: integrate this round robin data with hugetlb/KVM) - The existing tablewalk handlers map half of a page table at a time, because IBM hardware has a fixed 1MiB indirect page size. e6500 has variable size indirect entries, with a minimum of 2MiB. So we can't do the half-page indirect mapping, and even if we could it would be less efficient than mapping the full page. - Like on e5500, the linear mapping is bolted, so we don't need the overhead of supporting nested tlb misses. Note that hardware tablewalk does not work in rev1 of e6500. We do not expect to support e6500 rev1 in mainline Linux. Signed-off-by: Scott Wood <scottwood@freescale.com> Cc: Mihai Caraman <mihai.caraman@freescale.com>
2014-01-09powerpc: introduce macro LOAD_REG_ADDR_PICKevin Hao
This is used to get the address of a variable when the kernel is not running at the linked or relocated address. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-09KVM: PPC: Unify kvmppc_get_last_inst and scAlexander Graf
We had code duplication between the inline functions to get our last instruction on normal interrupts and system call interrupts. Unify both helper functions towards a single implementation. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09kvm: powerpc: use caching attributes as per linux pteBharat Bhushan
KVM uses same WIM tlb attributes as the corresponding qemu pte. For this we now search the linux pte for the requested page and get these cache caching/coherency attributes from pte. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Reviewed-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09kvm: powerpc: define a linux pte lookup functionBharat Bhushan
We need to search linux "pte" to get "pte" attributes for setting TLB in KVM. This patch defines a lookup_linux_ptep() function which returns pte pointer. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Reviewed-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09KVM: PPC: Store FP/VSX/VMX state in thread_fp/vr_state structuresPaul Mackerras
This uses struct thread_fp_state and struct thread_vr_state to store the floating-point, VMX/Altivec and VSX state, rather than flat arrays. This makes transferring the state to/from the thread_struct simpler and allows us to unify the get/set_one_reg implementations for the VSX registers. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09KVM: PPC: Use load_fp/vr_state rather than load_up_fpu/altivecPaul Mackerras
The load_up_fpu and load_up_altivec functions were never intended to be called from C, and do things like modifying the MSR value in their callers' stack frames, which are assumed to be interrupt frames. In addition, on 32-bit Book S they require the MMU to be off. This makes KVM use the new load_fp_state() and load_vr_state() functions instead of load_up_fpu/altivec. This means we can remove the assembler glue in book3s_rmhandlers.S, and potentially fixes a bug on Book E, where load_up_fpu was called directly from C. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09kvm/powerpc: move kvm_hypercall0() and friends to epapr_hypercall0()Bharat Bhushan
kvm_hypercall0() and friends have nothing KVM specific so moved to epapr_hypercall0() and friends. Also they are moved from arch/powerpc/include/asm/kvm_para.h to arch/powerpc/include/asm/epapr_hcalls.h Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-09kvm/powerpc: rename kvm_hypercall() to epapr_hypercall()Bharat Bhushan
kvm_hypercall() have nothing KVM specific, so renamed to epapr_hypercall(). Also this in moved to arch/powerpc/include/asm/epapr_hcalls.h Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-01-07powerpc/fsl: add E6500 PVR and SPRN_PWRMGTCR0 defineWang Dongsheng
E6500 PVR and SPRN_PWRMGTCR0 will be used in subsequent pw20/altivec idle patches. Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-07powerpc: fix exception clearing in e500 SPE float emulationJoseph Myers
The e500 SPE floating-point emulation code clears existing exceptions (__FPU_FPSCR &= ~FP_EX_MASK;) before ORing in the exceptions from the emulated operation. However, these exception bits are the "sticky", cumulative exception bits, and should only be cleared by the user program setting SPEFSCR, not implicitly by any floating-point instruction (whether executed purely by the hardware or emulated). The spurious clearing of these bits shows up as missing exceptions in glibc testing. Fixing this, however, is not as simple as just not clearing the bits, because while the bits may be from previous floating-point operations (in which case they should not be cleared), the processor can also set the sticky bits itself before the interrupt for an exception occurs, and this can happen in cases when IEEE 754 semantics are that the sticky bit should not be set. Specifically, the "invalid" sticky bit is set in various cases with non-finite operands, where IEEE 754 semantics do not involve raising such an exception, and the "underflow" sticky bit is set in cases of exact underflow, whereas IEEE 754 semantics are that this flag is set only for inexact underflow. Thus, for correct emulation the kernel needs to know the setting of these two sticky bits before the instruction being emulated. When a floating-point operation raises an exception, the kernel can note the state of the sticky bits immediately afterwards. Some <fenv.h> functions that affect the state of these bits, such as fesetenv and feholdexcept, need to use prctl with PR_GET_FPEXC and PR_SET_FPEXC anyway, and so it is natural to record the state of those bits during that call into the kernel and so avoid any need for a separate call into the kernel to inform it of a change to those bits. Thus, the interface I chose to use (in this patch and the glibc port) is that one of those prctl calls must be made after any userspace change to those sticky bits, other than through a floating-point operation that traps into the kernel anyway. feclearexcept and fesetexceptflag duly make those calls, which would not be required were it not for this issue. The previous EGLIBC port, and the uClibc code copied from it, is fundamentally broken as regards any use of prctl for floating-point exceptions because it didn't use the PR_FP_EXC_SW_ENABLE bit in its prctl calls (and did various worse things, such as passing a pointer when prctl expected an integer). If you avoid anything where prctl is used, the clearing of sticky bits still means it will never give anything approximating correct exception semantics with existing kernels. I don't believe the patch makes things any worse for existing code that doesn't try to inform the kernel of changes to sticky bits - such code may get incorrect exceptions in some cases, but it would have done so anyway in other cases. Signed-off-by: Joseph Myers <joseph@codesourcery.com> Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-07powerpc/booke64: Add LRAT error exception handlerMihai Caraman
LRAT (Logical to Real Address Translation) present in MMU v2 provides hardware translation from a logical page number (LPN) to a real page number (RPN) when tlbwe is executed by a guest or when a page table translation occurs from a guest virtual address. Add LRAT error exception handler to Booke3E 64-bit kernel and the basic KVM handler to avoid build breakage. This is a prerequisite for KVM LRAT support that will follow. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-01-06Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c net/ipv6/ip6_tunnel.c net/ipv6/ip6_vti.c ipv6 tunnel statistic bug fixes conflicting with consolidation into generic sw per-cpu net stats. qlogic conflict between queue counting bug fix and the addition of multiple MAC address support. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-30Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc fixes from Ben Herrenschmidt: "A bit more endian problems found during testing of 3.13 and a few other simple fixes and regressions fixes" * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc: Fix alignment of secondary cpu spin vars powerpc: Align p_end powernv/eeh: Add buffer for P7IOC hub error data powernv/eeh: Fix possible buffer overrun in ioda_eeh_phb_diag() powerpc: Make 64-bit non-VMX __copy_tofrom_user bi-endian powerpc: Make unaligned accesses endian-safe for powerpc powerpc: Fix bad stack check in exception entry powerpc/512x: dts: disable MPC5125 usb module powerpc/512x: dts: remove misplaced IRQ spec from 'soc' node (5125)
2013-12-30Merge branch 'merge' into nextBenjamin Herrenschmidt
Merge a pile of fixes that went into the "merge" branch (3.13-rc's) such as Anton Little Endian fixes. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30powerpc/iommu: Update the generic code to use dynamic iommu page sizesAlistair Popple
This patch updates the generic iommu backend code to use the it_page_shift field to determine the iommu page size instead of using hardcoded values. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30powerpc/iommu: Add it_page_shift field to determine iommu page sizeAlistair Popple
This patch adds a it_page_shift field to struct iommu_table and initiliases it to 4K for all platforms. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30powerpc/iommu: Update constant names to reflect their hardcoded page sizeAlistair Popple
The powerpc iommu uses a hardcoded page size of 4K. This patch changes the name of the IOMMU_PAGE_* macros to reflect the hardcoded values. A future patch will use the existing names to support dynamic page sizes. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30powerpc: Make unaligned accesses endian-safe for powerpcRajesh B Prathipati
The generic put_unaligned/get_unaligned macros were made endian-safe by calling the appropriate endian dependent macros based on the endian type of the powerpc processor. Signed-off-by: Rajesh B Prathipati <rprathip@linux.vnet.ibm.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-30powerpc: Fix bad stack check in exception entryMichael Neuling
In EXCEPTION_PROLOG_COMMON() we check to see if the stack pointer (r1) is valid when coming from the kernel. If it's not valid, we die but with a nice oops message. Currently we allocate a stack frame (subtract INT_FRAME_SIZE) before we check to see if the stack pointer is negative. Unfortunately, this won't detect a bad stack where r1 is less than INT_FRAME_SIZE. This patch fixes the check to compare the modified r1 with -INT_FRAME_SIZE. With this, bad kernel stack pointers (including NULL pointers) are correctly detected again. Kudos to Paulus for finding this. Signed-off-by: Michael Neuling <mikey@neuling.org> cc: stable@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "The PPC folks had a large amount of changes queued for 3.13, and now they are fixing the bugs" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: PPC: Book3S HV: Don't drop low-order page address bits powerpc: book3s: kvm: Don't abuse host r2 in exit path powerpc/kvm/booke: Fix build break due to stack frame size warning KVM: PPC: Book3S: PR: Enable interrupts earlier KVM: PPC: Book3S: PR: Make svcpu -> vcpu store preempt savvy KVM: PPC: Book3S: PR: Export kvmppc_copy_to|from_svcpu KVM: PPC: Book3S: PR: Don't clobber our exit handler id powerpc: kvm: fix rare but potential deadlock scene KVM: PPC: Book3S HV: Take SRCU read lock around kvm_read_guest() call KVM: PPC: Book3S HV: Make tbacct_lock irq-safe KVM: PPC: Book3S HV: Refine barriers in guest entry/exit KVM: PPC: Book3S HV: Fix physical address calculations
2013-12-20Merge tag 'signed-for-3.13' of git://github.com/agraf/linux-2.6 into kvm-masterPaolo Bonzini
Patch queue for 3.13 - 2013-12-18 This fixes some grave issues we've only found after 3.13-rc1: - Make the modularized HV/PR book3s kvm work well as modules - Fix some race conditions - Fix compilation with certain compilers (booke) - Fix THP for book3s_hv - Fix preemption for book3s_pr Alexander Graf (4): KVM: PPC: Book3S: PR: Don't clobber our exit handler id KVM: PPC: Book3S: PR: Export kvmppc_copy_to|from_svcpu KVM: PPC: Book3S: PR: Make svcpu -> vcpu store preempt savvy KVM: PPC: Book3S: PR: Enable interrupts earlier Aneesh Kumar K.V (1): powerpc: book3s: kvm: Don't abuse host r2 in exit path Paul Mackerras (5): KVM: PPC: Book3S HV: Fix physical address calculations KVM: PPC: Book3S HV: Refine barriers in guest entry/exit KVM: PPC: Book3S HV: Make tbacct_lock irq-safe KVM: PPC: Book3S HV: Take SRCU read lock around kvm_read_guest() call KVM: PPC: Book3S HV: Don't drop low-order page address bits Scott Wood (1): powerpc/kvm/booke: Fix build break due to stack frame size warning pingfan liu (1): powerpc: kvm: fix rare but potential deadlock scene
2013-12-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/intel/i40e/i40e_main.c drivers/net/macvtap.c Both minor merge hassles, simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18powerpc: book3s: kvm: Don't abuse host r2 in exit pathAneesh Kumar K.V
We don't use PACATOC for PR. Avoid updating HOST_R2 with PR KVM mode when both HV and PR are enabled in the kernel. Without this we get the below crash (qemu) Unable to handle kernel paging request for data at address 0xffffffffffff8310 Faulting instruction address: 0xc00000000001d5a4 cpu 0x2: Vector: 300 (Data Access) at [c0000001dc53aef0] pc: c00000000001d5a4: .vtime_delta.isra.1+0x34/0x1d0 lr: c00000000001d760: .vtime_account_system+0x20/0x60 sp: c0000001dc53b170 msr: 8000000000009032 dar: ffffffffffff8310 dsisr: 40000000 current = 0xc0000001d76c62d0 paca = 0xc00000000fef1100 softe: 0 irq_happened: 0x01 pid = 4472, comm = qemu-system-ppc enter ? for help [c0000001dc53b200] c00000000001d760 .vtime_account_system+0x20/0x60 [c0000001dc53b290] c00000000008d050 .kvmppc_handle_exit_pr+0x60/0xa50 [c0000001dc53b340] c00000000008f51c kvm_start_lightweight+0xb4/0xc4 [c0000001dc53b510] c00000000008cdf0 .kvmppc_vcpu_run_pr+0x150/0x2e0 [c0000001dc53b9e0] c00000000008341c .kvmppc_vcpu_run+0x2c/0x40 [c0000001dc53ba50] c000000000080af4 .kvm_arch_vcpu_ioctl_run+0x54/0x1b0 [c0000001dc53bae0] c00000000007b4c8 .kvm_vcpu_ioctl+0x478/0x730 [c0000001dc53bca0] c0000000002140cc .do_vfs_ioctl+0x4ac/0x770 [c0000001dc53bd80] c0000000002143e8 .SyS_ioctl+0x58/0xb0 [c0000001dc53be30] c000000000009e58 syscall_exit+0x0/0x98 Signed-off-by: Alexander Graf <agraf@suse.de>
2013-12-17lib: Add missing arch generic-y entries for asm-generic/hash.hDavid S. Miller
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17Merge tag 'v3.13-rc4' into core/lockingIngo Molnar
Merge Linux 3.13-rc4, to refresh this rather old tree with the latest fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-16Merge tag 'v3.13-rc4' into perf/coreIngo Molnar
Merge Linux 3.13-rc4, to refresh this branch with the latest fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-16powerpc: Full barrier for smp_mb__after_unlock_lock()Paul E. McKenney
The powerpc lock acquisition sequence is as follows: lwarx; cmpwi; bne; stwcx.; lwsync; Lock release is as follows: lwsync; stw; If CPU 0 does a store (say, x=1) then a lock release, and CPU 1 does a lock acquisition then a load (say, r1=y), then there is no guarantee of a full memory barrier between the store to 'x' and the load from 'y'. To see this, suppose that CPUs 0 and 1 are hardware threads in the same core that share a store buffer, and that CPU 2 is in some other core, and that CPU 2 does the following: y = 1; sync; r2 = x; If 'x' and 'y' are both initially zero, then the lock acquisition and release sequences above can result in r1 and r2 both being equal to zero, which could not happen if unlock+lock was a full barrier. This commit therefore makes powerpc's smp_mb__after_unlock_lock() be a full barrier. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: <linux-arch@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1386799151-2219-8-git-send-email-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-12-13powerpc/powernv: Fix OPAL LPC access in Little EndianBenjamin Herrenschmidt
We are passing pointers to the firmware for reads, we need to properly convert the result as OPAL is always BE. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-13powerpc/powernv: Fix endian issue in opal_xscom_readAnton Blanchard
opal_xscom_read uses a pointer to return the data so we need to byteswap it on LE builds. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-11powerpc/kvm/booke: Fix build break due to stack frame size warningScott Wood
Commit ce11e48b7fdd256ec68b932a89b397a790566031 ("KVM: PPC: E500: Add userspace debug stub support") added "struct thread_struct" to the stack of kvmppc_vcpu_run(). thread_struct is 1152 bytes on my build, compared to 48 bytes for the recently-introduced "struct debug_reg". Use the latter instead. This fixes the following error: cc1: warnings being treated as errors arch/powerpc/kvm/booke.c: In function 'kvmppc_vcpu_run': arch/powerpc/kvm/booke.c:760:1: error: the frame size of 1424 bytes is larger than 1024 bytes make[2]: *** [arch/powerpc/kvm/booke.o] Error 1 make[1]: *** [arch/powerpc/kvm] Error 2 make[1]: *** Waiting for unfinished jobs.... Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2013-12-10powerpc: Fix PTE page address mismatch in pgtable ctor/dtorHong H. Pham
In pte_alloc_one(), pgtable_page_ctor() is passed an address that has not been converted by page_address() to the newly allocated PTE page. When the PTE is freed, __pte_free_tlb() calls pgtable_page_dtor() with an address to the PTE page that has been converted by page_address(). The mismatch in the PTE's page address causes pgtable_page_dtor() to access invalid memory, so resources for that PTE (such as the page lock) is not properly cleaned up. On PPC32, only SMP kernels are affected. On PPC64, only SMP kernels with 4K page size are affected. This bug was introduced by commit d614bb041209fd7cb5e4b35e11a7b2f6ee8f62b8 "powerpc: Move the pte free routines from common header". On a preempt-rt kernel, a spinlock is dynamically allocated for each PTE in pgtable_page_ctor(). When the PTE is freed, calling pgtable_page_dtor() with a mismatched page address causes a memory leak, as the pointer to the PTE's spinlock is bogus. On mainline, there isn't any immediately obvious symptoms, but the problem still exists here. Fixes: d614bb041209fd7c "powerpc: Move the pte free routes from common header" Cc: Paul Mackerras <paulus@samba.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linux-stable <stable@vger.kernel.org> # v3.10+ Signed-off-by: Hong H. Pham <hong.pham@windriver.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-12-09KVM: PPC: Book3S: PR: Make svcpu -> vcpu store preempt savvyAlexander Graf
As soon as we get back to our "highmem" handler in virtual address space we may get preempted. Today the reason we can get preempted is that we replay interrupts and all the lazy logic thinks we have interrupts enabled. However, it's not hard to make the code interruptible and that way we can enable and handle interrupts even earlier. This fixes random guest crashes that happened with CONFIG_PREEMPT=y for me. Signed-off-by: Alexander Graf <agraf@suse.de>