summaryrefslogtreecommitdiff
path: root/arch/s390/include/asm/pgtable.h
AgeCommit message (Collapse)Author
2014-10-27s390/mm: pmdp_get_and_clear_full optimizationMartin Schwidefsky
Analog to ptep_get_and_clear_full define a variant of the pmpd_get_and_clear primitive which gets the full hint from the mmu_gather struct. This allows s390 to avoid a costly instruction when destroying an address space. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-27s390/ftrace,kprobes: allow to patch first instructionHeiko Carstens
If the function tracer is enabled, allow to set kprobes on the first instruction of a function (which is the function trace caller): If no kprobe is set handling of enabling and disabling function tracing of a function simply patches the first instruction. Either it is a nop (right now it's an unconditional branch, which skips the mcount block), or it's a branch to the ftrace_caller() function. If a kprobe is being placed on a function tracer calling instruction we encode if we actually have a nop or branch in the remaining bytes after the breakpoint instruction (illegal opcode). This is possible, since the size of the instruction used for the nop and branch is six bytes, while the size of the breakpoint is only two bytes. Therefore the first two bytes contain the illegal opcode and the last four bytes contain either "0" for nop or "1" for branch. The kprobes code will then execute/simulate the correct instruction. Instruction patching for kprobes and function tracer is always done with stop_machine(). Therefore we don't have any races where an instruction is patched concurrently on a different cpu. Besides that also the program check handler which executes the function trace caller instruction won't be executed concurrently to any stop_machine() execution. This allows to keep full fault based kprobes handling which generates correct pt_regs contents automatically. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-27s390/mm: disable KSM for storage key enabled pagesDominik Dingel
When storage keys are enabled unmerge already merged pages and prevent new pages from being merged. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-27s390/mm: prevent and break zero page mappings in case of storage keysDominik Dingel
As soon as storage keys are enabled we need to stop working on zero page mappings to prevent inconsistencies between storage keys and pgste. Otherwise following data corruption could happen: 1) guest enables storage key 2) guest sets storage key for not mapped page X -> change goes to PGSTE 3) guest reads from page X -> as X was not dirty before, the page will be zero page backed, storage key from PGSTE for X will go to storage key for zero page 4) guest sets storage key for not mapped page Y (same logic as above 5) guest reads from page Y -> as Y was not dirty before, the page will be zero page backed, storage key from PGSTE for Y will got to storage key for zero page overwriting storage key for X While holding the mmap sem, we are safe against changes on entries we already fixed, as every fault would need to take the mmap_sem (read). Other vCPUs executing storage key instructions will get a one time interception and be serialized also with mmap_sem. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-27s390/mm: recfactor global pgste updatesDominik Dingel
Replace the s390 specific page table walker for the pgste updates with a call to the common code walk_page_range function. There are now two pte modification functions, one for the reset of the CMMA state and another one for the initialization of the storage keys. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "This patch set contains the main portion of the changes for 3.18 in regard to the s390 architecture. It is a bit bigger than usual, mainly because of a new driver and the vector extension patches. The interesting bits are: - Quite a bit of work on the tracing front. Uprobes is enabled and the ftrace code is reworked to get some of the lost performance back if CONFIG_FTRACE is enabled. - To improve boot time with CONFIG_DEBIG_PAGEALLOC, support for the IPTE range facility is added. - The rwlock code is re-factored to improve writer fairness and to be able to use the interlocked-access instructions. - The kernel part for the support of the vector extension is added. - The device driver to access the CD/DVD on the HMC is added, this will hopefully come in handy to improve the installation process. - Add support for control-unit initiated reconfiguration. - The crypto device driver is enhanced to enable the additional AP domains and to allow the new crypto hardware to be used. - Bug fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits) s390/ftrace: simplify enabling/disabling of ftrace_graph_caller s390/ftrace: remove 31 bit ftrace support s390/kdump: add support for vector extension s390/disassembler: add vector instructions s390: add support for vector extension s390/zcrypt: Toleration of new crypto hardware s390/idle: consolidate idle functions and definitions s390/nohz: use a per-cpu flag for arch_needs_cpu s390/vtime: do not reset idle data on CPU hotplug s390/dasd: add support for control unit initiated reconfiguration s390/dasd: fix infinite loop during format s390/mm: make use of ipte range facility s390/setup: correct 4-level kernel page table detection s390/topology: call set_sched_topology early s390/uprobes: architecture backend for uprobes s390/uprobes: common library for kprobes and uprobes s390/rwlock: use the interlocked-access facility 1 instructions s390/rwlock: improve writer fairness s390/rwlock: remove interrupt-enabling rwlock variant. s390/mm: remove change bit override support ...
2014-10-08Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "Fixes and features for 3.18. Apart from the usual cleanups, here is the summary of new features: - s390 moves closer towards host large page support - PowerPC has improved support for debugging (both inside the guest and via gdbstub) and support for e6500 processors - ARM/ARM64 support read-only memory (which is necessary to put firmware in emulated NOR flash) - x86 has the usual emulator fixes and nested virtualization improvements (including improved Windows support on Intel and Jailhouse hypervisor support on AMD), adaptive PLE which helps overcommitting of huge guests. Also included are some patches that make KVM more friendly to memory hot-unplug, and fixes for rare caching bugs. Two patches have trivial mm/ parts that were acked by Rik and Andrew. Note: I will soon switch to a subkey for signing purposes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (157 commits) kvm: do not handle APIC access page if in-kernel irqchip is not in use KVM: s390: count vcpu wakeups in stat.halt_wakeup KVM: s390/facilities: allow TOD-CLOCK steering facility bit KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode arm/arm64: KVM: Report correct FSC for unsupported fault types arm/arm64: KVM: Fix VTTBR_BADDR_MASK and pgd alloc kvm: Fix kvm_get_page_retry_io __gup retval check arm/arm64: KVM: Fix set_clear_sgi_pend_reg offset kvm: x86: Unpin and remove kvm_arch->apic_access_page kvm: vmx: Implement set_apic_access_page_addr kvm: x86: Add request bit to reload APIC access page address kvm: Add arch specific mmu notifier for page invalidation kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static kvm: Fix page ageing bugs kvm/x86/mmu: Pass gfn and level to rmapp callback. x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only kvm: x86: use macros to compute bank MSRs KVM: x86: Remove debug assertion of non-PAE reserved bits kvm: don't take vcpu mutex for obviously invalid vcpu ioctls kvm: Faults which trigger IO release the mmap_sem ...
2014-09-30s390/mm: make use of ipte range facilityHeiko Carstens
Invalidate several pte entries at once if the ipte range facility is available. Currently this works only for DEBUG_PAGE_ALLOC where several up to 2 ^ MAX_ORDER may be invalidated at once. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-25s390/mm: remove change bit override supportHeiko Carstens
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-02KVM: s390/mm: Fix guest storage key corruption in ptep_set_access_flagsChristian Borntraeger
commit 0944fe3f4a32 ("s390/mm: implement software referenced bits") triggered another paging/storage key corruption. There is an unhandled invalid->valid pte change where we have to set the real storage key from the pgste. When doing paging a guest page might be swapcache or swap and when faulted in it might be read-only and due to a parallel scan old. An do_wp_page will make it writeable and young. Due to software reference tracking this page was invalid and now becomes valid. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: stable@vger.kernel.org # v3.12+
2014-09-02KVM: s390/mm: Fix storage key corruption during swappingChristian Borntraeger
Since 3.12 or more precisely commit 0944fe3f4a32 ("s390/mm: implement software referenced bits") guest storage keys get corrupted during paging. This commit added another valid->invalid translation for page tables - namely ptep_test_and_clear_young. We have to transfer the storage key into the pgste in that case. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: stable@vger.kernel.org # v3.12+
2014-08-26KVM: s390/mm: remove outdated gmap data structuresMartin Schwidefsky
The radix tree rework removed all code that uses the gmap_rmap and gmap_pgtable data structures. Remove these outdated definitions. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-26KVM: s390/mm: support gmap page tables with less than 5 levelsMartin Schwidefsky
Add an addressing limit to the gmap address spaces and only allocate the page table levels that are needed for the given limit. The limit is fixed and can not be changed after a gmap has been created. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-26KVM: s390/mm: use radix trees for guest to host mappingsMartin Schwidefsky
Store the target address for the gmap segments in a radix tree instead of using invalid segment table entries. gmap_translate becomes a simple radix_tree_lookup, gmap_fault is split into the address translation with gmap_translate and the part that does the linking of the gmap shadow page table with the process page table. A second radix tree is used to keep the pointers to the segment table entries for segments that are mapped in the guest address space. On unmap of a segment the pointer is retrieved from the radix tree and is used to carry out the segment invalidation in the gmap shadow page table. As the radix tree can only store one pointer, each host segment may only be mapped to exactly one guest location. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25KVM: s390/mm: cleanup gmap function arguments, variable namesMartin Schwidefsky
Make the order of arguments for the gmap calls more consistent, if the gmap pointer is passed it is always the first argument. In addition distinguish between guest address and user address by naming the variables gaddr for a guest address and vmaddr for a user address. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25KVM: s390/mm: readd address parameter to gmap_do_ipte_notifyMartin Schwidefsky
Revert git commit c3a23b9874c1 ("remove unnecessary parameter from gmap_do_ipte_notify"). Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25KVM: s390/mm: readd address parameter to pgste_ipte_notifyMartin Schwidefsky
Revert git commit 1b7fd6952063 ("remove unecessary parameter from pgste_ipte_notify") Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-01s390/mm: implement dirty bits for large segment table entriesMartin Schwidefsky
The large segment table entry format has block of bits for the ACC/F values for the large page. These bits are valid only if another bit (AV bit 0x10000) of the segment table entry is set. The ACC/F bits do not have a meaning if the AV bit is off. This allows to put the THP splitting bit, the segment young bit and the new segment dirty bit into the ACC/F bits as long as the AV bit stays off. The dirty and young information is only available if the pmd is large. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-04-22KVM: s390/mm: new gmap_test_and_clear_dirty functionDominik Dingel
For live migration kvm needs to test and clear the dirty bit of guest pages. That for is ptep_test_and_clear_user_dirty, to be sure we are not racing with other code, we protect the pte. This needs to be done within the architecture memory management code. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22KVM: s390/mm: use software dirty bit detection for user dirty trackingMartin Schwidefsky
Switch the user dirty bit detection used for migration from the hardware provided host change-bit in the pgste to a fault based detection method. This reduced the dependency of the host from the storage key to a point where it becomes possible to enable the RCP bypass for KVM guests. The fault based dirty detection will only indicate changes caused by accesses via the guest address space. The hardware based method can detect all changes, even those caused by I/O or accesses via the kernel page table. The KVM/qemu code needs to take this into account. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22KVM: s390: Allow skeys to be enabled for the current processDominik Dingel
Introduce a new function s390_enable_skey(), which enables storage key handling via setting the use_skey flag in the mmu context. This function is only useful within the context of kvm. Note that enabling storage keys will cause a one-time hickup when walking the page table; however, it saves us special effort for cases like clear reset while making it possible for us to be architecture conform. s390_enable_skey() takes the page table lock to prevent reseting storage keys triggered from multiple vcpus. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22KVM: s390: Adding skey bit to mmu contextDominik Dingel
For lazy storage key handling, we need a mechanism to track if the process ever issued a storage key operation. This patch adds the basic infrastructure for making the storage key handling optional, but still leaves it enabled for now by default. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-08Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull second set of s390 patches from Martin Schwidefsky: "The second part of Heikos uaccess rework, the page table walker for uaccess is now a thing of the past (yay!) The code change to fix the theoretical TLB flush problem allows us to add a TLB flush optimization for zEC12, this machine has new instructions that allow to do CPU local TLB flushes for single pages and for all pages of a specific address space. Plus the usual bug fixing and some more cleanup" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/uaccess: rework uaccess code - fix locking issues s390/mm,tlb: optimize TLB flushing for zEC12 s390/mm,tlb: safeguard against speculative TLB creation s390/irq: Use defines for external interruption codes s390/irq: Add defines for external interruption codes s390/sclp: add timeout for queued requests kvm/s390: also set guest pages back to stable on kexec/kdump lcs: Add missing destroy_timer_on_stack() s390/tape: Add missing destroy_timer_on_stack() s390/tape: Use del_timer_sync() s390/3270: fix crash with multiple reset device requests s390/bitops,atomic: add missing memory barriers s390/zcrypt: add length check for aligned data to avoid overflow in msg-type 6
2014-04-03s390/mm,tlb: optimize TLB flushing for zEC12Martin Schwidefsky
The zEC12 machines introduced the local-clearing control for the IDTE and IPTE instruction. If the control is set only the TLB of the local CPU is cleared of entries, either all entries of a single address space for IDTE, or the entry for a single page-table entry for IPTE. Without the local-clearing control the TLB flush is broadcasted to all CPUs in the configuration, which is expensive. The reset of the bit mask of the CPUs that need flushing after a non-local IDTE is tricky. As TLB entries for an address space remain in the TLB even if the address space is detached a new bit field is required to keep track of attached CPUs vs. CPUs in the need of a flush. After a non-local flush with IDTE the bit-field of attached CPUs is copied to the bit-field of CPUs in need of a flush. The ordering of operations on cpu_attach_mask, attach_count and mm_cpumask(mm) is such that an underindication in mm_cpumask(mm) is prevented but an overindication in mm_cpumask(mm) is possible. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-04-02Merge tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "PPC and ARM do not have much going on this time. Most of the cool stuff, instead, is in s390 and (after a few releases) x86. ARM has some caching fixes and PPC has transactional memory support in guests. MIPS has some fixes, with more probably coming in 3.16 as QEMU will soon get support for MIPS KVM. For x86 there are optimizations for debug registers, which trigger on some Windows games, and other important fixes for Windows guests. We now expose to the guest Broadwell instruction set extensions and also Intel MPX. There's also a fix/workaround for OS X guests, nested virtualization features (preemption timer), and a couple kvmclock refinements. For s390, the main news is asynchronous page faults, together with improvements to IRQs (floating irqs and adapter irqs) that speed up virtio devices" * tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (96 commits) KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8 KVM: PPC: Book3S HV: Fix decrementer timeouts with non-zero TB offset KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode KVM: PPC: Book3S HV: Return ENODEV error rather than EIO KVM: PPC: Book3S: Trim top 4 bits of physical address in RTAS code KVM: PPC: Book3S HV: Add get/set_one_reg for new TM state KVM: PPC: Book3S HV: Add transactional memory support KVM: Specify byte order for KVM_EXIT_MMIO KVM: vmx: fix MPX detection KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n KVM: PPC: Book3S: Introduce hypervisor call H_GET_TCE KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write KVM: s390: clear local interrupts at cpu initial reset KVM: s390: Fix possible memory leak in SIGP functions KVM: s390: fix calculation of idle_mask array size KVM: s390: randomize sca address KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP KVM: Bump KVM_MAX_IRQ_ROUTES for s390 KVM: s390: irq routing for adapter interrupts. KVM: s390: adapter interrupt sources ...
2014-03-21s390/mm: remove unecessary parameter from pgste_ipte_notifyDominik Dingel
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-03-21s390/mm: remove unnecessary parameter from gmap_do_ipte_notifyDominik Dingel
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-02-21s390/kvm: support collaborative memory managementKonstantin Weitz
This patch enables Collaborative Memory Management (CMM) for kvm on s390. CMM allows the guest to inform the host about page usage (see arch/s390/mm/cmm.c). The host uses this information to avoid swapping in unused pages in the page fault handler. Further, a CPU provided list of unused invalid pages is processed to reclaim swap space of not yet accessed unused pages. [ Martin Schwidefsky: patch reordering and cleanup ] Signed-off-by: Konstantin Weitz <konstantin.weitz@gmail.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-02-21s390/mm,tlb: race of lazy TLB flush vs. recreation of TLB entriesMartin Schwidefsky
Git commit 050eef364ad70059 "[S390] fix tlb flushing vs. concurrent /proc accesses" introduced the attach counter to avoid using the mm_users value to decide between IPTE for every PTE and lazy TLB flushing with IDTE. That fixed the problem with mm_users but it introduced another subtle race, fortunately one that is very hard to hit. The background is the requirement of the architecture that a valid PTE may not be changed while it can be used concurrently by another cpu. The decision between IPTE and lazy TLB flushing needs to be done while the PTE is still valid. Now if the virtual cpu is temporarily stopped after the decision to use lazy TLB flushing but before the invalid bit of the PTE has been set, another cpu can attach the mm, find that flush_mm is set, do the IDTE, return to userspace, and recreate a TLB that uses the PTE in question. When the first, stopped cpu continues it will change the PTE while it is attached on another cpu. The first cpu will do another IDTE shortly after the modification of the PTE which makes the race window quite short. To fix this race the CPU that wants to attach the address space of a user space thread needs to wait for the end of the PTE modification. The number of concurrent TLB flushers for an mm is tracked in the upper 16 bits of the attach_count and finish_arch_post_lock_switch is used to wait for the end of the flush operation if required. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-01-30KVM: s390: Add FAULT_FLAG_RETRY_NOWAIT for guest faultDominik Dingel
In the case of a fault, we will retry to exit sie64 but with gmap fault indication for this thread set. This makes it possible to handle async page faults. Based on a patch from Martin Schwidefsky. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2013-10-15s390/mm,kvm: fix software dirty bits vs. kvm for old machinesMartin Schwidefsky
For machines without enhanced supression on protection the software dirty bit code forces the pte dirty bit and clears the page protection bit in pgste_set_pte. This is done for all pte types, the check for present ptes is missing. As a result swap ptes and other not-present ptes can get corrupted. Add a check for the _PAGE_PRESENT bit to pgste_set_pte before modifying the pte value. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-09-04Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Gleb Natapov: "The highlights of the release are nested EPT and pv-ticketlocks support (hypervisor part, guest part, which is most of the code, goes through tip tree). Apart of that there are many fixes for all arches" Fix up semantic conflicts as discussed in the pull request thread.. * 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (88 commits) ARM: KVM: Add newlines to panic strings ARM: KVM: Work around older compiler bug ARM: KVM: Simplify tracepoint text ARM: KVM: Fix kvm_set_pte assignment ARM: KVM: vgic: Bump VGIC_NR_IRQS to 256 ARM: KVM: Bugfix: vgic_bytemap_get_reg per cpu regs ARM: KVM: vgic: fix GICD_ICFGRn access ARM: KVM: vgic: simplify vgic_get_target_reg KVM: MMU: remove unused parameter KVM: PPC: Book3S PR: Rework kvmppc_mmu_book3s_64_xlate() KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX KVM: x86: update masterclock when kvmclock_offset is calculated (v2) KVM: PPC: Book3S: Fix compile error in XICS emulation KVM: PPC: Book3S PR: return appropriate error when allocation fails arch: powerpc: kvm: add signed type cast for comparation KVM: x86: add comments where MMIO does not return to the emulator KVM: vmx: count exits to userspace during invalid guest emulation KVM: rename __kvm_io_bus_sort_cmp to kvm_io_bus_cmp kvm: optimize away THP checks in kvm_is_mmio_pfn() ...
2013-08-29s390/mm: implement software referenced bitsMartin Schwidefsky
The last remaining use for the storage key of the s390 architecture is reference counting. The alternative is to make page table entries invalid while they are old. On access the fault handler marks the pte/pmd as young which makes the pte/pmd valid if the access rights allow read access. The pte/pmd invalidations required for software managed reference bits cost a bit of performance, on the other hand the RRBE/RRBM instructions to read and reset the referenced bits are quite expensive as well. Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-08-28s390/pgtable: fix mprotect for single-threaded KVM guestsMartin Schwidefsky
For a single-threaded KVM guest ptep_modify_prot_start will not use IPTE, the invalid bit will therefore not be set. If DEBUG_VM is set pgste_set_key called by ptep_modify_prot_commit will complain about the missing invalid bit. ptep_modify_prot_start should set the invalid bit in all cases. Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-08-22s390/mm: introduce ptep_flush_lazy helperMartin Schwidefsky
Isolate the logic of IDTE vs. IPTE flushing of ptes in two functions, ptep_flush_lazy and __tlb_flush_mm_lazy. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-08-22s390/pgtable: add pgste_get helperMartin Schwidefsky
ptep_modify_prot_start uses the pgste_set helper to store the pgste, while ptep_modify_prot_commit uses its own pointer magic to retrieve the value again. Add the pgste_get help function to keep things symmetrical and improve readability. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-08-22s390/pgtable: skip pgste updates on full flushMartin Schwidefsky
On process exit there is no more need for the pgste information. Skip the expensive storage key operations which should speed up termination of KVM processes. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-08-22s390/mm: cleanup page table definitionsMartin Schwidefsky
Improve the encoding of the different pte types and the naming of the page, segment table and region table bits. Due to the different pte encoding the hugetlbfs primitives need to be adapted as well. To improve compatability with common code make the huge ptes use the encoding of normal ptes. The conversion between the pte and pmd encoding for a huge pte is done with set_huge_pte_at and huge_ptep_get. Overall the code is now easier to understand. Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-07-29KVM: s390: allow sie enablement for multi-threaded programsMartin Schwidefsky
Improve the code to upgrade the standard 2K page tables to 4K page tables with PGSTEs to allow the operation to happen when the program is already multi-threaded. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-07-04Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "This is the powerpc changes for the 3.11 merge window. In addition to the usual bug fixes and small updates, the main highlights are: - Support for transparent huge pages by Aneesh Kumar for 64-bit server processors. This allows the use of 16M pages as transparent huge pages on kernels compiled with a 64K base page size. - Base VFIO support for KVM on power by Alexey Kardashevskiy - Wiring up of our nvram to the pstore infrastructure, including putting compressed oopses in there by Aruna Balakrishnaiah - Move, rework and improve our "EEH" (basically PCI error handling and recovery) infrastructure. It is no longer specific to pseries but is now usable by the new "powernv" platform as well (no hypervisor) by Gavin Shan. - I fixed some bugs in our math-emu instruction decoding and made it usable to emulate some optional FP instructions on processors with hard FP that lack them (such as fsqrt on Freescale embedded processors). - Support for Power8 "Event Based Branch" facility by Michael Ellerman. This facility allows what is basically "userspace interrupts" for performance monitor events. - A bunch of Transactional Memory vs. Signals bug fixes and HW breakpoint/watchpoint fixes by Michael Neuling. And more ... I appologize in advance if I've failed to highlight something that somebody deemed worth it." * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits) pstore: Add hsize argument in write_buf call of pstore_ftrace_call powerpc/fsl: add MPIC timer wakeup support powerpc/mpic: create mpic subsystem object powerpc/mpic: add global timer support powerpc/mpic: add irq_set_wake support powerpc/85xx: enable coreint for all the 64bit boards powerpc/8xx: Erroneous double irq_eoi() on CPM IRQ in MPC8xx powerpc/fsl: Enable CONFIG_E1000E in mpc85xx_smp_defconfig powerpc/mpic: Add get_version API both for internal and external use powerpc: Handle both new style and old style reserve maps powerpc/hw_brk: Fix off by one error when validating DAWR region end powerpc/pseries: Support compression of oops text via pstore powerpc/pseries: Re-organise the oops compression code pstore: Pass header size in the pstore write callback powerpc/powernv: Fix iommu initialization again powerpc/pseries: Inform the hypervisor we are using EBB regs powerpc/perf: Add power8 EBB support powerpc/perf: Core EBB support for 64-bit book3s powerpc/perf: Drop MMCRA from thread_struct powerpc/perf: Don't enable if we have zero events ...
2013-07-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "On the x86 side, there are some optimizations and documentation updates. The big ARM/KVM change for 3.11, support for AArch64, will come through Catalin Marinas's tree. s390 and PPC have misc cleanups and bugfixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (87 commits) KVM: PPC: Ignore PIR writes KVM: PPC: Book3S PR: Invalidate SLB entries properly KVM: PPC: Book3S PR: Allow guest to use 1TB segments KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry KVM: PPC: Book3S PR: Fix proto-VSID calculations KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL KVM: Fix RTC interrupt coalescing tracking kvm: Add a tracepoint write_tsc_offset KVM: MMU: Inform users of mmio generation wraparound KVM: MMU: document fast invalidate all mmio sptes KVM: MMU: document fast invalidate all pages KVM: MMU: document fast page fault KVM: MMU: document mmio page fault KVM: MMU: document write_flooding_count KVM: MMU: document clear_spte_count KVM: MMU: drop kvm_mmu_zap_mmio_sptes KVM: MMU: init kvm generation close to mmio wrap-around value KVM: MMU: add tracepoint for check_mmio_spte KVM: MMU: fast invalidate all mmio sptes ...
2013-06-29consolidate io_remap_pfn_range definitionsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-20mm/THP: add pmd args to pgtable deposit and withdraw APIsAneesh Kumar K.V
This will be later used by powerpc THP support. In powerpc we want to use pgtable for storing the hash index values. So instead of adding them to mm_context list, we would like to store them in the second half of pmd Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-05s390/pgtable: make pgste lock an explicit barrierChristian Borntraeger
Getting and Releasing the pgste lock has lock semantics. Make the code an explicit barrier. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-06-05s390/pgtable: Save pgste during modify_prot_start/commitChristian Borntraeger
In modify_prot_start we update the pgste value but never store it back into the original location. Lets save the calculated result, since modify_prot_commit will use the value of the pgste. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-06-05s390/pgtable: Fix guest overindication for change bitChristian Borntraeger
When doing the transition invalid->valid in the host page table for a guest, then the guest view of C/R is in the pgste. After validation the view is pgste OR real key. We must zero out the real key C/R to avoid guest over-indication for change (and reference). Touching the real key is ok also for the host: The change bit is tracked via write protection and the reference bit is also ok because set_pte_at was called and the page will be touched anyway soon. Furthermore architecture defines reference as "substantially accurate", over- and underindication are ok. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-05-28s390/pgtable: Fix check for pgste/storage key handlingChristian Borntraeger
pte_present might return true on PAGE_TYPE_NONE, even if the invalid bit is on. Modify the existing check of the pgste functions to avoid crashes. [ Martin Schwidefsky: added ptep_modify_prot_[start|commit] bits ] Reported-by: Martin Schwidefky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> CC: stable@vger.kernel.org Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-05-21s390/kvm: Kick guests out of sie if prefix page host pte is touchedChristian Borntraeger
The guest prefix pages must be mapped writeable all the time while SIE is running, otherwise the guest might see random behaviour. (pinned at the pte level) Turns out that mlocking is not enough, the page table entry (not the page) might change or become r/o. This patch uses the gmap notifiers to kick guest cpus out of SIE. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-21s390/kvm: rename RCP_xxx defines to PGSTE_xxxMartin Schwidefsky
The RCP byte is a part of the PGSTE value, the existing RCP_xxx names are inaccurate. As the defines describe bits and pieces of the PGSTE, the names should start with PGSTE_. The KVM_UR_BIT and KVM_UC_BIT are part of the PGSTE as well, give them better names as well. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-05-21s390/pgtable: fix ipte notify bitChristian Borntraeger
Dont use the same bit as user referenced. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>