path: root/arch/x86/kernel
Age, Commit message, Author
2013-01-29x86: Factor out e820_add_kernel_range()Yinghai Lu
Separate out the reservation of the kernel static memory areas into a separate function. Also add support for case when memmap=xxM$yyM is used without exactmap. Need to remove reserved range at first before we add E820_RAM range, otherwise added E820_RAM range will be ignored. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-5-git-send-email-yinghai@kernel.org Cc: Jacob Shin <jacob.shin@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29Merge remote-tracking branch 'origin/x86/boot' into x86/mm2H. Peter Anvin
Coming patches to x86/mm2 require the changes and advanced baseline in x86/boot. Resolved Conflicts: arch/x86/kernel/setup.c mm/nobootmem.c Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29Merge branch 'master' into for-nextJiri Kosina
Conflicts: drivers/devfreq/exynos4_bus.c Sync with Linus' tree to be able to apply patches that are against newer code (mvneta).
2013-01-29x86, boot: Sanitize boot_params if not zeroed on creationH. Peter Anvin
Use the new sentinel field to detect bootloaders which fail to follow protocol and don't initialize fields in struct boot_params that they do not explicitly initialize to zero. Based on an original patch and research by Yinghai Lu. Changed by hpa to be invoked both in the decompression path and in the kernel proper; the latter for the case where a bootloader takes over decompression. Originally-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-26-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
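The sentinel check described above can be sketched as a small userspace model. This is only an illustration: `struct boot_params_model`, its field names and sizes, and `sanitize_model()` are hypothetical and do not reproduce the real `struct boot_params` layout. The assumption is that a nonzero sentinel is treated as evidence that the bootloader did not zero the structure, so the fields it was not supposed to fill in are cleared.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct boot_params. */
struct boot_params_model {
    uint8_t set_by_loader[16];      /* fields the bootloader fills in on purpose */
    uint8_t sentinel;               /* expected to be 0 if the struct was zeroed */
    uint8_t not_set_by_loader[32];  /* fields that must be 0 unless set explicitly */
};

/*
 * If the sentinel is nonzero, assume the structure was never zeroed and
 * clear the fields the loader did not explicitly initialize.
 * Returns 1 if the fields were scrubbed, 0 if nothing was done.
 */
static int sanitize_model(struct boot_params_model *bp)
{
    if (!bp->sentinel)
        return 0;

    memset(bp->not_set_by_loader, 0, sizeof(bp->not_set_by_loader));
    return 1;
}
```

Calling `sanitize_model()` from both the decompression path and the kernel proper, as the commit describes, would be harmless in this model because a second call on already-cleared fields only zeroes them again.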
2013-01-28x86, io_apic: Introduce eoi_ioapic_pin call-backJoerg Roedel
This callback replaces the old __eoi_ioapic_pin function which needs a special path for interrupt remapping. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, msi: Introduce x86_msi.compose_msi_msg call-backJoerg Roedel
This call-back points to the right function for initializing the msi_msg structure. The old code for msi_msg generation was split up into the irq-remapped and the default case. The irq-remapped case just calls into the specific Intel or AMD implementation when the device is behind an IOMMU. Otherwise the default function is called. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, irq: Introduce setup_remapped_irq()Joerg Roedel
This function does irq-remapping specific interrupt setup like modifying the chip defaults. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, irq: Move irq_remapped() check into free_remapped_irqJoerg Roedel
The function is called unconditionally now in IO-APIC code removing another irq_remapped() check from x86 core code. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, io-apic: Remove !irq_remapped() check from __target_IO_APIC_irq()Joerg Roedel
This function is only called from default_ioapic_set_affinity() which is only used when interrupt remapping is disabled since the introduction of the set_affinity function pointer. So the check will always evaluate as true and can be removed. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, io-apic: Move CONFIG_IRQ_REMAP code out of x86 coreJoerg Roedel
Move all the code either to the header file asm/irq_remapping.h or to drivers/iommu/. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, io_apic: Remove irq_remapping_enabled check in setup_timer_IRQ0_pinJoerg Roedel
This function is only called when irq-remapping is disabled. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, io_apic: Move irq_remapping_enabled checks out of check_timer()Joerg Roedel
Move these checks to IRQ remapping code by introducing the panic_on_irq_remap() function. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, io_apic: Convert setup_ioapic_entry to function pointerJoerg Roedel
This pointer is changed to a different function when IRQ remapping is enabled. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, io_apic: Introduce set_affinity function pointerJoerg Roedel
With interrupt remapping a special function is used to change the affinity of an IO-APIC interrupt. Abstract this with a function pointer. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, msi: Use IRQ remapping specific setup_msi_irqs routineJoerg Roedel
Use separate routines to set up MSI IRQs for both irq_remapping_enabled cases. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, hpet: Introduce x86_msi_ops.setup_hpet_msiJoerg Roedel
This function pointer can be overwritten by the IRQ remapping code. The irq_remapping_enabled check can be removed from default_setup_hpet_msi. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, io_apic: Introduce x86_io_apic_ops.print_entries for debuggingJoerg Roedel
This call-back is used to dump IO-APIC entries for debugging purposes into the kernel log. VT-d needs a special routine for this and will overwrite the default. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, io_apic: Introduce x86_io_apic_ops.disable()Joerg Roedel
This function pointer is used to call a system-specific function for disabling the IO-APIC. Currently this is used for IRQ remapping which has its own disable routine. Also introduce the necessary infrastructure in the interrupt remapping code to overwrite this and other function pointers as necessary by interrupt remapping. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
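The call-back pattern used throughout this series (a default function pointer that the interrupt remapping code overwrites) can be sketched in plain C. The names below (`io_apic_ops_model`, `native_disable_model`, `remap_disable_model`, `irq_remap_enable_model`) are hypothetical and only model the idea; they are not the kernel's actual definitions.

```c
/* Hypothetical model of an ops structure with an overridable call-back. */
struct io_apic_ops_model {
    void (*disable)(void);
};

/* Records which implementation ran last: 1 = native, 2 = IRQ remapping. */
static int last_disable_called;

static void native_disable_model(void)
{
    last_disable_called = 1;
}

static void remap_disable_model(void)
{
    last_disable_called = 2;
}

/* Core code installs the default implementation. */
static struct io_apic_ops_model io_apic_ops = {
    .disable = native_disable_model,
};

/* The IRQ remapping code overwrites the pointer when it is enabled. */
static void irq_remap_enable_model(void)
{
    io_apic_ops.disable = remap_disable_model;
}
```

Core code then calls `io_apic_ops.disable()` unconditionally, which is what lets the `irq_remapping_enabled` checks move out of the x86 core.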
2013-01-28x86, apic: Mask IO-APIC and PIC unconditionally on LAPIC resumeJoerg Roedel
IO-APIC and PIC use the same resume routines when IRQ remapping is enabled or disabled. So it should be safe to mask the other APICs for the IRQ-remapping-disabled case too. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, apic: Move irq_remapping_enabled checks into IRQ-remapping codeJoerg Roedel
Move the three easy to move checks in the x86 apic.c file into the IRQ-remapping code. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-27cputime: Use accessors to read task cputime statsFrederic Weisbecker
This is in preparation for the full dynticks feature. While remotely reading the cputime of a task running in a full dynticks CPU, we'll need to do some extra-computation. This way we can account the time it spent tickless in userspace since its last cputime snapshot. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-01-25x86, kvm: Fix kvm's use of __pa() on percpu areasDave Hansen
In short, it is illegal to call __pa() on an address holding a percpu variable. This replaces those __pa() calls with slow_virt_to_phys(). All of the cases in this patch are in boot time (or CPU hotplug time at worst) code, so the slow pagetable walking in slow_virt_to_phys() is not expected to have a performance impact. The times when this actually matters are pretty obscure (certain 32-bit NUMA systems), but it _does_ happen. It is important to keep KVM guests working on these systems because the real hardware is getting harder and harder to find. This bug manifested first by me seeing a plain hang at boot after this message: CPU 0 irqstacks, hard=f3018000 soft=f301a000 or, sometimes, it would actually make it out to the console: [ 0.000000] BUG: unable to handle kernel paging request at ffffffff I eventually traced it down to the KVM async pagefault code. This can be worked around by disabling that code either at compile-time, or on the kernel command-line. The kvm async pagefault code was injecting page faults into the guest which the guest misinterpreted because its "reason" was not being properly sent from the host. The guest passes a physical address of a per-cpu async page fault structure via an MSR to the host. Since __pa() is broken on percpu data, the physical address it sent was basically bogus and the host went scribbling on random data. The guest never saw the real reason for the page fault (it was injected by the host), assumed that the kernel had taken a _real_ page fault, and panic()'d. The behavior varied, though, depending on what got corrupted by the bad write. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20130122212435.4905663F@kernel.stglabs.ibm.com Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
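A toy model may help show why linear address arithmetic gives a bogus result here. It assumes, as the commit message implies, that the per-cpu variable lives outside the linear mapping on the affected systems. Everything below is hypothetical: the `_MODEL` constants, the two-entry `toy_table`, and both functions only imitate the difference between a `__pa()`-style subtraction and a page-table lookup.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_OFFSET_MODEL 0xC0000000u  /* assumed start of the linear mapping */
#define PAGE_SHIFT_MODEL  12
#define PAGE_MASK_MODEL   0xFFFu

/* Toy "page table": virtual page number -> physical page number. */
struct mapping_model {
    uint32_t vpage;
    uint32_t ppage;
};

static const struct mapping_model toy_table[] = {
    { 0xC0001u, 0x00001u },  /* linear-mapped page: phys = virt - PAGE_OFFSET */
    { 0xF3018u, 0x12345u },  /* remapped page backed by an arbitrary frame */
};

/* __pa()-style translation: only valid inside the linear mapping. */
static uint32_t pa_linear_model(uint32_t vaddr)
{
    return vaddr - PAGE_OFFSET_MODEL;
}

/* Slow translation: look the page up, then add the in-page offset. */
static int slow_virt_to_phys_model(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpage = vaddr >> PAGE_SHIFT_MODEL;

    for (size_t i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
        if (toy_table[i].vpage == vpage) {
            *paddr = (toy_table[i].ppage << PAGE_SHIFT_MODEL) |
                     (vaddr & PAGE_MASK_MODEL);
            return 0;
        }
    }
    return -1;  /* not mapped in this toy table */
}
```

For the linear-mapped page the two functions agree; for the remapped page the subtraction yields an address unrelated to the real frame, which corresponds to the "basically bogus" physical address the guest handed to the host.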
2013-01-25Merge tag 'v3.8-rc5' into x86/mmH. Peter Anvin
The __pa() fixup series that follows touches KVM code that is not present in the existing branch based on v3.7-rc5, so merge in the current upstream from Linus. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-26PM / tracing: remove deprecated power trace APIPaul Gortmaker
The text in Documentation said it would be removed in 2.6.41; the text in the Kconfig said removal in the 3.1 release. Either way you look at it, we are well past both, so push it off a cliff. Note that the POWER_CSTATE and the POWER_PSTATE are part of the legacy tracing API. Remove all tracepoints which use these flags. As can be seen from context, most already have a trace entry via trace_cpu_idle anyways. Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as compared to the CSTATE ones which all have a clear start/stop. As part of this, the trace_power_frequency also becomes orphaned, so it too is deleted. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-24x86/process: Change %8s to %s for pr_warn() in release_thread()Chen Gang
The length of dead_task->comm[] is 16 (TASK_COMM_LEN), so it is not meaningful to use %8s for task->comm[] in pr_warn(). Change it to %s, since the line is not solid anyway. Additional information: %8s sets the minimum field width, not the maximum output length. If the name is longer than 8 characters, it is still displayed in full; if it is shorter than 8 characters, it is padded with leading spaces. %.8s (precision) truly limits the output length. Signed-off-by: Chen Gang <gang.chen@asianux.com> Link: http://lkml.kernel.org/n/tip-nridm1zvreai1tgfLjuexDmd@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
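The difference between field width and precision can be checked in userspace with standard `snprintf()`. The two helper names below are made up for this illustration, and the task names are arbitrary examples.

```c
#include <stddef.h>
#include <stdio.h>

/* "%8s": minimum field width of 8; longer strings are NOT truncated. */
static int format_width8(char *buf, size_t len, const char *name)
{
    return snprintf(buf, len, "%8s", name);
}

/* "%.8s": precision of 8; at most 8 characters of the string are printed. */
static int format_prec8(char *buf, size_t len, const char *name)
{
    return snprintf(buf, len, "%.8s", name);
}
```

With a 12-character name such as "kworker/0:1H", `format_width8()` writes the full name while `format_prec8()` writes only "kworker/". With a short name such as "init", `format_width8()` pads it on the left to 8 characters and `format_prec8()` leaves it unchanged.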
2013-01-24x86/msr: Add capabilities checkAlan Cox
At the moment the MSR driver only relies upon file system checks. This means that anything as root with any capability set can write to MSRs. Historically that wasn't very interesting but on modern processors the MSRs are such that writing to them provides several ways to execute arbitrary code in kernel space. Sample code and documentation on doing this is circulating and MSR attacks are used on Windows 64bit rootkits already. In the Linux case you still need to be able to open the device file so the impact is fairly limited and reduces the security of some capability and security model based systems down towards that of a generic "root owns the box" setup. Therefore they should require CAP_SYS_RAWIO to prevent an elevation of capabilities. The impact of this is fairly minimal on most setups because they don't have heavy use of capabilities. Those using SELinux, SMACK or AppArmor rules might want to consider if their rulesets on the MSR driver could be tighter. Signed-off-by: Alan Cox <alan@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Horses <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
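The effect of layering a capability check on top of the file permission check can be modelled in userspace. This is a sketch only: `struct task_model`, `capable_model()` and `msr_open_model()` are hypothetical stand-ins, and the choice of error codes is an assumption of the model. `CAP_SYS_RAWIO` is defined locally with the value it has in `<linux/capability.h>` so the example stays self-contained.

```c
#include <errno.h>
#include <stdbool.h>

#define CAP_SYS_RAWIO 17  /* same value as in <linux/capability.h> */

/* Hypothetical model of the caller's credentials. */
struct task_model {
    unsigned long long effective_caps;  /* bit n set = capability n held */
    bool file_perm_ok;                  /* file system check already passed? */
};

static bool capable_model(const struct task_model *t, int cap)
{
    return (t->effective_caps >> cap) & 1ULL;
}

/* Model of an open() path that requires both checks to succeed. */
static int msr_open_model(const struct task_model *t)
{
    if (!t->file_perm_ok)
        return -EACCES;  /* the pre-existing file system check */

    if (!capable_model(t, CAP_SYS_RAWIO))
        return -EPERM;   /* the additional capability requirement */

    return 0;
}
```

In this model a process that passes the file permission check but has dropped `CAP_SYS_RAWIO` is refused, which is the elevation path the commit aims to close.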
2013-01-24x86/dma-debug: Bump PREALLOC_DMA_DEBUG_ENTRIESMaarten Lankhorst
I ran out of free entries when I had CONFIG_DMA_API_DEBUG enabled. Some other archs seem to default to 65536, so increase this limit for x86 too. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Link: http://lkml.kernel.org/r/50A612AA.7040206@canonical.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24x86/MSI: Support multiple MSIs in presense of IRQ remappingAlexander Gordeev
The MSI specification has several constraints in comparison with MSI-X, the most notable of which is the inability to configure MSIs independently. As a result, it is impossible to dispatch interrupts from different queues to different CPUs. This largely devalues the support of multiple MSIs in SMP systems. Also, the need to allocate a contiguous block of vector numbers for devices capable of multiple MSIs might put considerable pressure on the x86 interrupt vector allocator and could lead to fragmentation of the interrupt vector space. This patch overcomes both drawbacks in the presence of IRQ remapping and lets devices take advantage of multiple queues and per-IRQ affinity assignments. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/c8bd86ff56b5fc118257436768aaa04489ac0a4c.1353324359.git.agordeev@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24x86: Convert a few mistaken __cpuinit annotations to __initJan Beulich
The first two are functions serving as initcalls; the SFI one is only being called from __init code. Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/50AFB35102000078000AAECA@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24x86: Fix a typoYuanhan Liu
legact -> legacy Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24x86/perf: Add IvyBridge EP supportYouquan Song
Running the perf utility on an Ivybridge EP server we encounter "not supported" events: <not supported> L1-dcache-loads <not supported> L1-dcache-load-misses <not supported> L1-dcache-stores <not supported> L1-dcache-store-misses <not supported> L1-dcache-prefetches <not supported> L1-dcache-prefetch-misses This patch adds support for this processor. Signed-off-by: Youquan Song <youquan.song@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Youquan Song <youquan.song@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1355851223-27705-1-git-send-email-youquan.song@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24perf/x86: Fix P6 driver section warningyangyongqiang
Fix a compile warning - 'a section type conflict' by removing __initconst. Signed-off-by: yangyongqiang <yangyongqiang01@baidu.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24perf/x86: Enable Intel Lincroft/Penwell/Cloverview Atom supportShuoX Liu
These three chips are based on Atom and have different model IDs, so add these three IDs for perf HW event support. Signed-off-by: ShuoX Liu <shuox.liu@intel.com> Cc: yanmin_zhang@intel.linux.com Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1356713324-12442-1-git-send-email-shuox.liu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24x86/time/rtc: Don't print extended CMOS year when reading RTCBjorn Helgaas
We shouldn't print the current century every time we read the RTC. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20130104224146.15189.14874.stgit@bhelgaas.mtv.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/coreIngo Molnar
Pull tracing updates from Steve Rostedt. This commit: tracing: Remove the extra 4 bytes of padding in events changes the ABI. All involved parties seem to agree that it's safe to do now, but the devil is in the details ... Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24x86/apic: Allow x2apic without IR on VMware platformAlok N Kataria
This patch updates the x2apic initialization code to allow x2apic on the VMware platform even without interrupt remapping support. The hypervisor_x2apic_available hook was added to the x2apic initialization code and used by KVM and XEN before this. I have also cleaned up that code to export this hook through the hypervisor_x86 structure. Compile tested for KVM and XEN configs; this patch doesn't have any functional effect on those two platforms. On the VMware platform, verified that x2apic is used in physical mode on products that support this. Signed-off-by: Alok N Kataria <akataria@vmware.com> Reviewed-by: Doug Covelli <dcovelli@vmware.com> Reviewed-by: Dan Hecht <dhecht@vmware.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Avi Kivity <avi@redhat.com> Link: http://lkml.kernel.org/r/1358466282.423.60.camel@akataria-dtop.eng.vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24x86/apb/timer: Remove unnecessary "if"Cong Ding
adev has no chance to be NULL, so we don't need to check it. It is also dereferenced just before the check. Signed-off-by: Cong Ding <dinggnu@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Link: http://lkml.kernel.org/r/1358199561-15518-1-git-send-email-dinggnu@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24x86/apic: Remove noisy zero-mask warning from default_send_IPI_mask_logical()Dave Jones
Since circa 3.5, we've had dozens of reports of people hitting this warning. Forwarded reports have been met with silence, so just remove the warning if no-one cares. Example reports: https://bugzilla.redhat.com/show_bug.cgi?id=797687 https://bugzilla.redhat.com/show_bug.cgi?id=867174 https://bugzilla.redhat.com/show_bug.cgi?id=894865 Signed-off-by: Dave Jones <davej@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20130118175847.GA27662@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24x86-64: Fix unwind annotations in recent NMI changesJan Beulich
While in one case a plain annotation is necessary, in the other case the stack adjustment can simply be folded into the immediately preceding RESTORE_ALL, thus getting the correct annotation for free. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexander van Heukelum <heukelum@mailshack.com> Link: http://lkml.kernel.org/r/51010C9302000078000B9045@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-22Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linuxLinus Torvalds
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
 . revert 20b279 - require exclude_guest to use PEBS - kernel side, now older binaries will continue working for things like cycles:pp without needing to pass extra modifiers, from David Ahern.
 . Fix building from 'make perf-*-src-pkg' tarballs, broken by UAPI, from Sebastian Andrzej Siewior
[ Pulling directly, Ingo would normally pull but has been unresponsive ]
* tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf tools: Fix building from 'make perf-*-src-pkg' tarballs
  perf x86: revert 20b279 - require exclude_guest to use PEBS - kernel side
2013-01-22ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILLOleg Nesterov
putreg() assumes that the tracee is not running and pt_regs_access() can safely play with its stack. However a killed tracee can return from ptrace_stop() to the low-level asm code and do RESTORE_REST, this means that debugger can actually read/modify the kernel stack until the tracee does SAVE_REST again. set_task_blockstep() can race with SIGKILL too and in some sense this race is even worse, the very fact the tracee can be woken up breaks the logic. As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace() call, this ensures that nobody can ever wakeup the tracee while the debugger looks at it. Not only this fixes the mentioned problems, we can do some cleanups/simplifications in arch_ptrace() paths. Probably ptrace_unfreeze_traced() needs more callers, for example it makes sense to make the tracee killable for oom-killer before access_process_vm(). While at it, add the comment into may_ptrace_stop() to explain why ptrace_stop() still can't rely on SIGKILL and signal_pending_state(). Reported-by: Salman Qazi <sqazi@google.com> Reported-by: Suleiman Souhlal <suleiman@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-22Merge tag 'numascale' into x86/platformH. Peter Anvin
This patchset adds support for federated systems where multiple memory controllers can exist and see each other over multiple PCI domains. This basically means that AMD node ids can be more than 8 now and the code handling this is taught to incorporate PCI domain into those IDs. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-21kprobes/x86: Move kprobes stuff under arch/x86/kernel/kprobes/Masami Hiramatsu
Move arch-dep kprobes stuff under arch/x86/kernel/kprobes. Link: http://lkml.kernel.org/r/20120928081522.3560.75469.stgit@ltc138.sdl.hitachi.co.jp Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> [ fixed whitespace and s/__attribute__((packed))/__packed/ ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21kprobes/x86: Move ftrace-based kprobe code into kprobes-ftrace.cMasami Hiramatsu
Split ftrace-based kprobes code from kprobes, and introduce CONFIG_(HAVE_)KPROBES_ON_FTRACE Kconfig flags. For the cleanup reason, this also moves kprobe_ftrace check into skip_singlestep. Link: http://lkml.kernel.org/r/20120928081520.3560.25624.stgit@ltc138.sdl.hitachi.co.jp Cc: Ingo Molnar <mingo@elte.hu> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-21taint: add explicit flag to show whether lock dep is still OK.Rusty Russell
Fix up all callers as they were before, with one change: an unsigned module taints the kernel, but doesn't turn off lockdep. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-01-19x86-32: Start out cr0 clean, disable paging before modifying cr3/4H. Peter Anvin
Patch 5a5a51db78e x86-32: Start out eflags and cr4 clean ... made x86-32 match x86-64 in that we initialize %eflags and %cr4 from scratch. This broke OLPC XO-1.5, because the XO enters the kernel with paging enabled, which the kernel doesn't expect. Since we no longer support 386 (the source of most of the variability in %cr0 configuration), we can simply match further x86-64 and initialize %cr0 to a fixed value -- the one variable part remaining in %cr0 is for FPU control, but all that is handled later on in initialization; in particular, configuring %cr0 as if the FPU is present until proven otherwise is correct and necessary for the probe to work. To deal with the XO case sanely, explicitly disable paging in %cr0 before we muck with %cr3, %cr4 or EFER -- those operations are inherently unsafe with paging enabled. NOTE: There is still a lot of 386-related junk in head_32.S which we can and should get rid of, however, this is intended as a minimal fix whereas the cleanup can be deferred to the next merge window. Reported-by: Andres Salomon <dilinger@queued.net> Tested-by: Daniel Drake <dsd@laptop.org> Link: http://lkml.kernel.org/r/50FA0661.2060400@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-18Merge tag 'stable/for-linus-3.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xenLinus Torvalds
Pull Xen fixes from Konrad Rzeszutek Wilk:
 - CVE-2013-0190/XSA-40 (or stack corruption for 32-bit PV kernels)
 - Fix racy vma access spotted by Al Viro
 - Fix mmap batch ioctl potentially resulting in large O(n) page allocations.
 - Fix vcpu online/offline BUG:scheduling while atomic..
 - Fix unbound buffer scanning for more than 32 vCPUs.
 - Fix grant table being incorrectly initialized
 - Fix incorrect check in pciback
 - Allow privcmd in backend domains.
Fix up whitespace conflict due to ugly merge resolution in Xen tree in arch/arm/xen/enlighten.c
* tag 'stable/for-linus-3.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: Fix stack corruption in xen_failsafe_callback for 32bit PVOPS guests.
  Revert "xen/smp: Fix CPU online/offline bug triggering a BUG: scheduling while atomic."
  xen/gntdev: remove erronous use of copy_to_user
  xen/gntdev: correctly unmap unlinked maps in mmu notifier
  xen/gntdev: fix unsafe vma access
  xen/privcmd: Fix mmap batch ioctl.
  Xen: properly bound buffer access when parsing cpu/*/availability
  xen/grant-table: correctly initialize grant table version 1
  x86/xen : Fix the wrong check in pciback
  xen/privcmd: Relax access control in privcmd_ioctl_mmap
2013-01-17x86/nmi: export local_touch_nmi() symbol for modulesJacob Pan
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2013-01-16xen: Fix stack corruption in xen_failsafe_callback for 32bit PVOPS guests.Andrew Cooper
This fixes CVE-2013-0190 / XSA-40

There has been an error on the xen_failsafe_callback path for failed iret, which causes the stack pointer to be wrong when entering the iret_exc error path. This can result in the kernel crashing.

In the classic kernel case, the relevant code looked a little like:

        popl %eax        # Error code from hypervisor
        jz 5f
        addl $16,%esp
        jmp iret_exc     # Hypervisor said iret fault
    5:  addl $16,%esp    # Hypervisor said segment selector fault

Here, there are two identical addls on either option of a branch which appears to have been optimised by hoisting it above the jz, and converting it to an lea, which leaves the flags register unaffected.

In the PVOPS case, the code looks like:

        popl_cfi %eax         # Error from the hypervisor
        lea 16(%esp),%esp     # Add $16 before choosing fault path
        CFI_ADJUST_CFA_OFFSET -16
        jz 5f
        addl $16,%esp         # Incorrectly adjust %esp again
        jmp iret_exc

It is possible for unprivileged userspace applications to cause this behaviour, for example by loading an LDT code selector, then changing the code selector to be not-present. At this point, there is a race condition where it is possible for the hypervisor to return back to userspace from an interrupt, fault on its own iret, and inject a failsafe_callback into the kernel.

This bug has been present since the introduction of Xen PVOPS support in commit 5ead97c84 (xen: Core Xen implementation), in 2.6.23.

Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Cc: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
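The double adjustment can be made concrete with a small C model of the two assembly sequences quoted in the commit message. The function names and the `segment_fault` flag are hypothetical; the flag stands in for the condition that `jz 5f` tests, and the model tracks only the value of %esp.

```c
#include <stdbool.h>
#include <stdint.h>

/* Classic sequence: each branch adjusts %esp by 16 exactly once. */
static uint32_t classic_adjust(uint32_t esp, bool segment_fault)
{
    if (segment_fault)
        return esp + 16;  /* 5: addl $16,%esp */

    return esp + 16;      /* addl $16,%esp; jmp iret_exc */
}

/* PVOPS sequence as quoted: hoisted lea plus a leftover addl. */
static uint32_t pvops_buggy_adjust(uint32_t esp, bool segment_fault)
{
    esp += 16;            /* lea 16(%esp),%esp (flags untouched) */

    if (segment_fault)
        return esp;       /* jz 5f */

    esp += 16;            /* leftover addl $16,%esp */
    return esp;           /* jmp iret_exc with %esp 16 bytes too high */
}
```

On the segment selector fault path both models agree. On the iret fault path the PVOPS model ends up 16 bytes past the classic result, which is the wrong stack pointer that iret_exc then sees.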
2013-01-16Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds
Pull x86 fixes from Peter Anvin: "This is mainly a workaround for a bug in Sandy Bridge graphics which causes corruption of certain memory pages."
* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/Sandy Bridge: Sandy Bridge workaround depends on CONFIG_PCI
  x86/Sandy Bridge: mark arrays in __init functions as __initconst
  x86/Sandy Bridge: reserve pages when integrated graphics is present
  x86, efi: correct precedence of operators in setup_efi_pci