summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2011-11-21freezer: don't distinguish nosig tasks on thawTejun Heo
There's no point in thawing nosig tasks before others. There's no ordering requirement between the two groups on thaw, which the staged thawing can't guarantee anyway. Simplify thaw_processes() by removing the distinction and collapsing thaw_tasks() into thaw_processes(). This will help further updates to freezer. Signed-off-by: Tejun Heo <tj@kernel.org>
2011-11-21freezer: remove racy clear_freeze_flag() and set PF_NOFREEZE on dead tasksTejun Heo
clear_freeze_flag() in exit_mm() is racy. Freezing can start afterwards. Remove it. Skipping freezer for exiting task will be properly implemented later. Also, freezable() was testing exit_state directly to make system freezer ignore dead tasks. Let the exiting task set PF_NOFREEZE after entering TASK_DEAD instead. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com>
2011-11-21freezer: rename thaw_process() to __thaw_task() and simplify the implementationTejun Heo
thaw_process() now has only internal users - system and cgroup freezers. Remove the unnecessary return value, rename, unexport and collapse __thaw_process() into it. This will help further updates to the freezer code. -v3: oom_kill grew a use of thaw_process() while this patch was pending. Convert it to use __thaw_task() for now. In the longer term, this should be handled by allowing tasks to die if killed even if it's frozen. -v2: minor style update as suggested by Matt. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Paul Menage <menage@google.com> Cc: Matt Helsley <matthltc@us.ibm.com>
2011-11-21freezer: implement and use kthread_freezable_should_stop()Tejun Heo
Writeback and thinkpad_acpi have been using thaw_process() to prevent deadlock between the freezer and kthread_stop(); unfortunately, this is inherently racy - nothing prevents freezing from happening between thaw_process() and kthread_stop(). This patch implements kthread_freezable_should_stop() which enters refrigerator if necessary but is guaranteed to return if kthread_stop() is invoked. Both thaw_process() users are converted to use the new function. Note that this deadlock condition exists for many of freezable kthreads. They need to be converted to use the new should_stop or freezable workqueue. Tested with synthetic test case. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Henrique de Moraes Holschuh <ibm-acpi@hmh.eng.br> Cc: Jens Axboe <axboe@kernel.dk> Cc: Oleg Nesterov <oleg@redhat.com>
2011-11-21freezer: unexport refrigerator() and update try_to_freeze() slightlyTejun Heo
There is no reason to export two functions for entering the refrigerator. Calling refrigerator() instead of try_to_freeze() doesn't save anything noticeable or removes any race condition. * Rename refrigerator() to __refrigerator() and make it return bool indicating whether it scheduled out for freezing. * Update try_to_freeze() to return bool and relay the return value of __refrigerator() if freezing(). * Convert all refrigerator() users to try_to_freeze(). * Update documentation accordingly. * While at it, add might_sleep() to try_to_freeze(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Samuel Ortiz <samuel@sortiz.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Christoph Hellwig <hch@infradead.org>
2011-11-21freezer: fix current->state restoration race in refrigerator()Tejun Heo
refrigerator() saves current->state before entering frozen state and restores it before returning using __set_current_state(); however, this is racy, for example, please consider the following sequence. set_current_state(TASK_INTERRUPTIBLE); try_to_freeze(); if (kthread_should_stop()) break; schedule(); If kthread_stop() races with ->state restoration, the restoration can restore ->state to TASK_INTERRUPTIBLE after kthread_stop() sets it to TASK_RUNNING but kthread_should_stop() may still see zero ->should_stop because there's no memory barrier between restoring TASK_INTERRUPTIBLE and kthread_should_stop() test. This isn't restricted to kthread_should_stop(). current->state is often used in memory barrier based synchronization and silently restoring it w/o mb breaks them. Use set_current_state() instead. Signed-off-by: Tejun Heo <tj@kernel.org>
2011-11-20Merge branch 'pm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / Suspend: Fix bug in suspend statistics update PM / Hibernate: Fix the early termination of test modes PM / shmobile: Fix build of sh7372_pm_init() for CONFIG_PM unset PM Sleep: Do not extend wakeup paths to devices with ignore_children set PM / driver core: disable device's runtime PM during shutdown PM / devfreq: correct Kconfig dependency PM / devfreq: fix use after free in devfreq_remove_device PM / shmobile: Avoid restoring the INTCS state during initialization PM / devfreq: Remove compiler error after irq.h update PM / QoS: Properly use the WARN() macro in dev_pm_qos_add_request() PM / Clocks: Only disable enabled clocks in pm_clk_suspend() ARM: mach-shmobile: sh7372 A3SP no_suspend_console fix PM / shmobile: Don't skip debugging output in pd_power_up()
2011-11-19PM / Suspend: Fix bug in suspend statistics updateSrivatsa S. Bhat
After commit 2a77c46de1e3dace73745015635ebbc648eca69c (PM / Suspend: Add statistics debugfs file for suspend to RAM) a missing pair of braces inside the state_store() function causes even invalid arguments to suspend to be wrongly treated as failed suspend attempts. Fix this. [rjw: Put the hash/subject of the buggy commit into the changelog.] Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-11-19hrtimer: Fix extra wakeups from __remove_hrtimer()Jeff Ohlstein
__remove_hrtimer() attempts to reprogram the clockevent device when the timer being removed is the next to expire. However, __remove_hrtimer() reprograms the clockevent *before* removing the timer from the timerqueue and thus when hrtimer_force_reprogram() finds the next timer to expire it finds the timer we're trying to remove. This is especially noticeable when the system switches to NOHz mode and the system tick is removed. The timer tick is removed from the system but the clockevent is programmed to wakeup in another HZ anyway. Silence the extra wakeup by removing the timer from the timerqueue before calling hrtimer_force_reprogram() so that we actually program the clockevent for the next timer to expire. This was broken by 998adc3 "hrtimers: Convert hrtimers to use timerlist infrastructure". Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1321660030-8520-1-git-send-email-johlstei@codeaurora.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-18PM / Hibernate: Fix the early termination of test modesSrivatsa S. Bhat
Commit 2aede851ddf08666f68ffc17be446420e9d2a056 (PM / Hibernate: Freeze kernel threads after preallocating memory) postponed the freezing of kernel threads to after preallocating memory for hibernation. But while doing that, the hibernation test TEST_FREEZER and the test mode HIBERNATION_TESTPROC were not moved accordingly. As a result, when using these test modes, it only goes upto the freezing of userspace and exits, when in fact it should go till the complete end of task freezing stage, namely the freezing of kernel threads as well. So, move these points of exit to appropriate places so that freezing of kernel threads is also tested while using these test harnesses. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-11-17timekeeping: add arch_offset hook to ktime_get functionsHector Palacios
ktime_get and ktime_get_ts were calling timekeeping_get_ns() but later they were not calling arch_gettimeoffset() so architectures using this mechanism returned 0 ns when calling these functions. This happened for example when running Busybox's ping which calls syscall(__NR_clock_gettime, CLOCK_MONOTONIC, ts) which eventually calls ktime_get. As a result the returned ping travel time was zero. CC: stable@kernel.org Signed-off-by: Hector Palacios <hector.palacios@digi.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-11-17genirq: Don't allow per cpu interrupts to be suspendedMarc Zyngier
The power management functions related to interrupts do not know (yet) about per-cpu interrupts and end up calling the wrong low-level methods to enable/disable interrupts. This leads to all kind of interesting issues (action taken on one CPU only, updating a refcount which is not used otherwise...). The workaround for the time being is simply to flag these interrupts with IRQF_NO_SUSPEND. At least on ARM, these interrupts are actually dealt with at the architecture level. Reported-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1321446459-31409-1-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-11-17tracing: Add entries in buffer and total entries to default output headerSteven Rostedt
Knowing the number of event entries in the ring buffer compared to the total number that were written is useful information. The latency format gives this information and there's no reason that the default format does not. This information is now added to the default header, along with the number of online CPUs: # tracer: nop # # entries-in-buffer/entries-written: 159836/64690869 #P:4 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | <idle>-0 [000] ...2 49.442971: local_touch_nmi <-cpu_idle <idle>-0 [000] d..2 49.442973: enter_idle <-cpu_idle <idle>-0 [000] d..2 49.442974: atomic_notifier_call_chain <-enter_idle <idle>-0 [000] d..2 49.442976: __atomic_notifier_call_chain <-atomic_notifier The above shows that the trace contains 159836 entries, but 64690869 were written. One could figure out that there were 64531033 entries that were dropped. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-11-17tracing: Add irq, preempt-count and need resched info to default trace outputSteven Rostedt
People keep asking how to get the preempt count, irq, and need resched info and we keep telling them to enable the latency format. Some developers think that traces without this info is completely useless, and for a lot of tasks it is useless. The first option was to enable the latency trace as the default format, but the header for the latency format is pretty useless for most tracers and it also does the timestamp in straight microseconds from the time the trace started. This is sometimes more difficult to read as the default trace is seconds from the start of boot up. Latency format: # tracer: nop # # nop latency trace v1.1.5 on 3.2.0-rc1-test+ # -------------------------------------------------------------------- # latency: 0 us, #159771/64234230, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4) # ----------------- # | task: -0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # # _------=> CPU# # / _-----=> irqs-off # | / _----=> need-resched # || / _---=> hardirq/softirq # ||| / _--=> preempt-depth # |||| / delay # cmd pid ||||| time | caller # \ / ||||| \ | / migratio-6 0...2 41778231us+: rcu_note_context_switch <-__schedule migratio-6 0...2 41778233us : trace_rcu_utilization <-rcu_note_context_switch migratio-6 0...2 41778235us+: rcu_sched_qs <-rcu_note_context_switch migratio-6 0d..2 41778236us+: rcu_preempt_qs <-rcu_note_context_switch migratio-6 0...2 41778238us : trace_rcu_utilization <-rcu_note_context_switch migratio-6 0...2 41778239us+: debug_lockdep_rcu_enabled <-__schedule default format: # tracer: nop # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | migration/0-6 [000] 50.025810: rcu_note_context_switch <-__schedule migration/0-6 [000] 50.025812: trace_rcu_utilization <-rcu_note_context_switch migration/0-6 [000] 50.025813: rcu_sched_qs <-rcu_note_context_switch migration/0-6 [000] 50.025815: rcu_preempt_qs <-rcu_note_context_switch migration/0-6 [000] 50.025817: trace_rcu_utilization <-rcu_note_context_switch migration/0-6 [000] 50.025818: debug_lockdep_rcu_enabled <-__schedule migration/0-6 [000] 50.025820: debug_lockdep_rcu_enabled <-__schedule The latency format header has latency information that is pretty meaningless for most tracers. Although some of the header is useful, and we can add that later to the default format as well. What is really useful with the latency format is the irqs-off, need-resched hard/softirq context and the preempt count. This commit adds the option irq-info which is on by default that adds this information: # tracer: nop # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | <idle>-0 [000] d..2 49.309305: cpuidle_get_driver <-cpuidle_idle_call <idle>-0 [000] d..2 49.309307: mwait_idle <-cpu_idle <idle>-0 [000] d..2 49.309309: need_resched <-mwait_idle <idle>-0 [000] d..2 49.309310: test_ti_thread_flag <-need_resched <idle>-0 [000] d..2 49.309312: trace_power_start.constprop.13 <-mwait_idle <idle>-0 [000] d..2 49.309313: trace_cpu_idle <-mwait_idle <idle>-0 [000] d..2 49.309315: need_resched <-mwait_idle If a user wants the old format, they can disable the 'irq-info' option: # tracer: nop # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | <idle>-0 [000] 49.309305: cpuidle_get_driver <-cpuidle_idle_call <idle>-0 [000] 49.309307: mwait_idle <-cpu_idle <idle>-0 [000] 49.309309: need_resched <-mwait_idle <idle>-0 [000] 49.309310: test_ti_thread_flag <-need_resched <idle>-0 [000] 49.309312: trace_power_start.constprop.13 <-mwait_idle <idle>-0 [000] 49.309313: trace_cpu_idle <-mwait_idle <idle>-0 [000] 49.309315: need_resched <-mwait_idle Requested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-11-17Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Fix irqfixup, irqpoll regression
2011-11-17writeback: remove vm_dirties and task->dirtiesWu Fengguang
They are not used any more. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-11-17sched: Move all scheduler bits into kernel/sched/Peter Zijlstra
There's too many sched*.[ch] files in kernel/, give them their own directory. (No code changed, other than Makefile glue added.) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-17sched: Make separate sched*.c translation unitsPeter Zijlstra
Since once needs to do something at conferences and fixing compile warnings doesn't actually require much if any attention I decided to break up the sched.c #include "*.c" fest. This further modularizes the scheduler code. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-x0fcd3mnp8f9c99grcpewmhi@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-16sched: Fix comment for requeue_rt_entityRichard Weinberger
Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1321117677-3282-1-git-send-email-richard@nod.at Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-16sched: Don't call task_group() too many times in set_task_rq()Andrew Vagin
It improves perfomance, especially if autogroup is enabled. The size of set_task_rq() was 0x180 and now it is 0xa0. Signed-off-by: Andrew Vagin <avagin@openvz.org> Acked-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1321020240-3874331-1-git-send-email-avagin@openvz.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-16sched, trivial: Initialize root cgroup's sibling listGlauber Costa
Even though there are no siblings, the list should be initialized to not contain bogus values. Signed-off-by: Glauber Costa <glommer@parallels.com> Acked-by: Paul Menage <paul@paulmenage.org> Acked-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1320182360-20043-2-git-send-email-glommer@parallels.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-16sched: Use jump labels to reduce overhead when bandwidth control is inactivePaul Turner
Now that the linkage of jump-labels has been fixed they show a measurable improvement in overhead for the enabled-but-unused case. Workload is: 'taskset -c 0 perf stat --repeat 50 -e instructions,cycles,branches bash -c "for ((i=0;i<5;i++)); do $(dirname $0)/pipe-test 20000; done"' There's a speedup for all situations: instructions cycles branches ------------------------------------------------------------------------- Intel Westmere base 806611770 745895590 146765378 +jumplabel 803090165 (-0.44%) 713381840 (-4.36%) 144561130 AMD Barcelona base 824657415 740055589 148855354 +jumplabel 821056910 (-0.44%) 737558389 (-0.34%) 146635229 Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111108042736.560831357@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-16sched: Fix buglet in return_cfs_rq_runtime()Paul Turner
In return_cfs_rq_runtime() we want to return bandwidth when there are no remaining tasks, not "return" when this is the case. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111108042736.623812423@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-16sched: Avoid SMT siblings in select_idle_sibling() if possiblePeter Zijlstra
Avoid select_idle_sibling() from picking a sibling thread if there's an idle core that shares cache. This fixes SMT balancing in the increasingly common case where there's a shared cache core available to balance to. Tested-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1321350377.1421.55.camel@twins Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-15Merge branch 'core' of git://amd64.org/linux/rric into perf/coreIngo Molnar
2011-11-14lockdep: Always try to set ->class_cache in register_lock_class() ↵Yong Zhang
lockdep_init_map() Commit ["62016250 lockdep: Add improved subclass caching"] tries to improve performance (expecially to reduce the cost of rq->lock) when using lockdep, but it fails due to lockdep_init_map() in which ->class_cache is cleared. The typical caller is lock_set_subclass(), after that class will not be cached anymore. This patch tries to achive the goal of commit 62016250 by always setting ->class_cache in register_lock_class(). === Score comparison of benchmarks === for i in `seq 1 10`; do ./perf bench -f simple sched messaging; done before: min: 0.604, max: 0.660, avg: 0.622 after: min: 0.414, max: 0.473, avg: 0.427 for i in `seq 1 10`; do ./perf bench -f simple sched messaging -g 40; done before: min: 2.347, max: 2.421, avg: 2.391 after: min: 1.652, max: 1.699, avg: 1.671 Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111109080714.GC8124@zhy Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14printk, lockdep: Switch to tracked irq opsPeter Zijlstra
Switch to local_irq_ ops so that the irq state is properly tracked (raw_local_irq_* isn't tracked by lockdep, causing confusion). Possible now that commit dd4e5d3ac4a ("lockdep: Fix trace_[soft,hard]irqs_[on,off]() recursion") cured the reason we needed the raw_ ops. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14printk, lockdep: Remove superfluous preempt_disable()Peter Zijlstra
The raw_lock_irq_{save,restore}() already implies a non-preemptibility. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14printk, lockdep: Disable lock debugging on zap_locks()Peter Zijlstra
zap_locks() is used by printk() in a last ditch effort to get data out, clearly we cannot trust lock state after this so make it disable lock debugging. Also don't treat printk recursion through lockdep as a normal recursion bug but try hard to get the lockdep splat out. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-kqxwmo4xz37e1s8w0xopvr0q@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14events: Don't divide events if it has field periodAndrew Vagin
This patch solves the following problem: Now some samples may be lost due to throttling. The number of samples is restricted by sysctl_perf_event_sample_rate/HZ. A trace event is divided on some samples according to event's period. I don't sure, that we should generate more than one sample on each trace event. I think the better way to use SAMPLE_PERIOD. E.g.: I want to trace when a process sleeps. I created a process, which sleeps for 1ms and for 4ms. perf got 100 events in both cases. swapper 0 [000] 1141.371830: sched_stat_sleep: comm=foo pid=1801 delay=1386750 [ns] swapper 0 [000] 1141.369444: sched_stat_sleep: comm=foo pid=1801 delay=4499585 [ns] In the first case a kernel want to send 4499585 events and in the second case it wants to send 1386750 events. perf-reports shows that process sleeps in both places equal time. It's bug. With this patch kernel generates one event on each "sleep" and the time slice is saved in the field "period". Perf knows how handle it. Signed-off-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1320670457-2633428-3-git-send-email-avagin@openvz.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14perf: Carve out callchain functionalityBorislav Petkov
Split the callchain code from the perf events core into a new kernel/events/callchain.c file. This simplifies a bit the big core.c Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> [keep ctx recursion handling inline and use internal headers] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1318778104-17152-1-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14perf: Do not set task_ctx pointer in cpuctx if there are no events in the ↵Gleb Natapov
context Do not set task_ctx pointer during sched_in if there are no events associated with the context. Otherwise if during task execution total number of events in the system will become zero perf_event_context_sched_out() will not be called and cpuctx->task_ctx will be left with a stale value. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111023171033.GI17571@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14sched: Set the command name of the idle tasks in SMP kernelsCarsten Emde
In UP systems, the idle task is initialized using the init_task structure from which the command name is taken (currently "swapper"). In SMP systems, one idle task per CPU is forked by the worker thread from which the task structure is copied. The command name is, therefore, "kworker/0:0" or "kworker/0:1", if not updated. Since such update was lacking, all idle tasks in SMP systems were incorrectly named. This longtime bug was not discovered immediately, because there is no /proc/0 entry - the bug only becomes apparent when tracing is enabled. This patch sets the command name of the idle tasks in SMP systems to the name that is used in the INIT_TASK structure suffixed by a slash and the number of the CPU. Signed-off-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111026211708.768925506@osadl.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14sched, rt: Provide means of disabling cross-cpu bandwidth sharingPeter Zijlstra
Normally the RT bandwidth scheme will share bandwidth across the entire root_domain. However sometimes its convenient to disable this sharing for debug purposes. Provide a simple feature switch to this end. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14sched: Document wait_for_completion_*() return valuesJ. Bruce Fields
The return-value convention for these functions varies depending on whether they're interruptible or can timeout. It can be a little confusing--document it. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111006192246.GB28026@fieldses.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14sched_fair: Fix a typo in the comment describing update_sd_lb_statsHui Kang
Signed-off-by: Hui Kang <hkang.sunysb@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1318388459-4427-1-git-send-email-hkang.sunysb@gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-14sched: Add a comment to effective_load() since it's a painPeter Zijlstra
Every time I have to stare at this function I need to completely reverse engineer its workings, about time I write a comment explaining the thing. Collected bits and pieces from previous changelogs, mostly: 4be9daaa1b33701f011f4117f22dc1e45a3e6e34 83378269a5fad98f562ebc0f09c349575e6cbfe1 Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1318518057.27731.2.camel@twins Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-11-13Merge branch 'master' into for-nextJiri Kosina
Sync with Linus tree to have 157550ff ("mtd: add GPMI-NAND driver in the config and Makefile") as I have patch depending on that one.
2011-11-11Merge branch 'tip/perf/core' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core
2011-11-11Merge branch 'formingo/3.2/tip/timers/core' of ↵Ingo Molnar
git://git.linaro.org/people/jstultz/linux into timers/core Conflicts: kernel/time/timekeeping.c
2011-11-10clocksource: Avoid selecting mult values that might overflow when adjustedJohn Stultz
For some frequencies, the clocks_calc_mult_shift() function will unfortunately select mult values very close to 0xffffffff. This has the potential to overflow when NTP adjusts the clock, adding to the mult value. This patch adds a clocksource.maxadj value, which provides an approximation of an 11% adjustment(NTP limits adjustments to 500ppm and the tick adjustment is limited to 10%), which could be made to the clocksource.mult value. This is then used to both check that the current mult value won't overflow/underflow, as well as warning us if the timekeeping_adjust() code pushes over that 11% boundary. v2: Fix max_adjustment calculation, and improve WARN_ONCE messages. v3: Don't warn before maxadj has actually been set CC: Yong Zhang <yong.zhang0@gmail.com> CC: David Daney <ddaney.cavm@gmail.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Chen Jie <chenj@lemote.com> CC: zhangfx <zhangfx@lemote.com> CC: stable@kernel.org Reported-by: Chen Jie <chenj@lemote.com> Reported-by: zhangfx <zhangfx@lemote.com> Tested-by: Yong Zhang <yong.zhang0@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2011-11-08Merge branch 'perf/core' into oprofile/masterRobert Richter
Merge reason: Resolve conflicts with Don's NMI rework: commit 9c48f1c629ecfa114850c03f875c6691003214de Author: Don Zickus <dzickus@redhat.com> Date: Fri Sep 30 15:06:21 2011 -0400 x86, nmi: Wire up NMI handlers to new routines Conflicts: arch/x86/oprofile/nmi_timer_int.c Signed-off-by: Robert Richter <robert.richter@amd.com>
2011-11-07PM / QoS: Set cpu_dma_pm_qos->nameDominik Brodowski
Since commit 4a31a334, the name of this misc device is not initialized, which leads to a funny device named /dev/(null) being created and /proc/misc containing an entry with just a number but no name. The latter leads to complaints by cryptsetup, which caused me to investigate this matter. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-11-07tracing/latency: Fix header output for latency tracersJiri Olsa
In case the the graph tracer (CONFIG_FUNCTION_GRAPH_TRACER) or even the function tracer (CONFIG_FUNCTION_TRACER) are not set, the latency tracers do not display proper latency header. The involved/fixed latency tracers are: wakeup_rt wakeup preemptirqsoff preemptoff irqsoff The patch adds proper handling of tracer configuration options for latency tracers, and displaying correct header info accordingly. * The current output (for wakeup tracer) with both graph and function tracers disabled is: # tracer: wakeup # <idle>-0 0d.h5 1us+: 0:120:R + [000] 7: 0:R watchdog/0 <idle>-0 0d.h5 3us+: ttwu_do_activate.clone.1 <-try_to_wake_up ... * The fixed output is: # tracer: wakeup # # wakeup latency trace v1.1.5 on 3.1.0-tip+ # -------------------------------------------------------------------- # latency: 55 us, #4/4, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2) # ----------------- # | task: migration/0-6 (uid:0 nice:0 policy:1 rt_prio:99) # ----------------- # # _------=> CPU# # / _-----=> irqs-off # | / _----=> need-resched # || / _---=> hardirq/softirq # ||| / _--=> preempt-depth # |||| / delay # cmd pid ||||| time | caller # \ / ||||| \ | / cat-1129 0d..4 1us : 1129:120:R + [000] 6: 0:R migration/0 cat-1129 0d..4 2us+: ttwu_do_activate.clone.1 <-try_to_wake_up * The current output (for wakeup tracer) with only function tracer enabled is: # tracer: wakeup # cat-1140 0d..4 1us+: 1140:120:R + [000] 6: 0:R migration/0 cat-1140 0d..4 2us : ttwu_do_activate.clone.1 <-try_to_wake_up * The fixed output is: # tracer: wakeup # # wakeup latency trace v1.1.5 on 3.1.0-tip+ # -------------------------------------------------------------------- # latency: 207 us, #109/109, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2) # ----------------- # | task: watchdog/1-12 (uid:0 nice:0 policy:1 rt_prio:99) # ----------------- # # _------=> CPU# # / _-----=> irqs-off # | / _----=> need-resched # || / _---=> hardirq/softirq # ||| / _--=> preempt-depth # |||| / delay # cmd pid ||||| time | caller # \ / ||||| \ | / <idle>-0 1d.h5 1us+: 0:120:R + [001] 12: 0:R watchdog/1 <idle>-0 1d.h5 3us : ttwu_do_activate.clone.1 <-try_to_wake_up Link: http://lkml.kernel.org/r/20111107150849.GE1807@m.brq.redhat.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-11-07ftrace: Fix hash record accounting bugSteven Rostedt
If the set_ftrace_filter is cleared by writing just whitespace to it, then the filter hash refcounts will be decremented but not updated. This causes two bugs: 1) No functions will be enabled for tracing when they all should be 2) If the users clears the set_ftrace_filter twice, it will crash ftrace: ------------[ cut here ]------------ WARNING: at /home/rostedt/work/git/linux-trace.git/kernel/trace/ftrace.c:1384 __ftrace_hash_rec_update.part.27+0x157/0x1a7() Modules linked in: Pid: 2330, comm: bash Not tainted 3.1.0-test+ #32 Call Trace: [<ffffffff81051828>] warn_slowpath_common+0x83/0x9b [<ffffffff8105185a>] warn_slowpath_null+0x1a/0x1c [<ffffffff810ba362>] __ftrace_hash_rec_update.part.27+0x157/0x1a7 [<ffffffff810ba6e8>] ? ftrace_regex_release+0xa7/0x10f [<ffffffff8111bdfe>] ? kfree+0xe5/0x115 [<ffffffff810ba51e>] ftrace_hash_move+0x2e/0x151 [<ffffffff810ba6fb>] ftrace_regex_release+0xba/0x10f [<ffffffff8112e49a>] fput+0xfd/0x1c2 [<ffffffff8112b54c>] filp_close+0x6d/0x78 [<ffffffff8113a92d>] sys_dup3+0x197/0x1c1 [<ffffffff8113a9a6>] sys_dup2+0x4f/0x54 [<ffffffff8150cac2>] system_call_fastpath+0x16/0x1b ---[ end trace 77a3a7ee73794a02 ]--- Link: http://lkml.kernel.org/r/20111101141420.GA4918@debian Reported-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-11-07jump_label: jump_label_inc may return before the code is patchedGleb Natapov
If cpu A calls jump_label_inc() just after atomic_add_return() is called by cpu B, atomic_inc_not_zero() will return value greater then zero and jump_label_inc() will return to a caller before jump_label_update() finishes its job on cpu B. Link: http://lkml.kernel.org/r/20111018175551.GH17571@redhat.com Cc: stable@vger.kernel.org Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Jason Baron <jbaron@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-11-07ftrace: Remove force undef config value left for testingSteven Rostedt
A forced undef of a config value was used for testing and was accidently left in during the final commit. This causes x86 to run slower than needed while running function tracing as well as causes the function graph selftest to fail when DYNMAIC_FTRACE is not set. This is because the code in MCOUNT expects the ftrace code to be processed with the config value set that happened to be forced not set. The forced config option was left in by: commit 6331c28c962561aee59e5a493b7556a4bb585957 ftrace: Fix dynamic selftest failure on some archs Link: http://lkml.kernel.org/r/20111102150255.GA6973@debian Cc: stable@vger.kernel.org Reported-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-11-07lockdep: Show subclass in pretty print of lockdep outputSteven Rostedt
The pretty print of the lockdep debug splat uses just the lock name to show how the locking scenario happens. But when it comes to nesting locks, the output becomes confusing which takes away the point of the pretty printing of the lock scenario. Without displaying the subclass info, we get the following output: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(slock-AF_INET); lock(slock-AF_INET); lock(slock-AF_INET); lock(slock-AF_INET); *** DEADLOCK *** The above looks more of a A->A locking bug than a A->B B->A. By adding the subclass to the output, we can see what really happened: other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(slock-AF_INET); lock(slock-AF_INET/1); lock(slock-AF_INET); lock(slock-AF_INET/1); *** DEADLOCK *** This bug was discovered while tracking down a real bug caught by lockdep. Link: http://lkml.kernel.org/r/20111025202049.GB25043@hostway.ca Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Reported-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Simon Kirby <sim@hostway.ca> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-11-06Merge branch 'upstream/jump-label-noearly' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen * 'upstream/jump-label-noearly' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: jump-label: initialize jump-label subsystem much earlier x86/jump_label: add arch_jump_label_transform_static() s390/jump-label: add arch_jump_label_transform_static() jump_label: add arch_jump_label_transform_static() to optimise non-live code updates sparc/jump_label: drop arch_jump_label_text_poke_early() x86/jump_label: drop arch_jump_label_text_poke_early() jump_label: if a key has already been initialized, don't nop it out stop_machine: make stop_machine safe and efficient to call early jump_label: use proper atomic_t initializer Conflicts: - arch/x86/kernel/jump_label.c Added __init_or_module to arch_jump_label_text_poke_early vs removal of that function entirely - kernel/stop_machine.c same patch ("stop_machine: make stop_machine safe and efficient to call early") merged twice, with whitespace fix in one version
2011-11-06Merge branch 'modsplit-Oct31_2011' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h