summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2012-09-17Merge branch 'pm-timers'Rafael J. Wysocki
* pm-timers: PM: Do not use the syscore flag for runtime PM sh: MTU2: Basic runtime PM support sh: CMT: Basic runtime PM support sh: TMU: Basic runtime PM support PM / Domains: Do not measure start time for "irq safe" devices PM / Domains: Move syscore flag from subsys data to struct device PM / Domains: Rename the always_on device flag to syscore PM / Runtime: Allow helpers to be called by early platform drivers PM: Reorganize device PM initialization sh: MTU2: Introduce clock events suspend/resume routines sh: CMT: Introduce clocksource/clock events suspend/resume routines sh: TMU: Introduce clocksource/clock events suspend/resume routines timekeeping: Add suspend and resume of clock event devices PM / Domains: Add power off/on function for system core suspend stage PM / Domains: Introduce simplified power on routine for system resume
2012-09-17arm64: Add support for /proc/sys/debug/exception-traceCatalin Marinas
This patch allows setting of the show_unhandled_signals variable via /proc/sys/debug/exception-trace. The default value is currently 1 showing unhandled user faults (undefined instructions, data aborts) and invalid signal stack frames. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Tony Lindgren <tony@atomide.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Nicolas Pitre <nico@linaro.org> Acked-by: Olof Johansson <olof@lixom.net> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
2012-09-16Revert "sched: Improve scalability via 'CPU buddies', which withstand random ↵Linus Torvalds
perturbations" This reverts commit 970e178985cadbca660feb02f4d2ee3a09f7fdda. Nikolay Ulyanitsky reported thatthe 3.6-rc5 kernel has a 15-20% performance drop on PostgreSQL 9.2 on his machine (running "pgbench"). Borislav Petkov was able to reproduce this, and bisected it to this commit 970e178985ca ("sched: Improve scalability via 'CPU buddies' ...") apparently because the new single-idle-buddy model simply doesn't find idle CPU's to reschedule on aggressively enough. Mike Galbraith suspects that it is likely due to the user-mode spinlocks in PostgreSQL not reacting well to preemption, but we don't really know the details - I'll just revert the commit for now. There are hopefully other approaches to improve scheduler scalability without it causing these kinds of downsides. Reported-by: Nikolay Ulyanitsky <lystor@gmail.com> Bisected-by: Borislav Petkov <bp@alien8.de> Acked-by: Mike Galbraith <efault@gmx.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/netfilter/nfnetlink_log.c net/netfilter/xt_LOG.c Rather easy conflict resolution, the 'net' tree had bug fixes to make sure we checked if a socket is a time-wait one or not and elide the logging code if so. Whereas on the 'net-next' side we are calculating the UID and GID from the creds using different interfaces due to the user namespace changes from Eric Biederman. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-15uprobes: Introduce arch_uprobe_enable/disable_step()Sebastian Andrzej Siewior
As Oleg pointed out in [0] uprobe should not use the ptrace interface for enabling/disabling single stepping. [0] http://lkml.kernel.org/r/20120730141638.GA5306@redhat.com Add the new "__weak arch" helpers which simply call user_*_single_step() as a preparation. This is only needed to not break the powerpc port, we will fold this logic into arch_uprobe_pre/post_xol() hooks later. We should also change handle_singlestep(), _disable_step(&uprobe->arch) should be called before put_uprobe(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15uprobes: Teach find_active_uprobe() to clear MMF_HAS_UPROBESOleg Nesterov
The wrong MMF_HAS_UPROBES doesn't really hurt, just it triggers the "slow" and unnecessary handle_swbp() path if the task hits the non-uprobe breakpoint. So this patch changes find_active_uprobe() to check every valid vma and clear MMF_HAS_UPROBES if no uprobes were found. This is adds the slow O(n) path, but it is only called in unlikely case when the task hits the normal breakpoint first time after uprobe_unregister(). Note the "not strictly accurate" comment in mmf_recalc_uprobes(). We can fix this, we only need to teach vma_has_uprobes() to return a bit more more info, but I am not sure this worth the trouble. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15uprobes: Introduce MMF_RECALC_UPROBESOleg Nesterov
Add the new MMF_RECALC_UPROBES flag, it means that MMF_HAS_UPROBES can be false positive after remove_breakpoint() or uprobe_munmap(). It is also set by uprobe_dup_mmap(), this is not optimal but simple. We could add the new hook, uprobe_dup_vma(), to set MMF_HAS_UPROBES only if the new mm actually has uprobes, but I don't think this makes sense. The next patch will use this flag to clear MMF_HAS_UPROBES. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15uprobes: uprobes_treelock should not disable irqsOleg Nesterov
Nobody plays with uprobes_tree/uprobes_treelock in interrupt context, no need to disable irqs. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-15uprobes: Don't put NULL pointer in uprobe_register()Sebastian Andrzej Siewior
alloc_uprobe() might return a NULL pointer, put_uprobe() can't deal with this. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
2012-09-14Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Smaller fixlets" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix kernel-doc warnings in kernel/sched/fair.c sched: Unthrottle rt runqueues in __disable_runtime() sched: Add missing call to calc_load_exit_idle() sched: Fix load avg vs cpu-hotplug
2012-09-14Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "This tree includes various fixes" Ingo really needs to improve on the whole "explain git pull" part. "Various fixes" indeed. * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/hwpb: Invoke __perf_event_disable() if interrupts are already disabled perf/x86: Enable Intel Cedarview Atom suppport perf_event: Switch to internal refcount, fix race with close() oprofile, s390: Fix uninitialized memory access when writing to oprofilefs perf/x86: Fix microcode revision check for SNB-PEBS
2012-09-14cgroup: mark subsystems with broken hierarchy support and whine if cgroups ↵Tejun Heo
are nested for them Currently, cgroup hierarchy support is a mess. cpu related subsystems behave correctly - configuration, accounting and control on a parent properly cover its children. blkio and freezer completely ignore hierarchy and treat all cgroups as if they're directly under the root cgroup. Others show yet different behaviors. These differing interpretations of cgroup hierarchy make using cgroup confusing and it impossible to co-mount controllers into the same hierarchy and obtain sane behavior. Eventually, we want full hierarchy support from all subsystems and probably a unified hierarchy. Users using separate hierarchies expecting completely different behaviors depending on the mounted subsystem is deterimental to making any progress on this front. This patch adds cgroup_subsys.broken_hierarchy and sets it to %true for controllers which are lacking in hierarchy support. The goal of this patch is two-fold. * Move users away from using hierarchy on currently non-hierarchical subsystems, so that implementing proper hierarchy support on those doesn't surprise them. * Keep track of which controllers are broken how and nudge the subsystems to implement proper hierarchy support. For now, start with a single warning message. We can whine louder later on. v2: Fixed a typo spotted by Michal. Warning message updated. v3: Updated memcg part so that it doesn't generate warning in the cases where .use_hierarchy=false doesn't make the behavior different from root.use_hierarchy=true. Fixed a typo spotted by Glauber. v4: Check ->broken_hierarchy after cgroup creation is complete so that ->create() can affect the result per Michal. Dropped unnecessary memcg root handling per Michal. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Thomas Graf <tgraf@suug.ch> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2012-09-14cgroup: Assign subsystem IDs during compile timeDaniel Wagner
WARNING: With this change it is impossible to load external built controllers anymore. In case where CONFIG_NETPRIO_CGROUP=m and CONFIG_NET_CLS_CGROUP=m is set, corresponding subsys_id should also be a constant. Up to now, net_prio_subsys_id and net_cls_subsys_id would be of the type int and the value would be assigned during runtime. By switching the macro definition IS_SUBSYS_ENABLED from IS_BUILTIN to IS_ENABLED, all *_subsys_id will have constant value. That means we need to remove all the code which assumes a value can be assigned to net_prio_subsys_id and net_cls_subsys_id. A close look is necessary on the RCU part which was introduces by following patch: commit f845172531fb7410c7fb7780b1a6e51ee6df7d52 Author: Herbert Xu <herbert@gondor.apana.org.au> Mon May 24 09:12:34 2010 Committer: David S. Miller <davem@davemloft.net> Mon May 24 09:12:34 2010 cls_cgroup: Store classid in struct sock Tis code was added to init_cgroup_cls() /* We can't use rcu_assign_pointer because this is an int. */ smp_wmb(); net_cls_subsys_id = net_cls_subsys.subsys_id; respectively to exit_cgroup_cls() net_cls_subsys_id = -1; synchronize_rcu(); and in module version of task_cls_classid() rcu_read_lock(); id = rcu_dereference(net_cls_subsys_id); if (id >= 0) classid = container_of(task_subsys_state(p, id), struct cgroup_cls_state, css)->classid; rcu_read_unlock(); Without an explicit explaination why the RCU part is needed. (The rcu_deference was fixed by exchanging it to rcu_derefence_index_check() in a later commit, but that is a minor detail.) So here is my pondering why it was introduced and why it safe to remove it now. Note that this code was copied over to net_prio the reasoning holds for that subsystem too. The idea behind the RCU use for net_cls_subsys_id is to make sure we get a valid pointer back from task_subsys_state(). task_subsys_state() is just blindly accessing the subsys array and returning the pointer. Obviously, passing in -1 as id into task_subsys_state() returns an invalid value (out of lower bound). So this code makes sure that only after module is loaded and the subsystem registered, the id is assigned. Before unregistering the module all old readers must have left the critical section. This is done by assigning -1 to the id and issuing a synchronized_rcu(). Any new readers wont call task_subsys_state() anymore and therefore it is safe to unregister the subsystem. The new code relies on the same trick, but it looks at the subsys pointer return by task_subsys_state() (remember the id is constant and therefore we allways have a valid index into the subsys array). No precautions need to be taken during module loading module. Eventually, all CPUs will get a valid pointer back from task_subsys_state() because rebind_subsystem() which is called after the module init() function will assigned subsys[net_cls_subsys_id] the newly loaded module subsystem pointer. When the subsystem is about to be removed, rebind_subsystem() will called before the module exit() function. In this case, rebind_subsys() will assign subsys[net_cls_subsys_id] a NULL pointer and then it calls synchronize_rcu(). All old readers have left by then the critical section. Any new reader wont access the subsystem anymore. At this point we are safe to unregister the subsystem. No synchronize_rcu() call is needed. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: netdev@vger.kernel.org Cc: cgroups@vger.kernel.org
2012-09-14cgroup: Do not depend on a given order when populating the subsys arrayDaniel Wagner
The *_subsys_id will be used as index to access the subsys. Therefore we need to care we populate the subsystem at the correct position by using designated initialization. With this change we are able to interleave builtin and modules in the subsys array. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: netdev@vger.kernel.org Cc: cgroups@vger.kernel.org
2012-09-14cgroup: Wrap subsystem selection macroDaniel Wagner
Before we are able to define all subsystem ids at compile time we need a more fine grained control what gets defined when we include cgroup_subsys.h. For example we define the enums for the subsystems or to declare for struct cgroup_subsys (builtin subsystem) by including cgroup_subsys.h and defining SUBSYS accordingly. Currently, the decision if a subsys is used is defined inside the header by testing if CONFIG_*=y is true. By moving this test outside of cgroup_subsys.h we are able to control it on the include level. This is done by introducing IS_SUBSYS_ENABLED which then is defined according the task, e.g. is CONFIG_*=y or CONFIG_*=m. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: netdev@vger.kernel.org Cc: cgroups@vger.kernel.org
2012-09-14cgroup: Remove CGROUP_BUILTIN_SUBSYS_COUNTDaniel Wagner
CGROUP_BUILTIN_SUBSYS_COUNT is used as start index or stop index when looping over the subsys array looking either at the builtin or the module subsystems. Since all the builtin subsystems have an id which is lower then CGROUP_BUILTIN_SUBSYS_COUNT we know that any module will have an id larger than CGROUP_BUILTIN_SUBSYS_COUNT. In short the ids are sorted. We are about to change id assignment to happen only at compile time later in this series. That means we can't rely on the above trick since all ids will always be defined at compile time. Furthermore, ordering the builtin subsystems and the module subsystems is not really necessary. So we need a different way to know which subsystem is a builtin or a module one. We can use the subsys[]->module pointer for this. Any place where we need to know if a subsys is module we just check for the pointer. If it is NULL then the subsystem is a builtin one. With this we are able to drop the CGROUP_BUILTIN_SUBSYS_COUNT enum. Though we need to introduce a temporary placeholder so that we don't get a compilation error when only CONFIG_CGROUP is selected and no single controller. An empty enum definition is not valid. Later in this series we are able to remove the placeholder again. And with this change we get a fix for this: kernel/cgroup.c: In function ‘cgroup_load_subsys’: kernel/cgroup.c:4326:38: warning: array subscript is below array bounds [-Warray-bounds] when CONFIG_CGROUP=y and no built in controller was enabled. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: netdev@vger.kernel.org Cc: cgroups@vger.kernel.org
2012-09-14Merge branch 'tip/perf/core' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core Pull tracing updates from Steve Rostedt. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13kprobes/x86: Fix to support jprobes on ftrace-based kprobeMasami Hiramatsu
Fix kprobes/x86 to support jprobes on ftrace-based kprobes. Because of -mfentry support of ftrace, ftrace is now put on the beginning of function where jprobes are put. Originally ftrace-based kprobes doesn't support jprobe because it will change regs->ip and ftrace doesn't support changing IP and ftrace itself doesn't conflict jprobe. However, ftrace -mfentry support moves mcount call on the top of functions where jprobes are put. This means that jprobe always conflicts with ftrace-based kprobe and fails. This patch allows ftrace-based kprobes to support jprobes by allowing to modify regs->ip and kprobes breakpoint handler also allows to skip singlestepping because there is a ftrace call (not an original instruction). Link: http://lkml.kernel.org/r/20120905143125.10329.90836.stgit@localhost.localdomain Reported-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-09-13trace: Stop compiling in trace_clock unconditionallyJosh Triplett
Commit 56449f437 "tracing: make the trace clocks available generally", in April 2009, made trace_clock available unconditionally, since CONFIG_X86_DS used it too. Commit faa4602e47 "x86, perf, bts, mm: Delete the never used BTS-ptrace code", in March 2010, removed CONFIG_X86_DS, and now only CONFIG_RING_BUFFER (split out from CONFIG_TRACING for general use) has a dependency on trace_clock. So, only compile in trace_clock with CONFIG_RING_BUFFER or CONFIG_TRACING enabled. Link: http://lkml.kernel.org/r/20120903024513.GA19583@leaf Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-09-13tracing: Skip printing "OK" if failed to disable eventYuanhan Liu
No acutal case found. But logically, we should skip "OK" in case any error met. Link: http://lkml.kernel.org/r/1346051625-25231-1-git-send-email-yuanhan.liu@linux.intel.com Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-09-13locking: Adjust spin lock inlining Kconfig optionsJan Beulich
Break out the DEBUG_SPINLOCK dependency (requires moving up UNINLINE_SPIN_UNLOCK, as this was the only one in that block not depending on that option). Avoid putting values not selected into the resulting .config - they are not useful for anything, make the output less legible, and just consume space: Use "depends on" rather than directly setting the default from the combined dependency values. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/504DF2AC020000780009A2DF@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13time: Fix timeekeping_get_ns overflow on 32bit systemsJohn Stultz
Daniel Lezcano reported seeing multi-second stalls from keyboard input on his T61 laptop when NOHZ and CPU_IDLE were enabled on a 32bit kernel. He bisected the problem down to commit 1e75fa8be9fb6 ("time: Condense timekeeper.xtime into xtime_sec"). After reproducing this issue, I narrowed the problem down to the fact that timekeeping_get_ns() returns a 64bit nsec value that hasn't been accumulated. In some cases this value was being then stored in timespec.tv_nsec (which is a long). On 32bit systems, with idle times larger then 4 seconds (or less, depending on the value of xtime_nsec), the returned nsec value would overflow 32bits. This limited kept time from increasing, causing timers to not expire. The fix is to make sure we don't directly store the result of timekeeping_get_ns() into a tv_nsec field, instead using a 64bit nsec value which can then be added into the timespec via timespec_add_ns(). Reported-and-bisected-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Prarit Bhargava <prarit@redhat.com> Cc: Richard Cochran <richardcochran@gmail.com> Link: http://lkml.kernel.org/r/1347405963-35715-1-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13Merge branch 'core/rcu' into perf/coreIngo Molnar
Steve Rostedt asked for the merge of a single commit, into both the RCU and the perf/tracing tree: | Josh made a change to the tracing code that affects both the | work Paul McKenney and I are currently doing. At the last | Kernel Summit back in August, Linus said when such a case | exists, it is best to make a separate branch based off of his | tree and place the change there. This way, the repositories | that need to share the change can both pull them in and the | SHA1 will match for both. Whichever branch is pulled in first | by Linus will also pull in the necessary change for the other | branch as well. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13lockdep: Check if nested lock is actually heldMaarten Lankhorst
It is considered good form to lock the lock you claim to be nested in. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> [ removed nest_lock arg to print_lock_nested_lock_not_held in favour of hlock->nest_lock, also renamed the lock arg to hlock since its a held_lock type ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/5051A9E7.5040501@canonical.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13sched: cpu_power: enable ARCH_POWERVincent Guittot
Heteregeneous ARM platform uses arch_scale_freq_power function to reflect the relative capacity of each core Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1341826026-6504-6-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13sched/nohz: Clean up select_nohz_load_balancer()Alex Shi
There is no load_balancer to be selected now. It just sets the state of the nohz tick to stop. So rename the function, pass the 'cpu' as a parameter and then remove the useless call from tick_nohz_restart_sched_tick(). [ s/set_nohz_tick_stopped/nohz_balance_enter_idle/g s/clear_nohz_tick_stopped/nohz_balance_exit_idle/g ] Signed-off-by: Alex Shi <alex.shi@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1347261059-24747-1-git-send-email-alex.shi@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13sched: Fix load avg vs. cpu-hotplugPeter Zijlstra
Commit f319da0c68 ("sched: Fix load avg vs cpu-hotplug") was an incomplete fix: In particular, the problem is that at the point it calls calc_load_migrate() nr_running := 1 (the stopper thread), so move the call to CPU_DEAD where we're sure that nr_running := 0. Also note that we can call calc_load_migrate() without serialization, we know the state of rq is stable since its cpu is dead, and we modify the global state using appropriate atomic ops. Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1346882630.2600.59.camel@twins Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13sched: Remove __ARCH_WANT_INTERRUPTS_ON_CTXSWPeter Zijlstra
Now that the last architecture to use this has stopped doing so (ARM, thanks Catalin!) we can remove this complexity from the scheduler core. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Link: http://lkml.kernel.org/n/tip-g9p2a1w81xxbrze25v9zpzbf@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13sched: Fix nohz_idle_balance()Vincent Guittot
On tickless systems, one CPU runs load balance for all idle CPUs. The cpu_load of this CPU is updated before starting the load balance of each other idle CPUs. We should instead update the cpu_load of the balance_cpu. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1347509486-8688-1-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13task_work: Simplify the usage in ptrace_notify() and get_signal_to_deliver()Oleg Nesterov
ptrace_notify() and get_signal_to_deliver() do unnecessary things before task_work_run(): 1. smp_mb__after_clear_bit() is not needed, test_and_clear_bit() implies mb(). 2. And we do not need the barrier at all, in this case we only care about the "synchronous" works added by the task itself. 3. No need to clear TIF_NOTIFY_RESUME, and we should not assume task_works is the only user of this flag. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20120826191217.GA4238@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13task_work: task_work_add() should not succeed after exit_task_work()Oleg Nesterov
ed3e694d "move exit_task_work() past exit_files() et.al" destroyed the add/exit synchronization we had, the caller itself should ensure task_work_add() can't race with the exiting task. However, this is not convenient/simple, and the only user which tries to do this is buggy (see the next patch). Unless the task is current, there is simply no way to do this in general. Change exit_task_work()->task_work_run() to use the dummy "work_exited" entry to let task_work_add() know it should fail. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20120826191211.GA4228@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-13task_work: Make task_work_add() locklessOleg Nesterov
Change task_work's to use llist-like code to avoid pi_lock in task_work_add(), this makes it useable under rq->lock. task_work_cancel() and task_work_run() still use pi_lock to synchronize with each other. (This is in preparation for a deadlock fix.) Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20120826191209.GA4221@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-12audit: export audit_log_task_infoPeter Moody
At the suggestion of eparis@redhat.com, move this chunk of task logging from audit_log_exit to audit_log_task_info and export this function so it's usuable elsewhere in the kernel. This patch is against git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity#next-ima-appraisal Changelog v2: - add empty audit_log_task_info if CONFIG_AUDITSYSCALL isn't set. Changelog v1: - Initial post. Signed-off-by: Peter Moody <pmoody@google.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2012-09-12Merge branch 'for-3.6-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: "It's later than I'd like but well the timing just didn't work out this time. There are three bug fixes. One from before 3.6-rc1 and two from the new CPU hotplug code. Kudos to Lai for discovering all of them and providing fixes. * Atomicity bug when clearing a flag and setting another. The two operation should have been atomic but wasn't. This bug has existed for a long time but is unlikely to have actually happened. Fix is safe. Marked for -stable. * If CPU hotplug cycles happen back-to-back before workers finish the previous cycle, the states could get out of sync and it could get stuck. Fixed by waiting for workers to complete before finishing hotplug cycle. * While CPU hotplug is in progress, idle workers could be depleted which can then lead to deadlock. I think both happening together is highly unlikely but still better to fix it and the fix isn't too scary. There's another workqueue related regression which reported a few days ago: https://bugzilla.kernel.org/show_bug.cgi?id=47301 It's a bit of head scratcher but there is a semi-reliable reproduce case, so I'm hoping to resolve it soonish." * 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix possible idle worker depletion across CPU hotplug workqueue: restore POOL_MANAGING_WORKERS workqueue: fix possible deadlock in idle worker rebinding workqueue: move WORKER_REBIND clearing in rebind_workers() to the end of the function workqueue: UNBOUND -> REBIND morphing in rebind_workers() should be atomic
2012-09-10netlink: Rename pid to portid to avoid confusionEric W. Biederman
It is a frequent mistake to confuse the netlink port identifier with a process identifier. Try to reduce this confusion by renaming fields that hold port identifiers portid instead of pid. I have carefully avoided changing the structures exported to userspace to avoid changing the userspace API. I have successfully built an allyesconfig kernel with this change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-10properly __init-annotate pm_sysrq_init()Jan Beulich
This is used only as argument to subsys_initcall(). Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-09-10workqueue: fix possible idle worker depletion across CPU hotplugLai Jiangshan
To simplify both normal and CPU hotplug paths, worker management is prevented while CPU hoplug is in progress. This is achieved by CPU hotplug holding the same exclusion mechanism used by workers to ensure there's only one manager per pool. If someone else seems to be performing the manager role, workers proceed to execute work items. CPU hotplug using the same mechanism can lead to idle worker depletion because all workers could proceed to execute work items while CPU hotplug is in progress and CPU hotplug itself wouldn't actually perform the worker management duty - it doesn't guarantee that there's an idle worker left when it releases management. This idle worker depletion, under extreme circumstances, can break forward-progress guarantee and thus lead to deadlock. This patch fixes the bug by using separate mechanisms for manager exclusion among workers and hotplug exclusion. For manager exclusion, POOL_MANAGING_WORKERS which was restored by the previous patch is used. pool->manager_mutex is now only used for exclusion between the elected manager and CPU hotplug. The elected manager won't proceed without holding pool->manager_mutex. This ensures that the worker which won the manager position can't skip managing while CPU hotplug is in progress. It will block on manager_mutex and perform management after CPU hotplug is complete. Note that hotplug may happen while waiting for manager_mutex. A manager isn't either on idle or busy list and thus the hoplug code can't unbind/rebind it. Make the manager handle its own un/rebinding. tj: Updated comment and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2012-09-10workqueue: restore POOL_MANAGING_WORKERSLai Jiangshan
This patch restores POOL_MANAGING_WORKERS which was replaced by pool->manager_mutex by 6037315269 "workqueue: use mutex for global_cwq manager exclusion". There's a subtle idle worker depletion bug across CPU hotplug events and we need to distinguish an actual manager and CPU hotplug preventing management. POOL_MANAGING_WORKERS will be used for the former and manager_mutex the later. This patch just lays POOL_MANAGING_WORKERS on top of the existing manager_mutex and doesn't introduce any synchronization changes. The next patch will update it. Note that this patch fixes a non-critical anomaly where too_many_workers() may return %true spuriously while CPU hotplug is in progress. While the issue could schedule idle timer spuriously, it didn't trigger any actual misbehavior. tj: Rewrote patch description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2012-09-08netlink: hide struct module parameter in netlink_kernel_createPablo Neira Ayuso
This patch defines netlink_kernel_create as a wrapper function of __netlink_kernel_create to hide the struct module *me parameter (which seems to be THIS_MODULE in all existing netlink subsystems). Suggested by David S. Miller. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07PM / QoS: Add return code to pm_qos_get_value function.Luis Gonzalez Fernandez
pm_qos_get_value don't return a return code in all cases. It's sure that anything interesting happend after BUG() but this prevent any compilation warning. [rjw: Chaneged the new return value to PM_QOS_DEFAULT_VALUE.] Signed-off-by: Luis Gonzalez Fernandez <luisgf@gmail.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2012-09-06pstore/ftrace: Convert to its own enable/disable debugfs knobAnton Vorontsov
With this patch we no longer reuse function tracer infrastructure, now we register our own tracer back-end via a debugfs knob. It's a bit more code, but that is the only downside. On the bright side we have: - Ability to make persistent_ram module removable (when needed, we can move ftrace_ops struct into a module). Note that persistent_ram is still not removable for other reasons, but with this patch it's just one thing less to worry about; - Pstore part is more isolated from the generic function tracer. We tried it already by registering our own tracer in available_tracers, but that way we're loosing ability to see the traces while we record them to pstore. This solution is somewhere in the middle: we only register "internal ftracer" back-end, but not the "front-end"; - When there is only pstore tracing enabled, the kernel will only write to the pstore buffer, omitting function tracer buffer (which, of course, still can be enabled via 'echo function > current_tracer'). Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
2012-09-05workqueue: fix possible deadlock in idle worker rebindingTejun Heo
Currently, rebind_workers() and idle_worker_rebind() are two-way interlocked. rebind_workers() waits for idle workers to finish rebinding and rebound idle workers wait for rebind_workers() to finish rebinding busy workers before proceeding. Unfortunately, this isn't enough. The second wait from idle workers is implemented as follows. wait_event(gcwq->rebind_hold, !(worker->flags & WORKER_REBIND)); rebind_workers() clears WORKER_REBIND, wakes up the idle workers and then returns. If CPU hotplug cycle happens again before one of the idle workers finishes the above wait_event(), rebind_workers() will repeat the first part of the handshake - set WORKER_REBIND again and wait for the idle worker to finish rebinding - and this leads to deadlock because the idle worker would be waiting for WORKER_REBIND to clear. This is fixed by adding another interlocking step at the end - rebind_workers() now waits for all the idle workers to finish the above WORKER_REBIND wait before returning. This ensures that all rebinding steps are complete on all idle workers before the next hotplug cycle can happen. This problem was diagnosed by Lai Jiangshan who also posted a patch to fix the issue, upon which this patch is based. This is the minimal fix and further patches are scheduled for the next merge window to simplify the CPU hotplug path. Signed-off-by: Tejun Heo <tj@kernel.org> Original-patch-by: Lai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <1346516916-1991-3-git-send-email-laijs@cn.fujitsu.com>
2012-09-05workqueue: move WORKER_REBIND clearing in rebind_workers() to the end of the ↵Tejun Heo
function This doesn't make any functional difference and is purely to help the next patch to be simpler. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
2012-09-04workqueue: UNBOUND -> REBIND morphing in rebind_workers() should be atomicLai Jiangshan
The compiler may compile the following code into TWO write/modify instructions. worker->flags &= ~WORKER_UNBOUND; worker->flags |= WORKER_REBIND; so the other CPU may temporarily see worker->flags which doesn't have either WORKER_UNBOUND or WORKER_REBIND set and perform local wakeup prematurely. Fix it by using single explicit assignment via ACCESS_ONCE(). Because idle workers have another WORKER_NOT_RUNNING flag, this bug doesn't exist for them; however, update it to use the same pattern for consistency. tj: Applied the change to idle workers too and updated comments and patch description a bit. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
2012-09-04perf/hwpb: Invoke __perf_event_disable() if interrupts are already disabledK.Prasad
While debugging a warning message on PowerPC while using hardware breakpoints, it was discovered that when perf_event_disable is invoked through hw_breakpoint_handler function with interrupts disabled, a subsequent IPI in the code path would trigger a WARN_ON_ONCE message in smp_call_function_single function. This patch calls __perf_event_disable() when interrupts are already disabled, instead of perf_event_disable(). Reported-by: Edjunior Barbosa Machado <emachado@linux.vnet.ibm.com> Signed-off-by: K.Prasad <Prasad.Krishnan@gmail.com> [naveen.n.rao@linux.vnet.ibm.com: v3: Check to make sure we target current task] Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120802081635.5811.17737.stgit@localhost.localdomain [ Fixed build error on MIPS. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-04perf_event: Switch to internal refcount, fix race with close()Al Viro
Don't mess with file refcounts (or keep a reference to file, for that matter) in perf_event. Use explicit refcount of its own instead. Deal with the race between the final reference to event going away and new children getting created for it by use of atomic_long_inc_not_zero() in inherit_event(); just have the latter free what it had allocated and return NULL, that works out just fine (children of siblings of something doomed are created as singletons, same as if the child of leader had been created and immediately killed). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120820135925.GG23464@ZenIV.linux.org.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-04sched: Remove useless code in yield_to()Michael Wang
It's impossible to enter the else branch if we have set skip_clock_update in task_yield_fair(), as yield_to_task_fair() will directly return true after invoke task_yield_fair(). Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> Acked-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4FF2925A.9060005@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-04sched: Add time unit suffix to sched sysctl knobsNamhyung Kim
Unlike others, sched_migration_cost, sched_time_avg and sched_shares_window doesn't have time unit as suffix. Add them. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1345083330-19486-1-git-send-email-namhyung@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-04sched/debug: Limit sd->*_idx range on sysctlNamhyung Kim
Various sd->*_idx's are used for refering the rq's load average table when selecting a cpu to run. However they can be set to any number with sysctl knobs so that it can crash the kernel if something bad is given. Fix it by limiting them into the actual range. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1345104204-8317-1-git-send-email-namhyung@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-09-04sched: Remove AFFINE_WAKEUPS feature flagNamhyung Kim
Commit beac4c7e4a1c ("sched: Remove AFFINE_WAKEUPS feature") removed use of the flag but left the definition. Get rid of it. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Link: http://lkml.kernel.org/r/1345090865-20851-1-git-send-email-namhyung@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>