summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-03-11sched/idle: Reorganize the idle loopDaniel Lezcano
Now that we have the main cpuidle function in idle.c, move some code from the idle mainloop to this function for the sake of clarity. That removes if then else indentation difficult to follow when looking at the code. This patch does not change the current behavior. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: tglx@linutronix.de Cc: rjw@rjwysocki.net Cc: preeti@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1393832934-11625-3-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11cpuidle/idle: Move the cpuidle_idle_call function to idle.cDaniel Lezcano
The cpuidle_idle_call does nothing more than calling the three individuals function and is no longer used by any arch specific code but only in the cpuidle framework code. We can move this function into the idle task code to ensure better proximity to the scheduler code. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: rjw@rjwysocki.net Cc: preeti@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1393832934-11625-2-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11idle/cpuidle: Split cpuidle_idle_call main function into smaller functionsDaniel Lezcano
In order to allow better integration between the cpuidle framework and the scheduler, reducing the distance between these two sub-components will facilitate this integration by moving part of the cpuidle code in the idle task file and, because idle.c is in the sched directory, we have access to the scheduler's private structures. This patch splits the cpuidle_idle_call main entry function into 3 calls to a newly added API: 1. select the idle state 2. enter the idle state 3. reflect the idle state The cpuidle_idle_call calls these three functions to implement the main idle entry function. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: rjw@rjwysocki.net Cc: preeti@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1393832934-11625-1-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-28Merge branch 'timers/core' into sched/idleIngo Molnar
Avoid heavy conflicts caused by WIP patches in drivers/cpuidle/cpuidle.c, by merging these into a single base. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-28Merge branch 'timers.2014.02.25a' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into timers/core Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-27trace: Replace hardcoding of 19 with MAX_NICEDongsheng Yang
Use MAX_NICE instead of the value 19 for ring_buffer_benchmark. Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/1393251121-25534-1-git-send-email-yangds.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-27sched: Guarantee task priority in pick_next_task()Peter Zijlstra
Michael spotted that the idle_balance() push down created a task priority problem. Previously, when we called idle_balance() before pick_next_task() it wasn't a problem when -- because of the rq->lock droppage -- an rt/dl task slipped in. Similarly for pre_schedule(), rt pre-schedule could have a dl task slip in. But by pulling it into the pick_next_task() loop, we'll not try a higher task priority again. Cure this by creating a re-start condition in pick_next_task(); and triggering this from pick_next_task_{rt,fair}(). It also fixes a live-lock where we get stuck in pick_next_task_fair() due to idle_balance() seeing !0 nr_running but there not actually being any fair tasks about. Reported-by: Michael Wang <wangyun@linux.vnet.ibm.com> Fixes: 38033c37faab ("sched: Push down pre_schedule() and idle_balance()") Tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20140224121218.GR15586@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-27sched/idle: Remove stale old filePeter Zijlstra
Commit cf37b6b48428d ("sched/idle: Move cpu/idle.c to sched/idle.c") said to simply move a file; somehow it got mangled and created an old version of the file and forgot to remove the old file. Fix this fail; add the lost change and remove the now identical old file. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: rjw@rjwysocki.net Cc: nicolas.pitre@linaro.org Cc: preeti@linux.vnet.ibm.com Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Link: http://lkml.kernel.org/r/20140224172207.GC9987@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-27sched: Put rq's sched_avg under CONFIG_FAIR_GROUP_SCHEDDietmar Eggemann
The struct sched_avg of struct rq is only used in case group scheduling is enabled inside __update_tg_runnable_avg() to update per-cpu representation of a task group. I.e. that there is no need to maintain the runnable avg of a rq in the !CONFIG_FAIR_GROUP_SCHED case. This patch guards struct sched_avg of struct rq and update_rq_runnable_avg() with CONFIG_FAIR_GROUP_SCHED. There is an extra empty definition for update_rq_runnable_avg() necessary for the !CONFIG_FAIR_GROUP_SCHED && CONFIG_SMP case. The function print_cfs_group_stats() which prints out struct sched_avg of struct rq is already guarded with CONFIG_FAIR_GROUP_SCHED. Reviewed-by: Ben Segall <bsegall@google.com> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/530DCDC5.1060406@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-25timers: Make internal_add_timer() update ->next_timer if ->active_timers == 0Oleg Nesterov
The internal_add_timer() function updates base->next_timer only if timer->expires < base->next_timer. This is correct, but it also makes sense to do the same if we add the first non-deferrable timer. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Tested-by: Mike Galbraith <bitbucket@online.de>
2014-02-25timers: Reduce future __run_timers() latency for first add to empty listPaul E. McKenney
The __run_timers() function currently steps through the list one jiffy at a time in order to update the timer wheel. However, if the timer wheel is empty, no adjustment is needed other than updating ->timer_jiffies. Therefore, just before we add a timer to an empty timer wheel, we should mark the timer wheel as being up to date. This marking will reduce (and perhaps eliminate) the jiffy-stepping that a future __run_timers() call will need to do in response to some future timer posting or migration. This commit therefore updates ->timer_jiffies for this case. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Mike Galbraith <bitbucket@online.de>
2014-02-25timers: Reduce future __run_timers() latency for newly emptied listPaul E. McKenney
The __run_timers() function currently steps through the list one jiffy at a time in order to update the timer wheel. However, if the timer wheel is empty, no adjustment is needed other than updating ->timer_jiffies. Therefore, if we just emptied the timer wheel, for example, by deleting the last timer, we should mark the timer wheel as being up to date. This marking will reduce (and perhaps eliminate) the jiffy-stepping that a future __run_timers() call will need to do in response to some future timer posting or migration. This commit therefore catches ->timer_jiffies for this case. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Mike Galbraith <bitbucket@online.de>
2014-02-25timers: Reduce __run_timers() latency for empty listPaul E. McKenney
The __run_timers() function currently steps through the list one jiffy at a time in order to update the timer wheel. However, if the timer wheel is empty, no adjustment is needed other than updating ->timer_jiffies. In this case, which is likely to be common for NO_HZ_FULL kernels, the kernel currently incurs a large latency for no good reason. This commit therefore short-circuits this case. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Mike Galbraith <bitbucket@online.de>
2014-02-25timers: Track total number of timers in listPaul E. McKenney
Currently, the tvec_base structure's ->active_timers field tracks only the non-deferrable timers, which means that even if ->active_timers is zero, there might well be deferrable timers in the list. This commit therefore adds an ->all_timers field to track all the timers, whether deferrable or not. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Mike Galbraith <bitbucket@online.de>
2014-02-22cpuidle/arm64: Remove redundant cpuidle_idle_call()Nicolas Pitre
The core idle loop now takes care of it. Signed-off-by: Nicolas Pitre <nico@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: linux-pm@vger.kernel.org Cc: linaro-kernel@lists.linaro.org Link: http://lkml.kernel.org/n/tip-wk9vpc8dsn46s12pl602ljpo@git.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22cpuidle/powernv: Remove redundant cpuidle_idle_call()Nicolas Pitre
The core idle loop now takes care of it. We need to add the runlatch function calls to the idle routines which was earlier taken care of by the arch specific idle routine. Signed-off-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Reviewed-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Paul Burton <paul.burton@imgtec.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: linux-pm@vger.kernel.org Cc: linaro-kernel@lists.linaro.org Link: http://lkml.kernel.org/n/tip-nr4mtbkkzf2oomaj85m24o7c@git.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22sched, nohz: Exclude isolated cores from load balancingMike Galbraith
The user explicitly disabled load balancing, else this core would not be disconnected. Don't add these to nohz.idle_cpus_mask. Signed-off-by: Mike Galbraith <mgalbraith@suse.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Lei Wen <leiwen@marvell.com> Link: http://lkml.kernel.org/n/tip-vmme4f49psirp966pklm5l9j@git.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22sched: Fix select_task_rq_fair() description commentsMorten Rasmussen
Brings select_task_rq_fair() description comments up-to-date. Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1392732864-10927-1-git-send-email-morten.rasmussen@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22workqueue: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICEDongsheng Yang
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/6d85138180c00ce86975addab6e34b24b84f00a5.1392103744.git.yangds.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22sys: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICEDongsheng Yang
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Robin Holt <holt@sgi.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/0261f094b836f1acbcdf52e7166487c0c77323c8.1392103744.git.yangds.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22sched: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICEDongsheng Yang
Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/bd80780f19b4f9b4a765acc353c8dbc130274dd6.1392103744.git.yangds.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22rcu: Use MAX_NICE to replace hardcoding of 19Dongsheng Yang
Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/5b3bf232f41b33ab703a1595e94671b303e2d1fc.1392103744.git.yangds.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22sched/prio: Add 3 macros of MAX_NICE, MIN_NICE and NICE_WIDTH in prio.hDongsheng Yang
Currently there is lots of hard coding to 19 and -20, to represent maximum and minimum of nice values. This patch add three macros in prio.h for maximum, minimum and width of nice value, and uses it to remove hardcoded values in prio.h. Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/3994e89327b2b15f992277cdf9f409c516f87d1b.1392103744.git.yangds.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [ Collapsed two small patches. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22sched/prio: Use DEFAULT_PRIO to define NICE_TO_PRIO() and PRIO_TO_NICE()Dongsheng Yang
There is already a macro named DEFAULT_PRIO in prio.h, we can use it to define NICE_TO_PRIO and PRIO_TO_NICE rather than use hard coding of (MAX_RT_PRIO + 20). Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/4e28ec36fb49e8906027cbbdd900ab26a149905e.1392103744.git.yangds.fnst@cn.fujitsu.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22sched/rt: Make init_sched_rt_calss() __initLi Zefan
It's a bootstrap function. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/52F5CC09.1080502@huawei.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22sched/rt: Remove 'leaf_rt_rq_list' from 'struct rq'Li Zefan
This is a leftover from commit e23ee74777f389369431d77390c4b09332ce026a ("sched/rt: Simplify pull_rt_task() logic and remove .leaf_rt_rq_list"). Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/52F5CBF6.4060901@huawei.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22sched: Consider pi boosting in setscheduler()Thomas Gleixner
If a PI boosted task policy/priority is modified by a setscheduler() call we unconditionally dequeue and requeue the task if it is on the runqueue even if the new priority is lower than the current effective boosted priority. This can result in undesired reordering of the priority bucket list. If the new priority is less or equal than the current effective we just store the new parameters in the task struct and leave the scheduler class and the runqueue untouched. This is handled when the task deboosts itself. Only if the new priority is higher than the effective boosted priority we apply the change immediately. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [ Rebase ontop of v3.14-rc1. ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Dario Faggioli <raistlin@linux.it> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1391803122-4425-7-git-send-email-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22sched: Queue RT tasks to head when prio dropsThomas Gleixner
The following scenario does not work correctly: Runqueue of CPUx contains two runnable and pinned tasks: T1: SCHED_FIFO, prio 80 T2: SCHED_FIFO, prio 80 T1 is on the cpu and executes the following syscalls (classic priority ceiling scenario): sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 90); ... sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 80); ... Now T1 gets preempted by T3 (SCHED_FIFO, prio 95). After T3 goes back to sleep the scheduler picks T2. Surprise! The same happens w/o actual preemption when T1 is forced into the scheduler due to a sporadic NEED_RESCHED event. The scheduler invokes pick_next_task() which returns T2. So T1 gets preempted and scheduled out. This happens because sched_setscheduler() dequeues T1 from the prio 90 list and then enqueues it on the tail of the prio 80 list behind T2. This violates the POSIX spec and surprises user space which relies on the guarantee that SCHED_FIFO tasks are not scheduled out unless they give the CPU up voluntarily or are preempted by a higher priority task. In the latter case the preempted task must get back on the CPU after the preempting task schedules out again. We fixed a similar issue already in commit 60db48c (sched: Queue a deboosted task to the head of the RT prio queue). The same treatment is necessary for sched_setscheduler(). So enqueue to head of the prio bucket list if the priority of the task is lowered. It might be possible that existing user space relies on the current behaviour, but it can be considered highly unlikely due to the corner case nature of the application scenario. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1391803122-4425-6-git-send-email-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22sched: Adjust p->sched_reset_on_fork when nothing else changesThomas Gleixner
If the policy and priority remain unchanged a possible modification of p->sched_reset_on_fork gets lost in the early exit path. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [ Rebase ontop of v3.14-rc1. ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1391803122-4425-5-git-send-email-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22sched: Add better debug output for might_sleep()Thomas Gleixner
might_sleep() can tell us where interrupts have been disabled, but we have no idea what disabled preemption. Add some debug infrastructure. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1391803122-4425-4-git-send-email-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22sched: Check for idle task in might_sleep()Thomas Gleixner
Idle is not allowed to call sleeping functions ever! Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1391803122-4425-3-git-send-email-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-22sched: Init idle->on_rq in init_idle()Thomas Gleixner
We stumbled in RT over a SMP bringup issue on ARM where the idle->on_rq == 0 was causing try_to_wakeup() on the other cpu to run into nada land. After adding that idle->on_rq = 1; I was able to find the root cause of the lockup: the idle task on the newly woken up cpu was fiddling with a sleeping spinlock, which is a nono. I kept the init of idle->on_rq to keep the state consistent and to avoid another long lasting debug session. As a side note, the whole debug mess could have been avoided if might_sleep() would have yelled when called from the idle task. That's fixed with patch 2/6 - and that one actually has a changelog :) Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1391803122-4425-2-git-send-email-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-21sched: Remove some #ifdefferyPeter Zijlstra
Remove a few gratuitous #ifdefs in pick_next_task*(). Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-nnzddp5c4fijyzzxxrwlxghf@git.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21sched: Fix hotplug task migrationPeter Zijlstra
Dan Carpenter reported: > kernel/sched/rt.c:1347 pick_next_task_rt() warn: variable dereferenced before check 'prev' (see line 1338) > kernel/sched/deadline.c:1011 pick_next_task_dl() warn: variable dereferenced before check 'prev' (see line 1005) Kirill also spotted that migrate_tasks() will have an instant NULL deref because pick_next_task() will immediately deref prev. Instead of fixing all the corner cases because migrate_tasks() can pass in a NULL prev task in the unlikely case of hot-un-plug, provide a fake task such that we can remove all the NULL checks from the far more common paths. A further problem; not previously spotted; is that because we pushed pre_schedule() and idle_balance() into pick_next_task() we now need to avoid those getting called and pulling more tasks on our dying CPU. We avoid pull_{dl,rt}_task() by setting fake_task.prio to MAX_PRIO+1. We also note that since we call pick_next_task() exactly the amount of times we have runnable tasks present, we should never land in idle_balance(). Fixes: 38033c37faab ("sched: Push down pre_schedule() and idle_balance()") Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Reported-by: Kirill Tkhai <tkhai@yandex.ru> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140212094930.GB3545@laptop.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21sched/fair: Remove idle_balance() declaration in sched.hPeter Zijlstra
Remove idle_balance() from the public life; also reduce some #ifdef clutter by folding the pick_next_task_fair() idle path into idle_balance(). Cc: mingo@kernel.org Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140211151148.GP27965@twins.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21sched/fair: Reset se-depth when task switched to FAIRMichael wang
Sasha reported: [ 522.645288] BUG: unable to handle kernel NULL pointer dereference at ... [ 522.646271] IP: [<ffffffff81186c6f>] check_preempt_wakeup+0x11f/0x210 ... [ 522.650021] Call Trace: [ 522.650021] <IRQ> [ 522.650021] [<ffffffff8117361d>] check_preempt_curr+0x3d/0xb0 [ 522.650021] [<ffffffff81175d88>] ttwu_do_wakeup+0x18/0x130 ... which was caused by the se-depth changed during the time when task is not FAIR, and we will use the wrong depth value after it switched back to FAIR. This patch reset the depth at the time when task switched to FAIR, make sure that we always have the correct value when task is FAIR. Cc: Ingo Molnar <mingo@kernel.org> Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5305732D.70001@linux.vnet.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21Merge branch 'linus' into sched/coreThomas Gleixner
Reason: Bring bakc upstream modification to resolve conflicts Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-20Merge tag 'pci-v3.14-fixes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "The most interesting thing here is the change to enable INTx (by clearing PCI_COMMAND_INTX_DISABLE) if the BIOS left INTx disabled. Apparently the Baytrail BIOS does this, which means EHCI doesn't work. Also, fix an AHCI MSI regression and other issues with the recent MSI changes. This also adds pci_enable_msi_exact() and pci_enable_msix_exact(), which aren't regression fixes, but will keep us from touching drivers twice (once to stop using the deprecated pci_enable_msi(), etc., and again to use the *_exact() variants). There's also a minor MVEBU fix. Summary: MSI: - Fix AHCI single-MSI fallback (Alexander Gordeev) - Fix populate_msi_sysfs() error paths (Greg Kroah-Hartman) - Fix htmldocs problem (Masanari Iida) - Add pci_enable_msi_exact() and pci_enable_msix_exact() (Alexander Gordeev) - Update documentation (Alexander Gordeev) Miscellaneous: - mvebu: expose device ID & revision via lspci (Andrew Lunn) - Enable INTx if the BIOS left them disabled (Bjorn Helgaas)" * tag 'pci-v3.14-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: ahci: Fix broken fallback to single MSI mode PCI: Enable INTx if BIOS left them disabled PCI/MSI: Add pci_enable_msi_exact() and pci_enable_msix_exact() PCI/MSI: Fix cut-and-paste errors in documentation PCI/MSI: Add pci_enable_msi() documentation back PCI/MSI: Fix pci_msix_vec_count() htmldocs failure PCI/MSI: Fix leak of msi_attrs PCI/MSI: Check kmalloc() return value, fix leak of name PCI: mvebu: Use Device ID and revision from underlying endpoint
2014-02-20Merge branch 'for-3.14-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata fixes from Tejun Heo: "Various device specific fixes. Nothing too interesting" * 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: ahci: disable NCQ on Samsung pci-e SSDs on macbooks ata: sata_mv: Cleanup only the initialized ports sata_sil: apply MOD15WRITE quirk to TOSHIBA MK2561GSYN ata: enable quirk from jmicron JMB350 for JMB394 ATA: SATA_MV: Add missing Kconfig select statememnt ata: pata_imx: Check the return value from clk_prepare_enable()
2014-02-20Merge branch 'for-3.14-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Quite a few fixes this time. Three locking fixes, all marked for -stable. A couple error path fixes and some misc fixes. Hugh found a bug in memcg offlining sequence and we thought we could fix that from cgroup core side but that turned out to be insufficient and got reverted. A different fix has been applied to -mm" * 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: update cgroup_enable_task_cg_lists() to grab siglock Revert "cgroup: use an ordered workqueue for cgroup destruction" cgroup: protect modifications to cgroup_idr with cgroup_mutex cgroup: fix locking in cgroup_cfts_commit() cgroup: fix error return from cgroup_create() cgroup: fix error return value in cgroup_mount() cgroup: use an ordered workqueue for cgroup destruction nfs: include xattr.h from fs/nfs/nfs3proc.c cpuset: update MAINTAINERS entry arm, pm, vmpressure: add missing slab.h includes
2014-02-20Merge branch 'for-3.14-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: "Two workqueue fixes. One for an unlikely but possible critical bug during kworker shutdown and the other to make lockdep names a bit more descriptive" * 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: ensure @task is valid across kthread_stop() workqueue: add args to workqueue lockdep name
2014-02-20Merge branch 'fixes-for-v3.14' of ↵Linus Torvalds
git://git.linaro.org/people/mszyprowski/linux-dma-mapping Pull DMA-mapping fixes from Marek Szyprowski: "This contains fixes for incorrect atomic test in dma-mapping subsystem for ARM and x86 architecture" * 'fixes-for-v3.14' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping: x86: dma-mapping: fix GFP_ATOMIC macro usage ARM: dma-mapping: fix GFP_ATOMIC macro usage
2014-02-20user_namespace.c: Remove duplicated word in commentBrian Campbell
Signed-off-by: Brian Campbell <brian.campbell@editshare.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-19Merge tag 'nfs-for-3.14-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include stable fixes for the following bugs: - General performance regression due to NFS_INO_INVALID_LABEL being set when the server doesn't support labeled NFS - Hang in the RPC code due to a socket out-of-buffer race - Infinite loop when trying to establish the NFSv4 lease - Use-after-free bug in the RPCSEC gss code. - nfs4_select_rw_stateid is returning with a non-zero error value on success Other bug fixes: - Potential memory scribble in the RPC bi-directional RPC code - Pipe version reference leak - Use the correct net namespace in the new NFSv4 migration code" * tag 'nfs-for-3.14-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS fix error return in nfs4_select_rw_stateid NFSv4: Use the correct net namespace in nfs4_update_server SUNRPC: Fix a pipe_version reference leak SUNRPC: Ensure that gss_auth isn't freed before its upcall messages SUNRPC: Fix potential memory scribble in xprt_free_bc_request() SUNRPC: Fix races in xs_nospace() SUNRPC: Don't create a gss auth cache unless rpc.gssd is running NFS: Do not set NFS_INO_INVALID_LABEL unless server supports labeled NFS
2014-02-19Merge tag 'mfd-fixes-3.14-1' of git://git.linaro.org/people/lee.jones/mfdLinus Torvalds
Pull MFD fixes from Lee Jones: "Couple of small issues solved: - Suspend/Resume call-backs require CONFIG_PM_SLEEP - Some drivers written for 32bit architectures fail when compiled with a 64bit compiler. The fixes will future proof the drivers" * tag 'mfd-fixes-3.14-1' of git://git.linaro.org/people/lee.jones/mfd: mfd: sec-core: sec_pmic_{suspend,resume}() should depend on CONFIG_PM_SLEEP mfd: max14577: max14577_{suspend,resume}() should depend on CONFIG_PM_SLEEP mfd: tps65217: Naturalise cross-architecture discrepancies mfd: wm8994-core: Naturalise cross-architecture discrepancies mfd: max8998: Naturalise cross-architecture discrepancies mfd: max8997: Naturalise cross-architecture discrepancies
2014-02-19NFS fix error return in nfs4_select_rw_stateidAndy Adamson
Do not return an error when nfs4_copy_delegation_stateid succeeds. Signed-off-by: Andy Adamson <andros@netapp.com> Link: http://lkml.kernel.org/r/1392737765-41942-1-git-send-email-andros@netapp.com Fixes: ef1820f9be27b (NFSv4: Don't try to recover NFSv4 locks when...) Cc: NeilBrown <neilb@suse.de> Cc: stable@vger.kernel.org # 3.12+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-02-19mfd: sec-core: sec_pmic_{suspend,resume}() should depend on CONFIG_PM_SLEEPGeert Uytterhoeven
If CONFIG_PM_SLEEP=n: drivers/mfd/sec-core.c:349: warning: ‘sec_pmic_suspend’ defined but not used drivers/mfd/sec-core.c:371: warning: ‘sec_pmic_resume’ defined but not used Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2014-02-19mfd: max14577: max14577_{suspend,resume}() should depend on CONFIG_PM_SLEEPGeert Uytterhoeven
If CONFIG_PM_SLEEP=n: drivers/mfd/max14577.c:177: warning: ‘max14577_suspend’ defined but not used drivers/mfd/max14577.c:200: warning: ‘max14577_resume’ defined but not used Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2014-02-19mfd: tps65217: Naturalise cross-architecture discrepanciesLee Jones
If we compile the TPS65217 for a 64bit architecture we receive the following warnings: drivers/mfd/tps65217.c: In function ‘tps65217_probe’: drivers/mfd/tps65217.c:173:13: warning: cast from pointer to integer of different size chip_id = (unsigned int)match->data; ^ Signed-off-by: Lee Jones <lee.jones@linaro.org>
2014-02-19mfd: wm8994-core: Naturalise cross-architecture discrepanciesLee Jones
If we compile the WM8994 for a 64bit architecture we receive the following warnings: drivers/mfd/wm8994-core.c: In function ‘wm8994_i2c_probe’: drivers/mfd/wm8994-core.c:639:19: warning: cast from pointer to integer of different size wm8994->type = (int)of_id->data; ^ Signed-off-by: Lee Jones <lee.jones@linaro.org>