summaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)Author
2016-06-30bpf: minor cleanups on fd maps and helpersDaniel Borkmann
Some minor cleanups: i) Remove the unlikely() from fd array map lookups and let the CPU branch predictor do its job, scenarios where there is not always a map entry are very well valid. ii) Move the attribute type check in the bpf_perf_event_read() helper a bit earlier so it's consistent wrt checks with bpf_perf_event_output() helper as well. iii) remove some comments that are self-documenting in kprobe_prog_is_valid_access() and therefore make it consistent to tp_prog_is_valid_access() as well. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Several cases of overlapping changes, except the packet scheduler conflicts which deal with the addition of the free list parameter to qdisc_enqueue(). Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "I've been traveling so this accumulates more than week or so of bug fixing. It perhaps looks a little worse than it really is. 1) Fix deadlock in ath10k driver, from Ben Greear. 2) Increase scan timeout in iwlwifi, from Luca Coelho. 3) Unbreak STP by properly reinjecting STP packets back into the stack. Regression fix from Ido Schimmel. 4) Mediatek driver fixes (missing malloc failure checks, leaking of scratch memory, wrong indexing when mapping TX buffers, etc.) from John Crispin. 5) Fix endianness bug in icmpv6_err() handler, from Hannes Frederic Sowa. 6) Fix hashing of flows in UDP in the ruseport case, from Xuemin Su. 7) Fix netlink notifications in ovs for tunnels, delete link messages are never emitted because of how the device registry state is handled. From Nicolas Dichtel. 8) Conntrack module leaks kmemcache on unload, from Florian Westphal. 9) Prevent endless jump loops in nft rules, from Liping Zhang and Pablo Neira Ayuso. 10) Not early enough spinlock initialization in mlx4, from Eric Dumazet. 11) Bind refcount leak in act_ipt, from Cong WANG. 12) Missing RCU locking in HTB scheduler, from Florian Westphal. 13) Several small MACSEC bug fixes from Sabrina Dubroca (missing RCU barrier, using heap for SG and IV, and erroneous use of async flag when allocating AEAD conext.) 14) RCU handling fix in TIPC, from Ying Xue. 15) Pass correct protocol down into ipv4_{update_pmtu,redirect}() in SIT driver, from Simon Horman. 16) Socket timer deadlock fix in TIPC from Jon Paul Maloy. 17) Fix potential deadlock in team enslave, from Ido Schimmel. 18) Memory leak in KCM procfs handling, from Jiri Slaby. 19) ESN generation fix in ipv4 ESP, from Herbert Xu. 20) Fix GFP_KERNEL allocations with locks held in act_ife, from Cong WANG. 21) Use after free in netem, from Eric Dumazet. 22) Uninitialized last assert time in multicast router code, from Tom Goff. 23) Skip raw sockets in sock_diag destruction broadcast, from Willem de Bruijn. 24) Fix link status reporting in thunderx, from Sunil Goutham. 25) Limit resegmentation of retransmit queue so that we do not retransmit too large GSO frames. From Eric Dumazet. 26) Delay bpf program release after grace period, from Daniel Borkmann" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (141 commits) openvswitch: fix conntrack netlink event delivery qed: Protect the doorbell BAR with the write barriers. neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit() e1000e: keep VLAN interfaces functional after rxvlan off cfg80211: fix proto in ieee80211_data_to_8023 for frames without LLC header qlcnic: use the correct ring in qlcnic_83xx_process_rcv_ring_diag() bpf, perf: delay release of BPF prog after grace period net: bridge: fix vlan stats continue counter tcp: do not send too big packets at retransmit time ibmvnic: fix to use list_for_each_safe() when delete items net: thunderx: Fix TL4 configuration for secondary Qsets net: thunderx: Fix link status reporting net/mlx5e: Reorganize ethtool statistics net/mlx5e: Fix number of PFC counters reported to ethtool net/mlx5e: Prevent adding the same vxlan port net/mlx5e: Check for BlueFlame capability before allocating SQ uar net/mlx5e: Change enum to better reflect usage net/mlx5: Add ConnectX-5 PCIe 4.0 to list of supported devices net/mlx5: Update command strings net: marvell: Add separate config ANEG function for Marvell 88E1111 ...
2016-06-27tracing/function_graph: Fix filters for function_graph thresholdJoel Fernandes
Function graph tracer currently ignores filters if tracing_thresh is set. For example, even if set_ftrace_pid is set, then its ignored if tracing_thresh set, resulting in all processes being traced. To fix this, we reuse the same entry function as when tracing_thresh is not set and do everything as in the regular case except for writing the function entry to the ring buffer. Link: http://lkml.kernel.org/r/1466228694-2677-1-git-send-email-agnel.joel@gmail.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Joel Fernandes <agnel.joel@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-23tracing: Skip more functions when doing stack tracing of eventsSteven Rostedt (Red Hat)
# echo 1 > options/stacktrace # echo 1 > events/sched/sched_switch/enable # cat trace <idle>-0 [002] d..2 1982.525169: <stack trace> => save_stack_trace => __ftrace_trace_stack => trace_buffer_unlock_commit_regs => event_trigger_unlock_commit => trace_event_buffer_commit => trace_event_raw_event_sched_switch => __schedule => schedule => schedule_preempt_disabled => cpu_startup_entry => start_secondary The above shows that we are seeing 6 functions before ever making it to the caller of the sched_switch event. # echo stacktrace > events/sched/sched_switch/trigger # cat trace <idle>-0 [002] d..3 2146.335208: <stack trace> => trace_event_buffer_commit => trace_event_raw_event_sched_switch => __schedule => schedule => schedule_preempt_disabled => cpu_startup_entry => start_secondary The stacktrace trigger isn't as bad, because it adds its own skip to the stacktracing, but still has two events extra. One issue is that if the stacktrace passes its own "regs" then there should be no addition to the skip, as the regs will not include the functions being called. This was an issue that was fixed by commit 7717c6be6999 ("tracing: Fix stacktrace skip depth in trace_buffer_unlock_commit_regs()" as adding the skip number for kprobes made the probes not have any stack at all. But since this is only an issue when regs is being used, a skip should be added if regs is NULL. Now we have: # echo 1 > options/stacktrace # echo 1 > events/sched/sched_switch/enable # cat trace <idle>-0 [000] d..2 1297.676333: <stack trace> => __schedule => schedule => schedule_preempt_disabled => cpu_startup_entry => rest_init => start_kernel => x86_64_start_reservations => x86_64_start_kernel # echo stacktrace > events/sched/sched_switch/trigger # cat trace <idle>-0 [002] d..3 1370.759745: <stack trace> => __schedule => schedule => schedule_preempt_disabled => cpu_startup_entry => start_secondary And kprobes are not touched. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20tracing: Expose CPU physical addresses (resource values) for PCI devicesBjorn Helgaas
Previously, mmio_print_pcidev() put "user" addresses in the trace buffer. On most architectures, these are the same as CPU physical addresses, but on microblaze, mips, powerpc, and sparc, they may be something else, typically a raw BAR value (a bus address as opposed to a CPU address). Always expose the CPU physical address to avoid this arch-dependent behavior. This change should have no user-visible effect because this file currently depends on CONFIG_HAVE_MMIOTRACE_SUPPORT, which is only defined for x86, and pci_resource_to_user() is a no-op on x86. Link: http://lkml.kernel.org/r/20160511190657.5898.4248.stgit@bhelgaas-glaptop2.roam.corp.google.com Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20tracing: Show the preempt count of when the event was calledSteven Rostedt (Red Hat)
Because tracepoint callbacks are done with preemption enabled, the trace events are always called with preempt disable due to the rcu_read_lock_sched_notrace() in __DO_TRACE(). This causes the preempt count shown in the recorded trace event to be inaccurate. It is always one more that what the preempt_count was when the tracepoint was called. If CONFIG_PREEMPT is enabled, subtract 1 from the preempt_count before recording it in the trace buffer. Link: http://lkml.kernel.org/r/20160525132537.GA10808@linutronix.de Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20tracing: Choose static tp_printk buffer by explicit nesting countAndy Lutomirski
Currently, the trace_printk code chooses which static buffer to use based on what type of atomic context (NMI, IRQ, etc) it's in. Simplify the code and make it more robust: simply count the nesting depth and choose a buffer based on the current nesting depth. The new code will only drop an event if we nest more than 4 deep, and the old code was guaranteed to malfunction if that happened. Link: http://lkml.kernel.org/r/07ab03aecfba25fcce8f9a211b14c9c5e2865c58.1464289095.git.luto@kernel.org Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20tracing: expose current->comm to [ku]probe eventsOmar Sandoval
ftrace is very quick to give up on saving the task command line (see `trace_save_cmdline()`). The workaround for events which really care about the command line is to explicitly assign it as part of the entry. However, this doesn't work for kprobe events, as there's no straightforward way to get access to current->comm. Add a kprobe/uprobe event variable $comm which provides exactly that. Link: http://lkml.kernel.org/r/f59b472033b943a370f5f48d0af37698f409108f.1465435894.git.osandov@fb.com Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20ftrace: Have set_ftrace_pid use the bitmap like events doSteven Rostedt (Red Hat)
Convert set_ftrace_pid to use the bitmap like set_event_pid does. This allows for instances to use the pid filtering as well, and will allow for function-fork option to set if the children of a traced function should be traced or not. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20tracing: Move pid_list write processing into its own functionSteven Rostedt (Red Hat)
The addition of PIDs into a pid_list via the write operation of set_event_pid is a bit complex. The same operation will be needed for function tracing pids. Move the code into its own generic function in trace.c, so that we can avoid duplication of this code. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20tracing: Move the pid_list seq_file functions to be globalSteven Rostedt (Red Hat)
To allow other aspects of ftrace to use the pid_list logic, we need to reuse the seq_file functions. Making the generic part into functions that can be called by other files will help in this regard. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20tracing: Move filtered_pid helper functions into trace.cSteven Rostedt
As the filtered_pid functions are going to be used by function tracer as well as trace_events, move the code into the generic trace.c file. The functions moved are: trace_find_filtered_pid() trace_ignore_this_task() trace_filter_add_remove_task() Kernel Doc text was also added. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20tracing: Make the pid filtering helper functions globalSteven Rostedt
Make the functions used for pid filtering global for tracing, such that the function tracer can use the pid code as well. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-20tracing: Handle NULL formats in hold_module_trace_bprintk_format()Steven Rostedt (Red Hat)
If a task uses a non constant string for the format parameter in trace_printk(), then the trace_printk_fmt variable is set to NULL. This variable is then saved in the __trace_printk_fmt section. The function hold_module_trace_bprintk_format() checks to see if duplicate formats are used by modules, and reuses them if so (saves them to the list if it is new). But this function calls lookup_format() that does a strcmp() to the value (which is now NULL) and can cause a kernel oops. This wasn't an issue till 3debb0a9ddb ("tracing: Fix trace_printk() to print when not using bprintk()") which added "__used" to the trace_printk_fmt variable, and before that, the kernel simply optimized it out (no NULL value was saved). The fix is simply to handle the NULL pointer in lookup_format() and have the caller ignore the value if it was NULL. Link: http://lkml.kernel.org/r/1464769870-18344-1-git-send-email-zhengjun.xing@intel.com Reported-by: xingzhen <zhengjun.xing@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Fixes: 3debb0a9ddb ("tracing: Fix trace_printk() to print when not using bprintk()") Cc: stable@vger.kernel.org # v3.5+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-06-17blktrace: avoid using timespecArnd Bergmann
The blktrace code stores the current time in a 32-bit word in its user interface. This is a bad idea because 32-bit seconds overflow at some point. We probably have until 2106 before this one overflows, as it seems to use an 'unsigned' variable, but we should confirm that user space treats it the same way. Aside from this, we want to stop using 'struct timespec' here, so I'm adding a comment about the overflow and change the code to use timespec64 instead to make the loss of range more obvious. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-15bpf, maps: flush own entries on perf map releaseDaniel Borkmann
The behavior of perf event arrays are quite different from all others as they are tightly coupled to perf event fds, f.e. shown recently by commit e03e7ee34fdd ("perf/bpf: Convert perf_event_array to use struct file") to make refcounting on perf event more robust. A remaining issue that the current code still has is that since additions to the perf event array take a reference on the struct file via perf_event_get() and are only released via fput() (that cleans up the perf event eventually via perf_event_release_kernel()) when the element is either manually removed from the map from user space or automatically when the last reference on the perf event map is dropped. However, this leads us to dangling struct file's when the map gets pinned after the application owning the perf event descriptor exits, and since the struct file reference will in such case only be manually dropped or via pinned file removal, it leads to the perf event living longer than necessary, consuming needlessly resources for that time. Relations between perf event fds and bpf perf event map fds can be rather complex. F.e. maps can act as demuxers among different perf event fds that can possibly be owned by different threads and based on the index selection from the program, events get dispatched to one of the per-cpu fd endpoints. One perf event fd (or, rather a per-cpu set of them) can also live in multiple perf event maps at the same time, listening for events. Also, another requirement is that perf event fds can get closed from application side after they have been attached to the perf event map, so that on exit perf event map will take care of dropping their references eventually. Likewise, when such maps are pinned, the intended behavior is that a user application does bpf_obj_get(), puts its fds in there and on exit when fd is released, they are dropped from the map again, so the map acts rather as connector endpoint. This also makes perf event maps inherently different from program arrays as described in more detail in commit c9da161c6517 ("bpf: fix clearing on persistent program array maps"). To tackle this, map entries are marked by the map struct file that added the element to the map. And when the last reference to that map struct file is released from user space, then the tracked entries are purged from the map. This is okay, because new map struct files instances resp. frontends to the anon inode are provided via bpf_map_new_fd() that is called when we invoke bpf_obj_get_user() for retrieving a pinned map, but also when an initial instance is created via map_create(). The rest is resolved by the vfs layer automatically for us by keeping reference count on the map's struct file. Any concurrent updates on the map slot are fine as well, it just means that perf_event_fd_array_release() needs to delete less of its own entires. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15bpf, trace: check event type in bpf_perf_event_readAlexei Starovoitov
similar to bpf_perf_event_output() the bpf_perf_event_read() helper needs to check the type of the perf_event before reading the counter. Fixes: a43eec304259 ("bpf: introduce bpf_perf_event_output() helper") Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15bpf: fix matching of data/data_end in verifierAlexei Starovoitov
The ctx structure passed into bpf programs is different depending on bpf program type. The verifier incorrectly marked ctx->data and ctx->data_end access based on ctx offset only. That caused loads in tracing programs int bpf_prog(struct pt_regs *ctx) { .. ctx->ax .. } to be incorrectly marked as PTR_TO_PACKET which later caused verifier to reject the program that was actually valid in tracing context. Fix this by doing program type specific matching of ctx offsets. Fixes: 969bf05eb3ce ("bpf: direct packet access") Reported-by: Sasha Goldshtein <goldshtn@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09block: add a separate operation type for secure eraseChristoph Hellwig
Instead of overloading the discard support with the REQ_SECURE flag. Use the opportunity to rename the queue flag as well, and remove the dead checks for this flag in the RAID 1 and RAID 10 drivers that don't claim support for secure erase. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07bpf, trace: use READ_ONCE for retrieving file ptrDaniel Borkmann
In bpf_perf_event_read() and bpf_perf_event_output(), we must use READ_ONCE() for fetching the struct file pointer, which could get updated concurrently, so we must prevent the compiler from potential refetching. We already do this with tail calls for fetching the related bpf_prog, but not so on stored perf events. Semantics for both are the same with regards to updates. Fixes: a43eec304259 ("bpf: introduce bpf_perf_event_output() helper") Fixes: 35578d798400 ("bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07block, drivers, fs: rename REQ_FLUSH to REQ_PREFLUSHMike Christie
To avoid confusion between REQ_OP_FLUSH, which is handled by request_fn drivers, and upper layers requesting the block layer perform a flush sequence along with possibly a WRITE, this patch renames REQ_FLUSH to REQ_PREFLUSH. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07block, drivers: add REQ_OP_FLUSH operationMike Christie
This adds a REQ_OP_FLUSH operation that is sent to request_fn based drivers by the block layer's flush code, instead of sending requests with the request->cmd_flags REQ_FLUSH bit set. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07blktrace: use op accessorsMike Christie
Have blktrace use the req/bio op accessor to get the REQ_OP. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-22Merge tag 'trace-v4.7-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull motr tracing updates from Steven Rostedt: "Three more changes. - I forgot that I had another selftest to stress test the ftrace instance creation. It was actually suppose to go into the 4.6 merge window, but I never committed it. I almost forgot about it again, but noticed it was missing from your tree. - Soumya PN sent me a clean up patch to not disable interrupts when taking the tasklist_lock for read, as it's unnecessary because that lock is never taken for write in irq context. - Newer gcc's can cause the jump in the function_graph code to the global ftrace_stub label to be a short jump instead of a long one. As that jump is dynamically converted to jump to the trace code to do function graph tracing, and that conversion expects a long jump it can corrupt the ftrace_stub itself (it's directly after that call). One way to prevent gcc from using a short jump is to declare the ftrace_stub as a weak function, which we do here to keep gcc from optimizing too much" * tag 'trace-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace/x86: Set ftrace_stub to weak to prevent gcc from using short jumps to it ftrace: Don't disable irqs when taking the tasklist_lock read_lock ftracetest: Add instance created, delete, read and enable event test
2016-05-20ftrace: Don't disable irqs when taking the tasklist_lock read_lockSoumya PN
In ftrace.c inside the function alloc_retstack_tasklist() (which will be invoked when function_graph tracing is on) the tasklist_lock is being held as reader while iterating through a list of threads. Here the lock is being held as reader with irqs disabled. The tasklist_lock is never write_locked in interrupt context so it is safe to not disable interrupts for the duration of read_lock in this block which, can be significant, given the block of code iterates through all threads. Hence changing the code to call read_lock() and read_unlock() instead of read_lock_irqsave() and read_unlock_irqrestore(). A similar change was made in commits: 8063e41d2ffc ("tracing: Change syscall_*regfunc() to check PF_KTHREAD and use for_each_process_thread()")' and 3472eaa1f12e ("sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()")' Link: http://lkml.kernel.org/r/1463500874-77480-1-git-send-email-soumya.p.n@hpe.com Signed-off-by: Soumya PN <soumya.p.n@hpe.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-05-20Merge tag 'powerpc-4.7-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights: - Support for Power ISA 3.0 (Power9) Radix Tree MMU from Aneesh Kumar K.V - Live patching support for ppc64le (also merged via livepatching.git) Various cleanups & minor fixes from: - Aaro Koskinen, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Chris Smart, Daniel Axtens, Frederic Barrat, Gavin Shan, Ian Munsie, Lennart Sorensen, Madhavan Srinivasan, Mahesh Salgaonkar, Markus Elfring, Michael Ellerman, Oliver O'Halloran, Paul Gortmaker, Paul Mackerras, Rashmica Gupta, Russell Currey, Suraj Jitindar Singh, Thiago Jung Bauermann, Valentin Rothberg, Vipin K Parashar. General: - Update LMB associativity index during DLPAR add/remove from Nathan Fontenot - Fix branching to OOL handlers in relocatable kernel from Hari Bathini - Add support for userspace Power9 copy/paste from Chris Smart - Always use STRICT_MM_TYPECHECKS from Michael Ellerman - Add mask of possible MMU features from Michael Ellerman PCI: - Enable pass through of NVLink to guests from Alexey Kardashevskiy - Cleanups in preparation for powernv PCI hotplug from Gavin Shan - Don't report error in eeh_pe_reset_and_recover() from Gavin Shan - Restore initial state in eeh_pe_reset_and_recover() from Gavin Shan - Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell" from Guilherme G Piccoli - Remove the dependency on EEH struct in DDW mechanism from Guilherme G Piccoli selftests: - Test cp_abort during context switch from Chris Smart - Add several tests for transactional memory support from Rashmica Gupta perf: - Add support for sampling interrupt register state from Anju T - Add support for unwinding perf-stackdump from Chandan Kumar cxl: - Configure the PSL for two CAPI ports on POWER8NVL from Philippe Bergheaud - Allow initialization on timebase sync failures from Frederic Barrat - Increase timeout for detection of AFU mmio hang from Frederic Barrat - Handle num_of_processes larger than can fit in the SPA from Ian Munsie - Ensure PSL interrupt is configured for contexts with no AFU IRQs from Ian Munsie - Add kernel API to allow a context to operate with relocate disabled from Ian Munsie - Check periodically the coherent platform function's state from Christophe Lombard Freescale: - Updates from Scott: "Contains 86xx fixes, minor device tree fixes, an erratum workaround, and a kconfig dependency fix." * tag 'powerpc-4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (192 commits) powerpc/86xx: Fix PCI interrupt map definition powerpc/86xx: Move pci1 definition to the include file powerpc/fsl: Fix build of the dtb embedded kernel images powerpc/fsl: Fix rcpm compatible string powerpc/fsl: Remove FSL_SOC dependency from FSL_LBC powerpc/fsl-pci: Add a workaround for PCI 5 errata powerpc/fsl: Fix SPI compatible on t208xrdb and t1040rdb powerpc/powernv/npu: Add PE to PHB's list powerpc/powernv: Fix insufficient memory allocation powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell" powerpc/eeh: Drop unnecessary label in eeh_pe_change_owner() powerpc/eeh: Ignore handlers in eeh_pe_reset_and_recover() powerpc/eeh: Restore initial state in eeh_pe_reset_and_recover() powerpc/eeh: Don't report error in eeh_pe_reset_and_recover() Revert "powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()" powerpc/powernv/npu: Enable NVLink pass through powerpc/powernv/npu: Rework TCE Kill handling powerpc/powernv/npu: Add set/unset window helpers powerpc/powernv/ioda2: Export debug helper pe_level_printk() ...
2016-05-18Merge tag 'trace-v4.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This includes two new updates for the ftrace infrastructure. - With the changing of the code for filtering events by pid, from a list of pids to a bitmask, we can now easily implement following forks. With a new tracing option "event-fork" which, when set, will have tasks with pids in set_event_pid, when they fork, to have their child pids added to set_event_pid and the child will be traced as well. Note, if "event-fork" is set and a task with its pid in set_event_pid exits, its pid will be removed from set_event_pid - The addition of Tom Zanussi's hist triggers. This includes a very thorough documentatino on how to use the hist triggers with events. This introduces a quick and easy way to get histogram data from events and their fields. Some other cleanups and updates were added as well. Like Masami Hiramatsu added test cases for the event trigger and hist triggers. Also I added a speed up of filtering by using a temp buffer when filters are set" * tag 'trace-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (45 commits) tracing: Use temp buffer when filtering events tracing: Remove TRACE_EVENT_FL_USE_CALL_FILTER logic tracing: Remove unused function trace_current_buffer_lock_reserve() tracing: Remove one use of trace_current_buffer_lock_reserve() tracing: Have trace_buffer_unlock_commit() call the _regs version with NULL tracing: Remove unused function trace_current_buffer_discard_commit() tracing: Move trace_buffer_unlock_commit{_regs}() to local header tracing: Fold filter_check_discard() into its only user tracing: Make filter_check_discard() local tracing: Move event_trigger_unlock_commit{_regs}() to local header tracing: Don't use the address of the buffer array name in copy_from_user tracing: Handle tracing_map_alloc_elts() error path correctly tracing: Add check for NULL event field when creating hist field tracing: checking for NULL instead of IS_ERR() tracing: Do not inherit event-fork option for instances tracing: Fix unsigned comparison to zero in hist trigger code kselftests/ftrace: Add a test for log2 modifier of hist trigger tracing: Add hist trigger 'log2' modifier kselftests/ftrace: Add hist trigger testcases kselftests/ftrace : Add event trigger testcases ...
2016-05-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching updates from Jiri Kosina: - remove of our own implementation of architecture-specific relocation code and leveraging existing code in the module loader to perform arch-dependent work, from Jessica Yu. The relevant patches have been acked by Rusty (for module.c) and Heiko (for s390). - live patching support for ppc64le, which is a joint work of Michael Ellerman and Torsten Duwe. This is coming from topic branch that is share between livepatching.git and ppc tree. - addition of livepatching documentation from Petr Mladek * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: make object/func-walking helpers more robust livepatch: Add some basic livepatch documentation powerpc/livepatch: Add live patching support on ppc64le powerpc/livepatch: Add livepatch stack to struct thread_info powerpc/livepatch: Add livepatch header livepatch: Allow architectures to specify an alternate ftrace location ftrace: Make ftrace_location_range() global livepatch: robustify klp_register_patch() API error checking Documentation: livepatch: outline Elf format and requirements for patch modules livepatch: reuse module loader code to write relocations module: s390: keep mod_arch_specific for livepatch modules module: preserve Elf information for livepatch modules Elf: add livepatch-specific Elf constants
2016-05-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights: 1) Support SPI based w5100 devices, from Akinobu Mita. 2) Partial Segmentation Offload, from Alexander Duyck. 3) Add GMAC4 support to stmmac driver, from Alexandre TORGUE. 4) Allow cls_flower stats offload, from Amir Vadai. 5) Implement bpf blinding, from Daniel Borkmann. 6) Optimize _ASYNC_ bit twiddling on sockets, unless the socket is actually using FASYNC these atomics are superfluous. From Eric Dumazet. 7) Run TCP more preemptibly, also from Eric Dumazet. 8) Support LED blinking, EEPROM dumps, and rxvlan offloading in mlx5e driver, from Gal Pressman. 9) Allow creating ppp devices via rtnetlink, from Guillaume Nault. 10) Improve BPF usage documentation, from Jesper Dangaard Brouer. 11) Support tunneling offloads in qed, from Manish Chopra. 12) aRFS offloading in mlx5e, from Maor Gottlieb. 13) Add RFS and RPS support to SCTP protocol, from Marcelo Ricardo Leitner. 14) Add MSG_EOR support to TCP, this allows controlling packet coalescing on application record boundaries for more accurate socket timestamp sampling. From Martin KaFai Lau. 15) Fix alignment of 64-bit netlink attributes across the board, from Nicolas Dichtel. 16) Per-vlan stats in bridging, from Nikolay Aleksandrov. 17) Several conversions of drivers to ethtool ksettings, from Philippe Reynes. 18) Checksum neutral ILA in ipv6, from Tom Herbert. 19) Factorize all of the various marvell dsa drivers into one, from Vivien Didelot 20) Add VF support to qed driver, from Yuval Mintz" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1649 commits) Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m" Revert "phy dp83867: Make rgmii parameters optional" r8169: default to 64-bit DMA on recent PCIe chips phy dp83867: Make rgmii parameters optional phy dp83867: Fix compilation with CONFIG_OF_MDIO=m bpf: arm64: remove callee-save registers use for tmp registers asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions switchdev: pass pointer to fib_info instead of copy net_sched: close another race condition in tcf_mirred_release() tipc: fix nametable publication field in nl compat drivers: net: Don't print unpopulated net_device name qed: add support for dcbx. ravb: Add missing free_irq() calls to ravb_close() qed: Remove a stray tab net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings net: ethernet: fec-mpc52xx: use phydev from struct net_device bpf, doc: fix typo on bpf_asm descriptions stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings net: ethernet: fs-enet: use phydev from struct net_device ...
2016-05-17Merge branch 'for-4.7/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block layer updates from Jens Axboe: "This is the core block IO changes for this merge window. Nothing earth shattering in here, it's mostly just fixes. In detail: - Fix for a long standing issue where wrong ordering in blk-mq caused order_to_size() to spew a warning. From Bart. - Async discard support from Christoph. Basically just splitting our sync interface into a submit + wait part. - Add a cleaner interface for flagging whether a device has a write back cache or not. We've previously overloaded blk_queue_flush() with this, but let's make it more explicit. Drivers cleaned up and updated in the drivers pull request. From me. - Fix for a double check for whether IO accounting is enabled or not. From Michael Callahan. - Fix for the async discard from Mike Snitzer, reinstating the early EOPNOTSUPP return if the device doesn't support discards. - Also from Mike, export bio_inc_remaining() so dm can drop it's private copy of it. - From Ming Lin, add support for passing in an offset for request payloads. - Tag function export from Sagi, which will be used in NVMe in the drivers pull. - Two blktrace related fixes from Shaohua. - Propagate NOMERGE flag when making a request from a bio, also from Shaohua. - An optimization to not parse cgroup paths in blk-throttle, if we don't need to. From Shaohua" * 'for-4.7/core' of git://git.kernel.dk/linux-block: blk-mq: fix undefined behaviour in order_to_size() blk-throttle: don't parse cgroup path if trace isn't enabled blktrace: add missed mask name blktrace: delete garbage for message trace block: make bio_inc_remaining() interface accessible again block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard block: Minor blk_account_io_start usage cleanup block: add __blkdev_issue_discard block: remove struct bio_batch block: copy NOMERGE flag from bio to request block: add ability to flag write back caching on a device blk-mq: Export tagset iter function block: add offset in blk_add_request_payload() writeback: Fix performance regression in wb_over_bg_thresh()
2016-05-17Merge tag 'trace-fixes-v4.6-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing ring-buffer fixes from Steven Rostedt: "Hao Qin reported an integer overflow possibility with signed and unsigned numbers in the ring-buffer code. https://bugzilla.kernel.org/show_bug.cgi?id=118001 At first I did not think this was too much of an issue, because the overflow would be caught later when either too much data was allocated or it would trigger RB_WARN_ON() which shuts down the ring buffer. But looking closer into it, I found that the right settings could bypass the checks and crash the kernel. Luckily, this is only accessible by root. The first fix is to convert all the variables into long, such that we don't get into issues between 32 bit variables being assigned 64 bit ones. This fixes the RB_WARN_ON() triggering. The next fix is to get rid of a duplicate DIV_ROUND_UP() that when called twice with the right value, can cause a kernel crash. The first DIV_ROUND_UP() is to normalize the input and it is checked against the minimum allowable value. But then DIV_ROUND_UP() is called again, which can overflow due to the (a + b - 1)/b, logic. The first called upped the value, the second can overflow (with the +b part). The second call to DIV_ROUND_UP() came in via a second change a while ago and the code is cleaned up to remove it" * tag 'trace-fixes-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ring-buffer: Prevent overflow of size in ring_buffer_resize() ring-buffer: Use long for nr_pages to avoid overflow failures
2016-05-16Merge tag 'pm-4.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "The majority of changes go into the cpufreq subsystem this time. To me, quite obviously, the biggest ticket item is the new "schedutil" governor. Interestingly enough, it's the first new cpufreq governor since the beginning of the git era (except for some out-of-the-tree ones). There are two main differences between it and the existing governors. First, it uses the information provided by the scheduler directly for making its decisions, so it doesn't have to track anything by itself. Second, it can invoke drivers (supporting that feature) to adjust CPU performance right away without having to spawn work items to be executed in process context or similar. Currently, the acpi-cpufreq driver is the only one supporting that mode of operation, but then it is used on a large number of systems. The "schedutil" governor as included here is very simple and mostly regarded as a foundation for future work on the integration of the scheduler with CPU power management (in fact, there is work in progress on top of it already). Nevertheless it works and the preliminary results obtained with it are encouraging. There also is some consolidation of CPU frequency management for ARM platforms that can add their machine IDs the the new stub dt-platdev driver now and that will take care of creating the requisite platform device for cpufreq-dt, so it is not necessary to do that in platform code any more. Several ARM platforms are switched over to using this generic mechanism. In addition to that, the intel_pstate driver is now going to respect CPU frequency limits set by the platform firmware (or a BMC) and provided via the ACPI _PPC object. The devfreq subsystem is getting a new "passive" governor for SoCs subsystems that will depend on somebody else to manage their voltage rails and its support for Samsung Exynos SoCs is consolidated. The rest is support for new hardware (Intel Broxton support in intel_idle for one example), bug fixes, optimizations and cleanups in a number of places. Specifics: - New cpufreq "schedutil" governor (making decisions based on CPU utilization information provided by the scheduler and capable of switching CPU frequencies right away if the underlying driver supports that) and support for fast frequency switching in the acpi-cpufreq driver (Rafael Wysocki) - Consolidation of CPU frequency management on ARM platforms allowing them to get rid of some platform-specific boilerplate code if they are going to use the cpufreq-dt driver (Viresh Kumar, Finley Xiao, Marc Gonzalez) - Support for ACPI _PPC and CPU frequency limits in the intel_pstate driver (Srinivas Pandruvada) - Fixes and cleanups in the cpufreq core and generic governor code (Rafael Wysocki, Sai Gurrappadi) - intel_pstate driver optimizations and cleanups (Rafael Wysocki, Philippe Longepe, Chen Yu, Joe Perches) - cpufreq powernv driver fixes and cleanups (Akshay Adiga, Shilpasri Bhat) - cpufreq qoriq driver fixes and cleanups (Jia Hongtao) - ACPI cpufreq driver cleanups (Viresh Kumar) - Assorted cpufreq driver updates (Ashwin Chaugule, Geliang Tang, Javier Martinez Canillas, Paul Gortmaker, Sudeep Holla) - Assorted cpufreq fixes and cleanups (Joe Perches, Arnd Bergmann) - Fixes and cleanups in the OPP (Operating Performance Points) framework, mostly related to OPP sharing, and reorganization of OF-dependent code in it (Viresh Kumar, Arnd Bergmann, Sudeep Holla) - New "passive" governor for devfreq (for SoC subsystems that will rely on someone else for the management of their power resources) and consolidation of devfreq support for Exynos platforms, coding style and typo fixes for devfreq (Chanwoo Choi, MyungJoo Ham) - PM core fixes and cleanups, mostly to make it work better with the generic power domains (genpd) framework, and updates for that framework (Ulf Hansson, Thierry Reding, Colin Ian King) - Intel Broxton support for the intel_idle driver (Len Brown) - cpuidle core optimization and fix (Daniel Lezcano, Dave Gerlach) - ARM cpuidle cleanups (Jisheng Zhang) - Intel Kabylake support for the RAPL power capping driver (Jacob Pan) - AVS (Adaptive Voltage Switching) rockchip-io driver update (Heiko Stuebner) - Updates for the cpupower tool (Arjun Sreedharan, Colin Ian King, Mattia Dongili, Thomas Renninger)" * tag 'pm-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (112 commits) intel_pstate: Clean up get_target_pstate_use_performance() intel_pstate: Use sample.core_avg_perf in get_avg_pstate() intel_pstate: Clarify average performance computation intel_pstate: Avoid unnecessary synchronize_sched() during initialization cpufreq: schedutil: Make default depend on CONFIG_SMP cpufreq: powernv: del_timer_sync when global and local pstate are equal cpufreq: powernv: Move smp_call_function_any() out of irq safe block intel_pstate: Clean up intel_pstate_get() cpufreq: schedutil: Make it depend on CONFIG_SMP cpufreq: governor: Fix handling of special cases in dbs_update() PM / OPP: Move CONFIG_OF dependent code in a separate file cpufreq: intel_pstate: Ignore _PPC processing under HWP cpufreq: arm_big_little: use generic OPP functions for {init, free}_opp_table PM / OPP: add non-OF versions of dev_pm_opp_{cpumask_, }remove_table cpufreq: tango: Use generic platdev driver PM / OPP: pass cpumask by reference cpufreq: Fix GOV_LIMITS handling for the userspace governor cpupower: fix potential memory leak PM / devfreq: style/typo fixes PM / devfreq: exynos: Add the detailed correlation for Exynos5422 bus ..
2016-05-16Merge branch 'pm-cpufreq'Rafael J. Wysocki
* pm-cpufreq: (63 commits) intel_pstate: Clean up get_target_pstate_use_performance() intel_pstate: Use sample.core_avg_perf in get_avg_pstate() intel_pstate: Clarify average performance computation intel_pstate: Avoid unnecessary synchronize_sched() during initialization cpufreq: schedutil: Make default depend on CONFIG_SMP cpufreq: powernv: del_timer_sync when global and local pstate are equal cpufreq: powernv: Move smp_call_function_any() out of irq safe block intel_pstate: Clean up intel_pstate_get() cpufreq: schedutil: Make it depend on CONFIG_SMP cpufreq: governor: Fix handling of special cases in dbs_update() cpufreq: intel_pstate: Ignore _PPC processing under HWP cpufreq: arm_big_little: use generic OPP functions for {init, free}_opp_table cpufreq: tango: Use generic platdev driver cpufreq: Fix GOV_LIMITS handling for the userspace governor cpufreq: mvebu: Move cpufreq code into drivers/cpufreq/ cpufreq: dt: Kill platform-data mvebu: Use dev_pm_opp_set_sharing_cpus() to mark OPP tables as shared cpufreq: dt: Identify cpu-sharing for platforms without operating-points-v2 cpufreq: governor: Change confusing struct field and variable names cpufreq: intel_pstate: Enable PPC enforcement for servers ...
2016-05-13ring-buffer: Prevent overflow of size in ring_buffer_resize()Steven Rostedt (Red Hat)
If the size passed to ring_buffer_resize() is greater than MAX_LONG - BUF_PAGE_SIZE then the DIV_ROUND_UP() will return zero. Here's the details: # echo 18014398509481980 > /sys/kernel/debug/tracing/buffer_size_kb tracing_entries_write() processes this and converts kb to bytes. 18014398509481980 << 10 = 18446744073709547520 and this is passed to ring_buffer_resize() as unsigned long size. size = DIV_ROUND_UP(size, BUF_PAGE_SIZE); Where DIV_ROUND_UP(a, b) is (a + b - 1)/b BUF_PAGE_SIZE is 4080 and here 18446744073709547520 + 4080 - 1 = 18446744073709551599 where 18446744073709551599 is still smaller than 2^64 2^64 - 18446744073709551599 = 17 But now 18446744073709551599 / 4080 = 4521260802379792 and size = size * 4080 = 18446744073709551360 This is checked to make sure its still greater than 2 * 4080, which it is. Then we convert to the number of buffer pages needed. nr_page = DIV_ROUND_UP(size, BUF_PAGE_SIZE) but this time size is 18446744073709551360 and 2^64 - (18446744073709551360 + 4080 - 1) = -3823 Thus it overflows and the resulting number is less than 4080, which makes 3823 / 4080 = 0 an nr_pages is set to this. As we already checked against the minimum that nr_pages may be, this causes the logic to fail as well, and we crash the kernel. There's no reason to have the two DIV_ROUND_UP() (that's just result of historical code changes), clean up the code and fix this bug. Cc: stable@vger.kernel.org # 3.5+ Fixes: 83f40318dab00 ("ring-buffer: Make removal of ring buffer pages atomic") Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-05-13ring-buffer: Use long for nr_pages to avoid overflow failuresSteven Rostedt (Red Hat)
The size variable to change the ring buffer in ftrace is a long. The nr_pages used to update the ring buffer based on the size is int. On 64 bit machines this can cause an overflow problem. For example, the following will cause the ring buffer to crash: # cd /sys/kernel/debug/tracing # echo 10 > buffer_size_kb # echo 8556384240 > buffer_size_kb Then you get the warning of: WARNING: CPU: 1 PID: 318 at kernel/trace/ring_buffer.c:1527 rb_update_pages+0x22f/0x260 Which is: RB_WARN_ON(cpu_buffer, nr_removed); Note each ring buffer page holds 4080 bytes. This is because: 1) 10 causes the ring buffer to have 3 pages. (10kb requires 3 * 4080 pages to hold) 2) (2^31 / 2^10 + 1) * 4080 = 8556384240 The value written into buffer_size_kb is shifted by 10 and then passed to ring_buffer_resize(). 8556384240 * 2^10 = 8761737461760 3) The size passed to ring_buffer_resize() is then divided by BUF_PAGE_SIZE which is 4080. 8761737461760 / 4080 = 2147484672 4) nr_pages is subtracted from the current nr_pages (3) and we get: 2147484669. This value is saved in a signed integer nr_pages_to_update 5) 2147484669 is greater than 2^31 but smaller than 2^32, a signed int turns into the value of -2147482627 6) As the value is a negative number, in update_pages_handler() it is negated and passed to rb_remove_pages() and 2147482627 pages will be removed, which is much larger than 3 and it causes the warning because not all the pages asked to be removed were removed. Link: https://bugzilla.kernel.org/show_bug.cgi?id=118001 Cc: stable@vger.kernel.org # 2.6.28+ Fixes: 7a8e76a3829f1 ("tracing: unified trace buffer") Reported-by: Hao Qin <QEver.cn@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-05-11Merge branch 'perf/urgent' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-10blktrace: add missed mask nameShaohua Li
BLK_TC_NOTIFY is missed in mask_maps, so we can't print out notify or set mask with 'notify' name. Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-10blktrace: delete garbage for message traceShaohua Li
commit f4a1d08ce65 introduces a regression. Originally for BLK_TN_MESSAGE, we add message in trace and return. The commit ignores the early return and add garbage info. Signed-off-by: Shaohua Li <shli@fb.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
In netdevice.h we removed the structure in net-next that is being changes in 'net'. In macsec.c and rtnetlink.c we have overlaps between fixes in 'net' and the u64 attribute changes in 'net-next'. The mlx5 conflicts have to do with vxlan support dependencies. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-03tracing: Use temp buffer when filtering eventsSteven Rostedt (Red Hat)
Filtering of events requires the data to be written to the ring buffer before it can be decided to filter or not. This is because the parameters of the filter are based on the result that is written to the ring buffer and not on the parameters that are passed into the trace functions. The ftrace ring buffer is optimized for writing into the ring buffer and committing. The discard procedure used when filtering decides the event should be discarded is much more heavy weight. Thus, using a temporary filter when filtering events can speed things up drastically. Without a temp buffer we have: # trace-cmd start -p nop # perf stat -r 10 hackbench 50 0.790706626 seconds time elapsed ( +- 0.71% ) # trace-cmd start -e all # perf stat -r 10 hackbench 50 1.566904059 seconds time elapsed ( +- 0.27% ) # trace-cmd start -e all -f 'common_preempt_count==20' # perf stat -r 10 hackbench 50 1.690598511 seconds time elapsed ( +- 0.19% ) # trace-cmd start -e all -f 'common_preempt_count!=20' # perf stat -r 10 hackbench 50 1.707486364 seconds time elapsed ( +- 0.30% ) The first run above is without any tracing, just to get a based figure. hackbench takes ~0.79 seconds to run on the system. The second run enables tracing all events where nothing is filtered. This increases the time by 100% and hackbench takes 1.57 seconds to run. The third run filters all events where the preempt count will equal "20" (this should never happen) thus all events are discarded. This takes 1.69 seconds to run. This is 10% slower than just committing the events! The last run enables all events and filters where the filter will commit all events, and this takes 1.70 seconds to run. The filtering overhead is approximately 10%. Thus, the discard and commit of an event from the ring buffer may be about the same time. With this patch, the numbers change: # trace-cmd start -p nop # perf stat -r 10 hackbench 50 0.778233033 seconds time elapsed ( +- 0.38% ) # trace-cmd start -e all # perf stat -r 10 hackbench 50 1.582102692 seconds time elapsed ( +- 0.28% ) # trace-cmd start -e all -f 'common_preempt_count==20' # perf stat -r 10 hackbench 50 1.309230710 seconds time elapsed ( +- 0.22% ) # trace-cmd start -e all -f 'common_preempt_count!=20' # perf stat -r 10 hackbench 50 1.786001924 seconds time elapsed ( +- 0.20% ) The first run is again the base with no tracing. The second run is all tracing with no filtering. It is a little slower, but that may be well within the noise. The third run shows that discarding all events only took 1.3 seconds. This is a speed up of 23%! The discard is much faster than even the commit. The one downside is shown in the last run. Events that are not discarded by the filter will take longer to add, this is due to the extra copy of the event. Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-05-03tracing: Don't display trigger file for events that can't be enabledChunyu Hu
Currently register functions for events will be called through the 'reg' field of event class directly without any check when seting up triggers. Triggers for events that don't support register through debug fs (events under events/ftrace are for trace-cmd to read event format, and most of them don't have a register function except events/ftrace/functionx) can't be enabled at all, and an oops will be hit when setting up trigger for those events, so just not creating them is an easy way to avoid the oops. Link: http://lkml.kernel.org/r/1462275274-3911-1-git-send-email-chuhu@redhat.com Cc: stable@vger.kernel.org # 3.14+ Fixes: 85f2b08268c01 ("tracing: Add basic event trigger framework") Signed-off-by: Chunyu Hu <chuhu@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-05-02tracing: Remove TRACE_EVENT_FL_USE_CALL_FILTER logicSteven Rostedt (Red Hat)
Nothing sets TRACE_EVENT_FL_USE_CALL_FILTER anymore. Remove it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-29tracing: Remove unused function trace_current_buffer_lock_reserve()Steven Rostedt (Red Hat)
trace_current_buffer_lock_reserve() has no more users. Remove it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-29tracing: Remove one use of trace_current_buffer_lock_reserve()Steven Rostedt (Red Hat)
The only user of trace_current_buffer_lock_reserve() is in the boot up self tests. Restructure the code a little to have that code use what everything else uses: trace_event_buffer_lock_reserve(). Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-29tracing: Have trace_buffer_unlock_commit() call the _regs version with NULLSteven Rostedt (Red Hat)
There's no real difference between trace_buffer_unlock_commit() and trace_buffer_unlock_commit_regs() except that the former passes NULL to ftrace_stack_trace() instead of regs. Have the former be a static inline of the latter which passes NULL for regs. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-29tracing: Remove unused function trace_current_buffer_discard_commit()Steven Rostedt (Red Hat)
The function trace_current_buffer_discard_commit() has no callers, remove it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-29tracing: Move trace_buffer_unlock_commit{_regs}() to local headerSteven Rostedt (Red Hat)
The functions trace_buffer_unlock_commit() and the _regs() version are only used within the kernel/trace directory. Move them to the local header and remove the export as well. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-29tracing: Fold filter_check_discard() into its only userSteven Rostedt (Red Hat)
The function filter_check_discard() is small and only called by one user, its code can be folded into that one caller and make the code a bit less comlplex. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2016-04-27tracing: Make filter_check_discard() localSteven Rostedt (Red Hat)
Nothing outside of the tracing directory calls filter_check_discard() or check_filter_check_discard(). They should not be called by modules. Move their prototypes into the local tracing header and remove their EXPORT_SYMBOL() macros. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>