summaryrefslogtreecommitdiff
path: root/tools/perf/util
AgeCommit message (Collapse)Author
2023-04-06perf lock contention: Do not try to update if hash map is fullNamhyung Kim
It doesn't delete data in the task_data and lock_stat maps. The data is kept there until it's consumed by userspace at the end. But it calls bpf_map_update_elem() again and again, and the data will be discarded if the map is full. This is not good. Worse, in the bpf_map_update_elem(), it keeps trying to get a new node even if the map was full. I guess it makes sense if it deletes some node like in the tstamp map (that's why I didn't make the change there). In a pre-allocated hash map, that means it'd iterate all CPU to check the freelist. And it has a bad performance impact on large machines. I've checked it on my 64 CPU machine with this. $ perf bench sched messaging -g 1000 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 1000 groups == 40000 processes run Total time: 2.825 [sec] And I used the task mode, so that it can guarantee the map is full. The default map entry size is 16K and this workload has 40K tasks. Before: $ sudo ./perf lock con -abt -E3 -- perf bench sched messaging -g 1000 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 1000 groups == 40000 processes run Total time: 11.299 [sec] contended total wait max wait avg wait pid comm 19284 3.51 s 3.70 ms 181.91 us 1305863 sched-messaging 243 84.09 ms 466.67 us 346.04 us 1336608 sched-messaging 177 66.35 ms 12.08 ms 374.88 us 1220416 node For some reason, it didn't report the data failures. But you can see the total time in the workload is increased a lot (2.8 -> 11.3). If it fails early when the map is full, it goes back to normal. After: $ sudo ./perf lock con -abt -E3 -- perf bench sched messaging -g 1000 # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 1000 groups == 40000 processes run Total time: 3.044 [sec] contended total wait max wait avg wait pid comm 18743 591.92 ms 442.96 us 31.58 us 1431454 sched-messaging 51 210.64 ms 207.45 ms 4.13 ms 1468724 sched-messaging 81 68.61 ms 65.79 ms 847.07 us 1463183 sched-messaging === output for debug === bad: 1164137, total: 2253341 bad rate: 51.66 % histogram of failure reasons task: 0 stack: 0 time: 0 data: 1164137 Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-06perf lock contention: Revise needs_callstack() conditionNamhyung Kim
It needs callstacks for two reasons: * for stack aggregation mode, the map key is the stack id and it can also show the full stack traces when -v is used * for other aggregation modes, the stack filter can be used to limit lock contentions from known call paths The -v option is meaningful (in terms of stack trace) only for stack aggregation mode, so it should not set the save_callstack for other mode like with -t or -l options. I've noticed this with the following command line: $ sudo ./perf lock con -ablv -E 3 -M 16 -- ./perf bench sched messaging ... contended total wait max wait avg wait address symbol 88 4.59 ms 108.07 us 52.13 us ffff935757f46ec0 (spinlock) 33 905.22 us 73.67 us 27.43 us ffff935757f41700 (spinlock) 28 703.69 us 79.28 us 25.13 us ffff938a3d9b0c80 rq_lock (spinlock) === output for debug === bad: 12272, total: 12421 bad rate: 98.80 % histogram of failure reasons task: 8285 stack: 3987 <---------- here time: 0 data: 0 It should not have any failure on stacks since it doesn't use it. No functional change intended. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-06perf lock contention: Update total/bad stats for hidden entriesNamhyung Kim
When -E option is used, it only prints the given number of entries but the event stat at the end should have the numbers for entire entries. Likewise, -S option will hide entries that don't have the named function in the callstack. Also update event stat for them. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-06perf lock contention: Add data failure statNamhyung Kim
It's possible to fail to update the data when the lock_stat map is full. We should check that case and show the number at the end. $ sudo ./perf lock con -ablv -E3 -- ./perf bench sched messaging ... contended total wait max wait avg wait address symbol 6157 208.48 ms 69.29 us 33.86 us ffff934c001c1f00 (spinlock) 4030 72.04 ms 61.84 us 17.88 us ffff934c000415c0 (spinlock) 3201 50.30 ms 47.73 us 15.71 us ffff934c2eead850 (spinlock) === output for debug === bad: 0, total: 13388 bad rate: 0.00 % histogram of failure reasons task: 0 stack: 0 time: 0 data: 0 <----- added Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-06perf lock contention: Update default map size to 16384Namhyung Kim
The BPF hash map will align the map size to a power of 2. So 10k would be 16k anyway. Let's have the actual size to avoid confusions. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-06tools: Rename __fallthrough to fallthroughLiam Howlett
Rename the fallthrough attribute to better align with the kernel version. Copy the definition from include/linux/compiler_attributes.h including the #else clause. Adding the #else clause allows the tools compiler.h header to drop the check for a definition entirely and keeps both definitions together. Change any __fallthrough statements to fallthrough anywhere it was used within perf. This allows other tools to use the same key word as the kernel. Committer notes: Did some missing conversions to: builtin-list.c Also included gtk.h before the 'fallthrough' definition in: tools/perf/ui/gtk/hists.c tools/perf/ui/gtk/helpline.c tools/perf/ui/gtk/browser.c As it is the arg name for a macro in glib.h: /var/home/acme/git/perf-tools-next/tools/include/linux/compiler-gcc.h:16:55: error: missing binary operator before token "(" 16 | # define fallthrough __attribute__((__fallthrough__)) | ^ /usr/include/glib-2.0/glib/gmacros.h:637:28: note: in expansion of macro ‘fallthrough’ 637 | #if g_macro__has_attribute(fallthrough) Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Tom Rix <trix@redhat.com> Cc: linux-sparse@vger.kernel.org <linux-sparse@vger.kernel.org> Cc: llvm@lists.linux.dev <llvm@lists.linux.dev> Link: https://lore.kernel.org/r/20221125154947.2163498-1-Liam.Howlett@oracle.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-06perf pmu: Fix a few potential fd leaksIan Rogers
Ensure fd is closed on error paths. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20230406065224.2553640-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-06perf pmu: Make parser reentrantIan Rogers
By default bison uses global state for compatibility with yacc. Make the parser reentrant so that it may be used in asynchronous and multithreaded situations. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Gaosheng Cui <cuigaosheng1@huawei.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Rob Herring <robh@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20230406065224.2553640-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf map: Add accessor for start and endIan Rogers
Later changes will add reference count checking for struct map, start and end are frequently accessed variables. Add an accessor so that the reference count check is only necessary in one place. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf map: Add accessor for dsoIan Rogers
Later changes will add reference count checking for struct map, with dso being the most frequently accessed variable. Add an accessor so that the reference count check is only necessary in one place. Additional changes: - add a dso variable to avoid repeated map__dso calls. - in builtin-mem.c dump_raw_samples, code only partially tested for dso == NULL. Make the possibility of NULL consistent. - in thread.c thread__memcpy fix use of spaces and use tabs. Committer notes: Did missing conversions on these files: tools/perf/arch/powerpc/util/skip-callchain-idx.c tools/perf/arch/powerpc/util/sym-handling.c tools/perf/ui/browsers/hists.c tools/perf/ui/gtk/annotate.c tools/perf/util/cs-etm.c tools/perf/util/thread.c tools/perf/util/unwind-libunwind-local.c tools/perf/util/unwind-libunwind.c Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf maps: Add functions to access mapsIan Rogers
Introduce functions to access struct maps. These functions reduce the number of places reference counting is necessary. While tidying APIs do some small const-ification, in particlar to unwind_libunwind_ops. Committer notes: Fixed up tools/perf/util/unwind-libunwind.c: - return ops->get_entries(cb, arg, thread, data, max_stack); + return ops->get_entries(cb, arg, thread, data, max_stack, best_effort); Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf maps: Remove rb_node from struct mapIan Rogers
struct map is reference counted, having it also be a node in an red-black tree complicates the reference counting. Switch to having a map_rb_node which is a red-block tree node but points at the reference counted struct map. This reference is responsible for a single reference count. Committer notes: Fixed up tools/perf/util/unwind-libunwind-local.c to use map_rb_node as well. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf map: Move map list node into symbolIan Rogers
Using a perf map as a list node is only done in symbol. Move the list_node struct into symbol as a single pointer to the map. This makes reference count behavior more obvious and easy to check. Committer notes: Some changes to reduce the number of lines touched by keeping, for instance, the 'new_map' variable and setting it to new_node->map, so that we keep more of the project history in place and keep as much as possible the value of the 'git blame' tool. Also use map__zput() when putting a struct members, so that when we free the container struct we can get use-after-free errors as NULL pointer derefs sometimes. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf jit: Fix a few memory leaksIan Rogers
As reported by leak sanitizer. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Brian Robbins <brianrob@linux.microsoft.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yuan Can <yuancan@huawei.com> Link: https://lore.kernel.org/r/20230403203545.1872196-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf srcline: Avoid addr2line SIGPIPEsIan Rogers
Ignore SIGPIPEs when addr2line is configured. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Rix <trix@redhat.com> Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230403184033.1836023-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf srcline: Support for llvm-addr2lineIan Rogers
The sentinel value differs for llvm-addr2line. Configure this once and then detect when reading records. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Rix <trix@redhat.com> Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230403184033.1836023-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf srcline: Simplify addr2line subprocessIan Rogers
Don't wrap stdin and stdout of subprocess with streams, use the api/io library for buffering. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Rix <trix@redhat.com> Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230403184033.1836023-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf pmu: Add perf_pmu__{open,scan}_file_at()Namhyung Kim
These two helpers will also use openat() to reduce the overhead with relative pathnames. Convert other functions in pmu_lookup() to use the new helpers. Committer testing: Before: ⬢[acme@toolbox perf-tools-next]$ perf bench internals pmu-scan # Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 100 times Average PMU scanning took: 2729.040 usec (+- 7.117 usec) ⬢[acme@toolbox perf-tools-next]$ After: ⬢[acme@toolbox perf-tools-next]$ perf bench internals pmu-scan # Running 'internals/pmu-scan' benchmark: Computing performance of sysfs PMU event scan for 100 times Average PMU scanning took: 2419.870 usec (+- 9.057 usec) ⬢[acme@toolbox perf-tools-next]$ Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230331202949.810326-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf pmu: Use relative path in perf_pmu__caps_parse()Namhyung Kim
Likewise, it needs to traverse the pmu/caps directory, let's use openat() with the dirfd instead of open() using the absolute path. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20230331202949.810326-2-namhyung@kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: LKML <linux-kernel@vger.kernel.org> Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf pmu: Use relative path for sysfs scanNamhyung Kim
The PMU information is in the kernel sysfs so it needs to scan the directory to get the whole information like event aliases, formats and so on. During the traversal, it opens a lot of files and directories like below: dir = opendir("/sys/bus/event_source/devices"); while (dentry = readdir(dir)) { char buf[PATH_MAX]; snprintf(buf, sizeof(buf), "%s/%s", "/sys/bus/event_source/devices", dentry->d_name); fd = open(buf, O_RDONLY); ... } But this is not good since it needs to copy the string to build the absolute pathname, and it makes redundant pathname walk (from the /sys) unnecessarily. We can use openat(2) to open the file in the given directory. While it's not a problem ususally, it can be a problem when the kernel has contentions on the sysfs. Add a couple of new helper to return the file descriptor of PMU directory so that it can use it with relative paths. * perf_pmu__event_source_devices_fd() - returns a fd for the PMU root ("/sys/bus/event_source/devices") * perf_pmu__pathname_fd() - returns a fd for "<pmu>/<file>" under the PMU root Now the above code can be converted something like below: dirfd = perf_pmu__event_source_devices_fd(); dir = fdopendir(dirfd); while (dentry = readdir(dir)) { fd = openat(dirfd, dentry->d_name, O_RDONLY); ... } Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230331202949.810326-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf pmu: Add perf_pmu__destroy() functionNamhyung Kim
It seems there's no function to delete the perf pmu struct. Add the perf_pmu__destroy() to do the job. While at it, add some more helper functions to delete pmu aliases and caps. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230331202949.810326-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf tools: Fix a asan issue in parse_events_multi_pmu_add()Namhyung Kim
In the parse_events_multi_pmu_add() it passes the 'config' variable twice to parse_events_term__num() - one for config and another for loc_term. I'm not sure about the second one as it's converted to YYLTYPE variable. Asan reports it like below: In function ‘parse_events_term__num’, inlined from ‘parse_events_multi_pmu_add’ at util/parse-events.c:1602:6: util/parse-events.c:2653:64: error: array subscript ‘YYLTYPE[0]’ is partly outside array bounds of ‘char[8]’ [-Werror=array-bounds] 2653 | .err_term = loc_term ? loc_term->first_column : 0, | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~ util/parse-events.c: In function ‘parse_events_multi_pmu_add’: util/parse-events.c:1587:15: note: object ‘config’ of size 8 1587 | char *config; | ^~~~~~ cc1: all warnings being treated as errors Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230331202949.810326-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf list: Use relative path for tracepoint scanNamhyung Kim
Committer notes: Added missing #include <unistd.h> for the close() prototype to fix this on Alma Linux 8: 1 21.54 almalinux:8 : FAIL gcc version 8.5.0 20210514 (Red Hat 8.5.0-16) (GCC) util/print-events.c: In function 'print_tracepoint_events': util/print-events.c:103:4: error: implicit declaration of function 'close'; did you mean 'clone'? [-Werror=implicit-function-declaration] close(evt_fd); ^~~~~ clone Also use the newly added scandirat feature test to check if that function is available, providing a HAVE_SCANDIRAT_SUPPORT conditional warning to the user if it isn't available. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230331202949.810326-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf intel-pt: Fix CYC timestamps after standalone CBRAdrian Hunter
After a standalone CBR (not associated with TSC), update the cycles reference timestamp and reset the cycle count, so that CYC timestamps are calculated relative to that point with the new frequency. Fixes: cc33618619cefc6d ("perf tools: Add Intel PT support for decoding CYC packets") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230403154831.8651-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf auxtrace: Fix address filter entire kernel sizeAdrian Hunter
kallsyms is not completely in address order. In find_entire_kern_cb(), calculate the kernel end from the maximum address not the last symbol. Example: Before: $ sudo cat /proc/kallsyms | grep ' [twTw] ' | tail -1 ffffffffc00b8bd0 t bpf_prog_6deef7357e7b4530 [bpf] $ sudo cat /proc/kallsyms | grep ' [twTw] ' | sort | tail -1 ffffffffc15e0cc0 t iwl_mvm_exit [iwlmvm] $ perf.d093603a05aa record -v --kcore -e intel_pt// --filter 'filter *' -- uname |& grep filter Address filter: filter 0xffffffff93200000/0x2ceba000 After: $ perf.8fb0f7a01f8e record -v --kcore -e intel_pt// --filter 'filter *' -- uname |& grep filter Address filter: filter 0xffffffff93200000/0x2e3e2000 Fixes: 1b36c03e356936d6 ("perf record: Add support for using symbols in address filters") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230403154831.8651-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf arm-spe: Add raw decoding for SPEv1.3 MTE and MOPS load/storeRob Herring
Arm SPEv1.3 adds new load/store operation subclasses for Memory Tagging Extension (MTE) and memory operations (MOPS). The memory operations are memcpy and memset. Add support for decoding these new subclasses in the raw decoding. Reviewed-by: Leo Yan <leo.yan@linaro.org Signed-off-by: Rob Herring <robh@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230327162057.4057188-1-robh@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf cs-etm: Handle PERF_RECORD_AUX_OUTPUT_HW_ID packetMike Leach
When using dynamically assigned CoreSight trace IDs the drivers can output the ID / CPU association as a PERF_RECORD_AUX_OUTPUT_HW_ID packet. Update cs-etm decoder to handle this packet by setting the CPU/Trace ID mapping. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Mike Leach <mike.leach@linaro.org> Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Darren Hart <darren@os.amperecomputing.com> Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230331055645.26918-2-mike.leach@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf cs-etm: Move mapping of Trace ID and cpu into helper functionMike Leach
The information to associate Trace ID and CPU will be changing. Drivers will start outputting this as a hardware ID packet in the data file which if present will be used in preference to the AUXINFO values. To prepare for this we provide a helper functions to do the individual ID mapping, and one to extract the IDs from the completed metadata blocks. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Mike Leach <mike.leach@linaro.org> Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Darren Hart <darren@os.amperecomputing.com> Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230331055645.26918-2-mike.leach@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf lock contention: Show detail failure reason for BPFNamhyung Kim
It can fail to collect lock stat from BPF for various reasons. For example, I've got a report that sometimes time calculation seems wrong in case of contended spinlocks. I suspect the time delta went negative for some reason. Count them separately and show in the output like below: $ sudo perf lock contention -abE5 sleep 10 contended total wait max wait avg wait type caller 13 785.61 us 79.36 us 60.43 us spinlock remove_wait_queue+0x14 10 469.02 us 87.51 us 46.90 us spinlock prepare_to_wait+0x27 9 289.09 us 69.08 us 32.12 us spinlock finish_wait+0x36 114 251.05 us 8.56 us 2.20 us spinlock try_to_wake_up+0x1f5 132 188.63 us 5.01 us 1.43 us spinlock __wake_up_common_lock+0x62 === output for debug === bad: 1, total: 279 bad rate: 0.36 % histogram of failure reasons task: 1 stack: 0 time: 0 Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hao Luo <haoluo@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20230327225711.245738-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf symbol: Remove unused branch_callstackAdrian Hunter
branch_callstack was added by commit 8b7bad58efb7 ("perf callchain: Support handling complete branch stacks as histograms") but never used. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20230330131833.12864-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf block-range: Move debug code behind ifndef NDEBUGIan Rogers
Make good on a comment and avoid a unused-but-set-variable warning. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20230330183827.1412303-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf symbol: Add command line support for addr2line pathIan Rogers
Allow addr2line to be set either on the command line or via the perfconfig file. This doesn't currently work with llvm-addr2line as the addr2line code emits two things: 1) the address to decode, 2) a bogus ',' value. The expectation is the bogus value will generate: ?? ??:0 that terminates the addr2line reading. However, the output from llvm-addr2line is a single line with just the input ',' locking up the addr2line reading that is expecting a second line. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andres Freund <andres@anarazel.de> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Tom Rix <trix@redhat.com> Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230328235543.1082207-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf annotate: Allow objdump to be set in perfconfigIan Rogers
Allow the setting of the objdump command in the perfconfig. Update man page for this new option. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andres Freund <andres@anarazel.de> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Tom Rix <trix@redhat.com> Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230328235543.1082207-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf annotate: Own objdump_path and disassembler_style stringsIan Rogers
Make struct annotation_options own the strings objdump_path and disassembler_style, freeing them on exit. Add missing strdup for disassembler_style when read from a config file. Committer notes: Converted free(obj->member) to zfree(&obj->member) in annotation_options__exit() Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andres Freund <andres@anarazel.de> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Tom Rix <trix@redhat.com> Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230328235543.1082207-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf annotate: Add init/exit to annotation_options remove defaultIan Rogers
The annotation__default_options global variable was used to initialize annotation_options. Switch to the init/exit pattern as later changes will give ownership over strings and this will be necessary to avoid memory leaks. Committer note: Fix the GTK2=1 build, hist_entry__gtk_annotate() needs to receive a 'struct annotation_options' pointer. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andres Freund <andres@anarazel.de> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Tom Rix <trix@redhat.com> Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20230328235543.1082207-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf tools: Avoid warning in do_realloc_array_as_needed()Adrian Hunter
do_realloc_array_as_needed() used memcpy() of zero size with a NULL pointer. Check the size first to avoid sanitize warning. Discovered using EXTRA_CFLAGS="-fsanitize=undefined -fsanitize=address". Reported-by: kernel test robot <yujie.liu@intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/oe-lkp/202303061424.6ad43294-yujie.liu@intel.com Link: https://lore.kernel.org/r/20230316194156.8320-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf symbols: Fix unaligned access in get_x86_64_plt_disp()Adrian Hunter
Use memcpy() to avoid unaligned access. Discovered using EXTRA_CFLAGS="-fsanitize=undefined -fsanitize=address". Fixes: ce4c8e7966f317ef ("perf symbols: Get symbols for .plt.got for x86-64") Reported-by: kernel test robot <yujie.liu@intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/oe-lkp/202303061424.6ad43294-yujie.liu@intel.com Link: https://lore.kernel.org/r/20230316194156.8320-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf symbols: Fix use-after-free in get_plt_got_name()Adrian Hunter
Fix use-after-free in get_plt_got_name(). Discovered using EXTRA_CFLAGS="-fsanitize=undefined -fsanitize=address". Fixes: ce4c8e7966f317ef ("perf symbols: Get symbols for .plt.got for x86-64") Reported-by: kernel test robot <yujie.liu@intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/oe-lkp/202303061424.6ad43294-yujie.liu@intel.com Link: https://lore.kernel.org/r/20230316194156.8320-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf metrics: Add has_pmem literalIan Rogers
Add literal so that if nvdimms aren't installed we can record fewer events. The file detection mechanism was suggested by Dan Williams <dan.j.williams@intel.com> in: https://lore.kernel.org/linux-perf-users/641bbe1eced26_1b98bb29440@dwillia2-xfh.jf.intel.com.notmuch/ Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Edward Baker <edward.baker@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20230324072218.181880-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf lock contention: Fix msan issue in lock_contention_read()Namhyung Kim
I got a report of a msan failure like below: $ sudo perf lock con -ab -- sleep 1 ... ==224416==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x5651160d6c96 in lock_contention_read util/bpf_lock_contention.c:290:8 #1 0x565115f90870 in __cmd_contention builtin-lock.c:1919:3 #2 0x565115f90870 in cmd_lock builtin-lock.c:2385:8 #3 0x565115f03a83 in run_builtin perf.c:330:11 #4 0x565115f03756 in handle_internal_command perf.c:384:8 #5 0x565115f02d53 in run_argv perf.c:428:2 #6 0x565115f02d53 in main perf.c:562:3 #7 0x7f43553bc632 in __libc_start_main #8 0x565115e865a9 in _start It was because the 'key' variable is not initialized. Actually it'd be set by bpf_map_get_next_key() but msan didn't seem to understand it. Let's make msan happy by initializing the variable. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230324001922.937634-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf hist: Improve srcfile sort key performance (really)Namhyung Kim
The earlier commit f0cdde28fecc0d7f ("perf hist: Improve srcfile sort key performance") updated the srcfile logic but missed to change the ->cmp() callback which is called for every sample. It should use the same logic like in the srcline to speed up the processing because it'd return the same information repeatedly for the same address. The real processing will be done in sort__srcfile_collapse(). Fixes: f0cdde28fecc0d7f ("perf hist: Improve srcfile sort key performance") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230323025005.191239-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04perf report: Append inlines to non-DWARF callchainsArtem Savkov
Append information about inlined functions to FP and LBR callchains from DWARF debuginfo when available. Do so by calling append_inlines() from add_callchain_ip(). Testing it: Frame-pointer mode recorded with 'perf record --call-graph=fp --freq=max -- ./a.out' #include <stdio.h> #include <stdint.h> static __attribute__((noinline)) uint32_t func5(uint32_t i) { return i + 10; } static uint32_t func4(uint32_t i) { return func5(i + 5); } static inline uint32_t func3(uint32_t i) { return func4(i + 4); } static __attribute__((noinline)) uint32_t func2(uint32_t i) { return func3(i + 3); } static uint32_t func1(uint32_t i) { return func2(i + 2); } __attribute__((noinline)) uint64_t entry(void) { uint64_t ret = 0; uint32_t i = 0; for (i = 0; i < 1000000; i++) { ret += func1(i); ret -= func2(i); ret += func3(i); ret += func4(i); ret -= func5(i); } return ret; } int main(int argc, char **argv) { printf("%s\n", __func__); return entry(); } ====== Here is the output I get with '--call-graph callee --no-children' ====== # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 250 of event 'cycles:u' # Event count (approx.): 26819859 # # Overhead Command Shared Object Symbol # ........ ....... .................... ..................................... # 43.58% a.out a.out [.] func5 | |--28.93%--entry | main | __libc_start_call_main | --14.65%--func4 (inlined) | |--10.45%--entry | main | __libc_start_call_main | --4.20%--func3 (inlined) entry main __libc_start_call_main 38.80% a.out a.out [.] entry | |--23.27%--func4 (inlined) | | | |--20.28%--func3 (inlined) | | func2 | | main | | __libc_start_call_main | | | --2.99%--entry | main | __libc_start_call_main | |--8.17%--func5 | main | __libc_start_call_main | |--3.89%--func1 (inlined) | entry | main | __libc_start_call_main | --3.48%--entry main __libc_start_call_main 13.07% a.out a.out [.] func2 | ---func5 main __libc_start_call_main 1.54% a.out [unknown] [k] 0xffffffff81e011b7 1.16% a.out [unknown] [k] 0xffffffff81e00193 | --0.57%--__mmap64 (inlined) __mmap64 (inlined) 0.34% a.out ld-linux-x86-64.so.2 [.] __tunable_get_val 0.34% a.out ld-linux-x86-64.so.2 [.] strcmp 0.32% a.out libc.so.6 [.] strchr 0.31% a.out ld-linux-x86-64.so.2 [.] _dl_relocate_object 0.22% a.out ld-linux-x86-64.so.2 [.] _dl_init_paths 0.18% a.out ld-linux-x86-64.so.2 [.] get_common_cache_info.constprop.0 0.14% a.out ld-linux-x86-64.so.2 [.] __GI___tunables_init # # (Tip: Show individual samples with: perf script) # ====== It does not seem to be out of order, or at least it is consistent with what I get with dwarf unwinders. Committer notes: Adrian Hunter pointed out that this breaks --branch-history, so don't do it for branches, see the second Link below. Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Signed-off-by: <asavkov@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230316133557.868731-2-asavkov@redhat.com Link: https://lore.kernel.org/r/54129783-2960-84e1-05e9-97ac70ffb432@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-21perf tools: Add support for perf_event_attr::config3Rob Herring
perf_event_attr has gained a new field, config3, so add support for it extending the existing configN support. Signed-off-by: Rob Herring <robh@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20220914-arm-perf-tool-spe1-2-v2-v5-2-2cf5210b2f77@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-21perf kvm: Reference count 'struct kvm_info'Leo Yan
hists__add_entry_ops() doesn't allocate a new histogram entry if it has an existing entry for a KVM event, in this case, find_create_kvm_event() allocates a 'struct kvm_info' but it's not used by any histograms and never freed. To fix the memory leak, this patch first introduces a refcnt and a set of functions for refcnt operations on 'struct kvm_info'. When the data structure is not anymore used (the refcnt hits zero) kvm_info__zput() will free the memory used. Committer: Provide a nop version of kvm_info__zput() to be used when HAVE_KVM_STAT_SUPPORT isn't defined as it is used unconditionally in hists__findnew_entry() and hist_entry__delete(). Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230320061619.29520-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-20perf report: Add 'simd' sort fieldGerman Gomez
Add 'simd' sort field to visualize SIMD ops in 'perf report'. Rows are labeled with the SIMD ISA, and the type of predicate (if any): - [p] partial predicate - [e] empty predicate (no elements in the vector being used) Example with Arm SPE and SVE (Scalable Vector Extension): #include <arm_sve.h> double src[1025], dst[1025]; int main(void) { svfloat64_t vc = svdup_f64(1); for(;;) for(int i = 0; i < 1025; i += svcntd()) { svbool_t pg = svwhilelt_b64(i, 1025); svfloat64_t vsrc = svld1(pg, &src[i]); svfloat64_t vdst = svadd_x(pg, vsrc, vc); svst1(pg, &dst[i], vdst); } return 0; } ... compiled using "gcc-11 -march=armv8-a+sve -O3" Profiling on a platform that implements FEAT_SVE and FEAT_SPEv1p1: $ perf record -e arm_spe_0// -- ./a.out $ perf report --itrace=i1i -s overhead,pid,simd,sym Overhead Pid:Command Simd Symbol ........ ................ ....... ...................... 53.76% 10758:program [.] main 46.14% 10758:program [.] SVE [.] main 0.09% 10758:program [p] SVE [.] main The report shows 0.09% of the sampled SVE operations use partial predicates due to src and dst arrays not being multiples of the vector register lengths. Signed-off-by: German Gomez <german.gomez@arm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman.Khandual@arm.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230320151509.1137462-2-james.clark@arm.com Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-20perf arm-spe: Add SVE flags to the SPE samplesGerman Gomez
Add flags from the Scalable Vector Extension (SVE) to the SPE samples which are available from Armv8.3 (FEAT_SPEv1p1). These will be displayed in a new SIMD sort field in a later commit. Signed-off-by: German Gomez <german.gomez@arm.com> Signed-off-by: James Clark <james.clark@arm.com> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20230320151509.1137462-2-james.clark@arm.com Cc: Anshuman.Khandual@arm.com Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: linux-arm-kernel@lists.infradead.org Cc: John Garry <john.g.garry@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-20perf arm-spe: Refactor arm-spe to support operation packet typeGerman Gomez
Extend the decoder of Arm SPE records to support more fields from the operation packet type. Not all fields are being decoded by this commit. Only those needed to support the use-case SVE load/store/other operations. Suggested-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: German Gomez <german.gomez@arm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman.Khandual@arm.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230320151509.1137462-2-james.clark@arm.com Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-20perf event: Add 'simd_flags' field to 'struct perf_sample'German Gomez
Add new field to 'struct perf_sample' to store flags related to SIMD ops. It will be used to store SIMD information from SVE and NEON when profiling using ARM SPE. Signed-off-by: German Gomez <german.gomez@arm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman.Khandual@arm.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20230320151509.1137462-2-james.clark@arm.com Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-20perf intel-pt: Add support for new branch instructions ERETS and ERETUAdrian Hunter
Intel Flexible Return and Event Delivery (FRED) adds instructions ERETS (return to supervisor) and ERETU (return to user). Intel PT instruction decoder needs to know about these instructions because they are branch instructions. Similar to IRET instructions, when the decoder encounters one of these instructions it will match it to a TIP (target instruction pointer) packet that informs what the branch destination is. The existing "x86 instruction decoder - new instructions" test can be used to test the result e.g. $ perf test -v ins |& grep eret Decoded ok: f2 0f 01 ca erets Decoded ok: f3 0f 01 ca eretu Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20230320183517.15099-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-03-20perf symbol: Sort names under write lockIan Rogers
If finding a name doesn't find the sorted names then they are allocated and sorted. This shouldn't be done under a read lock as another reader may access it. Release the read lock and acquire the write lock, then release the write lock and reacquire the read lock. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: André Almeida <andrealmeid@collabora.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/r/20230320033810.980165-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>