| .TH profile 8 "2020-03-18" "USER COMMANDS" |
| .SH NAME |
| profile \- Profile CPU usage by sampling stack traces. Uses Linux eBPF/bcc. |
| .SH SYNOPSIS |
| .B profile [\-adfh] [\-p PID | \-L TID] [\-U | \-K] [\-F FREQUENCY | \-c COUNT] |
| .B [\-\-stack\-storage\-size COUNT] [\-C CPU] [\-\-cgroupmap CGROUPMAP] [\-\-mntnsmap MAPPATH] [duration] |
| .SH DESCRIPTION |
| This is a CPU profiler. It works by taking samples of stack traces at timed |
| intervals. It will help you understand and quantify CPU usage: which code is |
| executing, and by how much, including both user-level and kernel code. |
| |
| By default this samples at 49 Hertz (samples per second), across all CPUs. |
| This frequency can be tuned using a command line option. The reason for 49, and |
| not 50, is to avoid lock-step sampling. |
| |
| This is also an efficient profiler, as stack traces are frequency counted in |
| kernel context, rather than passing each stack to user space for frequency |
| counting there. Only the unique stacks and counts are passed to user space |
| at the end of the profile, greatly reducing the kernel<->user transfer. |
| .SH REQUIREMENTS |
| CONFIG_BPF and bcc. |
| |
| This also requires Linux 4.9+ (BPF_PROG_TYPE_PERF_EVENT support). See tools/old |
| for an older version that may work on Linux 4.6 - 4.8. |
| .SH OPTIONS |
| .TP |
| \-h |
| Print usage message. |
| .TP |
| \-p PID |
| Trace process with one or more comma separated PIDs only (filtered in-kernel). |
| .TP |
| \-L TID |
| Trace thread with one or more comma separated TIDs only (filtered in-kernel). |
| .TP |
| \-F frequency |
| Frequency to sample stacks. |
| .TP |
| \-c count |
| Sample stacks every one in this many events. |
| .TP |
| \-f |
| Print output in folded stack format. |
| .TP |
| \-d |
| Include an output delimiter between kernel and user stacks (either "--", or, |
| in folded mode, "-"). |
| .TP |
| \-U |
| Show stacks from user space only (no kernel space stacks). |
| .TP |
| \-K |
| Show stacks from kernel space only (no user space stacks). |
| .TP |
| \-I |
| Include CPU idle stacks (by default these are excluded). |
| .TP |
| \-\-stack-storage-size COUNT |
| The maximum number of unique stack traces that the kernel will count (default |
| 16384). If the sampled count exceeds this, a warning will be printed. |
| .TP |
| \-C cpu |
| Collect stacks only from specified cpu. |
| .TP |
| \-\-cgroupmap MAPPATH |
| Profile cgroups in this BPF map only (filtered in-kernel). |
| .TP |
| duration |
| Duration to trace, in seconds. |
| .SH EXAMPLES |
| .TP |
| Profile (sample) stack traces system-wide at 49 Hertz (samples per second) until Ctrl-C: |
| # |
| .B profile |
| .TP |
| Profile for 5 seconds only: |
| # |
| .B profile 5 |
| .TP |
| Profile at 99 Hertz for 5 seconds only: |
| # |
| .B profile -F 99 5 |
| .TP |
| Profile 1 in a million events for 5 seconds only: |
| # |
| .B profile -c 1000000 5 |
| .TP |
| Profile process with PID 181 only: |
| # |
| .B profile -p 181 |
| .TP |
| Profile thread with TID 181 only: |
| # |
| .B profile -L 181 |
| .TP |
| Profile for 5 seconds and output in folded stack format (suitable as input for flame graphs), including a delimiter between kernel and user stacks: |
| # |
| .B profile -df 5 |
| .TP |
| Profile kernel stacks only: |
| # |
| .B profile -K |
| .TP |
| Profile a set of cgroups only (see special_filtering.md from bcc sources for more details): |
| # |
| .B profile \-\-cgroupmap /sys/fs/bpf/test01 |
| .SH DEBUGGING |
| See "[unknown]" frames with bogus addresses? This can happen for different |
| reasons. Your best approach is to get Linux perf to work first, and then to |
| try this tool. Eg, "perf record \-F 49 \-a \-g \-\- sleep 1; perf script", and |
| to check for unknown frames there. |
| |
| The most common reason for "[unknown]" frames is that the target software has |
| not been compiled |
| with frame pointers, and so we can't use that simple method for walking the |
| stack. The fix in that case is to use software that does have frame pointers, |
| eg, gcc -fno-omit-frame-pointer, or Java's -XX:+PreserveFramePointer. |
| |
| Another reason for "[unknown]" frames is JIT compilers, which don't use a |
| traditional symbol table. The fix in that case is to populate a |
| /tmp/perf-PID.map file with the symbols, which this tool should read. How you |
| do this depends on the runtime (Java, Node.js). |
| |
| If you seem to have unrelated samples in the output, check for other |
| sampling or tracing tools that may be running. The current version of this |
| tool can include their events if profiling happened concurrently. Those |
| samples may be filtered in a future version. |
| .SH OVERHEAD |
| This is an efficient profiler, as stack traces are frequency counted in |
| kernel context, and only the unique stacks and their counts are passed to |
| user space. Contrast this with the current "perf record -F 99 -a" method |
| of profiling, which writes each sample to user space (via a ring buffer), |
| and then to the file system (perf.data), which must be post-processed. |
| |
| This uses perf_event_open to setup a timer which is instrumented by BPF, |
| and for efficiency it does not initialize the perf ring buffer, so the |
| redundant perf samples are not collected. |
| |
| It's expected that the overhead while sampling at 49 Hertz (the default), |
| across all CPUs, should be negligible. If you increase the sample rate, the |
| overhead might begin to be measurable. |
| .SH SOURCE |
| This is from bcc. |
| .IP |
| https://github.com/iovisor/bcc |
| .PP |
| Also look in the bcc distribution for a companion _examples.txt file containing |
| example usage, output, and commentary for this tool. |
| .SH OS |
| Linux |
| .SH STABILITY |
| Unstable - in development. |
| .SH AUTHOR |
| Brendan Gregg |
| .SH SEE ALSO |
| offcputime(8) |