| HOWTO - using the library with perf {#howto_perf} |
| =================================== |
| |
| @brief Using command line perf and OpenCSD to collect and decode trace. |
| |
| This HOWTO explains how to use the perf command line tools and the openCSD |
| library to collect and extract program flow traces generated by the |
| CoreSight IP blocks on a Linux system. The examples have been generated using |
| an aarch64 Juno-r0 platform. |
| |
| |
| On Target Trace Acquisition - Perf Record |
| ----------------------------------------- |
| |
| Compile the perf tool from the same kernel source code version you are using with: |
| |
| make -C tools/perf |
| |
| This will yield a `perf` executable that will support CoreSight trace collection. |
| |
| *Note:* If traces are to be decoded **off** target, there is no need to download |
| and compile the openCSD library on the target. |
| |
| If you are instead planning to use perf to record and decode the trace on the target, |
| compile the perf tool linking against the openCSD library, in the following way: |
| |
| make -C tools/perf VF=1 CORESIGHT=1 |
| |
| Further information on the needed build environments and options is detailed later |
| in the section **Off Target Perf Tools Compilation**. |
| |
| Before launching a trace run, a sink that will collect trace data needs to be |
| identified. All CoreSight blocks identified by the framework are registered in |
| sysfs: |
| |
| |
| linaro@linaro-nano:~$ ls /sys/bus/coresight/devices/ |
| etm0 etm2 etm4 etm6 funnel0 funnel2 funnel4 stm0 tmc_etr0 |
| etm1 etm3 etm5 etm7 funnel1 funnel3 replicator0 tmc_etf0 |
| |
| |
| CoreSight blocks are listed in the device tree for a specific system and |
| discovered at boot time. Since tracers can be linked to more than one sink, |
| the sink that will receive trace data needs to be identified and given as an |
| option on the perf command line. Once a sink has been identified, trace collection |
| can start. An easy and yet interesting example is the `uname` command: |
| |
| linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -e cs_etm/@tmc_etr0/ --per-thread uname |
| |
| This will generate a `perf.data` file where execution has been traced for both |
| user and kernel space. To narrow the field to either user or kernel space the |
| `u` and `k` options can be specified. For example the following will limit |
| traces to user space: |
| |
| |
| linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -vvv -e cs_etm/@tmc_etr0/u --per-thread uname |
| Problems setting modules path maps, continuing anyway... |
| ----------------------------------------------------------- |
| perf_event_attr: |
| type 8 |
| size 112 |
| { sample_period, sample_freq } 1 |
| sample_type IP|TID|IDENTIFIER |
| read_format ID |
| disabled 1 |
| exclude_kernel 1 |
| exclude_hv 1 |
| enable_on_exec 1 |
| sample_id_all 1 |
| ------------------------------------------------------------ |
| sys_perf_event_open: pid 11375 cpu -1 group_fd -1 flags 0x8 |
| ------------------------------------------------------------ |
| perf_event_attr: |
| type 1 |
| size 112 |
| config 0x9 |
| { sample_period, sample_freq } 1 |
| sample_type IP|TID|IDENTIFIER |
| read_format ID |
| disabled 1 |
| exclude_kernel 1 |
| exclude_hv 1 |
| mmap 1 |
| comm 1 |
| enable_on_exec 1 |
| task 1 |
| sample_id_all 1 |
| mmap2 1 |
| comm_exec 1 |
| ------------------------------------------------------------ |
| sys_perf_event_open: pid 11375 cpu -1 group_fd -1 flags 0x8 |
| mmap size 266240B |
| AUX area mmap length 131072 |
| perf event ring buffer mmapped per thread |
| Synthesizing auxtrace information |
| Linux |
| auxtrace idx 0 old 0 head 0x11ea0 diff 0x11ea0 |
| [ perf record: Woken up 1 times to write data ] |
| overlapping maps: |
| 7f99daf000-7f99db0000 0 [vdso] |
| 7f99d84000-7f99db3000 0 /lib/aarch64-linux-gnu/ld-2.21.so |
| 7f99d84000-7f99daf000 0 /lib/aarch64-linux-gnu/ld-2.21.so |
| 7f99db0000-7f99db3000 0 /lib/aarch64-linux-gnu/ld-2.21.so |
| failed to write feature 8 |
| failed to write feature 9 |
| failed to write feature 14 |
| [ perf record: Captured and wrote 0.072 MB perf.data ] |
| |
| linaro@linaro-nano:~/kernel$ ls -l ~/.debug/ perf.data |
| -rw------- 1 linaro linaro 77888 Mar 2 20:41 perf.data |
| |
| /home/linaro/.debug/: |
| total 16 |
| drwxr-xr-x 2 linaro linaro 4096 Mar 2 20:40 [kernel.kallsyms] |
| drwxr-xr-x 2 linaro linaro 4096 Mar 2 20:40 [vdso] |
| drwxr-xr-x 3 linaro linaro 4096 Mar 2 20:40 bin |
| drwxr-xr-x 3 linaro linaro 4096 Mar 2 20:40 lib |
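
The event string passed to `-e` in the commands above follows one pattern: `cs_etm/@<sink>/` followed by an optional `u` or `k`. The helper below is only a sketch of that pattern; the function name `cs_etm_event` and its optional third argument are hypothetical and not part of perf. It checks that the sink appears under `/sys/bus/coresight/devices` before printing the string.

```shell
# Print the perf event string for a CoreSight sink.
# Usage: cs_etm_event SINK [u|k] [DEVICES_DIR]
# DEVICES_DIR is a hypothetical override, useful for testing off target.
cs_etm_event() {
    sink=$1
    mode=${2:-}
    devices=${3:-/sys/bus/coresight/devices}

    # Refuse sinks that the CoreSight framework has not registered.
    if [ ! -e "$devices/$sink" ]; then
        echo "unknown sink: $sink" >&2
        return 1
    fi

    # Only user space (u), kernel space (k) or both (empty) are accepted.
    case $mode in
        ''|u|k) ;;
        *) echo "mode must be u, k or empty" >&2; return 1 ;;
    esac

    printf 'cs_etm/@%s/%s\n' "$sink" "$mode"
}
```

On the target it could be used as `perf record -e "$(cs_etm_event tmc_etr0 u)" --per-thread uname`.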
| |
| Trace data filtering |
| -------------------- |
| The amount of trace data generated by CoreSight tracers is staggering, even for |
| the simplest trace scenario. Reducing trace generation to specific areas |
| of interest is desirable to save trace buffer space and avoid getting lost in |
| trace data that isn't relevant. Supplementing the 'k' and 'u' options |
| described above is the notion of address filters. |
| |
| On CoreSight two types of address filter have been implemented - address range |
| and start/stop filter: |
| |
| **Address range filters:** |
| With address range filters traces are generated if the instruction pointer |
| falls within the specified range. Any work done by the CPU outside of that |
| range will not be traced. Address range filters can be specified for both |
| user and kernel space sessions: |
| |
| perf record -e cs_etm/@tmc_etr0/k --filter 'filter 0xffffff8008562d0c/0x48' --per-thread uname |
| |
| perf record -e cs_etm/@tmc_etr0/u --filter 'filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' --per-thread ./main |
| |
| When dealing with kernel space traces, addresses are typically taken from the |
| 'System.map' file. In user space addresses are relocatable and can be |
| extracted from an objdump output: |
| |
| $ aarch64-linux-gnu-objdump -d libcstest.so.1.0 |
| ... |
| ... |
| 000000000000072c <coresight_test1>: <------------ Beginning of traces |
| 72c: d10083ff sub sp, sp, #0x20 |
| 730: b9000fe0 str w0, [sp,#12] |
| 734: b9001fff str wzr, [sp,#28] |
| 738: 14000007 b 754 <coresight_test1+0x28> |
| 73c: b9400fe0 ldr w0, [sp,#12] |
| 740: 11000800 add w0, w0, #0x2 |
| 744: b9000fe0 str w0, [sp,#12] |
| 748: b9401fe0 ldr w0, [sp,#28] |
| 74c: 11000400 add w0, w0, #0x1 |
| 750: b9001fe0 str w0, [sp,#28] |
| 754: b9401fe0 ldr w0, [sp,#28] |
| 758: 7100101f cmp w0, #0x4 |
| 75c: 54ffff0d b.le 73c <coresight_test1+0x10> |
| 760: b9400fe0 ldr w0, [sp,#12] |
| 764: 910083ff add sp, sp, #0x20 |
| 768: d65f03c0 ret |
| ... |
| ... |
| |
| Following the address, the number of bytes is specified and, if tracing in user |
| space, the full path to the binary (or library) being traced. |
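
In the listing above `coresight_test1` starts at 0x72c and its last instruction, `ret`, sits at 0x768. AArch64 instructions are 4 bytes long, so the function ends at 0x76c and spans 0x40 bytes, which is the size used in the user space example. The sketch below performs that subtraction and assembles the filter string; `range_filter` is a hypothetical helper name, and it is intended for user space offsets only, since kernel addresses in the upper half of the 64-bit range may not be handled portably by shell arithmetic.

```shell
# Build an address range filter string for perf record --filter.
# Usage: range_filter START END [OBJECT_PATH]
#   START        first address to trace
#   END          first address past the traced region
#   OBJECT_PATH  full path to the binary or library (user space only)
range_filter() {
    start=$(( $1 ))
    end=$(( $2 ))
    object=${3:-}

    if [ "$end" -le "$start" ]; then
        echo "end must be greater than start" >&2
        return 1
    fi

    # The filter takes a start address and a size in bytes.
    filter=$(printf 'filter 0x%x/0x%x' "$start" "$(( end - start ))")
    if [ -n "$object" ]; then
        filter="$filter@$object"
    fi
    printf '%s\n' "$filter"
}

range_filter 0x72c 0x76c /opt/lib/libcstest.so.1.0
# prints: filter 0x72c/0x40@/opt/lib/libcstest.so.1.0
```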
| |
| **Start/Stop filters:** |
| With start/stop filters traces are generated when the instruction pointer is |
| equal to the start address. Likewise, traces stop being generated when the |
| instruction pointer is equal to the stop address. Anything that happens between |
| these two events is traced: |
| |
| perf record -e cs_etm/@tmc_etr0/k --filter 'start 0xffffff800856bc50,stop 0xffffff800856bcb0' --per-thread uname |
| |
| perf record -vvv -e cs_etm/@tmc_etr0/u --filter 'start 0x72c@/opt/lib/libcstest.so.1.0, \ |
| stop 0x40082c@/home/linaro/main' \ |
| --per-thread ./main |
| |
| **Limitation on address filters:** |
| The only limitations on address filters are the number of address comparators |
| found on an implementation and the mutual exclusion between range and |
| start/stop filters. As such the following example would _not_ work: |
| |
| perf record -e cs_etm/@tmc_etr0/k --filter 'start 0xffffff800856bc50,stop 0xffffff800856bcb0, \ // start/stop |
| filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' \ // address range |
| --per-thread uname |
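
The mutual exclusion rule can be checked before launching a trace run. The sketch below is an illustration only: `filter_is_consistent` is a hypothetical helper, it recognizes just the three actions named in this HOWTO (`filter`, `start` and `stop`), and it assumes that object paths contain no commas.

```shell
# Return 0 if a --filter string uses a single filter family,
# 1 if it mixes range and start/stop filters, 2 on an unknown action.
# Usage: filter_is_consistent 'FILTER_STRING'
filter_is_consistent() {
    # Split on commas and keep the first word (the action) of each term.
    kinds=$(printf '%s\n' "$1" | tr ',' '\n' | awk 'NF { print $1 }' | sort -u)

    has_range=0
    has_start_stop=0
    for kind in $kinds; do
        case $kind in
            filter)     has_range=1 ;;
            start|stop) has_start_stop=1 ;;
            *) echo "unknown filter action: $kind" >&2; return 2 ;;
        esac
    done

    # At most one of the two families may be present.
    [ $(( has_range + has_start_stop )) -le 1 ]
}
```

The start/stop and address range examples shown earlier each pass this check on their own, while the combined string from the example above is rejected.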
| |
| Additional Trace Options |
| ------------------------ |
| Additional options can be used during trace collection that add information to the captured trace. |
| |
| - Timestamps: These packets are added to the trace streams to allow correlation of different sources where tools support this. |
| - Cycle Counts: These packets are added to get a count of cycles for blocks of executed instructions. Adding cycle counts will considerably increase the amount of generated trace. |
| The relationship between cycle counts and executed instructions differs according to the trace protocol. |
| For example, the ETMv4 protocol will emit counts for groups of instructions according to a minimum count threshold. |
| Presently this threshold is fixed at 256 cycles for `perf record`. |
| |
| Command line options in `perf record` to use these features are part of the options for the `cs_etm` event: |
| |
| perf record -e cs_etm/timestamp,cycacc,@tmc_etr0/ --per-thread uname |
| |
| In the current version, `perf record` and `perf script` do not use this additional information. |
| |
| The cs_etm perf event |
| --------------------- |
| |
| System information for this perf pmu event can be found at: |
| |
| /sys/devices/cs_etm |
| |
| This contains the internal format of the parameters described above: |
| |
| root@linaro-developer:~# ls /sys/devices/cs_etm/format |
| contextid cycacc retstack sinkid timestamp |
| |
| and names of registered sinks: |
| |
| root@linaro-developer:~# ls /sys/devices/cs_etm/sinks |
| tmc_etf0 tmc_etr0 tpiu0 |
| |
| Note: The `sinkid` parameter is there to document the usage of a 32-bit internal parameter to |
| pass the sink name used in the cs_etm/@sink/ command to the kernel drivers. It can be used |
| directly as cs_etm/sinkid=<hash_value>/ but this is not recommended as the values used are |
| considered opaque and subject to changes. |
| |
| On Target Trace Collection |
| -------------------------- |
| The entire program flow will have been recorded in the `perf.data` file. |
| Information about libraries and executables is stored under `$HOME/.debug`: |
| |
| linaro@linaro-nano:~/kernel$ tree ~/.debug |
| .debug |
| ├── [kernel.kallsyms] |
| │ └── 0542921808098d591a7acba5a1163e8991897669 |
| │ └── kallsyms |
| ├── [vdso] |
| │ └── 551fbbe29579eb63be3178a04c16830b8d449769 |
| │ └── vdso |
| ├── bin |
| │ └── uname |
| │ └── ed95e81f97c4471fb2ccc21e356b780eb0c92676 |
| │ └── elf |
| └── lib |
| └── aarch64-linux-gnu |
| ├── ld-2.21.so |
| │ └── 94912dc5a1dc8c7ef2c4e4649d4b1639b6ebc8b7 |
| │ └── elf |
| └── libc-2.21.so |
| └── 169a143e9c40cfd9d09695333e45fd67743cd2d6 |
| └── elf |
| |
| 13 directories, 5 files |
| linaro@linaro-nano:~/kernel$ |
| |
| |
| All this information needs to be collected in order to successfully decode |
| traces off target: |
| |
| linaro@linaro-nano:~/kernel$ tar czf uname.trace.tgz perf.data ~/.debug |
| |
| |
| Note that file `vmlinux` should also be added to the bundle if kernel traces |
| have also been collected. |
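
The bundling step, including the optional `vmlinux`, can be sketched as a small function. The name `bundle_trace` and its arguments are hypothetical; the function is meant to be run from the directory that contains `perf.data`, and the buildid directory defaults to `$HOME/.debug` as described above.

```shell
# Bundle a trace capture for off-target decoding.
# Usage: bundle_trace OUTPUT.tgz [VMLINUX] [BUILDID_DIR]
bundle_trace() {
    out=$1
    vmlinux=${2:-}
    buildid_dir=${3:-$HOME/.debug}

    # Both the capture and the buildid directory are needed for decoding.
    if [ ! -f perf.data ] || [ ! -d "$buildid_dir" ]; then
        echo "perf.data or $buildid_dir is missing" >&2
        return 1
    fi

    # Start with the mandatory members, then append vmlinux if one was given.
    set -- perf.data "$buildid_dir"
    if [ -n "$vmlinux" ]; then
        set -- "$@" "$vmlinux"
    fi

    tar czf "$out" "$@"
}
```

For a kernel trace this might be called as `bundle_trace uname.trace.tgz ./vmlinux`.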
| |
| |
| Off Target OpenCSD Compilation |
| ------------------------------ |
| The openCSD library is not part of the perf tools. It is available on |
| [github][1] and needs to be compiled before the perf tools. Check out the |
| required branch/tag version into a local directory. |
| |
| linaro@t430:~/linaro/coresight$ git clone https://github.com/Linaro/OpenCSD.git my-opencsd |
| Cloning into 'my-opencsd'... |
| remote: Counting objects: 2063, done. |
| remote: Total 2063 (delta 0), reused 0 (delta 0), pack-reused 2063 |
| Receiving objects: 100% (2063/2063), 2.51 MiB | 1.24 MiB/s, done. |
| Resolving deltas: 100% (1399/1399), done. |
| Checking connectivity... done. |
| linaro@t430:~/linaro/coresight$ ls my-opencsd |
| decoder LICENSE README.md HOWTO.md TODO |
| |
| Once the source code has been acquired compilation of the openCSD library can |
| take place. For Linux two options are available, LINUX and LINUX64, based on |
| the host's (which has nothing to do with the target) architecture: |
| |
| linaro@t430:~/linaro/coresight/$ cd my-opencsd/decoder/build/linux/ |
| linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ ls |
| makefile rctdl_c_api_lib ref_trace_decode_lib |
| |
| linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ make LINUX64=1 DEBUG=1 |
| ... |
| ... |
| |
| linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ ls ../../lib/linux64/dbg/ |
| libopencsd.a libopencsd_c_api.a libopencsd_c_api.so libopencsd.so |
| |
| From there the header files and libraries need to be installed on the system, |
| something that requires root privileges. The default installation path is |
| /usr/include/opencsd for the header files and /usr/lib/ for the libraries: |
| |
| linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ sudo make install |
| linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ ls -l /usr/include/opencsd |
| total 60 |
| drwxr-xr-x 2 root root 4096 Dec 12 10:19 c_api |
| drwxr-xr-x 2 root root 4096 Dec 12 10:19 etmv3 |
| drwxr-xr-x 2 root root 4096 Dec 12 10:19 etmv4 |
| -rw-r--r-- 1 root root 28049 Dec 12 10:19 ocsd_if_types.h |
| drwxr-xr-x 2 root root 4096 Dec 12 10:19 ptm |
| drwxr-xr-x 2 root root 4096 Dec 12 10:19 stm |
| -rw-r--r-- 1 root root 7264 Dec 12 10:19 trc_gen_elem_types.h |
| -rw-r--r-- 1 root root 3972 Dec 12 10:19 trc_pkt_types.h |
| |
| linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ ls -l /usr/lib/libopencsd* |
| -rw-r--r-- 1 root root 598720 Dec 12 10:19 /usr/lib/libopencsd_c_api.so |
| -rw-r--r-- 1 root root 4692200 Dec 12 10:19 /usr/lib/libopencsd.so |
| |
| A "clean_install" target is also available so that openCSD installed files can |
| be removed from a system. Going forward the goal is to have the openCSD library |
| packaged as a Debian or RPM archive so that it can be installed from a |
| distribution without having to be compiled. |
| |
| |
| Off Target Perf Tools Compilation |
| --------------------------------- |
| |
| As mentioned above the openCSD library is not part of the perf tools' code base |
| and needs to be installed on a system prior to compilation. Information about |
| the status of the openCSD library on a system is given at compile time by the |
| perf tools build script: |
| |
| linaro@t430:~/linaro/linux-kernel$ make CORESIGHT=1 VF=1 -C tools/perf |
| Auto-detecting system features: |
| ... dwarf: [ on ] |
| ... dwarf_getlocations: [ on ] |
| ... glibc: [ on ] |
| ... gtk2: [ on ] |
| ... libaudit: [ on ] |
| ... libbfd: [ OFF ] |
| ... libelf: [ on ] |
| ... libnuma: [ OFF ] |
| ... numa_num_possible_cpus: [ OFF ] |
| ... libperl: [ on ] |
| ... libpython: [ on ] |
| ... libslang: [ on ] |
| ... libcrypto: [ on ] |
| ... libunwind: [ OFF ] |
| ... libdw-dwarf-unwind: [ on ] |
| ... zlib: [ on ] |
| ... lzma: [ OFF ] |
| ... get_cpuid: [ on ] |
| ... bpf: [ on ] |
| ... libopencsd: [ on ] <------- |
| |
| |
| At the end of the compilation a new perf binary is available in `tools/perf/`: |
| |
| linaro@t430:~/linaro/linux-kernel$ ldd tools/perf/perf |
| linux-vdso.so.1 => (0x00007fff135db000) |
| libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f15f9176000) |
| librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f15f8f6e000) |
| libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f15f8c64000) |
| libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f15f8a60000) |
| libopencsd_c_api.so => /usr/lib/libopencsd_c_api.so (0x00007f15f884e000) <------- |
| libelf.so.1 => /usr/lib/x86_64-linux-gnu/libelf.so.1 (0x00007f15f8635000) |
| libdw.so.1 => /usr/lib/x86_64-linux-gnu/libdw.so.1 (0x00007f15f83ec000) |
| libaudit.so.1 => /lib/x86_64-linux-gnu/libaudit.so.1 (0x00007f15f81c5000) |
| libslang.so.2 => /lib/x86_64-linux-gnu/libslang.so.2 (0x00007f15f7e38000) |
| libperl.so.5.22 => /usr/lib/x86_64-linux-gnu/libperl.so.5.22 (0x00007f15f7a5d000) |
| libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f15f7693000) |
| libpython2.7.so.1.0 => /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0 (0x00007f15f7104000) |
| libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f15f6eea000) |
| /lib64/ld-linux-x86-64.so.2 (0x0000559b88038000) |
| libopencsd.so => /usr/lib/libopencsd.so (0x00007f15f6c62000) <------- |
| libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f15f68df000) |
| libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f15f66c9000) |
| liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f15f64a6000) |
| libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007f15f6296000) |
| libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f15f605e000) |
| libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f15f5e5a000) |
| |
| |
| Additional debug output from the decoder can be compiled in by setting the |
| `CSTRACE_RAW` environment variable. Setting this to `packed` gets trace frame |
| output as follows: |
| |
| Frame Data; Index 576; RAW_PACKED; d6 d6 d6 d6 d6 d6 d6 d6 fc fb d6 d6 d6 d6 e0 7f |
| Frame Data; Index 576; ID_DATA[0x14]; d7 d6 d7 d6 d7 d6 d7 d6 fd fb d7 d6 d7 d6 e0 |
| |
| Setting it to any other value will remove the RAW_PACKED lines. |
| |
| Working with an alternate version of the openCSD library |
| -------------------------------------------------------- |
| When compiling the perf tools it is possible to reference another version of |
| the openCSD library than the one installed on the system. This is useful when |
| working with multiple development trees or having the desire to keep system |
| libraries intact. Two environment variables are available to tell the perf tools |
| build script where to get the header files and libraries, namely CSINCLUDES and |
| CSLIBS: |
| |
| linaro@t430:~/linaro/linux-kernel$ export CSINCLUDES=~/linaro/coresight/my-opencsd/decoder/include/ |
| linaro@t430:~/linaro/linux-kernel$ export CSLIBS=~/linaro/coresight/my-opencsd/decoder/lib/builddir/ |
| linaro@t430:~/linaro/linux-kernel$ make CORESIGHT=1 VF=1 -C tools/perf |
| |
| This will have the effect of compiling and linking against the provided library. |
| Since the system's openCSD library is in the loader's search path, the |
| LD_LIBRARY_PATH environment variable needs to be set so that the alternate |
| version is the one loaded at run time. |
| |
| linaro@t430:~/linaro/linux-kernel$ export LD_LIBRARY_PATH=$CSLIBS |
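
To confirm which copy of the library the loader actually picks up, the `ldd` output shown earlier can be filtered. The function below is a sketch only; `opencsd_paths` is a hypothetical name, and it simply reads `ldd` output on standard input and prints the resolved path of each openCSD library.

```shell
# Print the resolved path of every openCSD library found in ldd output.
# Usage: ldd tools/perf/perf | opencsd_paths
opencsd_paths() {
    # ldd lines look like: "libname.so => /resolved/path (0xaddress)"
    awk '$1 ~ /^libopencsd/ && $2 == "=>" { print $3 }'
}
```

With LD_LIBRARY_PATH set as above, the printed paths are expected to point into the `$CSLIBS` directory rather than `/usr/lib`.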
| |
| |
| Trace Decoding with Perf Report |
| ------------------------------- |
| Before working with custom traces it is suggested to use a trace bundle that |
| is known to be working properly. A sample bundle has been made available |
| here [2]. Trace bundles can be extracted anywhere and have no dependencies on |
| where the perf tools and openCSD library have been compiled. |
| |
| linaro@t430:~/linaro/coresight$ mkdir sept20 |
| linaro@t430:~/linaro/coresight$ cd sept20 |
| linaro@t430:~/linaro/coresight/sept20$ wget http://people.linaro.org/~mathieu.poirier/openCSD/uname.v4.user.sept20.tgz |
| linaro@t430:~/linaro/coresight/sept20$ md5sum uname.v4.user.sept20.tgz |
| f53f11d687ce72bdbe9de2e67e960ec6 uname.v4.user.sept20.tgz |
| linaro@t430:~/linaro/coresight/sept20$ tar xf uname.v4.user.sept20.tgz |
| linaro@t430:~/linaro/coresight/sept20$ ls -la |
| total 1312 |
| drwxrwxr-x 3 linaro linaro 4096 Mar 3 10:26 . |
| drwxrwxr-x 5 linaro linaro 4096 Mar 3 10:13 .. |
| drwxr-xr-x 7 linaro linaro 4096 Feb 24 12:21 .debug |
| -rw------- 1 linaro linaro 78016 Feb 24 12:21 perf.data |
| -rw-rw-r-- 1 linaro linaro 1245881 Feb 24 12:25 uname.v4.user.sept20.tgz |
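
Rather than comparing the checksum by eye, the comparison can be scripted. This is a minimal sketch assuming an `md5sum` that supports `-c`; the function name `verify_md5` is hypothetical.

```shell
# Verify a file against an expected MD5 digest.
# Usage: verify_md5 FILE EXPECTED_DIGEST
# Returns 0 when the digest matches, non-zero otherwise.
verify_md5() {
    # md5sum -c reads "DIGEST  FILENAME" lines and checks each listed file.
    printf '%s  %s\n' "$2" "$1" | md5sum -c - >/dev/null
}
```

For the sample bundle this would be `verify_md5 uname.v4.user.sept20.tgz f53f11d687ce72bdbe9de2e67e960ec6`, using the digest shown in the session above.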
| |
| Perf is expecting files related to the trace capture (`perf.data`) to be located in the `buildid` directory. |
| By default this is under `~/.debug`. Alternatively the default `buildid` directory can be changed |
| using the command: |
| |
| perf config --system buildid.dir=/my/own/buildid/dir |
| |
| This example will remove the current `~/.debug` directory to be sure everything is clean. |
| |
| linaro@t430:~/linaro/coresight/sept20$ rm -rf ~/.debug |
| linaro@t430:~/linaro/coresight/sept20$ cp -dpR .debug ~/ |
| linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf report --stdio |
| |
| # To display the perf.data header info, please use --header/--header-only options. |
| # |
| # |
| # Total Lost Samples: 0 |
| # |
| # Samples: 0 of event 'cs_etm//u' |
| # Event count (approx.): 0 |
| # |
| # Children Self Command Shared Object Symbol |
| # ........ ........ ....... ............. ...... |
| # |
| |
| |
| # Samples: 0 of event 'dummy:u' |
| # Event count (approx.): 0 |
| # |
| # Children Self Command Shared Object Symbol |
| # ........ ........ ....... ............. ...... |
| # |
| |
| |
| # Samples: 115K of event 'instructions:u' |
| # Event count (approx.): 522009 |
| # |
| # Children Self Command Shared Object Symbol |
| # ........ ........ ....... ................ ...................... |
| # |
| 4.13% 4.13% uname libc-2.21.so [.] 0x0000000000078758 |
| 3.81% 3.81% uname libc-2.21.so [.] 0x0000000000078e50 |
| 2.06% 2.06% uname libc-2.21.so [.] 0x00000000000fcaf4 |
| 1.65% 1.65% uname libc-2.21.so [.] 0x00000000000fcae4 |
| 1.59% 1.59% uname ld-2.21.so [.] 0x000000000000a7f4 |
| 1.50% 1.50% uname libc-2.21.so [.] 0x0000000000078e40 |
| 1.43% 1.43% uname libc-2.21.so [.] 0x00000000000fcac4 |
| 1.31% 1.31% uname libc-2.21.so [.] 0x000000000002f0c0 |
| 1.26% 1.26% uname ld-2.21.so [.] 0x0000000000016888 |
| 1.24% 1.24% uname libc-2.21.so [.] 0x0000000000078e7c |
| 1.24% 1.24% uname libc-2.21.so [.] 0x00000000000fcab8 |
| ... |
| |
| A dump of the received trace packets can also be obtained using the command |
| |
| mjl@ubuntu-vbox:./perf-opencsd-master/coresight/tools/perf/perf report --stdio --dump |
| |
| resulting in a large amount of data, with the trace looking like: |
| |
| 0x618 [0x30]: PERF_RECORD_AUXTRACE size: 0x11ef0 offset: 0 ref: 0x4d881c1f13216016 idx: 0 tid: 15244 cpu: -1 |
| |
| . ... CoreSight ETM Trace data: size 73456 bytes |
| |
| 0: I_ASYNC : Alignment Synchronisation. |
| 12: I_TRACE_INFO : Trace Info. |
| 17: I_TRACE_ON : Trace On. |
| 18: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x0000007F89F24D80; Ctxt: AArch64,EL0, NS; |
| 28: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE |
| 29: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE |
| 30: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE |
| 32: I_ATOM_F6 : Atom format 6.; EEEEN |
| 33: I_ATOM_F1 : Atom format 1.; E |
| 34: I_EXCEPT : Exception.; Data Fault; Ret Addr Follows; |
| 36: I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000007F89F2832C; |
| 45: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0xFFFFFFC000083400; Ctxt: AArch64,EL1, NS; |
| 56: I_TRACE_ON : Trace On. |
| 57: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x0000007F89F2832C; Ctxt: AArch64,EL0, NS; |
| 68: I_ATOM_F3 : Atom format 3.; NEE |
| 69: I_ATOM_F3 : Atom format 3.; NEN |
| 70: I_ATOM_F3 : Atom format 3.; NNE |
| 71: I_ATOM_F5 : Atom format 5.; ENENE |
| 72: I_ATOM_F5 : Atom format 5.; NENEN |
| 73: I_ATOM_F5 : Atom format 5.; ENENE |
| 74: I_ATOM_F5 : Atom format 5.; NENEN |
| 75: I_ATOM_F5 : Atom format 5.; ENENE |
| 76: I_ATOM_F3 : Atom format 3.; NNE |
| 77: I_ATOM_F3 : Atom format 3.; NNE |
| 78: I_ATOM_F3 : Atom format 3.; NNE |
| 80: I_ATOM_F3 : Atom format 3.; NNE |
| 81: I_ATOM_F3 : Atom format 3.; ENN |
| 82: I_EXCEPT : Exception.; Data Fault; Ret Addr Follows; |
| 84: I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000007F89F283F0; |
| 93: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0xFFFFFFC000083400; Ctxt: AArch64,EL1, NS; |
| 104: I_TRACE_ON : Trace On. |
| 105: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x0000007F89F283F0; Ctxt: AArch64,EL0, NS; |
| 116: I_ATOM_F5 : Atom format 5.; NNNNN |
| 117: I_ATOM_F5 : Atom format 5.; NNNNN |
| |
| |
| Trace Decoding with Perf Script |
| ------------------------------- |
| Working with perf scripts needs more command line options but yields |
| interesting results. |
| |
| linaro@t430:~/linaro/coresight/sept20$ export EXEC_PATH=/home/linaro/coresight/perf-opencsd-master/tools/perf/ |
| linaro@t430:~/linaro/coresight/sept20$ export SCRIPT_PATH=$EXEC_PATH/scripts/python/ |
| linaro@t430:~/linaro/coresight/sept20$ export XTOOL_PATH=/your/aarch64/toolchain/path/bin/ |
| linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf --exec-path=${EXEC_PATH} script --script=python:${SCRIPT_PATH}/arm-cs-trace-disasm.py -- -d ${XTOOL_PATH}/aarch64-linux-gnu-objdump |
| |
| 7f89f24d80: 910003e0 mov x0, sp |
| 7f89f24d84: 94000d53 bl 7f89f282d0 <free@plt+0x3790> |
| 7f89f282d0: d11203ff sub sp, sp, #0x480 |
| 7f89f282d4: a9ba7bfd stp x29, x30, [sp,#-96]! |
| 7f89f282d8: 910003fd mov x29, sp |
| 7f89f282dc: a90363f7 stp x23, x24, [sp,#48] |
| 7f89f282e0: 9101e3b7 add x23, x29, #0x78 |
| 7f89f282e4: a90573fb stp x27, x28, [sp,#80] |
| 7f89f282e8: a90153f3 stp x19, x20, [sp,#16] |
| 7f89f282ec: aa0003fb mov x27, x0 |
| 7f89f282f0: 910a82e1 add x1, x23, #0x2a0 |
| 7f89f282f4: a9025bf5 stp x21, x22, [sp,#32] |
| 7f89f282f8: a9046bf9 stp x25, x26, [sp,#64] |
| 7f89f282fc: 910102e0 add x0, x23, #0x40 |
| 7f89f28300: f800841f str xzr, [x0],#8 |
| 7f89f28304: eb01001f cmp x0, x1 |
| 7f89f28308: 54ffffc1 b.ne 7f89f28300 <free@plt+0x37c0> |
| 7f89f28300: f800841f str xzr, [x0],#8 |
| 7f89f28304: eb01001f cmp x0, x1 |
| 7f89f28308: 54ffffc1 b.ne 7f89f28300 <free@plt+0x37c0> |
| 7f89f28300: f800841f str xzr, [x0],#8 |
| 7f89f28304: eb01001f cmp x0, x1 |
| 7f89f28308: 54ffffc1 b.ne 7f89f28300 <free@plt+0x37c0> |
| |
| Kernel Trace Decoding |
| --------------------- |
| |
| When dealing with kernel space traces the vmlinux file has to be communicated |
| explicitly to perf using the "--vmlinux" command line option: |
| |
| linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf report --stdio --vmlinux=./vmlinux |
| ... |
| ... |
| linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf script --vmlinux=./vmlinux |
| |
| When using scripts things get a little more convoluted. Using the same example |
| as above but for kernel traces, the command line becomes: |
| |
| linaro@t430:~/linaro/coresight/sept20$ export EXEC_PATH=/home/linaro/coresight/perf-opencsd-master/tools/perf/ |
| linaro@t430:~/linaro/coresight/sept20$ export SCRIPT_PATH=$EXEC_PATH/scripts/python/ |
| linaro@t430:~/linaro/coresight/sept20$ export XTOOL_PATH=/your/aarch64/toolchain/path/bin/ |
| linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf --exec-path=${EXEC_PATH} script \ |
| --vmlinux=./vmlinux \ |
| --script=python:${SCRIPT_PATH}/arm-cs-trace-disasm.py -- \ |
| -d ${XTOOL_PATH}/aarch64-linux-gnu-objdump \ |
| -k ./vmlinux |
| ... |
| ... |
| |
| The option "--vmlinux=./vmlinux" is interpreted by the "perf script" command |
| the same way it is for "perf report". The option "-k ./vmlinux" is dependent |
| on the script being executed and is not related to "--vmlinux", though it |
| is highly advised to keep them synchronized. |
| |
| |
| Perf Test Environment Scripts |
| ----------------------------- |
| |
| The decoder library comes with a number of `bash` scripts that ease the setting up of the |
| offline build and test environment for perf, and executing tests. |
| |
| These scripts can be found in |
| |
| decoder/tests/perf-test-scripts |
| |
| There are three scripts provided: |
| |
| - `perf-setup-env.bash` : this sets up all the environment variables mentioned above. |
| - `perf-test-report.bash` : this runs `perf report` - using the environment setup by `perf-setup-env.bash` |
| - `perf-test-script.bash` : this runs `perf script` - using the environment setup by `perf-setup-env.bash` |
| |
| Use as follows:- |
| |
| 1. Prior to building perf, edit `perf-setup-env.bash` to conform to your environment. There are four lines at the top of the file that will require editing. |
| |
| 2. Execute the script using the command: |
| |
| source perf-setup-env.bash |
| |
| This will set up a perf execute environment for using the perf report and script commands. |
| |
| Alternatively use the command: |
| |
| source perf-setup-env.bash buildenv |
| |
| This will add in the build environment variables mentioned in the sections on building above alongside the |
| environment used by the `perf-test...` scripts to run the tests. |
| |
| 3. Build perf as described above. |
| 4. Follow the instructions for downloading the test capture, or create a capture from your target. |
| 5. Copy the `perf-test...` scripts into the capture data directory -> the one that contains `perf.data`. |
| |
| 6. The scripts can now be run. No options are required for the default operation, but any command line options will be added to the perf report / perf script command line. |
| |
| e.g. |
| |
| ./perf-test-report.bash --dump |
| |
| will add the --dump option to the end of the command line and run |
| |
| ${PERF_EXEC_PATH}/perf report --stdio --dump |
| |
| |
| Generating coverage files for Feedback Directed Optimization: AutoFDO |
| --------------------------------------------------------------------- |
| |
| See autofdo.md (@ref AutoFDO) for details and scripts. |
| |
| |
| The Linaro CoreSight Team |
| ------------------------- |
| - Mike Leach |
| - Mathieu Poirier |
| |
| |
| One Last Thing |
| -------------- |
| We welcome help on this project. If you would like to add features or help |
| improve the way things work, we want to hear from you. |
| |
| Best regards, |
| *The Linaro CoreSight Team* |
| |
| -------------------------------------- |
| [1]: https://github.com/Linaro/OpenCSD |
| |
| [2]: http://people.linaro.org/~mathieu.poirier/openCSD/uname.v4.user.sept20.tgz |