| Demonstrations of funcinterval, the Linux eBPF/bcc version. |
| |
| eBPF/bcc is very suitable for platform performance tuning. |
| By funclatency, we can profile specific functions to know how latency |
| this function costs. However, sometimes performance drop is not about the |
| latency of function but the interval between function calls. |
| funcinterval is born for this purpose. |
| |
| Another story, hardware performance tuning on the platform we will use |
| protocol analyzer to analyze performance, but most protocol analyzers lack |
| the distribution feature. Using a protocol analyzer you need a lot of time |
| to check every detail latency. By funcinterval, we can save a lot of time |
| by distribution feature. |
| |
| For example: |
| |
| # ./funcinterval xhci_ring_ep_doorbell -d 2 -u |
| Tracing 1 functions for "xhci_ring_ep_doorbell"... Hit Ctrl-C to end. |
| |
| usecs : count distribution |
| 0 -> 1 : 0 | | |
| 2 -> 3 : 0 | | |
| 4 -> 7 : 0 | | |
| 8 -> 15 : 0 | | |
| 16 -> 31 : 0 | | |
| 32 -> 63 : 134 | | |
| 64 -> 127 : 2862 |******************** | |
| 128 -> 255 : 5552 |****************************************| |
| 256 -> 511 : 216 |* | |
| 512 -> 1023 : 2 | | |
| Detaching... |
| |
| This example output shows that the interval latency of xhci_ring_ep_doorbell |
| took between 64 and 255 microseconds. USB MAC will start its job after USB |
| doorbell register ringing, above information that can help hardware engineer to |
| analyze, the performance drop is because software rings the doorbell too |
| late or just slowly hardware DMA. |
| |
| # ./funcinterval blk_start_request -i 2 -u |
| Tracing 1 functions for "blk_start_request"... Hit Ctrl-C to end. |
| |
| usecs : count distribution |
| 0 -> 1 : 0 | | |
| 2 -> 3 : 0 | | |
| 4 -> 7 : 0 | | |
| 8 -> 15 : 5 |* | |
| 16 -> 31 : 0 | | |
| 32 -> 63 : 1 | | |
| 64 -> 127 : 2 | | |
| 128 -> 255 : 1 | | |
| 256 -> 511 : 0 | | |
| 512 -> 1023 : 1 | | |
| 1024 -> 2047 : 1 | | |
| 2048 -> 4095 : 117 |****************************************| |
| 4096 -> 8191 : 13 |**** | |
| 8192 -> 16383 : 1 | | |
| |
| If using biolatency tool that has no difference between two platforms. |
| Maybe the problem is related to the interval time instead of block |
| device I/O latency. |
| |
| # ./funcinterval ion_ioctl -i 2 -m |
| Tracing 1 functions for "ion_ioctl"... Hit Ctrl-C to end. |
| |
| msecs : count distribution |
| 0 -> 1 : 215 |****************************************| |
| 2 -> 3 : 0 | | |
| 4 -> 7 : 4 | | |
| 8 -> 15 : 5 | | |
| 16 -> 31 : 29 |***** | |
| |
| You can also check the ion_ioctl. By the above output, we know the activity |
| frequency of ion_ioctl() is high mostly(less than 1 ms), but has 29 times low |
| frequency. |
| |
| # ./funcinterval t:block:block_bio_queue -d 30 -u |
| Tracing tracepoint for "block:block_bio_queue"... Hit Ctrl-C to end. |
| |
| usecs : count distribution |
| 0 -> 1 : 0 | | |
| 2 -> 3 : 0 | | |
| 4 -> 7 : 0 | | |
| 8 -> 15 : 2 | | |
| 16 -> 31 : 262 | | |
| 32 -> 63 : 9075 |******************* | |
| 64 -> 127 : 18668 |****************************************| |
| 128 -> 255 : 1492 |*** | |
| 256 -> 511 : 2616 |***** | |
| 512 -> 1023 : 7226 |*************** | |
| 1024 -> 2047 : 8982 |******************* | |
| 2048 -> 4095 : 2394 |***** | |
| 4096 -> 8191 : 163 | | |
| 8192 -> 16383 : 42 | | |
| 16384 -> 32767 : 2 | | |
| 32768 -> 65535 : 0 | | |
| 65536 -> 131071 : 0 | | |
| 131072 -> 262143 : 0 | | |
| 262144 -> 524287 : 0 | | |
| 524288 -> 1048575 : 1 | | |
| Detaching... |
| |
| # ./funcinterval t:block:block_rq_issue -d 30 -u |
| Tracing tracepoint for "block:block_rq_issue"... Hit Ctrl-C to end. |
| |
| usecs : count distribution |
| 0 -> 1 : 0 | | |
| 2 -> 3 : 0 | | |
| 4 -> 7 : 0 | | |
| 8 -> 15 : 0 | | |
| 16 -> 31 : 5 | | |
| 32 -> 63 : 18 | | |
| 64 -> 127 : 32 | | |
| 128 -> 255 : 95 | | |
| 256 -> 511 : 2194 |****** | |
| 512 -> 1023 : 13830 |****************************************| |
| 1024 -> 2047 : 9001 |************************** | |
| 2048 -> 4095 : 1569 |**** | |
| 4096 -> 8191 : 96 | | |
| 8192 -> 16383 : 17 | | |
| Detaching... |
| |
| funcinterval also supports tracepoint filter. The above two cases are under EMMC |
| throughput testing, by those results you know which layer has a slower interval |
| time. In our case, mmc-cmdqd is slower than block layer. |
| |
| |
| # ./funcinterval -p `pidof -s top` c:malloc -i 3 |
| Tracing uprobe function for "malloc"... Hit Ctrl-C to end. |
| |
| nsecs : count distribution |
| 0 -> 1 : 0 | | |
| 2 -> 3 : 0 | | |
| 4 -> 7 : 0 | | |
| 8 -> 15 : 0 | | |
| 16 -> 31 : 0 | | |
| 32 -> 63 : 0 | | |
| 64 -> 127 : 0 | | |
| 128 -> 255 : 0 | | |
| 256 -> 511 : 0 | | |
| 512 -> 1023 : 0 | | |
| 1024 -> 2047 : 0 | | |
| 2048 -> 4095 : 0 | | |
| 4096 -> 8191 : 7 |************************* | |
| 8192 -> 16383 : 11 |****************************************| |
| 16384 -> 32767 : 4 |************** | |
| 32768 -> 65535 : 1 |*** | |
| 65536 -> 131071 : 1 |*** | |
| 131072 -> 262143 : 1 |*** | |
| 262144 -> 524287 : 0 | | |
| 524288 -> 1048575 : 0 | | |
| 1048576 -> 2097151 : 0 | | |
| 2097152 -> 4194303 : 0 | | |
| 4194304 -> 8388607 : 1 |*** | |
| |
| |
| nsecs : count distribution |
| 0 -> 1 : 0 | | |
| 2 -> 3 : 0 | | |
| 4 -> 7 : 0 | | |
| 8 -> 15 : 0 | | |
| 16 -> 31 : 0 | | |
| 32 -> 63 : 0 | | |
| 64 -> 127 : 0 | | |
| 128 -> 255 : 0 | | |
| 256 -> 511 : 0 | | |
| 512 -> 1023 : 0 | | |
| 1024 -> 2047 : 0 | | |
| 2048 -> 4095 : 0 | | |
| 4096 -> 8191 : 8 |******************************** | |
| 8192 -> 16383 : 10 |****************************************| |
| 16384 -> 32767 : 4 |**************** | |
| 32768 -> 65535 : 1 |**** | |
| 65536 -> 131071 : 1 |**** | |
| 131072 -> 262143 : 1 |**** | |
| 262144 -> 524287 : 0 | | |
| 524288 -> 1048575 : 0 | | |
| 1048576 -> 2097151 : 0 | | |
| 2097152 -> 4194303 : 0 | | |
| 4194304 -> 8388607 : 1 |**** | |
| |
| Time the interval of libc's malloc for top utility every 3 seconds. |
| |
| # ./funcinterval /usr/local/bin/python:main |
| Tracing uprobe function for "main"... Hit Ctrl-C to end. |
| ^C |
| nsecs : count distribution |
| 0 -> 1 : 0 | | |
| 2 -> 3 : 0 | | |
| 4 -> 7 : 0 | | |
| 8 -> 15 : 0 | | |
| 16 -> 31 : 0 | | |
| 32 -> 63 : 0 | | |
| 64 -> 127 : 0 | | |
| 128 -> 255 : 0 | | |
| 256 -> 511 : 0 | | |
| 512 -> 1023 : 0 | | |
| 1024 -> 2047 : 0 | | |
| 2048 -> 4095 : 0 | | |
| 4096 -> 8191 : 0 | | |
| 8192 -> 16383 : 0 | | |
| 16384 -> 32767 : 0 | | |
| 32768 -> 65535 : 0 | | |
| 65536 -> 131071 : 0 | | |
| 131072 -> 262143 : 0 | | |
| 262144 -> 524287 : 0 | | |
| 524288 -> 1048575 : 0 | | |
| 1048576 -> 2097151 : 0 | | |
| 2097152 -> 4194303 : 0 | | |
| 4194304 -> 8388607 : 0 | | |
| 8388608 -> 16777215 : 0 | | |
| 16777216 -> 33554431 : 0 | | |
| 33554432 -> 67108863 : 0 | | |
| 67108864 -> 134217727 : 0 | | |
| 134217728 -> 268435455 : 0 | | |
| 268435456 -> 536870911 : 1 |****************************************| |
| 536870912 -> 1073741823 : 1 |****************************************| |
| 1073741824 -> 2147483647 : 1 |****************************************| |
| 2147483648 -> 4294967295 : 1 |****************************************| |
| Detaching... |
| |
| Time the interal of python's main function. |
| |
| USAGE message: |
| |
| # ./funcinterval -h |
| usage: funcinterval [-h] [-p PID] [-i INTERVAL] [-d DURATION] [-T] [-u] [-m] |
| [-v] |
| pattern |
| |
| Time interval and print latency as a histogram |
| |
| positional arguments: |
| pattern Function name for tracing |
| |
| optional arguments: |
| -h, --help show this help message and exit |
| -p PID, --pid PID trace this PID only |
| -i INTERVAL, --interval INTERVAL |
| summary interval, in seconds |
| -d DURATION, --duration DURATION |
| total duration of trace, in seconds |
| -T, --timestamp include timestamp on output |
| -u, --microseconds microsecond histogram |
| -m, --milliseconds millisecond histogram |
| -v, --verbose print the BPF program (for debugging purposes) |
| |
| examples: |
| # time the interval of do_sys_open() |
| ./funcinterval do_sys_open |
| # time the interval of xhci_ring_ep_doorbell(), in microseconds |
| ./funcinterval -u xhci_ring_ep_doorbell |
| # time the interval of do_nanosleep(), in milliseconds |
| ./funcinterval -m do_nanosleep |
| # output every 5 seconds, with timestamps |
| ./funcinterval -mTi 5 vfs_read |
| # time process 181 only |
| ./funcinterval -p 181 vfs_read |
| # time the interval of mm_vmscan_direct_reclaim_begin tracepoint |
| ./funcinterval t:vmscan:mm_vmscan_direct_reclaim_begin |
| # time the interval of c:malloc used by top every 3 seconds |
| ./funcinterval -p `pidof -s top` -i 3 c:malloc |
| # time /usr/local/bin/python main function |
| ./funcinterval /usr/local/bin/python:main |