| Demonstrations of compactsnoop, the Linux eBPF/bcc version. |
| |
| |
| compactsnoop traces compact zone events system-wide and prints various details. |
| Example output (compaction triggered manually with echo 1 > /proc/sys/vm/compact_memory): |
| |
| # ./compactsnoop |
| COMM PID NODE ZONE ORDER MODE LAT(ms) STATUS |
| zsh 23685 0 ZONE_DMA -1 SYNC 0.025 complete |
| zsh 23685 0 ZONE_DMA32 -1 SYNC 3.925 complete |
| zsh 23685 0 ZONE_NORMAL -1 SYNC 113.975 complete |
| zsh 23685 1 ZONE_NORMAL -1 SYNC 81.57 complete |
| zsh 23685 0 ZONE_DMA -1 SYNC 0.02 complete |
| zsh 23685 0 ZONE_DMA32 -1 SYNC 4.631 complete |
| zsh 23685 0 ZONE_NORMAL -1 SYNC 113.975 complete |
| zsh 23685 1 ZONE_NORMAL -1 SYNC 80.647 complete |
| zsh 23685 0 ZONE_DMA -1 SYNC 0.020 complete |
| zsh 23685 0 ZONE_DMA32 -1 SYNC 3.367 complete |
| zsh 23685 0 ZONE_NORMAL -1 SYNC 115.18 complete |
| zsh 23685 1 ZONE_NORMAL -1 SYNC 81.766 complete |
| zsh 23685 0 ZONE_DMA -1 SYNC 0.025 complete |
| zsh 23685 0 ZONE_DMA32 -1 SYNC 4.346 complete |
| zsh 23685 0 ZONE_NORMAL -1 SYNC 114.570 complete |
| zsh 23685 1 ZONE_NORMAL -1 SYNC 80.820 complete |
| zsh 23685 0 ZONE_DMA -1 SYNC 0.026 complete |
| zsh 23685 0 ZONE_DMA32 -1 SYNC 4.611 complete |
| zsh 23685 0 ZONE_NORMAL -1 SYNC 113.993 complete |
| zsh 23685 1 ZONE_NORMAL -1 SYNC 80.928 complete |
| zsh 23685 0 ZONE_DMA -1 SYNC 0.02 complete |
| zsh 23685 0 ZONE_DMA32 -1 SYNC 3.889 complete |
| zsh 23685 0 ZONE_NORMAL -1 SYNC 113.776 complete |
| zsh 23685 1 ZONE_NORMAL -1 SYNC 80.727 complete |
| ^C |
| |
| Compact zone events happen when a process allocates pages and memory is too |
| fragmented to satisfy the request for contiguous memory. These events increase |
| the time the allocating process spends waiting. |
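To put a number on that waiting time, the per-event LAT(ms) values can be summed per process. The following is a minimal sketch, not part of compactsnoop: the function name total_latency_by_process is hypothetical, and it assumes the default eight-column output shown above with a COMM that contains no spaces.

```python
from collections import defaultdict


def total_latency_by_process(output_text):
    """Sum LAT(ms) per (COMM, PID) from default compactsnoop output.

    Assumes the default columns:
    COMM PID NODE ZONE ORDER MODE LAT(ms) STATUS
    """
    totals = defaultdict(float)
    for line in output_text.splitlines():
        fields = line.split()
        # Skip the header, "^C", blank lines, and anything that does not
        # have exactly eight whitespace-separated fields.
        if len(fields) != 8 or not fields[1].isdigit():
            continue
        comm, pid, lat_ms = fields[0], int(fields[1]), float(fields[6])
        totals[(comm, pid)] += lat_ms
    return dict(totals)
```

Feeding it the saved output of a tracing session gives the total compaction latency charged to each process, which makes it easier to see which process paid the most.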
| |
| compactsnoop is useful when compact_stall (in /proc/vmstat) keeps increasing |
| and you need to find out whether critical processes are the ones being stalled. |
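Before reaching for compactsnoop, it can help to confirm that compact_stall really is growing. The sketch below is illustrative only: the helper names read_compact_counters and compact_stall_delta are hypothetical, and it assumes /proc/vmstat-style text with one "name value" pair per line.

```python
def read_compact_counters(vmstat_text):
    """Return the compact_* counters from /proc/vmstat-style text as a dict."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("compact_"):
            counters[parts[0]] = int(parts[1])
    return counters


def compact_stall_delta(before_text, after_text):
    """Number of new compaction stalls between two snapshots of the text."""
    before = read_compact_counters(before_text).get("compact_stall", 0)
    after = read_compact_counters(after_text).get("compact_stall", 0)
    return after - before
```

Taking two snapshots of /proc/vmstat a few seconds apart and passing their contents to compact_stall_delta() shows how many stalls occurred in between; a non-zero result is a reason to start tracing.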
| |
| The STATUS values are (CentOS 7.6's kernel): |
| |
| compact_status = { |
| # COMPACT_SKIPPED: compaction didn't start as it was not possible or direct reclaim was more suitable |
| 0: "skipped", |
| # COMPACT_CONTINUE: compaction should continue to another pageblock |
| 1: "continue", |
| # COMPACT_PARTIAL: direct compaction partially compacted a zone and there are suitable pages |
| 2: "partial", |
| # COMPACT_COMPLETE: The full zone was compacted |
| 3: "complete", |
| } |
| |
| or (kernel 4.7 and above): |
| |
| compact_status = { |
| # COMPACT_NOT_SUITABLE_ZONE: For more detailed tracepoint output - internal to compaction |
| 0: "not_suitable_zone", |
| # COMPACT_SKIPPED: compaction didn't start as it was not possible or direct reclaim was more suitable |
| 1: "skipped", |
| # COMPACT_DEFERRED: compaction didn't start as it was deferred due to past failures |
| 2: "deferred", |
| # COMPACT_NO_SUITABLE_PAGE: For more detailed tracepoint output - internal to compaction |
| 3: "no_suitable_page", |
| # COMPACT_CONTINUE: compaction should continue to another pageblock |
| 4: "continue", |
| # COMPACT_COMPLETE: the full zone was scanned, but compaction did not produce suitable pages |
| 5: "complete", |
| # COMPACT_PARTIAL_SKIPPED: direct compaction scanned part of the zone, but did not produce suitable pages |
| 6: "partial_skipped", |
| # COMPACT_CONTENDED: compaction terminated prematurely due to lock contentions |
| 7: "contended", |
| # COMPACT_SUCCESS: direct compaction terminated after concluding that the allocation should now succeed |
| 8: "success", |
| } |
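Because the same numeric code means different things in the two tables (3 is "complete" in the first and "no_suitable_page" in the second), any script that decodes raw status codes has to pick a table first. The sketch below shows one way to do that. It is not taken from the tool's source: the helper names are hypothetical, and treating every kernel older than 4.7 as if it used the CentOS 7.6 table is a simplifying assumption.

```python
import re

# Tables copied from the two listings above.
STATUS_CENTOS_7_6 = {0: "skipped", 1: "continue", 2: "partial", 3: "complete"}
STATUS_4_7_PLUS = {
    0: "not_suitable_zone", 1: "skipped", 2: "deferred",
    3: "no_suitable_page", 4: "continue", 5: "complete",
    6: "partial_skipped", 7: "contended", 8: "success",
}


def parse_kernel_version(release):
    """Turn a release string such as '4.7.0' into the tuple (4, 7)."""
    major, minor = release.split(".")[:2]
    return int(major), int(re.match(r"\d+", minor).group())


def decode_status(code, release):
    """Map a numeric status to its name for the given kernel release.

    Assumption: kernels older than 4.7 are treated like CentOS 7.6's.
    """
    if parse_kernel_version(release) >= (4, 7):
        table = STATUS_4_7_PLUS
    else:
        table = STATUS_CENTOS_7_6
    return table.get(code, "unknown(%d)" % code)
```

The release string would typically come from platform.release() or uname -r.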
| |
| The -p option can be used to filter on a PID, which is filtered in-kernel. Here |
| I've used it with -T to print timestamps: |
| |
| # ./compactsnoop -Tp 24376 |
| TIME(s) COMM PID NODE ZONE ORDER MODE LAT(ms) STATUS |
| 101.364115000 zsh 24376 0 ZONE_DMA -1 SYNC 0.025 complete |
| 101.364555000 zsh 24376 0 ZONE_DMA32 -1 SYNC 3.925 complete |
| ^C |
| |
| This shows the zsh process allocating pages and triggering compact zone events; |
| the resulting delays are small. |
| |
| A maximum tracing duration can be set with the -d option. For example, to trace |
| for 2 seconds: |
| |
| # ./compactsnoop -d 2 |
| COMM PID NODE ZONE ORDER MODE LAT(ms) STATUS |
| zsh 26385 0 ZONE_DMA -1 SYNC 0.025444 complete |
| ^C |
| |
| The -e option prints out extra columns: |
| |
| # ./compactsnoop -e |
| COMM PID NODE ZONE ORDER MODE FRAGIDX MIN LOW HIGH FREE LAT(ms) STATUS |
| summ 28276 1 ZONE_NORMAL 3 ASYNC 0.728 11284 14105 16926 14193 3.58 partial |
| summ 28276 0 ZONE_NORMAL 2 ASYNC -1.000 11043 13803 16564 14479 0.0 complete |
| summ 28276 1 ZONE_NORMAL 2 ASYNC -1.000 11284 14105 16926 14785 0.019 complete |
| summ 28276 0 ZONE_NORMAL 2 ASYNC -1.000 11043 13803 16564 15199 0.006 partial |
| summ 28276 1 ZONE_NORMAL 2 ASYNC -1.000 11284 14105 16926 17360 0.030 complete |
| summ 28276 0 ZONE_NORMAL 2 ASYNC -1.000 11043 13803 16564 15443 0.024 complete |
| summ 28276 1 ZONE_NORMAL 2 ASYNC -1.000 11284 14105 16926 15634 0.018 complete |
| summ 28276 1 ZONE_NORMAL 3 ASYNC 0.832 11284 14105 16926 15301 0.006 partial |
| summ 28276 0 ZONE_NORMAL 2 ASYNC -1.000 11043 13803 16564 14774 0.005 partial |
| summ 28276 1 ZONE_NORMAL 3 ASYNC 0.733 11284 14105 16926 19888 0.012 partial |
| ^C |
| |
| The FRAGIDX is short for fragmentation index, which only makes sense if an |
| allocation of a requested size would fail. If that is true, the fragmentation |
| index indicates whether external fragmentation or a lack of memory was the |
| problem. The value can be used to determine if page reclaim or compaction |
| should be used. |
| |
| The index is between 0 and 1 and is printed to 3 decimal places: |
| |
| 0 => allocation would fail due to lack of memory |
| 1 => allocation would fail due to fragmentation |
| -1 => allocation would succeed, so the index does not apply |
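To make the index concrete, here is a minimal sketch of the calculation, modelled on my reading of the kernel's __fragmentation_index() in mm/vmstat.c. Treat it as an approximation: the details may differ between kernel versions, and the Python function names are hypothetical. The input is a list of free block counts per order, like one row of /proc/buddyinfo. The -1.000 values in the output above correspond to the case where a suitable free block already exists.

```python
def summarize_free_lists(free_counts, order):
    """Summarize per-order free block counts for a request of `order`.

    free_counts[i] is the number of free blocks of order i.
    Returns (free_pages, free_blocks_total, free_blocks_suitable).
    """
    free_pages = sum(n << o for o, n in enumerate(free_counts))
    free_blocks_total = sum(free_counts)
    free_blocks_suitable = sum(
        n << (o - order) for o, n in enumerate(free_counts) if o >= order
    )
    return free_pages, free_blocks_total, free_blocks_suitable


def fragmentation_index(order, free_pages, free_blocks_total,
                        free_blocks_suitable):
    """Return the fragmentation index as a float with 3 decimal places."""
    requested = 1 << order
    if free_blocks_total == 0:
        return 0.0   # nothing is free at all: lack of memory
    if free_blocks_suitable:
        return -1.0  # the request would succeed, so the index does not apply
    # Integer arithmetic scaled by 1000, as in the kernel.
    scaled = 1000 - (1000 + free_pages * 1000 // requested) // free_blocks_total
    return scaled / 1000.0
```

For example, 1000 free order-0 pages and nothing larger gives an index of 0.874 for an order-3 request: plenty of memory is free, but none of it is contiguous, so the value leans towards fragmentation.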
| |
| The fragmentation index for every zone and order of the buddy allocator can be read from /sys/kernel/debug/extfrag/extfrag_index. |
| |
| MIN/LOW/HIGH show the watermarks of the zone, which can also be read from |
| /proc/zoneinfo, and FREE is nr_free_pages (also available in /proc/zoneinfo). |
| |
| |
| The -K option prints out the kernel stack: |
| |
| # ./compactsnoop -K -e |
| |
| summ 28276 0 ZONE_NORMAL 3 ASYNC 0.528 11043 13803 16564 22654 13.258 partial |
| kretprobe_trampoline+0x0 |
| try_to_compact_pages+0x121 |
| __alloc_pages_direct_compact+0xac |
| __alloc_pages_slowpath+0x3e9 |
| __alloc_pages_nodemask+0x404 |
| alloc_pages_current+0x98 |
| new_slab+0x2c5 |
| ___slab_alloc+0x3ac |
| __slab_alloc+0x40 |
| kmem_cache_alloc_node+0x8b |
| copy_process+0x18e |
| do_fork+0x91 |
| sys_clone+0x16 |
| stub_clone+0x44 |
| |
| summ 28276 1 ZONE_NORMAL 3 ASYNC -1.000 11284 14105 16926 22074 0.008 partial |
| kretprobe_trampoline+0x0 |
| try_to_compact_pages+0x121 |
| __alloc_pages_direct_compact+0xac |
| __alloc_pages_slowpath+0x3e9 |
| __alloc_pages_nodemask+0x404 |
| alloc_pages_current+0x98 |
| new_slab+0x2c5 |
| ___slab_alloc+0x3ac |
| __slab_alloc+0x40 |
| kmem_cache_alloc_node+0x8b |
| copy_process+0x18e |
| do_fork+0x91 |
| sys_clone+0x16 |
| stub_clone+0x44 |
| |
| summ 28276 0 ZONE_NORMAL 3 ASYNC 0.527 11043 13803 16564 25653 9.812 partial |
| kretprobe_trampoline+0x0 |
| try_to_compact_pages+0x121 |
| __alloc_pages_direct_compact+0xac |
| __alloc_pages_slowpath+0x3e9 |
| __alloc_pages_nodemask+0x404 |
| alloc_pages_current+0x98 |
| new_slab+0x2c5 |
| ___slab_alloc+0x3ac |
| __slab_alloc+0x40 |
| kmem_cache_alloc_node+0x8b |
| copy_process+0x18e |
| do_fork+0x91 |
| sys_clone+0x16 |
| stub_clone+0x44 |
| |
| # ./compactsnoop -h |
| usage: compactsnoop.py [-h] [-T] [-p PID] [-d DURATION] [-K] [-e] |
| |
| Trace compact zone |
| |
| optional arguments: |
| -h, --help show this help message and exit |
| -T, --timestamp include timestamp on output |
| -p PID, --pid PID trace this PID only |
| -d DURATION, --duration DURATION |
| total duration of trace in seconds |
| -K, --kernel-stack output kernel stack trace |
| -e, --extended_fields |
| show system memory state |
| |
| examples: |
| ./compactsnoop # trace all compact stall |
| ./compactsnoop -T # include timestamps |
| ./compactsnoop -d 10 # trace for 10 seconds only |
| ./compactsnoop -K # output kernel stack trace |
| ./compactsnoop -e # show extended fields |