Create a known-patchable sequence for rdtsc trapping

I was looking at some disassembly that looked like this:

```
   0x00007f5d7d0e7ffd <+29>:    48 89 7c 24 10          mov    %rdi,0x10(%rsp)
   0x00007f5d7d0e8002 <+34>:    4c 8d 64 24 20          lea    0x20(%rsp),%r12
   0x00007f5d7d0e8007 <+39>:    eb 37                   jmp    0x7f5d7d0e8040 <_ZN5tracy8Profiler14CalibrateDelayEv+96>
   0x00007f5d7d0e8009 <+41>:    0f 1f 80 00 00 00 00    nopl   0x0(%rax)
   0x00007f5d7d0e8010 <+48>:    e9 03 8b 1a 3c          jmpq   0x7f5db9290b18
   0x00007f5d7d0e8015 <+53>:    90                      nop
   0x00007f5d7d0e8016 <+54>:    4c 8d 2c 02             lea    (%rdx,%rax,1),%r13
   0x00007f5d7d0e801a <+58>:    e8 31 60 fc ff          callq  0x7f5d7d0ae050 <_ZN5tracy28HardwareSupportsInvariantTSCEv@plt>
   0x00007f5d7d0e801f <+63>:    84 c0                   test   %al,%al
   0x00007f5d7d0e8021 <+65>:    74 4a                   je     0x7f5d7d0e806d <_ZN5tracy8Profiler14CalibrateDelayEv+141>
=> 0x00007f5d7d0e8023 <+67>:    0f 31                   rdtsc
   0x00007f5d7d0e8025 <+69>:    48 c1 e2 20             shl    $0x20,%rdx
```

We do have a syscallbuf template for `shl $0x20,%rdx`. Irritatingly, however, the `7c 24` at the top triggers rr's interfering branch heuristic by false-positive coincidence. Even more unfortunately, this is a calibration loop for a tracing library, so it really does just call `rdtsc` in a loop many times. Additionally, this is shipped as a binary, so every user is hitting this unfortunate coincidence.

Of course, since we're shipping the library, I can suggest we modify the source to make it more friendly to rr patching. That said, I wasn't quite sure what would be best to suggest here. At first I thought I could simply insert a bunch of nops after the `rdtsc`. Unfortunately, that doesn't actually do anything, because the branch could still be jumping into the middle of the nop sled. Then I thought maybe a single large `nop` would be sufficient, but of course, there could be something that intentionally conditionally skips the rdtsc.
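To make the coincidence concrete, here is a minimal Python sketch of the kind of scan that produces it. This is not rr's actual heuristic: it assumes a simplified rule in which any byte in `0x70..0x7f` (or `0xeb`) is treated as a possible short jump regardless of instruction boundaries, and it assumes the patch region spans the `rdtsc` plus the following `shl`. The function names are made up for illustration.

```python
# Simplified sketch of an "interfering branch" scan; NOT rr's actual code.
# Assumption: any byte in 0x70..0x7f (Jcc rel8) or 0xeb (jmp rel8) near the
# patch site is treated as a potential short jump, ignoring instruction
# boundaries.

def short_jump_targets(code: bytes, base: int):
    """Yield (address, target) for every byte pair that decodes as a rel8 jump."""
    for i in range(len(code) - 1):
        opcode = code[i]
        if 0x70 <= opcode <= 0x7F or opcode == 0xEB:
            disp = code[i + 1]
            if disp >= 0x80:          # sign-extend the 8-bit displacement
                disp -= 0x100
            next_ip = base + i + 2    # rel8 is relative to the next instruction
            yield base + i, next_ip + disp


def interfering(code: bytes, base: int, patch_start: int, patch_end: int):
    """Return jumps that land strictly inside (patch_start, patch_end)."""
    return [(addr, target) for addr, target in short_jump_targets(code, base)
            if patch_start < target < patch_end]


# Bytes of `mov %rdi,0x10(%rsp)` from the disassembly above.
MOV_BYTES = bytes.fromhex("48897c2410")
MOV_ADDR = 0x7F5D7D0E7FFD
RDTSC_ADDR = 0x7F5D7D0E8023
PATCH_END = RDTSC_ADDR + 2 + 4   # rdtsc (2 bytes) + shl $0x20,%rdx (4 bytes)

hits = interfering(MOV_BYTES, MOV_ADDR, RDTSC_ADDR, PATCH_END)
for addr, target in hits:
    print(f"fake jump at {addr:#x} -> {target:#x}")
```

Read as `jl`, the `7c 24` pair at `0x7f5d7d0e7fff` would target `0x7f5d7d0e8025`, the byte right after the `rdtsc` and therefore inside the region the patch would overwrite.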
In this PR, I propose that we make the following sequence known-safe:

```
nopl 0(%ax, %ax, 1) # single instruction, 5-byte nop
rdtsc
```

This currently wouldn't quite work, because an interfering jump to the `rdtsc` would simply hit the trailing nop padding, skipping the rdtsc entirely. However, if we slightly tweak the patch to instead use:

```
1:
  jmp %hook
  [usual nop padding here]
  jmp 1b
2:
```

Then everything works out well. Our return address is past the entire patch region, so we return to the correct place, and it doesn't matter whether we jump to the nop or to the instruction itself. Of course, an actual interfering branch over the nop is unlikely (though I suppose not impossible, depending on what comes before), but since interfering branches are supported, this nicely takes care of the spurious branch problem above (assuming the extra 5-byte nop is inserted).
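As a sanity check on the control flow, here is a toy Python model of the tweaked patch. It is only a sketch under stated assumptions: the patched region is taken to cover the 5-byte nop, the 2-byte `rdtsc`, and a 4-byte follow-up instruction such as `shl $0x20,%rdx` (11 bytes total), the padding is modelled as single-byte nops, and `hook` is just a label. None of the names below come from rr.

```python
# Toy model of the tweaked patch layout; offsets are relative to label 1.
# Assumption: the region is nop5 + rdtsc + a 4-byte follow-up instruction.

NOP5_LEN, RDTSC_LEN, FOLLOWUP_LEN = 5, 2, 4
REGION_LEN = NOP5_LEN + RDTSC_LEN + FOLLOWUP_LEN
JMP_REL32_LEN, JMP_REL8_LEN = 5, 2


def build_patch():
    """Map each instruction-start offset to (mnemonic, length)."""
    layout = {0: ("jmp hook", JMP_REL32_LEN)}
    back_jump_at = REGION_LEN - JMP_REL8_LEN
    for off in range(JMP_REL32_LEN, back_jump_at):
        layout[off] = ("nop", 1)              # padding, assumed single-byte nops
    layout[back_jump_at] = ("jmp 1b", JMP_REL8_LEN)
    return layout


def run_from(entry, layout):
    """Follow control flow from `entry` until the hook jump is reached."""
    trace, pc = [], entry
    while True:
        mnemonic, length = layout[pc]
        trace.append((pc, mnemonic))
        if mnemonic == "jmp hook":
            return trace
        pc = 0 if mnemonic == "jmp 1b" else pc + length


layout = build_patch()
via_nop = run_from(0, layout)            # normal fall-through into the old nop
via_rdtsc = run_from(NOP5_LEN, layout)   # branch straight to the old rdtsc
return_address = REGION_LEN              # label 2: first byte after the patch
print(via_nop, via_rdtsc, return_address)
```

Both entry points end at `jmp hook`: entering at offset 0 hits it directly, while entering at the old `rdtsc` offset slides through the padding and takes `jmp 1b` back to it. In either case the hook returns to label `2`, past the whole region.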