Create a known-patchable sequence for rdtsc trapping

I was looking at some disassembly that looked like this:
```
   0x00007f5d7d0e7ffd <+29>:	48 89 7c 24 10	mov    %rdi,0x10(%rsp)
   0x00007f5d7d0e8002 <+34>:	4c 8d 64 24 20	lea    0x20(%rsp),%r12
   0x00007f5d7d0e8007 <+39>:	eb 37	jmp    0x7f5d7d0e8040 <_ZN5tracy8Profiler14CalibrateDelayEv+96>
   0x00007f5d7d0e8009 <+41>:	0f 1f 80 00 00 00 00	nopl   0x0(%rax)
   0x00007f5d7d0e8010 <+48>:	e9 03 8b 1a 3c	jmpq   0x7f5db9290b18
   0x00007f5d7d0e8015 <+53>:	90	nop
   0x00007f5d7d0e8016 <+54>:	4c 8d 2c 02	lea    (%rdx,%rax,1),%r13
   0x00007f5d7d0e801a <+58>:	e8 31 60 fc ff	callq  0x7f5d7d0ae050 <_ZN5tracy28HardwareSupportsInvariantTSCEv@plt>
   0x00007f5d7d0e801f <+63>:	84 c0	test   %al,%al
   0x00007f5d7d0e8021 <+65>:	74 4a	je     0x7f5d7d0e806d <_ZN5tracy8Profiler14CalibrateDelayEv+141>
=> 0x00007f5d7d0e8023 <+67>:	0f 31	rdtsc
   0x00007f5d7d0e8025 <+69>:	48 c1 e2 20	shl    $0x20,%rdx
```
We do have a syscallbuf template for `shl $0x20,%rdx`.
Irritatingly, however, the `7c 24` bytes in the `mov` at the top trigger rr's interfering branch heuristic
by false-positive coincidence. Even more unfortunately, this is a calibration loop
for a tracing library, so it really does just call `rdtsc` in a loop many times.
Additionally, this is shipped as a binary, so every user hits this unfortunate coincidence.
Of course, since we're shipping the library, I can suggest we modify the source to make
it friendlier to rr patching. That said, I wasn't quite sure what would be best to suggest here.
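For concreteness, here is a rough sketch of why the `7c 24` bytes trip the check. It assumes a simplified model of the heuristic, not rr's actual code: every byte in the scanned window that is a short-branch opcode (`jcc rel8`, 0x70 to 0x7f, or `jmp rel8`, 0xeb) is treated as a possible branch, and it counts as interfering if its target lands strictly inside the patch region (the `rdtsc` plus the `shl` covered by the template). The helper names `rel8_target` and `find_interfering` are made up for illustration.

```python
# Hypothetical, simplified model of a "possible interfering branch" scan.
# Not rr's implementation; bytes and addresses come from the disassembly above.


def rel8_target(code: bytes, offset: int, base: int):
    """If code[offset] looks like a jcc/jmp rel8 opcode, return its target."""
    op = code[offset]
    if 0x70 <= op <= 0x7F or op == 0xEB:
        disp = int.from_bytes(code[offset + 1:offset + 2], "little", signed=True)
        # rel8 branches are 2 bytes; the displacement is relative to the next instruction.
        return base + offset + 2 + disp
    return None


def find_interfering(code: bytes, base: int, region_start: int, region_end: int):
    """Return (address, target) for each byte that decodes as a rel8 branch
    whose target lies strictly inside the patch region [region_start, region_end)."""
    hits = []
    for off in range(len(code) - 1):
        target = rel8_target(code, off, base)
        if target is not None and region_start < target < region_end:
            hits.append((base + off, target))
    return hits


# mov %rdi,0x10(%rsp) at 0x7f5d7d0e7ffd: 48 89 7c 24 10
MOV_ADDR = 0x7F5D7D0E7FFD
MOV_BYTES = bytes([0x48, 0x89, 0x7C, 0x24, 0x10])

# Patch region: rdtsc (2 bytes at ...8023) followed by shl $0x20,%rdx (4 bytes).
RDTSC_ADDR = 0x7F5D7D0E8023
REGION_END = RDTSC_ADDR + 2 + 4

for addr, target in find_interfering(MOV_BYTES, MOV_ADDR, RDTSC_ADDR, REGION_END):
    off = addr - MOV_ADDR
    print(f"{addr:#x}: {MOV_BYTES[off]:02x} {MOV_BYTES[off + 1]:02x} "
          f"decodes as a rel8 branch to {target:#x}")
```

With the bytes from the disassembly, the only hit is the `7c 24` inside the `mov`: read as `jl` with displacement 0x24, it would land on `0x7f5d7d0e8025`, the `shl` in the middle of the patch region.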

At first I thought I could simply insert a bunch of nops after the `rdtsc`.
Unfortunately, that doesn't actually help, because the branch could still
interfere by landing in the middle of the nop sled.
Then I thought maybe a single large `nop` would be sufficient, but of course,
there could be something that intentionally and conditionally skips the `rdtsc`.

In this PR, I propose that we make the following sequence known-safe:
```
nopl 0x0(%rax,%rax,1)  # single instruction, 5-byte nop (0f 1f 44 00 00)
rdtsc                  # 0f 31
```
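Spelled out as bytes, the proposed sequence is `0f 1f 44 00 00` (the 5-byte NOP) followed by `0f 31` (`rdtsc`, as in the disassembly above). That is 7 bytes in total, enough for a 5-byte `jmp rel32` to the hook plus a 2-byte `jmp rel8`. Below is a minimal sketch of a matcher for it; the names `KNOWN_SAFE_RDTSC` and `find_known_safe_rdtsc` are hypothetical, and this makes no claim about how rr's patcher actually looks up templates.

```python
# Hypothetical matcher for the proposed known-safe sequence:
#   nopl 0x0(%rax,%rax,1)   0f 1f 44 00 00   (5-byte NOP)
#   rdtsc                   0f 31
NOP5 = bytes.fromhex("0f1f440000")
RDTSC = bytes.fromhex("0f31")
KNOWN_SAFE_RDTSC = NOP5 + RDTSC  # 7 bytes: room for jmp rel32 (5) + jmp rel8 (2)


def find_known_safe_rdtsc(code: bytes):
    """Return every offset where the nop5+rdtsc sequence starts."""
    offsets = []
    start = code.find(KNOWN_SAFE_RDTSC)
    while start != -1:
        offsets.append(start)
        start = code.find(KNOWN_SAFE_RDTSC, start + 1)
    return offsets


# test %al,%al; nopl 0x0(%rax,%rax,1); rdtsc; shl $0x20,%rdx
code = bytes.fromhex("84c0") + KNOWN_SAFE_RDTSC + bytes.fromhex("48c1e220")
print(find_known_safe_rdtsc(code))  # [2]
```

The match is at offset 2, the start of the `nopl`, which is where the patch region would begin.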

This currently wouldn't quite work, because an interfering jump to the
`rdtsc` would land on the patch's trailing nop padding and fall through, so the `rdtsc` would never be executed.
However, if we slightly tweak the patch to instead use:

```
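1: jmp <hook>                 # 5-byte jmp rel32 to the hook
[usual nop padding here]
jmp 1b                        # short jmp back to label 1
2:                            # the hook returns here, past the patch region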
```

Then everything works out well. Our return address is past the entire patch
region, so we return to the correct place, and it doesn't matter whether
we jump to the nop or to the `rdtsc` itself. Of course, an actual
interfering branch over the nop is unlikely (though I suppose not
impossible, depending on what comes before). But since interfering branches
are then supported, this nicely takes care of the spurious-branch problem
above (assuming the extra 5-byte nop is inserted).
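To check the control flow, here is a small sketch that builds the tweaked patch for the 7-byte `nopl` + `rdtsc` region and follows the jumps from both possible entry points. It assumes the region is exactly 5 + 2 bytes, so no nop padding is needed, and it uses made-up addresses for the region and the hook (`REGION_START`, `HOOK_ADDR`); `build_patch` and `run_from` are hypothetical helpers that only understand the two jump encodings emitted here.

```python
import struct

REGION_START = 0x1000  # hypothetical address of the nopl
REGION_LEN = 7         # 5-byte nopl + 2-byte rdtsc
HOOK_ADDR = 0x700000   # hypothetical address of the rdtsc hook


def build_patch(region_start: int, hook: int) -> bytes:
    # 1: jmp <hook>: e9 rel32, relative to the end of this 5-byte instruction
    jmp_hook = b"\xe9" + struct.pack("<i", hook - (region_start + 5))
    # jmp 1b: eb rel8, relative to the end of this 2-byte instruction
    jmp_back = b"\xeb" + struct.pack("<b", -(5 + 2))
    # No nop padding in this sketch: 5 + 2 bytes exactly fill the region.
    return jmp_hook + jmp_back


def run_from(patch: bytes, region_start: int, entry: int) -> int:
    """Follow jmp rel32/rel8 instructions from `entry` until control leaves the region."""
    pc = entry
    while region_start <= pc < region_start + len(patch):
        off = pc - region_start
        op = patch[off]
        if op == 0xE9:
            pc = pc + 5 + struct.unpack_from("<i", patch, off + 1)[0]
        elif op == 0xEB:
            pc = pc + 2 + struct.unpack_from("<b", patch, off + 1)[0]
        else:
            raise ValueError(f"unexpected byte {op:#x} at {pc:#x}")
    return pc


patch = build_patch(REGION_START, HOOK_ADDR)
assert len(patch) == REGION_LEN
# Entering at the nopl (normal path) or at the old rdtsc (interfering branch).
for entry in (REGION_START, REGION_START + 5):
    print(f"{entry:#x} -> {run_from(patch, REGION_START, entry):#x}")
```

Both entries print `-> 0x700000`, i.e. they reach the hook. The hook itself isn't modeled; per the description above, it returns to label `2`, just past the region, so neither path can skip the emulated `rdtsc`.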
5 files changed
README.md

Overview

rr is a lightweight tool for recording, replaying and debugging execution of applications (trees of processes and threads). Debugging extends gdb with very efficient reverse-execution, which in combination with standard gdb/x86 features like hardware data watchpoints, makes debugging much more fun. More information about the project, including instructions on how to install, run, and build rr, is at https://rr-project.org. The best technical overview is currently the paper Engineering Record And Replay For Deployability: Extended Technical Report.

Or go directly to the installation and building instructions.

Please contribute! Make sure to review the pull request checklist before submitting a pull request.

If you find rr useful, please add a testimonial.

rr development is sponsored by Pernosco and was originated by Mozilla.

System requirements

  • Linux kernel ≥ 3.11 is required (for PTRACE_SETSIGMASK).
  • rr currently requires either:
  • Running in a VM guest is supported, as long as the VM supports virtualization of hardware performance counters. (VMware and KVM are known to work; Xen does not.)