| # Optimizations |
| This document tracks which optimizations have been done after the initial implementation passed corpus tests and a good amount of fuzzing. |
| |
| ## Introducing more unsafe code |
| These optimizations introduced more unsafe code. They should yield significant improvements; otherwise they are not really worth it. |
| |
| ### Optimizing bitreader with byteorder which uses ptr::copy_nonoverlapping |
| * The Linux perf tool showed bitreader_reversed::get_bits using about 36% of the total runtime |
| * Benchmark: decode enwik9 |
| |
| * Before: about 14.7 seconds |
| * After: about 12.2 seconds, with get_bits() using about 25% of the total runtime |
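
As a rough illustration of the idea behind this change, the sketch below loads eight bytes at once with `ptr::copy_nonoverlapping` and then extracts bits with a shift and a mask. It is not the crate's actual code: it uses only the standard library instead of byteorder, it reads forward rather than in reverse, and the names `read_u64_le` and `get_bits` (as a free function) are hypothetical.

```rust
use std::ptr;

/// Hypothetical helper: load a little-endian u64 from the first 8 bytes of `src`.
fn read_u64_le(src: &[u8]) -> u64 {
    assert!(src.len() >= 8, "need at least 8 readable bytes");
    let mut tmp = [0u8; 8];
    // SAFETY: `src` has at least 8 readable bytes (checked above), `tmp` has
    // exactly 8 writable bytes, and the two regions cannot overlap.
    unsafe {
        ptr::copy_nonoverlapping(src.as_ptr(), tmp.as_mut_ptr(), 8);
    }
    u64::from_le_bytes(tmp)
}

/// Hypothetical forward bit extraction: return `n` bits starting at bit
/// offset `bit_pos`, least significant bit first.
/// Assumes at least 8 bytes are available from the containing byte onward.
fn get_bits(src: &[u8], bit_pos: usize, n: u32) -> u64 {
    // With n <= 56 and a shift of at most 7, all requested bits fit in one u64.
    assert!(n <= 56, "at most 56 bits per call in this sketch");
    let byte = bit_pos / 8;
    let shift = (bit_pos % 8) as u32;
    let word = read_u64_le(&src[byte..]);
    (word >> shift) & ((1u64 << n) - 1)
}
```

The point of the single wide load is that one call serves many small reads, instead of assembling the value byte by byte.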
| |
| ### Optimizing decodebuffer::repeat with ptr::copy_nonoverlapping |
| * The Linux perf tool showed decodebuffer::repeat using about 28% of the total runtime |
| * Benchmark: decode enwik9 |
| |
| * Before: about 9.9 seconds |
| * After: about 9.4 seconds |
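
A minimal sketch of what a repeat operation built on `ptr::copy_nonoverlapping` could look like, assuming a plain `Vec<u8>` as the buffer (the real decode buffer is structured differently). The function name and signature are hypothetical. Because a match may overlap its own output when `offset < len`, the copy is done in chunks of at most `offset` bytes so that source and destination never overlap.

```rust
use std::ptr;

/// Hypothetical sketch: append `len` bytes to `buf`, copied from the region
/// that starts `offset` bytes before the current end of `buf`.
fn repeat(buf: &mut Vec<u8>, offset: usize, len: usize) {
    assert!(offset >= 1 && offset <= buf.len(), "offset out of range");
    // Reserve up front so the allocation does not move during the copies.
    buf.reserve(len);
    let mut remaining = len;
    while remaining > 0 {
        let cur_len = buf.len();
        let start = cur_len - offset;
        // Copy at most `offset` bytes per step: the source [start, start + chunk)
        // then ends at or before `cur_len`, where the destination begins.
        let chunk = remaining.min(offset);
        // SAFETY: capacity was reserved for all `len` new bytes, the source
        // range lies inside the initialized part of the buffer, and the source
        // and destination ranges do not overlap (see above).
        unsafe {
            let base = buf.as_mut_ptr();
            ptr::copy_nonoverlapping(base.add(start), base.add(cur_len), chunk);
            buf.set_len(cur_len + chunk);
        }
        remaining -= chunk;
    }
}
```

For example, starting from `abc`, a call with `offset = 3` and `len = 7` extends the buffer to `abcabcabca`.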
| |
| ### Use custom ringbuffer in the decodebuffer |
| The decode buffer must be able to do two things efficiently: |
| * Collect bytes from the front |
| * Copy bytes from the contents to the end |
| |
| The standard library's VecDeque and Vec can each do one of these efficiently, but not the other. So a custom ringbuffer implementation was written. |
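
To make the two requirements concrete, here is a deliberately simple ring buffer sketch in safe Rust. It is not the crate's implementation: the type, its fixed capacity, and the method names are assumptions for illustration, and it copies byte by byte where a tuned version would copy whole ranges.

```rust
/// Hypothetical fixed-capacity ring buffer, for illustration only.
struct RingBuffer {
    buf: Vec<u8>,
    head: usize, // index of the oldest byte
    len: usize,  // number of bytes currently stored
}

impl RingBuffer {
    fn with_capacity(cap: usize) -> Self {
        assert!(cap > 0, "capacity must be non-zero");
        RingBuffer { buf: vec![0; cap], head: 0, len: 0 }
    }

    /// Append one byte at the end, wrapping around the backing storage.
    fn push(&mut self, byte: u8) {
        assert!(self.len < self.buf.len(), "buffer full");
        let tail = (self.head + self.len) % self.buf.len();
        self.buf[tail] = byte;
        self.len += 1;
    }

    /// Copy `count` bytes from the contents to the end, starting `offset`
    /// bytes before the current end. The source may overlap the new output.
    fn repeat(&mut self, offset: usize, count: usize) {
        assert!(offset >= 1 && offset <= self.len, "offset out of range");
        for _ in 0..count {
            let src = (self.head + self.len - offset) % self.buf.len();
            let byte = self.buf[src];
            self.push(byte);
        }
    }

    /// Collect up to `n` bytes from the front without shifting the rest.
    fn drain_front(&mut self, n: usize) -> Vec<u8> {
        let n = n.min(self.len);
        let cap = self.buf.len();
        let mut out = Vec::with_capacity(n);
        for i in 0..n {
            out.push(self.buf[(self.head + i) % cap]);
        }
        // Removing from the front only moves the head index.
        self.head = (self.head + n) % cap;
        self.len -= n;
        out
    }
}
```

Removing from the front of a `Vec` shifts every remaining element, whereas here it only advances `head`. Copying from the contents to the end stays a plain indexed read and write on one backing allocation.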
| |
| ## Introducing NO additional unsafe code |
| These are just nice to have. |
| |
| ### Even better bitreaders |
| Studying this material led to a big improvement in bitreader speed: |
| * https://fgiesen.wordpress.com/2018/02/19/reading-bits-in-far-too-many-ways-part-1/ |
| * https://fgiesen.wordpress.com/2018/02/20/reading-bits-in-far-too-many-ways-part-2/ |