This is my scratch pad for optimization ideas. Some of this I will implement, some I have implemented, some are just speculative.
operation: fn extent_matched(potential_prefix: Scope, s: Scope) -> u8
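idea: if potential_prefix really is a prefix of s, any differences between the two lie beyond the length of the prefix. Figure this out by xor and then ctz/clz, then compare the position of the first difference to the prefix length (however that works).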
    XXXXYYYY00000000 # prefix
    XXXXYYYYZZZZ0000 # testee
    00000000ZZZZ0000 # = xored

    XXXXYYYYQQQQ0000 # non-prefix
    XXXXYYYYZZZZ0000 # testee
    00000000GGGG0000 # = xored

    XXXXQQQQ00000000 # non-prefix
    XXXXYYYYZZZZ0000 # testee
    0000BBBBZZZZ0000 # = xored
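A minimal sketch of how the xor + ctz/clz comparison could work. It assumes a simplified layout matching the diagram above: a scope is a single u64 holding four 16-bit atoms, most significant atom first, every used atom is non-zero, and unused trailing atoms are zero. The names `ATOM_BITS`, `MAX_ATOMS` and `scope_len` are made up for illustration; the real Scope type may be laid out differently.

```rust
// Hypothetical simplified scope: four 16-bit atoms packed into a u64,
// most significant atom first. A zero atom means "unused".
const ATOM_BITS: u32 = 16;
const MAX_ATOMS: u32 = 4;

/// Number of atoms in use, derived from the run of trailing zero atoms (ctz).
fn scope_len(scope: u64) -> u32 {
    // trailing_zeros() returns 64 for 0, which gives a length of 0.
    MAX_ATOMS - scope.trailing_zeros() / ATOM_BITS
}

/// Returns the number of atoms matched if `potential_prefix` is a prefix
/// of `s`, and 0 otherwise.
fn extent_matched(potential_prefix: u64, s: u64) -> u8 {
    let prefix_len = scope_len(potential_prefix);
    // Index of the first atom where the two scopes differ (clz of the xor).
    // leading_zeros() returns 64 when they are equal, giving index 4.
    let first_diff_atom = (potential_prefix ^ s).leading_zeros() / ATOM_BITS;
    // It is a prefix only if every difference lies beyond the prefix length.
    if first_diff_atom >= prefix_len {
        prefix_len as u8
    } else {
        0
    }
}
```

With X = 0x1111, Y = 0x2222, Z = 0x3333 and Q = 0x4444, the three diagram cases come out as 2, 0 and 0 under these assumptions. An empty prefix returns 0 here; whether it should count as a match is a separate decision.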
    # On stats branch
    $cargo run --release --example syncat testdata/jquery.js | grep cmiss | wc -l
         Running `target/release/examples/syncat testdata/jquery.js`
       61266
    $cargo run --release --example syncat testdata/jquery.js | grep ptoken | wc -l
       Compiling syntect v0.1.0 (file:///Users/tristan/Box/Dev/Projects/syntect)
         Running `target/release/examples/syncat testdata/jquery.js`
       98714
    $wc -l testdata/jquery.js
        9210 testdata/jquery.js
    $cargo run --release --example syncat testdata/jquery.js | grep cclear | wc -l
       Compiling syntect v0.1.0 (file:///Users/tristan/Box/Dev/Projects/syntect)
         Running `target/release/examples/syncat testdata/jquery.js`
       71302
    $cargo run --release --example syncat testdata/jquery.js | grep freshcachetoken | wc -l
       Compiling syntect v0.1.0 (file:///Users/tristan/Box/Dev/Projects/syntect)
         Running `target/release/examples/syncat testdata/jquery.js`
       80512

    # On stats-2 branch
    $cargo run --example syncat testdata/jquery.js | grep cachehit | wc -l
         Running `target/debug/examples/syncat testdata/jquery.js`
      527774
    $cargo run --example syncat testdata/jquery.js | grep regsearch | wc -l
         Running `target/debug/examples/syncat testdata/jquery.js`
     2862948
    $cargo run --example syncat testdata/jquery.js | grep regmatch | wc -l
       Compiling syntect v0.6.0 (file:///Users/tristan/Box/Dev/Projects/syntect)
         Running `target/debug/examples/syncat testdata/jquery.js`
      296127
    $cargo run --example syncat testdata/jquery.js | grep leastmatch | wc -l
       Compiling syntect v0.6.0 (file:///Users/tristan/Box/Dev/Projects/syntect)
         Running `target/debug/examples/syncat testdata/jquery.js`
      137842

    # With search caching
    $cargo run --example syncat testdata/jquery.js | grep searchcached | wc -l
       Compiling syntect v0.6.0 (file:///Users/tristan/Box/Dev/Projects/syntect)
         Running `target/debug/examples/syncat testdata/jquery.js`
     2440527
    $cargo run --example syncat testdata/jquery.js | grep regsearch | wc -l
         Running `target/debug/examples/syncat testdata/jquery.js`
      950195
Average unique regexes per line is 87.58, average non-unique is regsearch/lines = 317
Ideally we should have only a couple of fresh cache searches per line, not the ~10 the stats show (freshcachetoken/linecount = 80512/9210, about 8.7).
In a fantabulous world these stats mean a possible 10x speed improvement, but since caching does have a cost and we can't always cache, the real gain will likely be nice but not that high.
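The per-line ratios used in these notes can be reproduced from the logged counts. This is only a rough sketch: the constant and function names are made up for illustration, and `hit_rate` assumes that `searchcached` counts searches answered from the cache while `regsearch` counts searches that were actually run, which the log itself does not confirm.

```rust
// Counts copied from the logs above (testdata/jquery.js).
const LINES: u64 = 9210;
const FRESH_CACHE_TOKENS: u64 = 80512; // stats branch
const SEARCH_CACHED: u64 = 2440527; // with search caching
const REG_SEARCH_WITH_CACHE: u64 = 950195; // with search caching

/// Average number of events per source line.
fn per_line(count: u64, lines: u64) -> f64 {
    count as f64 / lines as f64
}

/// Fraction of searches served from the cache, assuming `cached` and
/// `searched` are disjoint counts from the same run.
fn hit_rate(cached: u64, searched: u64) -> f64 {
    cached as f64 / (cached + searched) as f64
}
```

For example, `per_line(FRESH_CACHE_TOKENS, LINES)` gives roughly 8.7 fresh cache searches per line, and `hit_rate(SEARCH_CACHED, REG_SEARCH_WITH_CACHE)` gives roughly 0.72 under the stated assumption.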