Use the slice-by-four algorithm for CRC64.

Compared to the C version, this reads the input byte by byte
instead of four aligned bytes at a time. Even so, it is still
faster than the previous, simpler version.
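
A minimal sketch of the idea, not the committed code: the class name
Crc64Sketch and the method crc64 are hypothetical, and the polynomial,
initial value, and final inversion are assumed to be those of the CRC64
used in .xz files. It consumes four bytes per loop iteration but loads
each of them individually, so no alignment is needed.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of slice-by-four CRC64; names are not from the commit.
public class Crc64Sketch {
    // Assumption: reflected ECMA-182 polynomial, as used by the .xz format.
    private static final long POLY = 0xC96C5795D7870F42L;
    private static final long[][] TABLE = new long[4][256];

    static {
        // TABLE[0] is the usual bytewise table. TABLE[s] is TABLE[s - 1]
        // advanced by one additional zero byte.
        for (int s = 0; s < 4; ++s) {
            for (int b = 0; b < 256; ++b) {
                long r = s == 0 ? b : TABLE[s - 1][b];
                for (int i = 0; i < 8; ++i)
                    r = (r & 1) == 1 ? (r >>> 1) ^ POLY : r >>> 1;
                TABLE[s][b] = r;
            }
        }
    }

    static long crc64(byte[] buf, int off, int len) {
        long crc = -1L; // assumed initial value: all bits set
        final int end = off + len;
        int i = off;

        // Main loop: four bytes per iteration, each read separately.
        for (int end4 = end - 3; i < end4; i += 4) {
            final int tmp = (int) crc;
            crc = TABLE[3][(tmp & 0xFF) ^ (buf[i] & 0xFF)]
                ^ TABLE[2][((tmp >>> 8) & 0xFF) ^ (buf[i + 1] & 0xFF)]
                ^ TABLE[1][((tmp >>> 16) & 0xFF) ^ (buf[i + 2] & 0xFF)]
                ^ TABLE[0][((tmp >>> 24) & 0xFF) ^ (buf[i + 3] & 0xFF)]
                ^ (crc >>> 32);
        }

        // Tail: the remaining zero to three bytes, one at a time.
        while (i < end)
            crc = TABLE[0][(buf[i++] & 0xFF) ^ ((int) crc & 0xFF)] ^ (crc >>> 8);

        return ~crc; // assumed final inversion
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        // Prints 995dc9bbdf1939fa
        System.out.println(Long.toHexString(crc64(data, 0, data.length)));
    }
}
```

The tail loop is the plain bytewise algorithm, so running it alone over
the whole input gives the same result and makes a handy cross-check.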

This code was adapted from XZ Utils by Brett Okken.
He also did benchmarking. Thanks!