| /*! |
| This library provides heavily optimized routines for string search primitives. |
| |
| # Overview |
| |
| This section gives a brief high level overview of what this crate offers. |
| |
| * The top-level module provides routines for searching for 1, 2 or 3 bytes |
| in the forward or reverse direction. When searching for more than one byte, |
| positions are considered a match if the byte at that position matches any |
| of the bytes. |
| * The [`memmem`] sub-module provides forward and reverse substring search |
| routines. |
| |
| In all such cases, routines operate on `&[u8]` without regard to encoding. This |
| is exactly what you want when searching either UTF-8 or arbitrary bytes. |
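
For example, here is a brief sketch of the two-byte variant, `memchr2`, which
returns the first position holding either byte (the one-byte and three-byte
variants have dedicated examples below):

```
use memchr::memchr2;

let haystack = b"foo bar baz";
// `a` occurs at index 5, before the first `z` at index 10.
assert_eq!(Some(5), memchr2(b'a', b'z', haystack));
```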
| |
| # Example: using `memchr` |
| |
| This example shows how to use `memchr` to find the first occurrence of `z` in |
| a haystack: |
| |
| ``` |
| use memchr::memchr; |
| |
| let haystack = b"foo bar baz quuz"; |
| assert_eq!(Some(10), memchr(b'z', haystack)); |
| ``` |
| |
| # Example: matching one of three possible bytes |
| |
This example shows how to use `memchr3_iter`, reversed with `rev`, to find
occurrences of `a`, `b` or `c`, starting from the end of the haystack.
| |
| ``` |
| use memchr::memchr3_iter; |
| |
| let haystack = b"xyzaxyzbxyzc"; |
| |
| let mut it = memchr3_iter(b'a', b'b', b'c', haystack).rev(); |
| assert_eq!(Some(11), it.next()); |
| assert_eq!(Some(7), it.next()); |
| assert_eq!(Some(3), it.next()); |
| assert_eq!(None, it.next()); |
| ``` |
| |
| # Example: iterating over substring matches |
| |
| This example shows how to use the [`memmem`] sub-module to find occurrences of |
| a substring in a haystack. |
| |
| ``` |
| use memchr::memmem; |
| |
| let haystack = b"foo bar foo baz foo"; |
| |
| let mut it = memmem::find_iter(haystack, "foo"); |
| assert_eq!(Some(0), it.next()); |
| assert_eq!(Some(8), it.next()); |
| assert_eq!(Some(16), it.next()); |
| assert_eq!(None, it.next()); |
| ``` |
| |
| # Example: repeating a search for the same needle |
| |
The overhead of constructing a substring searcher may be measurable in some
workloads. When the same needle is used to search many haystacks, the searcher
can be constructed once and reused for all subsequent searches. This can be
done with a [`memmem::Finder`]:
| |
| ``` |
| use memchr::memmem; |
| |
| let finder = memmem::Finder::new("foo"); |
| |
| assert_eq!(Some(4), finder.find(b"baz foo quux")); |
| assert_eq!(None, finder.find(b"quux baz bar")); |
| ``` |
| |
| # Why use this crate? |
| |
| At first glance, the APIs provided by this crate might seem weird. Why provide |
| a dedicated routine like `memchr` for something that could be implemented |
| clearly and trivially in one line: |
| |
| ``` |
| fn memchr(needle: u8, haystack: &[u8]) -> Option<usize> { |
| haystack.iter().position(|&b| b == needle) |
| } |
| ``` |
| |
| Or similarly, why does this crate provide substring search routines when Rust's |
| core library already provides them? |
| |
| ``` |
| fn search(haystack: &str, needle: &str) -> Option<usize> { |
| haystack.find(needle) |
| } |
| ``` |
| |
| The primary reason for both of them to exist is performance. When it comes to |
| performance, at a high level at least, there are two primary ways to look at |
| it: |
| |
| * **Throughput**: For this, think about it as, "given some very large haystack |
| and a byte that never occurs in that haystack, how long does it take to |
| search through it and determine that it, in fact, does not occur?" |
| * **Latency**: For this, think about it as, "given a tiny haystack---just a |
| few bytes---how long does it take to determine if a byte is in it?" |
| |
| The `memchr` routine in this crate has _slightly_ worse latency than the |
| solution presented above, however, its throughput can easily be over an |
| order of magnitude faster. This is a good general purpose trade off to make. |
| You rarely lose, but often gain big. |
| |
**NOTE:** The name `memchr` comes from the corresponding routine in `libc`. A
key advantage of using this library is that its performance is not tied to the
quality of the `memchr` implementation in whatever `libc` you happen to be
using, which can vary greatly from platform to platform.
| |
| But what about substring search? This one is a bit more complicated. The |
| primary reason for its existence is still indeed performance, but it's also |
| useful because Rust's core library doesn't actually expose any substring |
| search routine on arbitrary bytes. The only substring search routine that |
| exists works exclusively on valid UTF-8. |
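
For example, a small sketch showing `memmem::find` on a haystack that is not
valid UTF-8:

```
use memchr::memmem;

// Neither `\xFF` nor `\xFE` is valid UTF-8, but substring search still works.
let haystack = b"\xFF\xFEabc\xFF";
assert_eq!(Some(2), memmem::find(haystack, b"abc"));
```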
| |
| So if you have valid UTF-8, is there a reason to use this over the standard |
| library substring search routine? Yes. This routine is faster on almost every |
| metric, including latency. The natural question then, is why isn't this |
| implementation in the standard library, even if only for searching on UTF-8? |
| The reason is that the implementation details for using SIMD in the standard |
| library haven't quite been worked out yet. |
| |
| **NOTE:** Currently, only `x86_64`, `wasm32` and `aarch64` targets have vector |
| accelerated implementations of `memchr` (and friends) and `memmem`. |
| |
| # Crate features |
| |
| * **std** - When enabled (the default), this will permit features specific to |
| the standard library. Currently, the only thing used from the standard library |
| is runtime SIMD CPU feature detection. This means that this feature must be |
| enabled to get AVX2 accelerated routines on `x86_64` targets without enabling |
| the `avx2` feature at compile time, for example. When `std` is not enabled, |
| this crate will still attempt to use SSE2 accelerated routines on `x86_64`. It |
| will also use AVX2 accelerated routines when the `avx2` feature is enabled at |
| compile time. In general, enable this feature if you can. |
| * **alloc** - When enabled (the default), APIs in this crate requiring some |
| kind of allocation will become available. For example, the |
[`memmem::Finder::into_owned`](crate::memmem::Finder::into_owned) API (see
the sketch after this list) and the
[`arch::all::shiftor`](crate::arch::all::shiftor) substring search
implementation. Otherwise, this crate is designed from the ground up to be
| usable in core-only contexts, so the `alloc` feature doesn't add much |
| currently. Notably, disabling `std` but enabling `alloc` will **not** result |
| in the use of AVX2 on `x86_64` targets unless the `avx2` feature is enabled |
| at compile time. (With `std` enabled, AVX2 can be used even without the `avx2` |
| feature enabled at compile time by way of runtime CPU feature detection.) |
| * **logging** - When enabled (disabled by default), the `log` crate is used |
| to emit log messages about what kinds of `memchr` and `memmem` algorithms |
| are used. Namely, both `memchr` and `memmem` have a number of different |
| implementation choices depending on the target and CPU, and the log messages |
| can help show what specific implementations are being used. Generally, this is |
| useful for debugging performance issues. |
| * **libc** - **DEPRECATED**. Previously, this enabled the use of the target's |
| `memchr` function from whatever `libc` was linked into the program. This |
| feature is now a no-op because this crate's implementation of `memchr` should |
| now be sufficiently fast on a number of platforms that `libc` should no longer |
| be needed. (This feature is somewhat of a holdover from this crate's origins. |
| Originally, this crate was literally just a safe wrapper function around the |
| `memchr` function from `libc`.) |
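
As an illustration of the `alloc` feature mentioned above, here is a brief
sketch (assuming default features, which include `alloc`) of using
`Finder::into_owned` to build a searcher that does not borrow its needle:

```
use memchr::memmem;

// Returns a finder that owns its needle and can therefore outlive `needle`.
fn owned_finder(needle: &str) -> memmem::Finder<'static> {
    memmem::Finder::new(needle).into_owned()
}

let finder = owned_finder("foo");
assert_eq!(Some(4), finder.find(b"baz foo quux"));
```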
| */ |
| |
| #![deny(missing_docs)] |
| #![no_std] |
| // It's just not worth trying to squash all dead code warnings. Pretty |
| // unfortunate IMO. Not really sure how to fix this other than to either |
| // live with it or sprinkle a whole mess of `cfg` annotations everywhere. |
| #![cfg_attr( |
| not(any( |
| all(target_arch = "x86_64", target_feature = "sse2"), |
| all(target_arch = "wasm32", target_feature = "simd128"), |
| target_arch = "aarch64", |
| )), |
| allow(dead_code) |
| )] |
| // Same deal for miri. |
| #![cfg_attr(miri, allow(dead_code, unused_macros))] |
| |
| // Supporting 8-bit (or others) would be fine. If you need it, please submit a |
| // bug report at https://github.com/BurntSushi/memchr |
| #[cfg(not(any( |
| target_pointer_width = "16", |
| target_pointer_width = "32", |
| target_pointer_width = "64" |
| )))] |
| compile_error!("memchr currently not supported on non-{16,32,64}"); |
| |
| #[cfg(any(test, feature = "std"))] |
| extern crate std; |
| |
| #[cfg(any(test, feature = "alloc"))] |
| extern crate alloc; |
| |
| pub use crate::memchr::{ |
| memchr, memchr2, memchr2_iter, memchr3, memchr3_iter, memchr_iter, |
| memrchr, memrchr2, memrchr2_iter, memrchr3, memrchr3_iter, memrchr_iter, |
| Memchr, Memchr2, Memchr3, |
| }; |
| |
| #[macro_use] |
| mod macros; |
| |
| #[cfg(test)] |
| #[macro_use] |
| mod tests; |
| |
| pub mod arch; |
| mod cow; |
| mod ext; |
| mod memchr; |
| pub mod memmem; |
| mod vector; |