vendor/bstr-0.2.16/README.md - toolchain/rustc - Git at Google

 bstr
 ====
 This crate provides extension traits for `&[u8]` and `Vec<u8>` that enable
 their use as byte strings, where byte strings are _conventionally_ UTF-8. This
 differs from the standard library's `String` and `str` types in that they are
 not required to be valid UTF-8, but may be fully or partially valid UTF-8.

 [![Build status](https://github.com/BurntSushi/bstr/workflows/ci/badge.svg)](https://github.com/BurntSushi/bstr/actions)
 [![](https://meritbadge.herokuapp.com/bstr)](https://crates.io/crates/bstr)


 ### Documentation

 https://docs.rs/bstr


 ### When should I use byte strings?

 See this part of the documentation for more details:
 https://docs.rs/bstr/0.2.*/bstr/#when-should-i-use-byte-strings.

 The short story is that byte strings are useful when it is inconvenient or
 incorrect to require valid UTF-8.


 ### Usage

 Add this to your `Cargo.toml`:

 ```toml
 [dependencies]
 bstr = "0.2"
 ```


 ### Examples

 The following two examples exhibit both the API features of byte strings and
 the I/O convenience functions provided for reading line-by-line quickly.

 This first example simply shows how to efficiently iterate over lines in
 stdin, and print out lines containing a particular substring:

 ```rust
 use std::error::Error;
 use std::io::{self, Write};

 use bstr::{ByteSlice, io::BufReadExt};

 fn main() -> Result<(), Box<dyn Error>> {
     let stdin = io::stdin();
     let mut stdout = io::BufWriter::new(io::stdout());

     stdin.lock().for_byte_line_with_terminator(|line| {
         if line.contains_str("Dimension") {
             stdout.write_all(line)?;
         }
         Ok(true)
     })?;
     Ok(())
 }
 ```

 This example shows how to count all of the words (Unicode-aware) in stdin,
 line-by-line:

 ```rust
 use std::error::Error;
 use std::io;

 use bstr::{ByteSlice, io::BufReadExt};

 fn main() -> Result<(), Box<dyn Error>> {
     let stdin = io::stdin();
     let mut words = 0;
     stdin.lock().for_byte_line_with_terminator(|line| {
         words += line.words().count();
         Ok(true)
     })?;
     println!("{}", words);
     Ok(())
 }
 ```

 This example shows how to convert a stream on stdin to uppercase without
 performing UTF-8 validation _and_ amortizing allocation. On standard ASCII
 text, this is quite a bit faster than what you can (easily) do with standard
 library APIs. (N.B. Any invalid UTF-8 bytes are passed through unchanged.)

 ```rust
 use std::error::Error;
 use std::io::{self, Write};

 use bstr::{ByteSlice, io::BufReadExt};

 fn main() -> Result<(), Box<dyn Error>> {
     let stdin = io::stdin();
     let mut stdout = io::BufWriter::new(io::stdout());

     let mut upper = vec![];
     stdin.lock().for_byte_line_with_terminator(|line| {
         upper.clear();
         line.to_uppercase_into(&mut upper);
         stdout.write_all(&upper)?;
         Ok(true)
     })?;
     Ok(())
 }
 ```

 This example shows how to extract the first 10 visual characters (as grapheme
 clusters) from each line, where invalid UTF-8 sequences are generally treated
 as a single character and are passed through correctly:

 ```rust
 use std::error::Error;
 use std::io::{self, Write};

 use bstr::{ByteSlice, io::BufReadExt};

 fn main() -> Result<(), Box<dyn Error>> {
     let stdin = io::stdin();
     let mut stdout = io::BufWriter::new(io::stdout());

     stdin.lock().for_byte_line_with_terminator(|line| {
         let end = line
             .grapheme_indices()
             .map(|(_, end, _)| end)
             .take(10)
             .last()
             .unwrap_or(line.len());
         stdout.write_all(line[..end].trim_end())?;
         stdout.write_all(b"\n")?;
         Ok(true)
     })?;
     Ok(())
 }
 ```


 ### Cargo features

 This crates comes with a few features that control standard library, serde
 and Unicode support.

 * `std` - **Enabled** by default. This provides APIs that require the standard
   library, such as `Vec<u8>`.
 * `unicode` - **Enabled** by default. This provides APIs that require sizable
   Unicode data compiled into the binary. This includes, but is not limited to,
   grapheme/word/sentence segmenters. When this is disabled, basic support such
   as UTF-8 decoding is still included.
 * `serde1` - **Disabled** by default. Enables implementations of serde traits
   for the `BStr` and `BString` types.
 * `serde1-nostd` - **Disabled** by default. Enables implementations of serde
   traits for the `BStr` type only, intended for use without the standard
   library. Generally, you either want `serde1` or `serde1-nostd`, not both.


 ### Minimum Rust version policy

 This crate's minimum supported `rustc` version (MSRV) is `1.41.1`.

 In general, this crate will be conservative with respect to the minimum
 supported version of Rust. MSRV may be bumped in minor version releases.


 ### Future work

 Since this is meant to be a core crate, getting a `1.0` release is a priority.
 My hope is to move to `1.0` within the next year and commit to its API so that
 `bstr` can be used as a public dependency.

 A large part of the API surface area was taken from the standard library, so
 from an API design perspective, a good portion of this crate should be mature.
 The main differences from the standard library are in how the various substring
 search routines work. The standard library provides generic infrastructure for
 supporting different types of searches with a single method, where as this
 library prefers to define new methods for each type of search and drop the
 generic infrastructure.

 Some _probable_ future considerations for APIs include, but are not limited to:

 * A convenience layer on top of the `aho-corasick` crate.
 * Unicode normalization.
 * More sophisticated support for dealing with Unicode case, perhaps by
   combining the use cases supported by [`caseless`](https://docs.rs/caseless)
   and [`unicase`](https://docs.rs/unicase).
 * Add facilities for dealing with OS strings and file paths, probably via
   simple conversion routines.

 Here are some examples that are _probably_ out of scope for this crate:

 * Regular expressions.
 * Unicode collation.

 The exact scope isn't quite clear, but I expect we can iterate on it.

 In general, as stated below, this crate is an experiment in bringing lots of
 related APIs together into a single crate while simultaneously attempting to
 keep the total number of dependencies low. Indeed, every dependency of `bstr`,
 except for `memchr`, is optional.


 ### High level motivation

 Strictly speaking, the `bstr` crate provides very little that can't already be
 achieved with the standard library `Vec<u8>`/`&[u8]` APIs and the ecosystem of
 library crates. For example:

 * The standard library's
   [`Utf8Error`](https://doc.rust-lang.org/std/str/struct.Utf8Error.html)
   can be used for incremental lossy decoding of `&[u8]`.
 * The
   [`unicode-segmentation`](https://unicode-rs.github.io/unicode-segmentation/unicode_segmentation/index.html)
   crate can be used for iterating over graphemes (or words), but is only
   implemented for `&str` types. One could use `Utf8Error` above to implement
   grapheme iteration with the same semantics as what `bstr` provides (automatic
   Unicode replacement codepoint substitution).
 * The [`twoway`](https://docs.rs/twoway) crate can be used for
   fast substring searching on `&[u8]`.

 So why create `bstr`? Part of the point of the `bstr` crate is to provide a
 uniform API of coupled components instead of relying on users to piece together
 loosely coupled components from the crate ecosystem. For example, if you wanted
 to perform a search and replace in a `Vec<u8>`, then writing the code to do
 that with the `twoway` crate is not that difficult, but it's still additional
 glue code you have to write. This work adds up depending on what you're doing.
 Consider, for example, trimming and splitting, along with their different
 variants.

 In other words, `bstr` is partially a way of pushing back against the
 micro-crate ecosystem that appears to be evolving. It's not clear to me whether
 this experiment will be successful or not, but it is definitely a goal of
 `bstr` to keep its dependency list lightweight. For example, `serde` is an
 optional dependency because there is no feasible alternative, but `twoway` is
 not, where we instead prefer to implement our own substring search. In service
 of this philosophy, currently, the only required dependency of `bstr` is
 `memchr`.


 ### License

 This project is licensed under either of

  * Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
    https://www.apache.org/licenses/LICENSE-2.0)
  * MIT license ([LICENSE-MIT](LICENSE-MIT) or
    https://opensource.org/licenses/MIT)

 at your option.

 The data in `src/unicode/data/` is licensed under the Unicode License Agreement
 ([LICENSE-UNICODE](https://www.unicode.org/copyright.html#License)), although
 this data is only used in tests.
	bstr
	====
	This crate provides extension traits for `&[u8]` and `Vec<u8>` that enable
	their use as byte strings, where byte strings are _conventionally_ UTF-8. This
	differs from the standard library's `String` and `str` types in that they are
	not required to be valid UTF-8, but may be fully or partially valid UTF-8.

	[![Build status](https://github.com/BurntSushi/bstr/workflows/ci/badge.svg)](https://github.com/BurntSushi/bstr/actions)
	[![](https://meritbadge.herokuapp.com/bstr)](https://crates.io/crates/bstr)


	### Documentation

	https://docs.rs/bstr


	### When should I use byte strings?

	See this part of the documentation for more details:
	https://docs.rs/bstr/0.2.*/bstr/#when-should-i-use-byte-strings.

	The short story is that byte strings are useful when it is inconvenient or
	incorrect to require valid UTF-8.


	### Usage

	Add this to your `Cargo.toml`:

	```toml
	[dependencies]
	bstr = "0.2"
	```


	### Examples

	The following two examples exhibit both the API features of byte strings and
	the I/O convenience functions provided for reading line-by-line quickly.

	This first example simply shows how to efficiently iterate over lines in
	stdin, and print out lines containing a particular substring:

	```rust
	use std::error::Error;
	use std::io::{self, Write};

	use bstr::{ByteSlice, io::BufReadExt};

	fn main() -> Result<(), Box<dyn Error>> {
	let stdin = io::stdin();
	let mut stdout = io::BufWriter::new(io::stdout());

	stdin.lock().for_byte_line_with_terminator(\|line\| {
	if line.contains_str("Dimension") {
	stdout.write_all(line)?;
	}
	Ok(true)
	})?;
	Ok(())
	}
	```

	This example shows how to count all of the words (Unicode-aware) in stdin,
	line-by-line:

	```rust
	use std::error::Error;
	use std::io;

	use bstr::{ByteSlice, io::BufReadExt};

	fn main() -> Result<(), Box<dyn Error>> {
	let stdin = io::stdin();
	let mut words = 0;
	stdin.lock().for_byte_line_with_terminator(\|line\| {
	words += line.words().count();
	Ok(true)
	})?;
	println!("{}", words);
	Ok(())
	}
	```

	This example shows how to convert a stream on stdin to uppercase without
	performing UTF-8 validation _and_ amortizing allocation. On standard ASCII
	text, this is quite a bit faster than what you can (easily) do with standard
	library APIs. (N.B. Any invalid UTF-8 bytes are passed through unchanged.)

	```rust
	use std::error::Error;
	use std::io::{self, Write};

	use bstr::{ByteSlice, io::BufReadExt};

	fn main() -> Result<(), Box<dyn Error>> {
	let stdin = io::stdin();
	let mut stdout = io::BufWriter::new(io::stdout());

	let mut upper = vec![];
	stdin.lock().for_byte_line_with_terminator(\|line\| {
	upper.clear();
	line.to_uppercase_into(&mut upper);
	stdout.write_all(&upper)?;
	Ok(true)
	})?;
	Ok(())
	}
	```

	This example shows how to extract the first 10 visual characters (as grapheme
	clusters) from each line, where invalid UTF-8 sequences are generally treated
	as a single character and are passed through correctly:

	```rust
	use std::error::Error;
	use std::io::{self, Write};

	use bstr::{ByteSlice, io::BufReadExt};

	fn main() -> Result<(), Box<dyn Error>> {
	let stdin = io::stdin();
	let mut stdout = io::BufWriter::new(io::stdout());

	stdin.lock().for_byte_line_with_terminator(\|line\| {
	let end = line
	.grapheme_indices()
	.map(\|(_, end, _)\| end)
	.take(10)
	.last()
	.unwrap_or(line.len());
	stdout.write_all(line[..end].trim_end())?;
	stdout.write_all(b"\n")?;
	Ok(true)
	})?;
	Ok(())
	}
	```


	### Cargo features

	This crates comes with a few features that control standard library, serde
	and Unicode support.

	* `std` - Enabled by default. This provides APIs that require the standard
	library, such as `Vec<u8>`.
	* `unicode` - Enabled by default. This provides APIs that require sizable
	Unicode data compiled into the binary. This includes, but is not limited to,
	grapheme/word/sentence segmenters. When this is disabled, basic support such
	as UTF-8 decoding is still included.
	* `serde1` - Disabled by default. Enables implementations of serde traits
	for the `BStr` and `BString` types.
	* `serde1-nostd` - Disabled by default. Enables implementations of serde
	traits for the `BStr` type only, intended for use without the standard
	library. Generally, you either want `serde1` or `serde1-nostd`, not both.


	### Minimum Rust version policy

	This crate's minimum supported `rustc` version (MSRV) is `1.41.1`.

	In general, this crate will be conservative with respect to the minimum
	supported version of Rust. MSRV may be bumped in minor version releases.


	### Future work

	Since this is meant to be a core crate, getting a `1.0` release is a priority.
	My hope is to move to `1.0` within the next year and commit to its API so that
	`bstr` can be used as a public dependency.

	A large part of the API surface area was taken from the standard library, so
	from an API design perspective, a good portion of this crate should be mature.
	The main differences from the standard library are in how the various substring
	search routines work. The standard library provides generic infrastructure for
	supporting different types of searches with a single method, where as this
	library prefers to define new methods for each type of search and drop the
	generic infrastructure.

	Some _probable_ future considerations for APIs include, but are not limited to:

	* A convenience layer on top of the `aho-corasick` crate.
	* Unicode normalization.
	* More sophisticated support for dealing with Unicode case, perhaps by
	combining the use cases supported by [`caseless`](https://docs.rs/caseless)
	and [`unicase`](https://docs.rs/unicase).
	* Add facilities for dealing with OS strings and file paths, probably via
	simple conversion routines.

	Here are some examples that are _probably_ out of scope for this crate:

	* Regular expressions.
	* Unicode collation.

	The exact scope isn't quite clear, but I expect we can iterate on it.

	In general, as stated below, this crate is an experiment in bringing lots of
	related APIs together into a single crate while simultaneously attempting to
	keep the total number of dependencies low. Indeed, every dependency of `bstr`,
	except for `memchr`, is optional.


	### High level motivation

	Strictly speaking, the `bstr` crate provides very little that can't already be
	achieved with the standard library `Vec<u8>`/`&[u8]` APIs and the ecosystem of
	library crates. For example:

	* The standard library's
	[`Utf8Error`](https://doc.rust-lang.org/std/str/struct.Utf8Error.html)
	can be used for incremental lossy decoding of `&[u8]`.
	* The
	[`unicode-segmentation`](https://unicode-rs.github.io/unicode-segmentation/unicode_segmentation/index.html)
	crate can be used for iterating over graphemes (or words), but is only
	implemented for `&str` types. One could use `Utf8Error` above to implement
	grapheme iteration with the same semantics as what `bstr` provides (automatic
	Unicode replacement codepoint substitution).
	* The [`twoway`](https://docs.rs/twoway) crate can be used for
	fast substring searching on `&[u8]`.

	So why create `bstr`? Part of the point of the `bstr` crate is to provide a
	uniform API of coupled components instead of relying on users to piece together
	loosely coupled components from the crate ecosystem. For example, if you wanted
	to perform a search and replace in a `Vec<u8>`, then writing the code to do
	that with the `twoway` crate is not that difficult, but it's still additional
	glue code you have to write. This work adds up depending on what you're doing.
	Consider, for example, trimming and splitting, along with their different
	variants.

	In other words, `bstr` is partially a way of pushing back against the
	micro-crate ecosystem that appears to be evolving. It's not clear to me whether
	this experiment will be successful or not, but it is definitely a goal of
	`bstr` to keep its dependency list lightweight. For example, `serde` is an
	optional dependency because there is no feasible alternative, but `twoway` is
	not, where we instead prefer to implement our own substring search. In service
	of this philosophy, currently, the only required dependency of `bstr` is
	`memchr`.


	### License

	This project is licensed under either of

	* Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
	https://www.apache.org/licenses/LICENSE-2.0)
	* MIT license ([LICENSE-MIT](LICENSE-MIT) or
	https://opensource.org/licenses/MIT)

	at your option.

	The data in `src/unicode/data/` is licensed under the Unicode License Agreement
	([LICENSE-UNICODE](https://www.unicode.org/copyright.html#License)), although
	this data is only used in tests.