Dual-licensed under MIT or the UNLICENSE.
This shows how to convert a scalar value range (e.g., the basic multilingual plane) to a sequence of byte based character classes.
extern crate utf8_ranges; use utf8_ranges::Utf8Sequences; fn main() { for range in Utf8Sequences::new('\u{0}', '\u{FFFF}') { println!("{:?}", range); } }
The output:
[0-7F] [C2-DF][80-BF] [E0][A0-BF][80-BF] [E1-EC][80-BF][80-BF] [ED][80-9F][80-BF] [EE-EF][80-BF][80-BF]
These ranges can then be used to build an automaton. Namely: