A simple, general-purpose html/xhtml parser, usable as both a library and a binary, built with Pest.
Parses custom, non-standard elements such as `<cat/>`, `<Cat/>` and `<C4-t/>`.
If your requirements match any of the above, you are most likely looking for one of the crates below:
Parse html file
```shell
html_parser index.html
```
Parse stdin with pretty output
```shell
curl <website> | html_parser -p
```
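The `-p` invocation above takes the document from stdin. As a rough sketch of that first step, assuming the whole input is buffered into a `String` before parsing, here is a std-only helper; `read_document` is hypothetical and not part of `html_parser`:

```rust
use std::io::{self, Read};

/// Hypothetical helper (not part of `html_parser`): read a whole document
/// from any reader into a String. Returns an error on invalid UTF-8.
fn read_document<R: Read>(mut reader: R) -> io::Result<String> {
    let mut buf = String::new();
    reader.read_to_string(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    // A byte slice stands in for piped input here; a real binary would
    // pass `io::stdin().lock()` instead.
    let input: &[u8] = b"<h1>Hello world</h1>";
    let html = read_document(input)?;
    println!("read {} bytes", html.len());
    Ok(())
}
```

Taking any `Read` rather than stdin directly keeps the helper testable with in-memory input.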
Parse html document
```rust
use html_parser::Dom;

fn main() {
    let html = r#"
        <!doctype html>
        <html lang="en">
            <head>
                <meta charset="utf-8">
                <title>Html parser</title>
            </head>
            <body>
                <h1 id="a" class="b c">Hello world</h1>
                </h1> <!-- comments & dangling elements are ignored -->
            </body>
        </html>"#;
    assert!(Dom::parse(html).is_ok());
}
```
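The stray `</h1>` in the example above is a dangling element: a closing tag with nothing open to close. One common way to detect it is a stack of open element names, where a closing tag whose name is not on the stack is dropped. The sketch below illustrates that idea over an already-tokenized stream; the `Token` type and `drop_dangling` function are invented for illustration, and this is not how `html_parser` itself does it (the crate is built on a Pest grammar).

```rust
/// Hypothetical token type, invented for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token<'a> {
    Open(&'a str),
    Close(&'a str),
    Text(&'a str),
}

/// Drop closing tags that have no matching open element.
/// If a close tag matches an element deeper in the stack, everything
/// opened after that element is implicitly closed (popped) as well.
fn drop_dangling<'a>(tokens: &[Token<'a>]) -> Vec<Token<'a>> {
    let mut open: Vec<&'a str> = Vec::new();
    let mut out = Vec::new();
    for tok in tokens {
        match tok {
            Token::Open(name) => {
                open.push(*name);
                out.push(*tok);
            }
            Token::Close(name) => {
                if let Some(pos) = open.iter().rposition(|n| n == name) {
                    open.truncate(pos);
                    out.push(*tok);
                }
                // Otherwise the close tag is dangling and is skipped.
            }
            Token::Text(_) => out.push(*tok),
        }
    }
    out
}

fn main() {
    let tokens = [
        Token::Open("h1"),
        Token::Text("Hello world"),
        Token::Close("h1"),
        Token::Close("h1"), // dangling: nothing left to close
    ];
    println!("{:?}", drop_dangling(&tokens));
}
```

Names are compared exactly here; whether a real parser should treat `Cat` and `cat` as the same element is a separate design decision.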
Parse html fragment
```rust
use html_parser::Dom;

fn main() {
    let html = "<div id=cat />";
    assert!(Dom::parse(html).is_ok());
}
```
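The fragment above uses an unquoted attribute value (`id=cat`). To show what accepting such values involves, here is a minimal, hypothetical attribute-list parser that handles `key=value`, `key="value"`, `key='value'` and bare `key`. It assumes the tag name and any trailing `/` have already been stripped, does not decode entities or recover from malformed input, and is not the crate's grammar.

```rust
/// Hypothetical helper: parse the attribute portion of a start tag,
/// e.g. `id=cat class="b c" hidden`, into (name, value) pairs.
/// Bare attributes get `None`.
fn parse_attributes(input: &str) -> Vec<(String, Option<String>)> {
    let mut attrs = Vec::new();
    let mut chars = input.chars().peekable();
    loop {
        // Skip whitespace between attributes.
        while chars.next_if(|c| c.is_whitespace()).is_some() {}
        // The name runs until whitespace or '='.
        let mut name = String::new();
        while let Some(c) = chars.next_if(|&c| !c.is_whitespace() && c != '=') {
            name.push(c);
        }
        if name.is_empty() {
            break; // end of input (or a stray '=')
        }
        if chars.next_if_eq(&'=').is_none() {
            attrs.push((name, None)); // bare attribute, e.g. `hidden`
            continue;
        }
        let mut value = String::new();
        if let Some(quote) = chars.next_if(|&c| c == '"' || c == '\'') {
            // Quoted value: read until the matching quote (or end of input).
            for c in chars.by_ref() {
                if c == quote {
                    break;
                }
                value.push(c);
            }
        } else {
            // Unquoted value, as in `id=cat`: read until whitespace.
            while let Some(c) = chars.next_if(|&c| !c.is_whitespace()) {
                value.push(c);
            }
        }
        attrs.push((name, Some(value)));
    }
    attrs
}

fn main() {
    // Attribute portion of `<div id=cat />`, without the tag name and `/`.
    for (name, value) in parse_attributes("id=cat") {
        println!("{name} = {value:?}");
    }
}
```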
Print to json
```rust
use html_parser::{Dom, Result};

fn main() -> Result<()> {
    let html = "<div id=cat />";
    // Parse, then serialize the resulting DOM as pretty-printed JSON.
    let json = Dom::parse(html)?.to_json_pretty()?;
    println!("{}", json);
    Ok(())
}
```
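`to_json_pretty` turns the parsed tree into JSON; the exact schema is defined by the crate and is not reproduced here. As a rough illustration of what serializing an element tree involves (recursion plus JSON string escaping), here is a std-only sketch with an invented `Node` type and a made-up `{"name", "children"}` layout. In real code, a serialization library such as `serde_json` would normally handle the escaping.

```rust
/// Hypothetical node type, invented for this sketch.
enum Node {
    Text(String),
    Element { name: String, children: Vec<Node> },
}

/// Quote and escape a string as a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Serialize a node and its children, recursively, as compact JSON.
fn to_json(node: &Node) -> String {
    match node {
        Node::Text(t) => escape_json(t),
        Node::Element { name, children } => {
            let kids: Vec<String> = children.iter().map(to_json).collect();
            format!("{{\"name\":{},\"children\":[{}]}}", escape_json(name), kids.join(","))
        }
    }
}

fn main() {
    let tree = Node::Element {
        name: "div".to_string(),
        children: vec![Node::Text("Hello world".to_string())],
    };
    println!("{}", to_json(&tree));
}
```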