xml-rs, an XML library for Rust
===============================

[![CI](https://github.com/kornelski/xml-rs/actions/workflows/main.yml/badge.svg)](https://github.com/kornelski/xml-rs/actions/workflows/main.yml)
[![crates.io][crates-io-img]](https://lib.rs/crates/xml-rs)
[![docs][docs-img]](https://docs.rs/xml-rs/)

[Documentation](https://docs.rs/xml-rs/)

 [crates-io-img]: https://img.shields.io/crates/v/xml-rs.svg
 [docs-img]: https://img.shields.io/badge/docs-latest%20release-6495ed.svg

xml-rs is an XML library for the [Rust](https://www.rust-lang.org/) programming language.
It supports reading and writing of XML documents in a streaming fashion (without DOM).

### Features

* XML spec conformance better than other pure-Rust libraries.

* Easy to use API based on `Iterator`s and regular `String`s without tricky lifetimes.

* Support for UTF-16, UTF-8, ISO-8859-1, and ASCII encodings.

* Written entirely in the safe Rust subset. Designed to safely handle untrusted input.


The API is heavily inspired by the Java Streaming API for XML ([StAX][stax]). It contains a pull parser much like the StAX event reader. It provides an iterator API, so you can use Rust's standard iterator adapters with it.

 [stax]: https://en.wikipedia.org/wiki/StAX

It also provides a streaming document writer much like the StAX event writer.
This writer consumes its own set of events, but reader events can be converted to
writer events easily, so it is possible to write XML transformation chains
cleanly.

This parser is mostly full-featured; however, there are limitations:
* Legacy code pages and non-Unicode encodings are not supported;
* DTD validation is not supported (but entities defined in the internal subset are supported);
* attribute value normalization is not performed, and end-of-line characters are not normalized either.

Other than that, the parser tries to be mostly XML-1.1-compliant.

The writer is also mostly full-featured, with the following limitations:
* no support for encodings other than UTF-8;
* no support for emitting `<!DOCTYPE>` declarations;
* more validations of input are needed, for example, checking that namespace prefixes are bound
  or comments are well-formed.

Building and using
------------------

xml-rs uses [Cargo](https://crates.io), so add it with `cargo add xml` or modify `Cargo.toml`:

```toml
[dependencies]
xml = "0.8.16"
```

The package exposes a single crate called `xml`.

Reading XML documents
---------------------

[`xml::reader::EventReader`][EventReader] requires a [`Read`][stdread] instance to read from. It can be a `File` wrapped in `BufReader`, or a `&[u8]` slice (for example, one borrowed from a `Vec<u8>`).

[EventReader]: https://docs.rs/xml-rs/latest/xml/reader/struct.EventReader.html
[stdread]: https://doc.rust-lang.org/stable/std/io/trait.Read.html

`EventReader` implements the `IntoIterator` trait, so you can use it in a `for` loop directly:

```rust,no_run
use std::fs::File;
use std::io::BufReader;

use xml::reader::{EventReader, XmlEvent};

fn main() -> std::io::Result<()> {
    let file = File::open("file.xml")?;
    let file = BufReader::new(file); // Buffering is important for performance

    let parser = EventReader::new(file);
    let mut depth = 0;
    for e in parser {
        match e {
            Ok(XmlEvent::StartElement { name, .. }) => {
                println!("{:spaces$}+{name}", "", spaces = depth * 2);
                depth += 1;
            }
            Ok(XmlEvent::EndElement { name }) => {
                depth -= 1;
                println!("{:spaces$}-{name}", "", spaces = depth * 2);
            }
            Err(e) => {
                eprintln!("Error: {e}");
                break;
            }
            // There's more: https://docs.rs/xml-rs/latest/xml/reader/enum.XmlEvent.html
            _ => {}
        }
    }

    Ok(())
}
```

Document parsing can end normally or with an error. Regardless of the exact cause, parsing
stops and the iterator terminates normally.

You can also have finer control over when to pull the next event from the parser using its own
`next()` method:

```rust,ignore
match parser.next() {
    ...
}
```

After the end of the document or an error, the parser remembers the last event and always
returns it from subsequent `next()` calls. If the iterator is used, it yields the
error or end-of-document event once and produces `None` afterwards.

It is also possible to tweak the parsing process a little using the [`xml::reader::ParserConfig`][ParserConfig] structure.
See its documentation for more information and examples.

[ParserConfig]: https://docs.rs/xml-rs/latest/xml/reader/struct.ParserConfig.html

You can find a more extensive example of using `EventReader` in `src/analyze.rs`, a
small program that shows various statistics about a specified XML document (it is built by
`cargo build` and can be run after that). It can also be used to check XML documents for
well-formedness: if a document is not well-formed, the program exits with an error.


## Parsing untrusted inputs

The parser is written in the safe Rust subset, so by Rust's guarantees the worst it can do is panic.
You can use `ParserConfig` to set limits on the maximum lengths of names, attributes, text, entities, etc.
You should also set a maximum document size via `io::Read`'s [`take(max)`](https://doc.rust-lang.org/stable/std/io/trait.Read.html#method.take) method.

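The size cap can be sketched with the standard library alone. `limit_input` is a hypothetical helper name and the limit is whatever the caller picks; the returned `Take<R>` is itself a `Read`, so it can be passed to `EventReader::new` in place of the original source. Input beyond the cap is never read, so an oversized document looks truncated to the parser instead of consuming unbounded resources.

```rust
use std::io::{self, Read};

/// Hypothetical helper: wraps any reader so that at most `max_bytes`
/// bytes can ever be read from it.
fn limit_input<R: Read>(source: R, max_bytes: u64) -> io::Take<R> {
    source.take(max_bytes)
}
```
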
Writing XML documents
---------------------

xml-rs also provides a streaming writer much like the StAX event writer. With it you can write an
XML document to any `Write` implementor.

```rust,no_run
use std::io;
use xml::writer::{EmitterConfig, XmlEvent};

/// A simple demo syntax where "+foo" makes `<foo>`, "-foo" makes `</foo>`
fn make_event_from_line(line: &str) -> XmlEvent {
    let line = line.trim();
    if let Some(name) = line.strip_prefix("+") {
        XmlEvent::start_element(name).into()
    } else if line.starts_with("-") {
        XmlEvent::end_element().into()
    } else {
        XmlEvent::characters(line).into()
    }
}

fn main() -> io::Result<()> {
    let input = io::stdin();
    let output = io::stdout();
    let mut writer = EmitterConfig::new()
        .perform_indent(true)
        .create_writer(output);

    let mut line = String::new();
    loop {
        line.clear();
        let bytes_read = input.read_line(&mut line)?;
        if bytes_read == 0 {
            break; // EOF
        }

        let event = make_event_from_line(&line);
        if let Err(e) = writer.write(event) {
            panic!("Write error: {e}")
        }
    }
    Ok(())
}
```

The code example above also demonstrates how to create a writer from its configuration.
A similar approach works with `EventReader`.

The library provides an XML event building DSL which helps to construct complex events,
e.g. ones having namespace definitions. Some examples:

```rust,ignore
// <a:hello a:param="value" xmlns:a="urn:some:document">
XmlEvent::start_element("a:hello").attr("a:param", "value").ns("a", "urn:some:document")

// <hello b:config="name" xmlns="urn:default:uri">
XmlEvent::start_element("hello").attr("b:config", "name").default_ns("urn:default:uri")

// <![CDATA[some unescaped text]]>
XmlEvent::cdata("some unescaped text")
```

Of course, one can create `XmlEvent` enum variants directly instead of using the builder DSL.
There are more examples in the [`xml::writer::XmlEvent`][XmlEvent] documentation.

[XmlEvent]: https://docs.rs/xml-rs/latest/xml/writer/enum.XmlEvent.html

The writer has multiple configuration options; see the [`EmitterConfig`][EmitterConfig] documentation for more
information.

[EmitterConfig]: https://docs.rs/xml-rs/latest/xml/writer/struct.EmitterConfig.html

Bug reports
------------

Please report issues at: <https://github.com/kornelski/xml-rs/issues>.
