prettyplease::unparse
=====================

[<img alt="github" src="https://img.shields.io/badge/github-dtolnay/prettyplease-8da0cb?style=for-the-badge&labelColor=555555&logo=github" height="20">](https://github.com/dtolnay/prettyplease)
[<img alt="crates.io" src="https://img.shields.io/crates/v/prettyplease.svg?style=for-the-badge&color=fc8d62&logo=rust" height="20">](https://crates.io/crates/prettyplease)
[<img alt="docs.rs" src="https://img.shields.io/badge/docs.rs-prettyplease-66c2a5?style=for-the-badge&labelColor=555555&logo=docs.rs" height="20">](https://docs.rs/prettyplease)
[<img alt="build status" src="https://img.shields.io/github/actions/workflow/status/dtolnay/prettyplease/ci.yml?branch=master&style=for-the-badge" height="20">](https://github.com/dtolnay/prettyplease/actions?query=branch%3Amaster)

A minimal `syn` syntax tree pretty-printer.

<br>

## Overview

This is a pretty-printer to turn a `syn` syntax tree into a `String` of
well-formatted source code. In contrast to rustfmt, this library is intended to
be suitable for arbitrary generated code.

Rustfmt prioritizes high-quality output that is impeccable enough that you'd be
comfortable spending your career staring at its output, but that means some
heavyweight algorithms, and it has a tendency to bail out on code that is hard
to format (for example [rustfmt#3697], and there are dozens more issues like
it). That's not necessarily a big deal for human-generated code, because when
code gets highly nested, the human will naturally be inclined to refactor it
into more easily formattable code. But for generated code, having the formatter
just give up leaves it totally unreadable.

[rustfmt#3697]: https://github.com/rust-lang/rustfmt/issues/3697

This library is designed using the simplest possible algorithm and data
structures that can deliver about 95% of the quality of rustfmt-formatted
output. In my experience testing real-world code, approximately 97–98% of output
lines come out identical between rustfmt's formatting and this crate's. The rest
have slightly different linebreak decisions, but still clearly follow the
dominant modern Rust style.

The tradeoffs made by this crate are a good fit for generated code that you will
*not* spend your career staring at, such as the output of `bindgen` or of
`cargo-expand`. In those cases it's more important that the whole thing be
formattable without the formatter giving up than that it be flawless.

<br>

## Feature matrix

Here are a few superficial comparisons of this crate against the AST
pretty-printer built into rustc, and rustfmt. The sections below go into more
detail comparing the output of each of these libraries.

| | prettyplease | rustc | rustfmt |
|:---|:---:|:---:|:---:|
| non-pathological behavior on big or generated code | 💚 | ❌ | ❌ |
| idiomatic modern formatting ("locally indistinguishable from rustfmt") | 💚 | ❌ | 💚 |
| throughput | 60 MB/s | 39 MB/s | 2.8 MB/s |
| number of dependencies | 3 | 72 | 66 |
| compile time including dependencies | 2.4 sec | 23.1 sec | 29.8 sec |
| buildable using a stable Rust compiler | 💚 | ❌ | ❌ |
| published to crates.io | 💚 | ❌ | ❌ |
| extensively configurable output | ❌ | ❌ | 💚 |
| intended to accommodate hand-maintained source code | ❌ | ❌ | 💚 |

<br>

## Comparison to rustfmt

- [input.rs](https://github.com/dtolnay/prettyplease/blob/0.1.0/examples/input.rs)
- [output.prettyplease.rs](https://github.com/dtolnay/prettyplease/blob/0.1.0/examples/output.prettyplease.rs)
- [output.rustfmt.rs](https://github.com/dtolnay/prettyplease/blob/0.1.0/examples/output.rustfmt.rs)

If you weren't told which output file is which, it would be practically
impossible to tell, **except** for line 435 in the rustfmt output, which is
more than 1000 characters long because rustfmt just gave up formatting that
part of the file:

```rust
                match segments[5] {
                    0 => write!(f, "::{}", ipv4),
                    0xffff => write!(f, "::ffff:{}", ipv4),
                    _ => unreachable!(),
                }
            } else { # [derive (Copy , Clone , Default)] struct Span { start : usize , len : usize , } let zeroes = { let mut longest = Span :: default () ; let mut current = Span :: default () ; for (i , & segment) in segments . iter () . enumerate () { if segment == 0 { if current . len == 0 { current . start = i ; } current . len += 1 ; if current . len > longest . len { longest = current ; } } else { current = Span :: default () ; } } longest } ; # [doc = " Write a colon-separated part of the address"] # [inline] fn fmt_subslice (f : & mut fmt :: Formatter < '_ > , chunk : & [u16]) -> fmt :: Result { if let Some ((first , tail)) = chunk . split_first () { write ! (f , "{:x}" , first) ? ; for segment in tail { f . write_char (':') ? ; write ! (f , "{:x}" , segment) ? ; } } Ok (()) } if zeroes . len > 1 { fmt_subslice (f , & segments [.. zeroes . start]) ? ; f . write_str ("::") ? ; fmt_subslice (f , & segments [zeroes . start + zeroes . len ..]) } else { fmt_subslice (f , & segments) } }
        } else {
            const IPV6_BUF_LEN: usize = (4 * 8) + 7;
            let mut buf = [0u8; IPV6_BUF_LEN];
            let mut buf_slice = &mut buf[..];
```

This is a pretty typical manifestation of rustfmt bailing out in generated code:
a chunk of the input ends up on one line. The other manifestation is that
you're working on some code, running rustfmt on save like a conscientious
developer, but after a while notice it isn't doing anything. You introduce an
intentional formatting issue, like a stray indent or semicolon, and run rustfmt
to check your suspicion. Nope, it doesn't get cleaned up; rustfmt is just not
formatting the part of the file you are working on.

The prettyplease library is designed to have no pathological cases that force a
bail-out; the entire input you give it will get formatted in some "good enough"
form.

Separately, rustfmt can be problematic to integrate into projects. It's written
using rustc's internal syntax tree, so it can't be built by a stable compiler.
Its releases are not regularly published to crates.io, so in Cargo builds you'd
need to depend on it as a git dependency, which also precludes publishing your
crate to crates.io. You can shell out to a `rustfmt` binary, but that'll be
whatever rustfmt version is installed on each developer's system (if any), which
can lead to spurious diffs in checked-in generated code formatted by different
versions. In contrast, prettyplease is designed to be easy to pull in as a
library, and compiles fast.

<br>

## Comparison to rustc_ast_pretty

- [input.rs](https://github.com/dtolnay/prettyplease/blob/0.1.0/examples/input.rs)
- [output.prettyplease.rs](https://github.com/dtolnay/prettyplease/blob/0.1.0/examples/output.prettyplease.rs)
- [output.rustc.rs](https://github.com/dtolnay/prettyplease/blob/0.1.0/examples/output.rustc.rs)

This is the pretty-printer that gets used when rustc prints source code, such as
`rustc -Zunpretty=expanded`. It's also used by the standard library's
`stringify!` when stringifying an interpolated `macro_rules!` AST fragment, like
an `$:expr`, and transitively by `dbg!` and many macros in the ecosystem.

Rustc's formatting is mostly okay, but does not hew closely to the dominant
contemporary style of Rust formatting. Some things wouldn't ever be written on
one line, like this `match` expression, and certainly not with a comma in front
of the closing brace:

```rust
fn eq(&self, other: &IpAddr) -> bool {
    match other { IpAddr::V4(v4) => self == v4, IpAddr::V6(_) => false, }
}
```

Some places use non-multiple-of-4 indentation, which is definitely not the norm:

```rust
pub const fn to_ipv6_mapped(&self) -> Ipv6Addr {
    let [a, b, c, d] = self.octets();
    Ipv6Addr{inner:
                 c::in6_addr{s6_addr:
                                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xFF,
                                  0xFF, a, b, c, d],},}
}
```

And although there isn't an egregious example of it in the link because the
input code is pretty tame, in general rustc_ast_pretty has pathological behavior
on generated code. It has a tendency to use excessive horizontal indentation and
rapidly run out of width:

```rust
::std::io::_print(::core::fmt::Arguments::new_v1(&[""],
                                                 &match (&msg,) {
                                                      _args =>
                                                      [::core::fmt::ArgumentV1::new(_args.0,
                                                                                    ::core::fmt::Display::fmt)],
                                                  }));
```

The snippets above are clearly different from modern rustfmt style. In contrast,
prettyplease is designed to have output that is practically indistinguishable
from rustfmt-formatted code.

<br>

## Example

```rust
// [dependencies]
// prettyplease = "0.2"
// syn = { version = "2", default-features = false, features = ["full", "parsing"] }

const INPUT: &str = stringify! {
    use crate::{
          lazy::{Lazy, SyncLazy, SyncOnceCell}, panic,
        sync::{ atomic::{AtomicUsize, Ordering::SeqCst},
            mpsc::channel, Mutex, },
      thread,
    };
    impl<T, U> Into<U> for T where U: From<T> {
        fn into(self) -> U { U::from(self) }
    }
};

fn main() {
    let syntax_tree = syn::parse_file(INPUT).unwrap();
    let formatted = prettyplease::unparse(&syntax_tree);
    print!("{}", formatted);
}
```

<br>

## Algorithm notes

The approach and terminology used in the implementation are derived from [*Derek
C. Oppen, "Pretty Printing" (1979)*][paper], on which rustc_ast_pretty is also
based, and from rustc_ast_pretty's implementation written by Graydon Hoare in
2011 (and modernized over the years by dozens of volunteer maintainers).

[paper]: http://i.stanford.edu/pub/cstr/reports/cs/tr/79/770/CS-TR-79-770.pdf

The paper describes two language-agnostic interacting procedures, `Scan()` and
`Print()`. Language-specific code decomposes an input data structure into a
stream of `string` and `break` tokens, plus `begin` and `end` tokens for
grouping. Each `begin`–`end` range may be identified as either "consistent
breaking" or "inconsistent breaking". If a group is consistently breaking, then
if the whole contents do not fit on the line, *every* `break` token in the group
will receive a linebreak. This is appropriate, for example, for Rust struct
literals or the arguments of a function call. If a group is inconsistently
breaking, then the `string` tokens in the group are greedily placed on the line
until out of space, and linebroken only at those `break` tokens for which the
next string would not fit. This is appropriate, for example, for the contents of
a braced `use` statement in Rust.
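
A minimal sketch can make the two breaking modes concrete. The Rust below is a naive recursive printer over a tree of groups. It is not the streaming `Scan()`/`Print()` procedure from the paper, and it is not prettyplease's internal API. The names `Doc`, `Breaks`, `Printer`, `render`, and `word_list` are hypothetical, and every taken break simply indents by 4 columns.

```rust
// Naive sketch of consistent vs inconsistent breaking (hypothetical names).
// Not Oppen's streaming algorithm: group widths are recomputed recursively.

#[derive(Clone, Copy)]
enum Breaks {
    Consistent,
    Inconsistent,
}

enum Doc {
    Text(&'static str),
    // One space when not taken; a newline plus indentation when taken.
    Break,
    // A `begin`..`end` pair; breaks directly inside it indent 4 more columns.
    Group(Breaks, Vec<Doc>),
}

/// Width of `doc` if printed entirely on one line.
fn flat_width(doc: &Doc) -> usize {
    match doc {
        Doc::Text(s) => s.len(),
        Doc::Break => 1,
        Doc::Group(_, items) => items.iter().map(flat_width).sum(),
    }
}

struct Printer {
    out: String,
    col: usize,
    width: usize,
}

impl Printer {
    fn emit(&mut self, doc: &Doc, indent: usize) {
        match doc {
            Doc::Text(s) => {
                self.out.push_str(s);
                self.col += s.len();
            }
            Doc::Break => {
                self.out.push(' ');
                self.col += 1;
            }
            Doc::Group(breaks, items) => {
                let fits = self.col + flat_width(doc) <= self.width;
                let inner = indent + 4;
                for (i, item) in items.iter().enumerate() {
                    let take_break = match (item, breaks) {
                        (Doc::Break, _) if fits => false,
                        // Consistent: once the group is broken, every break is taken.
                        (Doc::Break, Breaks::Consistent) => true,
                        // Inconsistent: take a break only if the next chunk
                        // (up to the following break) would overflow the line.
                        (Doc::Break, Breaks::Inconsistent) => {
                            let next: usize = items[i + 1..]
                                .iter()
                                .take_while(|d| !matches!(d, Doc::Break))
                                .map(flat_width)
                                .sum();
                            self.col + 1 + next > self.width
                        }
                        _ => false,
                    };
                    if take_break {
                        self.out.push('\n');
                        self.out.push_str(&" ".repeat(inner));
                        self.col = inner;
                    } else {
                        self.emit(item, inner);
                    }
                }
            }
        }
    }
}

/// Prints `doc` within a line limit of `width` columns.
fn render(doc: &Doc, width: usize) -> String {
    let mut printer = Printer { out: String::new(), col: 0, width };
    printer.emit(doc, 0);
    printer.out
}

/// Five items separated by breaks, in a single group of the given kind.
fn word_list(breaks: Breaks) -> Doc {
    let mut items = Vec::new();
    for (i, word) in ["alpha,", "beta,", "gamma,", "delta,", "epsilon"].iter().enumerate() {
        if i > 0 {
            items.push(Doc::Break);
        }
        items.push(Doc::Text(*word));
    }
    Doc::Group(breaks, items)
}
```

With a 40-column limit, both kinds of group print `alpha, beta, gamma, delta, epsilon` on one line. At 20 columns, the consistent group puts every item on its own line. The inconsistent group instead packs `alpha, beta, gamma,` onto the first line and wraps only before `delta,`.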

Scan's job is to efficiently accumulate sizing information about groups and
breaks. For every `begin` token we compute the distance to the matching `end`
token, and for every `break` we compute the distance to the next `break`. The
algorithm uses a ring buffer to hold tokens whose size is not yet ascertained.
The maximum size of the ring buffer is bounded by the target line length and
does not grow indefinitely, regardless of deep nesting in the input stream.
That's because once a group is sufficiently big, the precise size can no longer
make a difference to linebreak decisions, and we can effectively treat it as
"infinity".
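
The sizing rules can be sketched in Rust as follows. This is a simplified, non-streaming version that rests on a few assumptions:

- It keeps the whole token stream in a `Vec` instead of a bounded ring buffer.
- It treats every `break` as one space wide.
- It applies the "infinity" cap after the fact.

As a result it shows which sizes get computed, but not the memory bound. A stack holds the indices of open `begin` tokens and of the `break` whose size is still pending. `Token`, `scan_sizes`, and `INFINITY` are hypothetical names, and the value `0xffff` is an arbitrary stand-in.

```rust
// Simplified, non-streaming sketch of Scan's sizing pass (hypothetical names).

/// Sizes larger than the margin are all equivalent: they can never fit.
const INFINITY: isize = 0xffff;

enum Token {
    String(&'static str),
    Break, // one space wide when not broken
    Begin,
    End,
}

/// If the innermost pending entry is a `Break`, its size is now known.
fn close_pending_break(
    tokens: &[Token],
    stack: &mut Vec<usize>,
    sizes: &mut [isize],
    total: isize,
) {
    if let Some(&top) = stack.last() {
        if matches!(tokens[top], Token::Break) {
            stack.pop();
            sizes[top] += total;
        }
    }
}

/// Returns a size per token: text width for `String`, the whole group's width
/// for `Begin`, and for `Break` its own space plus the width up to the next
/// `Break` or the end of the enclosing group.
/// Assumes a balanced stream wrapped in one outer `Begin`/`End` pair.
fn scan_sizes(tokens: &[Token], margin: isize) -> Vec<isize> {
    let mut sizes = vec![0isize; tokens.len()];
    let mut stack: Vec<usize> = Vec::new();
    let mut total: isize = 0; // width of everything scanned so far
    for (i, token) in tokens.iter().enumerate() {
        match token {
            Token::String(s) => {
                sizes[i] = s.len() as isize;
                total += sizes[i];
            }
            Token::Break => {
                close_pending_break(tokens, &mut stack, &mut sizes, total);
                sizes[i] = -total; // completed when the next break or end arrives
                stack.push(i);
                total += 1;
            }
            Token::Begin => {
                sizes[i] = -total; // completed at the matching End
                stack.push(i);
            }
            Token::End => {
                close_pending_break(tokens, &mut stack, &mut sizes, total);
                let begin = stack.pop().expect("unbalanced End");
                sizes[begin] += total;
            }
        }
    }
    // Anything wider than the margin is simply "too big".
    for (i, token) in tokens.iter().enumerate() {
        if matches!(token, Token::Begin | Token::Break) && sizes[i] > margin {
            sizes[i] = INFINITY;
        }
    }
    sizes
}

/// `a b c d` with `b c` in a nested group:
/// Begin "a" Break Begin "b" Break "c" End Break "d" End
fn nested_example() -> Vec<Token> {
    vec![
        Token::Begin, Token::String("a"), Token::Break,
        Token::Begin, Token::String("b"), Token::Break, Token::String("c"), Token::End,
        Token::Break, Token::String("d"), Token::End,
    ]
}
```

In this stream the outer `Begin` gets size 7, which is the width of `a b c d`. The first `Break` gets size 4: its own space plus `b c`. With a margin of 5, the outer group's size is replaced by `INFINITY`, since no exact value above the margin could change a decision.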

Print's job is to use the sizing information to efficiently assign a "broken" or
"not broken" status to every `begin` token. At that point the output is easily
constructed by concatenating `string` tokens and breaking at `break` tokens
contained within a broken group.
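
Print can be sketched the same way. This hypothetical Rust version makes three simplifications:

- A naive quadratic pass stands in for Scan.
- Every `break` is one space wide.
- Each broken group indents its continuation lines by a fixed 4 columns.

A stack of frames records whether each open group was found to fit or was broken. `Token`, `Frame`, `sizes`, `print`, and `let_example` are illustrative names, not prettyplease's API.

```rust
// Simplified sketch of Print driven by precomputed sizes (hypothetical names).

#[derive(Clone, Copy)]
enum Breaks { Consistent, Inconsistent }

enum Token {
    String(&'static str),
    Break, // one space wide when not taken
    Begin(Breaks),
    End,
}

/// How an open group is being printed.
enum Frame { Fits, Broken(Breaks) }

fn width(token: &Token) -> isize {
    match token {
        Token::String(s) => s.len() as isize,
        Token::Break => 1,
        _ => 0,
    }
}

/// Naive O(n^2) stand-in for Scan: a `Begin` gets its group's width; a `Break`
/// gets its space plus everything up to the next break or end of its group.
fn sizes(tokens: &[Token]) -> Vec<isize> {
    (0..tokens.len())
        .map(|i| {
            let is_break = matches!(tokens[i], Token::Break);
            if !is_break && !matches!(tokens[i], Token::Begin(_)) {
                return width(&tokens[i]);
            }
            let (mut depth, mut size) = (0, width(&tokens[i]));
            for token in &tokens[i + 1..] {
                match token {
                    Token::Begin(_) => depth += 1,
                    Token::End if depth == 0 => break,
                    Token::End => depth -= 1,
                    Token::Break if depth == 0 && is_break => break,
                    _ => {}
                }
                size += width(token);
            }
            size
        })
        .collect()
}

fn print(tokens: &[Token], sizes: &[isize], margin: isize) -> String {
    let mut out = String::new();
    let mut space = margin; // columns left on the current line
    let mut indent = 0; // indentation used after a taken break
    let mut frames: Vec<(Frame, isize)> = Vec::new(); // open groups + indent to restore
    for (token, &size) in tokens.iter().zip(sizes) {
        match token {
            // A group that cannot fit in the remaining space is broken.
            Token::Begin(breaks) if size > space => {
                frames.push((Frame::Broken(*breaks), indent));
                indent += 4;
            }
            Token::Begin(_) => frames.push((Frame::Fits, indent)),
            Token::End => indent = frames.pop().expect("unbalanced End").1,
            Token::Break => {
                let take = match frames.last() {
                    Some((Frame::Broken(Breaks::Consistent), _)) => true,
                    Some((Frame::Broken(Breaks::Inconsistent), _)) => size > space,
                    _ => false,
                };
                if take {
                    out.push('\n');
                    out.push_str(&" ".repeat(indent as usize));
                    space = margin - indent;
                } else {
                    out.push(' ');
                    space -= 1;
                }
            }
            Token::String(s) => {
                out.push_str(s);
                space -= s.len() as isize;
            }
        }
    }
    out
}

/// `let v = [1, 2, 3]`: a consistent group around an inconsistent list.
fn let_example() -> Vec<Token> {
    vec![
        Token::Begin(Breaks::Consistent),
        Token::String("let v ="),
        Token::Break,
        Token::Begin(Breaks::Inconsistent),
        Token::String("[1,"), Token::Break, Token::String("2,"), Token::Break, Token::String("3]"),
        Token::End,
        Token::End,
    ]
}
```

With a margin of 20, `let v = [1, 2, 3]` fits and prints flat. With a margin of 14, the outer consistent group breaks after `let v =` while the inner list still fits on the next line. With a margin of 12, the inner inconsistent group also breaks, but only before `3]`.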

Leveraging these primitives (i.e. cleverly placing the all-or-nothing consistent
breaks and greedy inconsistent breaks) to yield rustfmt-compatible formatting
for all of Rust's syntax tree nodes is a fun challenge.

Here is a visualization of some Rust tokens fed into the pretty-printing
algorithm. Consistently breaking `begin`–`end` pairs are represented by `«` and
`»`, inconsistently breaking ones by `‹` and `›`, `break` by `·`, and the rest
of the non-whitespace is `string`.

```text
use crate::«{·
‹    lazy::«{·‹Lazy,· SyncLazy,· SyncOnceCell›·}»,·
    panic,·
    sync::«{·
‹        atomic::«{·‹AtomicUsize,· Ordering::SeqCst›·}»,·
        mpsc::channel,· Mutex›,·
    }»,·
    thread›,·
}»;·
«‹«impl<«·T‹›,· U‹›·»>» Into<«·U·»>· for T›·
where·
    U:‹ From<«·T·»>›,·
{·
«    fn into(·«·self·») -> U {·
‹        U::from(«·self·»)›·
»    }·
»}·
```
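
The notation itself is mechanical to produce from a token stream. The hypothetical Rust helper below renders tokens that way. It assumes each `break` carries the number of spaces it prints when not broken, and shows those spaces after the `·`. The token sequences for `U::from(self)` and the `lazy` import are plausible reconstructions, not prettyplease's actual streams. Indentation is not reproduced.

```rust
// Renders a token stream in the «» ‹› · notation (hypothetical names).

#[derive(Clone, Copy)]
enum Breaks {
    Consistent,
    Inconsistent,
}

enum Token {
    String(&'static str),
    Break(usize), // spaces printed when the break is not taken
    Begin(Breaks),
    End,
}

fn visualize(tokens: &[Token]) -> String {
    let mut out = String::new();
    let mut open = Vec::new(); // kinds of the currently open groups
    for token in tokens {
        match token {
            Token::String(s) => out.push_str(s),
            Token::Break(blank) => {
                out.push('·');
                out.push_str(&" ".repeat(*blank));
            }
            Token::Begin(breaks) => {
                open.push(*breaks);
                out.push(match breaks {
                    Breaks::Consistent => '«',
                    Breaks::Inconsistent => '‹',
                });
            }
            // Each `end` closes the most recently opened group.
            Token::End => out.push(match open.pop().expect("unbalanced End") {
                Breaks::Consistent => '»',
                Breaks::Inconsistent => '›',
            }),
        }
    }
    out
}

/// One plausible stream for `U::from(self)`.
fn from_self_tokens() -> Vec<Token> {
    vec![
        Token::String("U::from("),
        Token::Begin(Breaks::Consistent),
        Token::Break(0),
        Token::String("self"),
        Token::Break(0),
        Token::End,
        Token::String(")"),
    ]
}

/// One plausible stream for `lazy::{Lazy, SyncLazy, SyncOnceCell},`.
fn lazy_tokens() -> Vec<Token> {
    vec![
        Token::String("lazy::"),
        Token::Begin(Breaks::Consistent),
        Token::String("{"),
        Token::Break(0),
        Token::Begin(Breaks::Inconsistent),
        Token::String("Lazy,"),
        Token::Break(1),
        Token::String("SyncLazy,"),
        Token::Break(1),
        Token::String("SyncOnceCell"),
        Token::End,
        Token::Break(0),
        Token::String("}"),
        Token::End,
        Token::String(","),
    ]
}
```

`visualize(&from_self_tokens())` yields `U::from(«·self·»)`, and `visualize(&lazy_tokens())` yields `lazy::«{·‹Lazy,· SyncLazy,· SyncOnceCell›·}»,`. Both match fragments of the visualization above.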

The algorithm described in the paper is not quite sufficient for producing
well-formatted Rust code that is locally indistinguishable from rustfmt's style.
The reason is that in the paper, the complete non-whitespace contents are
assumed to be independent of linebreak decisions, with Scan and Print being only
in control of the whitespace (spaces and line breaks). In Rust as idiomatically
formatted by rustfmt, that is not the case. Trailing commas are one example; the
punctuation is only known *after* the broken vs non-broken status of the
surrounding group is known:

```rust
let _ = Struct { x: 0, y: true };

let _ = Struct {
    x: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,
    y: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyy, //<- trailing comma if the expression wrapped
};
```

The formatting of `match` expressions is another case: we want small arms on the
same line as the pattern, and big arms wrapped in a brace. Whether the braces,
the comma, and the semicolon are present depends on whether the arm fits on the
line:

```rust
match total_nanos.checked_add(entry.nanos as u64) {
    Some(n) => tmp = n, //<- small arm, inline with comma
    None => {
        total_secs = total_secs
            .checked_add(total_nanos / NANOS_PER_SEC as u64)
            .expect("overflow in iter::sum over durations");
    }   //<- big arm, needs brace added, and also semicolon^
}
```

The printing algorithm implementation in this crate accommodates all of these
situations with conditional punctuation tokens whose selection can be deferred
and populated once it's known whether the group is broken.
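
Conditional punctuation can be sketched with a token that prints only when its enclosing group breaks. In this hypothetical Rust version, `IfBroken(",")` supplies the trailing comma of a struct literal. A `Break` with a negative `offset` moves the closing brace back to the group's base indentation. `Doc`, `IfBroken`, `Printer`, `render`, and `struct_literal` are illustrative names, not prettyplease's API. The sketch only demonstrates deferring the choice until the group's width is known.

```rust
// Sketch of conditional punctuation (hypothetical names): `IfBroken` text is
// printed only when its enclosing group is broken onto multiple lines.

enum Doc {
    Text(String),
    // One space when flat; when broken, a newline indented to the group's
    // base indentation plus 4 plus `offset`.
    Break { offset: isize },
    IfBroken(&'static str),
    Group(Vec<Doc>), // a consistently breaking group
}

fn flat_width(doc: &Doc) -> usize {
    match doc {
        Doc::Text(s) => s.len(),
        Doc::Break { .. } => 1,
        Doc::IfBroken(_) => 0, // absent from the one-line form
        Doc::Group(items) => items.iter().map(flat_width).sum(),
    }
}

struct Printer {
    out: String,
    col: usize,
    line_indent: usize, // indentation of the current line
    width: usize,
}

impl Printer {
    fn emit(&mut self, doc: &Doc, base: usize, broken: bool) {
        match doc {
            Doc::Text(s) => {
                self.out.push_str(s);
                self.col += s.len();
            }
            Doc::IfBroken(s) => {
                if broken {
                    self.out.push_str(s);
                    self.col += s.len();
                }
            }
            Doc::Break { offset } => {
                if broken {
                    let indent = (base as isize + 4 + *offset) as usize;
                    self.out.push('\n');
                    self.out.push_str(&" ".repeat(indent));
                    self.col = indent;
                    self.line_indent = indent;
                } else {
                    self.out.push(' ');
                    self.col += 1;
                }
            }
            Doc::Group(items) => {
                // The punctuation choice is made only once the group's width is known.
                let broken = self.col + flat_width(doc) > self.width;
                let base = self.line_indent;
                for item in items {
                    self.emit(item, base, broken);
                }
            }
        }
    }
}

fn render(doc: &Doc, width: usize) -> String {
    let mut printer = Printer { out: String::new(), col: 0, line_indent: 0, width };
    printer.emit(doc, 0, false);
    printer.out
}

/// `Name { field: value, ... }` with a trailing comma only when broken.
fn struct_literal(name: &str, fields: &[(&str, &str)]) -> Doc {
    let mut items = vec![Doc::Text(format!("{} {{", name)), Doc::Break { offset: 0 }];
    for (i, (field, value)) in fields.iter().enumerate() {
        if i > 0 {
            items.push(Doc::Text(",".to_string()));
            items.push(Doc::Break { offset: 0 });
        }
        items.push(Doc::Text(format!("{}: {}", field, value)));
    }
    items.push(Doc::IfBroken(","));
    items.push(Doc::Break { offset: -4 });
    items.push(Doc::Text("}".to_string()));
    Doc::Group(items)
}
```

At 80 columns, `struct_literal("Struct", &[("x", "0"), ("y", "true")])` renders as `Struct { x: 0, y: true }`. At 20 columns it renders on four lines, ending with `y: true,` and `}`. The `match` arm case would need more conditional tokens (braces and a semicolon), which this sketch does not implement.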

<br>

#### License

<sup>
Licensed under either of <a href="LICENSE-APACHE">Apache License, Version
2.0</a> or <a href="LICENSE-MIT">MIT license</a> at your option.
</sup>

<br>

<sub>
Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in this crate by you, as defined in the Apache-2.0 license, shall
be dual licensed as above, without any additional terms or conditions.
</sub>