| LibCppBor: A Modern C++ CBOR Parser and Generator |
| ============================================== |
| |
| LibCppBor provides a natural and easy-to-use syntax for constructing and |
| parsing CBOR messages. It does not (yet) support all features of |
| CBOR, nor (yet) support validation against CDDL schemata, though both |
| are planned. CBOR features that aren't supported include: |
| |
| * Indefinite length values |
| * Semantic tagging |
| * Floating point |
| |
| LibCppBor requires C++17. |
| |
| ## CBOR representation |
| |
| LibCppBor represents CBOR data items as instances of the `Item` class or, |
| more precisely, as instances of subclasses of `Item`, since `Item` is a |
| pure interface. The subclasses of `Item` correspond almost one-to-one |
| with CBOR major types, and are named to match the CDDL names to which |
| they correspond. They are: |
| |
| * `Uint` corresponds to major type 0, and can hold unsigned integers |
| up through (2^64 - 1). |
| * `Nint` corresponds to major type 1. It can only hold values from -1 |
| to -(2^63 - 1), since its internal representation is an int64_t. |
| This can be fixed, but it seems unlikely that applications will need |
| the omitted range from -(2^63) to -(2^64), since it's |
| inconvenient to represent them in many programming languages. |
| * `Int` is an abstract base of `Uint` and `Nint` that facilitates |
| working with all signed integers representable with int64_t. |
| * `Bstr` corresponds to major type 2, a byte string. |
| * `Tstr` corresponds to major type 3, a text string. |
| * `Array` corresponds to major type 4, an Array. It holds a |
| variable-length array of `Item`s. |
| * `Map` corresponds to major type 5, a Map. It holds a |
| variable-length array of pairs of `Item`s. |
| * `Simple` corresponds to major type 7. It's an abstract class since |
| items require a more specific type. |
| * `Bool` is the only currently-implemented subclass of `Simple`. |
| |
| Note that major type 6, semantic tag, is not yet implemented. |
| |
| In practice, users of LibCppBor will rarely use most of these classes |
| when generating CBOR encodings. This is because LibCppBor provides |
| straightforward conversions from the obvious normal C++ types. |
| Specifically, the following conversions are provided in appropriate |
| contexts: |
| |
| * Signed and unsigned integers convert to `Uint` or `Nint`, as |
| appropriate. |
| * `std::string`, `std::string_view`, `const char*` and |
| `std::pair<char iterator, char iterator>` convert to `Tstr`. |
| * `std::vector<uint8_t>`, `std::pair<uint8_t iterator, uint8_t |
| iterator>` and `std::pair<uint8_t*, size_t>` convert to `Bstr`. |
| * `bool` converts to `Bool`. |
| |
| ## CBOR generation |
| |
| ### Complete tree generation |
| |
| The set of `encode` methods in `Item` provides the interface for |
| producing encoded CBOR. The basic process for "complete tree" |
| generation (as opposed to "incremental" generation, which is discussed |
| below) is to construct an `Item` which models the data to be encoded, |
| and then call one of the `encode` methods, whichever is convenient for |
| the encoding destination. A trivial example: |
| |
| ``` |
| cppbor::Uint val(0); |
| std::vector<uint8_t> encoding = val.encode(); |
| ``` |
| |
| It's relatively rare that single values are encoded as above. More often, the |
| "root" data item will be an `Array` or `Map` which contains a more complex |
| structure. For example: |
| |
| ``` |
| using cppbor::Map; |
| using cppbor::Array; |
| |
| std::vector<uint8_t> vec = // ... |
| Map val("key1", Array(Map("key_a", 99, "key_b", vec), "foo"), "key2", true); |
| std::vector<uint8_t> encoding = val.encode(); |
| ``` |
| |
| This creates a map with two entries, with `Tstr` keys "key1" and |
| "key2", respectively. The "key1" entry has as its value an |
| `Array` containing a `Map` and a `Tstr`. The "key2" entry has a |
| `Bool` value. |
| |
| This example demonstrates how automatic conversion of C++ types to |
| LibCppBor `Item` subclass instances is done. Where the caller provides a |
| C++ or C string, a `Tstr` entry is added. Where the caller provides |
| an integer literal or variable, a `Uint` or `Nint` is added, depending |
| on whether the value is non-negative or negative. |
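
The sign-based choice described above follows from how CBOR itself stores integers (RFC 8949): major type 0 carries the value directly, while major type 1 carries `-1 - value`. The following is a minimal, self-contained sketch of that mapping. `MajorType` and `toCborInteger` are hypothetical names used for illustration only and are not part of LibCppBor.

```cpp
#include <cstdint>
#include <utility>

// CBOR major types for integers (RFC 8949): 0 = unsigned, 1 = negative.
// Hypothetical enum for this sketch, not LibCppBor's own definition.
enum MajorType : uint8_t { kUnsigned = 0, kNegative = 1 };

// Maps a signed value to the (major type, argument) pair that CBOR encodes.
// A negative value v is stored as the unsigned argument -1 - v.
std::pair<MajorType, uint64_t> toCborInteger(int64_t value) {
    if (value >= 0) {
        return {kUnsigned, static_cast<uint64_t>(value)};
    }
    // -(value + 1) cannot overflow, even for the most negative int64_t.
    return {kNegative, static_cast<uint64_t>(-(value + 1))};
}
```

With this mapping, 99 becomes (major type 0, argument 99) and -100 becomes (major type 1, argument 99).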
| |
| As an alternative, a more fluent-style API is provided for building up |
| structures. For example: |
| |
| ``` |
| using cppbor::Map; |
| using cppbor::Array; |
| |
| std::vector<uint8_t> vec = // ... |
| Map val; |
| val.add("key1", Array().add(Map().add("key_a", 99).add("key_b", vec)).add("foo")).add("key2", true); |
| std::vector<uint8_t> encoding = val.encode(); |
| ``` |
| |
| An advantage of this interface over the constructor-based creation |
| approach above is that it need not be done all at once. The `add` |
| methods return a reference to the object added to, which allows calls |
| to be chained, but chaining is not necessary; calls can be made |
| sequentially, as the data to add becomes available. |
| |
| #### `encode` methods |
| |
| There are several variations of `Item::encode`, all of which |
| accomplish the same task but output the encoded data in different |
| ways, and with somewhat different performance characteristics. The |
| provided options are: |
| |
| * `bool encode(uint8_t** pos, const uint8_t* end)` encodes into the |
| buffer referenced by the range [`*pos`, `end`). `*pos` is moved. If |
| the encoding runs out of buffer space before finishing, the method |
| returns false. This is the most efficient way to encode, into an |
| already-allocated buffer. |
| * `void encode(EncodeCallback encodeCallback)` calls `encodeCallback` |
| for each encoded byte. It's the responsibility of the implementor |
| of the callback to behave safely in the event that the output buffer |
| (if applicable) is exhausted. This is less efficient than the prior |
| method because it imposes an additional function call for each byte. |
| * `template </*...*/> void encode(OutputIterator i)` |
| encodes into the provided iterator. SFINAE ensures that the |
| template doesn't match for non-iterators. The implementation |
| actually uses the callback-based method, plus whatever overhead |
| the iterator adds. |
| * `std::vector<uint8_t> encode()` creates a new `std::vector`, reserves |
| sufficient capacity to hold the encoding, and inserts the encoded |
| bytes with a `std::back_insert_iterator` and the previous method. |
| * `std::string toString()` does the same as the previous method, but |
| returns a string instead of a vector. |
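
To make the difference between the buffer-based and callback-based output styles concrete, here is a small self-contained sketch of the two calling conventions. It does not use LibCppBor: `writeEncoded` is a hypothetical helper that emits bytes which are assumed to be already encoded, and the `EncodeCallback` alias is an assumed shape (one call per byte), chosen to match the description above.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Assumed shape for this sketch: the callback receives one byte at a time.
using EncodeCallback = std::function<void(uint8_t)>;

// Buffer style: writes into [*pos, end), advancing *pos as it goes.
// Returns false as soon as the buffer runs out of space.
bool writeEncoded(const std::vector<uint8_t>& bytes, uint8_t** pos, const uint8_t* end) {
    for (uint8_t b : bytes) {
        if (*pos == end) return false;
        *(*pos)++ = b;
    }
    return true;
}

// Callback style: one function call per byte. The callback owns any
// bounds checking on its destination.
void writeEncoded(const std::vector<uint8_t>& bytes, const EncodeCallback& callback) {
    for (uint8_t b : bytes) callback(b);
}
```

The buffer style lets the caller detect truncation from the return value and see how far `*pos` advanced, whereas the callback style pushes that responsibility into the callback.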
| |
| ### Incremental generation |
| |
| Incremental generation requires deeper understanding of CBOR, because |
| the library can't do as much to ensure that the output is valid. The |
| basic tool for incremental generation is the `encodeHeader` |
| function. There are two variations, one which writes into a buffer, |
| and one which uses a callback. Both simply write out the bytes of a |
| header. To construct the same map as in the above examples, |
| incrementally, one might write: |
| |
| ``` |
| using namespace cppbor; // For example brevity |
| |
| std::vector<uint8_t> vec = // ... |
| std::vector<uint8_t> encoding; |
| auto iter = std::back_inserter(encoding); |
| encodeHeader(MAP, 2 /* # of map entries */, iter); |
| std::string s = "key1"; |
| encodeHeader(TSTR, s.size(), iter); |
| std::copy(s.begin(), s.end(), iter); |
| encodeHeader(ARRAY, 2 /* # of array entries */, iter); |
| Map().add("key_a", 99).add("key_b", vec).encode(iter); |
| s = "foo"; |
| encodeHeader(TSTR, s.size(), iter); |
| std::copy(s.begin(), s.end(), iter); |
| s = "key2"; |
| encodeHeader(TSTR, s.size(), iter); |
| std::copy(s.begin(), s.end(), iter); |
| encodeHeader(SIMPLE, TRUE, iter); |
| ``` |
| |
| As the above example demonstrates, the styles can be mixed. Note the |
| creation and encoding of the inner Map using the fluent style. |
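
For readers unfamiliar with what a CBOR header contains, the following self-contained sketch shows the byte layout defined by RFC 8949: the top three bits of the initial byte hold the major type, and the low five bits either hold a small argument directly (0 to 23) or signal that 1, 2, 4 or 8 big-endian argument bytes follow. `appendHeader` is a hypothetical name for illustration; this is not LibCppBor's `encodeHeader` and makes no claim about how that function is implemented.

```cpp
#include <cstdint>
#include <vector>

// Appends a CBOR header for `majorType` (0 through 7) with `argument`.
// Hypothetical helper for illustration; layout per RFC 8949.
void appendHeader(uint8_t majorType, uint64_t argument, std::vector<uint8_t>& out) {
    const uint8_t typeBits = static_cast<uint8_t>(majorType << 5);
    if (argument < 24) {
        // Small arguments fit in the initial byte itself.
        out.push_back(static_cast<uint8_t>(typeBits | argument));
        return;
    }
    uint8_t info;    // additional-information value: 24, 25, 26 or 27
    int extraBytes;  // number of argument bytes that follow
    if (argument <= 0xFF) { info = 24; extraBytes = 1; }
    else if (argument <= 0xFFFF) { info = 25; extraBytes = 2; }
    else if (argument <= 0xFFFFFFFF) { info = 26; extraBytes = 4; }
    else { info = 27; extraBytes = 8; }
    out.push_back(static_cast<uint8_t>(typeBits | info));
    // Argument bytes are written most significant byte first.
    for (int shift = 8 * (extraBytes - 1); shift >= 0; shift -= 8) {
        out.push_back(static_cast<uint8_t>(argument >> shift));
    }
}
```

For instance, the header of a two-entry map (major type 5) is the single byte 0xA2, and the header of a four-byte text string (major type 3) is 0x64.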
| |
| ## Parsing |
| |
| LibCppBor also supports parsing of encoded CBOR data, with the same |
| feature set as encoding. There are two basic approaches to parsing, |
| "full" and "stream". |
| |
| ### Full parsing |
| |
| Full parsing means completely parsing a (possibly-compound) data |
| item from a byte buffer. The `parse` functions that do not take a |
| `ParseClient` pointer do this. They return a `ParseResult` which is a |
| tuple of three values: |
| |
| * `std::unique_ptr<Item>` that points to the parsed item, or is nullptr |
| if there was a parse error. |
| * `const uint8_t*` that points to the byte after the end of the decoded |
| item, or to the first unparseable byte in the event of an error. |
| * `std::string` that is empty on success or contains an error message if |
| a parse error occurred. |
| |
| Assuming a successful parse, you can then use `Item::type()` to |
| discover the type of the parsed item (e.g. MAP), and then use the |
| appropriate `Item::as*()` method (e.g. `Item::asMap()`) to get a |
| pointer to an interface which allows you to retrieve specific values. |
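
The three-part result described above lends itself to C++17 structured bindings. The sketch below imitates that shape with stand-in types so that it compiles on its own. `ToyItem`, `ToyParseResult`, `toyParse` and `describe` are hypothetical names, and the toy parser only understands unsigned integers below 24; the real `parse` functions and `Item` interface do far more.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <tuple>

// Hypothetical stand-in for a parsed item.
struct ToyItem { uint64_t value; };

// Same three-part shape as described above: item, position, error message.
using ToyParseResult = std::tuple<std::unique_ptr<ToyItem>, const uint8_t*, std::string>;

// Parses a single small unsigned integer (major type 0, argument < 24).
ToyParseResult toyParse(const uint8_t* begin, const uint8_t* end) {
    if (begin == end) return ToyParseResult(nullptr, begin, "empty input");
    if (*begin >= 24) return ToyParseResult(nullptr, begin, "unsupported item");
    return ToyParseResult(std::make_unique<ToyItem>(ToyItem{*begin}), begin + 1, "");
}

// Unpacks the result and checks the item pointer before using it.
std::string describe(const uint8_t* begin, const uint8_t* end) {
    auto [item, pos, error] = toyParse(begin, end);
    (void)pos;  // position after the item; unused in this sketch
    if (!item) return "error: " + error;
    return "value " + std::to_string(item->value);
}
```

Checking the item pointer first, and reading the error string only when it is null, keeps the success and failure paths clearly separated.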
| |
| ### Stream parsing |
| |
| Stream parsing is more complex, but more flexible. To use |
| stream parsing, you must create your own subclass of `ParseClient` and |
| call one of the `parse` functions that accepts it. See the |
| `ParseClient` method docstrings for details. |
| |
| One unusual feature of stream parsing is that the `ParseClient` |
| callback methods not only provide the parsed Item, but also pointers |
| to the portion of the buffer that encodes that Item. This is useful |
| if, for example, you want to find an element inside of a structure, |
| and then copy the encoding of that sub-structure, without bothering to |
| parse the rest. |
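
The idea of copying an item's raw encoding from the pointers handed to a callback can be sketched without LibCppBor. Everything here is hypothetical: `ToyClient` stands in for a client interface, `toyStreamParse` walks only a flat sequence of unsigned integers, and `CopyNth` keeps the encoded bytes of one chosen item. The real `ParseClient` interface differs; consult its docstrings.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a stream-parsing client interface.
struct ToyClient {
    virtual ~ToyClient() = default;
    // Called once per item with the byte range [begin, end) that encodes it.
    virtual void item(const uint8_t* begin, const uint8_t* end) = 0;
};

// Toy stream parser: walks a sequence of unsigned integers (major type 0)
// and reports each one's encoded range. Returns false on anything else.
bool toyStreamParse(const uint8_t* pos, const uint8_t* end, ToyClient& client) {
    while (pos != end) {
        if ((*pos >> 5) != 0) return false;  // not major type 0
        const uint8_t info = *pos & 0x1F;
        size_t length = 1;                   // the initial byte
        if (info >= 24 && info <= 27) length += size_t{1} << (info - 24);
        else if (info > 27) return false;    // reserved or indefinite length
        if (static_cast<size_t>(end - pos) < length) return false;
        client.item(pos, pos + length);
        pos += length;
    }
    return true;
}

// Client that keeps the raw encoding of one item and ignores the rest.
struct CopyNth : ToyClient {
    explicit CopyNth(size_t n) : wanted(n) {}
    void item(const uint8_t* begin, const uint8_t* end) override {
        if (index++ == wanted) copy.assign(begin, end);
    }
    size_t wanted;
    size_t index = 0;
    std::vector<uint8_t> copy;
};
```

Because the client receives the encoded range directly, it can copy the bytes of the item it wants without building objects for the others.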
| |
| The full parser is implemented with the stream parser. |
| |
| ### Disclaimer |
| This is not an officially supported Google product. |