| # How To Implement Field Presence for Proto3 |
| |
| Protobuf release 3.12 adds experimental support for `optional` fields in |
| proto3. Proto3 optional fields track presence like in proto2. For background |
| information about what presence tracking means, please see |
| [docs/field_presence](field_presence.md). |
| |
| ## Document Summary |
| |
| This document is targeted at developers who own or maintain protobuf code |
| generators. All code generators will need to be updated to support proto3 |
| optional fields. First-party code generators developed by Google are being |
| updated already. However, third-party code generators will need to be updated |
| independently by their authors. This includes: |
| |
| - implementations of Protocol Buffers for other languages. |
| - alternate implementations of Protocol Buffers that target specialized use |
| cases. |
| - RPC code generators that create generated APIs for service calls. |
| - code generators that implement some utility code on top of protobuf generated |
| classes. |
| |
| While this document speaks in terms of "code generators", these same principles |
| apply to implementations that dynamically generate a protocol buffer API "on the |
| fly", directly from a descriptor, in languages that support this kind of usage. |
| |
| ## Background |
| |
| Presence tracking was added to proto3 in response to user feedback, both from |
| inside Google and [from open-source |
| users](https://github.com/protocolbuffers/protobuf/issues/1606). The [proto3 |
| wrapper |
| types](https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/wrappers.proto) |
| were previously the only supported presence mechanism for proto3. Users have |
| pointed to both efficiency and usability issues with the wrapper types. |
| |
| Presence in proto3 uses exactly the same syntax and semantics as in proto2. |
| Proto3 fields marked `optional` will track presence like proto2, while fields |
| without any label (known as "singular fields") will continue to omit presence |
| information. The `optional` keyword was chosen to minimize differences with |
| proto2. |
| |
| Unfortunately, for the current descriptor protos and `Descriptor` API (as of |
| 3.11.4) it is not possible to use the same representation as proto2. Proto3 |
| descriptors already use `LABEL_OPTIONAL` for proto3 singular fields, which do |
| not track presence. There is a lot of existing code that reflects over proto3 |
| protos and assumes that `LABEL_OPTIONAL` in proto3 means "no presence." Changing |
| the semantics now would be risky, since old software would likely drop proto3 |
| presence information, which would be a data loss bug. |
| |
| To minimize this risk we chose a descriptor representation that is semantically |
| compatible with existing proto3 reflection. Every proto3 optional field is |
| placed into a one-field `oneof`. We call this a "synthetic" oneof, as it was not |
| present in the source `.proto` file. |
| |
| Since oneof fields in proto3 already track presence, existing proto3 |
| reflection-based algorithms should correctly preserve presence for proto3 |
| optional fields with no code changes. For example, the JSON and TextFormat |
| parsers/serializers in C++ and Java did not require any changes to support |
| proto3 presence. This is the major benefit of synthetic oneofs. |
| |
| This design does leave some cruft in descriptors. Synthetic oneofs are a |
| compatibility measure that we can hopefully clean up in the future. For now |
| though, it is important to preserve them across different descriptor formats and |
| APIs. It is never safe to drop synthetic oneofs from a proto schema. Code |
| generators can (and should) skip synthetic oneofs when generating a user-facing |
| API or user-facing documentation. But for any schema representation that is |
| consumed programmatically, it is important to keep the synthetic oneofs around. |
| |
| In APIs it can be helpful to offer separate accessors that refer to "real" |
| oneofs (see [API Changes](#api-changes) below). This is a convenient way to omit |
| synthetic oneofs in code generators. |
| |
| ## Updating a Code Generator |
| |
| When a user adds an `optional` field to proto3, this is internally rewritten as |
| a one-field oneof, for backward-compatibility with reflection-based algorithms: |
| |
| ```protobuf |
| syntax = "proto3"; |
| |
| message Foo { |
|   // Experimental feature, not generally supported yet! |
|   optional int32 foo = 1; |
| |
|   // Internally rewritten to: |
|   //   oneof _foo { |
|   //     int32 foo = 1 [proto3_optional=true]; |
|   //   } |
|   // |
|   // We call _foo a "synthetic" oneof, since it was not created by the user. |
| } |
| ``` |
| |
| As a result, the two main goals when updating a code generator are: |
| |
| 1. Give `optional` fields like `foo` normal field presence, as described in |
| [docs/field_presence](field_presence.md). If your implementation already |
| supports proto2, a proto3 `optional` field should use exactly the same API |
| and internal implementation as proto2 `optional`. |
| 2. Avoid generating any oneof-based accessors for the synthetic oneof. Its only |
| purpose is to make reflection-based algorithms work properly if they are |
| not aware of proto3 presence. The synthetic oneof should not appear anywhere |
| in the generated API. |
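
The rewrite described above can be sketched over a simplified descriptor model. This is only an illustration: `FieldProto`, `MessageProto`, and `AddSyntheticOneofs` are hypothetical names, not part of the protobuf API, and the sketch assumes no existing oneof is already called `_<field name>` (handling such collisions is left out).

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for FieldDescriptorProto and
// DescriptorProto; only the members needed for this sketch are modeled.
struct FieldProto {
  std::string name;
  int number = 0;
  bool proto3_optional = false;  // true when the user wrote `optional` in proto3
  int oneof_index = -1;          // -1 means "not in any oneof"
};

struct MessageProto {
  std::vector<FieldProto> fields;
  std::vector<std::string> oneof_names;  // real oneofs, in declaration order
};

// Wraps every proto3 `optional` field in its own one-field oneof. New oneofs
// are appended, so they always end up after the real ones.
// Assumption: no name collision with an existing oneof.
inline void AddSyntheticOneofs(MessageProto& message) {
  for (FieldProto& field : message.fields) {
    if (!field.proto3_optional) continue;
    field.oneof_index = static_cast<int>(message.oneof_names.size());
    message.oneof_names.push_back("_" + field.name);
  }
}
```

A code generator normally receives descriptors in which this rewrite has already happened, so the sketch is mainly useful for recognizing the shape of the input: a field with `proto3_optional` set, pointing at a trailing oneof that contains only that field.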
| |
| ### Satisfying the Experimental Check |
| |
| If you try to run `protoc` on a file with proto3 `optional` fields, you will get |
| an error because the feature is still experimental: |
| |
| ``` |
| $ cat test.proto |
| syntax = "proto3"; |
| |
| message Foo { |
|   // Experimental feature, not generally supported yet! |
|   optional int32 a = 1; |
| } |
| $ protoc --cpp_out=. test.proto |
| test.proto: This file contains proto3 optional fields, but --experimental_allow_proto3_optional was not set. |
| ``` |
| |
| There are two options for getting around this error: |
| |
| 1. Pass `--experimental_allow_proto3_optional` to protoc. |
| 2. Make your filename (or a directory name) contain the string |
| `test_proto3_optional`. This indicates that the proto file is specifically |
| for testing proto3 optional support, so the check is suppressed. |
| |
| These options are demonstrated below: |
| |
| ``` |
| # One option: |
| $ ./src/protoc test.proto --cpp_out=. --experimental_allow_proto3_optional |
| |
| # Another option: |
| $ cp test.proto test_proto3_optional.proto |
| $ ./src/protoc test_proto3_optional.proto --cpp_out=. |
| $ |
| ``` |
| |
| The experimental check will be removed in a future release, once we are ready |
| to make this feature generally available. Ideally this will happen for the 3.13 |
| release of protobuf, sometime in mid-2020, but there is not a specific date set |
| for this yet. Some of the timing will depend on feedback we get from the |
| community, so if you have questions or concerns please get in touch via a |
| GitHub issue. |
| |
| ### Signaling That Your Code Generator Supports Proto3 Optional |
| |
| If you now try to invoke your own code generator with the test proto, you will |
| run into a different error: |
| |
| ``` |
| $ ./src/protoc test_proto3_optional.proto --my_codegen_out=. |
| test_proto3_optional.proto: is a proto3 file that contains optional fields, but |
| code generator --my_codegen_out hasn't been updated to support optional fields in |
| proto3. Please ask the owner of this code generator to support proto3 optional. |
| ``` |
| |
| This check exists to make sure that code generators get a chance to update |
| before they are used with proto3 `optional` fields. Without this check an old |
| code generator might emit obsolete generated APIs (like accessors for a |
| synthetic oneof) and users could start depending on these. That would create |
| a legacy migration burden once a code generator actually implements the feature. |
| |
| To signal that your code generator supports `optional` fields in proto3, you |
| need to tell `protoc` what features you support. The method for doing this |
| depends on whether you are using the C++ |
| `google::protobuf::compiler::CodeGenerator` |
| framework or not. |
| |
| If you are using the CodeGenerator framework: |
| |
| ```c++ |
| class MyCodeGenerator : public google::protobuf::compiler::CodeGenerator { |
|  public: |
|   // Add this method. |
|   uint64_t GetSupportedFeatures() const override { |
|     // Indicate that this code generator supports proto3 optional fields. |
|     // (Note: don't release your code generator with this flag set until you |
|     // have actually added and tested your proto3 support!) |
|     return FEATURE_PROTO3_OPTIONAL; |
|   } |
| }; |
| ``` |
| |
| If you are generating code using raw `CodeGeneratorRequest` and |
| `CodeGeneratorResponse` messages from `plugin.proto`, the change will be very |
| similar: |
| |
| ```c++ |
| void GenerateResponse() { |
|   CodeGeneratorResponse response; |
|   response.set_supported_features(CodeGeneratorResponse::FEATURE_PROTO3_OPTIONAL); |
| |
|   // Generate code... |
| } |
| ``` |
| |
| Once you have added this, you should now be able to successfully use your code |
| generator to generate a file containing proto3 optional fields: |
| |
| ``` |
| $ ./src/protoc test_proto3_optional.proto --my_codegen_out=. |
| ``` |
| |
| ### Updating Your Code Generator |
| |
| The next step is to actually add support for proto3 optional to your code |
| generator. The goal is to recognize proto3 optional fields as optional, and to |
| suppress any output from synthetic oneofs. |
| |
| If your code generator does not currently support proto2, you will need to |
| design an API and implementation for supporting presence in scalar fields. |
| Generally this means: |
| |
| - allocating a bit inside the generated class to represent whether a given field |
| is present or not. |
| - exposing a `has_foo()` method for each field to return the value of this bit. |
| - making the parser set this bit when a value is parsed from the wire. |
| - making the serializer test this bit to decide whether to serialize. |
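
The four points above can be sketched as a hand-written stand-in for the class a generator might emit for `message Foo { optional int32 foo = 1; }`. The layout and the names `has_bits_`, `kFooBit`, `SerializeTo`, and `ParseFrom` are illustrative assumptions, not the output of any real generator, and the parser only understands field 1.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

class Foo {
 public:
  // Presence is read from the has-bit, never inferred from the value.
  bool has_foo() const { return (has_bits_ & kFooBit) != 0; }
  int32_t foo() const { return foo_; }
  void set_foo(int32_t value) {
    foo_ = value;
    has_bits_ |= kFooBit;
  }
  void clear_foo() {
    foo_ = 0;
    has_bits_ &= ~kFooBit;
  }

  // Serializer: tests the has-bit, so an explicitly set 0 is still written.
  // Writes the tag for field 1 with varint wire type (0x08), then the value.
  void SerializeTo(std::vector<uint8_t>& out) const {
    if (!has_foo()) return;
    out.push_back(0x08);
    // int32 values are sign-extended to 64 bits before varint encoding.
    uint64_t v = static_cast<uint64_t>(static_cast<int64_t>(foo_));
    while (v >= 0x80) {
      out.push_back(static_cast<uint8_t>((v & 0x7F) | 0x80));
      v >>= 7;
    }
    out.push_back(static_cast<uint8_t>(v));
  }

  // Parser: sets the has-bit whenever field 1 appears on the wire, whatever
  // its value. Returns false on truncated input or any other tag.
  bool ParseFrom(const std::vector<uint8_t>& in) {
    std::size_t pos = 0;
    while (pos < in.size()) {
      if (in[pos++] != 0x08) return false;
      uint64_t v = 0;
      int shift = 0;
      bool done = false;
      while (pos < in.size() && shift < 64) {
        const uint8_t byte = in[pos++];
        v |= static_cast<uint64_t>(byte & 0x7F) << shift;
        shift += 7;
        if ((byte & 0x80) == 0) {
          done = true;
          break;
        }
      }
      if (!done) return false;
      set_foo(static_cast<int32_t>(v));
    }
    return true;
  }

 private:
  static constexpr uint32_t kFooBit = 1u;  // bit 0 tracks presence of `foo`
  uint32_t has_bits_ = 0;
  int32_t foo_ = 0;
};
```

The key difference from a proto3 singular field is visible in `SerializeTo`: a singular field would skip the write when the value is 0, whereas here an explicitly set 0 round-trips and `has_foo()` stays true after parsing.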
| |
| If your code generator already supports proto2, then most of your work is |
| already done. All you need to do is make sure that proto3 optional fields have |
| exactly the same API and behave in exactly the same way as proto2 optional |
| fields. |
| |
| From experience updating several of Google's code generators, most of the |
| updates that are required fall into one of several patterns. Here we will show |
| the patterns in terms of the C++ CodeGenerator framework. If you are using |
| `CodeGeneratorRequest` and `CodeGeneratorResponse` directly, you can translate the |
| C++ examples to your own language, referencing the C++ implementation of these |
| methods where required. |
| |
| #### To test whether a field should have presence |
| |
| Old: |
| |
| ```c++ |
| bool MessageHasPresence(const google::protobuf::Descriptor* message) { |
|   return message->file()->syntax() == |
|          google::protobuf::FileDescriptor::SYNTAX_PROTO2; |
| } |
| ``` |
| |
| New: |
| |
| ```c++ |
| // Presence is no longer a property of a message, it's a property of individual |
| // fields. |
| bool FieldHasPresence(const google::protobuf::FieldDescriptor* field) { |
|   return field->has_presence(); |
|   // Note, the above will return true for fields in a oneof. |
|   // If you want to filter out oneof fields, write this instead: |
|   //   return field->has_presence() && !field->real_containing_oneof(); |
| } |
| ``` |
| |
| #### To test whether a field is a member of a oneof |
| |
| Old: |
| |
| ```c++ |
| bool FieldIsInOneof(const google::protobuf::FieldDescriptor* field) { |
|   return field->containing_oneof() != nullptr; |
| } |
| ``` |
| |
| New: |
| |
| ```c++ |
| bool FieldIsInOneof(const google::protobuf::FieldDescriptor* field) { |
|   // real_containing_oneof() returns nullptr for synthetic oneofs. |
|   return field->real_containing_oneof() != nullptr; |
| } |
| ``` |
| |
| #### To iterate over all oneofs |
| |
| Old: |
| |
| ```c++ |
| void IterateOverOneofs(const google::protobuf::Descriptor* message) { |
|   for (int i = 0; i < message->oneof_decl_count(); i++) { |
|     const google::protobuf::OneofDescriptor* oneof = message->oneof_decl(i); |
|     // ... |
|   } |
| } |
| ``` |
| |
| New: |
| |
| ```c++ |
| void IterateOverOneofs(const google::protobuf::Descriptor* message) { |
|   // Real oneofs are always first, and real_oneof_decl_count() will return the |
|   // total number of oneofs, excluding synthetic oneofs. |
|   for (int i = 0; i < message->real_oneof_decl_count(); i++) { |
|     const google::protobuf::OneofDescriptor* oneof = message->oneof_decl(i); |
|     // ... |
|   } |
| } |
| ``` |
| |
| ## Updating Reflection |
| |
| If your implementation offers reflection, there are a few other changes to make: |
| |
| ### API Changes |
| |
| The API for reflecting over fields and oneofs should change as follows. These |
| changes match the ones implemented in C++ reflection. |
| |
| 1. Add a `FieldDescriptor::has_presence()` method returning `bool` |
| (adjusted to your language's naming convention). This should return true |
| for all fields that have explicit presence, as documented in |
| [docs/field_presence](field_presence.md). In particular, this includes |
| fields in a oneof, proto2 scalar fields, and proto3 `optional` fields. |
| This accessor will allow users to query what fields have presence without |
| thinking about the difference between proto2 and proto3. |
| 2. As a corollary of (1), please do *not* expose an accessor for the |
| `FieldDescriptorProto.proto3_optional` field. We want to avoid having |
| users implement any proto2/proto3-specific logic. Users should use the |
| `has_presence()` function instead. |
| 3. You may also wish to add a `FieldDescriptor::has_optional_keyword()` method |
| returning `bool`, which indicates whether the `optional` keyword is present. |
| Singular message fields always return `true` for `has_presence()` whether or |
| not they are marked `optional`, so this method lets a caller find out whether |
| the schema author actually wrote the keyword. It can occasionally be useful |
| to have this information, even though it does not change the presence |
| semantics of the field. |
| 4. If your reflection API may be used for a code generator, you may wish to |
| implement methods to help users tell the difference between real and |
| synthetic oneofs. In particular: |
| - `OneofDescriptor::is_synthetic()`: returns true if this is a synthetic |
| oneof. |
| - `FieldDescriptor::real_containing_oneof()`: like `containing_oneof()`, |
| but returns `nullptr` if the oneof is synthetic. |
| - `Descriptor::real_oneof_decl_count()`: like `oneof_decl_count()`, but |
| returns the number of real oneofs only. |
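
As a rough guide for other languages, the accessors listed above can be sketched over a simplified descriptor model. `FieldInfo`, `OneofInfo`, `MessageInfo`, and the free functions below are hypothetical names invented for this sketch; the rules they encode follow the descriptions in this document (a synthetic oneof wraps exactly one field, and that field has `proto3_optional` set).

```cpp
#include <vector>

struct OneofInfo;

// Hypothetical, simplified stand-in for a field descriptor.
struct FieldInfo {
  bool is_repeated = false;
  bool is_message = false;
  bool in_proto2_file = false;
  bool proto3_optional = false;  // mirrors FieldDescriptorProto.proto3_optional
  const OneofInfo* containing_oneof = nullptr;
};

struct OneofInfo {
  std::vector<const FieldInfo*> fields;
};

struct MessageInfo {
  std::vector<const OneofInfo*> oneofs;  // synthetic oneofs are ordered last
};

// is_synthetic(): exactly one member, and that member is proto3 optional.
inline bool IsSynthetic(const OneofInfo& oneof) {
  return oneof.fields.size() == 1 && oneof.fields[0]->proto3_optional;
}

// real_containing_oneof(): like containing_oneof, but hides synthetic oneofs.
inline const OneofInfo* RealContainingOneof(const FieldInfo& field) {
  const OneofInfo* oneof = field.containing_oneof;
  return (oneof != nullptr && !IsSynthetic(*oneof)) ? oneof : nullptr;
}

// has_presence(): true for singular message fields, oneof members (real or
// synthetic), and singular proto2 fields. Repeated fields never have presence.
inline bool HasPresence(const FieldInfo& field) {
  if (field.is_repeated) return false;
  return field.is_message || field.containing_oneof != nullptr ||
         field.in_proto2_file;
}

// real_oneof_decl_count(): relies on synthetic oneofs being ordered last.
inline int RealOneofDeclCount(const MessageInfo& message) {
  int count = 0;
  for (const OneofInfo* oneof : message.oneofs) {
    if (IsSynthetic(*oneof)) break;
    ++count;
  }
  return count;
}
```

Note that `HasPresence` never looks at `proto3_optional` directly: a proto3 `optional` field has presence simply because it is a member of a (synthetic) oneof, which is what keeps callers free of proto2/proto3-specific logic.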
| |
| ### Implementation Changes |
| |
| Proto3 `optional` fields and synthetic oneofs must work correctly when |
| reflected on. Specifically: |
| |
| 1. Reflection for synthetic oneofs should work properly. Even though synthetic |
| oneofs do not really exist in the message, you can still make reflection work |
| as if they did. In particular, you can make a method like |
| `Reflection::HasOneof()` or `Reflection::GetOneofFieldDescriptor()` look at |
| the hasbit to determine if the oneof is present or not. |
| 2. Reflection for proto3 optional fields should work properly. For example, a |
| method like `Reflection::HasField()` should know to look for the hasbit for a |
| proto3 `optional` field. It should not be fooled by the synthetic oneof into |
| thinking that there is a `case` member for the oneof. |
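
A minimal sketch of both rules follows, assuming a toy storage layout in which has-bits live in one `uint32_t` and each real oneof has a "case" slot holding the number of the field currently set. `FieldLayout`, `OneofLayout`, and `MessageData` are hypothetical types for illustration; only the function names echo the C++ reflection methods mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct OneofLayout;

struct FieldLayout {
  int number = 0;
  int hasbit_index = -1;  // >= 0 when presence is stored in a has-bit
  const OneofLayout* containing_oneof = nullptr;
};

struct OneofLayout {
  bool is_synthetic = false;
  int case_slot = -1;  // index into MessageData::oneof_case; real oneofs only
  std::vector<const FieldLayout*> fields;
};

struct MessageData {
  uint32_t has_bits = 0;
  std::vector<int> oneof_case;  // per real oneof: set field number, 0 if none
};

// HasField: members of a real oneof consult the case slot; everything else
// with presence, including proto3 optional fields, consults its has-bit.
inline bool HasField(const MessageData& msg, const FieldLayout& field) {
  const OneofLayout* oneof = field.containing_oneof;
  if (oneof != nullptr && !oneof->is_synthetic) {
    return msg.oneof_case[oneof->case_slot] == field.number;
  }
  assert(field.hasbit_index >= 0);  // sketch only handles fields with presence
  return ((msg.has_bits >> field.hasbit_index) & 1u) != 0;
}

// GetOneofFieldDescriptor: a synthetic oneof has no case slot, so it reports
// its single member as "set" exactly when that member's has-bit is set.
inline const FieldLayout* GetOneofFieldDescriptor(const MessageData& msg,
                                                  const OneofLayout& oneof) {
  if (oneof.is_synthetic) {
    const FieldLayout* only = oneof.fields[0];
    return HasField(msg, *only) ? only : nullptr;
  }
  const int set_number = msg.oneof_case[oneof.case_slot];
  for (const FieldLayout* field : oneof.fields) {
    if (field->number == set_number) return field;
  }
  return nullptr;
}
```

With this arrangement, reflection-based code that only knows about oneofs sees a consistent picture: the synthetic oneof is "empty" or "set to `foo`" in lockstep with `foo`'s has-bit.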
| |
| Once you have updated reflection to work properly with proto3 `optional` and |
| synthetic oneofs, any code that *uses* your reflection interface should work |
| properly with no changes. This is the benefit of using synthetic oneofs. |
| |
| In particular, if you have a reflection-based implementation of protobuf text |
| format or JSON, it should properly support proto3 optional fields without any |
| changes to the code. The fields will look like they all belong to a one-field |
| oneof, and existing proto3 reflection code should know how to test presence for |
| fields in a oneof. |
| |
| So the best way to test your reflection changes is to try round-tripping a |
| message through text format, JSON, or some other reflection-based parser and |
| serializer, if you have one. |
| |
| ### Validating Descriptors |
| |
| If your reflection implementation supports loading descriptors at runtime, |
| you must verify that all synthetic oneofs are ordered after all "real" oneofs. |
| |
| Here is the code that implements this validation step in C++, for inspiration: |
| |
| ```c++ |
| // Validation that runs for each message. |
| // Synthetic oneofs must be last. |
| int first_synthetic = -1; |
| for (int i = 0; i < message->oneof_decl_count(); i++) { |
|   const OneofDescriptor* oneof = message->oneof_decl(i); |
|   if (oneof->is_synthetic()) { |
|     if (first_synthetic == -1) { |
|       first_synthetic = i; |
|     } |
|   } else { |
|     if (first_synthetic != -1) { |
|       AddError(message->full_name(), proto.oneof_decl(i), |
|                DescriptorPool::ErrorCollector::OTHER, |
|                "Synthetic oneofs must be after all other oneofs"); |
|     } |
|   } |
| } |
| |
| if (first_synthetic == -1) { |
|   message->real_oneof_decl_count_ = message->oneof_decl_count_; |
| } else { |
|   message->real_oneof_decl_count_ = first_synthetic; |
| } |
| ``` |