
 This document explains the strategy that was used so far in starting the
 migration to PSA Crypto and mentions future perspectives and open questions.

 Goals
 =====

 Several benefits are expected from migrating to PSA Crypto:

 G1. Use PSA Crypto drivers when available.
 G2. Allow isolation of long-term secrets (for example, private keys).
 G3. Allow isolation of short-term secrets (for example, TLS session keys).
 G4. Have a clean, unified API for Crypto (retire the legacy API).
 G5. Code size: compile out our implementation when a driver is available.

 As of Mbed TLS 3.2, most of (G1) and all of (G2) are implemented when
 `MBEDTLS_USE_PSA_CRYPTO` is enabled. For (G2) to take effect, the application
 needs to be changed to use new APIs. For a more detailed account of what's
 implemented, see `docs/use-psa-crypto.md`, where new APIs are about (G2), and
 internal changes implement (G1).

 As of early 2023, work towards G5 is in progress: Mbed TLS 3.3 and 3.4 saw
 some improvements in this area, and more will be coming in future releases.

 Generally speaking, the numbering above doesn't mean that each goal requires
 the preceding ones to be completed.


 Compile-time options
 ====================

 We currently have a few compile-time options that are relevant to the migration:

 - `MBEDTLS_PSA_CRYPTO_C` - enabled by default, controls the presence of the PSA
   Crypto APIs.
 - `MBEDTLS_USE_PSA_CRYPTO` - disabled by default (enabled in "full" config),
   controls usage of PSA Crypto APIs to perform operations in X.509 and TLS
   (G1 above), as well as the availability of some new APIs (G2 above).
 - `PSA_CRYPTO_CONFIG` - disabled by default, supports builds with drivers and
   without the corresponding software implementation (G5 above).

 The reasons why `MBEDTLS_USE_PSA_CRYPTO` is optional and disabled by default
 are:
 - it's not fully compatible with `MBEDTLS_ECP_RESTARTABLE`: you can enable
   both, but then you won't get the full effect of RESTARTABLE (see the
   documentation of this option in `mbedtls_config.h`);
 - to avoid a hard/default dependency of TLS, X.509 and PK on
   `MBEDTLS_PSA_CRYPTO_C`, for backward compatibility reasons:
   - When `MBEDTLS_PSA_CRYPTO_C` is enabled and used, applications need to call
     `psa_crypto_init()` before TLS/X.509 uses PSA functions. (This prevents us
     from even enabling the option by default.)
   - `MBEDTLS_PSA_CRYPTO_C` has a hard dependency on `MBEDTLS_ENTROPY_C` or
     `MBEDTLS_PSA_CRYPTO_EXTERNAL_RNG`, but it's currently possible to compile
     TLS and X.509 without either of these options.
     Also, we can't just auto-enable `MBEDTLS_ENTROPY_C` as it doesn't build
     out of the box on all platforms, let alone
     `MBEDTLS_PSA_CRYPTO_EXTERNAL_RNG`, which requires a user-provided RNG
     function.
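The RNG requirement in the second sub-item is the kind of constraint that is enforced at build time. Below is a minimal, self-contained sketch in the style of a `check_config.h` guard (an assumption about how such a check can look, not a copy of the real file). The configuration macros are defined by hand to stand in for a user configuration, and `ex_which_rng()` is a hypothetical helper added only so the result can be inspected.

```c
#include <assert.h>

/* Stand-in for a user configuration: PSA Crypto with an external RNG and
 * no entropy module. These would normally come from mbedtls_config.h. */
#define MBEDTLS_PSA_CRYPTO_C
#define MBEDTLS_PSA_CRYPTO_EXTERNAL_RNG

/* PSA Crypto needs a source of randomness: reject configurations that
 * enable it without either the entropy module or an external RNG. */
#if defined(MBEDTLS_PSA_CRYPTO_C) && \
    !(defined(MBEDTLS_ENTROPY_C) || defined(MBEDTLS_PSA_CRYPTO_EXTERNAL_RNG))
#error "MBEDTLS_PSA_CRYPTO_C requires MBEDTLS_ENTROPY_C or MBEDTLS_PSA_CRYPTO_EXTERNAL_RNG"
#endif

/* Hypothetical helper: which randomness source this sketch's build uses. */
enum ex_rng_source { EX_RNG_NONE, EX_RNG_ENTROPY, EX_RNG_EXTERNAL };

static enum ex_rng_source ex_which_rng(void)
{
#if defined(MBEDTLS_PSA_CRYPTO_EXTERNAL_RNG)
    return EX_RNG_EXTERNAL;
#elif defined(MBEDTLS_ENTROPY_C)
    return EX_RNG_ENTROPY;
#else
    return EX_RNG_NONE;
#endif
}
```

Removing both RNG macros from this sketch makes the `#error` fire, so the misconfiguration is caught when compiling rather than at runtime.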

 The downside of this approach is that until we are able to make
 `MBEDTLS_USE_PSA_CRYPTO` non-optional (always enabled), we have to maintain
 two versions of some parts of the code: one using PSA, the other using the
 legacy APIs. However, see the next section for strategies that can lower that
 cost. The rest of this section explains the reasons for the
 incompatibilities mentioned above.

 At the time of writing (early 2022) it is unclear what could be done about the
 backward compatibility issues, and in particular if the cost of implementing
 solutions to these problems would be higher or lower than the cost of
 maintaining dual code paths until the next major version. (Note: these
 solutions would probably also solve other problems at the same time.)

 ### `MBEDTLS_ECP_RESTARTABLE`

 Currently this option controls not only the presence of restartable APIs in
 the crypto library, but also their use in the TLS and X.509 layers. Since PSA
 Crypto does not support restartable operations, there's a clear conflict: the
 TLS and X.509 layers can't both use only PSA APIs and get restartable
 behaviour.

 Support for restartable (aka interruptible) ECDSA sign/verify operations was
 added to PSA in Mbed TLS 3.4, but support for ECDH is not present yet.

 Follow-up work will then be needed to make use of the new PSA APIs in
 PK/X.509/TLS in all places where we currently allow restartable operations.
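Restartable and interruptible APIs share a caller-side pattern: set a work budget, then call repeatedly until the operation stops reporting that it is still in progress. The sketch below models only that pattern; every type and function in it is hypothetical and performs no cryptography.

```c
#include <assert.h>

/* Hypothetical status codes for an interruptible operation. */
typedef enum { EX_DONE = 0, EX_INCOMPLETE = 1 } ex_status;

/* Hypothetical operation: the full computation costs `total` units of
 * work, and each call to ex_op_complete() performs at most `max_ops`. */
typedef struct {
    unsigned total;
    unsigned done;
    unsigned max_ops;
} ex_op;

static void ex_op_start(ex_op *op, unsigned total, unsigned max_ops)
{
    assert(max_ops > 0); /* a zero budget would never make progress */
    op->total = total;
    op->done = 0;
    op->max_ops = max_ops;
}

/* Perform one bounded slice of work and report whether more is needed. */
static ex_status ex_op_complete(ex_op *op)
{
    unsigned remaining = op->total - op->done;
    unsigned step = remaining < op->max_ops ? remaining : op->max_ops;
    op->done += step;
    return op->done == op->total ? EX_DONE : EX_INCOMPLETE;
}

/* Caller side (e.g. one handshake step): a real caller would return to its
 * event loop after each EX_INCOMPLETE and resume later; here we just loop. */
static unsigned ex_op_run(ex_op *op)
{
    unsigned calls = 0;
    ex_status st;
    do {
        st = ex_op_complete(op);
        calls++;
    } while (st == EX_INCOMPLETE);
    return calls;
}
```

With a total cost of 10 and a budget of 3 per call, the caller needs four calls. This is why PK/X.509/TLS have to be adapted explicitly: every caller must be able to stop and resume.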

 ### Backward compatibility issues with making `MBEDTLS_USE_PSA_CRYPTO` always on

 1. Existing applications may not be calling `psa_crypto_init()` before using
    TLS, X.509 or PK. We can try to work around that by calling (the relevant
    part of) it ourselves under the hood as needed, but that would likely require
    splitting init between the parts that can fail and the parts that can't (see
    <https://github.com/ARM-software/psa-crypto-api/pull/536> for that).
 2. It's currently not possible to enable `MBEDTLS_PSA_CRYPTO_C` in
    configurations that don't have `MBEDTLS_ENTROPY_C`, and we can't just
    auto-enable the latter, as it won't build or work out of the box on all
    platforms. There are two kinds of things we'd need to do if we want to work
    around that:
    1. Make it possible to enable the parts of PSA Crypto that don't require an
       RNG (typically, public key operations, symmetric crypto, some key
       management functions (destroy, etc.)) in configurations that don't have
       `ENTROPY_C`. This requires going through the PSA code base to adjust
       dependencies. Risk: there may be annoying dependencies, some of which may be
       surprising.
    2. For operations that require an RNG, provide an alternative function
       accepting an explicit `f_rng` parameter (see #5238), that would be
       available in entropy-less builds. (Code using those functions would then
       still need two versions: one using the `f_rng` variant, for entropy-less
       builds, and one using the standard function, for driver support in builds
       with entropy.)

 See <https://github.com/Mbed-TLS/mbedtls/issues/5156>.
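Item 1 above amounts to a lazy-initialization guard. Here is a minimal single-threaded sketch with hypothetical names throughout. It also shows the drawback mentioned in the text: an init failure has to be reported from whichever entry point happened to trigger it.

```c
#include <assert.h>

/* Hypothetical stand-ins for a library-wide init function that may fail
 * and a flag recording that it already succeeded. */
static int ex_initialized = 0;
static int ex_init_calls = 0;

static int ex_crypto_init(void)
{
    ex_init_calls++;
    /* ...set up key store, drivers and RNG; any of these could fail... */
    ex_initialized = 1;
    return 0; /* 0 means success */
}

/* Called at the top of each TLS/X.509/PK entry point that needs PSA, so
 * that applications which never call the init function keep working.
 * Not thread-safe as written. */
static int ex_ensure_init(void)
{
    if (ex_initialized) {
        return 0;
    }
    return ex_crypto_init();
}

/* Hypothetical TLS-level entry point. */
static int ex_tls_handshake_step(void)
{
    int ret = ex_ensure_init();
    if (ret != 0) {
        return ret; /* an init failure now surfaces from an unrelated API */
    }
    /* ...handshake work would go here... */
    return 0;
}
```

Splitting init into parts that cannot fail and parts that can, as the linked PSA API discussion proposes, would shrink the set of entry points that need such an error path.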

 Taking advantage of the existing abstraction layers - or not
 =============================================================

 The Crypto library in Mbed TLS currently has 3 abstraction layers that offer
 algorithm-agnostic APIs for a class of algorithms:

 - MD for message digests aka hashes (including HMAC)
 - Cipher for symmetric ciphers (including AEAD)
 - PK for asymmetric (aka public-key) cryptography (excluding key exchange)

 Note: key exchange (FFDH, ECDH) is not covered by an abstraction layer.

 These abstraction layers typically provide, in addition to the API for crypto
 operations, types and numerical identifiers for algorithms (for
 example `mbedtls_cipher_mode_t` and its values). The
 current strategy is to keep using those identifiers in most of the code, in
 particular in existing structures and public APIs, even when
 `MBEDTLS_USE_PSA_CRYPTO` is enabled. (This is not an issue for G1, G2, G3
 above, and is only potentially relevant for G4.)

 There are multiple strategies that can be used regarding the place of those
 layers in the migration to PSA.

 Silently call to PSA from the abstraction layer
 -----------------------------------------------

 - Provide a new definition (conditionally on `USE_PSA_CRYPTO`) of wrapper
   functions in the abstraction layer that call PSA instead of the legacy
   crypto API.
 - Upside: changes contained to a single place, no need to change TLS or X.509
   code anywhere.
 - Downside: tricky to implement if the PSA implementation is currently done on
   top of that layer (dependency loop).
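A minimal sketch of the "silent" strategy: the abstraction layer keeps one prototype, and only its body depends on the configuration. All names are hypothetical, the backends are stubs that just report which path ran, and `MBEDTLS_USE_PSA_CRYPTO` is defined by hand in place of the real configuration.

```c
#include <assert.h>
#include <stddef.h>

#define MBEDTLS_USE_PSA_CRYPTO /* stand-in for the user's configuration */

/* Which backend handled the request; returned for illustration only. */
enum ex_backend { EX_BACKEND_LEGACY = 1, EX_BACKEND_PSA = 2 };

#if defined(MBEDTLS_USE_PSA_CRYPTO)
/* Hypothetical stand-in for a psa_xxx() call. */
static int ex_psa_verify(const unsigned char *hash, size_t len)
{
    (void) hash;
    (void) len;
    return EX_BACKEND_PSA;
}
#else
/* Hypothetical stand-in for a legacy mbedtls_xxx() call. */
static int ex_legacy_verify(const unsigned char *hash, size_t len)
{
    (void) hash;
    (void) len;
    return EX_BACKEND_LEGACY;
}
#endif

/* Abstraction-layer wrapper: its prototype is the same in both builds, so
 * TLS and X.509 keep calling it unchanged; only the body switches. */
static int ex_layer_verify(const unsigned char *hash, size_t len)
{
    assert(hash != NULL || len == 0);
#if defined(MBEDTLS_USE_PSA_CRYPTO)
    return ex_psa_verify(hash, len);
#else
    return ex_legacy_verify(hash, len);
#endif
}
```

The dependency-loop downside is visible here: if the PSA backend were itself implemented by calling `ex_layer_verify()`, the PSA branch would recurse.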

 This strategy is currently (early 2023) used for all operations in the PK
 layer; the MD layer uses a variant where it dispatches to PSA if a driver is
 available and the driver subsystem has been initialized, regardless of whether
 `USE_PSA_CRYPTO` is enabled; see `md-cipher-dispatch.md` in the same directory
 for details.
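The MD variant differs in that the decision is made at runtime rather than by `MBEDTLS_USE_PSA_CRYPTO`. Below is a rough sketch of that decision with hypothetical names. Both inputs are plain variables here so the sketch can run, whereas in a real build driver availability would come from the configuration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs to the decision. In a real build, whether a driver
 * exists is a compile-time fact, while whether the driver subsystem has
 * been initialized is only known at runtime. */
static const bool ex_sha256_driver_present = true;
static bool ex_drivers_initialized = false;

enum ex_engine { EX_ENGINE_BUILTIN, EX_ENGINE_PSA };

/* In this sketch the engine is chosen once, when a context is set up, and
 * does not depend on MBEDTLS_USE_PSA_CRYPTO at all. */
static enum ex_engine ex_md_pick_engine(void)
{
    if (ex_sha256_driver_present && ex_drivers_initialized) {
        return EX_ENGINE_PSA;
    }
    return EX_ENGINE_BUILTIN;
}
```

In this sketch the fallback branch only works if a built-in implementation is compiled in. See `md-cipher-dispatch.md` for the actual rules.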

 This strategy is not very well suited to the Cipher layer, as the PSA
 implementation is currently done on top of that layer.

 This strategy will probably be used for some time for the PK layer, while we
 figure out what the future of that layer is: parts of it (parse/write, ECDSA
 signatures in the format that X.509 & TLS want) are not covered by PSA, so
 they will need to keep existing in some way. (Also, the PK layer is a good
 place for dispatching to either PSA or `mbedtls_xxx_restartable` while that
 part is not covered by PSA yet, if we decide to do that.)

 Replace calls for each operation
 --------------------------------

 - For every operation that's done through this layer in TLS or X.509, just
   replace the function call with a call to PSA (conditionally on `USE_PSA_CRYPTO`).
 - Upside: conceptually simple, and if the PSA implementation is currently done
   on top of that layer, avoids concerns about dependency loops.
 - Upside: opens the door to building TLS/X.509 without that layer, saving some
   code size.
 - Downside: TLS/X.509 code has to be changed for each operation.
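For contrast with the previous strategy, here the `#if` moves into each TLS/X.509 caller. That caller then also has to translate PSA status codes into its own error codes. The sketch uses hypothetical names and arbitrary error values, and its "digest" is a toy XOR fold standing in for a real hash.

```c
#include <assert.h>
#include <stddef.h>

#define MBEDTLS_USE_PSA_CRYPTO /* stand-in for the user's configuration */

/* Hypothetical error values, chosen arbitrarily for this sketch. */
#define EX_PSA_SUCCESS         0
#define EX_PSA_ERROR_INVALID (-1)
#define EX_TLS_ERR_BAD_INPUT (-100)

#if defined(MBEDTLS_USE_PSA_CRYPTO)
/* Hypothetical PSA-style primitive; the "digest" is a toy XOR fold. */
static int ex_psa_digest(const unsigned char *in, size_t len, unsigned char *out)
{
    if (in == NULL && len != 0) {
        return EX_PSA_ERROR_INVALID;
    }
    unsigned char acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc ^= in[i];
    }
    *out = acc;
    return EX_PSA_SUCCESS;
}
#else
/* Hypothetical legacy primitive using the TLS-side error convention. */
static int ex_legacy_digest(const unsigned char *in, size_t len, unsigned char *out)
{
    if (in == NULL && len != 0) {
        return EX_TLS_ERR_BAD_INPUT;
    }
    unsigned char acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc ^= in[i];
    }
    *out = acc;
    return 0;
}
#endif

/* TLS-layer code: the #if lives at the call site, including the
 * translation of PSA status codes into this layer's error codes. */
static int ex_tls_digest_message(const unsigned char *msg, size_t len, unsigned char *out)
{
    assert(out != NULL);
#if defined(MBEDTLS_USE_PSA_CRYPTO)
    int status = ex_psa_digest(msg, len, out);
    return status == EX_PSA_SUCCESS ? 0 : EX_TLS_ERR_BAD_INPUT;
#else
    return ex_legacy_digest(msg, len, out);
#endif
}
```

Every call site repeats this pattern, which is the per-operation cost named in the downside. In exchange, the abstraction layer is no longer needed in the PSA branch.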

 This strategy is currently (early 2023) used for the MD layer and the Cipher
 layer in X.509 and TLS. Crypto modules however always call to MD which may
 then dispatch to PSA, see `md-cipher-dispatch.md`.

 Opt-in use of PSA from the abstraction layer
 --------------------------------------------

 - Provide a new way to set up a context that causes operations on that context
   to be done via PSA.
 - Upside: changes mostly contained in one place, TLS/X.509 code only needs to
   be changed when setting up the context, but not when using it. In
   particular, no changes to/duplication of existing public APIs that expect a
   key to be passed as a context of this layer (e.g., `mbedtls_pk_context`).
 - Upside: avoids dependency loop when PSA implemented on top of that layer.
 - Downside: when the context is typically set up by the application, requires
   changes in application code.
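Here is a minimal sketch of the opt-in approach with key isolation. It is loosely modeled on what the text below describes for `mbedtls_pk_setup_opaque()`, but does not reproduce that function's signature. A second setup function records only a key identifier, and the operation used by TLS/X.509 dispatches on how the context was set up. All names, including the `ex_key_id` type, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t ex_key_id; /* hypothetical stand-in for a PSA key identifier */

typedef enum { EX_PK_NONE, EX_PK_LEGACY, EX_PK_OPAQUE } ex_pk_kind;
enum ex_path { EX_PATH_LEGACY = 1, EX_PATH_PSA = 2 };

typedef struct {
    ex_pk_kind kind;
    unsigned char key[32]; /* key material: legacy kind only */
    ex_key_id id;          /* reference to a key held by PSA: opaque kind only */
} ex_pk_context;

/* Existing setup path: the private key is copied into the context. */
static void ex_pk_setup_legacy(ex_pk_context *ctx, const unsigned char key[32])
{
    ctx->kind = EX_PK_LEGACY;
    memcpy(ctx->key, key, sizeof(ctx->key));
    ctx->id = 0;
}

/* New opt-in setup path: the context only records which PSA key to use,
 * so the private key never enters this structure. */
static void ex_pk_setup_opaque(ex_pk_context *ctx, ex_key_id id)
{
    ctx->kind = EX_PK_OPAQUE;
    memset(ctx->key, 0, sizeof(ctx->key));
    ctx->id = id;
}

/* Unchanged operation used by TLS/X.509: it dispatches on how the context
 * was set up. Returns which path would run, for illustration. */
static enum ex_path ex_pk_sign(const ex_pk_context *ctx)
{
    assert(ctx->kind != EX_PK_NONE);
    return ctx->kind == EX_PK_OPAQUE ? EX_PATH_PSA : EX_PATH_LEGACY;
}
```

Only the application's setup call changes. Everything downstream of `ex_pk_sign()` is identical for both kinds of context.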

 This strategy is not useful when no context is used, for example with the
 one-shot function `mbedtls_md()`.

 There are two variants of this strategy: one where using the new setup
 function also allows for key isolation (the key is only held by PSA,
 supporting both G1 and G2 in that area), and one without isolation (the key is
 still stored outside of PSA most of the time, supporting only G1).

 This strategy, with support for key isolation, is currently (early 2022) used for
 private-key operations in the PK layer - see `mbedtls_pk_setup_opaque()`. This
 allows use of PSA-held private ECDSA keys in TLS and X.509 with no change to
 the TLS/X.509 code, but a contained change in the application.

 This strategy, without key isolation, was also previously used (up to and
 including 3.1) in the Cipher layer - see `mbedtls_cipher_setup_psa()`. This allowed
 use of PSA for cipher operations in TLS with no change to the application
 code, and a contained change in TLS code. (It only supported a subset of
 ciphers.)

 Note: for private key operations in the PK layer, both the "silent" and the
 "opt-in" strategy can apply, and can complement each other, as one provides
 support for key isolation, but at the (unavoidable) cost of changes in
 application code, while the other requires no application change to get
 support for drivers, but fails to provide isolation support.

 Summary
 -------

 Strategies currently (early 2022) used with each abstraction layer:

 - PK (for G1): silently call PSA
 - PK (for G2): opt-in use of PSA (new key type)
 - Cipher (G1): replace calls at each call site
 - MD (G1, X.509 and TLS): replace calls at each call site (depending on
   `USE_PSA_CRYPTO`)
 - MD (G5): silently call PSA when a driver is available, see
   `md-cipher-dispatch.md`.


 Supporting builds with drivers without the software implementation
 ==================================================================

 This section presents a plan towards G5: save code size by compiling out our
 software implementation when a driver is available.

 Let's expand a bit on the definition of the goal: in such a configuration
 (driver used, software implementation and abstraction layer compiled out),
 we want:

 a. the library to build in a reasonably-complete configuration,
 b. with all tests passing,
 c. and no more tests skipped than the same configuration with software
    implementation.

 Criterion (c) ensures not only test coverage, but that driver-based builds are
 at feature parity with software-based builds.

 We can roughly divide the work needed to get there in the following steps:

 0. Have a working driver interface for the algorithms we want to replace.
 1. Have users of these algorithms call to PSA or an abstraction layer that can
    dispatch to PSA, but not the low-level legacy API, for all operations.
    (This is G1, and for PK, X.509 and TLS this is controlled by
    `MBEDTLS_USE_PSA_CRYPTO`.) This needs to be done in the library and tests.
 2. Have users of these algorithms not depend on the legacy API for information
    management (getting a size for a given algorithm, etc.)
 3. Adapt compile-time guards used to query availability of a given algorithm;
    this needs to be done in the library (for crypto operations and data) and
    tests.

 Note: the first two steps enable the use of drivers, but do not by themselves
 allow removal of the software implementation.

 Note: the fact that step 1 is not achieved for all of libmbedcrypto (see
 below) is the reason why criterion (a) has "a reasonably-complete
 configuration", to allow working around internal crypto dependencies when
 working on other parts such as X.509 and TLS - for example, a configuration
 without RSA PKCS#1 v2.1 still allows reasonable use of X.509 and TLS.

 Note: this is a conceptual division that will sometimes translate to how the
 work is divided into PRs, sometimes not. For example, in situations where it's
 not possible to achieve good test coverage at the end of step 1 or step 2, it
 is preferable to group with the next step(s) in the same PR until good test
 coverage can be reached.

 **Status as of end of March 2023 (shortly after 3.4):**

 - Step 0 is achieved for most algorithms, with only a few gaps remaining.
 - Step 1 is achieved for most of PK, X.509, and TLS when
   `MBEDTLS_USE_PSA_CRYPTO` is enabled with only a few gaps remaining (see
   docs/use-psa-crypto.md).
 - Step 1 is achieved for the crypto library regarding hashes: everything uses
   MD (not low-level hash APIs), which then dispatches to PSA if applicable.
 - Step 1 is not achieved for all of the crypto library when it comes to
   ciphers. For example, `ctr_drbg.c` calls the legacy API `mbedtls_aes`.
 - Step 2 is achieved for most of X.509 and TLS (same gaps as step 1) when
   `MBEDTLS_USE_PSA_CRYPTO` is enabled.
 - Step 3 is done for hashes and top-level ECC modules (ECDSA, ECDH, ECJPAKE).

 **Strategy for step 1:**

 Regarding PK, X.509, and TLS, this is mostly achieved with only a few gaps.
 (The strategy was outlined in the previous section.)

 Regarding libmbedcrypto:
 - for hashes and ciphers, see `md-cipher-dispatch.md` in the same directory;
 - for ECC, we have no internal uses of the top-level algorithms (ECDSA, ECDH,
   ECJPAKE); however, they all depend on `ECP_C`, which in turn depends on
   `BIGNUM_C`. So, direct calls from TLS, X.509 and PK to ECP and Bignum will
   need to be replaced; see <https://github.com/Mbed-TLS/mbedtls/issues/6839> and
   linked issues for a summary of intermediate steps and open points.

 **Strategy for step 2:**

 The most satisfying situation here is when we can just use the PSA Crypto API
 for information management as well. However sometimes it may not be
 convenient, for example in parts of the code that accept old-style identifiers
 (such as `mbedtls_md_type_t`) in their API and can't assume PSA to be
 compiled in (such as `rsa.c`).

 When using an existing abstraction layer such as MD, it can provide
 information management functions. In other cases, information that was in a
 low-level module but logically belongs in a higher-level module can be moved
 to that module (for example, TLS identifiers of curves and their conversion
 to/from PSA or legacy identifiers belong in TLS, not `ecp.c`).
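To illustrate moving curve identifiers into TLS, here is a minimal sketch of a lookup table that a TLS module could own. It maps IANA TLS Supported Groups code points (23, 24, 25, 29 and 30 for secp256r1, secp384r1, secp521r1, x25519 and x448) to a (family, bit size) pair of the kind PSA uses. The family enum, table and function are hypothetical stand-ins, not the actual Mbed TLS definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PSA ECC family identifiers. */
typedef enum { EX_FAMILY_SECP_R1, EX_FAMILY_MONTGOMERY } ex_ecc_family;

/* Owned by the TLS module: maps IANA "Supported Groups" code points to the
 * (family, bit size) pair a PSA-style API expects. ecp.c needs no TLS data. */
static const struct {
    uint16_t tls_id;
    ex_ecc_family family;
    size_t bits;
} ex_tls_group_table[] = {
    { 23, EX_FAMILY_SECP_R1,    256 }, /* secp256r1 */
    { 24, EX_FAMILY_SECP_R1,    384 }, /* secp384r1 */
    { 25, EX_FAMILY_SECP_R1,    521 }, /* secp521r1 */
    { 29, EX_FAMILY_MONTGOMERY, 255 }, /* x25519 */
    { 30, EX_FAMILY_MONTGOMERY, 448 }, /* x448 */
};

/* Returns 0 and fills family/bits on success, -1 if the group is unknown. */
static int ex_tls_group_to_psa(uint16_t tls_id, ex_ecc_family *family, size_t *bits)
{
    size_t n = sizeof(ex_tls_group_table) / sizeof(ex_tls_group_table[0]);
    for (size_t i = 0; i < n; i++) {
        if (ex_tls_group_table[i].tls_id == tls_id) {
            *family = ex_tls_group_table[i].family;
            *bits = ex_tls_group_table[i].bits;
            return 0;
        }
    }
    return -1;
}
```

A reverse lookup, or a mapping to legacy `mbedtls_ecp_group_id` values, would be additional columns in the same table.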

 **Strategy for step 3:**

 There are currently two (complementary) ways for crypto-using code to check if a
 particular algorithm is supported: using `MBEDTLS_xxx` macros, and using
 `PSA_WANT_xxx` macros. For example, PSA-based code that wants to use SHA-256
 will check for `PSA_WANT_ALG_SHA_256`, while legacy-based code that wants to
 use SHA-256 will check for `MBEDTLS_SHA256_C` if using the `mbedtls_sha256`
 API, or for `MBEDTLS_MD_C && MBEDTLS_SHA256_C` if using the `mbedtls_md` API.

 Code that obeys `MBEDTLS_USE_PSA_CRYPTO` will want to use one of the two
 dependencies above depending on whether `MBEDTLS_USE_PSA_CRYPTO` is defined:
 if it is, the code wants the algorithm available in PSA; otherwise, it wants it
 available via the legacy API(s) it is using (MD and/or low-level).
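Here is a sketch of the rule in the previous paragraph, for SHA-256: the dependency a module checks depends on which API the module actually calls. The configuration is hard-coded to a driver-only build. `EX_MODULE_CAN_SHA256` is a hypothetical name, not the real `MBEDTLS_MD_CAN_SHA256` definition.

```c
#include <assert.h>

/* Stand-in configuration (normally from mbedtls_config.h / crypto_config.h):
 * SHA-256 available through PSA only, MBEDTLS_SHA256_C left undefined. */
#define MBEDTLS_USE_PSA_CRYPTO
#define PSA_WANT_ALG_SHA_256

/* Hypothetical per-module macro: "this module can compute SHA-256 through
 * whichever API it actually calls in this configuration". */
#if defined(MBEDTLS_USE_PSA_CRYPTO)
#  if defined(PSA_WANT_ALG_SHA_256)
#    define EX_MODULE_CAN_SHA256
#  endif
#else
#  if defined(MBEDTLS_MD_C) && defined(MBEDTLS_SHA256_C)
#    define EX_MODULE_CAN_SHA256
#  endif
#endif

static int ex_module_supports_sha256(void)
{
#if defined(EX_MODULE_CAN_SHA256)
    return 1;
#else
    return 0;
#endif
}
```

Removing `#define MBEDTLS_USE_PSA_CRYPTO` from this configuration makes `ex_module_supports_sha256()` return 0, since the legacy path also needs `MBEDTLS_MD_C` and `MBEDTLS_SHA256_C`.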

 As much as possible, we're trying to create for each algorithm a single new
 macro that can be used to express dependencies everywhere (except pure PSA
 code that should always use `PSA_WANT`). For example, for hashes this is the
 `MBEDTLS_MD_CAN_xxx` family. For ECC algorithms, we have similar
 `MBEDTLS_PK_CAN_xxx` macros.

 Note that in order to achieve that goal, even for code that obeys
 `USE_PSA_CRYPTO`, it is useful to impose that all algorithms that are
 available via the legacy APIs are also available via PSA.

 Executing step 3 will mostly consist of using the right dependency macros in
 the right places (once the previous steps are done).

 **Note on testing**

 Since supporting driver-only builds is not about adding features, but about
 supporting existing features in new types of builds, testing will not involve
 adding cases to the test suites, but instead adding new components in `all.sh`
 that build and run tests in newly-supported configurations. For example, if
 we're making some part of the library work with hashes provided only by
 drivers when `MBEDTLS_USE_PSA_CRYPTO` is defined, there should be a place in
 `all.sh` that builds and runs tests in such a configuration.

 There is, however, a risk, especially in step 3 where we change how dependencies
 are expressed (sometimes in bulk), of getting things wrong in a way that would
 result in more tests being skipped, which is easy to miss. Care must be
 taken to ensure this does not happen. The following criteria can be used:

 1. The sets of tests skipped in the default config and the full config must be
    the same before and after the PR that implements step 3. This is tested
    manually for each PR that changes dependency declarations by using the script
    `outcome-analysis.sh` in the present directory.
 2. The set of tests skipped in the driver-only build is the same as in an
    equivalent software-based configuration. This is tested automatically by the
    CI in the "Results analysis" stage, by running
    `tests/scripts/analyze_outcomes.py`. See the
    `analyze_driver_vs_reference_xxx` actions in the script and the comments above
    their declaration for how to do that locally.
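The real comparison is done by `tests/scripts/analyze_outcomes.py`. The C sketch below only illustrates the set logic behind criterion 2: it counts test cases skipped in a driver-only build but not in the reference build. It assumes both lists are sorted and duplicate-free, and the function name and test-case names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts entries of `driver` that do not appear in `reference`, i.e. test
 * cases the driver-only build skips but the reference build runs.
 * Both arrays must be sorted in strcmp() order and contain no duplicates. */
static size_t ex_count_newly_skipped(const char *const *driver, size_t n_driver,
                                     const char *const *reference, size_t n_ref)
{
    size_t i = 0, j = 0, newly_skipped = 0;
    while (i < n_driver) {
        /* Once `reference` is exhausted, every remaining entry is new. */
        int cmp = (j < n_ref) ? strcmp(driver[i], reference[j]) : -1;
        if (cmp == 0) {          /* skipped in both builds */
            i++;
            j++;
        } else if (cmp < 0) {    /* only skipped in the driver-only build */
            newly_skipped++;
            i++;
        } else {                 /* only skipped in the reference build */
            j++;
        }
    }
    return newly_skipped;
}
```

A non-zero count is the regression that criterion 2 guards against. Swapping the arguments counts the opposite case.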


 Migrating away from the legacy API
 ==================================

 This section briefly introduces questions and possible plans towards G4,
 mainly as they relate to choices in previous stages.

 The role of the PK/Cipher/MD APIs in user migration
 ---------------------------------------------------

 We're currently taking advantage of the existing PK layer in order
 to reduce the number of places where library code needs to be changed. It's
 only natural to consider using the same strategy (with the PK, MD and Cipher
 layers) for facilitating migration of application code.

 Note: a necessary first step for that would be to make sure PSA is no longer
 implemented on top of the concerned layers.

 ### Zero-cost compatibility layer?

 The most favourable case is if we can have a zero-cost abstraction (no
 runtime, RAM usage or code size penalty), for example just a bunch of
 `#define`s, essentially mapping `mbedtls_` APIs to their `psa_` equivalent.

 Unfortunately that's unlikely to fully work. For example, the MD layer uses the
 same context type for hashes and HMACs, while the PSA API (rightfully) has
 distinct operation types. Similarly, the Cipher layer uses the same context
 type for unauthenticated and AEAD ciphers, which again the PSA API
 distinguishes.

 It is unclear how much value, if any, a zero-cost compatibility layer that's
 incomplete (for example, for MD covering only hashes, or for Cipher covering
 only AEAD) or differs significantly from the existing API (for example,
 introducing new context types) would provide to users.

 ### Low-cost compatibility layers?

 Another possibility is to keep most or all of the existing API for the PK, MD
 and Cipher layers, implemented on top of PSA, aiming for the lowest possible
 cost. For example, `mbedtls_md_context_t` would be defined as a (tagged) union
 of `psa_hash_operation_t` and `psa_mac_operation_t`, then `mbedtls_md_setup()`
 would initialize the correct part, and the rest of the functions be simple
 wrappers around PSA functions. This would vastly reduce the complexity of the
 layers compared to the existing ones (no need to dispatch through function
 pointers, just call the corresponding PSA API).
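Here is a sketch of the tagged-union idea, with stand-in operation types so that it runs without a crypto library. The `hmac` flag mirrors the parameter of the same name in `mbedtls_md_setup()`, but every name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for psa_hash_operation_t and psa_mac_operation_t;
 * they only count bytes so the sketch can run without a crypto library. */
typedef struct { size_t bytes; } ex_hash_operation;
typedef struct { size_t bytes; } ex_mac_operation;

typedef enum { EX_MD_NONE = 0, EX_MD_HASH, EX_MD_HMAC } ex_md_kind;

/* One context type for hashes and HMAC, as in the existing MD API, but
 * implemented as a tagged union rather than via function pointers. */
typedef struct {
    ex_md_kind kind;
    union {
        ex_hash_operation hash;
        ex_mac_operation mac;
    } u;
} ex_md_context;

/* Setup initializes only the member matching the requested use. */
static void ex_md_setup(ex_md_context *ctx, int hmac)
{
    assert(ctx != NULL);
    if (hmac) {
        ctx->kind = EX_MD_HMAC;
        ctx->u.mac.bytes = 0;
    } else {
        ctx->kind = EX_MD_HASH;
        ctx->u.hash.bytes = 0;
    }
}

/* Update is a plain switch on the tag; a real version would call
 * psa_hash_update() or psa_mac_update() on the selected member. */
static int ex_md_update(ex_md_context *ctx, const unsigned char *in, size_t len)
{
    (void) in;
    switch (ctx->kind) {
        case EX_MD_HASH:
            ctx->u.hash.bytes += len;
            return 0;
        case EX_MD_HMAC:
            ctx->u.mac.bytes += len;
            return 0;
        default:
            return -1; /* context was never set up */
    }
}
```

The context is as large as its largest member plus the tag. This is part of the non-zero cost discussed next.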

 Since this would still represent a non-zero cost, not only in terms of code
 size, but also in terms of maintenance (testing, etc.), this would probably
 be a temporary solution: for example keep the compatibility layers in 4.0 (and
 make them optional), but remove them in 5.0.

 Again, this provides the most value to users if we can manage to keep the
 existing API unchanged. There might be conflicts between this goal and that of
 reducing the cost, and judgment calls may need to be made.

 Note: when it comes to holding public keys in the PK layer, depending on how
 the rest of the code is structured, it may be worth holding the key data in
 memory controlled by the PK layer as opposed to a PSA key slot, moving it to a
 slot only when needed (see current `ecdsa_verify_wrap` when
 `MBEDTLS_USE_PSA_CRYPTO` is defined). For example, when parsing a large
 number, N, of X.509 certificates (for example the list of trusted roots), it
 might be undesirable to use N PSA key slots for their public keys as long as
 the certs are loaded. OTOH, this could also be addressed by merging the "X.509
 parsing on-demand" (#2478), and then the public key data would be held as
 bytes in the X.509 CRT structure, and only moved to a PK context / PSA slot
 when it's actually used.
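Here is a sketch of holding public-key bytes in memory owned by the PK layer and importing them into a key slot only on first use. The import function and key-ID type are hypothetical stand-ins that simply count simulated slots. The sketch keeps the slot once imported and does not model releasing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ex_key_id; /* hypothetically, 0 means "no slot yet" */

static unsigned ex_slots_in_use = 0; /* simulated key-slot usage */

/* Hypothetical import into the crypto subsystem's key store. */
static ex_key_id ex_import_public_key(const unsigned char *der, size_t len)
{
    (void) der;
    (void) len;
    ex_slots_in_use++;
    return (ex_key_id) ex_slots_in_use;
}

/* Public key kept as bytes owned by the PK layer (or the parsed
 * certificate), plus a slot that is only filled on first use. */
typedef struct {
    const unsigned char *der;
    size_t der_len;
    ex_key_id id;
} ex_lazy_pubkey;

static void ex_lazy_pubkey_init(ex_lazy_pubkey *pk, const unsigned char *der, size_t len)
{
    assert(der != NULL);
    pk->der = der;
    pk->der_len = len;
    pk->id = 0;
}

/* Import on first use; later calls reuse the same slot. */
static ex_key_id ex_lazy_pubkey_id(ex_lazy_pubkey *pk)
{
    if (pk->id == 0) {
        pk->id = ex_import_public_key(pk->der, pk->der_len);
    }
    return pk->id;
}
```

With N trusted roots loaded, only the roots actually used for verification occupy a slot in this sketch.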

 Note: the PK layer actually consists of two relatively distinct parts: crypto
 operations, which will be covered by PSA, and parsing/writing (exporting)
 from/to various formats, which is currently not fully covered by the PSA
 Crypto API.

 ### Algorithm identifiers and other identifiers

 It should be easy to provide the user with a bunch of `#define`s for algorithm
 identifiers, for example `#define MBEDTLS_MD_SHA256 PSA_ALG_SHA_256`; most of
 those would be in the MD, Cipher and PK compatibility layers mentioned above,
 but there might be some in other modules that may be worth considering, for
 example identifiers for elliptic curves.

 ### Lower layers

 Generally speaking, we would retire all of the low-level, non-generic modules,
 such as AES, SHA-256, RSA, DHM, ECDH, ECP, bignum, etc, without providing
 compatibility APIs for them. People would be encouraged to switch to the PSA
 API. (The compatibility implementation of the existing PK, MD, Cipher APIs
 would mostly benefit people who already used those generic APIs rather than
 the low-level, alg-specific ones.)

 ### APIs in TLS and X.509

 Public APIs in TLS and X.509 may be affected by the migration in at least two
 ways:

 1. APIs that rely on a legacy `mbedtls_` crypto type: for example
    `mbedtls_ssl_conf_own_cert()` to configure a (certificate and the
    associated) private key. Currently the private key is passed as a
    `mbedtls_pk_context` object, which would probably change to a `psa_key_id_t`.
    Since some users would probably still be using the compatibility PK layer, it
    would need a way to easily extract the PSA key ID from the PK context.

 2. APIs that accept lists of identifiers: for example
    `mbedtls_ssl_conf_curves()` taking a list of `mbedtls_ecp_group_id`s. This
    could be changed to accept a list of pairs (`psa_ecc_family_t`, size) but we
    should probably take this opportunity to move to an identifier independent from
    the underlying crypto implementation and use TLS-specific identifiers instead
    (based on IANA values or custom enums), as is currently done in the new
    `mbedtls_ssl_conf_groups()` API (see #4859).

 Testing
 -------

 A question that needs careful consideration when we come around to removing
 the low-level crypto APIs and making PK, MD and Cipher optional compatibility
 layers is how to preserve testing quality. A lot of the existing test
 cases use the low level crypto APIs; we would need to either keep using that
 API for tests, or manually migrate tests to the PSA Crypto API. Perhaps a
 combination of both, perhaps evolving gradually over time.