| ============================================ |
| Implementation plans for ``-fbounds-safety`` |
| ============================================ |
| |
| .. contents:: |
| :local: |
| |
| Gradual updates with experimental flag |
| ====================================== |
| |
| The feature will be implemented as a series of smaller PRs and we will guard our |
| implementation with an experimental flag ``-fexperimental-bounds-safety`` until |
| the usable model is fully available. Once the model is ready for use, we will |
| expose the flag ``-fbounds-safety``. |
| |
| Possible patch sets |
| ------------------- |
| |
| * External bounds annotations and the (late) parsing logic. |
| * Internal bounds annotations (wide pointers) and their parsing logic. |
| * Clang code generation for wide pointers with debug information. |
| * Pointer cast semantics involving bounds annotations (this could be divided |
| into multiple sub-PRs). |
| * CFG analysis for pairs of related pointer and count assignments and the like. |
| * Bounds check expressions in AST and the Clang code generation (this could also |
| be divided into multiple sub-PRs). |
| |
| Proposed implementation |
| ======================= |
| |
| External bounds annotations |
| --------------------------- |
| |
| The bounds annotations are C type attributes appertaining to pointer types. If |
| an attribute is added to the position of a declaration attribute, e.g., ``int |
| *ptr __counted_by(size)``, the attribute appertains to the outermost pointer |
| type of the declaration (``int *``). |
| |
| New sugar types |
| --------------- |
| |
| An external bounds annotation creates type sugar over the underlying pointer |
| type. We will introduce a new sugar type, ``DynamicBoundsPointerType``, to |
| represent ``__counted_by`` or ``__sized_by``. Using ``AttributedType`` alone |
| would not be sufficient because the type needs to hold the count or size |
| expression as well as some metadata necessary for analysis, although the new |
| type may be implemented through inheritance from ``AttributedType``. Treating |
| the annotations as type sugar means two types with incompatible external bounds |
| annotations may be considered canonically the same type. This is sometimes |
| necessary, for example, to keep ``__counted_by`` and friends from participating |
| in function overloading. However, this design requires separate logic that |
| walks through the entire type hierarchy to check type compatibility of bounds |
| annotations. |
| |
| Late parsing for C |
| ------------------ |
| |
| A bounds annotation such as ``__counted_by(count)`` can be added to the type of |
| a struct field declaration where ``count`` is another field of the same struct |
| declared later. Similarly, the annotation may apply to the type of a function |
| parameter declaration that precedes the parameter ``count`` in the same |
| function. This means parsing the argument of bounds annotations must be done |
| after the parser has the whole context of a struct or a function declaration. |
| Clang has late parsing logic for C++ declaration attributes that require late |
| parsing, while the C declaration attributes and C/C++ type attributes do not |
| have the same logic. This requires introducing late parsing logic for C/C++ |
| type attributes. |
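| |
| The two forward references described above can be sketched as follows. This is |
| only an illustration: the ``struct packet`` and ``count_zero_bytes`` names are |
| hypothetical, and the empty fallback macro is an assumption that lets the |
| sketch compile on a toolchain without ``-fbounds-safety`` support. |
| |
```c
#include <stddef.h>

/* Assumption: fall back to an empty macro when the toolchain does not
   provide the annotation, so the sketch still compiles as plain C. */
#ifndef __counted_by
#define __counted_by(N)
#endif

/* The annotation names `count` before that field has been declared. */
struct packet {
    unsigned char *__counted_by(count) payload;
    size_t count;
};

/* The annotation names `len` before that parameter has been declared. */
size_t count_zero_bytes(const unsigned char *__counted_by(len) data,
                        size_t len) {
    size_t zeros = 0;
    for (size_t i = 0; i < len; ++i) {
        if (data[i] == 0)
            ++zeros;
    }
    return zeros;
}
```
| |
| In both declarations the annotation argument cannot be resolved at the point |
| where it is written, which is why its parsing has to be deferred. |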
| |
| Internal bounds annotations |
| --------------------------- |
| |
| ``__indexable`` and ``__bidi_indexable`` alter pointer representations to be |
| equivalent to a struct with the pointer and the corresponding bounds fields. |
| Despite this difference in their representations, they are still pointers in |
| terms of the operations that are allowed and their semantics. For instance, a |
| pointer dereference on a ``__bidi_indexable`` pointer returns the dereferenced |
| value just as a plain C pointer does, apart from the extra bounds checks |
| performed before dereferencing the wide pointer. This means mapping the wide |
| pointers to struct types with equivalent layout won’t be sufficient. To |
| represent the wide pointers in Clang AST, we add an extra field in the |
| ``PointerType`` class to indicate the internal bounds of the pointer. This ensures |
| pointers of different representations are mapped to different canonical types |
| while they are still treated as pointers. |
| |
| In LLVM IR, wide pointers will be emitted as structs of equivalent |
| representations. Clang CodeGen will handle them as Aggregate in |
| ``TypeEvaluationKind (TEK)``. ``AggExprEmitter`` was extended to handle pointer |
| operations returning wide pointers. Alternatively, a new ``TEK`` and an |
| expression emitter dedicated to wide pointers could be introduced. |
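| |
| As a rough mental model only, a wide pointer can be pictured as the plain C |
| struct below. The type name, the field names and their order, and the helper |
| functions are all assumptions made for this sketch; it does not describe the |
| layout or code that Clang actually emits. |
| |
```c
#include <stddef.h>

/* Hypothetical plain-C stand-in for an `int *__bidi_indexable`. The field
   names and their order are assumptions made for this sketch. */
typedef struct {
    int *ptr;   /* current pointer value */
    int *upper; /* one past the last accessible element */
    int *lower; /* first accessible element */
} wide_int_ptr;

/* Pointer arithmetic moves `ptr` only; the bounds travel along unchanged. */
wide_int_ptr wide_add(wide_int_ptr p, ptrdiff_t n) {
    p.ptr += n;
    return p;
}

/* The condition a dereference would verify before touching memory. */
int wide_in_bounds(wide_int_ptr p) {
    return p.ptr >= p.lower && p.ptr < p.upper;
}

/* Dereference with the check made explicit. Returns 0 and leaves *out
   untouched when the access would be out of bounds; the real model would
   trap instead of returning. */
int wide_load(wide_int_ptr p, int *out) {
    if (!wide_in_bounds(p))
        return 0;
    *out = *p.ptr;
    return 1;
}
```
| |
| The sketch shows why a struct mapping alone is not enough: the value still has |
| to support pointer operations such as arithmetic and dereference, with the |
| bounds carried along. |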
| |
| Default bounds annotations |
| -------------------------- |
| |
| The model may implicitly add ``__bidi_indexable`` or ``__single`` depending on |
| the context of the declaration that has the pointer type. ``__bidi_indexable`` |
| is implicitly added to local variables, while ``__single`` is implicitly added |
| to pointer types specifying struct fields, function parameters, or global |
| variables. This means the parser may first create the pointer type without any |
| default pointer attribute and then recreate the type once the parser has the |
| declaration context and has determined the default attribute accordingly. |
| |
| This also requires the parser to reset the type of the declaration with the |
| newly created type with the right default attribute. |
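| |
| The defaults described above can be illustrated with unannotated declarations. |
| The declarations below are hypothetical, and the comments only restate which |
| default the text says would apply in each context; the code itself is plain C. |
| |
```c
#include <stddef.h>

struct node {
    int *value;      /* struct field: treated as `int *__single` */
};

int *global_counter; /* global variable: treated as `int *__single` */

/* Function parameter: treated as `int *__single`. */
int first_or_zero(int *param) {
    /* Local variable: treated as `int *__bidi_indexable`. */
    int *local = param;
    return local ? *local : 0;
}
```
| |
| None of these pointer types spell out an attribute, so the default can only be |
| chosen once the kind of declaration is known. |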
| |
| Promotion expression |
| -------------------- |
| |
| A new expression will be introduced to represent the conversion from a pointer |
| with an external bounds annotation, such as ``__counted_by``, to |
| ``__bidi_indexable``. This type of conversion cannot be handled by normal |
| ``CastExpr`` nodes because it requires extra subexpression(s) to provide the |
| bounds information necessary to create a wide pointer. |
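| |
| Conceptually, the promotion combines the pointer with its count to build the |
| wide pointer. A minimal sketch in plain C follows, assuming a hypothetical |
| ``wide_int_ptr`` struct and helper names that stand in for ``int |
| *__bidi_indexable``; none of these names come from the implementation. |
| |
```c
#include <stddef.h>

/* Hypothetical stand-in for an `int *__bidi_indexable`. */
typedef struct {
    int *ptr;   /* current pointer value */
    int *upper; /* one past the last accessible element */
    int *lower; /* first accessible element */
} wide_int_ptr;

/* Models promoting `int *__counted_by(count) buf` to a wide pointer. The
   count is the extra operand that a plain cast would have no place for. */
wide_int_ptr promote_counted_by(int *buf, size_t count) {
    wide_int_ptr w;
    w.ptr = buf;
    w.lower = buf;
    w.upper = buf ? buf + count : buf; /* avoid arithmetic on a null pointer */
    return w;
}

/* Number of elements reachable from the current pointer up to the bound. */
size_t wide_remaining(wide_int_ptr w) {
    return (size_t)(w.upper - w.ptr);
}
```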
| |
| Bounds check expression |
| ----------------------- |
| |
| Bounds checks are part of semantics defined in the ``-fbounds-safety`` language |
| model. Hence, exposing the bounds checks and other semantic actions in the AST |
| is desirable. A new expression for bounds checks has been added to the AST. The |
| bounds check expression has a ``BoundsCheckKind`` to indicate the kind of checks |
| and has the additional sub-expressions that are necessary to perform the check |
| according to the kind. |
| |
| Paired assignment check |
| ----------------------- |
| |
| ``-fbounds-safety`` enforces that variables or fields related by the same |
| external bounds annotation (e.g., ``buf`` and ``count`` related by |
| ``__counted_by`` in the example below) must be updated side by side within the |
| same basic block and without side effects in between. |
| |
| .. code-block:: c |
| |
|    typedef struct { |
|      int *__counted_by(count) buf; |
|      size_t count; |
|    } sized_buf_t; |
| |
|    void alloc_buf(sized_buf_t *sbuf, size_t nelems) { |
|      sbuf->buf = (int *)malloc(sizeof(int) * nelems); |
|      sbuf->count = nelems; |
|    } |
| |
| To implement this rule, the compiler requires a linear representation of |
| statements to understand the ordering and the adjacency between the two or more |
| assignments. The Clang CFG is used to implement this analysis because it |
| provides a linear view of statements within each ``CFGBlock`` (a Clang |
| ``CFGBlock`` represents a single basic block in a source-level CFG). |
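| |
| The rule can be illustrated with one shape it is meant to accept and, in a |
| comment, one it is meant to reject. This is a sketch under assumptions: |
| ``set_buf`` and ``log_resize`` are hypothetical names, and the empty fallback |
| macro only exists so that the code compiles without ``-fbounds-safety``. |
| |
```c
#include <stddef.h>
#include <stdlib.h>

/* Assumption: fall back to an empty macro when the toolchain does not
   provide the annotation, so the sketch still compiles as plain C. */
#ifndef __counted_by
#define __counted_by(N)
#endif

typedef struct {
    int *__counted_by(count) buf;
    size_t count;
} sized_buf_t;

/* Shape the rule accepts: every side effect (the allocation) happens first,
   then the two related assignments sit next to each other in one block. */
void set_buf(sized_buf_t *sbuf, size_t nelems) {
    int *fresh = (int *)calloc(nelems, sizeof(int));
    size_t usable = fresh ? nelems : 0; /* keep buf and count consistent */
    sbuf->buf = fresh;
    sbuf->count = usable;
}

/* Shape the rule is meant to reject, shown only as a comment
 * (log_resize is a hypothetical function with side effects):
 *
 *   sbuf->buf = fresh;
 *   log_resize(nelems);   // side effect between the paired assignments
 *   sbuf->count = usable;
 */
```
| |
| Checking this requires knowing which statements are adjacent inside a basic |
| block, which is the information a ``CFGBlock`` provides. |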
| |
| Bounds check optimizations |
| -------------------------- |
| |
| In ``-fbounds-safety``, the Clang frontend emits run-time checks for every |
| memory dereference if the type system or analyses in the frontend couldn’t |
| verify its bounds safety. The implementation relies on LLVM optimizations to |
| remove redundant run-time checks. With this optimization strategy, the more |
| bounds checks the original source code already has, the fewer additional checks |
| ``-fbounds-safety`` will introduce. The LLVM ``ConstraintElimination`` pass is |
| designed to remove provably redundant checks (please check Florian Hahn’s |
| presentation at the 2021 LLVM Dev Meeting and the implementation to learn more). |
| In the following example, ``-fbounds-safety`` implicitly adds the redundant |
| bounds checks that the optimizer can remove: |
| |
| .. code-block:: c |
| |
|    void fill_array_with_indices(int *__counted_by(count) p, size_t count) { |
|      for (size_t i = 0; i < count; ++i) { |
|        // implicit bounds checks: |
|        // if (p + i < p || p + i + 1 > p + count) trap(); |
|        p[i] = i; |
|      } |
|    } |
| |
| ``ConstraintElimination`` collects the following facts and determines if the |
| bounds checks can be safely removed: |
| |
| * Inside the for-loop, ``0 <= i < count``, hence ``1 <= i + 1 <= count``. |
| * Pointer arithmetic ``p + count`` in the if-condition doesn’t wrap. |
| * ``-fbounds-safety`` treats pointer arithmetic overflow as deterministic |
| two’s complement computation, not undefined behavior. Therefore, |
| ``getelementptr`` does not typically have the ``inbounds`` keyword. However, |
| the compiler does emit ``inbounds`` for ``p + count`` in this case because |
| ``__counted_by(count)`` has the invariant that ``p`` has at least as many |
| elements as ``count``. Using this information, ``ConstraintElimination`` is |
| able to determine ``p + count`` doesn’t wrap. |
| * Accordingly, ``p + i`` and ``p + i + 1`` also don’t wrap. |
| * Therefore, ``p <= p + i`` and ``p + i + 1 <= p + count``. |
| * The if-condition simplifies to false and becomes dead code that the subsequent |
| optimization passes can remove. |
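| |
| The reasoning in the list above can be checked numerically with a small model. |
| This sketch is not how ``ConstraintElimination`` works internally; it only |
| evaluates the implicit check with wrapping integer arithmetic on addresses. |
| The function names are hypothetical, and addresses are modelled as |
| ``uintptr_t`` byte offsets with ``int`` elements. |
| |
```c
#include <stddef.h>
#include <stdint.h>

/* Models `if (p + i < p || p + i + 1 > p + count) trap();` for int elements,
   using unsigned arithmetic that wraps like two's complement addresses. */
int check_would_trap(uintptr_t p, size_t i, size_t count) {
    uintptr_t elem = p + (uintptr_t)i * sizeof(int);
    uintptr_t next = elem + sizeof(int);
    uintptr_t end = p + (uintptr_t)count * sizeof(int);
    return elem < p || next > end;
}

/* Runs the loop from the example and counts iterations whose check fires.
   When `p + count` does not wrap, the expected result is zero. */
size_t count_traps(uintptr_t p, size_t count) {
    size_t traps = 0;
    for (size_t i = 0; i < count; ++i)
        traps += (size_t)check_would_trap(p, i, count);
    return traps;
}
```
| |
| For every ``i`` in ``0 <= i < count`` the modelled condition is false, which |
| matches the conclusion that the check is dead code inside the loop. An index |
| equal to ``count`` makes it true. |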
| |
| ``OptRemarks`` can be utilized to provide insights into performance tuning. It |
| can report on checks that it cannot eliminate, possibly with reasons, allowing |
| programmers to adjust their code to unlock further optimizations. |
| |
| Debugging |
| ========= |
| |
| Internal bounds annotations |
| --------------------------- |
| |
| Internal bounds annotations change a pointer into a wide pointer. The debugger |
| needs to understand that wide pointers are essentially pointers with a struct |
| layout. To handle this, a wide pointer is described as a record type in the |
| debug info. The type name has a special name prefix (e.g., |
| ``__bounds_safety$bidi_indexable``) which can be recognized by a debug info |
| consumer to provide support that goes beyond showing the internal structure of |
| the wide pointer. There are no DWARF extensions needed to support wide pointers. |
| In our implementation, LLDB recognizes wide pointer types by name and |
| reconstructs them as wide pointer Clang AST types for use in the expression |
| evaluator. |
| |
| External bounds annotations |
| --------------------------- |
| |
| Similar to internal bounds annotations, external bounds annotations are described |
| as a typedef to their underlying pointer type in the debug info, and the bounds |
| are encoded as strings in the typedef’s name (e.g., |
| ``__bounds_safety$counted_by:N``). |
| |
| Recognizing ``-fbounds-safety`` traps |
| ------------------------------------- |
| |
| Clang emits debug info for ``-fbounds-safety`` traps as inlined functions, where |
| the function name encodes the error message. LLDB implements a frame recognizer |
| to surface a human-readable error cause to the end user. A debug info consumer |
| that is unaware of this sees an inlined function whose name encodes an error |
| message (e.g., ``__bounds_safety$Bounds check failed``). |
| |
| Expression Parsing |
| ------------------ |
| |
| In our implementation, LLDB’s expression evaluator does not enable the |
| ``-fbounds-safety`` language option because it’s currently unable to fully |
| reconstruct the pointers with external bounds annotations, and also because the |
| evaluator operates in C++ mode, utilizing C++ reference types, while |
| ``-fbounds-safety`` does not currently support C++. This means LLDB’s expression |
| evaluator can only evaluate a subset of the ``-fbounds-safety`` language model. |
| Specifically, it’s capable of evaluating the wide pointers that already exist in |
| the source code. All other expressions are evaluated according to C/C++ |
| semantics. |
| |
| C++ support |
| =========== |
| |
| C++ has multiple options to write code in a bounds-safe manner, such as |
| following the bounds-safety core guidelines and/or using hardened libc++ along |
| with the `C++ Safe Buffer model |
| <https://discourse.llvm.org/t/rfc-c-buffer-hardening/65734>`_. However, these |
| techniques may require ABI changes and may not be applicable to code |
| interoperating with C. When the ABI of an existing program needs to be preserved |
| and for headers shared between C and C++, ``-fbounds-safety`` offers a potential |
| solution. |
| |
| ``-fbounds-safety`` is not currently supported in C++, but we believe the |
| general approach would be applicable for future efforts. |