This is implemented in comparison.{h,cc}
.
Graph comparison is the basis of all STG difference reporting.
All the various STG node attributes (such as size
or name
) and edge identifiers (whether explicit or not, such as function parameter index) can be reduced to generic labels. This (over)simplification reduces the problem to solve to:
Report generation from such a graph is the subject of Reporting.
STG compares the graphs starting at the root nodes and recursing along edges with matching labels. The recursion stops at incomparable nodes (such as a struct
and int
). Otherwise a comparison specific to the kind of node is performed.
Given that the graph will in general not be a tree and may contain cycles, STG
Overall, this ensures that any given pair of nodes (one from each graph) is compared at most once. In the case of a small change to a typical ABI graph of size N, O(N) node comparisons are performed.
It has proved useful to add selective suppression of certain kinds of differences, for distinct purposes.
ignore | directionality | purpose |
---|---|---|
interface addition | asymmetric | compatibility checking |
type definition addition | asymmetric | compatibility checking |
primitive type encoding | symmetric | cross comparison noise |
: : : reduction : | ||
member size | symmetric | cross comparison noise |
: : : reduction : | ||
enum underlying type | symmetric | cross comparison noise |
: : : reduction : | ||
qualifier | symmetric | cross comparison noise |
: : : reduction : | ||
symbol CRC | symmetric | cross comparison noise |
: : : reduction : | ||
symbol type presence | symmetric | libabigail XML comparison |
: : : noise reduction : | ||
type declaration status | symmetric | libabigail XML comparison |
: : : noise reduction : |
The first two options can be used to test whether one ABI is a subset of another.
It can be useful to cross compare ABIs extracted in different ways to validate the fidelity of one ABI source against another. Where the models or implementations differ systematically, suppressing those differences will make the remainder more obvious.
The libabigail versions used in Android‘s GKI project often generated ABIs with spurious differences due to the disappearance (or reappearance) of type definitions and (occasionally) symbol types. The corresponding ignore options replicate the behaviour of libabigail’s abidiff
.
When comparing a pair of nodes, each difference falls into one of the following categories:
Each node in an STG difference graph is one of the following[^1]:
[^1]: STG models comparisons as pairs of nodes where either node can be absent. The absent-absent comparison is used to represent “no change”. All such edges, except those representing the root of a “no change” graph comparison, are pruned during diff graph creation.
Note that STG‘s difference nodes are unkinded, there is only one kind of difference node, unlike STG’s data nodes where there is a separate kind of node for each kind of C type etc.
While an algorithm based on generic label comparison will work, there are a couple of issues:
struct
, for example)Comparing two pointer types is straightforward, just compare the pointed-to types. However, symbol tables, function arguments and struct members all require comparisons of multiple entities simultaneously.
STG compares edge aggregates as follows:
Comparison is mostly done pair-wise recursively with a DFS, by the function object CompareWorker
and with the help of the SCC finder.
The algorithm divides responsibility between operator()(Id, Id)
and various operator()(Node, Node)
methods. There are also various helpers in and outside CompareWorker
.
The Result
type encapsulates the difference between two nodes being compared. It contains both a list (Diff
) of differences (DiffDetail
) and a boolean equality outcome. The latter is used to propagate inequality information in the presence of cycles in the diff comparison graph.
operator()(Node, Node)
Specialised for each Node
type, these methods have the job of computing local differences, matching edges and obtaining edge differences from recursive calls to operator()(Id, Id)
(or Removed
and Added
, if edge labels are unmatched).
Local differences can easily be rendered as text, but edge differences need recursive calls, the results of which are merged into the local differences Result
with helper methods.
These methods form the bulk of the comparison code, so in general we want them to be as small as possible, containing no boilerplate and simply mirroring the node data. The helper functions were therefore chosen for power, laziness and concision.
Added
and Removed
These take care of comparisons where one side is absent.
There are several reasons for not folding this functionality into operator(Id, Id)
itself:
optional<Id>
argumentsAdded
and Removed
don't need to decorate their return values with any difference informationDefined
This takes care of comparisons of user-defined types which may be forward declarations or full definitions.
Nodes
These take care of comparisons of sequences of arbitrary nodes (or enumerators).
STG uses “matching keys” to reduce the problem of comparing sequences to comparing maps. It attempts to preserve original sequence order as described in order.h
.
operator(Id, Id)
This controls recursion and handles some special cases before delegating to some operator()(Node, Node)
in the “normal” case.
It takes care of the following:
Note that the non-trivial special cases relating to typedefs and qualified types (and their current concrete representations) require non-parallel traversals of the graphs being compared; this is the only place where the comparison is not purely structural.
STG comparison relies on SCC
to control recursion and behaviour in the face of graphs containing arbitrary cycles. Any difference found affecting one node in a comparison cycle affects all nodes in that cycle.
Excluding the two special cases documented in the following sections, the comparison steps are approximately:
struct
and an int
) go to Mismatch
which returns a difference; there is no further recursionThis special handling is subject to change.
typedef
foo bar ⇔ fooTypedefs are named type aliases which cannot refer to themselves or later defined types. The referred-to type is exactly identical to the typedef. So for difference finding, typedefs should just be resolved and skipped over. However, for reporting, it may be still useful to say where a difference came from. This requires extra handling to collect the typedef names on each side of the comparison, when there is something to report.
If operator()(Id, Id)
sees a comparison involving a typedef, it resolves typedef chains on both sides and keeps track of the names. Then it calls itself recursively. If the result is no-diff, it returns no-diff, otherwise, it reports the differences between the types at the end of the typedef chains.
An alternative would be to genuinely just follow the epsilons. Store the typedefs in the diff tree but record the comparison of what they resolve to. The presentation layer can decorate the comparison text with resolution chains.
Note that qualified typedefs present extra complications.
This special handling is subject to change.
const
→ volatile
→ foo ⇔ volatile
→ const
→ fooSTG currently represents type qualifiers as separate, individual nodes. They are relevant for finding differences but there may be no guarantee of the order in which they will appear. For diff reporting, STG currently reports added and removed qualifiers but also compares the underlying types.
This implies that when faced with a comparison involving a qualifier, operator()(Id, Id)
should collect and compare all qualifiers on both sides and treat the types as compound objects consisting of their qualifiers and the underlying types, either or both of which may have differences to report. Comparing the underlying types requires recursive calls.
Note that qualified typedefs present extra complications.
STG does not currently do anything special for qualified typedefs which can have subtle and surprising behaviours. For example:
Before:
const int quux;
After 1:
typedef int foo; const foo quux;
After 2:
typedef const int foo; foo quux;
After 3:
typedef const int foo; const foo quux;
In all cases above, the type of quux
is unchanged. These examples strongly suggest that a better model of C types would involve tracking qualification as a decoration present on every type node, including typedefs.
Note that this behaviour implies C's type system is not purely constructive as there is machinery to discard duplicate qualifiers which would be illegal elsewhere.
For the moment, we can pretend that outer qualifications are always significant, even though they may be absorbed by inner ones, and risk occasional false positives.
A worse case is:
Before:
const int quux[];
After 1:
typedef int foo[]; const foo quux;
After 2:
typedef const int foo[]; foo quux;
After 3:
typedef const int foo[]; const foo quux;
All the quux
are identically typed. There is an additional wart that what would normally be illegal qualifiers on an array type instead decorate its element type.
Finally, worst is:
Before:
const int quux();
After 1:
typedef int foo(); const foo quux;
After 2:
typedef const int foo(); foo quux;
After 3:
typedef const int foo(); const foo quux;
The two const foo quux
cases invoke undefined behaviour. The consistently crazy behaviour would have been to decorate the return type instead.
The “worstest” case is GCC allowing the specification of typedef alignment different to the defining type. This abomination should not exist and should never be used. Alignment is not currently modelled by STG.
These are mainly used by the CompareWorker::operator()(Node, Node)
methods.
MarkIncomparable
- nodes are just differentAddNodeDiff
- add node difference, unconditionallyAddEdgeDiff
- add edge difference (addition or removal), unconditionallyMaybeAddNodeDiff
- add node difference (label change), conditionallyMaybeAddEdgeDiff
- add matching edge recursive difference, conditionallyVariants are possible where text is generated lazily on a recursive diff being found, as are ones where labels are compared and serialised only if different.
In general, there are two problems to solve:
Node and edge description and report structure are the responsibility of the reporting code. See Naming for more detailed notes on node description, mainly C type name syntax.
Several report formats are supported and the simplest is (omitting various complications) a rendering of a difference graph as a difference tree where revisiting nodes is avoided by reporting 2 additional artificial kinds of difference:
The various formats are not documented further here.
Finally, node and edge difference description is currently the responsibility of the comparison code. This may change in the future, but might require a typed difference graph.