Comparison

This is implemented in comparison.{h,cc}.

Graph comparison is the basis of all STG difference reporting.

All the various STG node attributes (such as size or name) and edge identifiers (whether explicit or not, such as function parameter index) can be reduced to generic labels. This (over)simplification reduces the problem to solve to:

  • given two rooted directed graphs, containing labelled nodes and edges
  • generate a graph that encapsulates all the differences found by following matching pairs of edges

Report generation from such a graph is the subject of Reporting.

STG compares the graphs starting at the root nodes and recursing along edges with matching labels. The recursion stops at incomparable nodes (such as a struct and int). Otherwise a comparison specific to the kind of node is performed.

Given that the graph will in general not be a tree and may contain cycles, STG

  • memoises comparisons with known results
  • detects comparison cycles and ensures correct termination and propagation of results

Overall, this ensures that any given pair of nodes (one from each graph) is compared at most once. In the case of a small change to a typical ABI graph of size N, O(N) node comparisons are performed.

Ignoring certain kinds of differences

It has proved useful to add selective suppression of certain kinds of differences, for distinct purposes.

ignoredirectionalitypurpose
interface additionasymmetriccompatibility checking
type definition additionasymmetriccompatibility checking
primitive type encodingsymmetriccross comparison noise
: : : reduction :
member sizesymmetriccross comparison noise
: : : reduction :
enum underlying typesymmetriccross comparison noise
: : : reduction :
qualifiersymmetriccross comparison noise
: : : reduction :
symbol CRCsymmetriccross comparison noise
: : : reduction :
symbol type presencesymmetriclibabigail XML comparison
: : : noise reduction :
type declaration statussymmetriclibabigail XML comparison
: : : noise reduction :

The first two options can be used to test whether one ABI is a subset of another.

It can be useful to cross compare ABIs extracted in different ways to validate the fidelity of one ABI source against another. Where the models or implementations differ systematically, suppressing those differences will make the remainder more obvious.

The libabigail versions used in Android‘s GKI project often generated ABIs with spurious differences due to the disappearance (or reappearance) of type definitions and (occasionally) symbol types. The corresponding ignore options replicate the behaviour of libabigail’s abidiff.

Differences and difference graphs

When comparing a pair of nodes, each difference falls into one of the following categories:

  • node label - a purely local difference
  • matching edge - a recursive difference found by following edges with matching labels
  • added or removed labelled edges, where edges are identified by label - modelled as a recursive difference with a node absent on one side

Each node in an STG difference graph is one of the following[^1]:

  • a node removal or addition, containing
    • a reference to either a node in first graph or one in the second
  • a node change, containing
    • a reference to two nodes, one in each of the two graphs
    • a possibly-empty list of differences which can each be one of
      • a node attribute difference in the form of some informative text
      • an edge difference in the form of a link to a difference node

[^1]: STG models comparisons as pairs of nodes where either node can be absent. The absent-absent comparison is used to represent “no change”. All such edges, except those representing the root of a “no change” graph comparison, are pruned during diff graph creation.

Note that STG‘s difference nodes are unkinded, there is only one kind of difference node, unlike STG’s data nodes where there is a separate kind of node for each kind of C type etc.

Matching and comparing collections of edges

While an algorithm based on generic label comparison will work, there are a couple of issues:

  • collections of outgoing edges may not have an obvious label to assign (multiple anonymous members of a struct, for example)
  • edges are often ordered (parameters and members, for example) and we want to preserve this order when reporting differences

Comparing two pointer types is straightforward, just compare the pointed-to types. However, symbol tables, function arguments and struct members all require comparisons of multiple entities simultaneously.

STG compares edge aggregates as follows:

  • arrays (like function arguments): by index, comparing recursively items with the same index, reporting removals and additions of the remaining items
  • maps (like the symbol table): by key, comparing recursively items with matching keys, reporting removals and additions of unmatched items
  • otherwise synthesise a key for comparison, compare by key, report differences in the original order of items being compared (favouring the second list‘s order over the first’s); the reordering is an O(n²) operation and it might be possible to adapt the more general Myer's diff algorithm to reduce this

Implementation Details

Comparison is mostly done pair-wise recursively with a DFS, by the function object CompareWorker and with the help of the SCC finder.

The algorithm divides responsibility between operator()(Id, Id) and various operator()(Node, Node) methods. There are also various helpers in and outside CompareWorker.

The Result type encapsulates the difference between two nodes being compared. It contains both a list (Diff) of differences (DiffDetail) and a boolean equality outcome. The latter is used to propagate inequality information in the presence of cycles in the diff comparison graph.

operator()(Node, Node)

Specialised for each Node type, these methods have the job of computing local differences, matching edges and obtaining edge differences from recursive calls to operator()(Id, Id) (or Removed and Added, if edge labels are unmatched).

Local differences can easily be rendered as text, but edge differences need recursive calls, the results of which are merged into the local differences Result with helper methods.

These methods form the bulk of the comparison code, so in general we want them to be as small as possible, containing no boilerplate and simply mirroring the node data. The helper functions were therefore chosen for power, laziness and concision.

Added and Removed

These take care of comparisons where one side is absent.

There are several reasons for not folding this functionality into operator(Id, Id) itself:

  • it would result in unnecessary extra work as its callers would need to pack and the function would need to unpack optional<Id> arguments
  • added and removed nodes have none of the other interesting features that it handles
  • Added and Removed don't need to decorate their return values with any difference information

Defined

This takes care of comparisons of user-defined types which may be forward declarations or full definitions.

Nodes

These take care of comparisons of sequences of arbitrary nodes (or enumerators).

STG uses “matching keys” to reduce the problem of comparing sequences to comparing maps. It attempts to preserve original sequence order as described in order.h.

operator(Id, Id)

This controls recursion and handles some special cases before delegating to some operator()(Node, Node) in the “normal” case.

It takes care of the following:

  • revisited, completed comparisons
  • revisited, in-progress comparison
  • qualified types
  • typedefs
  • incomparable and comparable nodes
  • comparison cycles

Note that the non-trivial special cases relating to typedefs and qualified types (and their current concrete representations) require non-parallel traversals of the graphs being compared; this is the only place where the comparison is not purely structural.

Revisited Nodes and Recursive Comparison

STG comparison relies on SCC to control recursion and behaviour in the face of graphs containing arbitrary cycles. Any difference found affecting one node in a comparison cycle affects all nodes in that cycle.

Excluding the two special cases documented in the following sections, the comparison steps are approximately:

  1. if the comparison already has a known result then return this
  2. if the comparison already is in progress then return a potential difference
  3. start node visit, register the node with the SCC finder
    1. (special handling of qualified types and typedefs)
    2. incomparable nodes (such as a struct and an int) go to Mismatch which returns a difference; there is no further recursion
    3. otherwise delegate node comparison to a node-specific function; with possible recursion
    4. the comparison result here is tentative, due to potential cycles
  4. finish node visit, informing the SCC finder
  5. if an SCC was closed, we've just finished its root comparison
    1. root compared equal? discard unwanted potential differences
    2. difference found? record confirmed differences
    3. record all its comparison results as final
  6. return result (which will be final if an SCC was closed)

Typedefs

This special handling is subject to change.

  • typedef foo bar ⇔ foo

Typedefs are named type aliases which cannot refer to themselves or later defined types. The referred-to type is exactly identical to the typedef. So for difference finding, typedefs should just be resolved and skipped over. However, for reporting, it may be still useful to say where a difference came from. This requires extra handling to collect the typedef names on each side of the comparison, when there is something to report.

If operator()(Id, Id) sees a comparison involving a typedef, it resolves typedef chains on both sides and keeps track of the names. Then it calls itself recursively. If the result is no-diff, it returns no-diff, otherwise, it reports the differences between the types at the end of the typedef chains.

An alternative would be to genuinely just follow the epsilons. Store the typedefs in the diff tree but record the comparison of what they resolve to. The presentation layer can decorate the comparison text with resolution chains.

Note that qualified typedefs present extra complications.

Qualified Types

This special handling is subject to change.

  • constvolatile → foo ⇔ volatileconst → foo

STG currently represents type qualifiers as separate, individual nodes. They are relevant for finding differences but there may be no guarantee of the order in which they will appear. For diff reporting, STG currently reports added and removed qualifiers but also compares the underlying types.

This implies that when faced with a comparison involving a qualifier, operator()(Id, Id) should collect and compare all qualifiers on both sides and treat the types as compound objects consisting of their qualifiers and the underlying types, either or both of which may have differences to report. Comparing the underlying types requires recursive calls.

Note that qualified typedefs present extra complications.

Qualified typedefs

STG does not currently do anything special for qualified typedefs which can have subtle and surprising behaviours. For example:

Before:

const int quux;

After 1:

typedef int foo;
const foo quux;

After 2:

typedef const int foo;
foo quux;

After 3:

typedef const int foo;
const foo quux;

In all cases above, the type of quux is unchanged. These examples strongly suggest that a better model of C types would involve tracking qualification as a decoration present on every type node, including typedefs.

Note that this behaviour implies C's type system is not purely constructive as there is machinery to discard duplicate qualifiers which would be illegal elsewhere.

For the moment, we can pretend that outer qualifications are always significant, even though they may be absorbed by inner ones, and risk occasional false positives.

A worse case is:

Before:

const int quux[];

After 1:

typedef int foo[];
const foo quux;

After 2:

typedef const int foo[];
foo quux;

After 3:

typedef const int foo[];
const foo quux;

All the quux are identically typed. There is an additional wart that what would normally be illegal qualifiers on an array type instead decorate its element type.

Finally, worst is:

Before:

const int quux();

After 1:

typedef int foo();
const foo quux;

After 2:

typedef const int foo();
foo quux;

After 3:

typedef const int foo();
const foo quux;

The two const foo quux cases invoke undefined behaviour. The consistently crazy behaviour would have been to decorate the return type instead.

The “worstest” case is GCC allowing the specification of typedef alignment different to the defining type. This abomination should not exist and should never be used. Alignment is not currently modelled by STG.

Diff helpers

These are mainly used by the CompareWorker::operator()(Node, Node) methods.

  • MarkIncomparable - nodes are just different
  • AddNodeDiff - add node difference, unconditionally
  • AddEdgeDiff - add edge difference (addition or removal), unconditionally
  • MaybeAddNodeDiff - add node difference (label change), conditionally
  • MaybeAddEdgeDiff - add matching edge recursive difference, conditionally

Variants are possible where text is generated lazily on a recursive diff being found, as are ones where labels are compared and serialised only if different.

Diff Presentation

In general, there are two problems to solve:

  • generating suitable text, for
    • nodes and edges
    • node and edge differences
  • building a report with some meaningful structure

Node and edge description and report structure are the responsibility of the reporting code. See Naming for more detailed notes on node description, mainly C type name syntax.

Several report formats are supported and the simplest is (omitting various complications) a rendering of a difference graph as a difference tree where revisiting nodes is avoided by reporting 2 additional artificial kinds of difference:

  1. already reported - to handle diff sharing
  2. being compared - to handle diff cycles

The various formats are not documented further here.

Finally, node and edge difference description is currently the responsibility of the comparison code. This may change in the future, but might require a typed difference graph.