Tutorial

The simplest use for the STG tools is to extract, store and compare ABI representations.

This tutorial uses long options throughout. Equivalent short options can be found in the manual pages for stg and stgdiff. Both tools understand - as a shorthand for /dev/stdout.

This small code sample will be used as a working example. Copy it into a file called tree.c.

struct N {
  struct N * left;
  struct N * right;
  int value;
};

unsigned int count(struct N * tree) {
  return tree ? count(tree->left) + count(tree->right) + 1 : 0;
}

int sum(struct N * tree) {
  return tree ? sum(tree->left) + sum(tree->right) + tree->value : 0;
}

Compile it:

gcc -Wall -Wextra -g -c tree.c -o tree.o

Extraction from ELF / DWARF

stg is the tool for extracting ABI representations, though it can do more sophisticated things as well. The simplest invocation of stg looks something like this:

stg --elf library.so --output library.stg

Adding the --annotate option, which adds comments describing the nodes that each ID refers to, can be useful when debugging ABI issues or when experimenting with the tools, as in this tutorial.

If the output consists of just symbols and you get a warning about missing DWARF information, library.so has no DWARF debugging information. For meaningful results, stg should be run on an unstripped ELF file, which may require build system adjustments.

Run this:

stg --elf tree.o --annotate --output -

And you should get something like this:

version: 0x00000002
root_id: 0x84ea5130  # interface
pointer_reference {
  id: 0x32b38621
  kind: POINTER
  pointee_type_id: 0xe08efe1a  # struct N
}
primitive {
  id: 0x4585663f
  name: "unsigned int"
  encoding: UNSIGNED_INTEGER
  bytesize: 0x00000004
}
primitive {
  id: 0x6720d32f
  name: "int"
  encoding: SIGNED_INTEGER
  bytesize: 0x00000004
}
member {
  id: 0x35cbdb23
  name: "left"
  type_id: 0x32b38621  # struct N*
}
member {
  id: 0x0b440ffb
  name: "right"
  type_id: 0x32b38621  # struct N*
  offset: 64
}
member {
  id: 0xa06f75d5
  name: "value"
  type_id: 0x6720d32f  # int
  offset: 128
}
struct_union {
  id: 0xe08efe1a
  kind: STRUCT
  name: "N"
  definition {
    bytesize: 24
    member_id: 0x35cbdb23  # struct N* left
    member_id: 0x0b440ffb  # struct N* right
    member_id: 0xa06f75d5  # int value
  }
}
function {
  id: 0x912c02a7
  return_type_id: 0x6720d32f  # int
  parameter_id: 0x32b38621  # struct N*
}
function {
  id: 0xc2779f73
  return_type_id: 0x4585663f  # unsigned int
  parameter_id: 0x32b38621  # struct N*
}
elf_symbol {
  id: 0xbb237197
  name: "count"
  is_defined: true
  symbol_type: FUNCTION
  type_id: 0xc2779f73  # unsigned int(struct N*)
  full_name: "count"
}
elf_symbol {
  id: 0x4fdeca38
  name: "sum"
  is_defined: true
  symbol_type: FUNCTION
  type_id: 0x912c02a7  # int(struct N*)
  full_name: "sum"
}
interface {
  id: 0x84ea5130
  symbol_id: 0xbb237197  # unsigned int count(struct N*)
  symbol_id: 0x4fdeca38  # int sum(struct N*)
}
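
The member offsets in this output appear to be in bits, while the bytesize fields are in bytes. As a rough cross-check, the same numbers can be recovered from the compiler with offsetof and sizeof. This is only a sketch: the helper function names are hypothetical and not part of STG, and the exact values depend on the target.

```c
#include <limits.h>
#include <stddef.h>

struct N {
  struct N * left;
  struct N * right;
  int value;
};

/* Hypothetical helpers: member offsets converted from bytes to bits,
   the unit that the offset fields in the output above appear to use. */
size_t left_offset_bits(void) { return offsetof(struct N, left) * CHAR_BIT; }
size_t right_offset_bits(void) { return offsetof(struct N, right) * CHAR_BIT; }
size_t value_offset_bits(void) { return offsetof(struct N, value) * CHAR_BIT; }

/* Size of the whole struct in bytes, including any trailing padding. */
size_t struct_size_bytes(void) { return sizeof(struct N); }
```

On a target with 8-byte pointers and a 4-byte int, such as x86-64 Linux, these evaluate to 0, 64, 128 and 24, matching the offset and bytesize values shown above.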

Filtering

When first starting to manage the ABI of a binary, it is common to want to restrict the interface surface to the necessary minimum. Any superfluous symbols or type definitions in the ABI representation can result in spurious ABI differences in reports later on.

When it comes to the symbols exposed, it's common to control symbol visibility. Type definitions can be either exposed in public header files or hidden in private header files, with perhaps only public forward declarations, but this does not remove any type definitions in the DWARF information.
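
The two source-level techniques mentioned above can be sketched in C as follows. This is an illustration only, not something STG requires: the public and private parts are shown in one file for brevity, the helper node_weight is hypothetical, and the visibility attribute is a GCC and Clang extension.

```c
/* What a public header might contain: a forward declaration only,
   so user code can hold pointers to struct N but not see inside it. */
struct N;
int sum(struct N * tree);

/* What the private source file might contain: the full definition. */
struct N {
  struct N * left;
  struct N * right;
  int value;
};

/* Hypothetical internal helper. With GCC or Clang on ELF targets, hidden
   visibility keeps it out of a shared library's dynamic symbol table. */
__attribute__((visibility("hidden")))
int node_weight(const struct N * node) {
  return node->value;
}

int sum(struct N * tree) {
  return tree ? sum(tree->left) + sum(tree->right) + node_weight(tree) : 0;
}
```

Even with this arrangement, the compiler still sees the full definition of struct N when building the private source file, so it ends up in the DWARF information.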

STG provides filtering facilities for both symbols and types, for example:

stg --files '*.h' --elf library.so --output library.stg

This will ensure that definitions of any types defined outside any header files, and perhaps used as opaque pointer handles, are omitted from the ABI representation. If you separate public and private headers, then use an appropriate glob pattern that distinguishes the two.

Sets of symbol or file names can be read from a file. In this example, all symbols whose names begin with api_ are kept, except those listed in the file named obsolete.

stg --symbols 'api_* & ! :obsolete' --elf library.so --output library.stg

For historical reasons, the literal filter file format is compatible with libabigail's symbol list format, but this is subject to change.

[list]
 # one symbol per line
 foo # comments, whitespace and empty lines are all ignored
 bar

 baz

Let's say that struct N is supposed to be an opaque type that user code only gets pointers to and, additionally, the function count should be excluded from the ABI (perhaps due to an argument over its return type). We can exclude the definition of struct N, along with that of any other types defined in tree.c, using a file filter. The symbol can be excluded by name.

Run this:

stg --elf tree.o --files '*.h' --symbols '!count' --output -

The result should be something like this:

version: 0x00000002
root_id: 0x84ea5130
pointer_reference {
  id: 0x26944aa7
  kind: POINTER
  pointee_type_id: 0xb011cc02
}
primitive {
  id: 0x6720d32f
  name: "int"
  encoding: SIGNED_INTEGER
  bytesize: 0x00000004
}
struct_union {
  id: 0xb011cc02
  kind: STRUCT
  name: "N"
}
function {
  id: 0x9425f186
  return_type_id: 0x6720d32f
  parameter_id: 0x26944aa7
}
elf_symbol {
  id: 0x4fdeca38
  name: "sum"
  is_defined: true
  symbol_type: FUNCTION
  type_id: 0x9425f186
  full_name: "sum"
}
interface {
  id: 0x84ea5130
  symbol_id: 0x4fdeca38
}

ABI Comparison

stgdiff is the tool for comparing ABI representations and reporting differences, though it has some other, more specialised, uses. The simplest invocation of stgdiff looks something like this:

stgdiff --stg old/library.stg new/library.stg --output -

This will report ABI differences in the default (small) format.

The function sum has a type that depends on struct N. Any change to either might affect the ABI exposed via sum. For example, if the type of the value member is changed to short and the file is recompiled, STG can detect this difference.

First rerun the STG extraction, specifying --output tree-old.stg. Make the source code change, recompile and extract the ABI with --output tree-new.stg.
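
The source change described above might look like the following sketch. The two definitions are given the hypothetical tags N_old and N_new only so that they can sit side by side in one file; in tree.c the struct keeps its name and only the type of value changes. The helper functions are likewise hypothetical.

```c
#include <stddef.h>

/* Original definition, as in tree.c. */
struct N_old {
  struct N_old * left;
  struct N_old * right;
  int value;
};

/* Changed definition: value narrowed from int to short. */
struct N_new {
  struct N_new * left;
  struct N_new * right;
  short value;
};

/* Returns 1 if the overall size and the offset of value are the same
   in both layouts. */
int layout_unchanged(void) {
  return sizeof(struct N_old) == sizeof(struct N_new) &&
         offsetof(struct N_old, value) == offsetof(struct N_new, value);
}

/* Returns 1 if the value member itself has a different size. */
int value_size_changed(void) {
  return sizeof(((struct N_old *)0)->value) !=
         sizeof(((struct N_new *)0)->value);
}
```

On common targets, trailing padding keeps the struct the same size after the change, so a check based on size and offsets alone would not notice it, even though the type of the member differs.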

Then run this:

stgdiff --stg tree-old.stg tree-new.stg --output -

To get this:

type 'struct N' changed
  member changed from 'int value' to 'short int value'
    type changed from 'int' to 'short int'

The small format omits parts of the ABI graph which haven't changed.[^1] To see all impacted nodes, use --format flat instead.

[^1]: The similarly named short format goes a bit further and will omit and summarise certain repetitive differences.

function symbol 'int sum(struct N*)' changed
  type 'int(struct N*)' changed
    parameter 1 type 'struct N*' changed
      pointed-to type 'struct N' changed

type 'struct N' changed
  member 'struct N* left' changed
    type 'struct N*' changed
      pointed-to type 'struct N' changed
  member 'struct N* right' changed
    type 'struct N*' changed
      pointed-to type 'struct N' changed
  member changed from 'int value' to 'short int value'
    type changed from 'int' to 'short int'

And if you really want to see more of the graph structure, use --format plain.

function symbol 'int sum(struct N*)' changed
  type 'int(struct N*)' changed
    parameter 1 type 'struct N*' changed
      pointed-to type 'struct N' changed
        member 'struct N* left' changed
          type 'struct N*' changed
            pointed-to type 'struct N' changed
              (being reported)
        member 'struct N* right' changed
          type 'struct N*' changed
            pointed-to type 'struct N' changed
              (being reported)
        member changed from 'int value' to 'short int value'
          type changed from 'int' to 'short int'

Or use --format viz, which generates input for Graphviz.

digraph "ABI diff" {
  "0" [shape=rectangle, label="'interface'"]
  "1" [label="'int sum(struct N*)'"]
  "2" [label="'int(struct N*)'"]
  "3" [label="'struct N*'"]
  "4" [shape=rectangle, label="'struct N'"]
  "5" [label="'struct N* left'"]
  "5" -> "3" [label=""]
  "4" -> "5" [label=""]
  "6" [label="'struct N* right'"]
  "6" -> "3" [label=""]
  "4" -> "6" [label=""]
  "7" [label="'int value' → 'short int value'"]
  "8" [color=red, label="'int' → 'short int'"]
  "7" -> "8" [label=""]
  "4" -> "7" [label=""]
  "3" -> "4" [label="pointed-to"]
  "2" -> "3" [label="parameter 1"]
  "1" -> "2" [label=""]
  "0" -> "1" [label=""]
}