The simplest use for the STG tools is to extract, store and compare ABI representations.
This tutorial uses long options throughout. Equivalent short options can be found in the manual pages for stg and stgdiff. Both tools understand - as a shorthand for /dev/stdout.
This small code sample will be used as a working example. Copy it into a file called tree.c.
struct N {
  struct N * left;
  struct N * right;
  int value;
};

unsigned int count(struct N * tree) {
  return tree ? count(tree->left) + count(tree->right) + 1 : 0;
}

int sum(struct N * tree) {
  return tree ? sum(tree->left) + sum(tree->right) + tree->value : 0;
}
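To see what the two functions compute, the sketch below repeats the definitions from tree.c and adds a hypothetical helper, demo_tree, which is not part of the tutorial's file. It builds a fixed three-node tree so that count and sum can be called on it.

```c
#include <stddef.h>

/* Definitions repeated from tree.c so that this sketch stands alone. */
struct N {
  struct N * left;
  struct N * right;
  int value;
};

unsigned int count(struct N * tree) {
  return tree ? count(tree->left) + count(tree->right) + 1 : 0;
}

int sum(struct N * tree) {
  return tree ? sum(tree->left) + sum(tree->right) + tree->value : 0;
}

/* Hypothetical helper (not in tree.c): a root holding 10 with two
   leaf children holding 20 and 30. */
struct N * demo_tree(void) {
  static struct N left = { NULL, NULL, 20 };
  static struct N right = { NULL, NULL, 30 };
  static struct N root = { &left, &right, 10 };
  return &root;
}
```

Calling count(demo_tree()) visits three nodes and sum(demo_tree()) adds their three values. Both functions return 0 for a null pointer.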
Compile it:
gcc -Wall -Wextra -g -c tree.c -o tree.o
stg is the tool for extracting ABI representations, though it can do more sophisticated things as well. The simplest invocation of stg looks something like this:
stg --elf library.so --output library.stg
Adding the --annotate option can be useful, especially if trying to debug ABI issues or when experimenting with the tools, like now.
If the output consists of just symbols and you get a warning about missing DWARF information, this means that library.so has no DWARF debugging information. For meaningful results, stg should be run on an unstripped ELF file, which may require build system adjustments.
Run this:
stg --elf tree.o --annotate --output -
And you should get something like this:
version: 0x00000002
root_id: 0x84ea5130 # interface
pointer_reference {
  id: 0x32b38621
  kind: POINTER
  pointee_type_id: 0xe08efe1a # struct N
}
primitive {
  id: 0x4585663f
  name: "unsigned int"
  encoding: UNSIGNED_INTEGER
  bytesize: 0x00000004
}
primitive {
  id: 0x6720d32f
  name: "int"
  encoding: SIGNED_INTEGER
  bytesize: 0x00000004
}
member {
  id: 0x35cbdb23
  name: "left"
  type_id: 0x32b38621 # struct N*
}
member {
  id: 0x0b440ffb
  name: "right"
  type_id: 0x32b38621 # struct N*
  offset: 64
}
member {
  id: 0xa06f75d5
  name: "value"
  type_id: 0x6720d32f # int
  offset: 128
}
struct_union {
  id: 0xe08efe1a
  kind: STRUCT
  name: "N"
  definition {
    bytesize: 24
    member_id: 0x35cbdb23 # struct N* left
    member_id: 0x0b440ffb # struct N* right
    member_id: 0xa06f75d5 # int value
  }
}
function {
  id: 0x912c02a7
  return_type_id: 0x6720d32f # int
  parameter_id: 0x32b38621 # struct N*
}
function {
  id: 0xc2779f73
  return_type_id: 0x4585663f # unsigned int
  parameter_id: 0x32b38621 # struct N*
}
elf_symbol {
  id: 0xbb237197
  name: "count"
  is_defined: true
  symbol_type: FUNCTION
  type_id: 0xc2779f73 # unsigned int(struct N*)
  full_name: "count"
}
elf_symbol {
  id: 0x4fdeca38
  name: "sum"
  is_defined: true
  symbol_type: FUNCTION
  type_id: 0x912c02a7 # int(struct N*)
  full_name: "sum"
}
interface {
  id: 0x84ea5130
  symbol_id: 0xbb237197 # unsigned int count(struct N*)
  symbol_id: 0x4fdeca38 # int sum(struct N*)
}
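The offset and bytesize values in this output can be cross-checked against the compiler. The sketch below is not part of STG and rests on two assumptions read off the sample: that member offsets are reported in bits, and that the object was built for a target with 8-byte pointers. The helper names are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Same layout as struct N in tree.c. */
struct N {
  struct N * left;
  struct N * right;
  int value;
};

/* Hypothetical helpers: member offsets converted from bytes to bits,
   which is the unit the sample output appears to use. */
size_t left_offset_bits(void) { return offsetof(struct N, left) * CHAR_BIT; }
size_t right_offset_bits(void) { return offsetof(struct N, right) * CHAR_BIT; }
size_t value_offset_bits(void) { return offsetof(struct N, value) * CHAR_BIT; }

/* Total size in bytes, including any trailing padding. */
size_t struct_bytesize(void) { return sizeof(struct N); }
```

On x86-64 Linux these return 0, 64, 128 and 24, matching the member and struct_union entries above. Other targets may give different numbers.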
One issue when first starting to manage the ABI of a binary is the wish to restrict the interface surface to just the necessary minimum. Any superfluous symbols or type definitions in the ABI representation can result in spurious ABI differences in reports later on.
When it comes to the symbols exposed, it's common to control symbol visibility. Type definitions can be either exposed in public header files or hidden in private header files, with perhaps only public forward declarations, but this does not remove any type definitions in the DWARF information.
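As a rough illustration of these two techniques, the sketch below assumes GCC or Clang on an ELF target. The names tree_sum and add_values are hypothetical. The hidden helper is kept out of a shared library's exported symbols, while a public header would carry only the forward declaration of struct N.

```c
#include <stddef.h>

/* What a public header would expose: an opaque forward declaration
   and the one supported entry point. */
struct N;
int tree_sum(const struct N * tree);

/* What stays private to the implementation file. */
struct N {
  struct N * left;
  struct N * right;
  int value;
};

/* Hidden visibility: callable from the library's own object files,
   but not exported from the shared library. */
__attribute__((visibility("hidden")))
int add_values(const struct N * tree) {
  return tree ? add_values(tree->left) + add_values(tree->right) + tree->value
              : 0;
}

/* Default visibility: part of the library's exported interface. */
__attribute__((visibility("default")))
int tree_sum(const struct N * tree) {
  return add_values(tree);
}
```

Building with -fvisibility=hidden and marking only the public functions as default achieves the same result with fewer annotations.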
STG provides filtering facilities for both symbols and types, for example:
stg --files '*.h' --elf library.so --output library.stg
This will ensure that definitions of any types defined outside any header files, and perhaps used as opaque pointer handles, are omitted from the ABI representation. If you separate public and private headers, then use an appropriate glob pattern that distinguishes the two.
Sets of symbol or file names can be read from a file. In this example, all symbols whose names begin with api_, except those in the obsolete file, are kept.
stg --symbols 'api_* & ! :obsolete' --elf library.so --output library.stg
For historical reasons, the literal filter file format is compatible with libabigail's symbol list format, but this is subject to change.
[list]
  # one symbol per line
  foo # comments, whitespace and empty lines are all ignored
  bar
  baz
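To make the meaning of the filter expression 'api_* & ! :obsolete' concrete, here is a sketch of the selection rule written with POSIX fnmatch. It only illustrates the intended semantics, namely a glob match combined with exclusion of names listed in a file. It is not STG's implementation, and keep_symbol and the in-memory obsolete list are hypothetical stand-ins.

```c
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if name is one of the n entries in list, 0 otherwise.
   The list stands in for the contents of the "obsolete" file. */
static int in_list(const char * name, const char * const * list, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    if (strcmp(name, list[i]) == 0) return 1;
  }
  return 0;
}

/* Hypothetical helper: keep a symbol if it matches the glob api_*
   and is not named in the obsolete list. */
int keep_symbol(const char * name, const char * const * obsolete, size_t n) {
  return fnmatch("api_*", name, 0) == 0 && !in_list(name, obsolete, n);
}
```

With an obsolete list containing only api_old, the helper keeps api_open, drops api_old because it is listed, and drops helper because it does not match the glob.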
Let's say that struct N is supposed to be an opaque type that user code only gets pointers to and, additionally, the function count should be excluded from the ABI (perhaps due to an argument over its return type). We can exclude the definition of struct N, along with that of any other types defined in tree.c, using a file filter. The symbol can be excluded by name.
Run this:
stg --elf tree.o --files '*.h' --symbols '!count' --output -
The result should be something like this:
version: 0x00000002
root_id: 0x84ea5130
pointer_reference {
  id: 0x26944aa7
  kind: POINTER
  pointee_type_id: 0xb011cc02
}
primitive {
  id: 0x6720d32f
  name: "int"
  encoding: SIGNED_INTEGER
  bytesize: 0x00000004
}
struct_union {
  id: 0xb011cc02
  kind: STRUCT
  name: "N"
}
function {
  id: 0x9425f186
  return_type_id: 0x6720d32f
  parameter_id: 0x26944aa7
}
elf_symbol {
  id: 0x4fdeca38
  name: "sum"
  is_defined: true
  symbol_type: FUNCTION
  type_id: 0x9425f186
  full_name: "sum"
}
interface {
  id: 0x84ea5130
  symbol_id: 0x4fdeca38
}
stgdiff is the tool for comparing ABI representations and reporting differences, though it has some other, more specialised, uses. The simplest invocation of stgdiff looks something like this:
stgdiff --stg old/library.stg new/library.stg --output -
This will report ABI differences in the default (small) format.
The function sum has a type that depends on struct N. Any change to either might affect the ABI exposed via sum. For example, if the type of the value member is changed to short and the file is recompiled, STG can detect this difference.
First rerun the STG extraction, specifying --output tree-old.stg. Make the source code change, recompile and extract the ABI with --output tree-new.stg.
Then run this:
stgdiff --stg tree-old.stg tree-new.stg --output -
To get this:
type 'struct N' changed
  member changed from 'int value' to 'short int value'
    type changed from 'int' to 'short int'
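The report mentions no size change for struct N. A likely reason, assuming a typical ABI where pointers are aligned at least as strictly as int, is that trailing padding absorbs the narrower member. The sketch below compares the two layouts using the hypothetical type names N_old and N_new, and shows why a size check alone would miss this change.

```c
#include <stddef.h>

/* Layout before the change (hypothetical name). */
struct N_old {
  struct N_old * left;
  struct N_old * right;
  int value;
};

/* Layout after changing value to short (hypothetical name). */
struct N_new {
  struct N_new * left;
  struct N_new * right;
  short value;
};

/* 1 if both layouts have the same total size, else 0. */
int same_size(void) {
  return sizeof(struct N_old) == sizeof(struct N_new);
}

/* 1 if value sits at the same offset in both layouts, else 0. */
int same_value_offset(void) {
  return offsetof(struct N_old, value) == offsetof(struct N_new, value);
}

/* 1 if the value member has the same width in both layouts, else 0. */
int same_value_width(void) {
  return sizeof(((struct N_old *)0)->value) ==
         sizeof(((struct N_new *)0)->value);
}
```

On x86-64 Linux the first two helpers return 1 and the third returns 0. The struct's size and member offsets are unchanged, yet code that reads or writes value with the old width would now touch padding bytes.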
The small format omits parts of the ABI graph which haven't changed.[^1] To see all impacted nodes, use --format flat instead.
[^1]: The similarly named short format goes a bit further and will omit and summarise certain repetitive differences.
function symbol 'int sum(struct N*)' changed
  type 'int(struct N*)' changed
    parameter 1 type 'struct N*' changed
      pointed-to type 'struct N' changed

type 'struct N' changed
  member 'struct N* left' changed
    type 'struct N*' changed
      pointed-to type 'struct N' changed
  member 'struct N* right' changed
    type 'struct N*' changed
      pointed-to type 'struct N' changed
  member changed from 'int value' to 'short int value'
    type changed from 'int' to 'short int'
And if you really want to see more of the graph structure, use --format plain.
function symbol 'int sum(struct N*)' changed
  type 'int(struct N*)' changed
    parameter 1 type 'struct N*' changed
      pointed-to type 'struct N' changed
        member 'struct N* left' changed
          type 'struct N*' changed
            pointed-to type 'struct N' changed
              (being reported)
        member 'struct N* right' changed
          type 'struct N*' changed
            pointed-to type 'struct N' changed
              (being reported)
        member changed from 'int value' to 'short int value'
          type changed from 'int' to 'short int'
Or just use --format viz, which generates input for Graphviz.
digraph "ABI diff" {
  "0" [shape=rectangle, label="'interface'"]
  "1" [label="'int sum(struct N*)'"]
  "2" [label="'int(struct N*)'"]
  "3" [label="'struct N*'"]
  "4" [shape=rectangle, label="'struct N'"]
  "5" [label="'struct N* left'"]
  "5" -> "3" [label=""]
  "4" -> "5" [label=""]
  "6" [label="'struct N* right'"]
  "6" -> "3" [label=""]
  "4" -> "6" [label=""]
  "7" [label="'int value' → 'short int value'"]
  "8" [color=red, label="'int' → 'short int'"]
  "7" -> "8" [label=""]
  "4" -> "7" [label=""]
  "3" -> "4" [label="pointed-to"]
  "2" -> "3" [label="parameter 1"]
  "1" -> "2" [label=""]
  "0" -> "1" [label=""]
}