When comparing types recursively, we explore a graph whose nodes are pairs of types. Nodes can be visited more than once, and there can even be cycles.
The standard tool for exploring directed graphs is Depth First Search. Dealing with cycles in such a graph requires the determination of its Strongly-Connected Components.
DFS code for graph traversal is trivial. SCC determination is not and we would like it to be provided by a reliable and efficient library component.
There are three commonly-studied asymptotically-optimal approaches to determining the Strongly-Connected Components of a directed graph. Each of these admits various optimisations and specialisations for different purposes.
Kosaraju's algorithm is unsuited to DFS-generated graphs (such as type comparison graphs) as it requires both forwards and reverse edges to be known ahead of time.
Tarjan's algorithm can be massaged into the form where it can be separated into a plain DFS traversal and SCC-specific pieces but the resulting code is a bit messy and responsibility for SCC state management is rather scattered.
The path-based algorithm is the best fit and can be put into a form where the DFS traversal and SCC state management are cleanly separated. The concept of “open” nodes carries directly over to the implementation used here and SCC state management occurs in two well-defined places.
The remainder of this document discusses the finer details of what the finder API should look like.
The choice of primitives should be determined by the interface priorities:
Roughly speaking, the SCC finder “opens” and “closes” nodes and the user code will try to open and close nodes at the beginning and end of each node visit.
The logic to follow when the SCC finder reports a complete SCC (nodes closed) is simple: the user code must ensure it considers each node as definitively visited from this point and avoid asking the finder to consider it again. This condition means the SCC does not have to independently track “ever visited” status that may be duplicated elsewhere.
When reaching a node via DFS, the user of the SCC finder has to perform a 3-way classification:
There are at least 3 different ways of structuring program logic to distinguish these paths.
Node lifecycle:
If a node has never been visited, it can be unconditionally opened. If it has been visited, we must still check if it‘s open. This is a bit odd in the context of the SCC algorithm as really_open
and is_open
become separate operations (-simplicity). It’s possible that a user could know, for other reasons, whether a visited node is open or not and then omitting to call is_open
could upset the SCC finder (which needs to update its internal state in the already-open case). This risk could be eliminated by duplicating the state update logic in both methods (-efficiency).
if (visited) { if (is_open(node)) { // cycle-breaking back link return; } else { // work-saving link to shared node return; } } mark_visited(); token = really_open(node); ... // do work ... nodes = close(token); if (!nodes.empty()) { ... }
Node lifecycle:
This scheme also requires separate is_open
and really_open
operations as nodes musn't be reopened (-simplicity, -efficiency). It does allow the user to mark nodes as visited any time between open and close (-simplicity, +power).
if (is_open(node)) { // cycle-breaking back link return; } if (visited) { // work-saving link to shared node return; } token = really_open(node); ... // do work ... nodes = close(token); if (!nodes.empty()) { // last chance to call mark_visited }
NOTE: This is the currently implemented approach.
Node lifecycle:
This is the purest form of the algorithm with the open
and close
operations clearly bracketing “real” work. really_open
and is_open
operations are folded together into open
which will fail to open an already open node (+simplicity, +efficiency).
The user value store cannot be something that gets initialised and updated between open and close or it will also need to duplicate the open/closed state and this correspondence will need to be maintained as an invariant (-power).
if (visited) { // work-saving link to shared node return; } token = open(node); if (!token) { // cycle-breaking back link return; } ... // do work ... nodes = close(token.value()); if (!nodes.empty()) { ... mark_visited(); }
Evaluating equality can be made to fit nicely into this category, as can using equality along with the return stack to build a diff tree with stubs for sharing- and cycle-breaking links.
However, building a graph (say a copy of the traversal, or a diff graph) requires open node state to be squirrelled away somewhere.