Lazy and Strict Evaluation
2025-05-18

This is the third post in a series on the theory and practice of using interaction nets as a programming medium.

Please read the first post if you are unfamiliar with interaction nets, and/or the second post for an introduction to polarity.

This post and the others would not exist without the always-fruitful conversations with T6.

Overview

There are currently two main strategies for normalizing interaction nets: lazy evaluation and strict evaluation. We'll see that these two strategies are somewhat at odds with each other. Specifically, strict evaluation is highly parallelizable and has garbage collection for free, while lazy evaluation can normalize a larger class of terms. In order to understand this difference, we'll have to introduce a new family¹ of interaction systems: n-SIC with reference nodes.

n-SIC + References

n-SIC is a natural extension of 2-SIC, where we have n different kinds (also known as labels) of binary nodes that annihilate when they are of the same kind, and commute when they are of different kinds. Additionally, we're going to add a new kind of node known as a reference node. This is a deferred reference to a named net, which will be copied when interacted with. Named nets must have a free wire². The set of names is fixed.

For the remainder of the post, we'll refer to this interaction system as simply n-SIC without explicitly mentioning the addition of reference (and eraser) nodes every time.

The kinds of nodes now look like:

Figure 1: n-SIC nodes
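
For readers who prefer code, here is one minimal way these node kinds could be represented. This is a hypothetical Rust sketch, not something from the original system description; the type and field names are purely illustrative.

```rust
/// A hypothetical encoding of the n-SIC node kinds described above.
enum NodeKind {
    Era,                // nullary eraser node
    Ref(&'static str),  // deferred reference to a named net, copied when interacted with
    Bin { label: u32 }, // one of the n binary node kinds: equal labels annihilate,
                        // different labels commute
}
```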

And we have the following rewrite rules:

Figure 2: n-SIC rewrite rules

Some things to note:

Infinite Nets

Recursive – or mutually recursive – use of the new reference node allows for the construction of infinite nets. For example:

Figure 3: A lazy infinite net

is equivalent to the infinite net:

Figure 4: An expanded infinite net

This is very similar to infinite lists in Haskell, which are possible due to Haskell's laziness. Now, just like in lambda calculus, one does not need to add an additional construct like references in order to have recursion or non-termination. What we will see later on in this post is that this reference node defers computation, introducing a kind of laziness even in strict settings.

The difficulty in dealing with non-terminating nets is that strong confluence³ guarantees that if any reduction path is of infinite length, then all reduction paths are.

Strict Evaluation

The rewrite rules of interaction nets as described in previous posts and this one emulate a strict evaluation strategy. By strict we mean that every part of the net is normalized, even if it is disconnected from the root free wire. This implies that if any portion of the net does not have a normal form – because of non-termination – then the entire net does not have a normal form. A similar situation exists in lambda calculus. For example, consider the following lambda calculus program:
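
(The concrete program is rendered in the post's own notation. As a purely illustrative stand-in with the same shape, a function that discards its second argument applied to a diverging term, one can take the standard example below.)

$$(\lambda x.\, \lambda y.\, x)\; a\; \Omega \qquad \text{where} \qquad \Omega = (\lambda x.\, x\, x)\,(\lambda x.\, x\, x)$$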

Using a normal order evaluation strategy, the program normalizes, as the function simply discards its second argument. Using a strict evaluation strategy, however, results in non-termination, as that second argument has no normal form.

When encoding this program into n-SIC⁴, we get the following:

Figure 5: A strictly diverging net

If you step through the reduction, you should reach the following intermediate net within a few rewrites:

Figure 6: Normalization of the program with reference nodes

Clearly, the reference to the diverging argument is erased, and by rewrite rule (6), the reference is not expanded. The result is then the expected normal form.

Without reference nodes, however, the presence of the diverging argument anywhere in the net will require that it be normalized, as it has an active pair. The intermediate net in Figure 6 would instead look like this:

Figure 7: The same intermediate net without reference nodes

Notice that the net is no longer fully connected. Even though the root free wire is clearly referencing just the result, there are still active pairs in this net: specifically, the erasure of the diverging argument. In strict interaction net settings, erasing a term normalizes it first. As in lambda calculus, attempting to normalize the diverging argument in this n-SIC encoding will never terminate. Thus, in a strict setting, normalization of the program using n-SIC does not terminate.

Reference nodes are therefore a way to introduce laziness into an otherwise strict context, allowing one to erase infinite terms by deferring their expansion.

Strict Evaluation Algorithm

An implementation of a strict evaluation algorithm for interaction combinators / n-SIC requires two main components: a buffer of nodes and a collection of redexes. The buffer of nodes can be a self-referential array where some elements hold pointers to others in the array. These pointers are essentially the wires, and the nodes in the buffer contain information about the kind of node. The redexes can just be pairs of pointers to nodes.

The algorithm is then to continuously pop from the collection of redexes and perform the appropriate interaction depending on the kind of nodes in the redex. When linking wires together, new redexes may be formed. These should be re-added to the collection of redexes.

This algorithm is easy to parallelize. Spawn as many threads as you wish, and have them continuously pop from the redexes. One just needs to ensure that redexes are grabbed atomically. Since a node can only be part of at most a single redex, there isn't too much difficulty with thread-safety, as obtaining a redex implies ownership of the two nodes in that redex.

Once the collection of redexes is empty, the net is normalized.
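
To make this concrete, below is a minimal sequential sketch in Rust. It is hypothetical and not taken from any existing implementation: node kinds are limited to erasers and binary nodes (reference nodes and their expansion are omitted), every port a rule touches is assumed to be wired (free wires such as the root would need a sentinel value), dead nodes are never reclaimed, and the corner case where an active pair's auxiliary ports wire back into the pair itself is not handled. The overall shape, however, is exactly the one described above: a buffer of nodes, a collection of redexes, and a loop that pops redexes, performs the matching interaction, and pushes any new redexes formed while linking.

```rust
use std::collections::VecDeque;

type NodeId = usize;

#[derive(Clone, Copy, PartialEq)]
enum Kind {
    Era,      // eraser node
    Bin(u32), // binary node with a label (reference nodes omitted in this sketch)
}

/// A port is (node, slot): slot 0 is the principal port, slots 1 and 2 are auxiliary.
#[derive(Clone, Copy, PartialEq)]
struct Port {
    node: NodeId,
    slot: usize,
}

#[derive(Default)]
struct Net {
    kinds: Vec<Kind>,
    wires: Vec<[Option<Port>; 3]>,       // wires[n][s]: the port that (n, s) is wired to
    redexes: VecDeque<(NodeId, NodeId)>, // pairs of nodes whose principal ports are wired
}

impl Net {
    fn alloc(&mut self, kind: Kind) -> NodeId {
        self.kinds.push(kind);
        self.wires.push([None; 3]);
        self.kinds.len() - 1
    }

    fn peer(&self, node: NodeId, slot: usize) -> Port {
        self.wires[node][slot].expect("sketch assumes every port is wired")
    }

    /// Wire two ports together; wiring two principal ports forms a new redex.
    fn link(&mut self, a: Port, b: Port) {
        self.wires[a.node][a.slot] = Some(b);
        self.wires[b.node][b.slot] = Some(a);
        if a.slot == 0 && b.slot == 0 {
            self.redexes.push_back((a.node, b.node));
        }
    }

    /// Strict evaluation: keep popping redexes until none remain.
    fn normalize(&mut self) {
        while let Some((a, b)) = self.redexes.pop_front() {
            self.interact(a, b);
        }
    }

    /// Dispatch on the kinds of the two nodes in the redex (cf. Figure 2).
    fn interact(&mut self, a: NodeId, b: NodeId) {
        match (self.kinds[a], self.kinds[b]) {
            // Eraser meets eraser: both nodes simply vanish.
            (Kind::Era, Kind::Era) => {}
            // Eraser meets binary node: the erasure propagates to both aux wires.
            (Kind::Era, Kind::Bin(_)) => self.erase(b),
            (Kind::Bin(_), Kind::Era) => self.erase(a),
            // Annihilation (same label): auxiliary wires are connected pairwise.
            (Kind::Bin(x), Kind::Bin(y)) if x == y => {
                let (a1, a2) = (self.peer(a, 1), self.peer(a, 2));
                let (b1, b2) = (self.peer(b, 1), self.peer(b, 2));
                self.link(a1, b1);
                self.link(a2, b2);
            }
            // Commutation (different labels): each node is duplicated past the other.
            (Kind::Bin(x), Kind::Bin(y)) => {
                let (a1, a2) = (self.peer(a, 1), self.peer(a, 2));
                let (b1, b2) = (self.peer(b, 1), self.peer(b, 2));
                let (bc1, bc2) = (self.alloc(Kind::Bin(y)), self.alloc(Kind::Bin(y)));
                let (ac1, ac2) = (self.alloc(Kind::Bin(x)), self.alloc(Kind::Bin(x)));
                self.link(Port { node: bc1, slot: 0 }, a1);
                self.link(Port { node: bc2, slot: 0 }, a2);
                self.link(Port { node: ac1, slot: 0 }, b1);
                self.link(Port { node: ac2, slot: 0 }, b2);
                self.link(Port { node: bc1, slot: 1 }, Port { node: ac1, slot: 1 });
                self.link(Port { node: bc1, slot: 2 }, Port { node: ac2, slot: 1 });
                self.link(Port { node: bc2, slot: 1 }, Port { node: ac1, slot: 2 });
                self.link(Port { node: bc2, slot: 2 }, Port { node: ac2, slot: 2 });
            }
        }
    }

    /// Erase a binary node: wire a fresh eraser to each of its auxiliary peers.
    fn erase(&mut self, n: NodeId) {
        let (p1, p2) = (self.peer(n, 1), self.peer(n, 2));
        let (e1, e2) = (self.alloc(Kind::Era), self.alloc(Kind::Era));
        self.link(Port { node: e1, slot: 0 }, p1);
        self.link(Port { node: e2, slot: 0 }, p2);
    }
}
```

The parallel version described above would replace the `VecDeque` with a shared collection that worker threads pop from atomically, each popped redex granting the popping thread ownership of its two nodes.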

Lazy Evaluation

While references allow for a form of laziness, they require explicitly marking a laziness boundary. Another way to handle the erasure of infinite nets is by not reducing any part of the net that is no longer connected to the root. This requires some way of knowing whether a redex actually contributes to the final term connected to the root wire.

Note that once a net is no longer connected to the root wire, it can never be reconnected, as disconnected nets cannot interact.

How can we determine which redexes are required in order to normalize the net eventually connected to the root wire? A first intuition would be to walk the net starting at the root; when we see a redex, we reduce it, and then restart the walk.

Let's imagine some hypothetical net:

Figure 8: A hypothetical net

A portion of a net like this is known as a tree, where aux ports always connect to principal ports. Let's call this tree at the head of the root term the initial tree, and represent it as an outlined node. Note that the tree does not have to be balanced.

Figure 9: An initial tree

What's interesting about these initial trees is that, no matter what is downstream of them, they will always be present in the normalized term. In some sense, this initial portion of the term is already normalized. This is because these nodes could only be removed or rewritten if their main ports were wired to other main ports.

Thus, when starting a net traversal at the root, we enter through main ports and exit through auxiliary ports. This is known as phase 1 of the lazy evaluation algorithm.

While walking the net, if we ever enter a node through one of its auxiliary ports, we may no longer be in the initial tree. However, we may be going back up the tree instead of going down. For example,

Figure 10: A walk reentering the initial tree

Thus, during a net traversal, when entering through auxiliary ports, we exit through main ports. Once we first enter through an auxiliary port, we are now in phase 2.

During phase 2, if we ever enter a main port, we are in a redex:

Figure 11: Example walk encountering a redex

I'm not drawing some of the auxiliary wires in order to minimize noise in an already noisy diagram.

Once we find a redex, we can reduce it, and restart the walk from the root⁵. If there is no redex, the walk will eventually return to the root, assuming that the net does not contain any vicious circles⁶.

Since we're walking through nodes connected to the root, disconnected components will not be reduced further. This algorithm searches for nodes that contribute to the initial tree, greedily expanding it until no further redexes exist. Any node added to the initial tree necessarily contributes to the final normalized term, since once a node is added to the initial tree it cannot be removed.
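
As a companion to the strict sketch earlier, here is a hypothetical sequential version of this walk in Rust, reusing the `Net` representation from that sketch. Again, this is an illustrative sketch rather than an actual implementation: it assumes the net contains no vicious circles, and it represents free wires, including the root, as unwired slots.

```rust
#[derive(Clone, Copy)]
enum Phase {
    Descend, // phase 1: entered through a principal port, still inside the initial tree
    Ascend,  // phase 2: entered through an auxiliary port at some point
}

/// Walk the net from the root and return the first redex the root term depends on,
/// or None if the part of the net reachable from the root is already normal.
/// `root` is the port that the root free wire is attached to.
fn find_needed_redex(net: &Net, root: Port) -> Option<(NodeId, NodeId)> {
    // Each entry is a port we are about to enter, plus the phase we are in.
    let mut stack = vec![(root, Phase::Descend)];
    while let Some((port, phase)) = stack.pop() {
        if port.slot != 0 {
            // Entered through an auxiliary port: switch to (or stay in) phase 2
            // and exit through this node's principal port. An unwired principal
            // port means we walked back out to a free wire; this branch is done.
            if let Some(next) = net.wires[port.node][0] {
                stack.push((next, Phase::Ascend));
            }
        } else if matches!(phase, Phase::Descend) {
            // Phase 1: entered through a principal port while descending the
            // initial tree, so exit through both auxiliary ports.
            for slot in [1, 2] {
                if let Some(next) = net.wires[port.node][slot] {
                    stack.push((next, Phase::Descend));
                }
            }
        } else {
            // Phase 2 and we arrived at a principal port: the node we just left
            // and this one form a redex needed by the root term.
            let from = net.wires[port.node][0].expect("principal ports of a redex are wired");
            return Some((from.node, port.node));
        }
    }
    None
}

/// Lazy normalization: reduce the first needed redex and restart from the root.
fn normalize_lazy(net: &mut Net, root: Port) {
    while let Some((a, b)) = find_needed_redex(net, root) {
        net.interact(a, b); // redexes queued by `interact` are simply ignored here
    }
}
```

Reduction reuses the strict rules via `net.interact`; whatever redexes `link` happens to queue along the way are ignored, since the next walk will rediscover any that actually matter.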

Memory Layout

When implementing a strict evaluator, one can keep a buffer of nodes where a node only stores pointers to its auxiliary ports. This is because we have a collection of redexes that we can grab from, and when performing a rewrite we only need to manipulate / rewire the auxiliary ports of the two interacting nodes. In the lazy case, however, we perform a walk over the net, and phase 2 of this walk goes from an aux port to a main port. This means that we must also store pointer information in the reverse direction⁷. It is unclear at the moment if there is a unifying memory layout for both of these evaluation strategies.
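
As a rough illustration of the difference, compare the following two hypothetical node layouts (the field names are illustrative, and pointers are left abstract as plain indices):

```rust
/// Strict evaluation: rewrites always start from a popped redex, so a node only
/// needs to know where its two auxiliary wires lead.
struct StrictNode {
    kind: u32,
    aux: [u32; 2], // indices of the ports its auxiliary wires connect to
}

/// Lazy evaluation: the phase-2 walk moves from an auxiliary port to a principal
/// port, so the principal wire must be navigable as well.
struct LazyNode {
    kind: u32,
    principal: u32, // the extra, "reverse" pointer the lazy walk needs
    aux: [u32; 2],
}
```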

Garbage Collection

One interesting practical drawback of the lazy algorithm is that disconnected components must be garbage collected somehow. In a strict setting, discarding a net by connecting it to an eraser will eventually reclaim all of the memory that was used to store that net. However, if a net becomes disconnected from the root in a lazy setting, we no longer perform interactions on it. Thus, we need some additional code / strategy to periodically detect such components and free their memory. If we devise such a mechanism, we gain the additional benefit that the memory can be freed without performing any interactions, though it takes some additional computation to detect disconnected nets.
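
One simple, hypothetical strategy, expressed against the `Net` sketch from earlier, is a periodic mark phase: mark every node reachable from the root by following wires, then reclaim whatever was never marked. No interactions are performed, but the traversal itself has a cost.

```rust
/// Return the nodes that are no longer reachable from the root and can be freed.
fn unreachable_nodes(net: &Net, root: Port) -> Vec<NodeId> {
    let mut reachable = vec![false; net.kinds.len()];
    let mut stack = vec![root.node];
    while let Some(n) = stack.pop() {
        // `replace` returns the previous mark, so each node is expanded only once.
        if !std::mem::replace(&mut reachable[n], true) {
            for slot in 0..3 {
                if let Some(p) = net.wires[n][slot] {
                    stack.push(p.node);
                }
            }
        }
    }
    (0..net.kinds.len()).filter(|&n| !reachable[n]).collect()
}
```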

Parallelism in Lazy Evaluation

One opportunity for parallelism in the lazy evaluation algorithm is in phase 1: when entering through a main port, spawn two threads to continue walking through both auxiliary ports. Detecting contention here is much more nuanced, as two walks can end up at the same redex such as in Figure 11. It is beyond the scope of this post to discuss potential parallel lazy implementations, as we're looking to focus more on the semantic differences between strict and lazy evaluation strategies.

However, it is important to note that the upper bound on the amount of parallelism (number of concurrent threads) is the number of leaves in the initial tree of the final term. This is because this algorithm can only be parallelized during phase 1, and phase 1 is the walk over the initial tree. Thus, if the final term is, say, a single eraser node, then no concurrent threads could have been spawned during reduction, no matter what the initial term was.

This is a heavy consequence of a lazy evaluation strategy for a variety of computations. For example, numerical computations that normalize to a single number (in some interaction system with native integers) will not benefit from any of the inherent parallelism of interaction nets.

Semantics of Lazy Evaluation

One interesting question is: which lambda calculus evaluation strategy does the lazy interaction net evaluation strategy correspond to? It turns out that it is something in between a weak head normal form and a strict normalization. Infinite / non-terminating terms that eventually become disconnected from the root will not result in non-termination of the entire term. However, if the root term is in fact infinite / non-terminating, then normalization will never complete.

If there is a path of beta-reductions that terminates and produces a finite term, the lazy reduction algorithm will find it.

This is not as lazy as a lambda calculus normalization strategy that computes a weak head normal form, as WHNF leaves the body of abstractions unnormalized. However, it is lazy in a call-by-need sense in that function arguments are normalized only when needed, implying that control flow mechanisms (such as branching) do not require any special nodes to short-circuit.

Of course, this algorithm is still Lévy/beta-optimal.

Conclusion

We covered the two most prominent normalization strategies for interaction nets and discussed their individual trade-offs. Specifically, the strict strategy has garbage collection for free and can easily be parallelized, but will not terminate if any subterm does not normalize. The lazy strategy is much more complex to parallelize, but can easily handle infinite terms, while still being beta-optimal. Additionally, we demonstrated that reference nodes in a strict setting are an intermediate mechanism that may balance the benefits of both normalization strategies.


1 To be more accurate, this family of interaction systems is parametrized by the number of binary node kinds and by the set of named nets. Additionally, the presented rewrite rules are also a family, or schema, of rewrite rules: there is one concrete interaction for every pair of node kinds.
2 Much like how our translations of lambda calculus terms would refer to nets through a free wire.
3 See the strong confluence section in the post on normalization.
4 Two distinct labels are used in this encoding, one for duplication and one for the application/abstraction nodes. Additionally, the actual encoding of the program is slightly longer. The encoding in the diagram uses a duplication to share a subterm just to minimize the figure, but is equivalent.
5 There's an immediate optimization here, where one can restart the walk from the previously entered aux port, instead of restarting from the root.
6 Vicious circles are nonsensical terms, such as a lambda returning itself as the body. These terms are "worse" than non-terminating terms, as they are "normalized" but cannot be read back. In some sense, these terms are not well-formed.
7 In the case of normalizing interaction combinators solely for the purposes of lambda calculus, it is possible to store only two pointers per node. Since we enter through positively polarized wires and exit through negative ones, we need only store the negative ports of each combinator. The traversal algorithm then is essentially chasing the value-producing wires.