Enrico Z. Borba - Thoughts

This will hopefully be the first in a series of posts on the theory and practice of using interaction nets as a programming medium.

This post assumes some familiarity with lambda calculus but none with interaction nets.

Overview

The purpose of this post is to introduce interaction nets to an audience familiar with lambda calculus. This post will explain some of the general theory behind interaction nets and a particular instantiation of them with the purpose of normalizing a subset of the lambda calculus. Lastly, we will explain the theoretical and practical benefits of using such a system to normalize lambda calculus terms.

Background

Lafont initially introduced interaction net systems in his 1990 paper, Interaction Nets. Interaction nets are a general graph-rewriting system that allows one to define a set of node labels and rewrite rules, and in return provides some guarantees on the "behavior" of normalizations of graphs under these rewrite rules. They are a generalization of Girard's Proof Nets, in the sense that proof nets are an instance of an interaction net system. By way of this generalization, interaction nets are also intimately related with Linear Logic.

Interaction nets are a framework for describing a whole class of graph-rewriting systems. In the context of modeling or encoding lambda calculus, one system that can be used is that of the Interaction Combinators, also devised by Lafont. The system we will describes below is a similar one known as the Symmetric Interaction Combinators introduced by Mazza in 2007.

Definitions & An Example Interaction Net System

Nodes and Ports

An interaction net system is a set of nodes (also called agents) that can occur in a net (or graph). Each node has an ordered set of ports, which are where wires (or edges) can connect two nodes. Each node has a single principal port, and the other ports, if any, are known as auxiliary ports. It's important to emphasize that the auxiliary ports are ordered. For example, we can refer to the first or left auxiliary port of the nodes below when looking at them at some fixed orientation.

Note that we deliberately confuse between discussing the kinds of nodes (of which there are a fixed number when discussing any given interaction system) and the occurrences of nodes in a net (of which there can be unboundedly many).

Let's define an example interaction system with three kinds of nodes:

I've specifically drawn the nodes such that the pointy end of the triangle connected to a wire is that node's principal port. The first/second auxiliary port is the left/right one when looking at the nodes in this orientation.

I'll name these nodes:

Eraser ( ), which has no auxiliary ports.
Constructor ( ), which has two auxiliary ports.
Duplicator ( ), which has two auxiliary ports.

The number of auxiliary ports of a node is also known as its arity. Extending that pattern, nilary nodes have an arity of zero and binary nodes have arity two. In my opinion, drawing non-nilary (arity of at least zero) nodes as triangles (with colors or text inside) make it clear which port is the principal port and which are the auxiliary ports.

An example instance of a net in this three-node system could be:

I've omitted the greek-letter labels of the nodes, as they are uniquely identifiable by their shape and colors.

Notice that the distinction between principal and auxiliary ports does not restrict the construction of nets, as wires can connect any kind of port to any other kind of port.

Rewrite Rules

In order to perform any kind of computation with interaction nets, we need to define rewrite rules. The restrictions in the kind of rules we can define are part of what bestows interaction nets their interesting properties. Specifically: rewrite rules only apply to – and are uniquely identified by – two nodes connected by their principal ports. Since these are the portions of the graph that can be rewritten or reduced, we call them reducible expressions or active pairs. Rewrite rules are also known as interactions.

The six (tersely written) rewrite rules we will be working with for the purpose of this post are as follows:

In these rules the left-hand side of the squiggly line describe a pair of nodes connected by their principal port (colored in red). If such a pair is seen in a graph, it is rewritten to the right-hand side of the rule. The open wires above and below each rule are connected to some other unrelated part of the net.

The first three rules are known as commutations, and the last three rules as annihilations. Commutations occur when nodes of different kinds interact and annihilations occur when nodes of the same kind interact.

This system is known as 2-Symmetric Interaction Combinators, or 2-SIC. The "2" is because there are two binary nodes, and the "symmetric" is because the annihilation interaction is the same regardless of whether we are annihilating constructor or duplicator nodes.

It is in this system where we will encode and soundly evaluate a subset of the lambda calculus.

Properties of Interaction Nets

Strong Confluence

By restricting the rewrite rules to only apply to nodes that are connected via their principal ports, rewrites are local. They can be performed independently of each other, as a single node can never be part of two reducible expressions at the same time. Additionally, the rewrite rules cannot affect other existing active pairs. Specifically, performing a rewrite will not cause any other existing active pair to no longer become active. This results in a one-step diamond property known as strong confluence:

Assume that is some net. If can reduce to or in a single step, and , and can each reduce in a single step to . This property results in the astonishing fact: the number of rewrites required to normalize an interaction net is independent of the order of the rewrites. This is not true of other term-rewriting systems such as lambda calculus, where the number of beta-reductions that are performed to normalize a term will depend on the order of the rewrites.

Aside: Practical Consequences & Hardware

Since the rewrite rules are local, and can be applied without affecting other parts of a net, there is a clear potential for parallel reduction / normalization of interaction nets. Modern computers are the result of huge investment in research with the goal of improving upon the von Neumann architecture. Normalization of interaction nets doesn't necessarily fit well into this architecture, so modelling such a system is an area of ongoing research.

Additionally, software that is intended to perform interaction net reduction can choose to fix the interaction net system that is being reduced, as opposed to allowing the user to provide a set of nodes and rewrite rules. Programming languages such as Vine and HVM both are reduction engines of fixed interaction net systems stemming from Mazza's symmetric combinators.

GPU's also provide an interesting hardware medium to perform interaction net reduction. But again, the nuance here is in designing an appropriate memory layout and algorithm to perform reduction in order to maximally realize the theoretical benefits of interaction nets.

Encoding Lambda Calculus Terms

Translation

We've defined above an interaction system of three nodes (eraser, constructor, duplicator) and six rewrite rules. We now need to create a translation or mapping from lambda calculus terms to interaction nets, such that beta-reduction is respected after reinterpreting the normalized interaction net as a lambda calculus term. Let's call this mapping .

In order to define , we need to define how it will act on all possible lambda terms. This can be done inductively (recursively) over the two possible kinds of lambda terms, abstraction and application:

There are a few things to digest here:

We are using an open or free wire to refer to the nets in the image of .
The translation of abstraction and application both use the same kind of node, a constructor.
is not a node, we are just labeling a wire to more clearly show its purpose.
The dashed wire will connect to wherever is referenced in . There are three scenarios to consider here:
- if is not referenced in , then connect it to an eraser node.
- if is referenced once, connect it to where it's referenced.
- if is referenced more than once, we need to stack duplication nodes as necessary until we have the same number of free auxiliary wires as references of in . The number of interactions performed is independent of how we choose to arrange these duplication nodes, as every interaction consumes one and produces two.

Translating an Example Term

Let's attempt a sample lambda calculus reduction using our 2-SIC translation , specifically of the term . Using a friendlier function-definition syntax, the term we are interested in is where

We expect the reduction to result in or .

First, we must translate . There are two abstractions and one application, thus we need three constructor nodes. We also use twice so we will need one duplication node. It's a good exercise to attempt this translation yourself! Click below when you're ready to see how I've drawn it:

click to show translated net

The left "half" of this net is the translation of , the right half includes application at the head of the term, and the identity . The free wire is the resulting term after all interactions have been performed.

Reduction of an Example Term

In order to reduce this, we need to look at every active pair and apply the relevant rewrite rule. Again, like the translation, this is a good exercise to attempt yourself and start building familiarity with how this graph rewrite system behaves. Especially so with this example, as there is more than one possible reduction path.

Click below when you're ready to see how I reduced it! I'll color the active pairs about to be reduced in red for clarity.

click to show the reduction steps

Some interesting things to note:

This reduction took 4 steps. Every other possible reduction path will also – by strong confluence – take 4 steps.
At step (3), there are actually two active pairs, and either one can be reduced.
Intermediate nets – such as the one in step (3) – are not necessarily interpretable as lambda calculus terms.
The normalized net is equivalent to the encoding of , which is what we expected.

Limits of This Encoding

One important point to mention about this encoding is that it does not soundly normalize all lambda calculus terms. Specifically, duplicating a function that duplicates its argument will not necessarily result in the same term when reduced in lambda calculus (and may not result in a lambda calculus term at all). This is due to the fact that duplication nodes annihilate each other when interacted, instead of duplicating each other.

You cannot simply change this interaction however. The annihilation of duplication nodes is important for sound normalization of lambda calculus terms, and is a rule we used above in our sample reduction when duplicating the identity function. There are multiple routes towards remedying this, and I plan on discussing them in another post.

Additionally, "garbage collection" here is explicit. Throwing away a variable (connecting it to an eraser) explicitly destroys it. Because of how the interactions are defined however, the discarded term will be normalized. This is because the eraser node can only interact with nodes through their active port, so any active pairs inside of the erased term will be reduced before erasure. This is in contrast to the naive term-rewriting of lambda calculus, where a substitution into an unused variable simply causes the term to "disappear". In some sense, unnecessary work may be performed. In other sense however, it's optimal.

What's the point?

Lévy/Beta-Optimality

When normalizing lambda calculus terms via a traditional term-rewriting system, one might duplicate an unnormalized term, thereby duplicating the amount of work required to further reach the fully normalized term. Jean-Jacques Lévy defined a "minimum" number of beta-reductions required to normalize any lambda calculus term in his 1978 PhD thesis, Réductions correctes et optimales dans le lambda calcul. His definition of a minimum was essentially the number of unique beta-reductions that can appear when normalizing a term. Duplicated reducible expressions (unnormalized beta-reductions) are not double counted. It is against this minimum that one can compare different lambda calculus reduction techniques. With respect to Lévy's definition of the minimum number of beta-reductions, the reduction of lambda calculus terms via interaction nets with the above encoding is optimal.

The beta-optimality of interaction nets is owed to the fact that abstractions are duplicated incrementally. If one looks closely at the interaction net reduction of an abstraction being duplicated, they will notice that the duplication node passes through the abstraction node. The abstraction is duplicated into two, but the body is shared.

Beta-optimality does not imply that computational complexity (in terms of term size) of this algorithm is superior to that of other lambda calculus normalizers – such as Haskell's Spineless Tagless G-Machine. This is for two main reasons:

Counting beta-reductions is nuanced when leaving the traditional term-rewriting system of lambda calculus.
Counting beta-reductions alone does not account for other work being done.

For example, in the interaction net case, beta-reductions are probably most honestly represented as the number of interactions between two constructor nodes, one for an abstraction and the other for an application. All of the other interactions, however, will not be counted by this metric. In the case of Haskell's STG, it's also not obvious how to compute the number of beta-reductions required to terminate the program.

John Lamping

This encoding of lambda calculus terms and the algorithm for reduction is actually equivalent to a portion of an algorithm presented by John Lamping's 1990 paper An Algorithm for Optimal Lambda Calculus Reduction. Astonishingly, it seems that this paper and the development of interaction nets were independent, as Lamping mentions neither Lafont or Girard. The main difference (or omission) of the algorithm presented in this post and Lamping's is the lack of bracket nodes. These nodes are one way to resolve the duplication issue mentioned above.

Conclusion & Other Topics

Hopefully you've gotten a taste of what interaction nets can offer, at least with respect to lambda calculus. I personally find their theory very rich and distinct from anything else I've seen. As mentioned in the introduction, this is intended to be a series, and I'm hoping to cover topics such as:

lazy and strict symmetric interaction combinator reduction algorithms and their trade-offs
a type system to ensure soundness of reducing an encoding of (an extended) lambda calculus
theoretical bounds on the best and worst-case complexity of a beta-optimal lambda calculus evaluator
polarized interaction systems