Lazy Normalization - Memory Layout (WIP)
2025-08-30

After writing the post on lazy normalization, I thought I understood enough to implement a lazy normalizer. This proved to be false.

This post is a work in progress. Its contents may change. If/when I'm satisfied with it, I'll remove this message and the "WIP" above.

Overview

This post explains a chain-free¹ memory layout for lazy normalization and details the graph traversal using this layout. The memory layout covers both how nodes and wires are represented and the memory transformations that each interaction performs.

Allocation

Ideally a VM's memory layout shouldn't care about how the allocator keeps track of free memory, but unfortunately the memory representation presented here will have such a coupling. This is because some of the node representations make use of free regions in memory to represent special values. So, we'll need to describe the allocator as well.

The allocator first allocates a single memory arena as large as the OS will allow. We'll refer to this region of memory as the heap. The heap is aligned to the nearest 16 bytes. This is because the allocator deals with double-word (128-bit) values.

The allocator has a simple API:

Figure 1: Allocator API
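Since Figure 1 isn't reproduced in this draft, here is a minimal sketch of what such an API could look like. The names (`alloc_double`, `free_double`) and the `Vec`-backed arena are assumptions, not the actual implementation:

```rust
// Hypothetical allocator API sketch: bump-allocates 128-bit slots from a
// fixed arena and recycles freed slots through a free list. A real
// implementation would mmap an arena rather than use a Vec.

const HEAP_WORDS: usize = 1 << 16; // stand-in for "as large as the OS will allow"

struct Allocator {
    heap: Vec<u64>,           // the arena; index i is one 8-byte word
    next: usize,              // bump pointer, in words; always even (16-byte aligned)
    free_doubles: Vec<usize>, // recycled 16-byte-aligned slots
}

impl Allocator {
    fn new() -> Self {
        Allocator { heap: vec![0; HEAP_WORDS], next: 0, free_doubles: Vec::new() }
    }

    /// Allocate a 128-bit value; the returned word index is always even.
    fn alloc_double(&mut self) -> usize {
        if let Some(at) = self.free_doubles.pop() {
            return at;
        }
        let at = self.next;
        self.next += 2;
        at
    }

    /// Free a full 128-bit value.
    fn free_double(&mut self, at: usize) {
        debug_assert!(at % 2 == 0, "double words are 16-byte aligned");
        self.free_doubles.push(at);
    }
}
```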

Memory Reuse

The allocator uses a free list to track free memory regions, which attempts to maximise the reuse of memory. If a single word is freed, the allocator checks whether the other half of its 16-byte slot is also free; if so, the whole 128-bit slot is freed. Since double words can only be allocated at 16-byte-aligned addresses, a free single word will not be reused unless its neighboring half is also free. The memory representation of nets explicitly relies on this fact: some owned double-word values keep one half free to represent a special form of the value.
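A minimal sketch of this coalescing rule, assuming hypothetical side tables (`free_doubles`, `half_free`) in place of a real intrusive free list. Addresses here are in 8-byte word units, so the two halves of a 16-byte slot differ only in their lowest bit:

```rust
use std::collections::HashSet;

struct FreeTracker {
    free_doubles: Vec<usize>,  // 16-byte-aligned, fully free slots
    half_free: HashSet<usize>, // 8-byte-aligned words freed on their own
}

impl FreeTracker {
    fn new() -> Self {
        FreeTracker { free_doubles: Vec::new(), half_free: HashSet::new() }
    }

    /// Free one 64-bit word at word index `at`.
    fn free_word(&mut self, at: usize) {
        let buddy = at ^ 1; // the other half of the same 16-byte slot
        if self.half_free.remove(&buddy) {
            // Both halves are now free: the whole 128-bit slot is reusable.
            self.free_doubles.push(at & !1);
        } else {
            // A lone free word is never handed out on its own.
            self.half_free.insert(at);
        }
    }

    /// Double words are only ever allocated from fully free slots.
    fn alloc_double(&mut self) -> Option<usize> {
        self.free_doubles.pop()
    }
}
```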

In-Memory Representation of Nodes

Negative Ports (or Positive Wires)

We will be storing polarized -SIC nodes. Roughly, every 64-bit value on the heap is a negative port on the net. The way I prefer to view this is: values on the heap tell you what is on the other side of the positive end of a wire. There are some exceptions to this, but this is broadly the case.

Every double-word value on the heap, except the root at index 0, describes some node. For nodes with two negative ports, each word is one of those ports. This is because negative ports are connected to the positive end of a wire. The exception noted above occurs for nodes with a single negative port: one word is that negative port and the other is some node-dependent information.

The memory layout of these 64-bit, single-word values is:

Figure 2: Memory layout of a port

The lowest 4 bits are a Tag, and these guide how to interpret the remaining 60 bits. Here are the tags:

Figure 3: Possible tags

The highest 16 bits are labels used by the duplication and superposition nodes, where each label represents a different kind of node. The application/lambda nodes are an additional kind of node, so the in -SIC here is equal to .

The 44-bit addresses are actually 48-bit user-space pointers into the heap². Since these pointers are 16-byte aligned, the bottom 4 bits are all 0. Thus, we can store the Tag inline with these addresses. Note that not all node tags store an address.
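Putting the tag, address, and label rules together, the packing might look like this. The mask and shift constants follow the bit positions described above (tag in bits 0..3, address in bits 4..47, label in bits 48..63), but the helper names are assumptions:

```rust
// Sketch of port packing: a 64-bit word holding a 4-bit tag, a
// 16-byte-aligned 48-bit address (whose zero low bits make room for the
// tag), and a 16-bit label.

type Port = u64;

const TAG_MASK: u64 = 0xF;
const ADDR_MASK: u64 = 0x0000_FFFF_FFFF_FFF0; // bits 4..47
const LABEL_SHIFT: u64 = 48;

fn pack(tag: u64, addr: u64, label: u64) -> Port {
    debug_assert_eq!(addr & !ADDR_MASK, 0, "addr must be 16-byte aligned and < 2^48");
    (label << LABEL_SHIFT) | addr | tag
}

fn tag(p: Port) -> u64 {
    p & TAG_MASK
}

fn addr(p: Port) -> u64 {
    p & ADDR_MASK
}

fn label(p: Port) -> u64 {
    p >> LABEL_SHIFT
}
```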

Here is a summary of how each tag uses its bits:

Figure 4: Tag-dependent port interpretations

Some additional clarity on the notation:

Notation

Ports which make use of their address bits expect a specific structure at that address in the heap. We will detail each of those structures now. In the memory layouts that follow, we'll use code-script to represent arbitrary values on the heap (e.g. fun or arg). Math-script is used to represent memory addresses (e.g. or ).

When displaying these nodes, memory addresses with only a single port next to them are 8-byte aligned addresses, and addresses with two ports are 16-byte aligned. For example, consider the diagram

Above, is 8-byte aligned while is 16-byte aligned. Lastly, an address with an apostrophe is the second half of a 16-byte-aligned address. For example, would refer to arg above.

Additionally, if we show a port with a tag of ., this implies that only one tag is possible given the context. This is done to decrease noise in the memory diagrams. For example, we will see that in an example layout like this

The only possibility for the tag in is VarAddr, since it is pointed to by a node.

Lastly, we use subscripts to refer to the label portion of a port. For example, is a port with the Sup tag, label , and an address of .

Application Node

Figure 5: Layout of an application node

Lambda Node

Figure 6: Layout of a lambda node

Some important things to note:

Lastly, there is space in and ports to store a label, so we could have labelled lambda and application nodes as well, to encode data structures.

Duplication Node

Figure 7: Layout of a duplication node

A few things to note here:

Superposition Node

Figure 8: Layout of a superposition node

Reference Node

Figure 9: Layout of a reference node

Note that is not an address, but rather just an index into a global array of named nets.

Root

The root of the net is the single free positive wire that is "held on to" during normalization. This is the very first port in the heap, the left half of index 0. The second half of index 0 is always free.

Interactions

Now we will explain the memory transformations that occur during interactions.

App - Null

Problems:

Dup - Null

App - Lam

Problems:

Dup - Sup

Notes:

Dup - Lam

Notes:

App - Sup

Notes:

Walk Optimization

When performing lazy reduction, we traverse the net in a specific way to find active pairs. Once a single active pair is reduced, we restart the walk at the root, repeating until a walk comes across no active pairs. Since the walk procedure is deterministic, we can optimize the restarting step by resuming the walk at the most recently entered aux port after each active pair reduction. To do this, we maintain a stack of the nodes entered in phase 2⁵ and pop from that stack during walk restarts. Nodes visited during phase 1 do not need to be tracked, as they can never be interacted with, and in some sense are already normalized.
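The driver loop described above can be sketched as follows, with `Step` and the `walk` hook as hypothetical stand-ins for the real traversal. The walk itself is assumed to advance past aux ports it has already normalized:

```rust
// Sketch of the restart optimization: instead of restarting at the root
// after each reduction, resume from the most recently entered phase-2
// aux port, which sits on top of the stack.

type Addr = usize;

/// Hypothetical outcome of walking from one position.
enum Step {
    Entered(Addr), // entered a node's aux port in phase 2; keep walking
    Reduced,       // an active pair was found and reduced
    Done,          // walked to a normal form with no redex
}

fn normalize(root: Addr, mut walk: impl FnMut(Addr) -> Step) {
    let mut stack: Vec<Addr> = vec![root];
    while let Some(at) = stack.pop() {
        match walk(at) {
            Step::Entered(aux) => {
                // Track phase-2 entries so a later restart can resume here.
                stack.push(at);
                stack.push(aux);
            }
            Step::Reduced => {
                // After a reduction, re-walk from this position; the next
                // pops resume at the most recently entered aux ports.
                stack.push(at);
            }
            Step::Done => {} // fully walked; nothing to resume
        }
    }
}
```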

Problems


1 In most interaction net runtimes, there is a way to explicitly represent a wire. Sequences of these wires can form, which I'll refer to as a chain. These are essentially indirections that can increase in length as the network is reduced. The memory representation detailed in this post makes such chains unrepresentable, and thus interactions are always constant time. Whether or not this is practically useful is unclear. There are known algorithms where chain lengths grow logarithmically in the number of interactions, and the implementations of the interactions in such a runtime can be much simpler than what is presented here.
2 This representation explicitly relies on the fact that most 64-bit processors do not use the upper 16 bits of the addressing space.
3 While this space is typically reserved only for 44-bit values, the 45th bit is actually stored as part of the tag. This is done by having two possible values be interpreted as a DupXor tag, 0b0000 and 0b1000. Notice that the bottom three bits are the same for both of these tags, but the 4th bit can be either one or zero. This 4th bit is the 45th bit of the xor value.
4 We could have also done the same two-tag trick as DupXor uses to store a 45-bit address. However, VarAddr has no label, so this is arguably a better use of the space available.
5 This is the same "phase 2" that was described in the lazy reduction post.
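The two-tag trick from footnote 3 can be sketched as follows, with the bit positions taken from the port layout above and all helper names assumed:

```rust
// Sketch of the 45-bit DupXor encoding: both tag values 0b0000 and 0b1000
// decode as DupXor, so the 4th port bit doubles as bit 44 (the 45th bit)
// of the stored xor value, while bits 4..47 hold the low 44 bits.

fn is_dup_xor(port: u64) -> bool {
    (port & 0b111) == 0b000 // both tag encodings share their low 3 bits
}

fn pack_dup_xor(value45: u64) -> u64 {
    debug_assert!(value45 < (1 << 45));
    let low44 = (value45 & ((1 << 44) - 1)) << 4; // bits 4..47
    let bit45 = (value45 >> 44) << 3;             // stored in the tag's 4th bit
    low44 | bit45
}

fn unpack_dup_xor(port: u64) -> u64 {
    let low44 = (port >> 4) & ((1 << 44) - 1);
    let bit45 = (port >> 3) & 1;
    low44 | (bit45 << 44)
}
```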