
Cake: expressive algorithms and data structures in Scala

Cake is a Scala library for expressing algorithms and data structures in a declarative and typed manner, focusing on properties like:

  1. structure (directed graph, hierarchy)
  2. propagation of information (signal, feedback, random flow)
  3. parametrized node types
  4. IO behavior
  5. storage implementation (adjacency matrix/list)

It utilizes the cake pattern in Scala, combining traits as flexibly as possible while statically checking the dependency hierarchies between the properties listed above.
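The core mechanism behind this is Scala's self-type annotation: a trait declares which other traits it must be mixed in with, and the compiler rejects combinations that violate the declaration. A minimal, library-independent sketch (the names below are illustrative, not Cake's own):

trait Storage {
    def values: Seq[Double]
}

trait Summing {
    // self-type: Summing may only be mixed into something that also provides Storage
    this: Storage =>
    def total: Double = values.sum
}

// compiles: the Storage dependency of Summing is satisfied
val ok = new Storage with Summing { val values = Seq(1.0, 2.0, 3.0) }

// does not compile: Summing alone violates its self-type
// val bad = new Summing {}

Cake's traits (Nodes, Hierarchy, Directed, Signal, and so on) chain such constraints so that only meaningful combinations type-check.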

Backpropagation with Cake

A good example of the kind of algorithm and data structure pairing that can be created with Cake is an implementation of backpropagation for supervised learning in neural networks.

The relevant structure here is a hierarchical graph, i.e. a set of layers of nodes and directed weighted connections between them, with a beginning input layer, N hidden layers, and a final output layer. The operations which the algorithm utilizes include propagating a signal from the input layer to the output layer, propagating the error signal (feedback) in the opposite direction (backpropagation), and finally using the accumulated error information to update the weight values of the connections (an isolated process/flow, not a directed signal).

I'll explain the code for the backpropagation algorithm in Cake in a fragmented and nonlinear manner, focusing on the most important aspects first.

Essential definition

val bp = new Nodes[State with Order]
        with Hierarchy
        with Directed with AdjacencyList
        with Signal
        with Feedback
        with Flow
        with IO {

This piece of code begins the definition of the bp object--containing the necessary data with the right structure and supported operations for a backpropagation algorithm.

  1. Nodes[State with Order]
    This trait specifies that the bp object is a collection of nodes of type State with Order. State and Order are some basic characteristics Cake provides by default for nodes: the former adds a simple state: Double to nodes, and the latter is utilized to order the nodes inside a hierarchy.

  2. Hierarchy
    Tags the structure as a hierarchical ordering of nodes (e.g. a neural network or a binary tree). It specifies a requirement on the Nodes trait that it be parametrized with Order, a trait that adds an order: Int property to nodes.

[Figure fig_neural_network_1: a layered (hierarchical) neural network with input, hidden, and output layers]

  3. Directed with AdjacencyList
    The Directed trait adds weighted one-directional connections between pairs of nodes. Practically, each connection is a Double value signifying its weight/strength. However, a connection may or may not exist, so the implementation wraps it as an Option[Double], and the methods for accessing and modifying connections respect that interface (e.g. bp(1, 2) = None removes the connection from node 1 to node 2).

    AdjacencyList specifies neither a structural nor an operational property: it's merely an implementation detail that works together with the Directed trait and stores the weights in an adjacency list. This line of code could, for example, be changed to Directed with AdjacencyMatrix so that the actual storage for the structure's values is implemented differently--with no other changes to the code required.

  4. Signal, Feedback and Flow
    In addition to specifying structural properties, it is also useful to specify operational properties that describe how the structure processes information or, in more user-oriented terms, provide the basic operations necessary to implement more complex algorithms. Obviously, structure and operation are codependent, and Cake ensures that this dependence is satisfied through Scala's type system. (Currently the only structures of interest are graphs, so these three traits specifically describe the flow of information through a directed, weighted graph.)

    • Flow signifies that information flows through the structure without any structural or temporal dependence--a free-for-all via all connections. It applies to any structure that is a subtype of Nodes[_] with Directed, meaning that practically any graph can support it--however, this is still a statically-checked requirement.

    • Signal and Feedback are more specific types of information flow: they signify flows that have a structural dependence. In practice this means that the structure has to be hierarchical--a neural network is a great example, as it is essentially not an unordered set of nodes but a set of layers (most commonly 1 input, N hidden, and 1 output layer). These two traits require that the Hierarchy trait precede them. Hierarchy, in turn, specifies that the nodes are subtypes of Order, a type which Signal and Feedback operate upon.

  5. IO
    Another mostly implementational concern, but crucial nonetheless: it provides a protocol for feeding inputs into the structure and reading outputs from it, and that protocol enables connecting to other IO providers and processors.

    In the case of backpropagation, the IO provider (or source) could be a collection of pairs of input and target vectors, while the IO processor (or sink) could report on learning epochs (or even control the learning process itself).

Nodes

Cake doesn't have a specific node type that other nodes extend; rather, nodes are merely assemblages of various arbitrary properties. One example is the attribute of having a state, or a label signifying that the node is used as an output.

All nodes in a neural network obviously need a State attribute and an Order attribute (as they exist in a hierarchy). Those in the first layer need Input, and those in the last the Output label. Additionally for backpropagation, the output nodes will also contain a Target attribute, all nodes except input ones need an Error attribute, and there exists a single instance of a special Bias node in each non-output layer.
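The basic attributes State, Order, Input, and Output are provided by the library and are not shown in this article; as an assumption (consistent with how they are used below, e.g. input and output nodes exposing a state), they might look roughly like this:

// assumed sketches of the basic node attributes, not Cake's actual source
trait State  { var state: Double = 0.0 }  // a single mutable activation value
trait Order  { val order: Int }           // the node's layer index within a hierarchy
trait Input  extends State                // marks a node in the input layer
trait Output extends State                // marks a node in the output layer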

This is the code for these three node attributes:

trait Error {
    this: State =>
    var error: Double = 0.0
}

trait Target extends Error {
    this: Output =>
    var target: Double = 0.0
}

trait Bias extends Input {
    state = 1
}

Utilization of the cake pattern is evident here as well. In the Error trait definition, a constraint is placed using the this: State => self-type Scala syntax, so that the trait always has to be combined with a State node (as nodes with no states cannot err). Further on, the Target trait extends Error, placing a constraint that only Output nodes may have the target: Double property. Finally, Bias nodes are input nodes with their states set to 1.

Nodes are accessed through the nodes field/method: bp.nodes(nodeIndex).

Initializing bp with nodes

Inside the definition of the bp object,

// val bp = ...
//          ...
//          ... {

    val nodes =
        Hierarchy(2, 2, 1).nodes( // non-bias nodes
            () => new {val order = 0} with Input with Order,
            () => new {val order = 1} with State with Order with Error,
            () => new {val order = 2} with Output with Order with Target
        ) ++
        Nodes.nodes( // bias nodes
            ((() => new {val order = 0} with Bias with Order), 1),
            ((() => new {val order = 1} with Bias with Order), 1)
        )

The nodes property is an abstract member of the Nodes[T] trait, and is simply some covariant sequence of T: def nodes: Seq[_ <: T]. This abstract property obviously has to be defined by the implementing object, and here it's a simple val created with two builder methods, Hierarchy(rows...).nodes and Nodes.nodes. The hierarchy helper takes an instruction for creating a node in each layer i and expands it into N layers with the requested numbers of nodes.

The usage of the factory functions is obvious from their types:

Hierarchy (ordered):

def nodes[T <: Order : TypeTag](createInLayer: (()=>T)*)

The TypeTag implicit parameter is irrelevant for functionality; what matters is that this function creates nodes of type T, which is a subtype of Order. This is another example of a constraint being placed: createInLayer is a sequence of builder functions that can produce any type of node, as long as it mixes in the Order trait.

What's hidden here is the Hierarchy(2, 2, 1) part of the code in the example: this is a simple apply(Int*) function on the companion object Hierarchy that determines the dimensions of each layer (in this case, 2 nodes in the first and second layers and 1 in the final one--that is, the input, hidden, and output layers respectively). That information is utilized by the factory function to call the relevant builder methods the correct number of times.
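As an illustration (this mirrors the description above, not the library's actual evaluation), the call in the bp definition expands roughly as follows:

// Hierarchy(2, 2, 1).nodes(f0, f1, f2) calls each builder once per node in its layer,
// producing the ordered sequence f0(), f0(), f1(), f1(), f2() --
// two input nodes, two hidden nodes, and one output node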

Nodes (unordered):

def nodes[T : TypeTag](fs: (()=>T, Int)*)

This function takes pairs of builder functions and natural numbers that determine how many nodes each function should produce. It's obviously similar to the ordered factory function, with the distinction that no type constraint is placed upon the node objects produced.

Populating the structure with weights

The last section explained the process of initializing the structure with nodes; the connections between them, however, are yet to be defined. Connecting the nodes is always done outside of an object's definition, but utility methods are provided for that as well.

The code bp <= Hierarchy(2, 2, 1).random(-1, 1) initializes the bp object's weights to random real numbers between -1 and 1; however, this only applies to nodes created with the Hierarchy.nodes helper method, which means that bias nodes are excluded. The <= operator is provided by the Directed trait, and the Hierarchy.random(a, b) method returns an initializer object expected as an argument by this operator.

Since the bias nodes are specific to this algorithm, they're set manually with the following code:

bp.of[Bias with Order]. // all nodes from bp that are subtypes of Bias with Order
map(in => (in._1 -> in._2.order)). // (index, node) -> (index, order)
foreach { case (i: Int, inRow: Int) =>
    bp.order(inRow+1). // all nodes in the layer above bias
    filter(!bp.nodes(_).isInstanceOf[Bias]). // that aren't bias nodes themselves
    foreach { j => bp(i, j) = Some(Util.random(-1, 1)) } // connect from bias (i) to node (j)
}

The of[T <: N] method is provided by the Nodes[N] trait and returns all of the nodes of the requested subtype, keyed by their index in the node collection. The results of this method are usually then mapped, filtered, and otherwise processed to operate upon the relevant nodes' properties.
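A hypothetical usage, assuming the return type is a Map from index to node as described above:

val outs = bp.of[Output]                      // Map[Int, Output]: index -> output node
outs.values foreach { o => println(o.state) } // print the current output states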

Accessing weights

The Directed trait adds two apply methods that are used to access the weights with either this(...) inside the object definition or bp(...) outside.

  1. this(i: Int, j: Int): Option[Double]: connection from node i to node j, None if nonexistent, Some[Double] if existent

  2. this(i: Int): Map[Int, Double]: the map of incoming connections to node i, keyed by the indices of the source nodes (nonexistent connections are not included)

A third method from(i: Int): Map[Int, Double] returns an analogous map of outward connections from node i.
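A hypothetical illustration of the three accessors (the node indices here are arbitrary):

bp(0, 3)             // Option[Double]: the weight of the connection from node 0 to node 3
bp(0, 3) = Some(0.5) // set (or create) that connection's weight
bp(0, 3) = None      // remove the connection
bp(3)                // Map[Int, Double]: incoming connections to node 3, keyed by source index
bp.from(0)           // Map[Int, Double]: outgoing connections from node 0, keyed by target index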

Defining I/O operations

The IO trait requires the two following methods to be implemented:

def input(channels: Map[String, Seq[Double]]): Unit
def output(): Map[String, Seq[Double]]

The types of these functions are revealing: information is received and transmitted as sequences of doubles mapped onto channels, identified by names (strings).

The inputs for the backpropagation implementation are the input values and the target output values, so these sequences are provided on the appropriately-named channels "inputValues" and "targetValues". Its output essentially doesn't matter during training; however, the network still exposes the relevant states on the channels "inputs", "outputs", and "errors".
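For example, a single training case of the XOR problem used later in this article would arrive as a channel map like this:

// one training case for XOR: input (1, 0), target 1
Map("inputValues"  -> Seq(1.0, 0.0),
    "targetValues" -> Seq(1.0))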

This is the implementation of the above two functions by the backpropagation object:

def input(channels: Map[String, Seq[Double]]) = {
    (channels("inputValues") zip of[Input].values).
    foreach { case (x: Double, i: Input) => i.state = x }

    (channels("targetValues") zip of[Target].values).
    foreach { case (x: Double, t: Target) => t.target = x }

    signal(propagate)
    feedback(backpropagate)
    flow(updateWeight)
}

The first part of the input function is again self-descriptive: it assigns input and target values to the relevant nodes. The second part is more interesting: after recording input, the method also processes it. The three function calls (signal, feedback, and flow) form the backbone of the backpropagation algorithm. The functions that do the actual heavy-lifting--propagate, backpropagate, and updateWeight--will be explained later.

The output method:

def output = Map(
    "inputs" -> of[Input].values.toSeq.map(_.state),
    "outputs" -> of[Output].values.toSeq.map(_.state),
    "errors" -> of[Error].values.toSeq.map(_.error)
)

This is extremely simple: a map is created from channel names to the node values at the time of the call.

Operators

The IO trait is rather useless by itself; the point, however, is in the operators it provides for chaining I/O operations:

  1. >>(to: IO): connects the left-hand-side IO object's outputs to the right-hand-side IO object's inputs.
  2. +(b: IO): combines the input and output methods of the two provided IO objects into a new IO object.
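As a hypothetical illustration (the actual chain used in this article appears in the final section), two IO objects could be combined and piped through the network into a sink:

// sourceA, sourceB and sink are hypothetical IO objects
val pipeline = (sourceA + sourceB) >> bp >> sink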

The algorithm

As noted previously, the actual backpropagation is implemented through three helper functions, which are passed as parameters to the signal, feedback, and flow methods. These helper functions have a very restrictive type, Int => Unit, which implies that a helper function is applied to a sequence of node indices. The ordering of this sequence is what differentiates the signal, feedback, and flow methods, in the manner elaborated previously.
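To recap those orderings (stated here as an interpretation of the descriptions above, not as the library's documented contract):

// signal(f)   applies f to node indices ordered from the input layer towards the output layer
// feedback(f) applies f in the opposite, output-to-input order
// flow(f)     applies f to all node indices in no particular order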

For propagation:

def propagate(i: Int) = nodes(i).state =
    sig(this(i) map { case (from: Int, weight: Double) => nodes(from).state * weight } sum)

This is the simplest of the three helper functions, and its operation is obvious to anyone familiar with neural networks: the logistic (sigmoid) function applied to the sum of the products of the previous layer's states and the corresponding connection weights.
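The sig and dsig helpers are not shown in the article; standard definitions consistent with how they are used here would be:

// logistic (sigmoid) activation, and its derivative expressed in terms of the
// already-activated value y = sig(x), which is how dsig is applied in this code
def sig(x: Double): Double = 1.0 / (1.0 + math.exp(-x))
def dsig(y: Double): Double = y * (1.0 - y)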

Backpropagating the error:

def backpropagate(i: Int) = 
    nodes(i).asInstanceOf[Error].error = nodes(i) match {
        case o: Output with Target => (o.state - o.target)

        case s: State with Error => {
            val weights = from(i)

            (weights map { edge =>
                weights(edge._1) * nodes(edge._1).asInstanceOf[Error].error * dsig(nodes(edge._1).state)
            } sum)
        }
    }

The only side-effect of the backpropagate(i: Int) function is setting the error property of the i-th node. The code simply expects that the node at index i will be of type Error.

The from(i: Int) method was mentioned before as returning the map of outgoing connections from node i, keyed by the indices of the nodes it connects to; here, it is utilized to propagate the error backwards through those connections.

Finally, the flow method is called with the following function to update the weights:

def updateWeight(i: Int) = this(i) = this(i) map { case (from: Int, weight: Double) =>
    from -> (weight - learningRate * nodes(i).asInstanceOf[Error].error * dsig(nodes(i).state) * nodes(from).state)
}

The weight update could have been achieved via either the signal or the feedback method; however, the flow method is used to avoid unnecessary sorting of nodes: once the error has been calculated, the weight deltas in this algorithm are independent of each other.

Putting it all together

Once the algorithm and the structure have been defined, a set of input and target values is declared, and an IO chain is constructed:

val inputs = List(List(0.0, 0.0),
                  List(0.0, 1.0),
                  List(1.0, 0.0),
                  List(1.0, 1.0))

val or = List(0.0, 1.0, 1.0, 1.0) map (x=>List(x))
val xor = List(0.0, 1.0, 1.0, 0.0) map (x=>List(x))

def pickRandom(i: Int) = scala.util.Random.nextInt(inputs.length)

val training = Revolve("inputValues" -> inputs,
                       "targetValues" -> xor)(pickRandom) >>
               bp >>
               Print(10000)

Most of the above code is rather self-explanatory. Revolve is an IO source which, on each iteration, projects a randomly-picked pair of corresponding value sequences (here, an input vector and its target) onto the named output channels.

Running the example

Since the bp object is an IO element, it is natural to call the output method on it. The piping constructed via the >> operator ensures that the input/target values, processing, and reporting are done in the correct sequential order. The integer argument to the Print IO sink (10000) specifies that a report should be printed every 10000 iterations (the epoch size).
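The article does not show the driving loop itself; one plausible way to run the training (an assumption based on the IO protocol described above) is simply to invoke the pipeline's output method repeatedly:

// assumed driver: each call pulls one random training case through the chain,
// and Print(10000) reports once every 10000 calls
for (_ <- 1 to 100000) training.output()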