/ Mona

Fundamentals of Mona

I've settled on a lot of Mona features and concepts, and this post will sum up that design progress. Here's an introductory post that lays out the basic ideas and goals of the language. I've paused the development of the implementation for a while, mostly because I had just focused on the design, and worked on some other projects at Petnica.

I've begun writing the skeleton of the standard library, which can be seen on the monac repository on my GitHub: github.com/corazza/monac-scala/tree/master/mona/standard-library. The code is very volatile at the moment though, as the core concepts of the language are still changing rapidly.

What follows is a high-level overview of the most important facets of Mona. I'll try updating the post as the language evolves.

Basic example and description

This is an example of a (trivial) program that prints the last pair of coprime integers up to 10:

import Math( gcd )

data Pair a = Pair a a deriving (Show)

let coprimes = [Pair i j | i <- [1..10], j <- [1..10], gcd i j == 1]

main : () -> ()
main () = {
    last : &[a] -> &a
    last x::[] = x
    last x::xs = last xs
    
    let last_coprime = last coprimes
    
    print last_coprime
}

Most of the above code will be explained in more detail, but for now it is important to notice some of the most immediate structural and syntactic features of the language - namely:

  • the near-equivalence with Haskell syntax (but not semantics, of course);
  • expression blocks a la Scala and nested function definitions, both seen in the main function;
  • type inference - the type annotation is missing for the coprime binding, and the for main it could've been left out (function type annotations for top-level functions are encouraged for the sake of clarity);
  • pattern matching, as seen in the last function;
  • algebraic data types as seen in the definition of Pair a, which is a product type, and type classes with automatic deriving a la Haskell;
  • module system with control over what is imported/exported;
  • references (that are a bit idiosyncratic); and
  • currying - all functions are single-argument

These features are explained in more detail below, along with many others, e.g. ownership, list comprehensions, guards, etc.

It also needs to be noted that immutability is the default for bindings in Mona (explained further in the section Mutability). Another less obvious fact is that Mona references are a fusion of C++ and Rust ones - i.e. they have borrow semantics from Rust; but reference semantics from C++, meaning that no explicit operator is needed when passing arguments to functions by-reference - so when print takes coprimes as an argument, the list is borrowed and only a pointer to it is passed. Pass/return mechanisms and memory are discussed in more detail in the section Ownership and move semantics and those succeeding it.

Structure

Mona programs are collections of function, data type, type class, and global bindings, belonging to modules.

The primary structural difference is the lack of a Haskell-like where keyword - top-level names can be referenced by any module declaration in the file, and when specifying classes and instances, where is replaced by block syntax. Recursion and mutual recursion are achieved through forward-referencing rules similar to those in Scala: functions and lazy bindings can forward-reference each other if no statement between the reference and the declaration (inclusive) is a non-lazy binding.

The module syntax / some semantics is directly taken from Haskell as well. Module declarations effectively serve as export lists for files, and modules can span multiple files. By default, all top-level definitions in a file aren't exported (they reside in a conceptual file-private module).

Expressions and functions

The simplest expressions in Mona are literals (like "hello world" or 142) and function calls. Function application syntax is, as seen above, the same as in Haskell - it is enough to write the name of the function and a list of arguments - without parentheses or commas.

Function definition

Functions are defined through recipes, patterns, and guards.

Recipes and patterns

Recipes are a more convenient syntax for specifying different cases for the function using pattern matching, as in this example of a function computing Fibonacci numbers:

fib : Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

The last three lines specify three cases for the fib function: when it is called with the parameter 0, 1, or any other number. This is also a primitive example of pattern matching, which is explained below in full - but for now the intuitive idea is that the language matches the supplied parameter set from the first recipe/pattern to the last, and selects the first match. Also note that the parentheses in fib (n – 1) do not signify an argument list, but rather just wrap the expression n – 1 (needed because of operator precedence).

Guards

Guards are another part of the function definition syntax, and their purpose is also to separate the function in several cases. However, they are considered after a pattern has been matched, and involve boolean expressions to perform tests on parameters, rather than just matching the parameters' structure. Here is an example of a function with a single recipe (pattern) and three guards:

cost : &Car -> Int
cost (Car weight _ _) | weight < 10  = 0
                      | weight < 100 = 50
                      | otherwise    = 150

The otherwise keyword is the catch-all case, similar to _ in patterns.

Expression blocks

In addition to Haskellian expressions, Mona features expression blocks. Several expressions can be placed inside a block delimited by { and }, executed sequentially, and the value of the entire expression block will be the value of the last expression in the block. This effectively eliminates the need for most uses of the return keyword. Here is an example of expression block syntax:

main name::args = { // first expression block, value is () : ()
    println $ "running" ++ name

    let x = 5
    let y = { // second expression block, values is 50 : Int
        let z = x*10
        z + x
    }

    println $ y
}

Usually, the main function should feature a type annotation, since it may be defined to have various types - however, in this case, it was inferred to be main : [String] -> () (the final executable, of course, still returns 0 on a successful exit in a Unix system).

Lambdas and if

Here is a piece of code demonstrating lambdas and if expressions:

println $ filter { \x -> x `mod` 3*5 == 0 } [1..100]

[1..100] `foreach` { \x -> 
    if x `mod` 3*5 == 0
    then println x
    else ()
}

The constructs are explained in more detail individually in the following two paragraphs:

Lambdas

Lambdas, or function literals, are expressed starting with backslash/lambda (\ or λ), followed by a list of named arguments, a right arrow (-> or ), and then an expression. The syntax of lambda expressions doesn't require braces around them in general, but when being passed as arguments to other functions they do have to be wrapped (regular parentheses may be used as well).

If expressions

The if ... then ... else ... construct functions similar as in Haskell, and is analogous to the ternary ? operator in C-like languages, in that it is an expression with a value and a type, and not a control-flow statement. However, there is a significant difference from Haskell in that the resulting value and type of the entire if expression is inferred as () in cases where the else clause is missing.

E.g. the following code:

if a == b then { 
    234
}

Is equivalent (or expanded to) this form:

if a == b then { 
    234
    ()
} else ()

Operators

As in Haskell or Scala, there is no semantic difference between operators and functions. The difference is in the syntax - operators use infix, while regular function application uses postfix form by default. This is demonstrated above with e.g. the $ operator (which evaluates its right argument and passes it on as an argument to the left one, a function); and the `foreach` syntax, which switches a regular function to infix form.

List comprehensions

Provided in the first example, they function as in Haskell. They are best explained in the Haskell report.

Data type definitions

New types are defined using the data  keyword. There is a distinction between the type name, and constructor name(s) - a single type is identified by a module-unique name and a (potentially empty) list of type parameters that enable parametric polymorphism, additionally, each type can have any number of constructors with different names, which are used to actually create values of that type (it's most common to use the same name as the type if there is only a single constructor).

Here are some examples of different data type definitions:

// single constructor named after the type
data Point = Point Float Float

// sum type with two empty constructors, essentially unit + unit
data Bool = True | False

// parametric polymorphism
data Option a = Some a | None

// recursive type
data BinaryTree a = Node (BinaryTree a) a (BinaryTree a) | EmptyNode

Mona's type system is also inherited from Haskell, and it is algebraic: there are sum types and product types. Colloquially, sum types combine different types so that values can be either of those, for example either an integer or a character (yet still considered the same type, but with multiple cases) - they are analogous to the or operation in logic. Product types, analogous to and, create composite types, so that expressions could have both an integer and character components at the same time (similar to structs in C, except that fields are by default identified by order and not by name). This article goes much deeper into the topic of type algebra.

Another interesting class of types are tuples, which are essentially unnamed product types. Their syntax is the same as in Haskell.

Record syntax

Record syntax is used to create constructors that have named fields. Example:

data Vector a = Vector {
    length : Int
    capacity : Int
    storage : Unique a
}

This form introduces in the scope a new function for each field, which returns a reference to a specific field (mutability of the reference depends on the mutability of the original binding for the vector value). It is possible to have other, non-record constructors, when using record syntax.

Recursive types

The above example of BinaryTree a is especially interesting, because on of its constructors is recursive, i.e. one of the fields of the constructor is of the same type. Types with recursive constructors of that kind automatically become heap-only, and constituent fields of that type are implicitly heap pointers. Fields in other types and bindings that refer to values of recursive types have to do so explicitly via the Box type, which is explained later in more detail.

Type classes and class constraints

Another major feature of Mona are type classes, which are used to group abstract related functionality - like being a number, being comparable, or being convertible to a string. Here is an example of a type class for iterable collections:

class Copy itt => Iterate col itt {
    size        : &col a -> Int
    elem        : &col a -> &a -> Bool

    iterator    : &col a -> itt a
    hasNext     : itt a -> Bool
    next        : itt a -> itt a
    get         : itt a -> &a
}

The name of the type class is Iterate, and it has two type parameters, col and itt. A major difference between type classes in Haskell and in Mona is that, in Mona, they can also specify members, not just functions.

Class constraints

The first part of the declaration, Copy itt =>, is called a type constraint - it constrains the type parameter itt to be a member of another class, Copy. (The copy class doesn't specify functionality, it alters the way arguments are passed to functions. These mechanisms will be discussed later.)

Another example of a class constraint in a function:

printFirst : Iterate col _ => &col a -> a
printFirst xs = get $ first xs

The phrase Iterate col _ => before the function type signature constrains the col type parameter to be a member of the Iterate type class, along with another type (iterator for that collection) which is irrelevant for the current function (hence, _ instead of a more descriptive name).

Instances

There are two ways to make some type a member of a type class. The first, using the deriving keyword, is automatic, for example:

data Point = Point Float Float deriving (Show)

The above code makes the Point type a member of the Show class, meaning that it implements the show : Point -> String function - automatically. The string will look something like "Point 3.1 4.2".

The second, manual way, involves implementing the functions that the class specifies in an instance declaration, e.g.:

instance Iterate Vector {
    // unimplemented
    size = ???
    &vec `elem` &elem = ???

    // ...
}

Class definitions can specify default implementations, and instance declarations are required only to provide an essential, minimally required set of implementations. For example, to become a member of the Eq type class, a type only has to implement either == or != functions - since the Eq type class provides default implementations for both of them in terms of each other.

Functional dependencies

The compiler automatically infers a functional dependency in the form col -> itt between the two type parameters of Iterate since at least one of the declared functions didn't use the itt parameter. A functional dependency is a restriction on the combination of the two (or more) type parameters, meaning that some parameters uniquely determine others. In this case, the col parameter uniquely determines itt, so that only a single iterator type can be matched to each collection type (but it it irrelevant if the same iterator type is used for multiple collection types).

This is necessary because functions inside the class declaration which don't use all of the type parameters would not be statically dispatchable, since multiple possible definitions in different instance declarations could exist. For example, if there wasn't a functional dependency col -> itt, the function elem : &col a -> &a -> Bool* could have multiple valid definitions, since multiple instance declarations could use the same col type but a different itt type (which would be pointless, but must still be prevented).

Pattern matching

Pattern matching is one of the most useful features in Mona, which in essence allows for three things: matching values against other values (for example 1 with 1 or a sum type variant with itself), destructuring composed values (like tuples or other product types) in order to bind their constituent parts (like the x and y components of a Point), or finally a combination of the two (like matching the x with a concrete 0 and binding y locally).

In recipes

This is best explained using an example with the familiar map function, which creates a new list by applying a function to every element of another list:

map : (&a -> b) -> &[a] -> [b]
map _ [] = []
map f x::xs = (f x)::(map f xs)

There are two examples of patterns here, showcasing the first and the second use case. The first pattern is in the first recipe for map: most importantly, the second argument, a list to map over, only matches the empty list ([]). An interesting pattern is the underscore for the first argument in the first recipe (_) - it matches any value of the correct type, but doesn't bind it to a name. Essentially, it is a way of saying that the argument is still required, but irrelevant. In the above case, that makes sense, since any function mapped over an empty list will result in an empty list. The second pattern is an example of destructuring mentioned before. The second argument, a single list, is destructured into two constituent parts: its head, bound to x, and its tail (another list), bound to xs. This can be done for record syntax as well:

printLength &Vector { length=l } = println l

The argument here is a single vector, but it is destructured, and the only bound variable in the function is l, a reference to the vector's length. The part after field= is a pattern as well, i.e. if the field is a composite type it can be destructured further, or matched to a specific value.

With the match keyword

Pattern matching can also be done locally using the match keyword, to match an arbitrary value:

let some_list = [1, 2, 3]

print $ match some_list {
    case [] -> "empty",

    // x borrows from some_list
    case x::[] -> "single element",

    // matches anything, but only if no previous matches were found because it's the last match
    case _ -> "multiple elements"
}

In local matches, all bindings are implicitly references to the (inside of the) matched value. The mutability of the reference depends on the mutability of the original binding of the value being matched.

Ownership and move semantics

Ownership and move semantics are two concepts inherited from Rust and assure that many classes of errors, like like pointer aliasing, use-after-free, and multiple deallocations, are prevented.

Ownership

Bindings ("variables") own the objects they are bound to, and when the bindings go out of scope, Mona cleans up the resources and deallocates memory on the heap.

someFunc () = {
    let x = box 15
    let y = *x + 15

    // ...

    // x and y go out of scope
    // their destructors are called
    // heap memory of x is deallocated since the binding owned it
}

Bindings which own resources that need to be cleaned in a more complicated manner (like handles which need to be closed manually), can be made instances of the Destruct type class, and implement the destruct function, which is then called automatically when such bindings exit the scope.

Move semantics

Additionally, each object always has only a single owner. This means that assignment and argument-passing don't use copy semantics: they use move semantics. Effectively, once a local object is moved from one binding to another - no memcpy is done. Instead, the original binding is simply invalidated. For example:

data Test = Test deriving (Show)

let x = Test
let y = x

// works
println $ y

// compiler error: use after move
println $ x

The same logic works for argument passing: once an object is copied to the function's frame, it is also considered moved, and the original binding cannot be used locally after the call.

Copy semantics

Many types don't point to resources, and can safely be copied instead of moved. Such types are members of the Copy type class, which alters the assignment operation and argument passing in order to use copy semantics. Int and Float are good examples:

// Int
let x = 56

// perform a copy assignment
let y = x

// works since x wasn't moved
print x

User-defined data types aren't implicitly Copy, and can only be made such if all their constituents are also Copy, by deriving (Copy).

Borrowing and references

Instead of directly transferring ownership of resources, it is possible to merely borrow them over some duration, through the use of references, a Mona feature that combine similar concepts from Rust and C++.

Borrow semantics

By default, the ownership over an object is transferred into a function when it is applied as an argument, and the function is responsible for destructing the object and the related cleanup. However, a function often just requires a pointer to the object, without being responsible for its resources. For those cases, Mona provides references, which inherit their borrow semantics from Rust.

The map function can again serve as a good example (this time with explicit references markers - &):

map : (&a -> b) -> &[a] -> [b]
map _ &[] = []
map f &x::xs = (f x)::(map f xs)

The second argument of map, a list over which to map the function, is borrowed. This means that the list is not copied from the calling function's frame into map, and it is not owned by any binding in map which would destroy the list when it goes out of scope - but merely a pointer is passed. Moreover, map cannot hand over ownership of the list to another binding.

The first argument, a function of type (&a -> b), also includes a reference marker - but not in front of the map's argument (functions are Copy), but before the function's inner argument a. This means that the function which map accepts as an argument accepts its first argument by-reference.

The ampersands before arguments and their types mean that the argument in question is not moved into a function, but rather borrowed.

Reference semantics

There are, however, differences between Mona and Rust references. Firstly, a reference marker before a function argument, or in a let statement, does not modify the the type of the binding - the binding is still considered of type a, even if it is written as &a What is modified are the passing or binding mechanisms involved, however, in the case of functions, type overall type of the function is still considered modified, meaning that function types (a -> b) and (&a -> b) are separate types.

The behavior of references is, from that point on, similar to that in C++ - they behave as regular bindings, and do not have to be explicitly dereferenced.

Borrow inference

When no explicit function type information is available, the arguments are inferred to be non-reference. This can be overriden by explicit reference markers in the implementation of the function.

For example, the following function f takes its first argument by-move:

let f = \x -> ()

However, this version takes its first argument by-reference:

let f = \ &x -> ()

For example, the filtering function in filter is specified with the type &a -> Bool, meaning that filter can be called with a lambda expression which doesn't explicitly provide a reference marker, such as in the expression:

filter { \x -> x `mod` 3*5 == 0 } [1..100]

Returning references

Functions can return references, which is useful for user-defined pointer types, get function's for collections/iterators, and accessing record fields (although those functions are automatically generated).

An explicit reference marker can be placed in front of the expression being evaluated as the function's value, in case no function type information is available.

Xvalues / rvalue references

In Mona, an explicit distinction between rvalue and lvalue references is unnecessary because all assignments of objects with resources involve moves. Thusly, the only value categories that need to be defined on the level of the language are are lvalues and rvalues - rvalues can be promoted to the level of lvalues in the scope of another function when borrowed.

C++ introduced the notion of an xvalue, because moving is not defined on the level of the language, and functions like std::move, and move constructors, had to be overloaded with explicit rvalue reference types (Type&&). However, in Mona, it is statically known depending on the type of the objects involved, whether a move or copy will be performed (always a move except for Copy types).

Pointer types and the heap

There are several pointer types in Mona, but this post will focus on Box a - a pointer to an object of type a allocated on the heap. The data constructor Box is not directly available, the function box : a -> Box a is exported instead. The data pointed to by the box is owned by the binding of the box, and this fallacious example demonstrates the familiar ownership mechanism:

// pointer to an integer on the heap
let a = box 5

// move a to b
let b = a

// compiler error: use after move
print *a

The box function takes ownership of a provided value with type a, allocates the required space on the heap, and returns an owning pointer of type Box a to that value.

The expression *a is the dereference operator applied to the pointer a. Dereference functionality is provided through the type class Deref, and is further explained in the Mutability section.

Boxed values are automatically deallocated once they exit their scope, since Box belongs to the Destruct type class.

Mutability

Mutability is a property of lvalues that determines whether or not they are reassignable. Most often, it is associated with bindings, i.e. named entities created in let statements. Mutability is marked via the mut keyword, in two scenarios: mutable bindings, and mutable borrows.

In let statements (mutable bindings)

The keyword mut can be used to signify a mutable binding, i.e. a binding that can later be reassigned. Example:

let mut a = 5
a = a + 10

Mutable borrows

Another use for the mut keyword is to borrow a binding through mutable references, by postfixing the borrow marker (&) with it, e.g. the += operator can be defined in this way:

(+=) : Num a => &mut a -> &a -> ()
a += b = {
    a = a + b
}

However, a function can also return a mutable reference. In such cases, the function application expression is categorized as n lvalue, and can be assigned to.

Mutability of Box a and Deref types

The contents of a Box inherit mutability from the Box's bindings. The Deref type class, which Box is an instance of, actually declares two functions: deref : &Box a -> &a and derefMut : &mut Box a -> &mut a, and the dereference operations dispatch them according to mutability.

Conclusion

This concludes the high-level overview of the Mona core language features, and each topic will probably be considered in much more detail in some future post - but from perspectives like implementation or specific use-cases.

The two areas where Mona is still evolving too rapidly to write about are included data structures (such as lists) and standard library structure. I will possibly devote another similar post entirely to such topics. Another potential topic are functional patterns in Mona.