9 May 2021

Monads, part four: examples

This post is part of a series on monads. In the previous post, I presented a technique for implementing new, bespoke monads and applied that technique to well-known, standard ones. In this post, I am going to show two examples of bespoke monads I have recently crafted using this technique, comparing code with and without a monad for the same use-cases and walking through the process of defining the monadic operations.

In my post about genetic search, I presented code that used a monad to thread through the random number generator. I did not, however, show the code without a monad or explain how to come up with it. Let's think through how that would work. Let's first define a Rng type so we get some type safety rather than just passing around a [Double] everywhere. What API does it need? We'll ignore its initialization here, which means it really just needs one function: produce a random number. Because we don't have side-effects, this same function needs to return the new "state" of the random number generator (rng) so we can get another one next time. Otherwise, we'd get a constant random number, which is not the most useful kind.

data Rng = Rng [Double]

rnd :: Rng -> (Rng, Double)
rnd (Rng ds) = (Rng (tail ds), head ds)

With that, we can write a few of the functions used in that blog post. In the interest of space, we're not going to rewrite the entire thing here1, but just look at three functions. Here they are:

make_solution :: Rng -> (Rng, (Double, Double))
make_solution rng_0 =
  let (rng_1, rand_x) = rnd rng_0
      (rng_2, rand_y) = rnd rng_1
  in (rng_2, (rand_x * 10, rand_y * 10))

mutate :: Rng -> (Double, Double) -> (Rng, (Double, Double))
mutate rng_0 (x, y) =
  let (rng_1, change_x) = rnd rng_0
      (rng_2, dx) = rnd rng_1
      (rng_3, change_y) = rnd rng_2
      (rng_4, dy) = rnd rng_3
      new_x = if change_x < 0.1 then x + dx - 0.5 else x
      new_y = if change_y < 0.1 then y + dy - 0.5 else y
  in (rng_4, (new_x, new_y))

step :: Rng -> [(solution, Double)] -> (Rng, [(solution, Double)])
step rng_0 prev =
  let survivors = take 10 prev ++ take 3 (reverse prev)
      (rng_1, children) = rep rng_0 87 (\rng_0 ->
        let (rng_1, parent1) = carousel rng_0 prev
            (rng_2, parent2) = carousel rng_1 prev
            (rng_3, child) = crossover rng_2 (fst parent1) (fst parent2)
            (rng_4, child') = mutate rng_3 child
        in (rng_4, fit child'))
      in (rng_1, srt (survivors <> children))

And here are the monadic versions for reference:

make_solution :: WithRandom (Double, Double)
make_solution = do
  rand_x <- GetRand
  rand_y <- GetRand
  return (rand_x * 10, rand_y * 10)

mutate :: (Double, Double) -> WithRandom (Double, Double)
mutate (x, y) = do
  change_x <- GetRand
  dx <- GetRand
  change_y <- GetRand
  dy <- GetRand
  let new_x = if change_x < 0.1 then x + dx - 0.5 else x
  let new_y = if change_y < 0.1 then y + dy - 0.5 else y
  return (new_x, new_y)

step :: [(solution, Double)] -> WithRandom [(solution, Double)]
step prev = do
  let survivors = take 10 prev ++ take 3 (reverse prev)
  children <- rep 87 (do
    parent1 <- carousel prev
    parent2 <- carousel (parent1 `Data.List.delete` prev)
    child <- crossover (fst parent1) (fst parent2)
    fit <$> mutate child)
  return $ srt $ survivors <> children

It's not a landslide change, and it's really easy to map individual lines between both styles. But if you have to edit one of them, in which one do you think it's the least likely you'll make a mistake in threading through the state of the random number generator? Which one is more work? If you have to add a step to the non-monadic version, will you have the patience to increment all the rng_x variables, or will you go for a rng_2' instead of rng_3 in the middle? How does that approach scale in number of lines of code?

How confident are you the non-monadic step function is correct as presented? Is it easy to check at a glance whether the rng_1 on the last line is related to the one set alongside parent1?

Assuming we agree the monadic style is better in this case, how do we get there? The structure of the monad is going to be determined by the operations we want to have available; in this case, we have just one operation needed: get the next random number. This operation depends only on the (implicit) state of the monad, meaning we don't need to pass it any argument, and the goal is to get back a random number, so we return a Double. This yields the monadic structure:

data WithRandom a where
  Bind :: WithRandom a -> (a -> WithRandom b) -> WithRandom b
  Return :: a -> WithRandom a
  GetRand :: WithRandom Double

What about its interpretation? Building up a monad is all well and good, but we need a run function to give it meaning. In this case, all we really need to do is thread through a [Double], which is reminiscent of the state monad, only we're not trying to be as general. This means we can just copy over the Bind and Return case from the state monad and just make sure we thread the [Double] through; it also means our run function needs to take a [Double] as its initial state. This yields:

run :: [Double] -> WithRandom a -> ([Double], a)
run rands = \case
  Return a -> (rands, a)
  Bind ma f ->
    let (rands', a) = run rands ma
    in run rands' (f a)
  GetRand -> ???

What do we need to do with the GetRand case? Well, we need to do two things:

  1. Return the next random number, and
  2. Make sure we update the state so the next number is not the same.

This, of course, is pretty much what the rnd function did, and is trivial to implement:

run :: [Double] -> WithRandom a -> ([Double], a)
run rands = \case
  Return a -> (rands, a)
  Bind ma f ->
    let (rands', a) = run rands ma
    in run rands' (f a)
  GetRand -> (tail rands, head rands)

The final step is to update the code to use this monad. The process is fairly easy: any function that used to take in a Rng and return a (Rng, a) now takes one fewer argument and returns a WithRandom a. As for the body of the function, simply replace all (rng_x, a) pattern matches with a <-, remove the rng parameters to function calls, and replace rnd with GetRand. (Or implement rnd as rnd = GetRand if you prefer lowercase actions.)

Hopefully this example reiterates the points of the previous post:

  1. Making a new monad is easy.
  2. The monadic form of the code is better, because computers are better than humans at keeping track of tedious things like not accidentally losing a state update.
  3. Switching code to monadic form is, in this case, pretty easy too; almost mechanical.

Next, let's look at a more complex example.

Interpreter

My second example is an interpreter (see full code for reference). As I described monads as mini-language interpreters, it should come as no surprise that they can be useful in writing actual interpreters. Let's start with the language we want to interpret. We're skipping over the text-to-ast parsing phase and we define the language directly in terms of the following data definitions:

newtype Name = Name String
  deriving (Eq, Ord, Show)
newtype Value = Value Int
  deriving (Eq, Show)

data Op
  = Add
  | Sub
  | Mul
  | NotEq
  deriving Show

data Exp where
  Lit :: Value -> Exp
  Var :: Name -> Exp
  Set :: Name -> Exp -> Exp
  Bin :: Op -> Exp -> Exp -> Exp
  Do :: [Exp] -> Exp
  While :: Exp -> Exp -> Exp
  Print :: Exp -> Exp
  deriving Show

This is a very simple, imperative language, with a looping construct and arithmetic operations, as well as variable names. The values are always integers.

As an example program, here is some code to compute a factorial:

fact :: Int -> Exp
fact x =
  let acc = Name "acc"
      i = Name "i"
  in
  Do [
    Set acc (Lit (Value 1)),
    Set i (Lit (Value x)),
    While (Bin NotEq (Lit (Value 1)) (Var i))
      (Do [
        Set acc (Bin Mul (Var acc) (Var i)),
        Set i (Bin Sub (Var i) (Lit (Value 1))),
        Print (Var acc)
      ])
  ]

Or, more precisely, this is a function which, given an integer, returns a program that will print each step of computing its factorial.

We can easily evaluate this without any monad, by just walking down the tree of expressions and evaluating each node as we come back up. Let's first define what evaluation means here: the language has a Print operation, meaning we need to consider side effects. It also has Set on its variables, meaning we need to carry around some sort of environment. Fortunately, this language only has one, global scope (no let or lambda), so we do not need to worry about nesting environments.

For the purpose of this blog post, we choose to represent the side-effects of the program by returning a value of type:

data TweIO = Output Int TweIO | Halt
  deriving Show

append :: TweIO -> Value -> TweIO
append Halt (Value v) = Output v Halt
append (Output p io) v = Output p (append io v)

and we'll assume that the environment has this API:

newtype Env = Env (Data.Map Name Value)
  deriving Show

mt_env :: Env
mt_env = Env Data.Map.empty

lookup :: Env -> Name -> Value
lookup (Env m) n = maybe undefined id (Data.Map.lookup n m)

insert :: Env -> Name -> Value -> Env
insert (Env m) n v = Env $ Data.Map.insert n v m

which, among other things, implies we assume all programs are correct and setup their variables before trying to read them.

With that out of the way, a simple tree-walking evaluator can be written as:

tree_walk_eval :: Exp -> TweIO
tree_walk_eval ex =
  let (_, io, _) = loop ex Halt mt_env
  in io
  where
  loop :: Exp -> TweIO -> Env -> (Value, TweIO, Env)
  loop exp0 out0 env0 =
    case exp0 of
      Lit v -> (v, out0, env0)
      Var n -> (lookup env0 n, out0, env0)
      Set n exp1 -> let (v, out1, env1) = loop exp1 out0 env0
                    in (v, out1, insert env1 n v)
      Bin op e1 e2 ->
        let (v1, out1, env1) = loop e1 out0 env0
            (v2, out2, env2) = loop e2 out1 env1
        in ((bin op) v1 v2, out2, env2)
      Do (exps) -> foldl (\(_, out1, env1) exp1 -> loop exp1 out1 env1) (undefined, out0, env0) exps
      While condition body ->
        let (Value c, out1, env1) = loop condition out0 env0
        in if c == 1
           then do
             let (_, out2, env2) = loop body out1 env1
             loop (While condition body) out2 env2
           else (undefined, out1, env1)
      Print exp1 -> let (v, out1, env1) = loop exp1 out0 env0
                    in (v, append out1 v, env1)

Here, each iteration of the loop returns a triple (Value, TweIO, Env), where:

  • Value is the result of the (presumably nested) expression we evaluated,
  • TweIO is the output produced so far, which we need to keep track of, and
  • Env is ambient state the nested call might have changed.

At this point, and especially in the context of this blog post, I'm hoping this type of structure is reminiscent of a monad. Specifically, it looks like we are trying to combine two "classic" monads: a state monad to keep track of the environment, and a logger monad to keep track of the side-effects.

So let's define a monad. The operations we need here are:

  • Print something.
  • Change a variable in the environment.
  • Get the value of a variable from the environment.

This translates directly to a monadic structure:

data EvalExec a where
  EvalBind :: EvalExec a -> (a -> EvalExec b) -> EvalExec b
  EvalReturn :: a -> EvalExec a
  EvalPrint :: Value -> EvalExec ()
  EvalSet :: Name -> Value -> EvalExec ()
  EvalLookup :: Name -> EvalExec Value

And the semantics are given by:

exec :: (Env, TweIO) -> EvalExec a -> (Env, TweIO, a)
exec (env_0, io_0) = \case
  EvalBind ma f -> let (env_1, io_1, a) = exec (env_0, io_0) ma
                   in exec (env_1, io_1) (f a)
  EvalReturn v -> (env_0, io_0, v)
  EvalLookup n -> (env_0, io_0, lookup env_0 n)
  EvalPrint v -> (env_0, append io_0 v, ())
  EvalSet n v -> (insert env_0 n v, io_0, ())

With that monad defined, we can write another evaluation function using it:

twe_mon :: Exp -> TweIO
twe_mon exp =
  let (_, io, _) = exec (mt_env, Halt) (eval exp)
  in io
  where
  eval :: Exp -> EvalExec Value
  eval = \case
    Lit v -> return v
    Var n -> do
      v <- EvalLookup n
      return v
    Set n exp -> do
      v <- eval exp
      EvalSet n v
      return v
    Bin op e1 e2 -> do
      v1 <- eval e1
      v2 <- eval e2
      return $ (bin op) v1 v2
    Do exps -> do
      mapM eval exps
      return undefined
    While condition body -> do
      c <- eval condition
      if Value 1 == c
      then do
        _ <- eval body
        eval (While condition body)
      else return undefined

    Print exp -> do
      v <- eval exp
      EvalPrint v
      return v

This looks like more code, as it makes more use of newlines, but I think there's an argument to be made for the readability of:

    Bin op e1 e2 -> do
      v1 <- eval e1
      v2 <- eval e2
      return $ (bin op) v1 v2

over

      Bin op e1 e2 -> do
        let (v1, out1, env1) = loop e1 out0 env0
        let (v2, out2, env2) = loop e2 out1 env1
        ((bin op) v1 v2, out2, env2)

What if I'm not writing in Haskell?

So far we have a good, fairly simple approach to making monads, which works great in Haskell. How does that work in other languages? I've recently come across a use-case for a monad while writing Clojure code, which I'll walk through in my next post.


  1. I, however, did rewrite the entire thing without a monad, in order to make sure the code I am showing works as advertized as part of a larger whole. You can see the entire thing here.

Tags: monad-tutorial