Monads, part four: examples
This post is part of a series on monads. In the previous post, I presented a technique for implementing new, bespoke monads and applied that technique to well-known, standard ones. In this post, I am going to show two examples of bespoke monads I have recently crafted using this technique, comparing code with and without a monad for the same use-cases and walking through the process of defining the monadic operations.
Genetic search
In my post about genetic search, I presented code that used a monad
to thread through the random number generator. I did not, however, show the
code without a monad or explain how to come up with it. Let's think through how
that would work. Let's first define a Rng
type so we get some type safety
rather than just passing around a [Double]
everywhere. What API does it need?
We'll ignore its initialization here, which means it really just needs one
function: produce a random number. Because we don't have side-effects, this
same function needs to return the new "state" of the random number generator
(rng) so we can get another one next time. Otherwise, we'd get a constant
random number, which is not the most useful kind.
data Rng = Rng [Double]
rnd :: Rng -> (Rng, Double)
rnd (Rng ds) = (Rng (tail ds), head ds)
With that, we can write a few of the functions used in that blog post. In the interest of space, we're not going to rewrite the entire thing here1, but just look at three functions. Here they are:
make_solution :: Rng -> (Rng, (Double, Double))
make_solution rng_0 =
let (rng_1, rand_x) = rnd rng_0
(rng_2, rand_y) = rnd rng_1
in (rng_2, (rand_x * 10, rand_y * 10))
mutate :: Rng -> (Double, Double) -> (Rng, (Double, Double))
mutate rng_0 (x, y) =
let (rng_1, change_x) = rnd rng_0
(rng_2, dx) = rnd rng_1
(rng_3, change_y) = rnd rng_2
(rng_4, dy) = rnd rng_3
new_x = if change_x < 0.1 then x + dx - 0.5 else x
new_y = if change_y < 0.1 then y + dy - 0.5 else y
in (rng_4, (new_x, new_y))
step :: Rng -> [(solution, Double)] -> (Rng, [(solution, Double)])
step rng_0 prev =
let survivors = take 10 prev ++ take 3 (reverse prev)
(rng_1, children) = rep rng_0 87 (\rng_0 ->
let (rng_1, parent1) = carousel rng_0 prev
(rng_2, parent2) = carousel rng_1 prev
(rng_3, child) = crossover rng_2 (fst parent1) (fst parent2)
(rng_4, child') = mutate rng_3 child
in (rng_4, fit child'))
in (rng_1, srt (survivors <> children))
And here are the monadic versions for reference:
make_solution :: WithRandom (Double, Double)
make_solution = do
rand_x <- GetRand
rand_y <- GetRand
return (rand_x * 10, rand_y * 10)
mutate :: (Double, Double) -> WithRandom (Double, Double)
mutate (x, y) = do
change_x <- GetRand
dx <- GetRand
change_y <- GetRand
dy <- GetRand
let new_x = if change_x < 0.1 then x + dx - 0.5 else x
let new_y = if change_y < 0.1 then y + dy - 0.5 else y
return (new_x, new_y)
step :: [(solution, Double)] -> WithRandom [(solution, Double)]
step prev = do
let survivors = take 10 prev ++ take 3 (reverse prev)
children <- rep 87 (do
parent1 <- carousel prev
parent2 <- carousel (parent1 `Data.List.delete` prev)
child <- crossover (fst parent1) (fst parent2)
fit <$> mutate child)
return $ srt $ survivors <> children
It's not a landslide change, and it's really easy to map individual lines
between both styles. But if you have to edit one of them, in which one do you
think it's the least likely you'll make a mistake in threading through the
state of the random number generator? Which one is more work? If you have to
add a step to the non-monadic version, will you have the patience to increment
all the rng_x
variables, or will you go for a rng_2'
instead of rng_3
in
the middle? How does that approach scale in number of lines of code?
How confident are you the non-monadic step
function is correct as presented?
Is it easy to check at a glance whether the rng_1
on the last line is related
to the one set alongside parent1
?
Assuming we agree the monadic style is better in this case, how do we get
there? The structure of the monad is going to be determined by the operations
we want to have available; in this case, we have just one operation needed: get
the next random number. This operation depends only on the (implicit) state of
the monad, meaning we don't need to pass it any argument, and the goal is to
get back a random number, so we return a Double
. This yields the monadic
structure:
data WithRandom a where
Bind :: WithRandom a -> (a -> WithRandom b) -> WithRandom b
Return :: a -> WithRandom a
GetRand :: WithRandom Double
What about its interpretation? Building up a monad is all well and good, but we
need a run
function to give it meaning. In this case, all we really need to
do is thread through a [Double]
, which is reminiscent of the state monad,
only we're not trying to be as general. This means we can just copy over the
Bind
and Return
case from the state monad and just make sure we thread the
[Double]
through; it also means our run
function needs to take a [Double]
as its initial state. This yields:
run :: [Double] -> WithRandom a -> ([Double], a)
run rands = \case
Return a -> (rands, a)
Bind ma f ->
let (rands', a) = run rands ma
in run rands' (f a)
GetRand -> ???
What do we need to do with the GetRand
case? Well, we need to do two things:
- Return the next random number, and
- Make sure we update the state so the next number is not the same.
This, of course, is pretty much what the rnd
function did, and is trivial to
implement:
run :: [Double] -> WithRandom a -> ([Double], a)
run rands = \case
Return a -> (rands, a)
Bind ma f ->
let (rands', a) = run rands ma
in run rands' (f a)
GetRand -> (tail rands, head rands)
The final step is to update the code to use this monad. The process is fairly
easy: any function that used to take in a Rng
and return a (Rng, a)
now
takes one fewer argument and returns a WithRandom a
. As for the body of the
function, simply replace all (rng_x, a)
pattern matches with a <-
, remove
the rng
parameters to function calls, and replace rnd
with GetRand
. (Or
implement rnd
as rnd = GetRand
if you prefer lowercase actions.)
Hopefully this example reiterates the points of the previous post:
- Making a new monad is easy.
- The monadic form of the code is better, because computers are better than humans at keeping track of tedious things like not accidentally losing a state update.
- Switching code to monadic form is, in this case, pretty easy too; almost mechanical.
Next, let's look at a more complex example.
Interpreter
My second example is an interpreter (see full code for reference). As I described monads as mini-language interpreters, it should come as no surprise that they can be useful in writing actual interpreters. Let's start with the language we want to interpret. We're skipping over the text-to-ast parsing phase and we define the language directly in terms of the following data definitions:
newtype Name = Name String
deriving (Eq, Ord, Show)
newtype Value = Value Int
deriving (Eq, Show)
data Op
= Add
| Sub
| Mul
| NotEq
deriving Show
data Exp where
Lit :: Value -> Exp
Var :: Name -> Exp
Set :: Name -> Exp -> Exp
Bin :: Op -> Exp -> Exp -> Exp
Do :: [Exp] -> Exp
While :: Exp -> Exp -> Exp
Print :: Exp -> Exp
deriving Show
This is a very simple, imperative language, with a looping construct and arithmetic operations, as well as variable names. The values are always integers.
As an example program, here is some code to compute a factorial:
fact :: Int -> Exp
fact x =
let acc = Name "acc"
i = Name "i"
in
Do [
Set acc (Lit (Value 1)),
Set i (Lit (Value x)),
While (Bin NotEq (Lit (Value 1)) (Var i))
(Do [
Set acc (Bin Mul (Var acc) (Var i)),
Set i (Bin Sub (Var i) (Lit (Value 1))),
Print (Var acc)
])
]
Or, more precisely, this is a function which, given an integer, returns a program that will print each step of computing its factorial.
We can easily evaluate this without any monad, by just walking down the tree of
expressions and evaluating each node as we come back up. Let's first define
what evaluation means here: the language has a Print
operation, meaning we
need to consider side effects. It also has Set
on its variables, meaning we
need to carry around some sort of environment. Fortunately, this language only
has one, global scope (no let
or lambda
), so we do not need to worry about
nesting environments.
For the purpose of this blog post, we choose to represent the side-effects of the program by returning a value of type:
data TweIO = Output Int TweIO | Halt
deriving Show
append :: TweIO -> Value -> TweIO
append Halt (Value v) = Output v Halt
append (Output p io) v = Output p (append io v)
and we'll assume that the environment has this API:
newtype Env = Env (Data.Map Name Value)
deriving Show
mt_env :: Env
mt_env = Env Data.Map.empty
lookup :: Env -> Name -> Value
lookup (Env m) n = maybe undefined id (Data.Map.lookup n m)
insert :: Env -> Name -> Value -> Env
insert (Env m) n v = Env $ Data.Map.insert n v m
which, among other things, implies we assume all programs are correct and setup their variables before trying to read them.
With that out of the way, a simple tree-walking evaluator can be written as:
tree_walk_eval :: Exp -> TweIO
tree_walk_eval ex =
let (_, io, _) = loop ex Halt mt_env
in io
where
loop :: Exp -> TweIO -> Env -> (Value, TweIO, Env)
loop exp0 out0 env0 =
case exp0 of
Lit v -> (v, out0, env0)
Var n -> (lookup env0 n, out0, env0)
Set n exp1 -> let (v, out1, env1) = loop exp1 out0 env0
in (v, out1, insert env1 n v)
Bin op e1 e2 ->
let (v1, out1, env1) = loop e1 out0 env0
(v2, out2, env2) = loop e2 out1 env1
in ((bin op) v1 v2, out2, env2)
Do (exps) -> foldl (\(_, out1, env1) exp1 -> loop exp1 out1 env1) (undefined, out0, env0) exps
While condition body ->
let (Value c, out1, env1) = loop condition out0 env0
in if c == 1
then do
let (_, out2, env2) = loop body out1 env1
loop (While condition body) out2 env2
else (undefined, out1, env1)
Print exp1 -> let (v, out1, env1) = loop exp1 out0 env0
in (v, append out1 v, env1)
Here, each iteration of the loop returns a triple (Value, TweIO, Env)
, where:
Value
is the result of the (presumably nested) expression we evaluated,TweIO
is the output produced so far, which we need to keep track of, andEnv
is ambient state the nested call might have changed.
At this point, and especially in the context of this blog post, I'm hoping this type of structure is reminiscent of a monad. Specifically, it looks like we are trying to combine two "classic" monads: a state monad to keep track of the environment, and a logger monad to keep track of the side-effects.
So let's define a monad. The operations we need here are:
- Print something.
- Change a variable in the environment.
- Get the value of a variable from the environment.
This translates directly to a monadic structure:
data EvalExec a where
EvalBind :: EvalExec a -> (a -> EvalExec b) -> EvalExec b
EvalReturn :: a -> EvalExec a
EvalPrint :: Value -> EvalExec ()
EvalSet :: Name -> Value -> EvalExec ()
EvalLookup :: Name -> EvalExec Value
And the semantics are given by:
exec :: (Env, TweIO) -> EvalExec a -> (Env, TweIO, a)
exec (env_0, io_0) = \case
EvalBind ma f -> let (env_1, io_1, a) = exec (env_0, io_0) ma
in exec (env_1, io_1) (f a)
EvalReturn v -> (env_0, io_0, v)
EvalLookup n -> (env_0, io_0, lookup env_0 n)
EvalPrint v -> (env_0, append io_0 v, ())
EvalSet n v -> (insert env_0 n v, io_0, ())
With that monad defined, we can write another evaluation function using it:
twe_mon :: Exp -> TweIO
twe_mon exp =
let (_, io, _) = exec (mt_env, Halt) (eval exp)
in io
where
eval :: Exp -> EvalExec Value
eval = \case
Lit v -> return v
Var n -> do
v <- EvalLookup n
return v
Set n exp -> do
v <- eval exp
EvalSet n v
return v
Bin op e1 e2 -> do
v1 <- eval e1
v2 <- eval e2
return $ (bin op) v1 v2
Do exps -> do
mapM eval exps
return undefined
While condition body -> do
c <- eval condition
if Value 1 == c
then do
_ <- eval body
eval (While condition body)
else return undefined
Print exp -> do
v <- eval exp
EvalPrint v
return v
This looks like more code, as it makes more use of newlines, but I think there's an argument to be made for the readability of:
Bin op e1 e2 -> do
v1 <- eval e1
v2 <- eval e2
return $ (bin op) v1 v2
over
Bin op e1 e2 -> do
let (v1, out1, env1) = loop e1 out0 env0
let (v2, out2, env2) = loop e2 out1 env1
((bin op) v1 v2, out2, env2)
What if I'm not writing in Haskell?
So far we have a good, fairly simple approach to making monads, which works great in Haskell. How does that work in other languages? I've recently come across a use-case for a monad while writing Clojure code, which I'll walk through in my next post.