Monads, part six: But, really, what is a monad?
In the first entry in this series, I argued that "what is a monad?" is not a useful question for the working programmer. In the rest of the series so far, I have explained how to recognize situations in which a monad could be useful, how to apply monads, and how to create bespoke ones.
But I still have not really defined what a monad is, and, while I stand by the argument that the answer is not useful, I can imagine that not having the answer may be a bit frustrating. In this post, I'll try to explain what a monad is, as best I can.
If you are a mathematician, please don't hurt yourself while reading this. I know I am taking a lot of shortcuts here, and some of the statements are arguably so simplified they're wrong. I still think this is a valuable window into what category theory is and how it relates to programming, for practicing programmers with no category theory training.
Definition
The most direct definition of a monad is that it is a monoid in the category of endofunctors. If you're reading this post, though, chances are that doesn't help you much, so let's dig a bit more in each of these concepts.
Monoid
A monoid (in algebra) is a mathematical structure composed of, on the one hand, a set of elements E, and, on the other hand, some operation op, such that:
- op is of type E -> E -> E.
- There exists an element u in E, called the unit, such that op a u == op u a == a for all a in E.
- For any three elements a, b and c of E, op a (op b c) == op (op a b) c.
Here are a few examples of monoids you may be familiar with, even if you don't know they are monoids:
- The set of all natural1 numbers, \(\mathbb{N}\), with the multiplication operation. The unit in this case is \(1\).
- The set of all real numbers, \(\mathbb{R}\), with the addition operation. The unit in this case is \(0\).
- The set of hours on a clock (1-12), with the addition operation. The unit in this case is 12.
The parallel to programming here is that we can imagine that types are sets, and that programming language functions are real mathematical functions, and then we can pretend we have monoids in programming languages. (If we squint a bit and ignore things like overflows, non-termination, etc.)
In a language like Haskell, the following things could be seen as monoids:
- The type Int, along with the operation +. The unit is 0.
- The type String, along with the operation <>. The unit is the empty string "".
- The type a -> a, along with the composition operation (.). The unit is the identity function identity x = x.
The Haskell type system is strong enough to identify the notion of a monoid as a typeclass. It is, in fact, defined in the standard library.2
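For reference, here is a slightly simplified sketch of those standard library definitions (the real classes carry a few more methods, such as mappend and mconcat):

class Semigroup a where
  (<>) :: a -> a -> a

class Semigroup a => Monoid a where
  mempty :: a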
Is this useful to the working programmer? Actually, yes, very much so. Say you have a list of a and you need to reduce it to a single value of type a using a binary operation. Knowing that this type and operation together form a monoid means that:
- You can choose between foldl and foldr based on non-functional requirements (performance, laziness, etc.), because you know the result will be unchanged.
- Furthermore, you can automatically parallelize that reduction with a merge-sort type of approach, as the sketch below illustrates.
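A minimal sketch of both points (parallelReduce is a made-up name; a real implementation would use actual parallelism primitives rather than plain recursion):

import Data.List (foldl')

-- Both folds give the same result because (+) is associative with unit 0.
reduceLeft, reduceRight :: [Int] -> Int
reduceLeft  = foldl' (+) 0
reduceRight = foldr (+) 0

-- Associativity also lets us split the work in halves, merge-sort style;
-- the two recursive calls could safely run in parallel.
parallelReduce :: [Int] -> Int
parallelReduce []  = 0
parallelReduce [x] = x
parallelReduce xs  =
  let (l, r) = splitAt (length xs `div` 2) xs
  in parallelReduce l + parallelReduce r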
Category
Let us start with a simplified definition of what a category is. A category (for our purposes) is a tuple \((O, M)\) such that:
- \(O\) is a set3 of objects: these are the "things" we work with.
- \(M \subseteq O \times O\) is a set of morphisms: these are the "operations" we can apply on the things we work with.
- \(\forall o \in O: (o, o) \in M\): for each of the things we work with, there exists an operation that leaves it unchanged. We usually call morphisms of the form \((x, x)\) "identity". Sometimes, as a shortcut, we say "the identity" to designate the set of all such morphisms.
- \(\forall (a, b), (b, c) \in M: (a, c) \in M\): if you think of objects as dots and morphisms as arrows, this means that any time you have a path between two objects, there is also a direct route between them.
Categories (even small ones) are actually larger than that, in that it is perfectly valid for a category to have multiple morphisms between the same two objects. In that case, each morphism can be independently identified, and the last bullet point above associates a possibly-separate direct route to every single path between two objects. The direct route associated to a given path is called the composition of that path. Note that in the case where there are multiple morphisms from an object to itself, usually only one will be an identity morphism.
A more correct, but more opaque, definition could be to say that a category is a collection of objects and morphisms that respect identity (every object has a morphism from itself to itself) and composition (for any two morphisms such that one ends where the other one begins, there is a morphism that goes from the beginning of the first to the end of the second, and that third morphism is equal to the composition of the other two). Note that identity is defined in terms of morphisms: it is the identity for morphism composition. Objects themselves are opaque and only serve to identify the beginning and ending of morphisms.
I'm not sure how to explain this more clearly, so let's walk through an example.
The way to "apply" category theory to a problem is to first try to construct a "diagram" of dots and arrows, and then verify whether that diagram respects the properties of identity and composition. If it does, then we can further apply known results of category theory to gain insights into the underlying structure of the diagram we just created (and hopefully, from that, derive insights into the original problem).
Let's start with natural numbers and addition. We can construct a diagram in which each natural number is represented as a dot, and in which we add an arrow between any two dots \(a\) and \(b\) if there exists an operation "add \(x\)" such that \(a + x = b\) (where \(a\), \(b\), and \(x\) are all natural numbers).
Does that form a category? The answer is yes, because:
- Every dot (object) has an arrow "\(+0\)" that goes from itself to itself (an identity morphism), which is the identity arrow for morphism composition in this diagram.
- For any three dots (objects) \(a\), \(b\) and \(c\), if there is an arrow \(a \to b\) and an arrow \(b \to c\), there will also be an arrow \(a \to c\): "add \(c - a\)".
Since the diagram respects both identity and composition, it does form a category, and we can use category theory results on it. What could we gain from that? We can observe that, for any two objects, there is at most one morphism between them. A category with this property is called an order, and it's no accident that that name was used: it really does imply our definition can be used to "order" the natural numbers.
Of course, this is not a new insight (we already know the natural numbers are ordered), but hopefully this can illustrate what category theory might be for, in more complex cases.
Let's try to construct another diagram. Let's use a dot to represent a Haskell type, and an arrow to represent a Haskell function4. For any dot, there may be many arrows from that dot to itself. For example, the Int type will have an arrow for \(x \to x + 1\), another for \(x \to x + 3\), another for \(x \to 2x\), etc. There will also be arrows between the dots. For example, there would be an arrow from String to Int for the function length, and another for the function read, etc.
Is that diagram a category? It turns out that it is. First, we have to define what composition means. Since our morphisms are functions, it makes sense to consider function composition, which turns out to have exactly the properties we need. For any two functions/morphisms f :: a -> b and g :: b -> c, we can construct a new function/morphism fg = g . f of type a -> c. Because fg is a valid Haskell function, it is part of our diagram; because its type does indeed go from the beginning of f to the ending of g, it takes the right "place" in the diagram.
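For example (the function names here are arbitrary):

-- Two morphisms in the Haskell category...
f :: String -> Int
f = length

g :: Int -> Bool
g = even

-- ...and their composition, going directly from String to Bool.
fg :: String -> Bool
fg = g . f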
Now, identity. In Haskell, there is a (single) identity :: a -> a function. From a category theory perspective, we'll consider that there is a separate "instance" of that function for each type. Is that a good identity morphism? Remember, the point is not whether or not it leaves elements unchanged, but whether it serves as an identity for morphism composition, i.e. f . identity = identity . f = f, which is indeed the case for the specific notion of composition we have chosen (i.e. direct function composition).
We have both identity and composition, so this is a category. We'll refer to it as "the Haskell category", or sometimes just "Haskell", for short. (Still squinting and ignoring things like memory limitations, non-terminating computations, etc.).
Functor
An "endofunctor" is a special type of functor; before we can explain in what sense it is special, we need to understand what a functor is.
A functor is a mapping between two categories \(A\) and \(B\) that maps every morphism \(f\colon a \to b\) in \(A\) to a morphism \(F_f \colon F_a \to F_b\) in \(B\), every composition \(g\cdot f\) to the composition \(F_g \cdot F_f\), and every identity \(\textrm{id}_a\) to \(\textrm{id}_{F_a}\) (where \(a\) and \(b\) are objects in \(A\), \(f\) and \(g\) are morphisms in \(A\), \(F_a\) and \(F_b\) are objects in \(B\), \(F_f\) and \(F_g\) are morphisms in \(B\), and \(\textrm{id}_x\) is the identity morphism on \(x\), in the category \(x\) is an object in).
Concretely, this means that a functor is a transformation that conserves the categorical structure of a category.
These rules allow a functor to collapse its source category: a transformation that reduces all of the objects in the source category to a single object in the target category, and all morphisms in the source category to the identity morphism on that object in the target category would count as a functor (though not necessarily a very useful one).
They also allow for embedding: the target category does not need to be "covered" by the functor; it is acceptable for the functor to map the entirety of the source category to just a small subset of the elements of the target category.
What functors cannot do is tear apart existing relationships. In a graph sense, functors are allowed to merge nodes but not break connections.
Endofunctor
An endofunctor is a functor for which the source and target categories are one and the same. It will typically be an embedding: otherwise it would be the identity transformation, mapping every single element to itself, which is not very useful.
If we see a programming language like Haskell as a single category (the one formed by types and functions, as explained above), it makes sense that we would be more interested in endofunctors than in the more general notion of functor: what could be the target category of a "non-endo" functor?
As a first concrete example, I will assert that Maybe is an endofunctor in the Haskell category. (This is a pretty safe bet, as it is an instance of Functor in the standard library.)
Does that hold? It is a mapping from type a to type Maybe a, so it does look like it fits the first criterion: it maps objects in the Haskell category to objects in the Haskell category (as an embedding). What does it do with morphisms? As morphisms are functions in the Haskell category, a naïve interpretation of this question could lead one to suggest it also maps a -> b to Maybe (a -> b). While it is true that it does, that is not an answer to the question: a -> b is an object in the Haskell category, not a morphism. Morphisms are functions, not function types. In order to manipulate functions, we need a function, not a type constructor. This leads us to the definition of Functor in Haskell:
class Functor f where
fmap :: (a -> b) -> f a -> f b
or, in other words, a functor is a tuple (type constructor, fmap function). The type constructor maps objects, the fmap function maps morphisms.
Type constructors for single-parameter algebraic data types like Maybe always respect all the relevant rules of what makes a functor, so the bulk of the constraints are going to be on the definition of fmap.
Let's look at the definition of fmap for Maybe:
instance Functor Maybe where
fmap _ Nothing = Nothing
fmap f (Just a) = Just (f a)
Does that preserve identity and composition? In other words, expressed in pseudo-Haskell, do the following hold?
fmap id == id
fmap (f . g) == fmap f . fmap g
It's pretty easy to prove that they do:
fmap id Nothing = Nothing -- definition of fmap _ Nothing
= id Nothing -- definition of id applied in reverse
fmap id (Just a) = Just (id a) -- definition of fmap f (Just a)
= Just a -- definition of id
= id (Just a) -- definition of id applied in reverse
fmap (f . g) Nothing = Nothing -- definition of fmap _ Nothing
(fmap f . fmap g) Nothing = fmap f (fmap g Nothing) -- definition of .
= fmap f Nothing -- definition of fmap _ Nothing
= Nothing -- definition of fmap _ Nothing
fmap (f . g) (Just a) = Just ((f . g) a) -- definition of fmap f (Just a)
= Just (f (g a)) -- definition of .
(fmap f . fmap g) (Just a) = fmap f (fmap g (Just a)) -- definition of .
= fmap f (Just (g a)) -- definition of fmap f (Just a)
= Just (f (g a)) -- definition of fmap f (Just a)
Defining an endofunctor in Haskell is not very difficult: you need a one-argument type constructor, and you need to define an implementation of fmap that does not change the structure of the functor.
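As an illustration, here is a minimal sketch of a bespoke endofunctor (the Pair type is made up for this example):

-- A one-argument type constructor...
data Pair a = Pair a a

-- ...and an fmap that keeps the structure intact: the result is still a
-- Pair, with both positions preserved.
instance Functor Pair where
  fmap f (Pair x y) = Pair (f x) (f y)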
Natural transformations
We're almost there. We have all of the basic concepts (monoid, category, endofunctor), now all that is missing is to put them together.
First, for the definition to make sense, we need to believe that endofunctors form a category. We'll use the endofunctors themselves as objects, but that does raise the question of what the morphisms could be.
For that, we need one more notion from category theory: natural transformations. In the general case, a natural transformation \(N\) between two functors \(F\) (from \(C\) to \(D\)) and \(G\) (from \(C\) to \(D\)) is a selection of morphisms in \(D\) such that, for every object \(c\) in \(C\), \(F_c\) is mapped to \(G_c\), and, for every morphism \(f\) between \(a\) and \(b\) (in \(C\)), \(G_f\cdot N_a = N_b \cdot F_f\).
Let's unpack that, and simplify it a little bit for our concrete case of endofunctors in the Haskell category.
As described above, \(F\) and \(G\) correspond to type constructors F and G, both of which are assumed to have valid definitions of fmap in their Functor instance. A natural transformation between them would be a selection of Haskell functions such that:
- For any type a, we have a function alpha :: F a -> G a (Haskell will let us create that family of functions with a single definition).
- For any function f :: a -> b, fmap f . alpha = alpha . fmap f. Under our assumptions (both fmap definitions are correct), this is actually always true. The sketch after this list shows a concrete example.
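A classic concrete example is extracting the head of a list, if there is one (safeHead is a conventional name; the standard library offers essentially the same function as listToMaybe):

-- A single definition yields a morphism [a] -> Maybe a for every type a.
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

-- Naturality: for any f, fmap f . safeHead == safeHead . fmap f.
-- For instance, both sides send [1, 2] to Just (f 1).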
We can now construct this "category of endofunctors": it is the set of all Haskell functor types (as objects), along with all of the "natural transformations", i.e. functions that map between these various functors (as morphisms).
To prove that it is a category, we need two things: a notion of composition, and a notion of identity that works for that notion of composition.
As natural transformations are a subset of the morphisms in \(D\), it is quite obvious that there is a way to compose them. The only question is whether or not the composition of two natural transformations is always itself a natural transformation.
In the specific case of Haskell, we can see that it is the case: if natural transformations are of type f a -> g a, they can be composed just like any other functions, and it follows directly from the type of the (.) operator that, for two natural transformations alpha :: F a -> G a and beta :: G a -> H a, the composition beta . alpha :: F a -> H a is still a mapping between two functors and thus still a natural transformation.
If we can still use (.) for composition, we can also still use identity for identity.
Back to monoid
We've discussed monoids in terms of algebra above. But we're now mired in category theory, so what is a monoid here? Fortunately, mathematicians do not pick their names at random: a monoid in category theory is very similar to a monoid in algebra. So much so that, in our specific case of the Haskell category, it essentially boils down to the same thing.
We need an additional trick before we can discuss monoids in category theory terms: a way to "pick" the unit for the monoid. In category theory, objects are opaque: they do not have internal structure. Therefore, there is no direct way for us to talk about individual values of a given type.
We can, however, talk about morphisms. In the case of Haskell, there is a function () -> a for every possible value of type a, which means we can actually talk about an individual value by instead talking about the morphism that produces it.
Informally, in the Haskell category, we can think of a monoid in the following terms:
- An object, i.e. a type T.
- A function that operates on values of T and composes with itself in an associative way: <> :: T -> T -> T, such that a <> (b <> c) = (a <> b) <> c. Note that in a very real way this is a shorthand for <> :: (() -> T) -> (() -> T) -> (() -> T), at least if we look at it from the perspective of category theory.
- A single element in the set underlying T to serve as the unit, unit :: T, or more precisely a single morphism that identifies that element, unit :: () -> T. (See the sketch after this list.)
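As a sketch of that last point (the names here are mine), consider the familiar (+, 0) monoid on Int, with its unit written both ways:

op :: Int -> Int -> Int
op = (+)

-- The unit, seen as a value of T...
unitValue :: Int
unitValue = 0

-- ...and the same unit, seen as a morphism () -> T.
unitMorphism :: () -> Int
unitMorphism () = 0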
Mapping these to the category of endofunctors, we need to think a bit about what the equivalent of () is for endofunctors. In the Haskell category, it is the type that has only one value. In general, categories do not have internal structure, and this kind of object is identified by the structure of the morphisms around it; specifically, the object () in the Haskell category as we've defined it is a terminal object.
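Being terminal means that every object has exactly one morphism into it; in the Haskell category, () has that property (the function name below is mine):

-- For any type a, this is the only total function from a to ().
toUnit :: a -> ()
toUnit _ = ()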
What could play the same role of a terminal object in the category of endofunctors? It has to be a functor, in order to be an object in it. Without going into too many details as to why, it turns out the identity functor can play the same role. The identity functor is the one that maps every object and every morphism to itself, i.e., in Haskell notation, no functor at all.
Using that object as our "source" for functor values, the above definition of a monoid translates to the category of endofunctors as:
- An object, i.e. a functor m.
- A function that operates on values of m and composes with itself in an associative way: >=> :: (a -> m a) -> (a -> m a) -> (a -> m a).5
- A morphism from the "unit" of the category of endofunctors to m that identifies a specific element, return :: a -> m a.
Functors do not necessarily have return, but for many it is not hard to define. The definition of return will elevate one of the constructors to a somewhat special status.
The >=> operation is going to be specific to the functor. The monoid rules become:

return >=> h == h
f >=> return == f
(f >=> g) >=> h == f >=> (g >=> h)
which are, incidentally, another way to write the monadic laws.
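As a small illustration of what >=> looks like in practice, here is a sketch for Maybe (the safeDiv and safeSqrt names are made up):

import Control.Monad ((>=>))

safeDiv :: Double -> Double -> Maybe Double
safeDiv _ 0 = Nothing
safeDiv x y = Just (x / y)

safeSqrt :: Double -> Maybe Double
safeSqrt x
  | x < 0     = Nothing
  | otherwise = Just (sqrt x)

-- Compose the two effectful steps into one. The monoid laws above guarantee
-- that chaining with return changes nothing, and that longer chains can be
-- parenthesized either way.
halveThenSqrt :: Double -> Maybe Double
halveThenSqrt = (`safeDiv` 2) >=> safeSqrt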
Conclusion
And there you have it, this is the definition of a monad: a type with a definition for return, fmap, and >=>. Or, in other words, a monoid in the category of endofunctors.
Is this useful? Probably not in the daily practice of programming. I still find it interesting to know.
In practice, the Haskell typeclass for Monad is defined in terms of >>= rather than >=>; the latter can then be defined as:
(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> (a -> m c)
f >=> g = \x -> f x >>= g
1. In this blog post, any mention of "natural" or \(\mathbb{N}\) is meant to include \(0\). ↩
2. The Haskell type system is not, however, strong enough to prove the properties of a monoid. You can probably trust the instances of the standard library, but do keep in mind that if you ever want to define your own instance of Monoid, it behoves you to ensure the monoid properties hold. ↩
3. In category theory, categories can be "larger" than sets; categories for which the collection of morphisms and objects both form a set are called "small" categories. Small categories are enough for our purposes here, and dealing with sets is easier to reason about. I'm also going to be quite liberal with my use of the word "subset" to refer to any way to pick apart some elements, but not all, of a larger whole, regardless of whether the source or the target are actually sets. ↩
4. If you are not too familiar with Haskell, it is worth remembering at this point that Haskell functions are always functions of exactly one argument. If you have multiple arguments in mind, the rest of this discussion may not make as much sense. ↩
5. This can be trivially extended to (a -> m b) -> (b -> m c) -> (a -> m c), which is more useful in practice. ↩