Laziness: Clojure vs Haskell
Last week I punted on randomness, and just made my genetic_search
function take a [Double]
. While that was convenient, it is unfortunately not
as general as I thought at the time. I'm still in the process of learning
Haskell, and I got confused between laziness in Clojure and laziness in
Haskell.
So how do they differ? The one-word answer is "purity", but let me try to
expand on that a little bit. Clojure has the "seq" abstraction, which is
defined by the two functions first
and next
, where first
gives you an
element and next
gives you another seq if there are more elements to be
had, or nil
if there are none. When I think of a lazy list, I think in
terms of Clojure seqs, even though Clojure lists are actually not lazy.
How is that different from Haskell? In Clojure, a lazy seq is one where the elements are explicitly produced on demand, and then cached. This sounds a lot like laziness in Haskell, except for one crucial difference: Clojure does not mind how these elements are produced, and in particular, whether that involves any kind of side effects.
In Haskell, on the other hand, laziness means that the elements of a list will be computed on demand, but that only applies to pure computation. There is no lazy side effect in Haskell.
Here is a short Clojure program to illustrate this notion further. This program will ask the user for positive integers and then print a running total.
First, we start with a seemingly pure function which, given a possibly lazy seq of numbers, returns its sum so far:
(defn sum-so-far
[ls]
(reductions + 0 ls))
Testing this out yields the expected behaviour:
t.core=> (sum-so-far [1 2 3 4 5])
(0 1 3 6 10 15)
t.core=>
The range
function, with no argument, returns an infinite lazy seq of
integers, starting with 0. We can use our sum-so-far
function on it:
t.core=> (take 10 (sum-so-far (range)))
(0 0 1 3 6 10 15 21 28 36)
t.core=>
We can also construct a lazy seq by asking the user for input:
(defn read-nums
[]
(print "Please enter a number: ")
(flush)
(cons (Integer/parseInt (read-line)) (lazy-seq (read-nums))))
Testing it works as expected:
t.core=> (take 5 (read-nums))
Please enter a number: 1
Please enter a number: 2
Please enter a number: 3
Please enter a number: 4
Please enter a number: 5
(1 2 3 4 5)
t.core=>
Similarly, we can easily compute the running sum:
t.core=> (take 5 (sum-so-far (read-nums)))
Please enter a number: 1
Please enter a number: 2
Please enter a number: 3
Please enter a number: 4
(0 1 3 6 10)
t.core=>
So we have this one function, sum-so-far
, that can handle any Clojure seq,
regardless of how it is processed, and produces a new seq itself. Such a
function is better thought of as a filter acting on a stream than as a function
taking an argument and returning a result.
Let's look at the Haskell equivalent. The sum-so-far
function seems easy
enough:
sum_so_far :: [Int] -> [Int]
sum_so_far is =
loop 0 is
where
loop :: Int -> [Int] -> [Int]
loop sum [] = [sum]
loop sum (h:t) = sum : loop (sum + h) t
I don't know if Haskell has a direct equivalent to Clojure's reductions
, but
that's not the point here and it's easy enough to code our own. This obvisouly
works as intended on both finite and infinite lists:
*Main> sum_so_far [1, 2, 3, 4, 5]
[0,1,3,6,10,15]
*Main> let ints = 1:map (+1) ints
*Main> take 10 $ sum_so_far ints
[0,1,3,6,10,15,21,28,36,45]
*Main>
But what about read-nums
? It's not too hard to replicate the beginning of the
function:
read_nums :: IO [Int]
read_nums = do
int <- read <$> getLine
return int : ???
How can we replace those ???
? Well, the :
("cons") function needs a list as its second argument, so we could try constructing that:
read_nums :: IO [Int]
read_nums = do
int <- read <$> getLine
tl <- read_nums
return $ int : tl
That typechecks. But the whole point of monads is to sequence computation:
there is no way we can reach the return
line without having first produced
the entire tl
, and thus we're not lazy anymore and can't pile on more
processing on top of this.
What other option do we have? Perhaps Hoogle knows of a functiont that would help here? The type we'd need would look something like:
Int -> IO [Int] -> IO [Int]
so we could replace :
and use that instead. Hoogle does find a result for
that, but it's not quite what we need here. We could of course write a function
with that signature easily enough:
ignore_tl :: Int -> IO [Int] -> IO [Int]
ignore_tl i _ = pure [i]
but that's obviously not what we want.
Are we stuck? Let's take a step back. The solution here is to realize that
Clojure seqs are not the same as Haskell lists. Instead of thinking of
sum-so-far
as a function, let's go back to the idea of thinking of it as a
filter between two streams. What would it take to construct such a filter in
Haskell? We'd need a type with the following operations:
- Produce an element in the "output" stream.
- Request an element from the "input" stream.
- Let my consumer know that I will not be producing further elements.
The decoupling Clojure gives us is to be completely independent of how the
input elements are produced and to produce the output elements on-demand. Let's
model this a bit more precisely. We need a data definition OnDemand
that
represents a filter between two streams of elements. The filter could change
the type, so we'll make it take two type parameters: an input one and an output
one. We start with:
data OnDemand input output
Next, we need to be able to express "here is an element" and "there are no more
elements". We can take a page from the List
book and express those exactly
like Nil
and Cons
:
= Halt
| Out output (OnDemand input output)
Finally, we need to be able to ask for a new element, wait for it, and then keep going. This phrasing suggests we need to suspend the current computation to give our supplier the opportunity to manufacture an input, and keep going after that. A generally good way to model suspended computations is with a continuation:
| In (Maybe input -> OnDemand input output)
where the input parameter is wrapped in a Maybe
because the input stream may
have ran out.
Using this definition, we can rewrite our sum_so_far
as a filter:
{-# LANGUAGE LambdaCase #-}
{- ... -}
sum_so_far :: OnDemand Int Int
sum_so_far =
loop 0
where
loop :: Int -> OnDemand Int Int
loop sum = Out sum (In $ \case
Nothing -> Halt
Just n -> loop (sum + n))
We can make this work on lists again with a simple adapter:
convert_list :: OnDemand input out -> [input] -> [out]
convert_list = \case
Halt -> \_ -> []
Out out cont -> \ls -> out : convert_list cont ls
In f -> \case
[] -> convert_list (f Nothing) []
hd:tl -> convert_list (f $ Just hd) tl
and we can use (convert_list sum_so_far)
as we did before, with both finite
and infinite lists:
*Main> (convert_list sum_so_far) [1, 2, 3, 4, 5]
[0,1,3,6,10,15]
*Main> let ints = 1:map (+1) ints
*Main> take 10 $ (convert_list sum_so_far) ints
[0,1,3,6,10,15,21,28,36,45]
*Main>
But let's stay in the realm of streams for a bit. First, let's define a simple function to produce a stream from a list:
out_list :: [a] -> OnDemand () a
out_list [] = Halt
out_list (h:t) = Out h (out_list t)
Then, let's define a function to drain a stream into a list:
drain :: OnDemand () b -> [b]
drain = \case
Halt -> []
Out b kont -> b : drain kont
In f -> drain $ f Nothing
Now, we can do fun stuff like
*Main> drain $ out_list [1, 2, 3, 4]
[1,2,3,4]
*Main>
Ok, so maybe that's not so much fun yet. We'd like to be able to express the
equivalent of take 10 $ sum_so_far $ ints
. Let's first work on each of these
pieces. We can get a stream of naturals with
ints :: OnDemand () Int
ints = loop 1
where
loop n = Out n (loop (n + 1))
and we can limit a stream to a given number of elements with:
take_od :: Int -> OnDemand a a
take_od 0 = Halt
take_od n = In (\case
Nothing -> Halt
Just a -> Out a (take_od $ n - 1))
We now have all the pieces. What's the equivalent of $
? We need to take two
filters and return a filter than combines them. Here is the code:
join :: OnDemand a b -> OnDemand b c -> OnDemand a c
join od1 od2 = case od2 of
Halt -> Halt
Out c kont -> Out c (join od1 kont)
In f -> case od1 of
Halt -> join od1 (f Nothing)
Out b kont -> join kont (f $ Just b)
In g -> In (\ma -> join (g ma) od2)
We start from the outer filter. If that one says to stop, we don't need to look
into any more input from the inner filter. If we have an output ready, we can
just produce that. So far, so good. What happens if the outer filter needs an
input? Well, in that case, we need to look at the inner one. Does it have an
output ready? If so, we can just feed that into the outer filter. Is it halted?
We can feed that information into the outer filter by calling its continuation
with Nothing
. Finally, if the inner filter itself is also waiting for an
input, we have no other choice but to ask for more input from the context.
We can now have a bit more fun:
*Main> drain $ ints `join` sum_so_far `join` take_od 20
[0,1,3,6,10,15,21,28,36,45,55,66,78,91,105,120,136,153,171,190]
*Main>
This may look like the beginnings of a useful abstraction. But can we do IO with it?
Let's try to write read_nums
. We still cannot write a OnDemand () (IO Int)
that would be useful, just like we could not write a useful IO [Int]
. But the
whole point of this OnDemand
stuff is to do operations one at a time. So
let's define a function that gets a single integer:
import qualified Text.Read
{- ... -}
read_num :: IO (Maybe Int)
read_num = Text.Read.readMaybe <$> getLine
We cannot create an infinite stream of lazy IO actions. We've already gone
through that rabbit hole. But what we can do is define a function that will
run a filter within the IO
context and generate all the required elements on
demand:
process :: IO (Maybe a) -> OnDemand a b -> IO [b]
process io = \case
Halt -> return []
Out hd k -> do
tl <- process io k
return $ hd : tl
In f -> do
input <- io
process io (f input)
Now we can use the exat same sum_so_far
filter with values coming from pure
and impure contexts:
*Main> drain $ ints `join` sum_so_far `join` take_od 10
[0,1,3,6,10,15,21,28,36,45]
*Main> process read_num $ sum_so_far `join` take_od 5
1
2
3
4
[0,1,3,6,10]
*Main>
Here is the full code for reference:
{-# LANGUAGE LambdaCase #-}
module Main where
import qualified Text.Read
data OnDemand a b
= Halt
| Out b (OnDemand a b)
| In (Maybe a -> OnDemand a b)
sum_so_far :: OnDemand Int Int
sum_so_far =
loop 0
where
loop :: Int -> OnDemand Int Int
loop sum = Out sum (In $ \case
Nothing -> Halt
Just n -> loop (sum + n))
convert_list :: OnDemand input out -> [input] -> [out]
convert_list = \case
Halt -> \_ -> []
Out out cont -> \ls -> out : convert_list cont ls
In f -> \case
[] -> convert_list (f Nothing) []
hd:tl -> convert_list (f $ Just hd) tl
out_list :: [a] -> OnDemand () a
out_list [] = Halt
out_list (h:t) = Out h (out_list t)
drain :: OnDemand () b -> [b]
drain = \case
Halt -> []
Out b kont -> b : drain kont
In f -> drain $ f Nothing
ints :: OnDemand () Int
ints = loop 1
where
loop n = Out n (loop (n + 1))
take_od :: Int -> OnDemand a a
take_od 0 = Halt
take_od n = In (\case
Nothing -> Halt
Just a -> Out a (take_od $ n - 1))
join :: OnDemand a b -> OnDemand b c -> OnDemand a c
join od1 od2 = case od2 of
Halt -> Halt
Out c kont -> Out c (join od1 kont)
In f -> case od1 of
Halt -> join od1 (f Nothing)
Out b kont -> join kont (f $ Just b)
In g -> In (\ma -> join (g ma) od2)
print_od :: Show b => OnDemand a b -> IO ()
print_od = \case
Halt -> return ()
In _ -> print "Error: missing input"
Out b k -> do
print b
print_od k
read_num :: IO (Maybe Int)
read_num = Text.Read.readMaybe <$> getLine
process :: IO (Maybe a) -> OnDemand a b -> IO [b]
process io = \case
Halt -> return []
Out hd k -> do
tl <- process io k
return $ hd : tl
In f -> do
input <- io
process io (f input)
main :: IO ()
main = do
let same_filter = sum_so_far `join` take_od 10
let pure_call = drain $ ints `join` same_filter
sidef_call <- process read_num $ same_filter
print pure_call
print sidef_call
And here is a sample invocation:
$ stack run <<< $(seq 1 9)
[0,1,3,6,10,15,21,28,36,45]
[0,1,3,6,10,15,21,28,36,45]
$
So, how hard is it to map all of these learnings to our little genetic
algorithm from last week? A lot easier than it may seem, actually.
First, we need to add the OnDemand
data definition:
+data OnDemand a b
+ = Halt
+ | Out b (OnDemand a b)
+ | In (a -> OnDemand a b)
+
Next, we need to change the exec_random
function: since we're going to ask
for random values from our caller explicitly, we don't need to carry around a
list anymore. In fact, we don't need to carry any state around anymore, which
makes this monad look almost unnecessary. Still, it offers a slightly nicer
syntax for client functions (GetRand
instead of explicit continuations). It's
also quite nice that almost none of the functions that use the monad need to
change here.
-exec_random :: WithRandom a -> [Double] -> ([Double] -> a -> b) -> b
-exec_random m s cont = case m of
- Bind ma f -> exec_random ma s (\s a -> exec_random (f a) s cont)
- Return a -> cont s a
- GetRand -> cont (tail s) (head s)
+exec_random :: WithRandom a -> (a -> OnDemand Double b) -> OnDemand Double b
+exec_random m cont = case m of
+ Bind ma f -> exec_random ma (\a -> exec_random (f a) cont)
+ Return a -> cont a
+ GetRand -> In (\r -> cont r)
The biggest change is the signature of the main genetic_search
function:
instead of getting a [Double]
as the last input and returning a [(solution, Double)]
, we now just return a OnDemand Double (solution, Double)
.
- -> [Double]
- -> [(solution, Double)]
-genetic_search fitness mutate crossover make_solution rnd =
- map head $ exec_random init
- rnd
- (\rnd prev -> loop prev rnd)
+ -> OnDemand Double (solution, Double)
+genetic_search fitness mutate crossover make_solution =
+ exec_random init (\prev -> Out (head prev) (loop prev))
where
- loop :: [(solution, Double)] -> [Double] -> [[(solution, Double)]]
- loop prev rnd = prev : exec_random (step prev)
- rnd
- (\rnd next -> loop next rnd)
+ loop :: [(solution, Double)] -> OnDemand Double (solution, Double)
+ loop prev = exec_random (step prev) (\next -> Out (head next) (loop next))
The changes here are mostly trivial: we just remove the manual threading of the
random list, and add one explicit Out
to the core loop.
Finally, we of course need to change the call in the main
function to
actually drive the new version and provide random numbers on demand. This is a
fairly trivial loop:
- print $ map snd
- $ take 40
- $ genetic_search fitness mutate crossover mk_sol rands
+ loop rands 40 $ genetic_search fitness mutate crossover mk_sol
+ where
+ loop :: [Double] -> Int -> OnDemand Double ((Double, Double), Double) -> IO ()
+ loop rs n od =
+ if n == 0
+ then return ()
+ else case od of
+ Halt -> return ()
+ In f -> do
+ next_rand <- pure (head rs)
+ loop (tail rs) n (f next_rand)
+ Out v k -> do
+ print v
+ loop rs (n - 1) k
Obviously in an ideal scenario the next_rand <- pure (head rs)
could be more
complex; the point here is just to illustrate that we can do any IO we want to
produce the next random element. The full, updated code can be found here
(diff).