28 March 2021

Laziness: Clojure vs Haskell

Last week I punted on randomness, and just made my genetic_search function take a [Double]. While that was convenient, it is unfortunately not as general as I thought at the time. I'm still in the process of learning Haskell, and I got confused between laziness in Clojure and laziness in Haskell.

So how do they differ? The one-word answer is "purity", but let me try to expand on that a little bit. Clojure has the "seq" abstraction, which is defined by the two functions first and next, where first gives you an element and next gives you another seq if there are more elements to be had, or nil if there are none. When I think of a lazy list, I think in terms of Clojure seqs, even though Clojure lists are actually not lazy.

How is that different from Haskell? In Clojure, a lazy seq is one where the elements are explicitly produced on demand, and then cached. This sounds a lot like laziness in Haskell, except for one crucial difference: Clojure does not mind how these elements are produced, and in particular, whether that involves any kind of side effects.

In Haskell, on the other hand, laziness means that the elements of a list will be computed on demand, but that only applies to pure computation. There is no lazy side effect in Haskell.

Here is a short Clojure program to illustrate this notion further. This program will ask the user for positive integers and then print a running total.

First, we start with a seemingly pure function which, given a possibly lazy seq of numbers, returns its sum so far:

(defn sum-so-far
  [ls]
  (reductions + 0 ls))

Testing this out yields the expected behaviour:

t.core=> (sum-so-far [1 2 3 4 5])
(0 1 3 6 10 15)
t.core=>

The range function, with no argument, returns an infinite lazy seq of integers, starting with 0. We can use our sum-so-far function on it:

t.core=> (take 10 (sum-so-far (range)))
(0 0 1 3 6 10 15 21 28 36)
t.core=>

We can also construct a lazy seq by asking the user for input:

(defn read-nums
  []
  (print "Please enter a number: ")
  (flush)
  (cons (Integer/parseInt (read-line)) (lazy-seq (read-nums))))

Testing it works as expected:

t.core=> (take 5 (read-nums))
Please enter a number: 1
Please enter a number: 2
Please enter a number: 3
Please enter a number: 4
Please enter a number: 5
(1 2 3 4 5)
t.core=>

Similarly, we can easily compute the running sum:

t.core=> (take 5 (sum-so-far (read-nums)))
Please enter a number: 1
Please enter a number: 2
Please enter a number: 3
Please enter a number: 4
(0 1 3 6 10)
t.core=>

So we have this one function, sum-so-far, that can handle any Clojure seq, regardless of how it is processed, and produces a new seq itself. Such a function is better thought of as a filter acting on a stream than as a function taking an argument and returning a result.

Let's look at the Haskell equivalent. The sum-so-far function seems easy enough:

sum_so_far :: [Int] -> [Int]
sum_so_far is =
  loop 0 is
  where
  loop :: Int -> [Int] -> [Int]
  loop sum [] = [sum]
  loop sum (h:t) = sum : loop (sum + h) t

I don't know if Haskell has a direct equivalent to Clojure's reductions, but that's not the point here and it's easy enough to code our own. This obvisouly works as intended on both finite and infinite lists:

*Main> sum_so_far [1, 2, 3, 4, 5]
[0,1,3,6,10,15]
*Main> let ints = 1:map (+1) ints
*Main> take 10 $ sum_so_far ints
[0,1,3,6,10,15,21,28,36,45]
*Main>

But what about read-nums? It's not too hard to replicate the beginning of the function:

read_nums :: IO [Int]
read_nums = do
  int <- read <$> getLine
  return int : ???

How can we replace those ???? Well, the : ("cons") function needs a list as its second argument, so we could try constructing that:

read_nums :: IO [Int]
read_nums = do
  int <- read <$> getLine
  tl <- read_nums
  return $ int : tl

That typechecks. But the whole point of monads is to sequence computation: there is no way we can reach the return line without having first produced the entire tl, and thus we're not lazy anymore and can't pile on more processing on top of this.

What other option do we have? Perhaps Hoogle knows of a functiont that would help here? The type we'd need would look something like:

Int -> IO [Int] -> IO [Int]

so we could replace : and use that instead. Hoogle does find a result for that, but it's not quite what we need here. We could of course write a function with that signature easily enough:

ignore_tl :: Int -> IO [Int] -> IO [Int]
ignore_tl i _ = pure [i]

but that's obviously not what we want.

Are we stuck? Let's take a step back. The solution here is to realize that Clojure seqs are not the same as Haskell lists. Instead of thinking of sum-so-far as a function, let's go back to the idea of thinking of it as a filter between two streams. What would it take to construct such a filter in Haskell? We'd need a type with the following operations:

  • Produce an element in the "output" stream.
  • Request an element from the "input" stream.
  • Let my consumer know that I will not be producing further elements.

The decoupling Clojure gives us is to be completely independent of how the input elements are produced and to produce the output elements on-demand. Let's model this a bit more precisely. We need a data definition OnDemand that represents a filter between two streams of elements. The filter could change the type, so we'll make it take two type parameters: an input one and an output one. We start with:

data OnDemand input output

Next, we need to be able to express "here is an element" and "there are no more elements". We can take a page from the List book and express those exactly like Nil and Cons:

  = Halt
  | Out output (OnDemand input output)

Finally, we need to be able to ask for a new element, wait for it, and then keep going. This phrasing suggests we need to suspend the current computation to give our supplier the opportunity to manufacture an input, and keep going after that. A generally good way to model suspended computations is with a continuation:

  | In (Maybe input -> OnDemand input output)

where the input parameter is wrapped in a Maybe because the input stream may have ran out.

Using this definition, we can rewrite our sum_so_far as a filter:

{-# LANGUAGE LambdaCase #-}

{- ... -}

sum_so_far :: OnDemand Int Int
sum_so_far =
  loop 0
  where
  loop :: Int -> OnDemand Int Int
  loop sum = Out sum (In $ \case
    Nothing -> Halt
    Just n -> loop (sum + n))

We can make this work on lists again with a simple adapter:

convert_list :: OnDemand input out -> [input] -> [out]
convert_list = \case
  Halt -> \_ -> []
  Out out cont -> \ls -> out : convert_list cont ls
  In f -> \case
    [] -> convert_list (f Nothing) []
    hd:tl -> convert_list (f $ Just hd) tl

and we can use (convert_list sum_so_far) as we did before, with both finite and infinite lists:

*Main> (convert_list sum_so_far) [1, 2, 3, 4, 5]
[0,1,3,6,10,15]
*Main> let ints = 1:map (+1) ints
*Main> take 10 $ (convert_list sum_so_far) ints
[0,1,3,6,10,15,21,28,36,45]
*Main>

But let's stay in the realm of streams for a bit. First, let's define a simple function to produce a stream from a list:

out_list :: [a] -> OnDemand () a
out_list [] = Halt
out_list (h:t) = Out h (out_list t)

Then, let's define a function to drain a stream into a list:

drain :: OnDemand () b -> [b]
drain = \case
    Halt -> []
    Out b kont -> b : drain kont
    In f -> drain $ f Nothing

Now, we can do fun stuff like

*Main> drain $ out_list [1, 2, 3, 4]
[1,2,3,4]
*Main>

Ok, so maybe that's not so much fun yet. We'd like to be able to express the equivalent of take 10 $ sum_so_far $ ints. Let's first work on each of these pieces. We can get a stream of naturals with

ints :: OnDemand () Int
ints = loop 1
  where
  loop n = Out n (loop (n + 1))

and we can limit a stream to a given number of elements with:

take_od :: Int -> OnDemand a a
take_od 0 = Halt
take_od n = In (\case
  Nothing -> Halt
  Just a -> Out a (take_od $ n - 1))

We now have all the pieces. What's the equivalent of $? We need to take two filters and return a filter than combines them. Here is the code:

join :: OnDemand a b -> OnDemand b c -> OnDemand a c
join od1 od2 = case od2 of
  Halt -> Halt
  Out c kont -> Out c (join od1 kont)
  In f -> case od1 of
    Halt -> join od1 (f Nothing)
    Out b kont -> join kont (f $ Just b)
    In g -> In (\ma -> join (g ma) od2)

We start from the outer filter. If that one says to stop, we don't need to look into any more input from the inner filter. If we have an output ready, we can just produce that. So far, so good. What happens if the outer filter needs an input? Well, in that case, we need to look at the inner one. Does it have an output ready? If so, we can just feed that into the outer filter. Is it halted? We can feed that information into the outer filter by calling its continuation with Nothing. Finally, if the inner filter itself is also waiting for an input, we have no other choice but to ask for more input from the context.

We can now have a bit more fun:

*Main> drain $ ints `join` sum_so_far `join` take_od 20
[0,1,3,6,10,15,21,28,36,45,55,66,78,91,105,120,136,153,171,190]
*Main>

This may look like the beginnings of a useful abstraction. But can we do IO with it?

Let's try to write read_nums. We still cannot write a OnDemand () (IO Int) that would be useful, just like we could not write a useful IO [Int]. But the whole point of this OnDemand stuff is to do operations one at a time. So let's define a function that gets a single integer:

import qualified Text.Read

{- ... -}

read_num :: IO (Maybe Int)
read_num = Text.Read.readMaybe <$> getLine

We cannot create an infinite stream of lazy IO actions. We've already gone through that rabbit hole. But what we can do is define a function that will run a filter within the IO context and generate all the required elements on demand:

process :: IO (Maybe a) -> OnDemand a b -> IO [b]
process io = \case
  Halt -> return []
  Out hd k -> do
    tl <- process io k
    return $ hd : tl
  In f -> do
    input <- io
    process io (f input)

Now we can use the exat same sum_so_far filter with values coming from pure and impure contexts:

*Main> drain $ ints `join` sum_so_far `join` take_od 10
[0,1,3,6,10,15,21,28,36,45]
*Main> process read_num $ sum_so_far `join` take_od 5
1
2
3
4
[0,1,3,6,10]
*Main>

Here is the full code for reference:

{-# LANGUAGE LambdaCase #-}

module Main where

import qualified Text.Read

data OnDemand a b
  = Halt
  | Out b (OnDemand a b)
  | In (Maybe a -> OnDemand a b)

sum_so_far :: OnDemand Int Int
sum_so_far =
  loop 0
  where
  loop :: Int -> OnDemand Int Int
  loop sum = Out sum (In $ \case
    Nothing -> Halt
    Just n -> loop (sum + n))

convert_list :: OnDemand input out -> [input] -> [out]
convert_list = \case
  Halt -> \_ -> []
  Out out cont -> \ls -> out : convert_list cont ls
  In f -> \case
    [] -> convert_list (f Nothing) []
    hd:tl -> convert_list (f $ Just hd) tl

out_list :: [a] -> OnDemand () a
out_list [] = Halt
out_list (h:t) = Out h (out_list t)

drain :: OnDemand () b -> [b]
drain = \case
    Halt -> []
    Out b kont -> b : drain kont
    In f -> drain $ f Nothing

ints :: OnDemand () Int
ints = loop 1
  where
  loop n = Out n (loop (n + 1))

take_od :: Int -> OnDemand a a
take_od 0 = Halt
take_od n = In (\case
  Nothing -> Halt
  Just a -> Out a (take_od $ n - 1))

join :: OnDemand a b -> OnDemand b c -> OnDemand a c
join od1 od2 = case od2 of
  Halt -> Halt
  Out c kont -> Out c (join od1 kont)
  In f -> case od1 of
    Halt -> join od1 (f Nothing)
    Out b kont -> join kont (f $ Just b)
    In g -> In (\ma -> join (g ma) od2)

print_od :: Show b => OnDemand a b -> IO ()
print_od = \case
  Halt -> return ()
  In _ -> print "Error: missing input"
  Out b k -> do
    print b
    print_od k

read_num :: IO (Maybe Int)
read_num = Text.Read.readMaybe <$> getLine

process :: IO (Maybe a) -> OnDemand a b -> IO [b]
process io = \case
  Halt -> return []
  Out hd k -> do
    tl <- process io k
    return $ hd : tl
  In f -> do
    input <- io
    process io (f input)

main :: IO ()
main = do
  let same_filter = sum_so_far `join` take_od 10
  let pure_call = drain $ ints `join` same_filter
  sidef_call <- process read_num $ same_filter
  print pure_call
  print sidef_call

And here is a sample invocation:

$ stack run <<< $(seq 1 9)
[0,1,3,6,10,15,21,28,36,45]
[0,1,3,6,10,15,21,28,36,45]
$

So, how hard is it to map all of these learnings to our little genetic algorithm from last week? A lot easier than it may seem, actually. First, we need to add the OnDemand data definition:

+data OnDemand a b
+  = Halt
+  | Out b (OnDemand a b)
+  | In (a -> OnDemand a b)
+

Next, we need to change the exec_random function: since we're going to ask for random values from our caller explicitly, we don't need to carry around a list anymore. In fact, we don't need to carry any state around anymore, which makes this monad look almost unnecessary. Still, it offers a slightly nicer syntax for client functions (GetRand instead of explicit continuations). It's also quite nice that almost none of the functions that use the monad need to change here.

-exec_random :: WithRandom a -> [Double] -> ([Double] -> a -> b) -> b
-exec_random m s cont = case m of
-  Bind ma f -> exec_random ma s (\s a -> exec_random (f a) s cont)
-  Return a -> cont s a
-  GetRand -> cont (tail s) (head s)
+exec_random :: WithRandom a -> (a -> OnDemand Double b) -> OnDemand Double b
+exec_random m cont = case m of
+  Bind ma f -> exec_random ma (\a -> exec_random (f a) cont)
+  Return a -> cont a
+  GetRand -> In (\r -> cont r)

The biggest change is the signature of the main genetic_search function: instead of getting a [Double] as the last input and returning a [(solution, Double)], we now just return a OnDemand Double (solution, Double).

-               -> [Double]
-               -> [(solution, Double)]
-genetic_search fitness mutate crossover make_solution rnd =
-  map head $ exec_random init
-                         rnd
-                         (\rnd prev -> loop prev rnd)
+               -> OnDemand Double (solution, Double)
+genetic_search fitness mutate crossover make_solution =
+  exec_random init (\prev -> Out (head prev) (loop prev))
   where
-  loop :: [(solution, Double)] -> [Double] -> [[(solution, Double)]]
-  loop prev rnd = prev : exec_random (step prev)
-                                     rnd
-                                     (\rnd next -> loop next rnd)
+  loop :: [(solution, Double)] -> OnDemand Double (solution, Double)
+  loop prev = exec_random (step prev) (\next -> Out (head next) (loop next))

The changes here are mostly trivial: we just remove the manual threading of the random list, and add one explicit Out to the core loop.

Finally, we of course need to change the call in the main function to actually drive the new version and provide random numbers on demand. This is a fairly trivial loop:

-  print $ map snd
-        $ take 40
-        $ genetic_search fitness mutate crossover mk_sol rands
+  loop rands 40 $ genetic_search fitness mutate crossover mk_sol
+  where
+  loop :: [Double] -> Int -> OnDemand Double ((Double, Double), Double) -> IO ()
+  loop rs n od =
+    if n == 0
+    then return ()
+    else case od of
+      Halt -> return ()
+      In f -> do
+        next_rand <- pure (head rs)
+        loop (tail rs) n (f next_rand)
+      Out v k -> do
+        print v
+        loop rs (n - 1) k

Obviously in an ideal scenario the next_rand <- pure (head rs) could be more complex; the point here is just to illustrate that we can do any IO we want to produce the next random element. The full, updated code can be found here (diff).

Tags: clojure haskell