March 8, 2024

Bashing my head against a wall: Week 6 of 48 in 24

This week’s1 challenge was List Ops: create methods for manipulating lists like map, filter and reduce (well, foldr and foldl). I found this exercise frustrating in different languages in different ways. I used bash, elixir and haskell. In elixir and haskell, the frustration was that the exercise required me to return the standard list objects. I think it would have been more fun and more interesting to actually create a new list type and implement the methods there. For example, do these list operations look significantly different if you have a dynamic array compared to if you have a linked list? (For example, reversing a doubly linked list is just swapping the next and previous pointers, whereas reversing an array is going to be quite different).

Elixir

defmodule ListOps do
  # Please don't use any external modules (especially List or Enum) in your
  # implementation. The point of this exercise is to create these basic
  # functions yourself. You may use basic Kernel functions (like `Kernel.+/2`
  # for adding numbers), but please do not use Kernel functions for Lists like
  # `++`, `--`, `hd`, `tl`, `in`, and `length`.

  @spec count(list) :: non_neg_integer
  def count(l) do
    do_count(l,0)
  end

  defp do_count([],acc), do: acc
  defp do_count([_ | t] ,acc) do
    do_count(t, acc+1)
  end

  @spec reverse(list) :: list
  def reverse(l) do
    do_reverse(l,[])
  end

  defp do_reverse([], acc), do: acc
  defp do_reverse([h | t], acc) do
    do_reverse(t, [h|acc])
  end

  @spec map(list, (any -> any)) :: list
  def map(l, f) do
    do_map(l, f, [])
  end

  defp do_map([], _, acc), do: acc
  defp do_map([h|[]], f, acc), do: [f.(h)| acc]
  defp do_map([h | t] ,f ,acc) do
    [f.(h) | do_map(t, f, acc)]
  end

  @spec filter(list, (any -> as_boolean(term))) :: list
  def filter(l, f) do
    do_filter(l, f, [])
  end

  defp do_filter([], _, acc), do: acc
  defp do_filter([h | t], f, acc) do
    if f.(h) do [h | do_filter(t, f, acc)] else do_filter(t, f, acc) end
  end


  @type acc :: any
  @spec foldl(list, acc, (any, acc -> acc)) :: acc
  def foldl([], acc, _), do: acc
  def foldl([h|t], acc, f) do
    foldl(t,f.(h,acc),f)
  end

  @spec foldr(list, acc, (any, acc -> acc)) :: acc
  def foldr([], acc, _), do: acc
  def foldr([h|t], acc, f) do
    f.(h,foldr(t,acc,f))
  end

  @spec append(list, list) :: list
  def append([h|[]], b), do: [h|b]
  def append([],b), do: b
  def append([h|t], b) do
    [h| append(t,b)]
  end

  @spec concat([[any]]) :: [any]
  def concat(ll) do
    do_concat(ll,[])
  end

  defp do_concat([],acc), do: acc
  defp do_concat([h|t], acc) do
    append(h, do_concat(t,acc))
  end
end

The basic approach for most of these operations is the same: define a private do_operation function that takes an extra argument, and then recursively does whatever it’s supposed to do one element at a time until the input list is empty, and then return the last argument. So for count: do_count takes an input list, a function and an accumulator. We pattern match on [ h | t ] which is the elixir way of pattern matching on the head of a list h (the first element) and t the tail of the list (the rest). We add one to acc and keep going until we run out of elements, then we return acc.

With reverse, instead of adding to a running count, we prepend the element to the front of the accumulator.

do_reverse([1,2,3],[]) == do_reverse([2,3],[1])
    == do_reverse([3],[2,1])
    == do_reverse([],[3,2,1])
    == [3,2,1]

Map is an interesting one. First, note that the function passed in as a parameter f, is defined in the tests as a lambda so you need to call it with f.(x). This was not obvious to me and caused some confusion when doing this challenge. You might think that you could simplify do_map to just the following:

  defp do_map([], _, acc), do: acc
  defp do_map([h | t] ,f ,acc) do
    [f.(h) |  acc]
  end

But this will map and reverse the list. (See the similarity to the do_reverse operation). The solution I use here was inspired by some of the community solutions for this exercise I saw on the haskell track. (A peek behind the scenes: I don’t always present the solutions in the order I did them.)

foldl and foldr are both varieties of what is sometimes called reduce: you take a list, a starting value and a binary operation, and you repeatedly apply the binary operation to elements of the list until you run out. If the binary operation is well-behaved then foldl and foldr will give the same output (although performance can still be radically different between them). I’m being slightly vague here about what “well-behaved” means here because I am not quite sure. I think the relevant properties are associativity and commutativity but that’s a topic for another post. Essentially, if you have a list $[2,3,4]$, a starting value $1$ and a binary operation on numbers $\oplus$ then this is what is happening:

$$ \begin{align} \mathrm{foldl}([2,3,4], 1, \oplus) &= \mathrm{foldl}([3,4], 2\oplus 1, \oplus) \\ &= \mathrm{foldl}([4], 3 \oplus (2 \oplus 1), \oplus) \\ &= 4 \oplus (3 \oplus (2 \oplus 1)) \\ \mathrm{foldr}([2,3,4], 1, \oplus) &= 2 \oplus \mathrm{foldr}([3,4], 1, \oplus)\\ &= 2 \oplus (3 \oplus \mathrm{foldr}([4], 1, \oplus)) \\ &= 2 \oplus (3 \oplus (4 \oplus 1)) \end{align} $$

I haven’t really got my head around the performance differences between foldl and foldr but maybe once I do I’ll revisit this topic.

Append and concat are pretty self-explanatory so I won’t explain them in detail.

Haskell

My haskell solutions are pretty similar to the elixir ones, really. At first I thought it strange that foldl' and foldr are listed first here, but the tests for them are partway down the list of tests. We’ll come back to that.

The other thing to note here is that we have fold' rather than foldl. This is a vairant of foldl' that forces the evaluation of its argument rather than permitting haskell’s standard lazy evaluation. I don’t fully understand what’s going on here, but this page explains some of the details. Essentially, haskell won’t actually calculate anything until it has to, instead gathering together a whole bunch of interrelated computational IOUs. The seq x y function basically goes “cash out all the IOUs in x then return y”. So if y depends on x, this is a way of clearing out some of the memory tied up with keeping track of all the calculations haskell hasn’t done yet to calculate x.

module ListOps
  ( length
  , reverse
  , map
  , filter
  , foldr
  , foldl'
  , (++)
  , concat
  ) where

import Prelude hiding
  ( length, reverse, map, filter, foldr, (++), concat )

foldl' :: (b -> a -> b) -> b -> [a] -> b
foldl' f z [] = z
foldl' f z (x:xs) = foldl' f (seq z $ f z x) xs

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z [] = z
foldr f z (x:xs) = (f x $ foldr f z xs) 

length :: [a] -> Int
length xs = _length xs 0
  where _length :: [a] -> Int -> Int
        _length [] n = n
        _length (y:ys) n = _length ys (succ n)

reverse :: [a] -> [a]
reverse xs = _reverse xs []
  where _reverse :: [a] -> [a] -> [a]
        _reverse [] ys = ys
        _reverse (x:xs) ys = _reverse xs (x:ys)

map :: (a -> b) -> [a] -> [b]
map f xs = _map f xs []
  where _map  :: (a -> b) -> [a] -> [b] -> [b]
        _map g [] zs = reverse zs
        _map g (y:ys) zs = _map g ys ((g y):zs)

filter :: (a -> Bool) -> [a] -> [a]
filter p xs = _filter p xs []
  where _filter :: (a -> Bool) -> [a] -> [a] -> [a]
        _filter _ [] zs = reverse zs
        _filter q (y:ys) zs = if q y then _filter q ys (y:zs) else _filter q ys zs

(++) :: [a] -> [a] -> [a]
(++) xs ys = _combine (reverse xs) ys
  where _combine :: [a] -> [a] -> [a]
        _combine [] zs = zs
        _combine (a:as) zs = _combine as (a:zs)

concat :: [[a]] -> [a]
concat xss = foldr (++) []  xss

It’s worth noting that haskell and elixir versions of foldl are slightly different in terms of their types, order of function arguments and order of operations. foldr behaves the same, but foldl does not:

$$ \begin{align} \mathrm{foldl}(\oplus, 1, [2,3,4]) &= \mathrm{foldl}(\oplus, 1\oplus 2, [3,4]) \\ &= \mathrm{foldl}(\oplus, (1\oplus 2) \oplus 3, [4]) \\ &= \mathrm{foldl}(\oplus, ((1\oplus 2) \oplus 3) \oplus 4, [])\\ &= ((1\oplus 2) \oplus 3) \oplus 4 \\ \end{align} $$

Instead of defining private do_operation functions, the definitions are made internal to the operations themselves with where. Note that here map is the naive version of map I discussed in the elixir section. I have to add a reverse to it to get the right ordering. And again with filter. I also use reverse in the definition of (++) to put the first list in reverse order, so I add elements to the front of the second list “backwards”, thus ending up with the correct list.

The last of these operations is the most interesting. Concatenating a bunch of lists together is just using a fold with (++) as the binary operation and [] as the starting point. This points towards an interesting question: how many of the other list operations can be achieved just using folds? The answer, it turns out, is all of them!

Normally when I write these posts, I try to stick to what I did in my implementation of the solutions. But for the above two solutions, there’s not a lot to discuss. Once you get the hang of the sort of thing it’s asking for, writing the solutions is mostly fighting with the syntax rather than anything interesting, algorithmically speaking. So I’m going to make an exception here and discuss a much more interesting solution to the haskell track. This actual code is from user Mlinzo, but if I understand the way exercism displays community solutions, this solution is one that is similar to ones lots of people have submitted. I’m going to reproduce it here, because I am pretty sure that user-contributed code on exercism is licensed CC-BY-NC-SA.

Skipping the preamble bits, the code for the folds is pretty similar.

foldl' :: (b -> a -> b) -> b -> [a] -> b
foldl' _ acc [] = acc
foldl' f acc (x:xs) = let acc' = f acc x in acc' `seq` foldl' f acc' xs

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr _ acc [] = acc
foldr f acc (x:xs) = f x (foldr f acc xs)

length :: [a] -> Int
length = foldl' (\acc _ -> acc + 1) 0

reverse :: [a] -> [a]
reverse = foldl' (flip (:)) []

map :: (a -> b) -> [a] -> [b]
map f = foldr ((:) . f) []

filter :: (a -> Bool) -> [a] -> [a]
filter f = foldr (\e acc -> if f e then e:acc else acc) []

(++) :: [a] -> [a] -> [a]
xs ++ ys = foldr (:) ys xs

concat :: [[a]] -> [a]
concat xs = foldr (++) [] xs

The first thing to note is that these solutions are defined in such a way that some arguments are implicit. Let’s start with length. Notice that the foldl' in the definition of length only has two arguments, whereas it should have three. But notice further that no argument is provided on the left hand side of the definition. So this is saying that the length function just is a partially applied fold. It’s is just a fold where the binary operation is “throw away the second argument, return one plus the first argument”. We could rewrite it as:

length xs = foldl' (\acc _ -> acc + 1) 0 xs

We can unpack its application of fold. So, length ['a', 'b', 'c'] is: foldl' (\acc _ -> acc+1) 0 ['a', 'b', 'c']. If we ignore the seq shenanigans, and we replace the lambda with f for now, we can unpack this further as: foldl' f (f 0 'a') ['b', 'c'] and unpacking completely we get f(f(f(0, 'a'), 'b'), 'c'). Remember that f throws away its second argument and adds one to its first. So the above expression becomes: ((0 + 1) + 1) + 1 which is the length of our input.

What about reverse? First, what is the function flip (:) doing? You’d normally see the : operator used as an infix to prepend an element to a list: 1:[2,3,4] = [1,2,3,4]. But you could alternatively write this as: (:) 1 [2,3,4]. Now, flip takes a function and returns a function that does the same thing but accepts its arguments in the opposite order. So f x y = (flip f) y x. That means that flip (:) is a function that takes a list and an item, and puts the item at the front of the list. So effectively you keep prepending items onto the list until you run out of items, thus reversing it.

Before we talk about map, let’s note that foldr (:) [] xs = xs. This is kind of important, because the next few operations all relying on this idea. So what calcluations happen when we do foldr (:) [] [1,2,3]?

foldr (:) [] [1,2,3] == 1 : (foldr (:) [] [2,3])
    == 1 : (2 : foldr (:) [] [3])
    == 1 : (2 : (3 : (foldr (:) [] [])))
    == 1 : (2 : (3 : []))
    == [1,2,3]

So all a map needs to do is apply f to the element before we pass it in as the first argument of (:). So ((:) . f) x returns a function that prepends f x to a list. Since this is foldr this starts at the right end of the list, and thus just applies f to each element prepending them from right to left and leaving us with the original order.

Next up is filter. All the action here is in the function argument: (\e acc -> if f e then e:acc else acc). As a warm up, we can unpack the function (:) into a lambda (\e acc -> e:acc). So if f e is always true, filter would just return the same thing as foldr (:) [], namely the full list.

(++) should now be pretty obvious. It’s just our foldr (:) example from above except that the initial value is not the empty list, but the second of the lists we are concatenating. (Note the order of the arguments in this one). And concat is simply folding on (++).

So all the other list ops can be written in terms of foldl' and foldr. And in fact, it’s beyond the scope of this post but foldl can be written in terms of foldr!

Bash

OK that’s enough elegant satisfying algorithmic solutions. Let’s do some bash hacking. Removing some irrelevant boilerplate from the top of the solution, we are left with this.

#!/usr/bin/env bash

# Due to inherent bash limitations around word splitting and globbing,
# functions that are intended to *return a list* are instead required to
# receive a nameref parameter, the name of an array variable that will be
# populated in the list function.
# See the filter, map and reverse functions.

# Also note that nameref parameters cannot have the same name as the
# name of the variable in the calling scope.


# Append some elements to the given list.
list::append () {
    local -n _array=$1
    shift
    _array=("${_array[@]}" "$@")
}

# Return only the list elements that pass the given function.
list::filter () {
    local filter_f=$1
    local -n input=$2
    local -n result=$3
    for item in "${input[@]}"; do
        if "$filter_f" "$item"; then
            result=("${result[@]}" "$item")
        fi
    done    
}
# Transform the list elements, using the given function,
# into a new list.
list::map () {
    local map_f=$1
    local -n input=$2
    local -n _result=$3
    for item in "${input[@]}"; do
        m=$("$map_f" "$item")
        _result=("${_result[@]}" "$m")
    done
}

# Left-fold the list using the function and the initial value.
list::foldl () {
    local fold_f=$1
    local acc=$2
    local -n ls=$3
    for item in "${ls[@]}"; do
        acc=$("$fold_f" "$acc" "$item")
    done
    echo "$acc"
}

# Right-fold the list using the function and the initial value.
list::foldr () {
    local fold_f=$1
    local acc=$2
    local -n ls=$3
    for ((a=${#ls[@]}-1; a>=0; a--)); do
        item="${ls[a]}"
        acc=$("$fold_f" "$item" "$acc")
    done
    echo "$acc"
}

# Return the list reversed
list::reverse () {
    local -n _ls="$1"
    local -n _sl="$2"
    for item in "${_ls[@]}"; do
        _sl=("$item" "${_sl[@]}")
    done
}

These bash versions of the exercise took ages because I really don’t fully understand the bash model, but I think it helps to take the approach that everything is a string. I don’t fully understand what’s going on here, but my mental model of the comment at the top about namerefs is this: for some reason or other, it’s impossible to return an array in bash. So what we do instead is recieve the name of an array, or a reference to an array, and what we should do is make changes in place to that array. So if we want to treat one of our arguments as an array, we should assign it to a local variable with the -n flag which means we’re treating it as a reference rather than the value. So in append we local -n _array=$1 assigns the reference to our array – the first argument to the function – to the variable _array. If we then want to get the values in that array, we need to do ${_array[@]}. Parameter expansion like this is a powerful but kind of mysterious-looking aspect of bash. OK, so if x is an array, ${x[i]} will give you the ith element of the array. And @ is a sort of shorthand for “all of the numbers!”, so ${x[@]} gives you the whole array.

The next line just says shift. What this does is it basically pops and discards the first input argument. So it turns the second argument into the first, the third into the second and so on, and it discards the first argument. We’ve used the first argument to get our reference to the array, so we’re done with it. (shift n discards the first n arguments).

All append needs to do is put its remaining arguments into the array after the elements of first argument. You build arrays with ( ) in bash. Since we’ve called shift, the “remaining arguments” are all of the arguments, so we can use $@ (it’s that @ shortcut again) to get all the arguments! So, we assign ( \${_array[@]} $@ ) to our variable that references the array we need to modify. And that’s it for append.

The approach for filter is to loop over the items, and if they pass the test, we append them to result array. Note here – because it will be important in a minute – to call the filter function on the element, we just do "\$filter_f" "$item".

OK great, so we can append and we can filter, so map should be fine right? We loop over the elements, and append the result of calling the function on the element. But in this case there is the wrinkle that we need to use \$("\$map_f" "$item") to get the result of applying the function to the item. If I understand the kind of errors I was getting when I was working on this exercise, neglecting the $( ) just tries to append the strings $filter_f and $item instead.

In gooling around for this exercise, I found the following comment in some bash documentation:

# Does bash permit recursion?
# Well, yes, but...
# It's so slow that you gotta have rocks in your head to try it.

So I decided against defining the folds recursively, and went for the simpleminded “loop through the array” approach. There isn’t much to explain that we haven’t seen before here except to point out that for the foldr we’re looping through the list backwards and so we start the index at ${#ls[@]}-1 which is one less that the number of elements in the array (because we’re zero-indexed). The loop statement is enclosed in double parentheses (( )) because that’s how you get bash to treat an expression as numerical rather than as text. Note also that we’re echoing the final result rather than returning it or assigning it to a specific nameref variable. That’s an wrinkle with exercism’s tests, rather than anything bash specific.

And then reverse is what you’d expect: loop through the array, prepending elements to the accumulator.

The bash solutions are on some level algorithmically more straightforward, but the syntax is much harder to read (and, in fact, to write). It’s compact, and I’m sure once you’re used to it it’s a powerful tool. But I definitely enjoyed it the least of the languages I played with this week.


  1. To borrow an idea I think I got from John Baez, I’m not solving one challenge a week, but whenever I do solve one, I solve it this week. ↩︎

© Seamus Bradley 2021–3

Powered by Hugo & Kiss.