Higher Kinds in C# with language-ext [Part 6 - traversables]
This is a multi-part series of articles, it's worth going back to the earlier parts if you haven't read them already:
- Part 1 - Introduction, traits, semigroups, and monoids
- Part 2 - Functors
- Part 3 - Foldables
- Part 4 - Applicatives
- Part 5 - Validation
One interesting stat is that I have managed to remove nearly half a million lines of code from the v5
release of language-ext. At least 200,000 of those lines were auto-generated by some T4 templates (LanguageExt.Transfomers
, which has now been deleted entirely); the rest of the deletions were from hand-written types and extension methods used to simulate higher-kinds (OptionAsync<A>
for example, is emulating the (new) monad-transformer stack OptionT<IO, A>
).
The reason that so many lines of code have been removed is because we don't need to emulate higher-kinds any more, we really can write: OptionT<IO, A>
, whereas before we relied on generated extension methods (I'll talk about transformers in a later article).
So, if the removal of nearly half a million lines of code doesn't justify the techniques I am highlighting in this series, I really don't know what will! Less code pretty much always means fewer bugs and easier long-term maintenance – something I am particularly looking forward to once this release is stable!
Also, v5
has much more functionality – not less!
One thing I noticed as I was working through all of the original language-ext unit-tests – trying to turn them all green – was that I discovered some tests-cases where both the code and the unit-test had a bug. The bugs discovered were in the manually written Traverse
and Sequence
functions.
On [some of the implementations of Traverse
] I had made wrong assumptions and I had carried those wrong assumptions through to the unit-tests! It doesn't matter how good you are at programming, you will always end up making stupid mistakes like this – which is why I much prefer to leverage types over unit-tests: you can't cheat the compiler, but you can easily cheat yourself!
The new implementations of Traverse
(leveraging the higher-kinded traits) exposed those bugs – because they are now following the rules of the trait-types – not the assumptions of a human that errs.
In the Foldables article I showed the graph from the Haskell Typeclassopedia – we're quite a way into implementing the key traits from this diagram:
In this article we will take on the Traversable trait. You can see from the diagram above that there are three arrows leading into the Traversable trait: Functor, Foldable, and Applicative. We have already implemented these in previous articles, so we have all of the components needed to create a Traversable
trait.
But what are traversables? Simply, traversables are a little bit like foldables in that you can iterate over many items in a structure, aggregating as you go. Except, traversables don't expose an aggregated state-value, they instead return a nested foldable/applicative structure. The aggregation comes from running the default applicative behaviour for the structure instead of a provided fold function. This nested structure is 'flipped' so what was inner becomes outer and what was outer becomes inner.
That explanation is too abstract, so let's take a look at an example. Below, we have a collection of strings that we want to parse into integers. Not all strings can be parsed into a valid integer, so we need a way to deal with the failure to parse:
var items = Seq("1", "2", "3", "4", "5");
var values = items.Map(s => parseInt(s));
The example above takes a Seq<string>
, each value is passed to the parseInt
function from the language-ext Prelude
. parseInt
returns an Option<int>
. And so, the result of this would be:
[Some(1), Some(2), Some(3), Some(4), Some(5)]
So, the success of the parsing is embedded into the list.
If we had an item that caused a failure:
var items = Seq("1", "2", "3", "X", "5"); // NOTE: 4th item is 'X'
var values = items2.Map(s => parseInt(s));
Then the result looks like this:
[Some(1), Some(2), Some(3), None, Some(5)]
Now we can tell which items succeed and which failed. Which on its own is pretty useful, but if we want to work with the values in the list we have to do work on every step to test that the value exists.
Often, we want to know whether all items succeeded and only proceed if that is the case. And we often want to early out if any fail, not to continue to parse the rest of the list.
This is where Traverse
comes in. Simply replace Map
with Traverse
:
var items = Seq("1", "2", "3", "4", "5");
var values = items.Traverse(s => parseInt(s));
Now the output is this:
Some([1, 2, 3, 4, 5])
We have 'flipped' the type. Instead of a Seq<Option<int>>
we now have an Option<Seq<int>>
– this is the magic of traverse. It follows the rules of the outer applicative (Option
) to accumulate the inner foldable (Seq
) – resulting in the behaviour of Option
being embedded for each item in the sequence.
If we have a collection where the transformation would produce None
...
var items = Seq("1", "2", "3", "X", "5");
var values = items.Traverse(s => parseInt(s));
... then we get:
None // Option<Seq<int>>
If any item fails, they all fail.
This works for all foldables with all applicatives – injecting the default applicative behaviour for each step (which, by the way, may not shortcut the operation, as it does with Option
).
Here's a simple example using the generic IO library with the Eff
monad:
var items = Seq("c:\\test1.txt", "c:\\test2.txt", "c:\\test3.txt");
var values = items.Traverse(File<Runtime>.readAllText);
That will read all of the files (in parallel). If any fail then the whole thing fails.
Applicatives such as the Validation
type from the last article would collect all of the errors during a Traverse
call. And so, it should be clear that there's a lot of power lurking inside these combined types.
It can't be understated how valuable Traverse
is for pure functional programming. It is one of the most powerful tools to reach for when you find your types are 'the wrong way around' – that is: the flipping of the types and the injecting of the default applicative behaviour allows for some incredibly concise transformations. As you can see it works especially well with sequences of values – but it's not only for sequences!
Let's look at the trait definition:
public interface Traversable<T> : Functor<T>, Foldable<T>
where T : Traversable<T>, Functor<T>, Foldable<T>
{
public static abstract K<F, K<T, B>> Traverse<F, A, B>(
Func<A, K<F, B>> f,
K<T, A> ta)
where F : Applicative<F>;
// ... default implementations removed for clarity ...
}
This is enough to hurt anybody's head! I remember when I was first learning Haskell and I saw the type-signature for traverse
– it took me quite some time to really grok it. Primarily because I had never really dealt with multiple levels of higher-kinded types before, so it messed with my brain a bit.
Here's the Haskell definition:
class (Functor t, Foldable t) => Traversable t where
traverse :: Applicative f => (a -> f b) -> t a -> f (t b)
-- default implementations removed for clarity ...
You should be able to see that we have the same constraints on the class
/interface
as well as the same constraint on the traverse
/ Traverse
function:
(Functor t, Foldable t)
is the same as: Functor<T>, Foldable<T>
Applicative f =>
is the same aswhere F : Applicative<F>
The traverse
/ Traverse
function also has the same arguments and return type:
(a -> f b)
is the same asFunc<A, K<F, B>>
t a
is the same asK<T, A>
f (t b)
is the same asK<F, K<T, B>>
The return type – f (t b)
– was what confused me when I was learning Haskell. It should be a little clearer to you seeing the C# version: it's two, nested, higher-kinded types.
We have two higher-kinds:F
andT
.F
is constrained to be an applicative-functor andT
is constrained to be a functor and a foldable.
So, we pass in a value that can be folded over and mapped. We also pass in a function that takes the bound-value of the foldable/mappable; it returns the applicative K<F, B>
.
So, it should be fairly obvious that we somehow iterate over each value in the foldable/mappable ta
, pass each value to the function f
, and get a value back for each item. What isn't clear is how that K<F, B>
turns into a K<F, K<T, B>>
.
As we're calling Traverse
on Seq<string>
above, let's first look at the implementation of Traverse
for the Seq
type:
static K<F, K<Seq, B>> Traverse<F, A, B>(Func<A, K<F, B>> f, K<Seq, A> ta)
{
return F.Map<Seq<B>, K<Seq, B>>(
ks => ks,
Foldable.foldBack(cons, F.Pure(empty<B>()), ta));
K<F, Seq<B>> cons(K<F, Seq<B>> xs, A x) =>
Applicative.lift(Prelude.Cons, f(x), xs);
}
Ok, that's also enough to hurt anybody's head! Let's break it down:
We have two arguments:
Func<A, K<F, B>> f
– this is the function that does the transformation to the applicativeK<Seq, A> ta
– this is theSeq<A>
that is being traversed
Now let's look at the elements of the first expression. The critical part is:
Foldable.foldBack(cons, F.Pure(empty<B>()), ta)
This leverages the fact that the input type (ta
) is foldable. The fold operation needs an initial state:
F.Pure(empty<B>())
F
is the output Applicative
- so we're lifting empty<B>()
into the F
applicative. empty<B>
is just the constructor of an empty Seq<B>
. So, F.Pure(empty<B>())
is a K<F, Seq<B>>
. It should be fairly clear that this is the result type before we've processed any of the values (and indeed if the input Seq<A>
is empty then this will be the result).
The function that does the folding is cons
. Which is a local function below the expression:
K<F, Seq<B>> cons(K<F, Seq<B>> xs, A x) =>
Applicative.lift(Prelude.Cons, f(x), xs);
It takes in two arguments (as all fold functions do) – the first being the running state and the second being the next item to process. We then do an Applicative.lift
with the Cons
function from the Prelude
(which prepends an item to be the new head of a list); the mapped item (f(x)
); and the list so far xs
.
Applictive.lift
is like callingfun(Prelude.Cons).Map(f(x)).Apply(xs)
– it's a handy shortcut that makes it a little easier to lift functions.
And so, you should see that the applicative Apply
method is chained for each item in the sequence. The function being applied is the Seq
cons function that prepends the mapped values to the head of the list. But, if you remember (from the previous Applicative article) the Apply
function has inbuilt behaviour depending on the structure being applied.
And so, if we were processing a sequence of Option
values then it would know to quit if the result of f(x)
is None
; if we were processing a sequence of Validation
values it would know to collect all of the failed values; if we were processing a sequence of Eff
effects it would know to run them in parallel; etc.
This is super, super powerful and turns into incredibly concise code.
But, there's still some code from the Seq
implementation of Traverse
that we haven't covered:
return F.Map<Seq<B>, K<Seq, B>>(
ks => ks,
Foldable.foldBack(cons, F.Pure(empty<B>()), ta));
The result of the foldBack
call is mapped using Functor.Map
– remember the F
is an Applicative
, which inherits Functor
. We do this to convert from Seq<B>
to K<Seq, B>
. So, even though Seq<B>
inherits K<Seq, B>
we still can't just cast the nested K<F, Seq<B>>
to K<F, K<Seq, B>>
because C# just does not play nice with covariant and contra-variant casting of nested types – there were some combinations of in
and out
on the trait types I could use to make the above compile (without needing to Map
) but they were causing runtime casting exceptions – so I decided to err on the side of caution and make the casting explicit. This adds some overhead, but is correct, which is more important. So, we need to run a map operation to get to the base interface from the concrete nested inner-type.
Here's an alternative approach that doesn't need the Map
:
static K<F, K<Seq, B>> Traversable<Seq>.Traverse<F, A, B>(Func<A, K<F, B>> f, K<Seq, A> ta)
{
return Foldable.foldBack(cons, F.Pure(empty<B>().Kind()), ta);
K<F, K<Seq, B>> cons(K<F, K<Seq, B>> xs, A x) =>
Applicative.lift((b, bs) => b.Cons(bs.As()).Kind(), f(x), xs);
}
This keeps working with the abstract types all the way through the fold operation. A few things to notice compared to the original implementation:
empty<B>()
has.Kind()
called on it, that turnsSeq<B>
intoK<Seq, B>
- The updated
cons
function now has to convertbs
to aSeq<B>
(using.As()
) for each item – this is converting an interface to a struct (Seq<A>
is a value-type). - Once the
Cons
operation is complete it has to be converted back to the base-interface usingKind()
Ultimately, this has to do multiple boxing operations per item (.As()
and .Kind()
) which adds more overhead than necessary. And that is why this isn't the default implementation.
It is nearly always the case that when you're traversing a collection-type, likeSeq<A>
, the applicative type you're mapping to is usually not a collection-type. So, doing theMap
operation once (for the outerApplicative
) is much more efficient than doing it on every item.
Because collections are a little more involved I wanted to present them first. Now let's look at a simpler example: Either<L, R>
:
// Either Traverse
static K<F, K<Either<L>, B>> Traverse<F, A, B>(Func<A, K<F, B>> f, K<Either<L>, A> ta) =>
ta switch
{
Either.Right<L, A> (var r) => F.Map(Right, f(r)),
Either.Left<L, A> (var l) => F.Pure(Left<B>(l)),
_ => throw new NotSupportedException()
};
// Helper method to construct `Right` and cast to the underlying K type
static K<Either<L>, A> Right<A>(A value) =>
Either<L, A>.Right(value);
// Helper method to construct `Left` and cast to the underlying K type
static K<Either<L>, A> Left<A>(L value) =>
Either<L, A>.Left(value);
Other than the constructor functions (Left
and Right
) this is much simpler and should be easier to follow.
We simply pattern-match on ta
which is a K<Either<L>, A>
to find out what state the Either<L, A>
is in:
- For the
Either.Right
case we invoke thef
transformation function to get a resultingF
applicative – we thenMap
the value within theApplicative
and lift it into theRight
constructor, leaving us withK<F, K<Either<L, B>>>
. - For the
Either.Left
case we want to shortcut the expression, so we don't call thef
transformation function at all and instead lift theLeft
state into the applicative usingF.Pure
.
This then captures the Either
short-cutting effect in the Traverse
operation – we literally don't call the transformation operation at all if we're not in any state to deal with its result.
Sequence
In the previous version of language-ext there were also Sequence
extension methods (for all of the types that supported Traverse
. Sequence
is like calling Traverse
with the identity function on an already nested pair of types:
var xs = Seq(Some(1), Some(2), None); // Nested structure
var ys = xs.Sequence(); // This line is equivalent
var zs = xs.Traverse(x => x); // to this line.
// ys == zs
In v5
of language-ext there are three additional methods with default implementations in the Traversable
trait, including Sequence
:
public static virtual K<F, K<T, A>> Sequence<F, A>(
K<T, K<F, A>> ta)
where F : Applicative<F> =>
Traversable.traverse(x => x, ta);
public static virtual K<M, K<T, B>> TraverseM<M, A, B>(
Func<A, K<M, B>> f,
K<T, A> ta)
where M : Monad<M> =>
Traversable.traverse(f, ta);
public static virtual K<F, K<T, A>> SequenceM<F, A>(
K<T, K<F, A>> ta)
where F : Monad<F> =>
Traversable.traverseM(x => x, ta);
Sequence
and SequenceM
are functionally equal, just one constrains to Applicactive
and the other constrains to Monad
. TraverseM
is the monad equivalent of Traverse
.
Mostly, you'd never need to implement these yourself, but there may be some traversable types that could benefit from having bespoke overridden implementations. Specifically, collection types like Seq<A>
have TraverseM
implemented because it enforces sequential operations – this is useful when the applicative type you're working with is asynchronous: TraverseM
enforces a serial evaluation rather than the parallel applicative behaviour – this enables the same functionality as TraverseSerial
and TraverseParallel
from v4
– but generalises it.
BIG NOTE:Sequence
andSequenceM
both take already nested types. C# doesn't like this for anything other than the nestedK
interface types, soSeq<Option<int>>
can't be used withSequence
(compile-time error) – onlyK<Seq, K<Option, int>>
; so if you're already in the more abstract space thenSeqeunce
will work, otherwise not. The workaround is to useTraverse
with an identity function:xs.Traverse(x => x)
– which works fine. This issue seems to just be a fundamental limitation of C# – it's extremely annoying, but at least there's a workaround.
Converting back to concrete
The result of the Traverse
extension method (below) is two nested K
types. Getting back to a concrete type isn't easy, because we can't just cast (as we do inside the .As()
extension).
public static partial class TraversableExtensions
{
public static K<F, K<T, B>> Traverse<T, F, A, B>(
this K<T, A> ta,
Func<A, K<F, B>> f)
where T : Traversable<T>
where F : Applicative<F> =>
Traversable.traverse(f, ta);
...
}
Everything that works with the K
type will continue to work: so using Map
, Apply
, etc. will 'just work'. But, obviously we want to get back to our concrete data types to use them...
To make this as painless as possible, I have found that adding a Traverse
and TraverseM
method to your data-type reduces the complexity of getting back to a concrete state (this is in addition to the trait-implementation of Traverse
):
Here's the Traverse
and TraverseM
implementation for Seq<A>
:
public readonly struct Seq<A>
{
...
public K<F, Seq<B>> Traverse<F, B>(Func<A, K<F, B>> f)
where F : Applicative<F> =>
F.Map(x => x.As(), Traversable.traverse(f, this));
public K<M, Seq<B>> TraverseM<M, B>(Func<A, K<M, B>> f)
where M : Monad<M> =>
M.Map(x => x.As(), Traversable.traverseM(f, this));
...
}
This implementation is pretty much always exactly the same for all types, you just need to replace Seq<B>
with whatever type you're implementing. So, copy-n-paste is your friend here.
What this does is map the inner-type to its concrete-type by calling .As()
on it. And because this is a member of the Seq<A>
type (not an extension method) it gets picked first by C# (ahead of the Traverse
and TraverseM
extension methods).
It still means we have a generic outer-type (K<F, ...>
), but the inner-type (Seq<B>
) is now concrete. This is then enough for us to call .As()
on to make the whole type concrete:
var items = Seq("1", "2", "3", "X", "5");
Option<Seq<int>> values = items.Traverse(s => parseInt(s)).As();
This is a little bit of a rough edge for traversables. It's a bit more effort than the other trait types and less elegant in implementation. However, it does work without any cheating and it does turn into a truly generic system of traversables.
Breaking Changes from v4
- The 500 or so extension methods that existed before for
Sequence
andTraverse
are now gone. The methods in the trait implementations have taken over that role. The type signatures are slightly different, with one variant ofSequence
removed. - You nearly always want to use
Traverse
and notSequence
– because C#'s type-system won't play nice with your nested types most of the time.Sequence
is useful when the inner nested type is already inK<F, A>
form. - In pre-
v5
language-ext there wereTraverse
extension-methods for types that can't now be supported [with the new fixed trait-interface]. And so, you may get some breaking changes whereTraverse
doesn't exist any more for a nested-pair of types. In reality this is unlikely, because they didn't really make much sense and were just there for completeness. TraverseSerial
andTraverseParallel
are currently deprecated. The parallelisation of the operations are now dependent on the implementation ofTraverse
only. However, types likeSeq<A>
also have aTraverseM
implementation which enforces serial operations – so you can think ofTraverse
as parallel andTraverseM
as serial.- Because
Traverse
andSequence
return nested generic-types, it's not possible to make them concrete without having the same number of extension methods as before – so you'll need to call.As()
to get back to your concrete type (see previous section).
Conclusion
There's no doubt that there's a couple of rough edges with traversables due to the limitations of the C# type-system. I will keep experimenting with ideas that could help improve that situation – but I won't do any hacks, I want this to use the type-system as it is.
Really, it would be nice to get some parity with the extension methods from v4
, namely no issues with nested types and implicit variance casting. Luckily, everything is still totally usable, but the issues around Sequence
are a little frustrating.
Even with all of that said, this version now has the bugs in Traverse
and Sequence
– from v4
– fixed. Because of the trait system, not in spite of it. And, you can now build your own traversable types which will instantly work with all of your foldable & applicative types and the foldable & applicative types in language-ext.
One thing that really frustrated me in the past was creating a type in my app domain that I then wanted to use with Traverse
, only to remember that there was no generic traversable trait and that I'd have to implement Traverse
and Sequence
n-times to get the same functionality as the types in language-ext. That problem has now gone – you still need to implement Traverse
for your type, but just once.
When you remember that Traverse
is one of the most powerful tools in the pure-functional-programming-toolkit – it means (regardless of rough edges) we've gained some real FP superpowers here. And these superpowers can finally land in your code-base rather than just the language-ext provided types.
Next up: Monads! I promised myself I'd never write 'Yet Another Monad Tutorial' – well I guess I can't not write one now...