Which set of features would deserve a 1.0 release for Typed Clojure?
We’ve learnt valuable lessons from real-world developers about the pain points of using Typed Clojure, and after several years of mulling it over, we have a much better idea of how we might improve it. This series of posts will outline our proposed solutions, and give an impression of what we hope for Typed Clojure 1.0.
CircleCI has a great blog post outlining numerous issues with Typed Clojure in practice.
They boil down to:
- Type checking is too cumbersome to be interactive
- Hard to distinguish user-errors from type system shortcomings
- Not enough community support for library annotations
- Poor inference of higher-order function usage
I’ve already written about the limitations of the local type inference Typed Clojure uses. In this post, we’ll look at some interesting new directions in local type inference that will help type check more programs, with less local annotations, and more helpful error messages.
Improved Local Inference
When we call higher-order functions as programmers, we’re implicitly reasoning about the flow of values through and between functions. Can we automatically recover this information to help the type checker perform similar reasoning?
Let’s use map
as an example. The control flow of
the following expression
(map (fn [x] x) [1 2 3])
can be described in two steps.
- Call
(fn [x] x)
on elements of[1 2 3]
, - return a sequence of those results.
In a picture:
2.
/----------------v
(map (fn [x] x) [1 2 3]) ;=> [1 2 3]
^------/
1.
If we look at the type of map
, could we recover the same information?
(ann map (All [a b]
[[a -> b] (Seqable a) -> (Seqable b)]))
Yes. We notice the first argument has type [a -> b]
, and so
requires a value of type a
before it produces a value of type b
.
The second argument is of type (Seqable a)
, and is thus our only
source of a
’s. Let’s draw this first step.
(ann map (All [a b]
[[a -> b] (Seqable a) -> (Seqable b)]))
^----------------/
1.
The second step involves collecting the b
’s the function returns
and returning a sequence of them. Since the only way to get a b
is from the function argument, we can draw another arrow.
(ann map (All [a b]
2.
/--------------------------v
[[a -> b] (Seqable a) -> (Seqable b)]))
^----------------/
1.
There are many potential ways this kind of control-flow information can help more type check Clojure code, and provide more informative error messages. Here are some ideas.
Infer local fn's
Take the following code, where f
and g
are
unannotated anonymous functions like (fn [x] ...)
.
(let [h (comp f g)]
(map h c))
;=> r
We can delay checking f
and g
until
h
is called, and wait for more type information
from the map
expression.
v-\ 3.
(let [h (comp f g)]
| ^
2./--------/
| 4.\--v
(map h c)) ;=> r
^-/ 1.
There are 4 control flow edges above.
- Arguments flow from
c
toh
. - The same argument is forwarded to
g
. g
’s result is passed tof
.- A sequence of
f
’s results returned asr
.
All this can be justified just from the types of map
and comp
. Here’s how to derive the information from comp
.
The control flow of an expression ((comp f g) v)
looks like:
v----\ 2.
((comp f g) v) ;=> v'
| ^-----/ 1. ^
3.\-----------------/
Notice that without v
, there is nothing to call g
with,
so it’s safe to assume (comp f g)
by itself can never
call g
, and, by transitivity, f
.
The type of comp
has this exact information: the output of
comp
must be called before either comp
argument is called.
(ann comp (All [a b c]
v-------------\ 2.
[[b -> c] [a -> b] -> [a -> c]]))
| ^-----------/ 1. ^
3.\-------------------------/
In words, once a
is provided:
a
flows to the second argument.b
flows to the first argument.c
flows to the return.
We can justify a delayed check of comp
’s arguments
because providing a
“kicks it all off”.
Improved Error Messages
CircleCI’s blog post rightly complained that error messages often don’t provide enough guidance for a user to identify the cause(s) of an error.
By using the same control flow information as above, we can do a better job of explaining why some applications fail. For example, take
(comp inc boolean)
Typed Clojure currently reports an error which includes the types of
comp
, inc
, and boolean
, and leaves the user to diagnose
the issue.
Instead, we can incorporate the control flow information into the error message like so:
Type Error:
Input to first argument of comp must accept
the output of the second argument of comp.
In the following diagram, `b` was inferred
as Boolean from the result of `boolean`, but
it flowed to the input of `inc`, which only
accepts Number.
[[b -> c] [a -> b] -> [a -> c]]
^ |
\-------------/
in: (comp inc boolean)
Type check transducers
Transducers in Clojure are a frequent source of
anonymous functions.
Transducers “compose left-to-right”, so the following
code first increments elements of c
, then decrements.
(into []
(comp (map (fn [x] (inc x)))
(map (fn [y] (dec y))))
c)
;=> r
We could build a similar control flow graph from
the types of into
, comp
and map
that allow
us to propagate the type of c
to the formal
parameter x
.
(into []
/-----------------v
| (comp (map (fn [x] (inc x)))
\ (map (fn [y] (dec y))))
\
c)
;=> r
Then, having a type for x
, would trigger the checking
of the subsequent composed transducers.
Higher-order operations
Clojure has a host of handy map operations like get-in
,
assoc-in
, and select-keys
. We can “reify them at the
type level” and then use control flow analysis to
type check operations “in order” (of evaluation).
For example, update
updates a map at a given entry
with a function.
(update {:a 1} :a inc) ;=> {:a 2}
By reifying get
and assoc
at the type level, we
could annotate update
like so:
(ann update (All [m k v]
[m k [(Get m k) -> v] -> (Assoc m k v)]))
Here’s a useful “order” to check the type variables,
so we avoid any hard questions about running get
or assoc
“backwards.
(ann update (All [m k v]
2./----------------v
[m k [(Get m k) -> v] -> (Assoc m k v)]))
1. \ | ^ ^ ^ ^
\--------/ /------------------/ /
In words, first we infer the value of the map and key
from the first two arguments to update
. Then we have
enough information to “call” the third argument, which
gets propagated to Assoc
, which itself now has
“ground” types so is easy to calculate.
This is enough information to help check anonymous fn’s
passed to update
, like:
(update {:a 1} :a #(+ 42 %)) ;=> {:a 2}
Notice that the control flow happens to go left-to-right
in update
, but as we’ve seen in other examples like
comp
, it could just as well be reversed and we could
derive the same information.
If for some reason we can’t infer the type of a local
fn, instead of making a last-ditch effort to check
the fn with argument types Any
, we can simply ask the
user to annotate the fn.
For example, checking the following function at type
[Any -> ?]
.
(fn [x] (inc x))
Gives a confusing error message that inc
does not
accept type Any
. Now, since we could lean on control
improved inference to check most fn’s, we can reasonably
be more conservative here.
I’m very excited at the possibilities here. It’s still early
days for this research, but surely, if you can intuitively
reconstruct the information for a given application based
on local information, there should be an algorithm to
recover it, right? :)
Next post
In the next few posts, we’ll describe our approach to improving feedback time from the type checker, a framework for providing custom type rules for macros, and explain how automatic library annotations can help mitigate the lack of community-provided annotations.