The Road to Typed Clojure 1.0: Part 1

Which set of features would deserve a 1.0 release for Typed Clojure?

We’ve learnt valuable lessons from real-world developers about the pain points of using Typed Clojure, and after several years of mulling it over, we have a much better idea of how we might improve it. This series of posts will outline our proposed solutions, and give an impression of what we hope for Typed Clojure 1.0.

CircleCI has a great blog post outlining numerous issues with Typed Clojure in practice.

They boil down to:

  1. Type checking is too cumbersome to be interactive
  2. Hard to distinguish user-errors from type system shortcomings
  3. Not enough community support for library annotations
  4. Poor inference of higher-order function usage

I’ve already written about the limitations of the local type inference Typed Clojure uses. In this post, we’ll look at some interesting new directions in local type inference that will help type check more programs, with less local annotations, and more helpful error messages.

Improved Local Inference

When we call higher-order functions as programmers, we’re implicitly reasoning about the flow of values through and between functions. Can we automatically recover this information to help the type checker perform similar reasoning?

Let’s use map as an example. The control flow of the following expression

(map (fn [x] x) [1 2 3])

can be described in two steps.

  1. Call (fn [x] x) on elements of [1 2 3],
  2. return a sequence of those results.

In a picture:

(map (fn [x] x) [1 2 3]) ;=> [1 2 3]

If we look at the type of map, could we recover the same information?

(ann map (All [a b]
           [[a -> b] (Seqable a) -> (Seqable b)]))

Yes. We notice the first argument has type [a -> b], and so requires a value of type a before it produces a value of type b. The second argument is of type (Seqable a), and is thus our only source of a’s. Let’s draw this first step.

(ann map (All [a b]
           [[a -> b] (Seqable a) -> (Seqable b)]))

The second step involves collecting the b’s the function returns and returning a sequence of them. Since the only way to get a b is from the function argument, we can draw another arrow.

(ann map (All [a b]
           [[a -> b] (Seqable a) -> (Seqable b)]))

There are many potential ways this kind of control-flow information can help more type check Clojure code, and provide more informative error messages. Here are some ideas.

Infer local fn's

Take the following code, where f and g are unannotated anonymous functions like (fn [x] ...).

(let [h (comp f g)]
  (map h c))
;=> r

We can delay checking f and g until h is called, and wait for more type information from the map expression.

              v-\ 3.
(let [h (comp f g)]
              | ^
       |    4.\--v
  (map h c)) ;=> r
       ^-/ 1.

There are 4 control flow edges above.

  1. Arguments flow from c to h.
  2. The same argument is forwarded to g.
  3. g’s result is passed to f.
  4. A sequence of f’s results returned as r.

All this can be justified just from the types of map and comp. Here’s how to derive the information from comp. The control flow of an expression ((comp f g) v) looks like:

       v----\ 2.
((comp f    g)    v) ;=> v'
       |    ^-----/ 1.   ^

Notice that without v, there is nothing to call g with, so it’s safe to assume (comp f g) by itself can never call g, and, by transitivity, f.

The type of comp has this exact information: the output of comp must be called before either comp argument is called.

(ann comp (All [a b c]
              v-------------\ 2.
            [[b -> c] [a -> b] -> [a -> c]]))
              |        ^-----------/ 1. ^

In words, once a is provided:

  1. a flows to the second argument.
  2. b flows to the first argument.
  3. c flows to the return.

We can justify a delayed check of comp’s arguments because providing a “kicks it all off”.

Improved Error Messages

CircleCI’s blog post rightly complained that error messages often don’t provide enough guidance for a user to identify the cause(s) of an error.

By using the same control flow information as above, we can do a better job of explaining why some applications fail. For example, take

(comp inc boolean)

Typed Clojure currently reports an error which includes the types of comp, inc, and boolean, and leaves the user to diagnose the issue.

Instead, we can incorporate the control flow information into the error message like so:

Type Error:
Input to first argument of comp must accept
the output of the second argument of comp.

In the following diagram, `b` was inferred
as Boolean from the result of `boolean`, but
it flowed to the input of `inc`, which only
accepts Number.

 [[b -> c] [a -> b] -> [a -> c]]
   ^             |
in: (comp inc boolean)

Type check transducers

Transducers in Clojure are a frequent source of anonymous functions. Transducers “compose left-to-right”, so the following code first increments elements of c, then decrements.

(into []
      (comp (map (fn [x] (inc x)))
            (map (fn [y] (dec y))))
;=> r

We could build a similar control flow graph from the types of into, comp and map that allow us to propagate the type of c to the formal parameter x.

(into []
    | (comp (map (fn [x] (inc x)))
    \       (map (fn [y] (dec y))))
;=> r

Then, having a type for x, would trigger the checking of the subsequent composed transducers.

Higher-order operations

Clojure has a host of handy map operations like get-in, assoc-in, and select-keys. We can “reify them at the type level” and then use control flow analysis to type check operations “in order” (of evaluation).

For example, update updates a map at a given entry with a function.

(update {:a 1} :a inc) ;=> {:a 2}

By reifying get and assoc at the type level, we could annotate update like so:

(ann update (All [m k v]
              [m k [(Get m k) -> v] -> (Assoc m k v)]))

Here’s a useful “order” to check the type variables, so we avoid any hard questions about running get or assoc “backwards.

(ann update (All [m k v]
              [m k [(Get m k) -> v] -> (Assoc m k v)]))
            1. \ |       ^ ^                  ^ ^
                \--------/ /------------------/ /

In words, first we infer the value of the map and key from the first two arguments to update. Then we have enough information to “call” the third argument, which gets propagated to Assoc, which itself now has “ground” types so is easy to calculate.

This is enough information to help check anonymous fn’s passed to update, like:

(update {:a 1} :a #(+ 42 %)) ;=> {:a 2}

Notice that the control flow happens to go left-to-right in update, but as we’ve seen in other examples like comp, it could just as well be reversed and we could derive the same information.

Conservative checking

If for some reason we can’t infer the type of a local fn, instead of making a last-ditch effort to check the fn with argument types Any, we can simply ask the user to annotate the fn.

For example, checking the following function at type [Any -> ?].

(fn [x] (inc x))

Gives a confusing error message that inc does not accept type Any. Now, since we could lean on control improved inference to check most fn’s, we can reasonably be more conservative here.

I’m very excited at the possibilities here. It’s still early days for this research, but surely, if you can intuitively reconstruct the information for a given application based on local information, there should be an algorithm to recover it, right? :)

Next post

In the next few posts, we’ll describe our approach to improving feedback time from the type checker, a framework for providing custom type rules for macros, and explain how automatic library annotations can help mitigate the lack of community-provided annotations.

20 Sep 2018