Some functions types in Clojure seamlessly handle infinite arguments, while others misuse them and freeze our programs.
Clojure 1.11.1
user=> (apply (fn [& args]) (range))
nil
user=> (apply (fn []) (range))
^C ;; loops forever
Let’s level the playing field.
In the first case, the infinite arguments are preserved as a lazy sequence which is never realized by the function body, while the second case pours them into a Java array before the function body is called, looping “forever” (until the JVM runs out of memory or the size limitations of arrays are hit).
The compiler decides the behavior of fn
in this respect based on the
presence of a rest-argument.
The second case of using a Java array is more fundamental in Clojure, so
special support is needed to do something other than loop forever with infinite arguments.
So which other function types accept infinite arguments and why?
Vars simply apply
their mapped functions,
and thus inherit their ability to handle infinite arguments from the functions they hold.
So, since fn
’s with rest arguments can handle infinite arguments, then
so can vars containing them:
user=> (defn yes-inf [& args] [:yes-inf (take 10 args)])
user=> (apply yes-inf (range))
[:yes-inf (0 1 2 3 4 5 6 7 8 9)]
user=> (apply #'yes-inf (range))
[:yes-inf (0 1 2 3 4 5 6 7 8 9)]
On the other hand, fn
’s with fixed arguments (and vars containing them) cannot handle infinite arguments:
user=> (defn no-inf [])
user=> (apply no-inf (range))
^C
user=> (apply #'no-inf (range))
^C
Intuitively, we can do better: since fixed-arity functions in Clojure only support up to 20 parameters, we could decide to throw an error instead of walking past the 21st argument.
Collections are functions that usually accept 1 or 2 arguments, but they also loop forever when passed infinite arguments.
user=> (apply {} (range))
^C
user=> (apply [] (range))
^C
Again, intuition says we don’t need to walk past the 3rd argument, so this behavior can be improved. We can also add atomic values like symbols and keywords to this category of function.
Multimethods, like vars, are another kind of function that delegates to other functions. Unfortunately, they only support finite arguments.
Clojure 1.11.1
user=> (defmulti disp-first (fn [target & _] (class target)))
user=> (defmethod disp-first Long [target & _] (inc target))
user=> (apply disp-first 1 (range))
^C
Intuitively, if the dispatch function and the dispatch method both support infinite arguments, then so should the multimethod. This can be improved.
Let’s follow these breadcrumbs of intuition to enhance all Clojure functions with better handling of infinite arguments.
First, we noticed that vars inherit their capability for infinite args,
and yet multimethods could potentially do the same.
To bring them up to speed, we can
use the same trick as Var, but twice: instead
of the multimethod calling invoke
on itself, first apply
the dispatch function
then apply
the chosen method.
Now multimethods with rest parameters support infinite arguments!
user=> (defmulti disp-first (fn [target & _] (class target)))
user=> (defmethod disp-first Long [target & _] (inc target))
user=> (apply disp-first 1 (range))
2
But this does not handle fixed-arity multimethods, since they depend on the functions they contain for infinite argument handling:
user=> (defmulti mm (fn []))
user=> (apply mm (range))
^C
This brings us back to AFn
, where fn
’s applyTo
implementation lives.
The basic algorithm is to realize 20 arguments
to determine whether we can call
invoke
without building the final array, otherwise pour the 21st argument
and beyond into an array
to call invoke
. We can’t get around calling
invoke
, but let’s add features to AFn
so then it can throw an
exception rather than diverging in cases where it’s pointless to traverse
possibly-infinite args.
One such case is a fn
without rest arguments, which can only support
up to 20 arguments. The compiler always extends AFunction
in this case, so let’s hardcode the following case in AFn
’s applyTo
:
if the function being applied is an AFunction
, then throw an arity
exception for 21 or more arguments.
The risk: if anyone implements an AFunction
that supports rest arguments,
it won’t work for 21+ args. Let’s assume this never happens.
With this tweak, fixed-arg fn
’s, multimethods, vars, and even
vars containing multimethods now error instead of diverge:
user=> (apply identity (range))
Wrong number of args (21+) passed to: clojure.core/identity
user=> (defmulti mm identity)
user=> (apply mm (range))
Wrong number of args (21+) passed to: clojure.core/identity
user=> (apply #'mm (range))
Wrong number of args (21+) passed to: clojure.core/identity
That works for all 0-20 arg fn
’s;
now for everything else that extends AFn
.
I settled on a new AFn
method that returns this.class
if
the current class does not have a rest argument. If the
runtime type of the collection is identical to that class,
then we can short-circuit rest-argument support. If it’s
not equal, then it must be a extending class that we don’t
control, so continue as normal.
After implementing this method on all functional collections in Clojure, we now have the desirable semantics: error over divergence.
user=> (apply {} (range))
Wrong number of args (21+) passed to: clojure.lang.PersistentArrayMap
Libraries providing their own functional collections need to update their implementations to opt-in.
user=> (apply (proxy [clojure.lang.MapEntry] [0 1])
(range))
^C
These ideas have been implemented in this pull request.
To summarize, we’ve made every function in Clojure (except those
with rest-arguments) lazier either directly (fn
s, collections,
keywords) or indirectly (multimethods, vars). This enables
infinite arguments to error instead of diverge, unless the function
itself walks its infinite arguments. Given that most functions
only support fixed arguments, this enhancement has broad applicability.
In the case of multimethods, we’ve also added support for legitimate
uses of infinite arguments.
Interestingly, this post is essentially the opposite of my previous one: there for collections, we fixed arity exceptions to reflect the actual number of arguments passed; here we cut off the processing of arguments before all arguments can even be counted.
Thanks to Santiago Gepigon III for pairing on this and the post title.