I'm trying to get my head around how type inference is implemented. In particularly, I don't quite see where/why the heavy lifting of "unification" comes into play.
I'll give an example in "pseudo C#" to help clarify:
The naive way to do it would be something like this:
Suppose you "parse" your program into an expression tree such that it can be executed with:
interface IEnvironment
{
object lookup(string name);
}
interface IExpression
{
// Evaluate this program in this environment
object Evaluate(IEnvironment e);
}
So something like "Multiplication" might be implemented with:
class Multiply : IExpression
{
IExpression lhs;
IExpression rhs;
// etc.
public object Evaluate(IEnvironment e)
{
// assume for the moment C# has polymorphic multiplication
return lhs.Evaluate(e) * rhs.Evaluate(e);
}
}
Then to "implement" type inference, you could just do something like:
interface ITypeEnvironment
{
Type getType(string name);
}
interface IExpression
{
//as before
object Evaluate(IEnvironment e);
// infer type
Type inferType(ITypeEnvironment typeEnvironment);
}
Then "Multiplication"'s type inference might just be something like:
class Multiply : IExpression
{
IExpression lhs;
IExpression rhs;
// ...
public Type inferType(ITypeEnvironment typeEnvironment)
{
Type lhsType = lhs.inferType(typeEnvironment);
Type rhsType = rhs.inferType(typeEnvironment);
if(lhsType != rhsType)
throw new Exception("lhs and rhs types do not match");
// you could also check here that lhs/rhs are one of double/int/float etc.
return lhsType;
}
}
lhs and rhs might be simple constants, or "variables" which are looked up in the environment:
class Constant : IExpression
{
object value;
public Type inferType(ITypeEnvironment typeEnvironment)
{
return value.GetType(); // The type of the value;
}
public object Evaluate(IEnvironment environment)
{
return value;
}
}
class Variable : IExpression
{
string name;
public Type inferType(ITypeEnvironment typeEnvironment)
{
return typeEnvironment.getType(name);
}
public object Evaluate(IEnvironment environment)
{
return environment.lookup(name);
}
}
But nowhere in this do we end up with the need for a "unification" algorithm.
So, clearly, my example is not complex enough. Does it need higher order functions? Do we need "parametric polymorphism"?
What is the simplest possible example where "unification" is actually needed to correctly infer the type of an expression.
An example in Scheme would be ideal (i.e. an example of a very small Scheme program where you require unification to correctly infer the type of the s-expression).
Let me completely ignore your example and give you an example of where we do method type inference in C#. (If this topic interests you then I encourage you to read the "type inference" archive of my blog.)
Consider:
Here we have a generic method M that takes a dictionary that maps strings onto lists of T. Suppose you have a variable:
We have called
M<T>
without providing a type argument, so the compiler must infer it.The compiler does so by "unifying"
Dictionary<string, List<int>>
withIDictionary<string, List<T>>
.First it determines that
Dictionary<K, V>
implementsIDictionary<K, V>
.From that we deduce that
Dictionary<string, List<int>>
implementsIDictionary<string, List<int>>
.Now we have a match on the
IDictionary
part. We unify string with string, which is obviously all good but we learn nothing from it.Then we unify List with List, and realize that we have to recurse again.
Then we unify int with T, and realize that int is a bound on T.
The type inference algorithm chugs away until it can make no more progress, and then it starts making further deductions from its inferences. The only bound on T is int, so we deduce that the caller must have wanted T to be int. So we call
M<int>
.Is that clear?
What if you just don't know the type of the variable? That's the whole point of type inference, right? Something very simple like this (in some pseudocode language):
You don't know the type of
x
, until you infer the addition, and realize that it must be an integer because it's added to an integer, or something like that.What if you have another function like this:
You can't figure out the type of
x
until you have figured out the type offoo
, and so on.So when you first see a variable, you have to just assign some placeholder type for a variable, and then later, when that variable is passed to some kind of function or something, you have to "unify" it with the argument type of the function.
Suppose we have a function
f(x, y)
where x might be e.g. FunctionOfTwoFunctionsOfInteger or it might be FunctionOfInteger.
suppose we pass in
f(g(u, 2), g(1, z))
Now with unification, u is bound to 1, and z is bound to 2, so the first argument is a FunctionOfTwoFunctionsOfInteger.
So, we have to infer that the type of x and y are both FunctionOfTwoFunctionsOfInteger
I'm not too familiar with C#, but with lambda expressions (or equivalently delegates or whatever) this should probably be possible.
For an example of where type inferencing is very useful in improving the speed of theorem proving, take a look at "Schubert's Steamroller"
http://www.rpi.edu/~brings/PAI/schub.steam/node1.html
There's an issue of the Journal of Automated Reasoning devoted to solutions and formulations to this problem, most of which involve typed inferencing in theorem proving systems:
http://www.springerlink.com/content/k1425x112w6671g0/