I'm trying to create a memoization interface for functions with arbitrary number of arguments, but I'm failing miserably I feel like my solution is not very flexible. I tried to define an interface for a function which gets memoized automatically upon execution and each function will have to implement this interface. Here is an example with a two parameter Exponential Moving Average function:
class EMAFunction:IFunction
{
Dictionary<List<object>, List<object>> map;
class EMAComparer : IEqualityComparer<List<object>>
{
private int _multiplier = 97;
public bool Equals(List<object> a, List<object> b)
{
List<object> aVals = (List<object>)a[0];
int aPeriod = (int)a[1];
List<object> bVals = (List<object>)b[0];
int bPeriod = (int)b[1];
return (aVals.Count == bVals.Count) && (aPeriod == bPeriod);
}
public int GetHashCode(List<object> obj)
{
// Don't compute hash code on null object.
if (obj == null)
{
return 0;
}
List<object> vals = (List<object>) obj[0];
int period = (int) obj[1];
return (_multiplier * period.GetHashCode()) + vals.Count;
}
}
public EMAFunction()
{
NumParams = 2;
Name = "EMA";
map = new Dictionary<List<object>, List<object>>(new EMAComparer());
}
#region IFunction Members
public int NumParams
{
get;
set;
}
public string Name
{
get;
set;
}
public object Execute(List<object> parameters)
{
if (parameters.Count != NumParams)
throw new ArgumentException("The num params doesn't match!");
if (!map.ContainsKey(parameters))
{
//map.Add(parameters,
List<double> values = new List<double>();
List<object> asObj = (List<object>)parameters[0];
foreach (object val in asObj)
{
values.Add((double)val);
}
int period = (int)parameters[1];
asObj.Clear();
List<double> ema = TechFunctions.ExponentialMovingAverage(values, period);
foreach (double val in ema)
{
asObj.Add(val);
}
map.Add(parameters, asObj);
}
return map[parameters];
}
public void ClearMap()
{
map.Clear();
}
#endregion
}
Here are my tests of the function:
private void MemoizeTest()
{
DataSet dataSet = DataLoader.LoadData(DataLoader.DataSource.FROM_WEB, 1024);
List<String> labels = dataSet.DataLabels;
Stopwatch sw = new Stopwatch();
IFunction emaFunc = new EMAFunction();
List<object> parameters = new List<object>();
int numRuns = 1000;
long sumTicks = 0;
parameters.Add(dataSet.GetValues("open"));
parameters.Add(12);
// First call
for(int i = 0; i < numRuns; ++i)
{
emaFunc.ClearMap();// remove any memoization mappings
sw.Start();
emaFunc.Execute(parameters);
sw.Stop();
sumTicks += sw.ElapsedTicks;
sw.Reset();
}
Console.WriteLine("Average ticks not-memoized " + (sumTicks/numRuns));
sumTicks = 0;
// Repeat call
for (int i = 0; i < numRuns; ++i)
{
sw.Start();
emaFunc.Execute(parameters);
sw.Stop();
sumTicks += sw.ElapsedTicks;
sw.Reset();
}
Console.WriteLine("Average ticks memoized " + (sumTicks/numRuns));
}
Update:
Thanks for pointing out my n00bish error... I always forget to call Reset on the stopwatch!
I've seen another approach to memoization as well... it doesn't offer n-argument memoization, but my approach with the Interface is not much more advantageous since I have to write a class for each function. Is there a reasonable way that I can merge these ideas into something more robust? I want to make it easier to memoize a function without making the user write a class for each function that they intend to use.
How about this? First write a one-argument memoizer:
static Func<A, R> Memoize<A, R>(this Func<A, R> f)
{
var d = new Dictionary<A, R>();
return a=>
{
R r;
if (!d.TryGetValue(a, out r))
{
r = f(a);
d.Add(a, r);
}
return r;
};
}
Straightforward. Now write a function tuplifier:
static Func<Tuple<A, B>, R> Tuplify<A, B, R>(this Func<A, B, R> f)
{
return t => f(t.Item1, t.Item2);
}
And a detuplifier:
static Func<A, B, R> Detuplify<A, B, R>(this Func<Tuple<A, B>, R> f)
{
return (a, b) => f(Tuple.Create(a, b));
}
and now a two-argument memoizer is easy:
static Func<A, B, R> Memoize<A, B, R>(this Func<A, B, R> f)
{
return f.Tuplify().Memoize().Detuplify();
}
To write a three-argument memoizer just keep following this pattern: make a 3-tuplifier, a 3-untuplifier, and a 3-memoizer.
Of course, if you don't need them, there's no need to make the tuplifiers nominal methods:
static Func<A, B, R> Memoize<A, B, R>(this Func<A, B, R> f)
{
Func<Tuple<A, B>, R> tuplified = t => f(t.Item1, t.Item2);
Func<Tuple<A, B>, R> memoized = tuplified.Memoize();
return (a, b) => memoized(Tuple.Create(a, b));
}
UPDATE: You ask what to do if there is no tuple type. You could write your own; it's not hard. Or you could use anonymous types:
static Func<T, R> CastByExample<T, R>(Func<T, R> f, T t) { return f; }
static Func<A, B, R> Memoize<A, B, R>(this Func<A, B, R> f)
{
var example = new { A=default(A), B=default(B) };
var tuplified = CastByExample(t => f(t.A, t.B), example);
var memoized = tuplified.Memoize();
return (a, b) => memoized(new {A=a, B=b});
}
Slick, eh?
UPDATE: C# 7 now has value tuples built in to the language; use them rather than rolling your own or using anonymous types.
First, you need to call sw.Reset()
between your tests. Otherwise your results for the second test will be in addition to the time from the first.
Second, you probably shouldn't use vals.GetHashCode()
in your GetHashCode()
override on the comparer, as this will lead you to getting different hash codes for objects that would evaluate to true
for your Equals
override. For now, I would worry about making sure that equivalent objects always get the same hash code rather than trying to get an even distribution of the codes. If the hash codes don't match, Equals
will never be called, so you'll end up processing the same parameters multiple times.
StopWatch.Stop does not reset the stopwatch so you are accumulating time on each start/stop.
For example
Stopwatch sw = new Stopwatch();
sw.Start();
System.Threading.Thread.Sleep(100);
sw.Stop();
Debug.WriteLine(sw.ElapsedTicks);
sw.Start();
System.Threading.Thread.Sleep(100);
sw.Stop();
Debug.WriteLine(sw.ElapsedTicks);
Gives the following results
228221
454626
You can use StopWatch.Restart
(Framework 4.0) to restart the stopwatch each time, or if not Framework 4.0, you can use StopWatch.Reset
to reset the stopwatch.
Alternative (to tuples & anonymous types) approach might be as follows:
static void Main(string[] args)
{
var func = Memoize<int, int, int>(Func);
Console.WriteLine(func(3)(4));
Console.WriteLine(func(3)(5));
Console.WriteLine(func(2)(5));
Console.WriteLine(func(3)(4));
}
//lets pretend this is very-expensive-to-compute function
private static int Func(int i, int j)
{
return i + j;
}
private static Func<TArg1, Func<TArg2, TRes>> Memoize<TArg1, TArg2, TRes>(Func<TArg1, TArg2, TRes> func)
{
Func<TArg1, Func<TArg2, TRes>> func1 =
Memoize((TArg1 arg1) => Memoize((TArg2 arg2) => func(arg1, arg2)));
return func1;
}
private static Func<TArg, TRes> Memoize<TArg, TRes>(Func<TArg, TRes> func)
{
var cache = new Dictionary<TArg, TRes>();
return arg =>
{
TRes res;
if( !cache.TryGetValue(arg, out res) )
{
Console.WriteLine("Calculating " + arg.ToString());
res = func(arg);
cache.Add(arg, res);
}
else
{
Console.WriteLine("Getting from cache " + arg.ToString());
}
return res;
};
}
Based on these two Memoize funcs you can easily build extensions for as many args as you wish.
I initially came here just looking for an abstract memoization method for a no-parameter function. This isn't exactly an answer to the question but wanted to share my solution in case someone else came looking for the simple case.
public static class MemoizationExtensions
{
public static Func<R> Memoize<R>(this Func<R> f)
{
bool hasBeenCalled = false; // Used to determine if we called the function and the result was the same as default(R)
R returnVal = default(R);
return () =>
{
// Should be faster than doing null checks and if we got a null the first time,
// we really want to memoize that result and not inadvertently call the function again.
if (!hasBeenCalled)
{
hasBeenCalled = true;
returnVal = f();
}
return returnVal;
};
}
}
If you use LinqPad you can use the following code to easily test out the functionality through the use of LinqPad's super cool Dump method.
new List<Func<object>>(new Func<object>[] {
() => { "Entered func A1".Dump(); return 1; },
() => { "Entered func A2".Dump(); return default(int); },
() => { "Entered func B1".Dump(); return String.Empty; },
() => { "Entered func B2".Dump(); return default(string); },
() => { "Entered func C1".Dump(); return new {Name = String.Empty}; },
() => { "Entered func C2".Dump(); return null; },
})
.ForEach(f => {
var f1 = MemoizationExtensions.Memoize(f);
Enumerable
.Range(1,3)
.Select(i=>new {Run=i, Value=f1()})
.Dump();
});
P.S. You will need to include the MemoizationExtensions class in the code of the LinqPad script otherwise it won't work!