Caching of data in Mathematica

Posted 2019-04-13 08:36

There is a very time-consuming operation in my package that generates a dataset. I would like to save this dataset and have the package rebuild it only when I manually delete the cached file. Here is my approach, as part of the package:

myDataset = Module[{fname, data}, 
    fname = "cached-data.mx";
    If[FileExistsQ[fname], 
        Get[fname],
        data = Evaluate[timeConsumingOperation[]];
        Put[data, fname];
        data]
];

timeConsumingOperation[]:=Module[{},
    (* lot of work here *)
    {"data"}
];

However, instead of writing the long dataset to the file, the Put command writes only one line, "timeConsumingOperation[]", even if I wrap the call in Evaluate as above. (To be honest, this behaviour is not consistent: sometimes the dataset is written, sometimes not.)

How do you cache your data?

2 Answers
兄弟一词,经得起流年.
#2 · 2019-04-13 09:25

In the past, whenever I've had trouble with something not evaluating, it was usually because I had not correctly matched the pattern required by the function. For instance,

f[x_Integers]:= x

which won't match an integer argument, because the head of an integer is Integer, not Integers. Instead, I meant

f[x_Integer]:=x
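As a small illustration of the difference, the following sketch defines both variants under the hypothetical names fBad and fGood (these names are not from the question) and applies each to an integer:

```mathematica
(* fBad and fGood are hypothetical names used only for this illustration *)
fBad[x_Integers] := x   (* head Integers: does not match the integer 3 *)
fGood[x_Integer] := x   (* head Integer: matches the integer 3 *)

fBad[3]    (* no rule applies, so the expression is returned unevaluated *)
fGood[3]   (* the rule applies and the argument is returned *)
```

An unevaluated result of this kind is usually the first hint that a pattern did not match.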

In your case, though, you have no pattern to match: timeConsumingOperation[].

Your problem is more likely related to when timeConsumingOperation is defined relative to myDataset. In the code you've posted above, timeConsumingOperation is defined after myDataset. So, on the first run (or immediately after you've cleared the global variables) you would get exactly the result you're describing, because timeConsumingOperation is not yet defined when the code for myDataset is run.

Now, SetDelayed (:=) causes the right-hand side to be re-evaluated each time the definition is used, and since you do not need to pass any parameters, the square brackets are not necessary. The important point here is that timeConsumingOperation can be defined, as written, before myDataset, because SetDelayed ensures that its body is not executed until it is called.
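A minimal sketch of the reordered code might look like the following. The variable cacheFile, the use of $TemporaryDirectory, and the .m extension are assumptions made here for illustration, not part of the original code. Evaluate is dropped because the call is evaluated anyway once the function is defined.

```mathematica
(* Define the expensive function first, so that it already exists
   when myDataset is built. *)
timeConsumingOperation[] := Module[{},
    (* lot of work here *)
    {"data"}
];

(* cacheFile is a hypothetical name; the location is an assumption *)
cacheFile = FileNameJoin[{$TemporaryDirectory, "cached-data.m"}];

myDataset = If[FileExistsQ[cacheFile],
    Get[cacheFile],                          (* reuse the cached result *)
    Module[{data = timeConsumingOperation[]},
        Put[data, cacheFile];                (* write the result for next time *)
        data]
];
```

Deleting the file at cacheFile forces the dataset to be rebuilt on the next load.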

All told, your caching methodology looks exactly how I would go about it.

叼着烟拽天下
#3 · 2019-04-13 09:26

Another caching technique I use very often, especially when you might not want to insert the precomputed form in e.g. a package, is to memoize the expensive evaluation(s), such that it is computed on first use but then cached for subsequent evaluations. This is readily accomplished with SetDelayed and Set in concert:

f[arg1_, arg2_] := f[arg1, arg2] = someExpensiveThing[arg1, arg2]

Note that SetDelayed (:=) and Set (=) group to the right, so the implied order of evaluation is the following, and you don't actually need the parens:

f[arg1_, arg2_] := ( f[arg1, arg2] = someExpensiveThing[arg1, arg2])

Thus, the first time you evaluate f[1,2], the delayed RHS is evaluated, so the resulting value is computed and stored as a DownValue of f for the specific arguments f[1,2] with Set. Later evaluations of f[1,2] use the stored value directly.
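To see the memoization at work, here is a small sketch. The body given to someExpensiveThing and the counter fCalls are hypothetical stand-ins, added only so the number of real computations can be observed:

```mathematica
(* fCalls and this definition of someExpensiveThing are illustrative stand-ins *)
fCalls = 0;
someExpensiveThing[a_, b_] := (fCalls++; a + b)

f[arg1_, arg2_] := f[arg1, arg2] = someExpensiveThing[arg1, arg2]

f[1, 2]   (* first call: runs someExpensiveThing and stores the result *)
f[1, 2]   (* second call: answered from the stored definition *)
fCalls    (* counts how often someExpensiveThing actually ran *)

DownValues[f]   (* lists the stored rule for f[1, 2] next to the general rule *)
```

Evaluating Clear[f] and then the definition of f again discards all stored values.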

@rcollyer is also right in that you don't need to use empty brackets if you have no arguments, you could just as easily write:

g := g = someExpensiveThing[...]

There's no harm in using them, though.
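The argument-free form can be sketched in the same way. Here expensiveStandIn and gCalls are hypothetical names standing in for the real computation. One difference worth knowing: the stored value replaces the delayed definition of g itself, so recomputing requires clearing g and evaluating the definition again.

```mathematica
(* gCalls and expensiveStandIn are illustrative stand-ins *)
gCalls = 0;
expensiveStandIn[] := (gCalls++; Range[3])

g := g = expensiveStandIn[]

g        (* first use: runs expensiveStandIn[] and stores the result in g *)
g        (* later uses: return the stored value *)
gCalls   (* counts how often expensiveStandIn[] actually ran *)

(* To force a recomputation, clear g and set up the definition again. *)
Clear[g];
g := g = expensiveStandIn[]
```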
