So I'm arguing with my friend who claims that a compiler like GCC can detect a pure function automatically without any type information. I doubt that.
Languages like D or Haskell have purity in their type systems, and the programmer explicitly declares which functions are pure. A pure function has no side effects and can therefore very easily be parallelized.
So the question is: is this all necessary or not? Could a compiler detect purity without any meta or type information, just by assuming that anything that does I/O or accesses global variables is automatically not pure?
Sure, you can detect pure functions in some cases. For instance, a function that performs only arithmetic on its arguments can be detected as pure with simple static analysis. The difficulty is doing this in general, and detecting interfaces which use "internal" state but are externally pure is basically impossible.
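As an illustration (my own sketch, not the snippet from the original answer), a function like this depends only on its arguments, performs no I/O, and touches no globals, so even a simple analysis can classify it as pure:

```c
/* Depends only on its arguments: no I/O, no global state,
 * no pointer dereferences, so it is trivially analysable as pure. */
static int square_plus_one(int x)
{
    return x * x + 1;
}
```

Calling it twice with the same argument must yield the same result, which is exactly the property the optimizer wants to exploit.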
GCC does have the warning options -Wsuggest-attribute=pure and -Wsuggest-attribute=const, which suggest functions that might be candidates for the pure and const attributes. I'm not sure whether it opts to be conservative (i.e. missing many pure functions, but never suggesting the attribute for a non-pure function) or lets the user decide.

Note that GCC's definition of pure is "depending only on arguments and global variables". Strict purity, i.e. the same results for the same arguments in all circumstances, is represented by the const attribute, but such a function cannot even dereference a pointer passed to it. So the parallelisation opportunities for pure functions are limited, and far fewer functions can be const compared to the pure functions you can write in a language like Haskell.

By the way, automatically parallelising pure functions is not as easy as you might think; the hard part is deciding what to parallelise. Parallelise computations that are too cheap, and overhead makes it pointless. Don't parallelise enough, and you don't reap the benefits. I don't know of any practical functional language implementation that does automatic parallelisation for this reason, although libraries like repa parallelise many operations behind the scenes without explicit parallelism in the user code.
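To illustrate the difference between the two attributes (my own examples, not from the answer): a pure function may read global memory or dereference its pointer arguments, while a const function may not even do that.

```c
/* 'pure': the result depends only on the arguments and on global
 * memory; reading through the pointer is allowed, writing is not. */
__attribute__((pure))
static int count_char(const char *s, char c)
{
    int n = 0;
    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}

/* 'const': the result depends only on the argument values themselves;
 * the function may not read globals or dereference any pointer. */
__attribute__((const))
static int cube(int x)
{
    return x * x * x;
}
```

The attributes are promises to the compiler, not something it verifies, which is why the -Wsuggest-attribute warnings only suggest candidates.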
There is another problem. Consider a function that contains impure code which can never execute.
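The original snippet did not survive here; a sketch of the kind of function meant (my reconstruction, reusing the isthispure name from the text below):

```c
#include <stdio.h>

/* Effectively pure: the printf is impure, but the branch
 * guarding it can never be taken. */
static int isthispure(int i)
{
    if (0) {                      /* "false" in the discussion below */
        printf("side effect!\n"); /* impure, but unreachable */
    }
    return i * i;
}
```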
The function is effectively pure: it contains impure code, but that code cannot be reached. Now suppose false is replaced by g(i), and we are quite sure that g(i) is always false (for example, g might check whether its argument is a Lychrel number). To prove that isthispure is indeed pure, the compiler would have to prove that no Lychrel numbers exist.

(I admit that this is a quite theoretical consideration. One could also decide that if a function contains any impure code, it is itself impure. But that is not justified by the C type system, IMHO.)
Determining whether a function is pure (even in the limited sense used by GCC) is equivalent to solving the halting problem, so the answer is "not for arbitrary functions." It is possible, however, to automatically detect that some functions are pure and others are not, and to flag the rest as "unknown", which allows automatic parallelization in some cases.
In my experience, even programmers aren't very good at figuring out such things, so I want the type system to help keep track of it for me, not just for the optimizer.
I discovered, while writing an article comparing C# and C++ performance, that Visual C++ can indeed detect a pure function of moderate complexity: in my case, a function that computed a polynomial.
I called the polynomial function in a loop to eat up time on the clock. The compiler optimized the call to run once before the loop started and reused the result within the loop. To do that, it had to know that the function has no side effects.
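A sketch of the setup (my reconstruction; the article's actual benchmark code is not shown). Once the compiler proves poly has no side effects and sees that its argument is loop-invariant, it can evaluate the call once and reuse the result:

```c
/* Hypothetical polynomial used to burn CPU time:
 * 2x^3 + 3x^2 + 5x + 7, evaluated via Horner's scheme. */
static double poly(double x)
{
    return ((2.0 * x + 3.0) * x + 5.0) * x + 7.0;
}

static double burn_time(double x, long iters)
{
    double r = 0.0;
    for (long i = 0; i < iters; i++)
        r = poly(x);  /* side-effect free and loop-invariant: the
                         compiler may hoist this call out of the loop */
    return r;
}
```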
I must say, though, that it is nice to be able to guarantee that the compiler can do an optimization by marking a function as pure, and it serves as a form of documentation, too.