Is there an effort to develop a build-oriented file system?

Posted 2019-07-07 10:39

Question:

I recently started to use Git. One of the interesting features I discovered was the use of hashes to quickly detect changes.

On the other hand, I see that build tools (like make, ant, javac, etc.) try to detect changes in source files by checking each file's timestamp.

The problems with this approach are:

  1. If you work on more than one machine, you have to keep all the clocks in sync; otherwise a new file may be considered unchanged because another machine's clock gave it a timestamp in the past relative to the build machine.
  2. On a big project, you have to scan every file's timestamp in order to detect a change.

I wonder if someone has already taken Git's approach to deal with these issues:

  1. Each file has a unique hash that depends on its content, not its timestamp.
  2. Each directory also has a hash that depends on the files it contains and their hashes.
  3. Even a simple change deep inside the source tree causes the root directory's hash to change, thanks to the rules above.

Such a mechanism could help make build tools much faster, because detecting a change in a source tree becomes a simple hash comparison. If the hash of the source tree's root directory has changed, a change must have occurred somewhere deeper in the tree, so you recurse into the tree looking for changes - exactly as Git does to detect changes.
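The scheme described above is essentially a Merkle tree, which is what Git builds with its blob and tree objects. A minimal sketch of the idea in Python (the function names and the choice of SHA-1 are illustrative, not any real tool's API):

```python
import hashlib
import os

def file_hash(path):
    """Content hash of a single file (analogous to a Git blob hash)."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so large files don't have to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def tree_hash(directory):
    """Directory hash derived from the sorted names and hashes of its
    entries, so any change deep in the tree changes the root hash
    (a Merkle tree, analogous to Git's tree objects)."""
    h = hashlib.sha1()
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        child = tree_hash(path) if os.path.isdir(path) else file_hash(path)
        h.update(name.encode() + b"\0" + child.encode())
    return h.hexdigest()
```

A build tool could then cache the root hash from the last build and compare: if `tree_hash(src_root)` is unchanged, nothing needs rebuilding; if it differs, recurse only into the subdirectories whose hashes changed. Note the trade-off, though: computing a content hash still requires reading every byte of every file, which is far more I/O than a timestamp scan.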

This doesn't necessarily mean the source tree has to be managed by Git. My idea is that the file system would automatically provide a file's content hash as one of its attributes / metadata, so the build tool could rely on that instead of on timestamps. In addition, a directory's hash would automatically reflect the state of the files inside it.

I've already read a little about Sun's ZFS, but I'm not sure it's a complete solution for making builds faster.

What do you think about this idea? Does such a file system already exist? Does such a build tool?

Answer 1:

I'll argue that what you're trying to solve is actually a non-issue:

The clock skew problem can be mostly avoided by using NTP.

Certainly it'd be nice to have clock skew issues eliminated entirely, but we can probably agree that throwing a fairly complex content-tracking system at the problem is overkill.

Regarding performance, scanning the entire tree tends not to be a problem in practice. stat is ridiculously fast (so long as you're not on Windows) -- ls -lR > /dev/null over the entire Linux kernel tree (38k files) takes 350 ms on my system.
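You can measure this on your own tree. A small sketch that stats every file the way make or a VCS status check would (the path in the usage comment is just an example):

```python
import os
import time

def stat_tree(root):
    """Walk a directory tree and stat every regular file,
    returning how many files were statted."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                os.stat(os.path.join(dirpath, name))
                count += 1
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return count

# Example usage (pick any tree you have locally):
# t0 = time.perf_counter()
# n = stat_tree("/usr/include")
# print(f"statted {n} files in {time.perf_counter() - t0:.3f}s")
```

On a warm cache even tens of thousands of stat calls typically complete in well under a second, which is the answer's point: the scan itself is rarely the bottleneck.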

In fact, if stat'ing all your files is a problem, then your version control system will become slow, and that will be a much bigger problem than your build performance. Every git status or git diff, for instance, stats all files in your working copy to check their mtimes, so you'd better hope that's fast.

So if you're looking to speed up make, don't look at the file system; it's most likely insignificant compared to whatever is actually eating up your build time.

Hope that eases your mind!