I just saw the first Git tutorial at http://blip.tv/play/Aeu2CAI.
How does Git store all the versions of all the files, and how can it still be more economical in space than Subversion, which saves only the latest version of the code?
I know this can be done with compression, but that would come at the cost of speed; yet the tutorial also says that Git is much faster (though where it gains the most is the fact that most of its operations are offline).
So, my guess is that:

- Git compresses data extensively
- It is still faster because `uncompression + work` is still faster than `network_fetch + work`
Am I correct? Even close?
Not a complete answer, but these comments (from AlBlue) might help on the space-management aspect of the question:
As for the speed aspect, I mentioned it in my "How fast is git over subversion with remote operations?" answer (as Linus said in his Google presentation, paraphrasing here: "anything involving the network will just kill performance").
And the GitBenchmark document mentioned by Jakub Narębski is a good addition, even though it doesn't deal directly with Subversion.
It does list the kinds of operations you need to monitor, performance-wise, on a DVCS.
Other Git benchmarks are mentioned in this SO question.
I assume you are asking how it is possible for a git clone (full repository + checkout) to be smaller than checked-out sources in Subversion. Or did you mean something else?
## Repository size
First, you should take into account that, along with the checkout (working copy), Subversion stores a pristine copy (the last version) in those `.svn` subdirectories, and that pristine copy is stored uncompressed.

Second, Git uses the following techniques to make the repository smaller:

- identical contents are stored only once, because objects are addressed by the hash of their contents;
- all objects are zlib-compressed;
- in packfiles, similar objects are stored as deltas against each other, so only the differences take up space.
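These storage techniques are easy to observe in a throwaway repository; here is a sketch (the repository and file names are arbitrary):

```shell
# Sketch: watch Git compress and pack objects in a scratch repository.
git init demo && cd demo
echo "some content" > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -m "first commit"

# Loose objects are stored zlib-compressed, one file per object:
git count-objects -v          # "count" = number of loose objects

# git gc delta-compresses similar objects together into a packfile:
git gc
git count-objects -v          # "size-pack" = packed size in KiB
```

After `git gc`, the loose-object count drops and the `size-pack` line shows how much space the packed history takes.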
## Performance (speed of operations)
First, any operation that involves the network will be much slower than a local operation. For example, comparing the current state of the working area with some other version, or getting the log (the history), requires a network connection and network transfer in Subversion but is a purely local operation in Git, so it is of course much slower in Subversion. By the way, this is the difference between centralized version control systems (using a client-server workflow) and distributed version control systems (using a peer-to-peer workflow) in general, not only between Subversion and Git.
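To make the contrast concrete, here is a sketch (the URL is a placeholder): after the one-time cost of the clone, history queries never touch the network:

```shell
# One-time network transfer: the clone brings down the full history.
git clone https://git.example.com/repo.git
cd repo

# From here on, these commands read only the local .git/ directory:
git log --oneline          # full history, no server round-trip
git diff HEAD~1 HEAD       # compare versions locally

# The Subversion equivalents ("svn log", "svn diff -r PREV") must
# contact the server on every invocation.
```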
Second, if I understand it correctly, nowadays the limitation is not the CPU but IO (disk access). Therefore it is possible that the gain from having to read less data from disk thanks to compression (and being able to mmap it into memory) outweighs the cost of having to decompress that data.
Third, Git was designed with performance in mind (see e.g. the GitHistory page on the Git Wiki):

- Checking whether a file has changed uses stat information cached in the index, so Git usually doesn't have to read file contents (see also the `core.trustctime` config variable).
- Delta chains in packfiles are kept short (see `pack.depth`, which defaults to 50). Git has a delta cache to speed up access, and there is a (generated) packfile index for fast access to objects in a packfile.
- Git tries to show the first page of "`git log`" output as fast as possible, so you see it almost immediately, even if generating the full history would take more time; it doesn't wait for the full history to be generated before displaying it.

I am not a Git hacker, and I probably missed some techniques and tricks that Git uses for better performance. Note, however, that Git heavily uses POSIX features (like memory-mapped files) for this, so the gains might not be as large on MS Windows.