I'm trying to figure out how to get a listing of all the files and their SHA1s in a remote Git repository.
There is a way to do this from a local repo, using the command:
git ls-files * -s
Which returns the following (in an example):
100644 1fd148918032743b3b79db573c63a5d453089808 0 2.txt
100644 ff804781c474a06bd055995e48c30799bc6ab65a 0 README
But the catch here is that you have to perform a full clone and pull all the information down ahead of time. This doesn't work on a bare clone of a remote repository.
Any clue?
So the answer is the following (that I've figured out):
These steps assume your Git repo is set up for HTTP access, with an update-server-info command run as a post-receive hook (and possibly other things, as I'm using a Git repo set up by github.com). HTTP can also be HTTPS.
HTTP GET /info/refs
This file will contain something like:
4462ced0a4be2135c009ba6224c2191c7a3f844a refs/heads/master
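A minimal Python sketch of reading that file, assuming you have already downloaded the body of /info/refs as text. The function name parse_info_refs is my own, not part of Git, and splitting on whitespace is an assumption that holds for the one-ref-per-line layout shown above:

```python
def parse_info_refs(text):
    """Return a dict mapping ref name -> 40-character SHA1."""
    refs = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Each line is "<sha1><whitespace><ref name>".
        sha, name = line.split(None, 1)
        refs[name.strip()] = sha
    return refs


sample = "4462ced0a4be2135c009ba6224c2191c7a3f844a\trefs/heads/master\n"
refs = parse_info_refs(sample)
print(refs["refs/heads/master"])
```

The SHA1 for the branch you care about is the starting point for the next request.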
HTTP GET /objects/44/62ced0a4be2135c009ba6224c2191c7a3f844a
Decompress this file using zlib.
This file will contain something like:
commit 219
tree 0d4f34f97d76e54666751a850e9300e8b23c1adb
parent fca1c898e2b4a43c66f211bd3547dc301511721d
author yourname <yourname@email.com> 1295905469 -0800
committer yourname <yourname@email.com> 1295905469 -0800
added a/a.txt.
Take the tree SHA1 at the top.
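Roughly, in Python, the commit step might look like the sketch below. The helper names (object_path, read_object, commit_tree) are hypothetical, and the download itself is left out: the sketch starts from the compressed bytes you got back from the GET. Note that this URL layout only finds loose objects; an object that the server has moved into a packfile will not be at this path.

```python
import zlib


def object_path(sha):
    """URL path of a loose object: first two hex digits, then the rest."""
    return "/objects/%s/%s" % (sha[:2], sha[2:])


def read_object(compressed):
    """Decompress a loose object and split '<type> <size>\\0' from the body."""
    raw = zlib.decompress(compressed)
    header, _, body = raw.partition(b"\x00")
    kind, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return kind, body


def commit_tree(body):
    """Return the SHA1 on the 'tree' header line of a commit body."""
    for line in body.decode("utf-8", "replace").splitlines():
        if line.startswith("tree "):
            return line[5:].strip()
        if not line:
            break  # blank line ends the headers; the message follows
    raise ValueError("no tree line found")


# Stand-in for a downloaded commit, built locally for illustration.
fake_body = (b"tree 0d4f34f97d76e54666751a850e9300e8b23c1adb\n"
             b"parent fca1c898e2b4a43c66f211bd3547dc301511721d\n"
             b"\n"
             b"added a/a.txt.\n")
fake_raw = ("commit %d" % len(fake_body)).encode("ascii") + b"\x00" + fake_body
kind, body = read_object(zlib.compress(fake_raw))
print(kind, commit_tree(body))
print(object_path(commit_tree(body)))
```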
HTTP GET /objects/0d/4f34f97d76e54666751a850e9300e8b23c1adb
Decompress this file with zlib.
This file will contain something like:
Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 74 72 65 65 20 39 35 00 31 30 30 36 34 34 20 32 tree 95.100644 2
00000010 2E 74 78 74 00 1F D1 48 91 80 32 74 3B 3B 79 DB .txt..ÑH‘€2t;;yÛ
00000020 57 3C 63 A5 D4 53 08 98 08 31 30 30 36 34 34 20 W<c¥ÔS.˜.100644
00000030 52 45 41 44 4D 45 00 FF 80 47 81 C4 74 A0 6B D0 README.ÿ€G.Ät kÐ
00000040 55 99 5E 48 C3 07 99 BC 6A B6 5A 34 30 30 30 30 U™^HÃ.™¼j¶Z40000
00000050 20 61 00 1A 60 2D 9B D0 7C E5 27 2D DA A6 4E 21 a..`-›Ð|å'-Ú¦N!
00000060 DA 12 DB CA 2B 8C 9F Ú.ÛÊ+ŒŸ
This file's format is the following:
tree<space>##<NULL><object type id><space><filename><NULL><SHA1>
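Here ## is the length in bytes of everything after the first NULL, the object type id is the entry's mode (100644 for a regular file), and the SHA1 is stored as 20 raw bytes, not as 40 hex characters. The part after the first NULL (<object type id><space><filename><NULL><SHA1>) repeats once per entry.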
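A sketch of a parser for that layout, in Python. It takes the decompressed bytes after the "tree ##<NULL>" header; parse_tree is a name I made up. The sample bytes are rebuilt from the hex dump above so that the result can be compared with the ls-files output in the question:

```python
def parse_tree(body):
    """Return a list of (mode, name, sha1_hex) from a tree object's body."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)      # end of the mode
        nul = body.index(b"\x00", space)   # end of the filename
        mode = body[pos:space].decode("ascii")
        name = body[space + 1:nul].decode("utf-8", "replace")
        sha = body[nul + 1:nul + 21].hex() # 20 raw bytes -> 40 hex chars
        entries.append((mode, name, sha))
        pos = nul + 21
    return entries


# The three entries from the hex dump, without the "tree 95\0" header.
tree_body = (
    b"100644 2.txt\x00"
    + bytes.fromhex("1fd148918032743b3b79db573c63a5d453089808")
    + b"100644 README\x00"
    + bytes.fromhex("ff804781c474a06bd055995e48c30799bc6ab65a")
    + b"40000 a\x00"
    + bytes.fromhex("1a602d9bd07ce5272ddaa64e21da12dbca2b8c9f")
)
entries = parse_tree(tree_body)
for mode, name, sha in entries:
    print(mode, sha, name)
```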
Take the first SHA1 from this example (1fd148918032743b3b79db573c63a5d453089808).
HTTP GET /objects/1f/d148918032743b3b79db573c63a5d453089808
Decompress this file with zlib.
This file will contain something like the following:
Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 62 6C 6F 62 20 36 00 61 64 73 66 0A 32 blob 6.adsf.2
And there you have the content of an individual file (whose path you've been keeping track of, and whose name you know from the earlier tree listing). This file is prefixed with some meta-information, which in this case is:
blob 6<NULL><file content>
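For illustration, a small Python sketch that splits that prefix from the content, using the bytes from the hex dump above. The names split_object and object_id are my own. Git names an object by the SHA1 of these decompressed bytes (header included), so object_id can serve as a check that a download matches the SHA1 you asked for:

```python
import hashlib


def split_object(raw):
    """Split decompressed object bytes into (type, declared size, content)."""
    header, _, content = raw.partition(b"\x00")
    kind, size = header.decode("ascii").split(" ")
    return kind, int(size), content


def object_id(raw):
    """SHA1 of the decompressed object, header included."""
    return hashlib.sha1(raw).hexdigest()


# Bytes copied from the hex dump: "blob 6", a NUL, then the file content.
raw = bytes.fromhex("626C6F62203600616473660A32")
kind, size, content = split_object(raw)
print(kind, size, content)
```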
Note: If the file you want is in a subdirectory of the root of the repo, its entry in the tree object will have the object type id of a tree (stored as 40000 in the dump above; Git tools display it as 040000). Take the SHA1 of that tree object, HTTP GET it, decompress it, read its entries, and repeat until you drill down to the file you want. Then get the file contents using its SHA1, as in the last step.
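The drill-down can be sketched as a recursive walk, in Python. Everything here is illustrative: fetch_body stands for whatever function you write to GET an object, decompress it, and strip the "tree ##<NULL>" header, and the SHA1s in the demo store (other than README's) are made-up placeholders:

```python
def iter_tree_entries(body):
    """Yield (mode, name, sha1_hex) for each entry in a tree body."""
    pos = 0
    while pos < len(body):
        nul = body.index(b"\x00", pos)
        mode, name = body[pos:nul].split(b" ", 1)
        sha = body[nul + 1:nul + 21].hex()
        yield mode.decode("ascii"), name.decode("utf-8", "replace"), sha
        pos = nul + 21


def list_files(fetch_body, tree_sha, prefix=""):
    """Return (mode, sha1, path) for every non-tree entry under a tree."""
    files = []
    for mode, name, sha in iter_tree_entries(fetch_body(tree_sha)):
        path = prefix + name
        if mode == "40000":
            # Subdirectory: fetch that tree and recurse into it.
            files.extend(list_files(fetch_body, sha, path + "/"))
        else:
            # Anything that is not a tree is reported as a file entry.
            files.append((mode, sha, path))
    return files


def entry(mode, name, sha):
    """Build one raw tree entry (demo helper only)."""
    return mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(sha)


ROOT, SUB, FILE = "bb" * 20, "aa" * 20, "cc" * 20  # placeholder SHA1s
store = {
    ROOT: entry("100644", "README", "ff804781c474a06bd055995e48c30799bc6ab65a")
          + entry("40000", "a", SUB),
    SUB: entry("100644", "a.txt", FILE),
}
files = list_files(store.__getitem__, ROOT)
for mode, sha, path in files:
    print(mode, sha, path)
```

This gives a listing close to what git ls-files -s prints, without the stage number column.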
I'm not totally sure if this is what you are looking for, but to be able to get any information about a remote repository, you need to fetch from it. When you fetch from a remote repository, all information about its branches is downloaded to your local copy. As such, you can easily check out a remote branch (git checkout origin/master) and use the ls-files command.