I have a large collection of my personal scripts that I would like to start versioning using Git. I've previously organized my code as follows:
~/code/python/projects/ (for large stuff, each project contained in an individual folder)
~/code/python/scripts/ (single file scripts all contained in this directory)
~/code/python/sandbox/ (my testing area)
~/code/python/docs/ (downloaded documentation)
~/code/java/... (as above)
Now I'm going to start versioning my code using Git, so that I can have history and back up all my code to a remote server.
I know if I were using SVN I would just keep my entire "~/code/" directory in one large repository, but I understand this is not a good way to do things with Git.
Most info I've seen online suggests keeping all my project folders in a single place (as in, no separate directories for python or java) with each project containing its own Git repository, and simply having a "snippets" directory containing all single-file scripts/experiments that can be converted into projects at a later date.
But I'm not sure how I feel about consolidating all of my code directories into one area. Is there a good way to keep my separate code directories intact, or is it not worth the effort? Maybe I'm just attached to the separate code directories because I've never known anything else...
Also (as a side note), I'd like to be able to quickly see a chronological history of all my projects and scripts, so I can see which projects I created most recently. I used to do this by keeping a number at the beginning of all my project names: 002project, 003project.
Is there an automatic or easy way to do this in Git without having to add a number to all of the project names?
I'm open to any practical or philosophical code organizing advice you have. Thanks!!!
The reason Git dissuades people from having single, monolithic repositories is that you cannot clone subdirectories of a repository (like you can with SVN).
Say you have git://blah/somecorp_code.git, which has millions of revisions and is 15GB. If you just want a subdirectory of that code, tough - you either get all 15GB or nothing.
For personal code, this really isn't an issue - I have one "monolithic" Git repository, which is about 20MB, and I can happily have it cloned on all the machines I wish to use it on.
No one else uses it, no one else commits, and I rarely do much in the way of branching. I really just use it as a fancy undo system with nice syncing and remote backup (a private GitHub project).
I organised it as follows:
In the root level of the repository, I have a code folder (along with a sites folder, for web-dev stuff - this is why the repository is 20MB).
In the code folder, I have folders for various languages (python, ruby, c etc).
In each language directory, I have two folders, snippets and projects. Inside snippets is a bunch of files, inside projects is a series of folders.
These projects are random things I've written, but don't really work on much (toy projects, "I wonder if I could..."-projects etc).
If it's a single Python file, it goes in code/python/snippets/, if it's more than one file it goes in code/python/projects/{project name}.
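The placement rule above is simple enough to sketch in a few lines of Python. This is only an illustration of the layout described: the function name destination and the repo_root argument are hypothetical, not something that exists in my repository.

```python
from pathlib import Path


def destination(repo_root: Path, language: str, source: Path) -> Path:
    """Return where `source` would live under the code/<language>/ layout.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    base = repo_root / "code" / language
    if source.is_file():
        # A single file counts as a snippet.
        return base / "snippets" / source.name
    # Anything that needs more than one file gets its own project folder.
    return base / "projects" / source.name
```

For example, a lone hello.py maps to code/python/snippets/hello.py, while a folder called mytool maps to code/python/projects/mytool.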
When I want to publicly release a project (on GitHub, usually), I create a new repository, copy the code into it and sync it with GitHub.
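As a rough sketch, that release step could be scripted like this. It assumes an empty remote repository already exists at remote_url and that new_repo_dir does not exist yet; the function names, the commit message and the choice to push HEAD are my assumptions for illustration, not a fixed recipe.

```python
import shutil
import subprocess
from pathlib import Path


def publish_commands(remote_url: str) -> list[list[str]]:
    """Git commands that turn a plain folder into a repo and push it."""
    return [
        ["git", "init"],
        ["git", "add", "-A"],
        ["git", "commit", "-m", "Initial import"],
        ["git", "remote", "add", "origin", remote_url],
        # Pushing HEAD avoids assuming a particular default branch name.
        ["git", "push", "-u", "origin", "HEAD"],
    ]


def publish(project_dir: Path, new_repo_dir: Path, remote_url: str) -> None:
    """Copy a project out of the monolithic repo and publish the copy."""
    # copytree requires that new_repo_dir does not exist yet.
    shutil.copytree(project_dir, new_repo_dir)
    for cmd in publish_commands(remote_url):
        subprocess.run(cmd, cwd=new_repo_dir, check=True)
```

The copy starts with a fresh history, so nothing from the private monolithic repository leaks into the public one.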
The separate "active project" repository is now unrelated to the monolithic repo. I looked into Git submodules, but they are not intended for this usage - they're designed to make cloning dependencies easy, not to manage a series of unrelated repositories.
I do have a script that uses the GitHub API to automatically clone all my projects locally, or update them with git pull - it's just a self-contained version of githubsync.py (I merged github.py into the same file). It can be found here as gist/373731.
I used githubsync.py to clone my projects to my laptop and desktop initially, and also routinely run it inside Dropbox, as a backup.
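The clone-or-pull idea looks roughly like the following. This is not the actual githubsync.py, just a minimal sketch of the same approach: it assumes the public GitHub REST endpoint for listing a user's repositories (unauthenticated, first page of results only), and all function names here are made up for illustration.

```python
import json
import subprocess
import urllib.request
from pathlib import Path


def list_repos(user: str) -> list[dict]:
    """Fetch the first page of a user's public repositories from GitHub."""
    url = f"https://api.github.com/users/{user}/repos"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def sync_command(repo: dict, dest: Path) -> tuple[list[str], Path]:
    """Return (command, working directory) for one repository."""
    target = dest / repo["name"]
    if (target / ".git").is_dir():
        # Already cloned: update it in place.
        return ["git", "pull"], target
    # Not there yet: clone it into dest/<name>.
    return ["git", "clone", repo["clone_url"], str(target)], dest


def sync_all(user: str, dest: Path) -> None:
    """Clone or update every listed repository under dest."""
    dest.mkdir(parents=True, exist_ok=True)
    for repo in list_repos(user):
        cmd, cwd = sync_command(repo, dest)
        subprocess.run(cmd, cwd=cwd, check=True)
```

Keeping the decision in sync_command separate from the network and subprocess calls makes it easy to check what would run before running it.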
Yes it is.
But once you have that large repository, you have to distinguish the parts in it which will evolve with their own lifecycle and their own tag.
Those would be submodules that will be, as you said, a git repo of their own.
So you still get:
Note: the chronology of project creation is still better managed with a naming convention.
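If numbering the names is unappealing, one possible alternative is to read the creation order from history: the author date of the oldest commit touching a path approximates when that project was created. The sketch below is only that, a sketch; the helper names are hypothetical, and it assumes the path's history has not been rewritten or imported in bulk.

```python
import subprocess
from pathlib import Path
from typing import Optional


def first_commit_time(repo: Path, path: str = ".") -> Optional[int]:
    """Unix timestamp of the oldest commit touching `path`, or None."""
    out = subprocess.run(
        # %at is the author date as a Unix timestamp; --reverse lists
        # the oldest commit first.
        ["git", "log", "--reverse", "--format=%at", "--", path],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout.split()
    return int(out[0]) if out else None


def newest_first(created: dict[str, int]) -> list[str]:
    """Sort project names by creation timestamp, most recent first."""
    return sorted(created, key=created.__getitem__, reverse=True)
```

With one repository per project, call first_commit_time on each repository; inside a single large repository, pass each project folder as path.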
With that many submodules, you can: