What's a good way to organize a large collecti

Posted 2019-02-05 09:54

Question:

I have a large collection of my personal scripts that I would like to start versioning using Git. I've previously organized my code as follows:

~/code/python/projects/ (for large stuff, each project contained in an individual folder)
~/code/python/scripts/ (single file scripts all contained in this directory)
~/code/python/sandbox/ (my testing area)
~/code/python/docs/ (downloaded documentation)

~/code/java/... (as above)

Now I'm going to start versioning my code using Git, so that I can have history and back up all my code to a remote server.

I know if I were using SVN I would just keep my entire "~/code/" directory in a large repository, but I understand this is not a good way to do things with Git.
Most info I've seen online suggests keeping all my project folders in a single place (as in, no separate directories for python or java) with each project containing its own git repository, and simply having a "snippets" directory containing all single-file scripts/experiments that can be converted into projects at a later date.

But I'm not sure how I feel about consolidating all of my code directories into one area. Is there a good way to keep my separate code directories intact, or is it not worth the effort? Maybe I'm just attached to the separate code directories because I've never known anything else...

Also (as a side note), I'd like to quickly be able to see a chronological history of all my projects and scripts, so I can see which projects I created most recently. I used to do this by keeping a number at the beginning of all my projects: 002project, 003project.
Is there an automatic or easy way to do this in Git without having to add a number to all of the project names?

I'm open to any practical or philosophical code organizing advice you have. Thanks!!!

Answer 1:

I know if I were using SVN I would just keep my entire "~/code/" directory in a large repository, but I understand this is not a good way to do things with Git.

The reason Git dissuades people from having single, monolithic repositories is that you cannot clone subdirectories of a repository (as you can with SVN).

Say you have git://blah/somecorp_code.git which has millions of revisions, and is 15GB. If you just want a subdirectory of that code, tough - you either get all 15GB or nothing.

For personal code, this really isn't an issue - I have one "monolithic" git repository, which is about 20MB, and I can happily have it cloned on all the machines I wish to use it on.

No one else uses it, no one else commits, and I rarely do much in the way of branching. I really just use it as a fancy undo system with nice syncing and remote backup (a private GitHub project).

I organised it as follows:

In the root level of the repository, I have a code folder (along with a sites folder, for web-dev stuff - this is why the repository is 20MB)

In the code folder, I have folders for various languages (python, ruby, c etc)

In each language directory, I have two folders, snippets and projects. Inside snippets is a bunch of files, inside projects is a series of folders.

These projects are random things I've written, but don't really work on much (toy projects, "I wonder if I could..."-projects etc)

If it's a single Python file, it goes in code/python/snippets/; if it's more than one file, it goes in code/python/projects/{project name}
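The rule above is simple enough to write down as a function. This is only a sketch of the layout described in this answer: the function name `destination` and its parameters are made up for illustration and are not part of any script the answer mentions.

```python
from pathlib import PurePosixPath


def destination(language, files, project_name=None):
    """Return where a piece of code belongs under the code/ folder.

    Hypothetical helper: `files` is the list of file names that make up
    the script or project, `project_name` is only needed for projects.
    """
    if not files:
        raise ValueError("nothing to file away")
    base = PurePosixPath("code") / language
    if len(files) == 1:
        # A single file goes straight into snippets.
        return base / "snippets" / files[0]
    if project_name is None:
        raise ValueError("a multi-file project needs a name")
    # More than one file: the project gets its own folder.
    return base / "projects" / project_name
```

For example, `destination("python", ["rename.py"])` gives code/python/snippets/rename.py, while passing two files and a project name gives a folder under code/python/projects/.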

When I want to publicly release a project (on GitHub, usually), I create a new repository, copy the code to this and sync it with GitHub.

The separate "active project" repository is now unrelated to the monolithic repo. I looked into Git submodules, but they are not intended for this usage: they are designed to make cloning dependencies easy, not to manage a series of unrelated repositories.
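The release step (new repository, copy the code, sync with GitHub) could look roughly like the sketch below. It is not the author's actual procedure, just one way to do it under some assumptions: the function names, the commit message and the remote URL are placeholders, and the empty repository is assumed to already exist on GitHub.

```python
import shutil
import subprocess


def release_commands(remote_url):
    """Git commands to run inside the freshly copied project folder."""
    return [
        ["git", "init"],
        ["git", "add", "-A"],
        ["git", "commit", "-m", "Initial public release"],
        ["git", "remote", "add", "origin", remote_url],
        # Pushing HEAD avoids guessing the name of the default branch.
        ["git", "push", "-u", "origin", "HEAD"],
    ]


def release(project_dir, new_repo_dir, remote_url):
    """Copy a project out of the monolithic repo and publish it."""
    # Only the files are copied; the history stays in the monolithic repo.
    # copytree requires that new_repo_dir does not exist yet.
    shutil.copytree(project_dir, new_repo_dir)
    for command in release_commands(remote_url):
        subprocess.run(command, cwd=new_repo_dir, check=True)
```

Because only files are copied, the new repository starts with a single commit, which matches the point that it is unrelated to the monolithic repo.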

I do have a script that uses the GitHub API to automatically clone all my projects locally, or update them with git pull. It's just a self-contained version of githubsync.py (I merged github.py into the same file). It can be found here as gist/373731
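For readers who cannot get at the gist, here is a minimal sketch of the same idea. It is not githubsync.py: the function names are invented, it only reads the first page of the public repository list for a user from the GitHub REST API, and it assumes git is on the PATH.

```python
import json
import subprocess
import urllib.request
from pathlib import Path


def list_repos(user):
    """Fetch the public repositories of `user` from the GitHub REST API."""
    # Pagination and authentication are left out of this sketch.
    url = f"https://api.github.com/users/{user}/repos"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def sync_command(name, clone_url, base_dir):
    """Return the git command that brings base_dir/name up to date."""
    target = Path(base_dir) / name
    if (target / ".git").is_dir():
        # Already cloned: update it in place.
        return ["git", "-C", str(target), "pull"]
    # Not there yet: make a fresh clone.
    return ["git", "clone", clone_url, str(target)]


def sync_all(user, base_dir):
    """Clone or update every public repository of `user` under base_dir."""
    for repo in list_repos(user):
        command = sync_command(repo["name"], repo["clone_url"], base_dir)
        subprocess.run(command, check=True)
```

Calling `sync_all` with a user name and a target folder clones on the first run and pulls on later runs, which is the behaviour the answer describes.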

I used githubsync.py to clone my projects to my laptop and desktop initially, and also routinely run it inside Dropbox, as a backup.



Answer 2:

I know if I were using SVN I would just keep my entire "~/code/" directory in a large repository, but I understand this is not a good way to do things with Git.

Yes it is.
But once you have that large repository, you have to distinguish the parts in it which will evolve with their own lifecycle and their own tag.
Those would be submodules that will be, as you said, a git repo of their own.

So you still get:

code
  .git (main project)
  python
    .git (main sub-project for all python-related stuff)
    project1 
      .git (first submodule)
    project2
      .git (second submodule)
    ...
    scripts
      .git (one submodule for all your scripts)
    sandbox
      .git (sandbox submodule)
    docs
      .git (docs submodule)
  java
    .git (main sub-project for all java-related stuff)
    ... (repeat same organization)

Note: the chronology of project creation is still better managed with a naming convention.
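If the naming convention feels too manual, one possible alternative (an assumption here, not something this answer proposes) is to ask each project's repository for the time of its oldest commit and sort on that. The sketch below uses hypothetical function names and treats "creation" as the author date of the first commit reachable from HEAD, which is only an approximation.

```python
import subprocess


def first_commit_time(repo_dir):
    """Unix timestamp of the oldest commit reachable from HEAD in repo_dir."""
    out = subprocess.run(
        # --reverse lists the oldest commit first; %at is the author date.
        ["git", "-C", repo_dir, "log", "--reverse", "--format=%at"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.splitlines()[0])


def newest_first(created):
    """Sort project names by creation time, most recent first.

    `created` maps a project name to a timestamp such as the one
    returned by first_commit_time.
    """
    return sorted(created, key=created.get, reverse=True)
```

This only works for projects that already have at least one commit, and it reflects when the code was first committed rather than when it was first written.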

With that many submodules, you can:

  • actually clone and work on any part of your collection without necessarily getting everything
  • or you can rebuild the same old organization you had in the first place
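Both options map onto ordinary git submodule commands. The sketch below just builds those commands so they are easy to read and test. The function names are illustrative, and the URL and path passed in are whatever your own remotes and folders are called.

```python
import subprocess


def add_submodule(url, path):
    """Register an existing repository as a submodule at `path`."""
    return ["git", "submodule", "add", url, path]


def fetch_part(path):
    """Check out a single submodule, leaving the others empty."""
    return ["git", "submodule", "update", "--init", path]


def fetch_everything():
    """Check out every submodule, including nested ones."""
    return ["git", "submodule", "update", "--init", "--recursive"]


def run(command, repo_dir):
    """Run one of the commands above inside the parent repository."""
    subprocess.run(command, cwd=repo_dir, check=True)
```

With the layout above, running `fetch_part("python")` inside a clone of code would fetch only the python sub-project, while `fetch_everything()` rebuilds the whole tree.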