Can someone help me understand the difference between a branch, a fork and a clone in Git?
Similarly, what does it mean when I do a git fetch as opposed to a git pull?
Also, what does rebase mean in comparison to merge?
How can I squash individual commits themselves together?
How are they used, why are they used and what do they represent?
How does GitHub figure in?
Just to add to others, a note specific to forking.
It's good to realize that technically, cloning the repo and forking the repo are the same thing. Do a git clone of some other repo and you can pat yourself on the back: you have just forked it.
Git, as a VCS, is in fact all about forking. Apart from "just browsing" using a remote UI such as cgit, there is very little you can do with a git repo that does not involve cloning the repo at some point.
However, when someone says "I forked repo X", they mean that they have created a clone of the repo somewhere else with the intention of exposing it to others, for example to show some experiments, or to apply a different access control mechanism (e.g. to allow people without GitHub access but with a company-internal account to collaborate).
The facts that the repo was most probably created with some command other than git clone, that it's most probably hosted somewhere on a server as opposed to somebody's laptop, and that it most probably has a slightly different format (it's a "bare repo", i.e. without a working tree) are all just technical details.
The fact that it will most probably contain a different set of branches, tags or commits is most probably the reason why they did it in the first place.
(What Github does when you click "fork", is just cloning with added sugar: it clones the repo for you, puts it under your account, records the "forked from" somewhere, adds remote named "upstream", and most importantly, plays the nice animation.)
When someone says "I cloned repo X", they mean that they have created a clone of the repo locally on their laptop or desktop with the intention to study it, play with it, contribute to it, or build something from the source code in it.
The beauty of Git is that it makes this all perfectly fit together: all these repos share the common part of the commit chain, so it's possible to safely (see note below) merge changes back and forth between all these repos as you see fit.
Note: "safely" as long as you don't rewrite the common part of the chain, and as long as the changes are not conflicting.
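To make the "a fork is just a clone" point concrete, here is a minimal sketch that runs entirely on the local filesystem. The directory names (upstream, fork.git, laptop) and the demo identity are made up for illustration, and a local bare clone only stands in for what a hosting service does when you fork.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# "upstream": a stand-in for somebody else's repository (hypothetical name).
git init -q "$work/upstream"
# Name the initial branch master regardless of the local default.
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
echo "hello" > "$work/upstream/README"
git -C "$work/upstream" add README
git -C "$work/upstream" commit -q -m "Initial commit"

# A "fork": a bare clone (no working tree) meant to be shared with others.
git clone -q --bare "$work/upstream" "$work/fork.git"

# A working clone of the fork, with the original added as a second remote.
git clone -q "$work/fork.git" "$work/laptop"
git -C "$work/laptop" remote add upstream "$work/upstream"
git -C "$work/laptop" fetch -q upstream

# All three repositories share the same commit chain.
git -C "$work/laptop" log --oneline --all
```

Because origin/master and upstream/master point at the same commit here, changes can later be merged in either direction, subject to the note above about rewriting shared history.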
Fork Vs. Clone - two words that both mean copy
Please see this diagram. (Originally from http://www.dataschool.io/content/images/2014/Mar/github1.png).
Fork
Clone
My answer includes github as many folks have asked about that too.
Local Repositories
git (locally) has a directory (.git) which you commit your files to and this is your 'local repository'. This is different from systems like svn where you add and commit to the remote repository immediately.
git stores each version of a file that changes by saving the entire file. It is also different from svn in this respect as you could go to any individual version without 'recreating' it through delta changes.
git doesn't 'lock' files at all and thus avoids the 'exclusive lock' functionality for an edit (older systems like pvcs come to mind), so all files can always be edited, even when off-line. It actually does an amazing job of merging file changes (within the same file!) together during pulls or fetches/pushes to a remote repository such as github. The only time you need to make manual changes (actually editing a file) is if two changes involve the same line(s) of code.
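As a small illustration of the local repository idea, the sketch below commits twice without any remote being configured and then reads back the older version of the file. The file name notes.txt and the demo identity are placeholders.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)

git init -q "$repo"          # creates the .git directory
cd "$repo"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "second draft" > notes.txt
git commit -q -am "Revise notes"

# No remote exists, yet both versions live in the local repository.
git remote                   # prints nothing
git log --oneline            # lists the two commits
git show HEAD~1:notes.txt    # prints the earlier version: first draft
```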
Branches
Branches allow you to preserve the main code (the 'master' branch), make a copy (a new branch) and then work within that new branch. If the work takes a while or master gets a lot of updates since the branch was made, then merging or rebasing (often preferred for better history and easier-to-resolve conflicts) against the master branch should be done. When you've finished, you merge the changes made in the branch back into the master branch. Many organizations use branches for each piece of work whether it is a feature, bug or chore item. Other organizations only use branches for major changes such as version upgrades. Fork: With a branch you control and manage the branch, whereas with a fork someone else controls accepting the code back in.
Broadly speaking there are two main approaches to doing branches. The first is to keep most changes on the master branch, only using branches for larger and longer-running things like version changes where you want to have two branches available for different needs. The second is whereby you basically make a branch for every feature request, bug fix or chore and then manually decide when to actually merge those branches into the main master branch. Though this sounds tedious, this is a common approach and is the one that I currently use and recommend because this keeps the master branch cleaner and it's the master that we promote to production, so we only want completed, tested code, via the rebasing and merging of branches.
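The branch-per-feature approach might look roughly like this in practice. The branch name feature_login and the files are invented for the example; --no-ff is used only so that the merge is visible as its own commit in the graph.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master   # call the initial branch master

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial version"

# Start a branch for one piece of work; master stays untouched meanwhile.
git checkout -q -b feature_login
echo "login form" > login.txt
git add login.txt
git commit -q -m "Add login form"

# Once the work is complete and tested, bring it into master.
git checkout -q master
git merge -q --no-ff -m "Merge feature_login" feature_login
git branch -d feature_login               # the branch name is no longer needed
git log --oneline --graph
```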
The standard way to bring a branch "in" to master is to do a merge. Branches can also be rebased to 'clean up' history. Rebasing doesn't affect the current state of the code and is done to give a 'cleaner' history. Basically the idea is that you branched from a certain point (usually from master). Since you branched, 'master' itself has moved forward. So it would be cleaner if all the changes you have made in the branch were replayed against the most recent master with all its changes. So the process is: save the changes, get the "new" master, and then reapply the changes against that. Be aware that rebase, just like merge, can result in conflicts that you have to resolve manually (by editing).
One 'guideline' to note: only rebase if the branch is local and you haven't pushed it to a remote yet! This is mainly because rebasing can alter the history that other people see, which may include their own commits.
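Here is a sketch of the "replay my changes on top of the new master" idea, using a throwaway repository. The branch name topic and the file names are placeholders, and the branch is never pushed, in line with the guideline above.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master   # call the initial branch master

echo "base" > base.txt
git add base.txt
git commit -q -m "Base"

# Branch off and do some work.
git checkout -q -b topic
echo "topic" > topic.txt
git add topic.txt
git commit -q -m "Topic work"

# Meanwhile master moves forward.
git checkout -q master
echo "more" > master.txt
git add master.txt
git commit -q -m "More work on master"

# Replay the topic commits on top of the new master.
git checkout -q topic
git rebase -q master

git log --oneline --graph   # history is now a straight line
```

After the rebase, the tip of master is an ancestor of topic, so merging topic into master later is a simple fast-forward.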
Tracking branches
These are the branches that are named origin/branch_name (as opposed to just branch_name). When you are pushing and pulling the code to/from remote repositories, this is actually the mechanism through which that happens. For example, when you git push a branch called 'building_groups', your branch goes first to origin/building_groups and then that goes to the remote repository (actually that's an over-simplification but good enough for now). Similarly, if you do a git fetch origin building_groups, the commits that are retrieved are placed in your origin/building_groups branch. You can then choose to merge this branch into your local copy. Our practice is to always do a git fetch and a manual merge rather than just a git pull (which does both of the above in one step).
Fetching new branches
Getting new branches: At the initial point of a clone you will have all the branches. However, if other developers add branches and push them to the remote, there needs to be a way to 'know' about those branches and their names in order to be able to pull them down locally. This is done via a git fetch, which will get all new and changed branches into the local repository using the tracking branches (e.g. origin/). Once fetched, one can git branch --remotes to list the tracking branches and git checkout [branch] to actually switch to any given one.
Merging
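The fetch-then-merge practice and the discovery of new branches described above can be sketched as follows. Two local directories stand in for the remote repository and for your own clone; the paths and the demo identity are assumptions made for the example.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Stand-in for the shared remote repository.
git init -q "$work/remote"
git -C "$work/remote" symbolic-ref HEAD refs/heads/master
echo "one" > "$work/remote/file.txt"
git -C "$work/remote" add file.txt
git -C "$work/remote" commit -q -m "First commit"

git clone -q "$work/remote" "$work/local"

# After the clone, another developer adds a branch and moves master forward.
git -C "$work/remote" checkout -q -b building_groups
echo "two" >> "$work/remote/file.txt"
git -C "$work/remote" commit -q -am "Work on building_groups"
git -C "$work/remote" checkout -q master
echo "three" > "$work/remote/other.txt"
git -C "$work/remote" add other.txt
git -C "$work/remote" commit -q -m "Second commit on master"

cd "$work/local"
git fetch -q origin                      # updates origin/* only; local branches are untouched
git log --oneline master..origin/master  # inspect what came in before merging
git merge -q origin/master               # the manual merge step
git branch --remotes                     # now also lists origin/building_groups
git checkout -q building_groups          # creates a local branch tracking origin/building_groups
```

Running git pull on master would have done the fetch and the merge in one step; splitting them gives you a chance to look at the incoming commits first.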
Merging is the process of combining code changes from different branches, or from different versions of the same branch (for example when a local branch and remote are out of sync). If one has developed work in a branch and the work is complete, ready and tested, then it can be merged into the master branch. This is done by git checkout master to switch to the master branch, then git merge your_branch. The merge will bring all the different files and even different changes to the same files together. This means that it will actually change the code inside files to merge all the changes. When doing the checkout of master it's also recommended to do a git pull origin master to get the very latest version of the remote master merged into your local master. If the remote master changed, i.e. moved forward, you will see information that reflects that during the git pull. If that is the case (master changed) you are advised to git checkout your_branch and then rebase it onto master so that your changes actually get "replayed" on top of the "new" master. Then you would continue with getting master up-to-date as shown in the next paragraph.
If there are no conflicts then master will have the new changes added in. If there are conflicts, this means that the same files have changes around similar lines of code that git cannot automatically merge. In this case git merge your_branch will report that there are conflict(s) to resolve. You 'resolve' them by editing the files (which will have both changes in them), selecting the changes you want, literally deleting the lines of the changes you don't want and then saving the file. The changes are marked with separators such as <<<<<<<, ======= and >>>>>>>.
Once you have resolved any conflicts you will once again git add and git commit those changes to continue the merge (you'll get feedback from git during this process to guide you). When the process doesn't work well you will find that git merge --abort is very handy to reset things.
Interactive rebasing and squashing / reordering / removing commits
If you have done work in a lot of small steps, e.g. you commit code as 'work-in-progress' every day, you may want to "squash" those many small commits into a few larger commits. This can be particularly useful when you want to do code reviews with colleagues. You don't want to replay all the 'steps' you took (via commits), you want to just say here is the end effect (diff) of all of my changes for this work in one commit. The key factor to evaluate when considering whether to do this is whether the multiple commits are against the same file or files more than once (better to squash commits in that case). This is done with the interactive rebasing tool. This tool lets you squash commits, delete commits, reword messages, etc. For example, git rebase -i HEAD~10 (note that's a ~, NOT a -) brings up an editor listing the last 10 commits, each prefixed with an action such as pick.
Be careful though and use this tool 'gingerly'. Do one squash/delete/reorder at a time, exit and save that commit, then reenter the tool. If commits are not contiguous you can reorder them (and then squash as needed). You can actually delete commits here too, but you really need to be sure of what you are doing when you do that!
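To show what a squash does without needing an editor session, the sketch below scripts the edit you would normally make by hand in the rebase todo list. GIT_SEQUENCE_EDITOR and GIT_EDITOR are standard git environment variables; using sed as the "editor" is purely a device for making the example run unattended, and the file and commit names are made up.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

echo "base" > log.txt
git add log.txt
git commit -q -m "Base"

# Three small work-in-progress commits.
for day in 1 2 3; do
  echo "wip $day" >> log.txt
  git commit -q -am "WIP day $day"
done

# Normally `git rebase -i HEAD~3` opens an editor in which you change "pick"
# to "squash" on every line except the first. Here sed makes that same edit,
# and GIT_EDITOR=true accepts the combined commit message as proposed.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick/squash/'" GIT_EDITOR=true \
  git rebase -q -i HEAD~3

git log --oneline   # "Base" plus one squashed commit
```

The content of log.txt is unchanged by the squash; only the number of commits used to get there is different.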
Forks
There are two main approaches to collaboration in git repositories. The first, detailed above, is directly via branches that people pull and push from/to. These collaborators have their ssh keys registered with the remote repository. This will let them push directly to that repository. The downside is that you have to maintain the list of users. The other approach - forking - allows anybody to 'fork' the repository, basically making a copy under their own account on the hosting service. They can then make changes and when finished send a 'pull request' (really it's more of a 'push' from them and a 'pull' request for the actual repository maintainer) to get the code accepted.
This second method, using forks, does not require someone to maintain a list of users for the repository.
When you 'fork' (in the github web browser gui you can click the fork button) you create a copy ('clone') of the code in your github account. It can be a little subtle the first time you do it, so keep making sure you look at whose repository a code base is listed under: either the original owner, or 'forked from' and you.
Once you have your copy, you can make changes as you wish (by pulling and pushing them to a local machine). When you are done, you submit a 'pull request' to the original repository owner/admin (sounds fancy but actually you just click a button) and they 'pull' it in.
More common for a team working on code together is to 'clone' the repository (click on the 'copy' icon on the repository's main screen). Then, locally, type git clone [paste]. This will set you up locally and you can also push and pull to the (shared) github location.
Clones
As indicated in the section on github, a clone is a copy of a repository. When you have a remote repository you issue the git clone command against its URL and you then end up with a local copy, or clone of the repository. This clone has everything, the files, the master branch, the other branches, all the existing commits, the whole shebang. It is this clone that you do your adds and commits against and then the remote repository itself is what you push those commits to. It's this local/remote concept that makes git (and systems similar to it such as Mercurial) a DVCS (Distributed Version Control System) as opposed to the more traditional CVS's (Code Versioning Systems) such as SVN, PVCS, CVS, etc. where you commit directly to the remote repository.
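A quick way to convince yourself that a clone really carries "the whole shebang" is something like the sketch below. A local bare repository stands in for the remote; the names seed, remote.git, my_clone and experiment are invented for the example.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Build a small history with two branches, then publish it as a bare "remote".
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master   # call the initial branch master
echo "a" > a.txt
git add a.txt
git commit -q -m "First"
git checkout -q -b experiment
echo "b" > b.txt
git add b.txt
git commit -q -m "Experiment"
git checkout -q master
git clone -q --bare "$work/seed" "$work/remote.git"

# The clone receives every commit and branch, not just the latest files.
git clone -q "$work/remote.git" "$work/my_clone"
cd "$work/my_clone"
git branch --all
git log --oneline --all

# Adds and commits go into the clone first ...
echo "c" > c.txt
git add c.txt
git commit -q -m "Local work"
# ... and only reach the remote when pushed.
git push -q origin master
```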
Visualization
Visualization of the core concepts can be seen at
http://marklodato.github.com/visual-git-guide/index-en.html and
http://ndpsoftware.com/git-cheatsheet.html#loc=index
If you want a visual display of how the changes are working, you can't beat the visual tool gitg (gitx for mac) with a gui that I call 'the subway map' (esp. London Underground), great for showing who did what, how things changed, diverged and merged, etc.
You can also use it to add, commit and manage your changes!
Although gitg/gitx is fairly minimal, in the last 2-3 years (2009-2012) the number of gui tools has continued to expand. Many Mac users use brotherbard's fork of gitx, and for Linux a great option is smart-git, with an intuitive yet powerful interface.
Note that even with a gui tool, you will probably do a lot of commands at the command line.
For this I have the following aliases in my ~/.bash_aliases file (which is called from my ~/.bashrc file for each terminal session):
Finally, 6 key lifesavers:
1) You mess up your local branch and simply want to go back to what you had the last time you did a git pull:
2) You start making changes locally, you edit half a dozen files and then, oh crap, you're still in the master (or another) branch:
3) You mess up one particular file in your current branch and want to basically 'reset' that file (lose changes) to how it was the last time you pulled it from the remote repository:
git checkout your/directories/filename
This actually resets the file (like many git commands it is not well named for what it is doing here).
4) You make some changes locally, you want to make sure you don't lose them while you do a git reset or rebase: I often make a manual copy of the entire project (cp -r ../my_project ~/) when I am not sure if I might mess up in git or lose important changes.
5) You are rebasing but things get messed up:
6) Add your git branch to your PS1 prompt (see https://unix.stackexchange.com/a/127800/10043), e.g.
The branch is selenium_rspec_conversion.
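For situations 1, 2, 3 and 5 in the list above, commands along the following lines are commonly used. This is a sketch in a throwaway sandbox, not necessarily the exact commands the list had in mind; the names seed, local and new_branch_name are placeholders.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
echo "good" > "$work/seed/file.txt"
git -C "$work/seed" add file.txt
git -C "$work/seed" commit -q -m "Good state"
git clone -q "$work/seed" "$work/local"
cd "$work/local"

# 1) Throw away local commits and edits; go back to the last fetched state.
echo "oops" > file.txt
git commit -q -am "Bad commit"
git reset -q --hard origin/master

# 2) Edits made on master by mistake: move them to a new branch.
echo "new idea" >> file.txt
git checkout -q -b new_branch_name   # uncommitted edits come along
git commit -q -am "Work that belongs on its own branch"
git checkout -q master

# 3) Discard uncommitted edits to a single file.
echo "scribble" >> file.txt
git checkout -- file.txt

# 5) A rebase that has stopped on a conflict can be abandoned with:
#    git rebase --abort
```

Note that git reset --hard discards uncommitted work for good, which is one reason the manual copy in item 4 can be reassuring.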
Here is Oliver Steele's image of how it all fits together:
A clone is simply a copy of a repository. On the surface, its result is equivalent to svn checkout, where you download source code from some other repository. The difference between centralized VCS like Subversion and DVCSs like Git is that in Git, when you clone, you are actually copying the entire source repository, including all the history and branches. You now have a new repository on your machine and any commits you make go into that repository. Nobody will see any changes until you push those commits to another repository (or the original one) or until someone pulls commits from your repository, if it is publicly accessible.
A branch is something that is within a repository. Conceptually, it represents a thread of development. You usually have a master branch, but you may also have a branch where you are working on some feature xyz, and another one to fix bug abc. When you have checked out a branch, any commits you make will stay on that branch and not be shared with other branches until you merge them with or rebase them onto the branch in question. Of course, Git seems a little weird when it comes to branches until you look at the underlying model of how branches are implemented. Rather than explain it myself (I've already said too much, methinks), I'll link to the "computer science" explanation of how Git models branches and commits, taken from the Git website:
http://eagain.net/articles/git-for-computer-scientists/
A fork isn't a Git concept really, it's more a political/social idea. That is, if some people aren't happy with the way a project is going, they can take the source code and work on it themselves separate from the original developers. That would be considered a fork. Git makes forking easy because everyone already has their own "master" copy of the source code, so it's as simple as cutting ties with the original project developers and doesn't require exporting history from a shared repository like you might have to do with SVN.
EDIT: since I was not aware of the modern definition of "fork" as used by sites such as GitHub, please take a look at the comments and also Michael Durrant's answer below mine for more information.