Source Control Branching and Merging
It's so common that it goes unsaid most of the time, but the entire professional software world relies on Source Control. When I was a young newbie developer, I didn't even know what Source Control was. It's easy for us to forget those old days and, when new developers ask us questions, never mention that they should be using some form of Source Control, aka version control.
There are a lot of different version control systems out there these days. There are even heated debates about which is best. The point of this article is not to tout one or the other, but I will anyway. Git is awesome, but its Visual Studio plugin sucks! Team Foundation Server integrates perfectly with VS, but tries to do too much, having an entire Agile system built in, and fails as a version control system. Everything else is not as good as Git and also has crappy VS integration.
Seriously though, this article is about Branching and Merging. Do your research. With the exception of Git, the best practice with any version control system is "branch rarely". In fact, at many companies the only "branches" you see are just markers for production release versions of software. It can be entertaining when they then fix bugs in this branch and never check the code into the trunk, but I digress.
As a result of a consistent distaste for branching (mostly because of merge conflicts and slow merge processes) there are a lot of developers out there who don't understand the concept of branching at all, or if they do, they understand it vaguely in the form of "I work in this branch, then my Team Lead pushes my code to production somehow". In the interest of dissemination of knowledge, here is how I understand branching and merging.
That living code base is simply the folder you work in when you write or change code files. As you know already from using Source Control, you then check in any changes you make, and you can bounce around that line to go back in time to old versions of files in case you screwed something up. This is the Source Control every professional developer works with daily and it's enough to get by at most companies.
That living code base can be considered the Trunk, or Master code base. Branching this code is a matter of taking a copy of that folder at a particular time (or version of the files) and starting a new living code base with that new copy.
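To make the "copy of that folder at a particular time" idea concrete, here is a minimal toy sketch in Python. It is not how any real version control system stores data; the `CodeBase` class, its methods, and the file names are all made up for illustration. It only shows that a branch starts from a copy of the history up to a chosen point and then lives on independently.

```python
from dataclasses import dataclass, field


@dataclass
class CodeBase:
    """A toy 'living code base': an ordered list of snapshots (check-ins)."""
    name: str
    history: list = field(default_factory=list)  # each snapshot is {filename: contents}

    def check_in(self, files):
        # Store a copy so later edits to `files` cannot rewrite history.
        self.history.append(dict(files))

    def latest(self):
        return dict(self.history[-1])

    def branch(self, name, at=-1):
        # A branch starts from a copy of the history up to the chosen snapshot.
        index = at if at >= 0 else len(self.history) + at
        return CodeBase(name, [dict(s) for s in self.history[: index + 1]])


trunk = CodeBase("trunk")
trunk.check_in({"app.py": "print('v1')"})

# Split off a new living code base from the current trunk snapshot.
feature = trunk.branch("feature")
feature.check_in({"app.py": "print('v1')", "new.py": "print('feature')"})

# Meanwhile, work continues in the trunk without touching the branch.
trunk.check_in({"app.py": "print('v2')"})
```

After this runs, the trunk and the branch share their first snapshot (the split point) but have different latest versions.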
As you can see in the diagram, it looks kind of like a tree branch growing from the Trunk, and the fork marks the point in time at which the code was split. We need to know when they split because other developers may continue working in the trunk while you're working in the branch, so we need to be able to remember all the changes they've made in the meantime to ensure a smooth merge.
When you merge code, you take the two living code bases and merge the code together. In the image above, all of the changes in the branch had to be re-integrated into the trunk. In this example, I actually stop all work in the branch and put all of the code into the trunk, essentially deleting the branch after the merge.
Merging is the real power of source control. Instead of overwriting the files that exist, the version control system replays each of the changes made to the files, as if it were a developer making them. Merging, however, is the main reason that most developers hate branching. In every Source Control system except Git (and Mercurial, I know), it's an extremely intense process which takes a long time. Not only that, but it's during a merge that you find out that a developer working in the trunk made changes to the same file as a developer working in the branch, and you encounter conflicts, which have to be resolved (much like when two developers try to check out and edit the same file).
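Here is a rough sketch of why the split point matters, written as a simplified three-way merge in Python. Real tools compare files line by line; this toy version (the function `three_way_merge` and the file names are my own, purely illustrative) only compares whole files against the snapshot at the split point. If only one side changed a file, that change wins; if both sides changed it differently, it is reported as a conflict.

```python
def three_way_merge(base, ours, theirs):
    """Merge two snapshots ({filename: contents}) against their common ancestor.

    Returns (merged, conflicts). A missing file is treated as None.
    """
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:
            result = o  # both sides agree (unchanged, same edit, or both deleted)
        elif o == b:
            result = t  # only their side changed the file
        elif t == b:
            result = o  # only our side changed the file
        else:
            conflicts.append(name)  # both sides changed it differently
            result = o  # keep our version as a placeholder until someone resolves it
        if result is not None:
            merged[name] = result
    return merged, conflicts


# Snapshot at the moment the branch was split from the trunk.
base = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
# Trunk developers changed a.txt and c.txt.
trunk = {"a.txt": "2", "b.txt": "1", "c.txt": "trunk"}
# Branch developers changed c.txt too, and added d.txt.
branch = {"a.txt": "1", "b.txt": "1", "c.txt": "branch", "d.txt": "new"}

merged, conflicts = three_way_merge(base, trunk, branch)
print(merged)     # {'a.txt': '2', 'b.txt': '1', 'c.txt': 'trunk', 'd.txt': 'new'}
print(conflicts)  # ['c.txt']
```

Without the `base` snapshot there would be no way to tell that `a.txt` was changed only in the trunk; the merge would just see two different files and have to ask a human.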
Courtesy of Git
As you can see from this image, you're not limited to one branch, or even one merge. In a good version control system you can branch as many times as you like, and merges should be efficient. This allows for code flexibility. Additionally, while the image above only shows merges in one direction (toward the master), each living code base is treated the same by the version control system, so we are able to merge code from the master/trunk into the branches (for example, code fixes), and we can of course branch out for releases. I'm particularly fond of this image, courtesy of nvie, even though it's complex.
Notice how he has a few branches that are open all the time. The master branch holds tagged production builds for the exact versions that were pushed out to production. He has a branch explicitly for bugfixes to the master, a branch for staging releases, the developer code branch, and then he branches for each feature. I like this model because it keeps the master clean, and it allows flexibility to work with different branches and experiment without negatively affecting other developers. In any case, if you understand what's going on there, then you've mastered what it means to branch and merge in Source Control. Take care.
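For what it's worth, here is how I would jot that model down as data, in a small Python sketch. The branch labels (`feature`, `release`, `bugfix`, `develop`, `master`) and the merge rules are my own simplified reading of the roles described above, not an official definition, so treat them as assumptions.

```python
# Hypothetical branch labels based on the roles described above.
# Each entry: role -> (branch it starts from, branches it merges back into).
FLOW = {
    "feature": ("develop", ["develop"]),
    "release": ("develop", ["master", "develop"]),
    "bugfix": ("master", ["master", "develop"]),
}


def starts_from(role):
    """Return the long-lived branch a short-lived branch is split from."""
    return FLOW[role][0]


def can_merge(source, target):
    """True if this simplified model allows merging `source` into `target`."""
    if source not in FLOW:
        return False  # master and develop are only merge targets in this sketch
    return target in FLOW[source][1]


print(starts_from("bugfix"))            # master
print(can_merge("feature", "develop"))  # True
print(can_merge("feature", "master"))   # False
```

The point of the last line is the "keeps the master clean" part: in this reading, feature work never lands in the master directly.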