Charles Engelke's Blog

February 9, 2011

Source Control Basics, by Example

Filed under: Uncategorized — Charles Engelke @ 3:52 pm

Many non-developers understand the value of source code and realize that a source control system such as Subversion is extremely important, but don’t really understand how it should be used.  To a lot of people, it’s just a safe used to lock up this important asset.  But really, it’s a much more valuable tool than just a safe.  I’m going to use examples to show how it can aid release management, support, and maintenance of products.  These examples use Subversion, but the general principles apply to most source control systems.

Core principles

Subversion doesn’t manage each file separately; it works on an entire directory tree of files at a time.  That’s a good match for source code.  If you start with an empty Subversion repository, you can check it out to a working copy on your own computer, and then start adding your source files and directories to that working copy.

  • repository: the area on a Subversion server where every version of your source code directory tree is stored.
  • working copy: a local folder on your computer where the version of the source code you are working on is kept.

Whenever you want, you can commit your working copy to the repository.  In effect, Subversion stores a snapshot of your source code forever.  You can get a log showing every version that was ever committed, and you can check out a working copy of any version you want, at any time.

  • commit: make the Subversion server keep a snapshot of the source code that matches your current working copy.
  • check out: create a new working copy from any desired snapshot that Subversion has available.  Usually this is based on the latest snapshot, but doesn’t have to be.
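To make commit and check out concrete, here’s a minimal Python sketch of the model just described.  It’s a toy, not how Subversion actually stores anything: ToyRepository and its methods are hypothetical names, a directory tree is just a dict mapping file paths to contents, and each commit saves a full snapshot under the next sequential revision number.

```python
import copy


class ToyRepository:
    """Hypothetical in-memory stand-in for a Subversion repository.

    Every revision is a complete snapshot of the directory tree, stored as a
    dict that maps file paths to file contents.
    """

    def __init__(self):
        # Revision 0 is the empty tree of a freshly created repository.
        self.revisions = [{}]

    @property
    def youngest(self):
        """Number of the most recent revision."""
        return len(self.revisions) - 1

    def commit(self, working_copy):
        """Keep a snapshot matching the working copy; return its revision number."""
        self.revisions.append(copy.deepcopy(working_copy))
        return self.youngest

    def checkout(self, revision=None):
        """Create a new working copy of any revision (the latest by default)."""
        if revision is None:
            revision = self.youngest
        return copy.deepcopy(self.revisions[revision])

    def log(self):
        """Every revision number that was ever committed."""
        return list(range(1, self.youngest + 1))


if __name__ == "__main__":
    repo = ToyRepository()
    wc = repo.checkout()                 # empty working copy
    wc["main.c"] = "int main(void) { return 0; }\n"
    first = repo.commit(wc)              # revision 1
    wc["README"] = "How to build\n"
    repo.commit(wc)                      # revision 2
    print(repo.log())                    # [1, 2]
    print(sorted(repo.checkout(first)))  # ['main.c']
```

Because the snapshot is deep-copied on commit, editing the working copy afterwards never changes what was saved; checking out revision 1 years later still returns exactly what was committed then.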

Subversion simply numbers each version, or revision, sequentially, so you’ll see revisions 1, 2, 3, and so on.  I recently noticed that one of our six-year-old projects is up to revision twelve thousand and something.  That means that, on average, a new snapshot was saved about once every business hour over the life of the project.
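The “once each business hour” figure is easy to check.  This little Python calculation assumes 52 five-day weeks a year, 8-hour days, no holidays, and a round 12,000 revisions; those numbers are my assumptions, not exact project data.

```python
# Rough check of the "one snapshot per business hour" estimate.
# Assumptions: 52 five-day weeks a year, 8-hour days, no holidays,
# and exactly 12,000 revisions ("twelve thousand and something").
YEARS = 6
BUSINESS_HOURS_PER_YEAR = 52 * 5 * 8   # 2,080 hours
REVISIONS = 12_000

business_hours = YEARS * BUSINESS_HOURS_PER_YEAR   # 12,480 hours
revisions_per_hour = REVISIONS / business_hours

print(f"{business_hours} business hours, "
      f"{revisions_per_hour:.2f} revisions per business hour")
# prints "12480 business hours, 0.96 revisions per business hour"
```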

Before I move on, there are two more points to mention.  First, you don’t have to check out and commit the whole repository at once.  You can work with any subdirectory you want.  That’s good for dividing up parts of a project that have little interaction with each other, and it enables the management techniques I’ll be talking about in a minute.  Second, you can’t really commit “whenever you want”.  You can only commit if nobody else has committed changes to the same files you changed since you last checked them out.  Otherwise, you first need to update your working copy to bring in their changes, and possibly manually resolve any conflicts between your changes and the other folks’ changes.  That sounds like a potential problem to a lot of people (including me), but in practice it works great.
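The “nobody else changed the same files” rule can be sketched as a check on when each file last changed.  This is a simplified, hypothetical Python model: ToyServer and OutOfDateError are made-up names, and each working copy has a single base revision, where real Subversion tracks one per item.  A commit is refused if any file you changed was committed after the revision your working copy came from.

```python
class OutOfDateError(Exception):
    """A file you changed was committed by someone else after your checkout."""


class ToyServer:
    """Hypothetical server that remembers which revision last changed each file."""

    def __init__(self):
        self.youngest = 0
        self.files = {}          # path -> contents
        self.last_changed = {}   # path -> revision that last changed it

    def checkout(self):
        """Return (base revision, copy of the files) for a new working copy."""
        return self.youngest, dict(self.files)

    def commit(self, base_revision, changes):
        """Commit {path: new contents} edited in a working copy taken at base_revision."""
        stale = sorted(path for path in changes
                       if self.last_changed.get(path, 0) > base_revision)
        if stale:
            # Someone else changed these files after our checkout, so we must
            # update (and perhaps resolve conflicts) before committing.
            raise OutOfDateError("out of date: " + ", ".join(stale))
        self.youngest += 1
        for path, contents in changes.items():
            self.files[path] = contents
            self.last_changed[path] = self.youngest
        return self.youngest


if __name__ == "__main__":
    server = ToyServer()
    server.commit(0, {"a.c": "v1", "b.c": "v1"})     # revision 1
    alice_base, _ = server.checkout()
    bob_base, _ = server.checkout()
    server.commit(alice_base, {"a.c": "alice"})      # revision 2
    print(server.commit(bob_base, {"b.c": "bob"}))   # 3: nobody else touched b.c
    try:
        server.commit(bob_base, {"a.c": "bob"})
    except OutOfDateError as err:
        print(err)                                   # out of date: a.c
```

In the demo, Bob’s change to b.c goes through even though the repository has moved on; only a.c, which Alice changed after Bob’s checkout, forces him to update first.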

Handling a Release

When you’re ready for a release, all you need to do is note the revision number you’re building and packaging from.  That way, if you need to get that exact code back for support or maintenance, it’s extremely easy.  But it could be even easier.  Since you can work on subdirectories of your repository instead of the entire thing, just structure it a bit differently.  Don’t put your source code at the repository root; put it in a subdirectory, conventionally called the trunk.  To do this, create a subdirectory called trunk immediately after you first create the repository.  Then, instead of ever checking out the whole repository, check out just the trunk subdirectory.

The advantage of this is that you can now create a directory alongside trunk that will contain copies of all your releases.  By convention, this directory is called tags.  When you are ready to release your code, you copy the entire trunk directory tree to a new child of the tags directory.  Let’s say this release is going to be 2.1beta2.  Then your repository will look something like this:

Repository
   |
   +--trunk
   |    |
   |    +--your latest source tree
   |
   +--tags
        |
        +--2.1beta2
              |
              +--snapshot of trunk contents at time of release
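If you’d rather script the layout and tagging steps, this Python sketch builds the Subversion command lines for them.  The repository URL, the product-trunk working-copy folder, and the helper names are hypothetical.  Because svn mkdir and svn copy are given repository URLs here, each one commits directly on the server, which is why each needs a log message (-m).

```python
import shlex
import subprocess

# Hypothetical repository URL; substitute your own.
REPO = "https://svn.example.com/repos/product"


def layout_commands(repo):
    """Create trunk/ and tags/ in the repository, then check out trunk only."""
    return [
        ["svn", "mkdir", "-m", "Create trunk and tags",
         f"{repo}/trunk", f"{repo}/tags"],
        ["svn", "checkout", f"{repo}/trunk", "product-trunk"],
    ]


def tag_release_commands(repo, release):
    """Copy the current trunk to tags/<release> as a single server-side commit."""
    return [
        ["svn", "copy", "-m", f"Tag release {release}",
         f"{repo}/trunk", f"{repo}/tags/{release}"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run(layout_commands(REPO) + tag_release_commands(REPO, "2.1beta2"))
```

As written, the tag copies the latest trunk revision.  If you built from an earlier revision, svn copy’s -r option lets you copy that revision instead, so the tag matches what you shipped.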

Don’t worry about the storage needed to keep this new copy.  Remember, Subversion already needs to keep track of every version of your source tree, and it’s smart enough to store this new “copy” of a snapshot using almost no actual storage.  But even if it needed to use up enough space for a whole new copy, it would be worth it.  Storage is plentiful, and anything that helps you manage the history of your product’s source is priceless.
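Subversion’s real mechanism is different (roughly, a copy records the path and revision it came from), but this content-sharing Python sketch shows why a copy can cost almost nothing.  ToyStore and its methods are hypothetical: file bodies are stored once under a hash, and directories only hold references to them.

```python
import hashlib


class ToyStore:
    """Hypothetical store where each distinct file body is kept only once.

    Directories hold references (content hashes), so copying a directory
    copies a small table of references, never the file bodies.
    """

    def __init__(self):
        self.bodies = {}   # SHA-1 hex digest -> file contents
        self.dirs = {}     # directory path -> {file name: digest}

    def write(self, directory, name, contents):
        digest = hashlib.sha1(contents.encode("utf-8")).hexdigest()
        self.bodies.setdefault(digest, contents)
        self.dirs.setdefault(directory, {})[name] = digest

    def copy(self, source, destination):
        # The "copy" shares every file body with its source.
        self.dirs[destination] = dict(self.dirs[source])

    def read(self, directory, name):
        return self.bodies[self.dirs[directory][name]]

    def stored_size(self):
        """Total characters of file bodies actually stored."""
        return sum(len(body) for body in self.bodies.values())


if __name__ == "__main__":
    store = ToyStore()
    store.write("trunk", "main.c", "int main(void) { return 0; }\n" * 1000)
    before = store.stored_size()
    store.copy("trunk", "tags/2.1beta2")
    print(store.stored_size() == before)   # True: the tag added no file bodies
```

Later changes to the trunk only point the trunk at new bodies; the tag keeps referring to the old ones, so it still shows exactly what was released.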

  • trunk: the subdirectory of your repository containing the current version of your source code (and every prior version, too).
  • tags: the subdirectory that contains other subdirectories, each of which is a copy of a particular version of the trunk.  Each subdirectory should have a meaningful name, and should never be updated (Subversion allows you to check out and update tags, but you should not do it).
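Since Subversion lets you update tags, “never update a tag” is a team rule, and one way to enforce it is a repository pre-commit hook.  This Python sketch covers only the policy decision.  tag_violations is a hypothetical name, and it assumes the hook has already gathered the commit’s changes as (action, path) pairs, for example by parsing the output of svnlook changed yourself.

```python
def tag_violations(changes):
    """Return the changes that would alter an existing tag.

    `changes` is a list of (action, path) pairs: action is "A" (added),
    "U" (updated) or "D" (deleted), and paths are relative to the
    repository root, e.g. ("U", "trunk/main.c").
    """
    violations = []
    for action, path in changes:
        parts = path.strip("/").split("/")
        if parts[0] != "tags":
            continue            # trunk/ and branches/ may change freely
        if action == "A" and len(parts) <= 2:
            continue            # creating tags/ itself or a brand-new tag is fine
        violations.append((action, path))
    return violations


if __name__ == "__main__":
    print(tag_violations([("A", "tags/2.1beta3/")]))            # []
    print(tag_violations([("U", "tags/2.1beta2/src/main.c")]))  # [('U', 'tags/2.1beta2/src/main.c')]
```

A hook built on this would reject the commit whenever the returned list is non-empty.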

Software Maintenance

Everything up to now is useful, important, well known and widely followed.  But the next step, using source control for more effective software maintenance, seems to be used much less often, even by seasoned developers I’ve observed.  That’s a shame, because it’s easy to do and a big win.

Suppose you released your software a few weeks ago, and now a user reports a bug.  How are you going to fix it?

You could use your current working copy of the trunk, find the problem, fix it, and then do a build and package from that working copy.  Wait!  You’re using tags now, so you create a new tag that’s a copy of the trunk, and then build and release from that tag.

What’s wrong with that?  Well, your new release doesn’t contain a fixed version of the old release; it contains a fixed version of your trunk.  And that trunk has probably had all sorts of changes made to it in the weeks since the release that contained the bug.  It probably has some new errors in it.  It may even have partially finished new functions and other changes in it.  Even if you work hard to keep every build green (passing all tests), you risk pushing out new errors as you fix the old one.

What you should do instead is make the fix to the exact code you released (which is available in the tag).  Then you’ll know that the only changes between the prior release and your new corrected release are those needed to repair the reported problems.  New functions, restructured code, and other changes that you need to be making in the trunk won’t affect the bug fix release.
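One way to see the difference is to compare each candidate release against the tag.  This Python sketch uses the standard difflib module on tiny, made-up “files”; the variable names and contents are hypothetical.  A release built from a fixed copy of the tag differs from the old release only by the fix, while one built from the fixed trunk also carries whatever else landed on the trunk.

```python
import difflib


def changed_lines(released, candidate):
    """Lines removed ("- ") or added ("+ ") between a release and a candidate."""
    return [line for line in difflib.ndiff(released, candidate)
            if line.startswith(("- ", "+ "))]


# Hypothetical, tiny "source files" as lists of lines.
tag_2_1beta2 = ["x = compute()", "print(x)"]
fixed_on_branch = ["x = compute() or 0", "print(x)"]
fixed_on_trunk = ["import feature", "x = compute() or 0", "print(x)",
                  "feature.run()  # half finished"]

if __name__ == "__main__":
    print(len(changed_lines(tag_2_1beta2, fixed_on_branch)))  # 2: just the fix
    print(len(changed_lines(tag_2_1beta2, fixed_on_trunk)))   # 4: fix plus unfinished work
```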

We want to keep each tag frozen, representing exactly what we released.  Sure, we could update it and remember to go back to the proper revision when we need to, but it’s a lot easier to avoid problems if tags never change.  So we deal with maintenance using branches.  A branch is pretty much like a tag, except that it is generally a copy of a tag rather than of the trunk, and it is intended to change.

  • branches: the repository subdirectory that contains other subdirectories, each of which starts as a copy of a tag.  Each subdirectory is updated as needed to make fixes in the release represented by that tag.

Specifically, you will create a subdirectory of the repository called branches, then copy the 2.1beta2 tag to a subdirectory of branches.  Say you call it 2.1beta2-maintenance.  Next, you will check out a working copy from that branch and do your programming work on it to fix the bug.  As you work, you commit your changes, and when everything is ready, you copy the latest revision of the branch to a new tag, perhaps 2.1beta3 (or even 2.1beta2-patch1).  Build the new release from that tag and send it to your users.  You’ve fixed their bug with the least possible chance of creating new problems that didn’t already exist in their release.
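Scripted, that branch, fix, and re-tag cycle might look like this Python sketch, which only builds the svn command lines.  The repository URL, the commit messages, and the function name are placeholders; the actual editing, building, and testing happen in the branch working copy between the checkout and the commit.

```python
import shlex

# Hypothetical repository URL; substitute your own.
REPO = "https://svn.example.com/repos/product"


def maintenance_commands(repo, released_tag, branch_name, fix_tag):
    """Plan the svn commands for one branch, fix, and re-tag cycle."""
    branch_url = f"{repo}/branches/{branch_name}"
    return [
        # One-time setup (this fails if branches/ already exists).
        ["svn", "mkdir", "-m", "Create branches directory", f"{repo}/branches"],
        # Start the branch from exactly the code that was released.
        ["svn", "copy", "-m", f"Start maintenance of {released_tag}",
         f"{repo}/tags/{released_tag}", branch_url],
        # Fix the bug in a working copy of the branch, then commit it.
        ["svn", "checkout", branch_url, branch_name],
        ["svn", "commit", "-m", "Fix reported bug", branch_name],
        # Freeze the fixed code as the new release.
        ["svn", "copy", "-m", f"Tag release {fix_tag}",
         branch_url, f"{repo}/tags/{fix_tag}"],
    ]


if __name__ == "__main__":
    for command in maintenance_commands(
            REPO, "2.1beta2", "2.1beta2-maintenance", "2.1beta3"):
        print(shlex.join(command))
```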

Merging Fixes

There’s just one big problem.  The next time you do a new feature release, from a tag copied from the trunk, your fix won’t be in it.  You did all the work on a branch, instead.

Subversion (like other, similar tools) makes it easy to solve this problem, too.  You can get a report showing every single change you made on the branch, and then use that report to make the same changes to the trunk.  In fact, Subversion can even make the same changes for you.  This isn’t just copying the changed files from the branch to the trunk, because each of them may have been changed in other ways on the trunk while you were working on the branch.  Instead, it looks at what was changed in the branch (delete these lines, add these others) and makes the same changes to the trunk.  With luck, the trunk hasn’t diverged so much that the same changes won’t fix the problem there, too.  But if it has, so what?  You’re a developer, and using your head to figure out how to make equivalent changes without messing other things up is one of the things you’re being paid for.
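The “delete these lines, add these others” idea can be sketched in Python with the standard difflib module.  This is a deliberately simplified, hypothetical model, not Subversion’s merge algorithm: branch_changes records each change the branch made to the released code, and replay makes the same change wherever the old lines appear exactly once in the trunk, reporting a conflict for a human whenever they don’t.

```python
import difflib


def branch_changes(base, branch):
    """Describe how the branch changed the released code.

    Returns (old_lines, new_lines, anchor) tuples: old_lines were replaced
    by new_lines, and anchor is the base line just before the change (used
    to place pure insertions).
    """
    matcher = difflib.SequenceMatcher(a=base, b=branch, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            anchor = base[i1 - 1] if i1 > 0 else None
            changes.append((base[i1:i2], branch[j1:j2], anchor))
    return changes


def find_unique(lines, block):
    """Index where `block` occurs in `lines`, or None if absent or ambiguous."""
    hits = [i for i in range(len(lines) - len(block) + 1)
            if lines[i:i + len(block)] == block]
    return hits[0] if len(hits) == 1 else None


def replay(changes, trunk):
    """Make the same edits to `trunk`; return (merged lines, conflicts)."""
    merged = list(trunk)
    conflicts = []
    for old, new, anchor in changes:
        if old:                                # replacement or deletion
            at = find_unique(merged, old)
            if at is None:                     # trunk diverged: needs a human
                conflicts.append((old, new))
                continue
            merged[at:at + len(old)] = new
        elif anchor is None:                   # insertion at the very top
            merged[0:0] = new
        else:                                  # insertion after the anchor line
            at = find_unique(merged, [anchor])
            if at is None:
                conflicts.append((old, new))
                continue
            merged[at + 1:at + 1] = new
    return merged, conflicts


if __name__ == "__main__":
    released = ["header", "x = compute()", "print(x)"]
    fixed = ["header", "x = compute() or 0", "print(x)"]
    trunk = ["header", "import feature", "x = compute()", "print(x)", "feature.run()"]
    merged, conflicts = replay(branch_changes(released, fixed), trunk)
    print(merged)     # the fix lands among the trunk's own new lines
    print(conflicts)  # []
```

If the trunk had rewritten the buggy line in the meantime, replay would leave it alone and report a conflict, which is exactly the case where you use your head.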

Some people really worry a lot about the potential for duplicated effort in making a fix on a branch and then having to recreate the same fix on the trunk.  But in reality, this rarely requires any thought at all; the automated tools usually handle it on their own.  And when they don’t, it’s still not very hard to do it in both places.  This approach to branching and merging works much better than making the whole team roll back their work in progress, or freeze their changes, while you make a fix.  And it’s one of the biggest wins in using source control.
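With Subversion itself, replaying a branch fix onto the trunk comes down to a few commands.  This Python sketch just builds them; the URL, branch name, revision number, and working-copy folder are placeholders.  svn log --stop-on-copy lists the revisions committed on the branch since it was created, and svn merge -c applies the change made in one of those revisions to the trunk working copy, where you can review it before committing.

```python
import shlex

# Hypothetical repository URL; substitute your own.
REPO = "https://svn.example.com/repos/product"


def merge_fix_commands(repo, branch_name, revision, trunk_wc="product-trunk"):
    """Plan the svn commands that replay one branch commit onto trunk."""
    branch_url = f"{repo}/branches/{branch_name}"
    return [
        # Report every revision committed on the branch since it was copied.
        ["svn", "log", "--stop-on-copy", branch_url],
        # Bring the trunk working copy up to date first.
        ["svn", "update", trunk_wc],
        # Make the same changes that `revision` made on the branch.
        ["svn", "merge", "-c", str(revision), branch_url, trunk_wc],
        # Review the result (and resolve any conflicts) before committing.
        ["svn", "diff", trunk_wc],
        ["svn", "commit", "-m", f"Merge r{revision} from {branch_name}", trunk_wc],
    ]


if __name__ == "__main__":
    # 12345 is a made-up revision number for illustration.
    for command in merge_fix_commands(REPO, "2.1beta2-maintenance", 12345):
        print(shlex.join(command))
```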

Summary

Source control tools like Subversion help you keep track of exactly what source code went into each and every release.  Used properly, they also give you a way to do maintenance fixes with the least possible risk of new problems or errors creeping in.  They cost little or nothing to buy, and require very little effort to run, support, and use.  There are a lot of other ways they help developers, too (comments on the reason for each revision, seeing what was changed at the same time, and knowing who did what if you have a question).  For a manager who wants to know how the team can deal with fixes for multiple releases in an efficient and safe way, understanding tagging, branching, and merging as described here is essential.

Blog at WordPress.com.