
git: No bad puns in title

Regular readers will already know that my current employer recently switched from Subversion to Git for version control. Many people are making such a switch, and Git is definitely the “cool new kid” in town, to the point that there’s a bit of animosity between the Subversion and Git contingents. (OK, Linus calling Subversion users “stupid” — right in front of at least a few Subversion developers — probably didn’t help…) However, I didn’t recommend Git because it was new and cool, and no one should switch to it for that reason. Obviously, I am not a fan of Subversion, but Git was one of many competitors, and I selected it because of its strengths and in spite of its flaws. So it seems reasonable to have an unbiased discussion of those strengths and weaknesses.

If you look at my posts on selecting a VCS, you will notice that my number one criterion is data safety. A version control system should never, in and of itself, cause me data loss. Yes, hard disk failure happens, but hopefully the VCS doesn’t munge data on its own, and if data are corrupted by cosmic rays or something, hopefully the VCS can detect that and provide a way to recover. Git is outstanding at this. First, each object it stores has a SHA1 sum recorded and if the object and SHA1 don’t match, Git complains. Loudly. Second, the distributed nature of Git and its human-comprehensible (well, mostly) data format means that you can very likely recover the corrupted object from elsewhere and simply plug it back in to the damaged repository. (It’s a little harder, but not much, if the data have already been packed.) Third, backing up Git repos is a piece of cake compared to some other VCSs.
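
To make the SHA1 check concrete, here is a minimal Python sketch of how a blob's object name is derived and how a mismatch would be noticed. It covers only the simplest object type (a blob); the function names `git_blob_id` and `verify_blob` are made up for illustration and are not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object name Git gives a blob with this content.

    Git hashes a short header ("blob <size>" followed by a NUL byte)
    plus the raw, uncompressed content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def verify_blob(expected_id: str, content: bytes) -> bool:
    """Illustrative check: does the content still match its recorded name?"""
    return git_blob_id(content) == expected_id


original = b"hello world\n"
recorded = git_blob_id(original)

print(verify_blob(recorded, original))           # intact content
print(verify_blob(recorded, b"hello w0rld\n"))   # one flipped character
```

Because the name is computed from the content, a copy of the same object fetched from any other clone will have the same name, which is what makes "plug it back in" recovery possible.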

My second highest criterion is good branching and merging. Git was designed around a branching and merging model and it shows in everything it does. It has made my developers’ lives much, much easier. Git does have a few quirks even in this area of strength, however. I sometimes find it difficult to work out the relationships between branches: when they split off and when they merged together again. GitX (or gitk) helps a lot in this regard, but I’m still struggling to find a really good way to determine (programmatically) if two branches have been properly merged. (I have a serious use case for this.) Some people also find Git’s approach to complex merges problematic. Instead of trying to sort out a particularly twisty conflict, Git just throws up its hands and says, “Here, you’re the human, you figure it out.”*
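
One way to frame the "has this branch been merged?" question is as reachability: a branch counts as merged into another if its tip commit is an ancestor of the other branch's tip. On a real repository, `git merge-base --is-ancestor` performs this test and `git branch --merged` lists the branches that pass it. The Python sketch below models the same idea on a hypothetical parent map instead of calling Git; the commit names and the `is_ancestor` helper are invented for illustration. Note that this only detects true merges. A squash merge or a cherry-pick leaves no ancestry link, so it may or may not match what "properly merged" means for a given use case.

```python
from collections import deque


def is_ancestor(parents, ancestor, descendant):
    """True if `ancestor` is reachable from `descendant` via parent links.

    `parents` maps each commit name to a list of its parent commits.
    A commit is treated as an ancestor of itself.
    """
    seen = set()
    queue = deque([descendant])
    while queue:
        commit = queue.popleft()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(parents.get(commit, ()))
    return False


# Hypothetical history: "feature" (F1) and "hotfix" (H1) both branch from B.
# F1 was merged into master at M; H1 has not been merged.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "F1": ["B"],
    "H1": ["B"],
    "M": ["C", "F1"],  # merge commit: two parents
}

print(is_ancestor(parents, "F1", "M"))  # feature tip vs. master tip
print(is_ancestor(parents, "H1", "M"))  # hotfix tip vs. master tip
```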

Git can be made to support just about any workflow you choose, even one based on mailing patches around. It was designed from the start to be a distributed system, so obviously it supports distributed workflows well, but this can cause some confusion about whether it can be used in a centralized fashion and how best to configure that. (I’ve seen several questions along the lines of, “If I use Git with a central, master repository, am I misusing it?”)

This flexibility is both Git’s greatest strength and its greatest weakness: Fundamentally, Git is not only a VCS but a collection of related tools that you can use to make a VCS. Like Unix, it gives you tremendous flexibility; also like Unix, that means that the learning curve is a little steep, especially when you want to step outside the norm. There are a huge number of git subcommands just within the porcelain (standard user commands), never mind the plumbing (underlying implementation); moreover, every command has a plethora of options. Personally, I found Scott Chacon’s Pro Git and Jon Loeliger’s Version Control with Git invaluable in learning how to use it.

This level of complexity leads many to think Git is harder than it really is; for most everyday use it’s actually really simple, it just looks intimidating. I find that the biggest hurdle most developers have to overcome is the notion that committing something does not make it available to everyone else. The extra “push” that’s required takes some getting used to. The other area that has consistently confused my developers is remote vs. local branches. It took me a while to wrap my head around that as well, and to find a good way to explain it in training. With Git it’s very important to have a good understanding of the relationships among different repositories.
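
A toy model can make the commit/push distinction easier to see. The Python sketch below is deliberately simplified and is not how Git is implemented: a repository is reduced to a table of refs, commits are just labels, and the `Repo`, `commit` and `push` names are made up for illustration. The ref names follow Git's real layout, with local branches under `refs/heads/` and remote-tracking branches under `refs/remotes/<remote>/`.

```python
class Repo:
    """Toy repository: only a mapping from ref name to commit label."""

    def __init__(self):
        self.refs = {}


def commit(local, branch, new_commit):
    # Committing moves only the local branch ref; no other repo changes.
    local.refs[f"refs/heads/{branch}"] = new_commit


def push(local, remote, branch, remote_name="origin"):
    # Pushing copies the local branch tip into the remote repository and
    # updates the local remote-tracking ref to record what was sent.
    # (Real Git also checks for fast-forwards; this sketch skips that.)
    tip = local.refs[f"refs/heads/{branch}"]
    remote.refs[f"refs/heads/{branch}"] = tip
    local.refs[f"refs/remotes/{remote_name}/{branch}"] = tip


shared = Repo()  # the central repository everyone pushes to
mine = Repo()    # one developer's clone

commit(mine, "master", "c1")
push(mine, shared, "master")
commit(mine, "master", "c2")  # committed, but not yet pushed

for name, repo in (("mine", mine), ("shared", shared)):
    print(name, sorted(repo.refs.items()))
```

After the last commit, the developer's local `master` has moved on while both the remote-tracking ref and the shared repository still point at the earlier commit. That gap is exactly what the extra push closes.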

One area where I give Git a mixed report card is documentation. The Git man pages vary from incredibly comprehensive to incredibly obscure. On the other hand, there are a couple of very good books (see above) on the subject, and numerous websites. A lot of people know Git really well by now and many of them are very willing to help out on StackOverflow or the Git IRC channel (#git on freenode).

Git punts on user management. Apart from allowing a name and email address to be associated with a commit, Git relies solely on the underlying file system for access control for a repository. Of course, there are numerous options available for setting up user management and serving repositories, from gitosis and its successor gitolite to web-based services like GitHub and Gitorious. Two other areas where Git is still a bit rough around the edges are subproject support (submodules are incredibly awkward) and support for big binary files.

Overall, I like Git as well as anything I’ve ever used, and better than a lot. If you do move to Git, though, do it because it’s the best tool for your situation, not because it’s the cool thing to do.


* There’s an awesome conversation between Linus Torvalds and Bram Cohen about this subject; I come down firmly on Linus’s side in this debate. Cohen is a really smart guy, and much of his thinking about merge algorithms is interesting and useful, but auto-merging in a VCS has to be right. If a mistake is made, I want it made by a developer, not by the algorithm.

