Apr 20, 2009

What's the point of github?


While driving to Malmö last Friday to attend a tech talk on git by Sébastian Cevey and hosted by PurpleScout, I was trying to explain distributed source code management systems (like git) to a non-developer friend of mine. I very quickly found myself explaining much more about git than I realized I knew. And I found myself asking, and answering, what I think is a very interesting question: what is the point of github?

The situation is that git, and other distributed source code management systems like bazaar and mercurial, appear to start from the philosophical position of giving complete control to the end user (in this case the developer). They are not centrally controlled systems: there is no central server, and no 'little' boss to ask for permission to access files, branches or projects. When you clone the repository, you get it all, including the complete history. Power to the people!

This allows for highly flexible distributed teams, each working in their own way, as suits the developers themselves. It completely solves the usual problem found in central systems like CVS, SVN and, heaven forbid, Perforce: getting permission from a non-developer to do development.

So then, why does a site like www.github.com exist? It seems to imply adding a central server back into the decentralized system. With a little thought, I realised what was going on. The problem had never been central control as such; it was all about who has the control. Distributed systems do not actually remove the concept of central control at all. They just facilitate a situation where the right people are in control.

To explain this, I should re-describe what the original problem was. Consider CVS and SVN, arguably the industry standard(s). You have a central server with the code (and history and branches, etc.). Each user checks out a working copy of a branch of the code. After doing work, they commit back to that branch (dealing with conflicts and merges as needed). This implies a very particular workflow, and forces connectivity to the server for all major actions that require working with the code history (checkout, update, commit, branch, merge, etc.). And the mere existence of the central server implies the existence of IT and admin in the decision making loop, which can only hurt. Perforce, being more susceptible to the influence of IT on purchasing, took this one step further and required connectivity to the central server for almost any development activity, and, can you believe it, even requires developers to unlock each file they plan to work on! Can there be anything worse for developer productivity? Well, yes, anyone remember Microsoft's 'SourceSafe'?

What was the main problem here? It was not actually the central server, but rather it was a few things implied by this architecture:
  • The involvement of non-development staff in the smaller details of what the developer actually needs to do, which adds overhead to development activities, which means higher cost and less efficiency.
  • The implication of a specific workflow in the way the developers need to work with the code-base.
  • The need for regular or even continuous connectivity, which also has performance, efficiency and cost implications.
Distributed systems completely avoid all of this. Each developer has the complete history, and all branches, right there on their own computer. They can do absolutely everything they want without asking anyone, and especially without asking people who don't know about software development. Maximum performance!

But at the end of the day, those developers need to get their code back to somebody in charge. There is always going to be one person or organization that actually sells the product, or distributes the product, or supports it. So, no matter how much power the developers think they have, the real world is still centrally controlled. But at least now the control is not micro-management. Now the control is closer to the real business, which is about getting good code to the right customers. Distributed source code management allows this to be done most efficiently. The developers have all the power to do their job most efficiently, but with power comes responsibility, and those same developers are now required to do all the merging back into the main code. How is this done without a central server? Easy: each developer simply publishes to their own public copy of the latest code-base. That public copy could even be a shared location on their own computer, accessible to the right people. Or, in the case of open source projects, it could be a world-readable resource like github.

And that's the point of github! It is a convenient place for developers to publish their already merged work, for use by the central product distributor.
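
For readers who think better in code, the publish-and-pull workflow described above can be sketched as a small toy model. This is only an illustration, not how git is actually implemented: the names Repo, clone, commit, pull, publish and demo are all made up for this sketch, a "commit" is just a string, and the merge ignores conflicts entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Hypothetical toy repository: an owner plus the full list of commits."""
    owner: str
    history: list = field(default_factory=list)

    def clone(self, new_owner):
        # A clone copies the complete history; nothing stays behind on a server.
        return Repo(new_owner, list(self.history))

    def commit(self, message):
        # Committing is purely local and needs no connectivity.
        self.history.append(message)

    def pull(self, other):
        # Naive merge: append the commits we do not have yet (no conflict handling).
        for c in other.history:
            if c not in self.history:
                self.history.append(c)

    def publish(self):
        # Publishing makes a separate public copy that others may pull from.
        return Repo(self.owner + "/public", list(self.history))


def demo():
    # The "somebody in charge" who distributes the product.
    mainline = Repo("distributor", ["initial release"])

    # Two developers clone everything and work independently.
    alice = mainline.clone("alice")
    bob = mainline.clone("bob")
    alice.commit("alice: fix login bug")
    bob.commit("bob: add export feature")

    # Bob publishes first and the distributor pulls his work.
    mainline.pull(bob.publish())

    # Alice does the merging herself, then publishes the already merged result.
    alice.pull(mainline)
    mainline.pull(alice.publish())
    return mainline.history


if __name__ == "__main__":
    for entry in demo():
        print(entry)
```

The thing to notice is who does the merging: Alice pulls the latest mainline into her own copy before publishing, so the distributor only ever pulls work that has already been merged.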

Not only is this a developer's dream come true, but it is a software development company's dream come true. You don't have to manage the central server any more. You also don't have to spend as much effort merging other people's code into your own, because you can push that responsibility back out to the developers, where it belongs.

I can't believe this was not done thirty years ago! Why is that? I have two theories:

Cobbler's children - since both the customer and supplier are the same (the developer) for code management systems, perhaps it's a case of the cobbler's children having the worst shoes. The developers simply work around bad code management systems, because they can.

Corporate control - if we look back at what I've said about the key differences between central and distributed systems, there seems to be a repeating theme regarding the involvement of non-developers, or company IT processes, in the way the older systems worked.

Having personally seen a lot of bad decision making by companies to increase their level of 'perception of control', I'm voting for the latter. (see my blog for more on this).

But those days are numbered! I think concepts like distributed SCM and open source itself are increasing the prevalence of businesses run on the principles of collaboration instead of control, with decision making by the people with the actual information.