We got mentioned a little over at the blog of a new development project called Enclave. For the purposes of full disclosure, I should point out that the person who wrote the blog is actually one of the people in the Cemetery domain here, so someone who is already very familiar with what we’re doing. The post started my mind wandering down a few avenues regarding size, complexity and the tractability of projects. I gave an off-handed estimate for the linked blog entry as to the size of Epitaph. As I sat down and thought about it though, I realised that an off-handed estimate wasn’t really enough. What I wanted was a reasonably accurate metric that would suitably illustrate just how big a beast Epitaph actually is.
Metrics in software engineering are contentious – some of them are good for certain purposes. Some of them are wall to wall bad. Unfortunately, the ones that are simplest to compute are the bad ones. Take the standard unit of the KLOC for example – it means ‘one thousand lines of code’, and for a long time it was the default way by which project size and complexity were measured. Its main advantage as a number is that it is very easy to calculate. Its main disadvantage is that it is useless for accurately gauging anything other than ‘number of lines of code in a system’. KLOCs don’t measure code complexity, although you can infer certain things as a general rule. KLOCs don’t measure productivity – sometimes the most productive thing you can do in a day is edit large chunks of code down into small chunks of code. They don’t measure quality – there is no reliable correlation between lines of code and the quality of a software project. Really, they tell us very little. Steve Ballmer of Microsoft had some interesting things to say about KLOCs as a measure of code output:
In IBM there’s a religion in software that says you have to count KLOCs, and a KLOC is a thousand lines of code. How big a project is it? Oh, it’s sort of a 10 KLOC project. This is a 20 KLOCer. And this is 50 KLOCs. IBM wanted to make it the religion about how we got paid – how much money we made off OS/2 and how much they did. How many KLOCs did you do? And we kept trying to convince them – hey, if we have a developer who’s got a good idea and he can get something done in 4 KLOCs instead of 20 KLOCs, should we make less money? He’s made something smaller and faster, less KLOC. KLOCs, KLOCs, that’s the methodology. Ugh!
Nonetheless, given that it’s so easy to generate a LOC count and so hard to reliably do much else, I’m going to use it as the basis of this post. Just bear in mind – it doesn’t tell us much beyond what I’m going to talk about. You can’t look at your project and say ‘It’s this number of KLOCs, which means blah blah blah’. However, you can say ‘This is what is typically true, on average, for projects of this number of KLOCs’. It’s useless as a specific measure for your project, but in the aggregate it does give us some reasonable data.
Estimating the scale of a large codebase is reasonably complex. For Epitaph, I started off by simply adding together the disk size of our directories, and then getting a rough ‘average line length’ for code files. I took twenty handlers largely at random, counted how many lines of code each one was, and divided the total file size by the total line count. This gave me an average of around 21 characters per line. My first attempt at a LOC count, then, was ‘size of code base / 21’. That yielded around 1.8 million lines of code in our ‘core code’ directories, with around another 1.7 million lines of code in domain areas. That seemed a little on the high side, so I looked into it again.
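That first rough pass is easy enough to script. Here’s a minimal Python sketch of the idea – the 21-character average is just the figure from the sampling above, and none of this is our actual tooling:

```python
import os

AVG_LINE_LENGTH = 21  # rough average, from sampling twenty handlers

def estimate_loc(root):
    """Estimate lines of code as (total bytes on disk / average line length)."""
    total_bytes = 0
    for dirpath, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                total_bytes += os.path.getsize(path)
    return total_bytes // AVG_LINE_LENGTH
```

A naive sweep like this happily swallows everything it finds, which is exactly why the number it produced needed revisiting.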
First problem – I was double-dipping for a lot of the files. We use a revision control system here that makes a kind of ‘tracking’ file for every file registered with the system. When you make changes, it records a ‘diff’ between the old version and the new version (so you can easily revert changes if needed). You can see the basics of how that works by clicking the ‘history’ tab on any arbitrary page on Wikipedia. Those all had to be discounted to get an actual size for our directories.
The second thing was we needed to discount files that weren’t ‘code’. I used a broad definition for this – as long as it was gobbled up by our mudlib in the pursuit of building our game world, it counted as code. Thus, data files and header files were all classed as ‘code’, while text files, HTML files, and graphics weren’t. It was roughly at this point I realised that I needed something that would crunch through this stuff for me, and while it was doing that it may as well do away with the rough code estimation and simply count the lines of code directly. So, I wrote a LOC estimator. This is what it told me:
- Total for “/include/” is 15897 LOC in 539 files
- Total for “/mudlib/” is 262931 LOC in 1423 files
- Total for “/d/game/” is 220133 LOC in 7057 files
- Total for “/d/support/” is 50755 LOC in 1119 files
- Total for “/global/” is 24948 LOC in 125 files
- Total for “/items/” is 62559 LOC in 1454 files
- Total for “/cmds/” is 69961 LOC in 1020 files
- Total for “/data/” is 139581 LOC in 2433 files
- Total for “/net/” is 16822 LOC in 119 files
- Total for “/room/” is 933 LOC in 23 files
- Total for “/secure/” is 24711 LOC in 350 files
- Total for “/www/” is 219656 LOC in 1700 files
- Total overall: 1108887 LOC in 17362 files
A line of code, in this respect, is any line in any ‘code’ file with a length greater than one once white space and opening/closing curly braces have been discarded – we count everything else. It doesn’t ignore comments, so the number is at least a little inflated. However, shameful as it is, we don’t actually have a lot of comments in our mudlib – the ones we do have are almost entirely legacy.
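That counting rule is simple enough to sketch. Here’s a rough Python rendering of the rule as described – strip white space and curly braces, count any line with more than one character left, comments included. This is my reading of the rule, not our actual estimator:

```python
def count_loc(text):
    """Count 'lines of code' per the rule above: discard white space and
    curly braces, then count any line with more than one character left.
    Comments are deliberately NOT filtered out."""
    loc = 0
    for line in text.splitlines():
        stripped = line.strip().replace("{", "").replace("}", "")
        if len(stripped) > 1:
            loc += 1
    return loc
```

On a trivial C-style snippet this counts the declaration, body statements, and comments, while blank lines and lone braces fall through.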
This is a reasonably accurate LOC count for Epitaph then, and puts us a little past 1.1 million lines of code. In the grand scheme of things, this isn’t huge – a modern car probably has more lines of code in its operating system. Windows XP is estimated to have had around 45 million lines of code. But, let’s put it into perspective – World of Warcraft, which has been under constant development by hundreds of employees for over ten years, has 5.5 million lines of code. Quite a lot of our code was ‘inherited’ from the original DW lib we hacked at for so long, but unfortunately it’s not possible at this point to tell how much of it is ours and how much of it is ‘theirs’ without doing a lot more work than I’m willing to do. We’ve rewritten a lot of the handlers and inherits from the ground up, and for those we haven’t, we’ve chipped away the rough edges and made them shine in the way we want them to shine. We’d need to do a full code analysis to say for sure how much of it is specifically *ours*. It doesn’t matter though, that’s not really my point – this isn’t about bragging, it’s about providing some interesting data to those who might be reading.
This is a physical count of lines of code, and as I say above it’s not worth much more than a measure of ‘look how many lines of code we have’. However, industry standard averages can be a useful guideline – while they won’t take into account the specifics of a project, they’re ‘good enough’ for our purposes. I read a great post today on the subject of ‘a million lines of code’, which you can find at http://www.embedded.com/electronics-blogs/break-points/4026827/A-Million-Lines-of-Code. Some of my favourite points:
- The schedule for developing a million lines of code is 22 times bigger than for 100,000 LOC.
- A million lines of code will typically have 100,000 bugs pre-test. Best-in-class organizations will ship with around 1k bugs still lurking. The rest of us will do worse by an order of magnitude.
- A million lines of code will occupy 67 people (including testers, tech writers, developers, etc) for 40 months, or 223 person-years.
- A million lines of code costs $20m to $40m.
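That first bullet is worth dwelling on: a tenfold increase in size (100,000 to 1,000,000 LOC) multiplies the schedule by 22, so schedules grow distinctly faster than size does. If you model schedule as proportional to LOC to some power (a back-of-envelope fit of my own, not a claim from the linked article), the exponent falls straight out:

```python
import math

# If schedule ~ LOC**b, and multiplying LOC by 10 multiplies the
# schedule by 22, then 10**b == 22, so b = log(22) / log(10).
b = math.log(22) / math.log(10)
print(round(b, 2))  # about 1.34 - comfortably superlinear
```

An exponent of roughly 1.34 means every doubling of the codebase costs you well over double the calendar time.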
These four key points bring us to the meat of the matter for Epitaph. They highlight just how unusual it is for a MUD of Epitaph’s size and complexity to open at all – the task is absolutely mammoth. They also highlight just how big a task remains ahead of us – we’re not in the ‘pre-test’ phase, and a lot of the code we have has already had years and years of debugging by virtue of being part of another MUD, but we’re also not going to be best in class *yet*. The scale of the task ahead of us in quality assurance is also massive – we’re likely to have somewhere between 1,000 and 100,000 bugs that need to be dealt with, and we have no idea whereabouts in that range the actual number is going to land. Luckily we’ve had a few testing phases before, and the nature of Epitaph’s architecture means that points of failure can be rapidly identified and fixed with maximum impact – as a frightening glimpse of a ‘path not taken’ though, it’s a little chilling. If we hadn’t centralised so many of our systems, we’d be looking at the business end of a nightmare scenario and no mistake.
There are other metrics for calculating the number of bugs – one common one is that there is, on average, one bug every 10 lines of code, which would give us 110,000 bugs. Yeesh. Luckily, the same metric also indicates that around 85% of those will be caught during development, giving us a total of around 0.15 bugs per 10 lines of code, or one bug every 66 lines of code roughly. That gives us a ballpark figure of 16,666 bugs in Epitaph. Yeesh again. Big software projects have more bugs, because it’s harder for any one person to understand the full complexity of what is going on but also because there is *so much* going on that bugs are harder to identify in the first place. And by now, Epitaph is very big – and we don’t have 223 person-years available.
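For what it’s worth, running that metric over our exact line count rather than a rounded 1.1 million lands in much the same ballpark:

```python
LOC = 1_108_887            # Epitaph's measured line count, from above
BUGS_PER_LINE = 1 / 10     # one bug per ten lines, pre-test
FRACTION_CAUGHT = 0.85     # caught during development

pre_test_bugs = LOC * BUGS_PER_LINE
remaining_bugs = pre_test_bugs * (1 - FRACTION_CAUGHT)
print(round(pre_test_bugs), round(remaining_bugs))  # 110889 16633
```

Around 110,889 pre-test and around 16,633 left over – yeesh either way.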
Luckily, much of our game is in the form of area files and config files – we’ve worked hard to consolidate as many of the moving parts as possible into single points of failure. This means that when they break, they *really* break – but it also means that when they’re fixed, they’re fixed for everything. Our separation of ‘development’ and ‘live’ servers ensures that we can have all kinds of logging and automated testing going on when we’re working without it being enabled for players. We have a handler, for example, that goes over every object in the game world and checks to make sure it has everything that should be set. It checks to see if NPCs are clothed, if rooms are missing shorts or chats, if items are missing materials, and so on. This handler doesn’t run on Live, so we get the benefit of a kind of ‘test driven development’ (well, a *kind of*) without worrying about the game performance problems for players.
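The real handler is LPC code living in our mudlib, but the shape of the idea is easy to show. A hand-wavy Python sketch – every field name here (‘clothing’, ‘short’, ‘chats’, ‘material’) is illustrative, not our actual data model:

```python
def check_object(obj):
    """Return a list of problems for one game object, here a plain dict."""
    problems = []
    kind = obj.get("type")
    if kind == "npc" and not obj.get("clothing"):
        problems.append("NPC is unclothed")
    if kind == "room" and not obj.get("short"):
        problems.append("room has no short description")
    if kind == "room" and not obj.get("chats"):
        problems.append("room has no chats")
    if kind == "item" and not obj.get("material"):
        problems.append("item has no material set")
    return problems

def audit(world, live=False):
    """Sweep every object in the world - but never pay the cost on Live."""
    if live:
        return []
    return [(obj.get("name"), problem)
            for obj in world
            for problem in check_object(obj)]
```

The point is the split: the checks are cheap to write once, the sweep is expensive to run, so the sweep is simply switched off where players would feel it.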
The issue of ‘cost’ is interesting here, in that while it may cost that much to commission a piece of software of this type, in real terms its value is only what someone can be expected to pay for it. As much as I love what we’ve done with Epitaph, if someone offered me even the lower end of the estimate above, I’d ditch it so fast that I’d leave a comically spinning licence plate in my flame trails. But, there is no danger of that – while the code may have ‘cost’ a lot, it doesn’t have that *value*. Still, it’s nice to think that what we have here is, conceptually at least, a multi-million pound resource.
These kinds of figures are why I always roll my eyes when people act as if text based games are in some way simpler than other kinds of games. Such attitudes reveal the ignorance, and often the prejudices, of the person making the statement – the complexity and staggering scale of an entity like Epitaph is simple, unanswerable evidence of their foolishness. Here on Epitaph, I like to think we’re doing the impossible, and it is that which makes us mighty.
Footnotes:
- Although again, there are inferences. We’ll get to those.
- Domain code, because so much of it is descriptive, got a different calculation – 40 bytes per line.
- Judging by https://www.facebook.com/windows/posts/155741344475532
- One with players. This also highlights an important fact – no matter how much work you put into a game engine, until you’ve had players going at it with hammers and tongs, it’s not a suitable basis for a ‘real’ game. Until you’ve been *properly* tested in anger, go home and get your fucking shinebox.
- The actual metric is a little bit more complicated than that, but dealing with it as LOC will do for our discussion here.
- Another metric at http://interconnectit.com/100/how-much-does-code-cost/ – £20–£25 for each line of new code.