[Reading Time: 15 min]
In software development projects, documentation is often treated as an afterthought, if not ignored altogether. As professionals, we all know that documenting is part of the design and development activity, much like pushing artifacts to production is part of the development process. Of course, I am not talking about the project’s own documentation, such as specifications, requirements, sponsor meeting minutes or project plans. They are obviously necessary, but not what I want to review here. No, here I am talking about documenting the design and the code. And remember, we have all been told that it is not only good practice, but that it also contributes to the overall quality of the project. Not convinced? Okay, let me tell you a story we all know already.
Joe Programmer meets his new code base
Remember the canonical example of maintenance: the team members who wrote the software have long since left the project, and you have been appointed to make a small change. The code is easily available because the team used source control from the get-go (I know, this sounds unreal, but bear with me). You even have access to the right version of the IDE, matching the project file format. Sounds like a dream. But then you open the project and you wake up: there is not a single line of comment anywhere. The project structure doesn’t look like what you would expect for that kind of project: it doesn’t follow any known standard. And of course, it is not documented anywhere. Feverishly, you look for unit tests and … voilà, they’re there! But they’re empty. They are just the default stubs automatically generated by the IDE…
Okay, not everything is lost. After all, you can at least generate a static UML diagram to get a global map of the code, which you will refine as you dig deeper into the source. So you fire up <insert any UML tool with reverse engineering plugin here>, point it at the root of your project and wait while it parses the code to render a class diagram.
The first result is of course not good: you forgot to exclude the framework code from the scan and the diagram looks like a picture of an ant farm on a busy day. The second scan is better, as you only get the project’s own entities in there. Your UML tool even went as far as drawing relationships between some classes. Wow, it looks great once printed in color! You could stick it on the wall next to your desk and actually look clever now!
Except. This. Is. Only. A. Static. Diagram. It tells you nothing about how the code flows at runtime, does it? Now you need to switch to another tool, one that watches the running program at binary level, reports on method calls between objects and draws some kind of sequence diagram.
Phew, finally! Well done! You’re set! Except. You’re not.
Look at this entity class. Looks like it’s been generated by some sort of tool… Hmmm, let me look deeper… Argh! Yes, you found it! Argh, no! Not that, for programmers’ sake, please no! But alas. You soon realise that yes, an ORM framework has been used to map the database structure to your object structure. So, is this when you resign?
2 kinds of documentation
Let’s leave Joe Programmer to his tears, while we quickly review what’s at stake here. There are 2 kinds of code documentation: inside the code base and outside.
The first kind lives with the code, but is fairly limited: comments and unit tests.
Comments help explain local complexities: why on earth you overrode string concatenation in that class, or how this calculation’s intricate loop actually works and what its constraints are (numeric bounds, for instance). All those very useful, detailed explanations. You can still “decorate” class or method bodies if you like, but let’s agree that these are not useful comments (except maybe when they reference your ticketing system).
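To make the distinction concrete, here is a minimal Python sketch. The Luhn checksum stands in for “this calculation’s intricate loop”; the algorithm and the function name `luhn_is_valid` are illustrative choices of mine, not taken from any particular project. The banner at the top is the kind of decoration mentioned above, while the comments inside the body explain why the loop looks the way it does and which constraints the caller must respect.

```python
# ------------------------------------------------------------
# luhn_is_valid      <- decoration: repeats the name, adds nothing
# ------------------------------------------------------------
def luhn_is_valid(number: str) -> bool:
    # Constraint: ASCII digits only. Spaces and dashes (as in
    # "7992 7398 713") must be stripped by the caller first.
    if not number or not (number.isascii() and number.isdigit()):
        raise ValueError("number must be a non-empty string of ASCII digits")

    total = 0
    # Why walk backwards? Luhn doubles every second digit counting from
    # the RIGHT, so reversing lets us test "odd index" regardless of
    # how long the number is.
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:
            digit *= 2
            # A doubled digit is at most 18; subtracting 9 gives the sum
            # of its two decimal digits (e.g. 16 -> 1 + 6 = 7).
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_is_valid("79927398713"))  # True
```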
Unit tests go beyond this. Instead of explaining how a method works, a test actually shows how it works. Instead of explaining which constraints must be respected when using an algorithm, it actually makes sure the code blows up when you don’t respect them. And of course, because you don’t want unit tests to gather dust, you make them part of the critical path of your build process.
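As a minimal sketch using Python’s standard `unittest` module, here is a hypothetical `split_evenly` function together with tests that document it. One test shows how leftover cents are distributed, one checks an invariant, and one makes sure the function blows up when its constraints are not respected. The function and its rules are invented for illustration.

```python
from __future__ import annotations

import unittest


def split_evenly(total_cents: int, parts: int) -> list[int]:
    """Split an amount in cents into `parts` shares differing by at most one cent."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    if total_cents < 0:
        raise ValueError("total_cents must be >= 0")
    base, remainder = divmod(total_cents, parts)
    # The first `remainder` shares get one extra cent so nothing is lost.
    return [base + 1 if i < remainder else base for i in range(parts)]


class SplitEvenlyTest(unittest.TestCase):
    def test_shows_how_leftover_cents_are_distributed(self):
        # Documents the behavior: the extra cent goes to the first share.
        self.assertEqual(split_evenly(100, 3), [34, 33, 33])

    def test_shares_always_add_up_to_the_total(self):
        for total in range(50):
            for parts in range(1, 7):
                self.assertEqual(sum(split_evenly(total, parts)), total)

    def test_blows_up_when_constraints_are_violated(self):
        with self.assertRaises(ValueError):
            split_evenly(100, 0)
        with self.assertRaises(ValueError):
            split_evenly(-1, 2)
```

Saved in a file named, say, `test_split_evenly.py`, the tests run with `python -m unittest`, which is the kind of command you would wire into the build so that a failing test stops it.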
Of course, none of this is perfect. Comments need to be updated when the code changes in order to stay relevant. Unit tests are supremely difficult to design, organise and maintain. But they have 3 advantages:
- they are the minimum ammunition you can use to build maintainable code
- they are part of the code base
- they are useful, even to you.
The second kind of documentation lives outside the code. As such, it suffers from several disadvantages:
- it is an afterthought, even (or especially) when it can be generated by analysing the code
- it is not helpful per se (as opposed to unit testing)
- when it cannot be generated by some reverse engineering tool, it is difficult to create and harder to maintain
- it is boring (although it doesn’t need to be)
The obvious examples are UML diagrams, which should be used as maps to help navigate through the code base. Static diagrams (class diagrams) are useful to get a picture of the code’s structure, provided of course that it is object-oriented. Dynamic diagrams (sequence diagrams) are useful when you need to understand how the code behaves at run time.
Static diagrams can easily be generated by a whole range of existing UML tools when your code base is written in an object-oriented curly-brace language (C++, Java, C#…). Generating sequence diagrams is of course a lot harder, because the tool needs to observe the code while it runs, for example by attaching to it the way debuggers do. Often, you will have to draw sequence diagrams yourself so, if you are the original author or maintainer of the code, you would be well advised to draw them from the start.
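To give an idea of what attaching to running code involves, here is a toy Python sketch built on `sys.settrace`, the interpreter hook that the standard `bdb`/`pdb` debugger has traditionally relied on. It records which object calls which method while a block of code runs, then prints those calls as a PlantUML sequence diagram. The `OrderService`, `Inventory` and `PaymentGateway` classes are hypothetical, and real tools handle far more (threads, return values, filtering).

```python
import sys


class SequenceRecorder:
    """Record public method calls between instances of `classes` while code runs."""

    def __init__(self, classes):
        self.classes = tuple(classes)
        self.messages = []

    def _participant(self, frame):
        # The caller is whichever traced object owns the calling frame.
        owner = frame.f_locals.get("self") if frame is not None else None
        return type(owner).__name__ if isinstance(owner, self.classes) else "Client"

    def _trace(self, frame, event, arg):
        # 'call' fires each time a new Python frame is entered.
        if event == "call" and not frame.f_code.co_name.startswith("_"):
            callee = frame.f_locals.get("self")
            if isinstance(callee, self.classes):
                caller = self._participant(frame.f_back)
                self.messages.append(
                    f"{caller} -> {type(callee).__name__} : {frame.f_code.co_name}()"
                )
        return None  # no line-by-line tracing needed

    def __enter__(self):
        sys.settrace(self._trace)
        return self

    def __exit__(self, *exc_info):
        sys.settrace(None)
        return False

    def to_plantuml(self):
        return "\n".join(["@startuml", *self.messages, "@enduml"])


class Inventory:
    def reserve(self, sku):
        return True


class PaymentGateway:
    def charge(self, amount):
        return "ok"


class OrderService:
    def __init__(self):
        self.inventory = Inventory()
        self.payments = PaymentGateway()

    def place_order(self, sku, amount):
        if self.inventory.reserve(sku):
            return self.payments.charge(amount)
        return "out of stock"


service = OrderService()
with SequenceRecorder([OrderService, Inventory, PaymentGateway]) as recorder:
    service.place_order("SKU-42", 19.99)
print(recorder.to_plantuml())
```

This should print three messages between `@startuml` and `@enduml`: `Client -> OrderService : place_order()`, then `OrderService -> Inventory : reserve()` and `OrderService -> PaymentGateway : charge()`. Note that it only sees the paths the code actually takes during that run, which is exactly why hand-drawn sequence diagrams remain valuable.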
Data Flow Diagrams (DFDs) fall into that category as well, as I don’t know of any tool that can generate them for you. So you will generally reserve them for very specific data flow aspects of your design, for example how you fetch those files from across the network and merge them based on some business rules before injecting the data into the master database.
There are also the standard practices that you follow: the project structure, how you name your code artifacts based on their type, which control structure you use for loops, which array data structure you favor, etc. Depending on the languages and frameworks you use, some of these standards will already be imposed on you. If this is not the case, or if you need to deviate from them (but seriously: don’t!), then of course you need to document them so that a new recruit can get up to speed quickly.
Finally, you could potentially include the documentation of the build process that your team uses. To be honest, I find this one rather tangential and I wouldn’t personally make it part of the overall design and development activity. It is project-level documentation, alongside the documentation of the chosen source control workflow. But your mileage may vary.
Once you have the kinds of documentation we’ve discussed above, you need to put a process in place to keep them up to date. And all developers in the team need to understand that producing up-to-date documentation is actually part of the work. This is why you want to automate it as much as you can. The closer to the code, the better.
This means that the documentation, including the UML diagrams generated from the code base, needs to be versioned just like your source code, using the same source control system and practices that you already use.
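One way to enforce this, sketched under assumptions: keep generated diagrams as plain-text files (PlantUML, for instance) in the repository next to the code, and let the build fail when the committed copy no longer matches what the generator produces from the current code. The file path and the `check_doc_is_current` / `main` helpers are hypothetical; only `difflib`, `pathlib` and `sys` from the Python standard library are real.

```python
import difflib
import sys
from pathlib import Path


def check_doc_is_current(committed: Path, regenerated: str) -> list[str]:
    """Return a unified diff between the committed doc and a fresh one.

    An empty list means the versioned documentation is up to date.
    """
    current = committed.read_text(encoding="utf-8") if committed.exists() else ""
    return list(
        difflib.unified_diff(
            current.splitlines(keepends=True),
            regenerated.splitlines(keepends=True),
            fromfile=f"{committed} (committed)",
            tofile=f"{committed} (regenerated)",
        )
    )


def main(committed: Path, regenerated: str) -> int:
    """Build step: non-zero exit code when the doc is stale."""
    diff = check_doc_is_current(committed, regenerated)
    if diff:
        sys.stdout.writelines(diff)
        print("\nDocumentation is out of date: regenerate it and commit it.")
        return 1
    return 0


# In a CI script, assuming a hypothetical generate_class_diagram():
# sys.exit(main(Path("docs/classes.puml"), generate_class_diagram()))
```

Text formats also produce readable diffs in code review, which is much harder to get from binary images.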
What about TOGAF, then? As an enterprise architecture framework, TOGAF is mainly concerned with the design, implementation and governance of a company’s IT architecture. Very useful. Completely out of scope for the concern at hand, which is about software design. You could think of TOGAF as a meta-architecture: a framework for talking about architecture, a process for defining and maintaining the overall architecture of a company’s IT systems. Not the kind of project-based technical documentation we have discussed here.
Where to go from here
When you design a piece of software, you need to communicate your thoughts and the meaning of your design. To your developer mates, to your sponsors and clients. To do this efficiently, you need some documentation. The kind of documentation we have discussed here is fairly technical and logically targets development teams. It can be expanded, but this is the bare minimum without which you cannot talk about your design, your architecture, your software.
And we have actually left one kind aside: how do you document the data model that is part of your architecture? This will be the subject of our next post.