Structuring Your Knowledge Base: Tree or Graph? Or Perhaps Map?


Cat wondering whether to structure its knowledge base as a tree, graph or map

Earlier this year, World of BS issued a sharp rebuke of software documentation in their essay “Why Your Company’s Documentation Sucks”. The writer talked about two approaches to writing documentation: tree-based and graph-based, fiercely advocating the latter over the former. They said that tree-based documentation requires more effort and doesn’t scale and that everybody should start using a graph-based system for documentation instead.

The argument about tree-based versus graph-based documentation has been debated at length in the software community and the answer’s not all that black and white. Actually, we think that the best approach is a delicious blend of the two. No, that doesn’t mean we’re sitting on the fence just so we can be nicey-nicey friends with everyone. We’re saying that, in this case, the fence is 100% the best place to be.

Make like a tree

World of BS calls the tree-based approach the “obvious way to organize information”. And indeed, most file systems work this way. To find a piece of information, you start at the root, work your way up the tree, and travel along the branch that will lead to the, ahem, leaf that you’re after. In other words, documents are arranged in a hierarchy.

For example, a file system of your favorite movies might start with one big folder titled “Movies”. In that folder, you’d have sub-folders for “Sci-Fi”, “Fantasy”, “Horror”, “Western”, “Romance”, “Comedy”, “Thriller” etc. And in each of those sub-folders you’d have movies pertaining to those genres.
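
To make the hierarchy concrete, here is a minimal Python sketch of that movie library as nested folders. The folder layout, the placement of each title and the find_path helper are all made up for illustration; it simply shows that finding something in a tree means walking down from the root.

```python
# Hypothetical movie library: folders are dicts, leaves are lists of titles.
# The placement of each title is an arbitrary choice made for this example.
movies = {
    "Movies": {
        "Sci-Fi": ["Alien"],
        "Western": ["Back to the Future Part III"],
        "Romance": ["Ghost"],
    }
}


def find_path(tree, title, path=()):
    """Return the folder path from the root to the folder holding title, or None."""
    for name, child in tree.items():
        here = path + (name,)
        if isinstance(child, dict):
            # A sub-folder: keep walking down this branch.
            found = find_path(child, title, here)
            if found:
                return found
        elif title in child:
            # A leaf list that contains the title we want.
            return here
    return None


print(" > ".join(find_path(movies, "Ghost")))  # prints "Movies > Romance"
```

Note that each title lives in exactly one folder, so there is exactly one path to it.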

So far, so sensible.

But a purely tree-based system becomes problematic when the user doesn’t know exactly where something’s likely to be. No one wants to click through a dozen sub-folders to find, create, or update a piece of information, nor do they want to have to backtrack and start again if they end up in the wrong place.

And there’s another, bigger, problem. What if the piece of knowledge you want to document pertains to more than one ‘branch’? Take the movie Back to the Future Part III. Would you put this in “Sci-Fi”? Or in “Westerns”? Do you duplicate the movie and stick it in both? What about Alien? “Sci-Fi” or “Horror”? How about the ultimate genre-bender, Ghost? That one could arguably go in “Fantasy” AND “Romance” AND “Comedy” AND “Thriller”.

Such is the quandary faced by software engineers looking to add to the collective knowledge. They’ll likely decide that there’s no point duplicating the document, otherwise they’ll need to update it each time for each area. And frankly –

So, the engineer is forced to pick the folder they think is the most relevant and ignore the others even though they’re relevant too. But another engineer might make a different decision, leading to poorly organized documentation – with related documents popping up all over the place. World of BS calls this “documentation rot”. Eventually this leads to engineers losing trust in the documentation, choosing not to use it, and choosing not to add to it.

Make like a graph

The graph-based approach is about having a structure that is non-linear and non-hierarchical, with a system of connected points. Wikipedia is a great example, where instead of being organized in a hierarchy, each document is free-floating and the relationship between documents is not determined by the folder they sit in, but by hyperlinks between them. A Wikipedia page typically has dozens of links to other related pages.

World of BS argues that software documentation should be organized like this because then the user doesn’t need to backtrack all the way to the root if they take a wrong turn. They simply have a walk around the page they’re on and they’ll likely find the information they need because of links to other relevant pages. And more importantly, there are no decisions that need to be made about where a piece of information should go. If a piece of information is relevant to two projects, just add it as a new document and link to it from both.
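
As a rough illustration of the "link to it from both" idea, the Python sketch below stores documents as a dictionary of outgoing links and counts how many clicks separate two pages. The page names and the hops function are hypothetical; this is one simple way to model a link graph, not how any particular wiki does it.

```python
from collections import deque

# Hypothetical pages and their outgoing links. "Deploy guide" is relevant to
# two projects, so both simply link to the single copy.
links = {
    "Project A": ["Deploy guide"],
    "Project B": ["Deploy guide"],
    "Deploy guide": ["Rollback steps"],
    "Rollback steps": [],
}


def hops(start, target):
    """Breadth-first search: number of link clicks from start to target, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, dist = queue.popleft()
        if page == target:
            return dist
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


print(hops("Project A", "Rollback steps"))  # prints 2
print(hops("Project B", "Deploy guide"))    # prints 1
```

There is no root here: the walk starts from whichever page the reader happens to be on.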

But it’s really not as simple as that. You can’t just say to users of your knowledge base, “Here’s a big bunch of docs for you to sift through – good luck!” Users need an entry point. If all the documents in your knowledge base are organized in a graph, they’ll have no idea where to start. It’s fine if you’re just casually browsing, but most people will be looking for an answer to a specific question.

The graph structure can work if there’s a super-powerful search facility that can direct you to the most useful node in the graph. The entry point for most Wikipedia users is Google. However, unless your knowledge base is as extensive as the 6,391,804 articles that populate Wikipedia, and your search as powerful as Google, you run the risk of your users getting irrelevant results, and getting frustrated and annoyed that they can’t find what they need. Then, even if the documentation is there to be discovered, no one’s discovered it, rendering it useless.

So, there still needs to be some semblance of a tree for your knowledge base to truly make sense. Really, instead of going hard one way or the other, the best solution is, as with many things in life, a healthy compromise between the two.

So, maybe, make like a map

There are documentation systems out there which enable you to combine the tree and graph-based approaches when structuring your knowledge base. Confluence, for example. With Confluence, you can have the best of both worlds, which I’m going to dub the map-based approach. On a map, you have your starting point and your destination, and the map will show you a sensible way of getting between the two (like a tree). But it’ll also show other ways of getting to your destination, especially if you happen to start in a different place (like a graph). A map-based structure still gives you a path to take, whereas a purely graph-based structure gives you no path and expects you to search for it.

In Confluence, you have spaces. Inside those spaces you have pages. You can then create sub-pages, known as ‘child pages’. And then sub-pages of your sub-pages. That’s your tree, your traditional file system. But Confluence is also built like a wiki, in that you can create links between pages really easily and create a system of related pages. And even if you’re taking your own path around the documentation, through the related links, Confluence keeps the tree visible on the left-hand side of the page, so you still know where you are on the map and how to get back.
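
One way to picture this combination is a page record that carries both a single parent (the tree) and any number of links (the graph). The Python sketch below is only a mental model with invented page names and an invented breadcrumb helper; it is not Confluence's data model or API.

```python
# Hypothetical pages: each has one parent (the tree) and any number of
# cross-links to related pages (the graph).
pages = {
    "Engineering": {"parent": None, "links": []},
    "Backend": {"parent": "Engineering", "links": []},
    "Frontend": {"parent": "Engineering", "links": []},
    "Auth service": {"parent": "Backend", "links": ["Login form"]},
    "Login form": {"parent": "Frontend", "links": ["Auth service"]},
}


def breadcrumb(title):
    """Follow parent pointers up to the root, then return the trail root-first."""
    trail = []
    while title is not None:
        trail.append(title)
        title = pages[title]["parent"]
    return list(reversed(trail))


# Jump sideways via a link, then ask where we are in the tree.
landed_on = pages["Auth service"]["links"][0]
print(" > ".join(breadcrumb(landed_on)))  # prints "Engineering > Frontend > Login form"
```

However you arrive at a page, the parent pointers always give you a route back to the root.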

You’re probably thinking, okay, but if the structure displayed down the left-hand side is still a tree, how does this help with deciding where in the tree a document should go? Say you have a child page that’s relevant to more than one parent page. Where does it go?

Well, Confluence has a macro called “Include Page”. So you’d do this:

  • Page 1: Content
  • Page 2: [Include Page 1]

This doesn’t just make a static copy of Page 1 onto Page 2. Rather, Page 2 is synced with Page 1 and displays the live contents of that page. So, the author only has to maintain Page 1 and never needs to worry about Page 2 or anywhere else they’ve used the macro. This is a great tool for when you want a page to appear in more than one place in the tree and enables you to take a more graph-like approach to structuring your knowledge base.
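
The difference between a live include and a static copy can be sketched in a few lines of Python. Everything here (the pages dictionary, the include marker, the render function) is a hypothetical toy for illustration and says nothing about how the Confluence macro is actually implemented.

```python
# Hypothetical in-memory wiki: a page body is either text or an include marker
# that points at another page.
pages = {
    "Page 1": "Content",
    "Page 2": {"include": "Page 1"},
}


def render(title):
    """Resolve include markers at read time, so includes always show live content."""
    body = pages[title]
    if isinstance(body, dict):
        return render(body["include"])
    return body


static_copy = render("Page 1")       # a snapshot taken before the edit
pages["Page 1"] = "Updated content"  # the author edits Page 1 only

print(render("Page 2"))  # prints "Updated content"
print(static_copy)       # prints "Content"
```

The snapshot goes stale the moment Page 1 changes, while the include keeps up without anyone touching Page 2.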


World of BS are wrong to completely dismiss the benefits of the tree-based approach to structuring a knowledge base. After all, there’s a reason that the tree is the “obvious” way to organize information, and that’s because it makes a ton of sense.

However, World of BS are absolutely right that the tree-based approach on its own is not flexible enough for the software community. Indeed, its rigidity is what leads engineers to abandon their documentation systems and instead explain things in ad-hoc documents, emails, and comments on forums and Slack. These are also forms of documentation, and often extremely useful in the moment, but they get lost in the ether and are difficult, sometimes impossible, to locate again.

But the graph-based approach can’t work on its own either. Yeah, Wikipedia is great, but Wikipedia is not a knowledge base for software. It’s also worth pointing out that Wikipedia is not entirely graph-based either. For example, some of its pages, such as its guidelines on creating Wikipedia content (below), are organized in a tree-like hierarchy to help the user navigate through them.

Wikipedia guidelines pages organized in a tree-like hierarchy

This is why the best solution is a map-based structure, which mines the benefits of both trees and graphs, and a documentation system like Confluence, which allows for one. Parent and child pages represent the suggested way of getting to where you need to be, while the wiki-esque system of links you can build into each page gives users the option of a “scenic route”. If software engineers don’t have to worry quite as much about where a document needs to go, because they can simply create a link to it on a related page, then they’re more likely to use the documentation and add to it.

And then maybe, just maybe, your documentation wouldn’t suck.

Christopher is a self-confessed nerd who’d probably take the cake on Mastermind if Star Trek: Voyager was his specialist subject. He writes fiction about time travel, conspiracies and aliens; loves roller coasters, hiking and Christmas; and hates carpet, rom-coms and anything with chilli in it. He’s written extensively for technology companies and Atlassian partners and specializes in translating complicated technical concepts, specs and jargon into readable, benefits-driven copy that casual readers will understand.