Big Data Taxonomy Visualisation

[Reading Time: 15mn]

What is it?

In the last couple of blog posts, I have explained my thoughts on software architecture documentation in general, with a focus on data-driven architecture documentation.

Data management has become a complex topic over the last decade, mostly because of the explosion of data sources, itself a consequence of the explosion of internet-connected services and devices. A term has been coined to describe this situation: Big Data.

Oftentimes, our clients want to understand two things about Big Data:

  • What it means, so that they can judge whether it concerns them.
  • What the landscape is, or the cartography, or even better: the taxonomy.

The approach is interesting because it is pretty generic in science. When you want to tackle a new subject, you start by identifying and collecting facts, which you then describe and classify into categories or hierarchies, called taxonomies. Once information is organised in this way, you can start making sense of it, and use it as an abstraction from which to study the next level of information.

At Artek Consulting, we are software architects, so of course we built tools and automated a process to help us create graphical representations of taxonomies (we won’t go into the details of these tools now; we’ll dedicate a future blog post to them).

To answer our clients’ questions on Big Data, we applied this process to the subject and produced a Big Data Taxonomy visualisation in the form of a dynamic TreeMap.

Big Data Taxonomy TreeMap

Here’s what it looks like.

[Screenshot: Big Data Taxonomy TreeMap]

Of course, the dynamic version is available too. It is large, though, so we put it on its own page.

As you can see, there is a lot going on here:

The zoomable TreeMap is based on D3.js, and has been spiced up a little to add title headers. D3 is a JavaScript library for building interactive data visualisations, typically rendered as SVG. Speaking of which, D3 itself appears in the TreeMap’s “presentation” group.
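To give an idea of what a treemap layout does with a hierarchy, here is a minimal sketch in Python of the simple "slice and dice" tiling. It is an illustration only and not the code behind our TreeMap: D3 ships its own tilings (squarified by default), and the function names, the `value` key and the sample tree below are assumptions made for this example.

```python
def weight(node):
    """Weight of a node: its own value for a leaf, else the sum of its children."""
    children = node.get("children")
    if not children:
        return node.get("value", 1)
    return sum(weight(child) for child in children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Give every node a rectangle (x, y, w, h), keyed by node name.

    Children share the parent's rectangle in proportion to their weight,
    split left-to-right at even depths and top-to-bottom at odd depths.
    Node names are assumed to be unique in this sketch.
    """
    if out is None:
        out = {}
    out[node["name"]] = (x, y, w, h)
    children = node.get("children", [])
    total = sum(weight(child) for child in children)
    offset = 0.0
    for child in children:
        share = weight(child) / total
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Hypothetical sample tree, for illustration only.
sample = {
    "name": "root",
    "children": [
        {"name": "A", "value": 2},
        {"name": "B", "children": [
            {"name": "B1", "value": 1},
            {"name": "B2", "value": 1},
        ]},
    ],
}

rects = slice_and_dice(sample, 0, 0, 100, 100)
```

Each group gets an area proportional to what it contains, which is why large groups dominate the map at first glance.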

The TreeMap represents a hierarchical nomenclature of the Big Data Tools landscape, or taxonomy. Some groups are further refined before ‘leaf’ items are explained in more detail. The initial motivation was to group the theoretical elements (see the “research” group) together with the actual software products that have been, and continue to be, implemented.
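A hierarchy like this is usually stored as nested records. The sketch below uses the `name` / `children` shape that D3 examples conventionally consume, written here in Python. The fragment is hypothetical: only the "research" and "presentation" groups and the D3.js item come from this post, and the other names are placeholders rather than the real content of our taxonomy.

```python
# Hypothetical fragment of a taxonomy; placeholder names are illustrative only.
taxonomy = {
    "name": "Big Data",
    "children": [
        {"name": "research", "children": [
            {"name": "placeholder theory item"},
        ]},
        {"name": "presentation", "children": [
            {"name": "D3.js"},
        ]},
    ],
}


def leaf_paths(node, path=()):
    """Yield the path of names from the root down to every leaf item."""
    here = path + (node["name"],)
    children = node.get("children")
    if not children:
        yield here
        return
    for child in children:
        yield from leaf_paths(child, here)


paths = list(leaf_paths(taxonomy))
```

Walking the tree this way makes it easy to see which group, and which sub-group, a given item has been classified under.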

You can zoom in and out of the groups. Clicking a group will navigate one level deeper or higher, while clicking an item will zoom in directly to that item or back to the global map.
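The navigation rules above can be modelled as a small piece of state: the path from the global map to whatever is currently in focus. The Python sketch below is our reading of those rules, not the actual D3 event handlers; the class name, the method names and the choice that clicking the focused group goes back up one level are assumptions.

```python
class ZoomState:
    """Tracks the focused node as a list of names; an empty list is the global map."""

    def __init__(self):
        self.path = []

    def click_group(self, group_path):
        """Go one level up if the focused group is clicked, else one level deeper."""
        group_path = list(group_path)
        if group_path == self.path:
            self.path = self.path[:-1]
        else:
            # Assumes the clicked group lies below the current focus.
            self.path = group_path[: len(self.path) + 1]

    def click_item(self, item_path):
        """Zoom straight to the item, or back to the global map if already there."""
        item_path = list(item_path)
        if item_path == self.path:
            self.path = []
        else:
            self.path = item_path


zoom = ZoomState()
zoom.click_group(["presentation", "web"])   # hypothetical sub-group name
after_first_click = list(zoom.path)
zoom.click_item(["presentation", "web", "D3.js"])
after_item_click = list(zoom.path)
zoom.click_item(["presentation", "web", "D3.js"])
after_second_item_click = list(zoom.path)
```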

Group names might be further adorned with an information circle: hover your mouse over their title to read the details.

[Screenshot: group title with its information circle]

Leaf items have some more information attached to them when needed: notes, vendors, source type, inception date, lifecycle proposal and further links.

[Screenshot: leaf item details]
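The attributes listed for leaf items map naturally onto a small record type. Here is a minimal sketch in Python: the field names follow the list above, but the types, the defaults and the `details` helper are assumptions, and the sample values for D3.js are there for illustration rather than copied from our TreeMap.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class LeafItem:
    """One leaf of the taxonomy; every attribute except the name is optional."""
    name: str
    notes: str = ""
    vendors: List[str] = field(default_factory=list)
    source_type: Optional[str] = None         # assumed to be free text
    inception_date: Optional[str] = None      # kept as text, e.g. a year
    lifecycle_proposal: Optional[str] = None  # assumed to be free text
    links: List[str] = field(default_factory=list)

    def details(self):
        """Return only the attributes that are filled in, i.e. shown 'when needed'."""
        return {
            key: value
            for key, value in asdict(self).items()
            if key != "name" and value
        }


# Illustrative values only.
d3 = LeafItem(
    name="D3.js",
    source_type="open source",
    inception_date="2011",
    links=["https://d3js.org"],
)
shown = d3.details()
```

Keeping empty attributes out of the details view means an item with little metadata does not show a list of blank fields.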

What’s Next?

The Big Data landscape has been a moving target for a decade, so we will not get this taxonomy right on the first attempt. This is a Work In Progress ™.

At Artek Consulting, we are happy to bring this as a tool to the community! Please do not hesitate to comment.
We will fix errors and omissions, and continue to update it based on our own experience and on your feedback. So if the topic is of interest, bookmark the page and come back to it regularly.

Thanks for reading


