Monitoring dashboards

Published on 30 Sep 2018

On Monday my boss came up to me with some very pointed questions about our monitoring dashboards and wallboards. We talked about it for a while, and afterwards I thought about it some more, because it is a problem for which I couldn’t find any good ready-made solutions on the internet.

First I wanted to categorize the different kinds of dashboards, because that gives me a tangible idea of what I want to visualize (what the target of the dashboard is) when I build one.

In my view there are 3 different kinds of dashboards, depending on their target, that we could/should/would use:

  1. Wall dashboards
  2. Overview dashboards
  3. Telemetry dashboards

Wall dashboards are the dashboards which go up on our wall monitors. In my opinion they should communicate what we want them to communicate at a single glance. I think the main point for now would be to get an overview of the state of the system and see if there are any problems. I do not think we should have documentation for this kind of dashboard, because it should be obvious even when you haven’t read the docs in 6 months, and we should aim for that. Of course you need to know our system and which components it has, but that should be the only prerequisite. So to summarize: after that introduction, everybody should be able to see the overall system state with 1 look at this board.
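One way to picture the "overall state with 1 look" idea is a worst-of rollup: every component reports a state, and the wall shows the worst one. The snippet below is only a minimal sketch of that idea in Python. The `State` levels, the `overall_state` function and the component names are all made up for illustration and are not taken from our actual setup.

```python
from enum import IntEnum


class State(IntEnum):
    """Hypothetical severity levels; a higher value means a worse state."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


def overall_state(component_states):
    """Return the worst state among all components (OK if there are none)."""
    return max(component_states.values(), default=State.OK)


# Hypothetical components, just to show the rollup.
components = {"api": State.OK, "database": State.WARNING, "queue": State.OK}
print(overall_state(components).name)
```

With a rollup like this the wall only needs one big indicator for the whole system, and the per-component states can sit next to it for whoever wants the second look.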

Overview dashboards are dashboards which live in Grafana and give us more detailed insight into how our systems are behaving. They also include some high-level telemetry data, so that when analysing we are able to ascertain the health and performance of our system. The intended way to consume them is sitting in front of a screen and looking at them for longer than 10 seconds. They could be documented on a high level, though IMO most of the devs working on the system could still understand them just by looking at the dashboard (even if that understanding is not immediate).

Telemetry dashboards are dashboards which communicate the telemetry of a single application in detail. From them we should be able to look at any application in detail, determine its health and also make prognoses based on the telemetry data. This is the most detailed of all the boards we should/could/would make. The documentation for it should be very detailed as well, because this is where the devs would spend the most time.

Coming up: building wall dashboards on TVs.