Icinga 2.5 and InfluxDB

This post looks at the official release of Icinga 2.5, which features the InfluxDB writer plug-in. In my previous post I delved into why we integrated Icinga 2 with InfluxDB, with excellent results: most notably, not a single storage-related alert since, and blissful sleep as my reward. That work, however, used a very early cut of the code, before it had even hit the community servers. Icinga 2.5 is released to the general public today after five months of finesse, bug fixing and improvements to error reporting. It seems only right to outline the official line protocol used to transfer data from Icinga 2 to InfluxDB, my personal Icinga 2 configuration, and how to get the most out of your performance metrics with Grafana.

Line Protocol

Before discussing configuration, let's look at what actually gets passed on the wire between Icinga 2 and the InfluxDB server.

disk,domain=angel.net,fqdn=puppet.angel.net,hostname=puppet,instance=/,metric=/ crit=38016122880,max=42240835584,value=9263120384,warn=33792458752 1471951338

A quick refresher for those unfamiliar with InfluxDB. The first element in the line protocol is the measurement name; in this case the data comes from the disk check. An optional comma-separated list of tags follows, each an arbitrary text key and value. After a space, a second list defines the fields, which are typed values, e.g. floating point, integer, boolean or string. The last figure is the timestamp, here in seconds since the Unix epoch.
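To make the anatomy concrete, here is a minimal Python sketch that splits the record above into measurement, tags, fields and timestamp. It assumes a simple record with no escaped commas, spaces or equals signs and no string fields, all of which real line protocol permits, so parse_line (a name made up for this example) is an illustration rather than a general parser.

```python
# Minimal sketch, not a full parser: assumes no escaped characters and no
# quoted string fields, and reads every field value as a float.

def parse_line(line):
    """Split a simple line-protocol record into its four parts."""
    series, field_set, timestamp = line.split(" ")
    measurement, *tag_pairs = series.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {
        key: float(value)
        for key, value in (pair.split("=", 1) for pair in field_set.split(","))
    }
    return measurement, tags, fields, int(timestamp)


record = (
    "disk,domain=angel.net,fqdn=puppet.angel.net,hostname=puppet,"
    "instance=/,metric=/ crit=38016122880,max=42240835584,"
    "value=9263120384,warn=33792458752 1471951338"
)
measurement, tags, fields, timestamp = parse_line(record)
print(measurement)       # disk
print(tags["instance"])  # /
print(fields["value"])   # 9263120384.0
print(timestamp)         # 1471951338
```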

The bits that Icinga 2 gives you for free are:

metric
this tag is the label associated with a check’s performance data, in this case ‘/’, the mount point being examined
value
this field is the value returned by the performance data
min,max,warn,crit
these fields are optionally added if available from the performance data and enabled with the enable_send_thresholds option
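As a rough illustration of where these come from, the sketch below maps a single Nagios-style performance data item, 'label'=value[UOM];[warn];[crit];[min];[max], onto the metric tag and fields. The helper perfdata_to_point is hypothetical and deliberately simplified: it strips rather than converts units, ignores threshold ranges such as 10:20, and does not handle quoted labels containing spaces. It is not Icinga 2's own parser, and the sample item is made up to reproduce the fields of the earlier record.

```python
import re


def perfdata_to_point(item, send_thresholds=False):
    """Hypothetical: map 'label'=value[UOM];warn;crit;min;max to (tags, fields)."""
    label, _, rest = item.partition("=")
    slots = rest.split(";")
    # Keep the leading number and drop any unit suffix such as "B" or "%".
    value = float(re.match(r"-?[0-9.]+", slots[0]).group())
    fields = {"value": value}
    if send_thresholds:
        # Empty threshold slots are skipped rather than sent.
        for name, raw in zip(("warn", "crit", "min", "max"), slots[1:]):
            if raw:
                fields[name] = float(raw)
    return {"metric": label.strip("'")}, fields


tags, fields = perfdata_to_point(
    "/=9263120384B;33792458752;38016122880;;42240835584", send_thresholds=True
)
print(tags)    # {'metric': '/'}
print(fields)  # {'value': 9263120384.0, 'warn': 33792458752.0, 'crit': 38016122880.0, 'max': 42240835584.0}
```

With send_thresholds left at False, only the value field comes back, which corresponds to running with enable_send_thresholds disabled.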

We format all fields extracted from performance data as floating point values, as we have no idea what type the original script intended. You can also enable metadata fields, e.g. check state, with the enable_send_metadata option; these are formatted according to their internal type, since we know what they are meant to be.
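For reference, InfluxDB line protocol distinguishes field types by syntax: a bare number is a float, an integer carries an i suffix, booleans are written true or false, and strings are double-quoted. The hypothetical format_field helper below sketches that mapping; the metadata field names and types in the usage lines are illustrative assumptions, not a statement of exactly what Icinga 2 sends.

```python
def format_field(value):
    """Hypothetical helper: render a Python value as a line-protocol field value."""
    if isinstance(value, bool):  # test bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers need the i suffix
    if isinstance(value, float):
        return repr(value)  # bare numbers are stored as floats
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Perfdata value sent as a float, next to illustrative typed metadata.
fields = {"value": 9263120384.0, "state": 0, "reachable": True}
line = ",".join(f"{key}={format_field(val)}" for key, val in sorted(fields.items()))
print(line)  # reachable=true,state=0i,value=9263120384.0
```

Sending every perfdata value as a float keeps a field's type stable across points, which matters because InfluxDB fixes a field's type within a shard once it has been written.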

Icinga 2 InfluxDB Writer Configuration

Global Configuration

This is my personal configuration, but it will be the Data News Blog standard very soon:

/**
 * The InfluxdbWriter type writes check result metrics and
 * performance data to an InfluxDB HTTP API
 */

library "perfdata"

object InfluxdbWriter "influxdb" {
  host = "influxdb.angel.net"
  port = 8086
  database = "icinga2"
  host_template = {
    measurement = "$host.check_command$"
    tags = {
      fqdn = "$host.name$"
      hostname = "$host.vars.hostname$"
      domain = "$host.vars.domain$"
    }
  }
  service_template = {
    measurement = "$service.check_command$"
    tags = {
      fqdn = "$host.name$"
      hostname = "$host.vars.hostname$"
      domain = "$host.vars.domain$"
      instance = "$service.vars.instance$"
    }
  }
  enable_send_thresholds = true
}

Host checks use the host_template and service checks use the service_template to determine which tags are added to the data points sent to the InfluxDB server. Most of this is common sense. Using Puppet facts, I export Icinga 2 host definitions populated with various custom variables. When the InfluxDB writer plug-in creates a data point, it interpolates macros such as $host.vars.domain$, retrieves the actual domain name from the host object and sends it as a tag over the wire. If a macro expansion fails, the tag is simply not added.
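The macro behaviour described above can be sketched in a few lines of Python. Here a nested dictionary stands in for the Icinga 2 host and service objects, and resolve_tags (a hypothetical name) expands each $...$ macro by walking that dictionary; any tag whose macro cannot be resolved is dropped, mirroring the rule that a failed expansion means the tag is not added. Icinga 2's real macro resolver does considerably more than this.

```python
import re

MACRO = re.compile(r"\$([A-Za-z0-9_.]+)\$")


def lookup(path, context):
    """Walk a dotted path such as host.vars.domain; None if any step is missing."""
    node = context
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def resolve_tags(template, context):
    """Expand $...$ macros in each tag; omit tags whose macros fail to resolve."""
    tags = {}
    for tag, expr in template.items():
        values = [lookup(path, context) for path in MACRO.findall(expr)]
        if any(v is None for v in values):
            continue  # failed expansion: the tag is simply not added
        remaining = iter(values)
        tags[tag] = MACRO.sub(lambda _match: str(next(remaining)), expr)
    return tags


context = {
    "host": {"name": "puppet.angel.net",
             "vars": {"hostname": "puppet", "domain": "angel.net"}},
    "service": {"vars": {}},  # no vars.instance on this service
}
service_template = {
    "fqdn": "$host.name$",
    "hostname": "$host.vars.hostname$",
    "domain": "$host.vars.domain$",
    "instance": "$service.vars.instance$",
}
print(resolve_tags(service_template, context))
# {'fqdn': 'puppet.angel.net', 'hostname': 'puppet', 'domain': 'angel.net'}
```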

Dynamic Instance Tagging

In my previous post I talked about applying services to every element of a hash defined in the host variables. Consider the following check:

apply Service "mount" for (mount => attributes in host.vars.mounts) {
  import "generic-service"
  check_command = "disk"
  display_name = "mount " + mount
  vars.disk_ereg_path = attributes.path
  vars.instance = mount
  zone = host.name
  assign where host.vars.mounts
}

This iterates over every mount defined in the host.vars.mounts hash and checks that specific instance. Sometimes you want to know which instance a check was for, especially if the performance data label is the same for all invocations on different resources. This example sets the service.vars.instance variable for each generated service check instance. The instance macro defined in the InfluxDB writer's service_template picks up this service variable when it is defined.
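To see why the instance tag matters, the following Python sketch emulates the apply-for rule above for a hypothetical host with two mounts and prints the InfluxDB series key each generated service writes to. The host data and the expand_mount_services and series_key helpers are assumptions for illustration; the metric tag that Icinga 2 adds per performance data label is left out, and tag values are not escaped.

```python
# Hypothetical host, shaped like an exported Icinga 2 host definition.
host = {
    "name": "puppet.angel.net",
    "vars": {
        "hostname": "puppet",
        "domain": "angel.net",
        "mounts": {"/": {"path": "/"}, "/var": {"path": "/var"}},
    },
}


def expand_mount_services(host):
    """Emulate the apply-for rule: one disk service per mount, vars.instance = key."""
    return [
        {"check_command": "disk",
         "vars": {"disk_ereg_path": attrs["path"], "instance": mount}}
        for mount, attrs in host["vars"]["mounts"].items()
    ]


def series_key(measurement, tags):
    """Measurement plus tags sorted by key (no escaping of special characters)."""
    return ",".join([measurement] + [f"{k}={v}" for k, v in sorted(tags.items())])


keys = []
for service in expand_mount_services(host):
    tags = {
        "fqdn": host["name"],
        "hostname": host["vars"]["hostname"],
        "domain": host["vars"]["domain"],
        "instance": service["vars"]["instance"],
    }
    keys.append(series_key(service["check_command"], tags))

for key in keys:
    print(key)
# disk,domain=angel.net,fqdn=puppet.angel.net,hostname=puppet,instance=/
# disk,domain=angel.net,fqdn=puppet.angel.net,hostname=puppet,instance=/var
```

Each mount lands in its own series under the same disk measurement, so a single graph can later select one of them by the instance tag.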

Grafana Dashboards

This is perhaps the most fun section, as the results are tangible, and it should give you inspiration for crafting your tags in Icinga 2 to create useful, well-organized dashboards for your own needs. My personal preference is to organize machines by domain, as you will typically have the same host names in different domains.

This makes the whole thing more manageable than one huge list keyed on the fully qualified domain name. As you can see below, the real power comes when we query the schema and work out which mounts exist on a specific system based on the instances we have applied service checks to. The same goes for block devices, network interfaces, phys and certificates; the possibilities are endless. Creating a single graph that displays context-specific data for the selected instance keeps dashboards clean.

Templates

Pick a generic measurement that is available on every host to populate template variables such as host name and domain. For example, to get the $domain variable, look at the load measurement and extract all possible values of the domain tag.
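In InfluxQL terms, the $domain variable boils down to a SHOW TAG VALUES query against the load measurement. The Python sketch below builds that query and pulls the tag values out of a response in the InfluxDB 1.x /query JSON shape, roughly the work Grafana does for you when it populates the drop-down. The sample response is hand-written for illustration, and both helper names are hypothetical.

```python
import json


def show_tag_values(measurement, key):
    """Build the InfluxQL query that lists every value of one tag key."""
    return f'SHOW TAG VALUES FROM "{measurement}" WITH KEY = "{key}"'


def tag_values(response_text):
    """Collect the value column from each series in the first query result."""
    result = json.loads(response_text)["results"][0]
    values = []
    for series in result.get("series", []):
        column = series["columns"].index("value")
        values.extend(row[column] for row in series["values"])
    return values


query = show_tag_values("load", "domain")
# Hand-written sample in the InfluxDB 1.x /query response shape.
sample = json.dumps({"results": [{"statement_id": 0, "series": [{
    "name": "load",
    "columns": ["key", "value"],
    "values": [["domain", "angel.net"], ["domain", "example.com"]],
}]}]})
print(query)               # SHOW TAG VALUES FROM "load" WITH KEY = "domain"
print(tag_values(sample))  # ['angel.net', 'example.com']
```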

For $hosts, perform a similar look-up, extracting all values of the hostname tag but constraining the query to measurements in the selected domain. Later variable queries can consume earlier template variables in this way.

The $mount template variable is similar: we find all values of the instance tag in the disk measurement, constrained to measurements for a particular host in a particular domain. The Grafana documentation for the InfluxDB back-end explains the odd-looking syntax used to constrain the queries.
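The chained queries for $hosts and $mount can be sketched the same way. The hypothetical helper below appends a WHERE clause in which each earlier variable appears in the /^$var$/ regex form that Grafana substitutes with the selected value before sending the query; the anchors keep, for instance, a host named puppet from also matching puppet2.

```python
def constrained_tag_query(measurement, key, constraints):
    """Hypothetical: SHOW TAG VALUES limited by earlier Grafana variables.

    constraints maps a tag name to the template variable holding its value.
    """
    query = f'SHOW TAG VALUES FROM "{measurement}" WITH KEY = "{key}"'
    if constraints:
        clauses = [f'"{tag}" =~ /^${var}$/' for tag, var in constraints.items()]
        query += " WHERE " + " AND ".join(clauses)
    return query


hosts_query = constrained_tag_query("load", "hostname", {"domain": "domain"})
mount_query = constrained_tag_query(
    "disk", "instance", {"domain": "domain", "hostname": "hosts"}
)
print(hosts_query)
# SHOW TAG VALUES FROM "load" WITH KEY = "hostname" WHERE "domain" =~ /^$domain$/
print(mount_query)
# SHOW TAG VALUES FROM "disk" WITH KEY = "instance" WHERE "domain" =~ /^$domain$/ AND "hostname" =~ /^$hosts$/
```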

Graphs

Finally, we need to create a graph to display the data. The image below depicts how to do this. Put simply, we select all data from the disk measurement where the domain, hostname and instance tags match the selected template variables, and then select the value field for display.
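Written out as InfluxQL, the panel query looks roughly like PANEL_QUERY below, with $timeFilter being the Grafana macro for the dashboard's time range. The Python sketch also mimics the substitution step with made-up selections; note that a mount such as / must be escaped as \/ inside an InfluxQL regex literal. The escaping here is the sketch's own, so check how your Grafana version handles values containing regex metacharacters.

```python
# Illustrative panel query for the disk graph; $timeFilter is Grafana's time-range macro.
PANEL_QUERY = (
    'SELECT "value" FROM "disk" WHERE "domain" =~ /^$domain$/ '
    'AND "hostname" =~ /^$hosts$/ AND "instance" =~ /^$mount$/ '
    'AND $timeFilter'
)

# Regex metacharacters plus "/", the delimiter of InfluxQL regex literals.
_SPECIAL = set("\\.^$*+?()[]{}|/")


def influx_regex_escape(value):
    """Backslash-escape characters that would break or widen the regex literal."""
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in value)


def render(query, selections, time_filter):
    """Mimic variable substitution with made-up selections (sketch only)."""
    query = query.replace("$timeFilter", time_filter)
    for name, value in selections.items():
        query = query.replace(f"/^${name}$/", f"/^{influx_regex_escape(value)}$/")
    return query


rendered = render(
    PANEL_QUERY,
    {"domain": "angel.net", "hosts": "puppet", "mount": "/"},
    "time > now() - 6h",
)
print(rendered)
# SELECT "value" FROM "disk" WHERE "domain" =~ /^angel\.net$/ AND "hostname" =~ /^puppet$/ AND "instance" =~ /^\/$/ AND time > now() - 6h
```

In practice Grafana's query editor usually wraps the field in an aggregation such as mean() and adds a GROUP BY time() clause; the bare SELECT keeps the sketch minimal.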

Conclusions

And there you have it! Hopefully this has given you some inspiration. Try it out and share your experience.