Technology

Greater Manchester Data Dive Report

Big Data in Manchester

Mike summarised big data and its potential far more eloquently than I could, so I defer to him in this instance. What is critically important to me is the collaboration that occurs in the wider community. We as developers entered this event with a number of things on our minds. First up was to build broad links between ourselves, as a service provider, and public and private sector data analysts and academia. It’s important to us to listen to what people really want, and this was a great opportunity to engage. Next up: what do businesses need to succeed? How can we help them minimise set-up and operational costs, and make them more productive? We had a good time doing all of this, which I shall delve into without further ado.

Diving In

The day kicked off with a cup of tea and a series of introductions by the University of Manchester, who organised the event. After being divided into teams we were ushered into a creative space within The Landing at MediaCityUK. The task: using data sets provided by Manchester City Council, the University of Manchester and Aridhia (who also provided cloud-based data analytics software in the form of AnalytiXagility), we were to mine the data and look for insights that could aid the local community. It also provided ample opportunity to get a feel for how Big Data works and the challenges faced while working with it.

We opted to drill into a dataset of ~700,000 Twitter messages from a broad region encompassing Greater Manchester and Lancashire and see what could be derived. My initial intuition was to dive in with a custom C++ application to leverage the high performance on offer (besides, my knowledge of data mining is non-existent, so stick to what you know!). Fairly quickly we were able to extract longitude and latitude and use them to generate a heat map overlay plotted on Google’s Maps API. Everything looked much as you’d expect, with a fairly even distribution over built-up areas, tending to increase in the major urban centres. There were, however, some rather obvious anomalies which were worth further investigation.
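
As a rough illustration of the heat map step, here is a sketch of the binning idea (in Python rather than the custom C++ application, and with made-up column names):

import csv
from collections import Counter

def heatmap_bins(tweets_csv, cell=0.01):
    """Bin tweet coordinates into a lat/long grid of counts, ready to be
    rendered as a heat map overlay. 'cell' is the grid size in degrees."""
    bins = Counter()
    with open(tweets_csv) as f:
        for row in csv.DictReader(f):
            lat = float(row['latitude'])
            lon = float(row['longitude'])
            bins[(int(lat // cell), int(lon // cell))] += 1
    return bins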

As it turns out there are some prolific tweeters out there. We are talking in the order of thousands of messages over a single month. Oddly enough, most of the small red hot spots were actually single users; one user in particular had something of an infatuation with 5 Seconds of Summer! This highlights one of the challenges we discovered with Big Data in general: filtering out the irrelevant data that skews the picture (the ‘veracity’ of data, in IBM’s parlance).

So what was important to all these Twitter users? First we constructed a dictionary of all the words used, then allowed each distinct user only a single vote towards a word’s popularity. This had the effect of mitigating the impact #5SOS and @Luke5SOS had on the overall picture. Given the data set covered the period June–July, it was unsurprising that #WorldCup2014 was on everyone’s mind. More surprisingly, the most popular celebrity was @GaryLineker. Further techniques such as sentiment analysis revealed that Gary is generally regarded in a positive light by Greater Manchester, as he’ll be pleased to know! To bring the day to a close we presented our tenuously useful findings to the participants. It was also good to see the work of the professional data analysts and what is truly possible with analytics.
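
For the curious, the one-vote-per-user counting boils down to something like the following sketch (in Python rather than the C++ we actually used, and with illustrative column names):

import csv
from collections import Counter

def word_popularity(tweets_csv):
    """Count words by the number of distinct users mentioning them, so a
    single prolific account only gets one vote per word."""
    voters = {}   # word -> set of user ids that have used it
    with open(tweets_csv) as f:
        for row in csv.DictReader(f):
            for word in set(row['text'].lower().split()):
                voters.setdefault(word, set()).add(row['user_id'])
    return Counter({word: len(users) for word, users in voters.items()})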

Wrap Up

Big Data is difficult. First there is acquiring the data: there is a lot out there, but you need to gather enough that is relevant to your study. Then there is cleaning the data up to reduce noise caused by power users or automated services (one of the most prolific accounts was that of a ‘word of the day’ service). Finally there is the actual analysis: discovering semantics, links and trends. But it was educational to learn all this first hand. One interesting lead which came out of the day was the Hadoop Manchester group, useful for anyone interested in how big data is done in local business, which I will try to attend. Secondly, and this links in nicely to my time in Italy where these technologies were discussed, there is a lot of interest in data flow analysis (e.g. Apache Spark) as a successor to the map-reduce paradigm, and in SQL on Hadoop. We’d be really interested to hear your thoughts on these, as they are definitely services we’d be eager to offer.

Technology

Collaboration drives Innovation

Manchester has been at the heart of the development of computing over the last 70 years. Alan Turing taught at Manchester University, and Turing’s work inspired the creation of the world’s first stored program computer, the Small Scale Experimental Machine also known as the Baby, which was designed and built in Manchester.

This was the beginning of an ongoing collaboration between the university and industry which led to the world’s first commercially available general-purpose computer, the Ferranti Mk1, being developed and built in the city. Other Ferranti models followed, including the UK’s first supercomputer and for a time the most powerful computer in the world, the Ferranti Atlas. That legacy continued in the city through ICL’s work on their influential and commercially successful 2900 mainframe series into the 1970s.

Manchester University today continues to be at the cutting edge of computing research. Many of you will be familiar with the work of Professor Steve Furber, one of the designers of both the BBC Micro, which introduced a generation in the UK to computer programming, and the ARM microprocessor, which powers our mobile world. Professor Furber’s Advanced Processor Technologies group at Manchester is now engaged, amongst other things, in one of the great challenges of the modern era: building a novel processor architecture to simulate the human brain.

We’re very proud of that continuum, and we think that strong relationships between academia and commercial organisations are key to continuing to develop innovation in computing in the UK into the 21st century. As part of that we’re extremely proud that one of our Cloud Systems Engineers, Simon Murray, has been accepted onto the Tenth International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems, which takes place in Fiuggi, Italy in July this year.

This is a week-long event for computer architects and tool builders working in the field of high performance computer architecture, compilation and embedded systems. The school aims at the dissemination of advanced scientific knowledge and the promotion of international contacts among scientists from academia and industry. Courses cover a broad scope of computing, from highly scalable distributed systems right down to compilers and processor design, and are taught by some of the world’s foremost computer scientists, from prestigious universities across the world and research organisations inside tech companies like Google and IBM.

Simon will find some time to blog about his experiences in Italy, and we look forward to assimilating some of that thinking into our development teams on his return.

Programming, Technology

Puppet Custom Types

Usually when creating custom types in puppet you will use templates to define a set of resources to manage. If you find yourself littering your template classes with exec statements, then the likelihood is that you should consider creating a native custom type and directly extending the puppet language. This post is dedicated to just that, as online documentation surrounding the topic is hazy at best, so I shall attempt to describe in layman’s terms what I discovered. Not only is this my first attempt at creating a custom type, but also my first dalliance with Ruby, so I’m sure noob-style errors will abound. Take it easy on me guys!

Background

So for this example I’m going to walk you through the development of my type to handle DNS resource records in BIND. Existing solutions out there typically just create zone files then don’t touch the contents again, assuming that the administrator will have made local changes which shouldn’t be randomly deleted. Some improvements are out there which use the concat library and templates to build the zone files completely under the control of puppet, but that doesn’t fit our requirements.

In our cloud, hardware provisioning is conducted with foreman, which manages detection of new machines via PXE, then provisioning, which includes setting up PXE, DHCP and DNS. DNS in this case is handled via a foreman proxy running locally on the name server and performing dynamic zone updates with nsupdate. As you can no doubt appreciate, using puppet resource records exclusively would destroy any dynamic updates performed by foreman. What we need is a custom type which creates the resource records foreman cannot (think CNAME and MX records), but in the same manner, so they are not lost in the ether. So without further ado, here is how to create a custom type to manage resource records via a canonical provider, in this case nsupdate.

Concepts

The first consideration when designing a custom type is how it is going to be used. I’m just going to dive straight in and show you a couple of examples of my interface.

dns_resource { 'melody.angel.net/A':
  ensure   => present,
  rdata    => '192.168.2.1',
  ttl      => '86400',
  provider => 'nsupdate',
}

dns_resource { '1.2.168.192.in-addr.arpa/PTR':
  nameserver => 'a.ns.angel.net',
  rdata      => 'melody.angel.net',
}

Title

This is an identifier that uniquely identifies the resource on the system. In this case it is an aggregate of the DNS record name and record type. My first attempt ignored this field entirely and used class parameters to identify the resource, but as we will see later on this is not the correct way of doing it.

Properties

These are things like the TTL and relevant data class parameters. Once a resource or set of resources has been identified by its title, properties are the tangible things about the resource that can be observed and modified. In fact these are the fields that puppet will interrogate to determine whether a modification is necessary, which is an important distinction to make.

Parameters

These are the remaining class parameters which have nothing to do with the resource being created, but which inform puppet how to manage it. Parameters such as resource presence and the name server to operate on have no bearing on the resource itself as reported by DNS. These parameters will not be interrogated to determine whether a modification is necessary.

Fairly straightforward, but easy to head down a blind alley if it is not spelled out explicitly.

Module Layout

Before we delve into code, let us first consider the architecture puppet uses to organise custom types. There are two layers we need to consider. At the top level is the actual type definition, which is responsible for defining how the type will manifest itself in your puppet code. Here you define the various properties and parameters which will be exposed, validation routines to sanitise the inputs, munging to translate inputs into canonical forms, default values, and finally automatic requirements. To expand upon this last point a little: here you can define a set of puppet resources that are a prerequisite for your type. Puppet will, if these resources actually exist, add dependency edges to the graph and ensure that the prerequisites are executed before your type. Admittedly I don’t like having to rummage through code to identify whether any implicit behaviour is forthcoming, however on this one occasion I will let it slide as it removes a load of messy meta parameters from the puppet modules themselves.

The second layer is the provider, which actually performs the actions to inspect and manage the resource. And here is the flexibility of puppet: you needn’t be limited to a single provider. In this example I’m creating an nsupdate provider, but there is no reason why you cannot have a plain text zone file provider, or one for tinydns. These are runtime selectable with the provider class parameter, or are implicitly chosen by way of being the only provider, or based on facts. As an example, the builtin package type will check which distro you are running and, based on that, use apt or yum etc.

Delving a little deeper into providers, the general functionality is as follows. When puppet executes an instance of your type it will first ask the provider if the resource exists; if it doesn’t and is requested to be present, then the provider will be asked to create it. Likewise, if it exists and is requested absent, then the provider will be asked to delete the resource. The final case is where the resource exists and is requested to be present. Puppet will inspect each property of the real resource defined by the type and compare it with the requested values from the catalogue. If they differ, then puppet will ask the provider to perform the necessary updates. Simple.
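
If it helps to see that flow written down, here is a toy sketch of the decision logic in Python; this is purely illustrative and not how puppet itself is implemented, but it mirrors the exists?/create/destroy/property-sync ordering described above.

def apply_resource(provider, requested):
    """Toy model of how puppet drives a provider for a single resource.

    'requested' is the catalogue's view of the resource, for example
    {'ensure': 'present', 'rdata': '192.168.2.1', 'ttl': '86400'}.
    """
    exists = provider.exists()

    if requested['ensure'] == 'absent':
        if exists:
            provider.destroy()
        return

    if not exists:
        provider.create()
        return

    # Resource exists and should be present: sync any properties that differ
    for prop in ('rdata', 'ttl'):
        current = getattr(provider, 'get_' + prop)()
        if current != requested[prop]:
            getattr(provider, 'set_' + prop)(requested[prop])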

Type code

Hopefully those concepts were straightforward and made clear sense, so let’s look at how this all fits together, starting with the type definition, which lives at <module>/lib/puppet/type/<type>.rb

# lib/puppet/type/dns_resource.rb
#
# Typical Usage:
#
# dns_resource { 'melody.angel.net/A':
#   rdata => '192.168.2.1',
#   ttl   => '86400',
# }
#
# dns_resource { '1.2.168.192.in-addr.arpa/PTR':
#   nameserver => 'a.ns.angel.net',
#   rdata      => 'melody.angel.net',
# }
#
Puppet::Type.newtype(:dns_resource) do
  @doc = 'Type to manage DNS resource records'

  ensurable

  newparam(:name) do
    desc 'Unique identifier in the form "<name>/<type>"'
    validate do |value|
      unless value =~ /^[a-z0-9\-\.]+\/(A|PTR|CNAME)$/
        raise ArgumentError, 'dns_resource::name invalid'
      end
    end
  end

  newparam(:nameserver) do
    desc 'The DNS nameserver to alter, defaults to 127.0.0.1'
    defaultto '127.0.0.1'
    validate do |value|
      unless value =~ /^[a-z0-9\-\.]+$/
        raise ArgumentError, 'dns_resource::nameserver invalid'
      end
    end
  end

  newproperty(:rdata) do
    desc 'Relevant data e.g. IP address for an A record etc'
  end

  newproperty(:ttl) do
    desc 'The DNS record time to live, defaults to 1 day'
    defaultto '86400'
    validate do |value|
      unless value =~ /^\d+$/
        raise ArgumentError, "dns_resource::ttl invalid"
      end
    end
  end

  # nsupdate provider requires bind to be listening for
  # zone updates
  autorequire(:service) do
    'bind9'
  end

  # nsupdate provider requires nsupdate to be installed
  autorequire(:package) do
    'dnsutils'
  end

end

The first few lines are just boilerplate code: here you define the name of your type as it will appear in puppet code, along with documentation, because everyone loves documentation, right?

The ensurable method adds in support for the ensure parameter, which shouldn’t come as too much of a surprise. What it also does is force the provider to implement the create, destroy and exists? methods.

The name parameter must be defined. desc allows documentation of your class parameters. Following that is our first encounter with parameter validation, which is basically checking for a hostname followed by a slash and one of our supported resource record types. Probably not the most RFC-compliant regular expression, but it works for now!

The nameserver parameter introduces default values, so you don’t have to specify them in your puppet code. The final thing I wish to draw attention to is the autorequires, which add implicit dependencies to the graph, as discussed previously, and may reference any puppet resource.

Provider Code

Now for the guts of the operation. Without further ado, here are the contents of <module>/lib/puppet/provider/<type>/<provider>.rb

# lib/puppet/provider/dns_resource/nsupdate.rb

require 'open3'
require 'resolv'

# Make sure all resource classes default to an exception
class Resolv::DNS::Resource
  def to_rdata
    raise ArgumentError, 'Resolv::DNS::Resource.to_rdata invoked'
  end
end

# A records need to convert from a binary string to dot decimal
class Resolv::DNS::Resource::IN::A
  def to_rdata
    ary = @address.address.unpack('CCCC')
    ary.map! { |x| x.to_s }
    ary.join('.')
  end
end

# PTR records merely return the fqdn
class Resolv::DNS::Resource::IN::PTR
  def to_rdata
    @name.to_s
  end
end

# CNAME records merely return the fqdn
class Resolv::DNS::Resource::IN::CNAME
  def to_rdata
    @name.to_s
  end
end

Puppet::Type.type(:dns_resource).provide(:nsupdate) do

  private

  # Run a command script through nsupdate
  def nsupdate(cmd)
    Open3.popen3('nsupdate -k /etc/bind/rndc-key') do |i, o, e, t|
      i.write(cmd)
      i.close_write
      raise RuntimeError, e.read unless t.value.success?
    end
  end

  public

  # Create a new DNS resource
  def create
    name, type = resource[:name].split('/')
    nameserver = resource[:nameserver]
    rdata = resource[:rdata]
    ttl = resource[:ttl]
    nsupdate("server #{nameserver}
              update add #{name}. #{ttl} #{type} #{rdata}
              send")
  end

  # Destroy an existing DNS resource
  def destroy
    name, type = resource[:name].split('/')
    nameserver = resource[:nameserver]
    nsupdate("server #{nameserver}
              update delete #{name}. #{type}
              send")
  end

  # Determine whether a DNS resource exists
  def exists?
    name, type = resource[:name].split('/')
    # Work out which type class we are fetching
    typeclass = nil
    case type
    when 'A'
      typeclass = Resolv::DNS::Resource::IN::A
    when 'PTR'
      typeclass = Resolv::DNS::Resource::IN::PTR
    when 'CNAME'
      typeclass = Resolv::DNS::Resource::IN::CNAME
    else
      raise ArgumentError, 'dns_resource::nsupdate.exists? invalid type'
    end
    # Create the resolver, pointing to the nameserver
    r = Resolv::DNS.new(:nameserver => resource[:nameserver])
    # Attempt the lookup via DNS
    begin
      @dnsres = r.getresource(name, typeclass)
    rescue Resolv::ResolvError
      return false
    end
    # The record exists!
    return true
  end

  def rdata
    @dnsres.to_rdata
  end

  def rdata=(val)
    create
  end

  def ttl
    @dnsres.ttl.to_s
  end

  def ttl=(val)
    create
  end

end

I’m using ruby’s builtin resolver library to check for the presence of a resource on the DNS server. The first four classes highlight one of the cool things about ruby: classes aren’t static. What we’re doing here is attaching new methods to the DNS resource types to marshal the relevant data into our canonical form, i.e. a string, and also providing an exception case in a superclass to catch the case where support for a new resource type is added without a conversion method. It would have been easy to omit the override and just let things raise exceptions, but I like giving my peers useful debug.

Onto the main body of the provider. The nsupdate method unsurprisingly calls the binary of the same name with an arbitrary set of commands. Usually you’d use puppet’s commands method to define external commands, which enables a load of debug detail, but in this situation I needed access to standard input. create, destroy and exists? basically do just that: create, destroy and probe for the existence of a resource as defined by the name. The final four methods are accessors for the properties we defined earlier. You have to be careful with types here, as puppet will see 86400 and "86400" as a mismatch and try to update the resource on every run.

Conclusions

All in all, going from zero to hero in the space of two days wasn’t as daunting as I’d expected, even with a new language and a new framework. Hopefully I’ve summarised my experiences in a way which my readers will be able to easily digest. On reflection, the whole business of extending puppet has been breathtakingly easy, and I’m hoping it will provide some inspiration to improve our orchestration and provisioning efforts. And hopefully yours too! Until next time.

Programming, Technology

Elastic High-Availability Clustering With Puppet

In this post I’m going to demonstrate one method I discovered to facilitate HA clustering in your enterprise. The specific example I’m presenting here is how to easily roll out a RabbitMQ cluster to be used by the Nova (Compute) component of OpenStack. Some other applications which come to mind are load balancers: for example, you assign a puppetmaster role to a node when provisioning and have it automatically added to Apache’s round-robin scheduler. Thus, if our monitoring software decides the existing cluster is under too much strain, we can increase capacity in a matter of minutes from bare metal.

Exported Variables

Ideally, what we want to provide this functionality is some form of exported variable which, when collected, contains all instances of that variable, i.e. each rabbit host would export its host name and these could be aggregated. Puppet supports neither exporting variables nor exporting resources with the same names. Custom facts weren’t going to cut it either, as they are limited to node scope. Then I tripped upon a neat solution by the good folks at Example42. Their exported variables class quite cleverly exports a variable as a file:

define exported_vars::set (
  $value = '',
) {
  @@file { "${dir}/${::fqdn}-${title}":
    ensure  => present,
    content => $value,
    tag     => 'exported_var',
  }
}

This is realized on the puppet master by including the following class:

class exported_vars {
  file { $dir:
    ensure => directory,
  }
  File <<| tag == 'exported_var' |>>
}

A custom function is then able to look at all the files in the directory, matching on FQDN, variable name, or both, and returning an array of values. It also defaults to a specified value if no matches are found. Perfect!
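
To make the lookup concrete, here is a rough Python sketch of the logic such a function performs; the real get_exported_var is a Puppet parser function written in Ruby, and the directory path and matching rules below are assumptions for illustration only.

import os

EXPORT_DIR = '/etc/puppet/exported_vars'   # assumed value of $dir

def get_exported_var(fqdn, varname, default):
    """Collect values of an exported variable from the files realised on
    the puppet master. Each file is named "<fqdn>-<varname>" and contains
    a single value; an empty fqdn or varname acts as a wildcard."""
    values = []
    for entry in sorted(os.listdir(EXPORT_DIR)):
        if fqdn and not entry.startswith(fqdn + '-'):
            continue
        if varname and not entry.endswith('-' + varname):
            continue
        with open(os.path.join(EXPORT_DIR, entry)) as f:
            values.append(f.read().strip())
    return values if values else default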

Elastic RabbitMQ Cluster

Here’s a concrete example of putting this pattern to use

class profile::nova_mq {

  $nova_mq_username = hiera(nova_mq_username)
  $nova_mq_password = hiera(nova_mq_password)
  $nova_mq_port     = hiera(nova_mq_port)
  $nova_mq_vhost    = hiera(nova_mq_vhost)

  $mq_var = 'nova_mq_node'

  exported_vars::set { $mq_var:
    value => "${::fqdn}",
  }

  class { 'nova::rabbitmq':
    userid             => $nova_mq_username,
    password           => $nova_mq_password,
    port               => $nova_mq_port,
    virtual_host       => $nova_mq_vhost,
    cluster_disk_nodes => get_exported_var('', $mq_var, ['localhost']),
  }
  contain 'nova::rabbitmq'

}

A quick run-through of what happens. When first provisioned, the exported variable is stored in PuppetDB and the RabbitMQ server is installed. Here we can see the get_exported_var function being used to gather all instances of nova_mq_node that exist, but as this is the first run on the first node we default to an array containing only the local host. When the puppet agent next runs on the puppet master, the exported file is collected and created on disk. Finally, the second run on the RabbitMQ node will pick up the exported variable and add it to the list of cluster nodes.

Gotchas

Some notes to be aware of

  • exported_vars doesn’t recursively purge the directory by default, so nodes which are retired leave their old variables lying about; you’d also need to have dead nodes removed from PuppetDB
  • there are no dependencies between the file and directory creation, so it may take a couple of runs to get fully synced
  • with load-balanced puppet masters it’s a bit hit and miss as to whether one has collected the exported variables or not when you run your agents. This can be mitigated by provisioning the variable directory on shared storage (think clustered NFS on a redundant GFS file system)

And there you have it, almost effortless elasticity to your core services provided by Puppet orchestration.

Programming, Technology

Transparent Encryption Of Offsite Backups With Puppet And Git

I’ll be going into some detail as to how our source control setup works at a later date, but I wanted to address a hot topic beforehand: secure storage of configuration data in the cloud.

All of our source code commits are automatically backed up in the cloud. For us this is GitHub, but this should hold for other SaaS platforms such as those offered by Atlassian. As such, all of our configuration data, be it network address ranges or passwords stored in our Hiera configuration files, ends up on untrusted systems. The same goes for any certificates that need to be part of our puppet environment.

First Steps

Our initial solutions were based on puppet modules, as that was what Google initially hinted at. Hiera_yamlgpg seemed to fit the requirements. This module replaces the default yaml backend provided by puppet and provides transparent decryption of the Hiera data files on the fly by the puppet master upon compilation of a catalog. The plus point of this approach was that the majority of the data file could be left in plain text and only the pertinent fields encrypted with gpg, like so:

echo -n Passw0rd | gpg -a -e -r recipient

I ran into issues by forgetting to strip the newline at times, and the Hiera data file soon became a mess of GPG statements. Obviously decrypting obfuscated data was a pain, and performing code review was tedious as there was no way of seeing the actual changes without some legwork.

At this point it was decided we should just opt for full file encryption. This led me to wondering whether git supported some form of hook whereby encryption and decryption could be performed while transferring sensitive data on and off site. As it turns out, something better exists: git supports filters which can be run on individual files as they are checked in and out of working branches.

Transparent Encryption With Git

Files can be tagged with attributes, either individually or with wildcards, in either .git/info/attributes or .gitattributes. The former applies to a single repository; the latter is under version control and propagated to all my peers, which seems like the right thing.

/hieradata/common.yaml      filter=private

The specified file is tagged with a filter called private; the tag name can be arbitrary. Now when we check out (smudge) or check in (clean) a file with the private filter, we can run arbitrary commands on its contents. The input and output are via stdin and stdout respectively.

git config --global filter.private.smudge decrypt
git config --global filter.private.clean  encrypt

The encrypt and decrypt scripts were initially based on GPG, the keys already being installed from our prior dabble with hiera_yamlgpg. And it worked, but not well. The issue was that GPG doesn’t perform encryption deterministically, most likely including data such as time stamps and who encrypted the data. This led to git thinking that the private files were always modified. Not a problem: the files can be ignored in the index and manually committed when a change actually occurs. But this is hardly the transparent work flow we desired. The real deal-breaker was when trying to pull from a remote origin, which git refused to do as it would destroy locally modified files. Back to the drawing board then.

Turns out things work perfectly when you remove the non-determinism.
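
To make the point concrete, here is a tiny illustration using the same AES building blocks as the code below (the key and IV values are obviously just placeholders): the same plaintext only enciphers to identical output when both the key and the IV are fixed. GPG’s output varies for additional reasons, such as random session keys, but the effect on git is the same.

import os
from Crypto.Cipher import AES

key = b'0' * 32                  # fixed demo key, never use a constant key for real
plaintext = b'secret password!'  # exactly one 16-byte AES block

# Random IV: same input, different output every time, so git sees a change
c1 = AES.new(key, AES.MODE_CBC, os.urandom(16)).encrypt(plaintext)
c2 = AES.new(key, AES.MODE_CBC, os.urandom(16)).encrypt(plaintext)
print(c1 == c2)   # False

# Fixed IV: deterministic output, so clean/smudge round trips are stable
ivec = b'1' * 16
c3 = AES.new(key, AES.MODE_CBC, ivec).encrypt(plaintext)
c4 = AES.new(key, AES.MODE_CBC, ivec).encrypt(plaintext)
print(c3 == c4)   # True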

Encryption & Decryption In Python

There are a couple of solutions out there that use openssl, but they require compilation, which made me steer clear. We’re a python shop, and I’m a geek, so I architected a solution using AES-256 from python-crypto, with the output encoded into base64.

The important bits are as follows. First, key and initialisation vector generation

def gen_key():
    """
    Generate a new key
    """
    try:
        keyf = open(KEY_PATH, 'w')
    except IOError:
        sys.stderr.write('Err: Open {0} for writing\n'.format(KEY_PATH))
        exit(1)
    keyf.write(Random.new().read(KEY_SIZE + AES.block_size))
    keyf.close()
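
Both gen_key above and the snippets that follow lean on a handful of helpers the post doesn’t show (KEY_PATH, KEY_SIZE, HEADER, get_key, get_ivec and round_up). Here is a minimal sketch of what they might look like, assuming the key and IV live together in the single file written by gen_key; the names, path and header value are illustrative assumptions rather than the actual implementation.

from Crypto.Cipher import AES

# Hypothetical constants; treat these as placeholders
KEY_PATH = '/etc/dc_crypto/key'   # assumed location of the key material
KEY_SIZE = 32                     # AES-256 key length in bytes
HEADER = '$DC_CRYPTO$\n'          # marker prepended to enciphered files

def get_key():
    """Read the AES key from the front of the key file."""
    with open(KEY_PATH, 'rb') as keyf:
        return keyf.read(KEY_SIZE)

def get_ivec():
    """Read the initialisation vector stored after the key."""
    with open(KEY_PATH, 'rb') as keyf:
        keyf.seek(KEY_SIZE)
        return keyf.read(AES.block_size)

def round_up(length, multiple):
    """Round length up to the next multiple of the cipher block size."""
    return ((length + multiple - 1) // multiple) * multiple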

Encryption

def encipher():
    """
    Encipher data from stdin
    """
    key = get_key()
    ivec = get_ivec()
    data = sys.stdin.read()
    datalen = len(data)

    # Now for the fun bit, we're going to append the data to a 32 bit
    # integer which describes the actual length of the data, as we
    # need to round the cypher input to the block size, this allows
    # recovery of the exact data length upon deciphering. We also
    # specify big-endian encoding to support cross platform operation
    buflen = round_up(datalen + 4, AES.block_size)
    buf = bytearray(buflen)
    struct.pack_into('>i{0}s'.format(buflen - 4), buf, 0, datalen, data)

    # Encipher the data
    cipher = AES.new(key, AES.MODE_CBC, ivec)
    ciphertext = cipher.encrypt(str(buf))

    # And echo out the result
    sys.stdout.write(HEADER)
    sys.stdout.write(base64.b64encode(ciphertext))

And decryption

def decipher_common(filedesc):
    """
    Decipher data from a file object
    """
    key = get_key()
    ivec = get_ivec()
    ciphertext = base64.b64decode(filedesc.read())
    # Decipher the data
    cipher = AES.new(key, AES.MODE_CBC, ivec)
    buf = cipher.decrypt(ciphertext)

    # Unpack the buffer, first unpacking the big endian data length
    # then unpacking that length of data
    datalen, = struct.unpack_from('>i', buf)
    data, = struct.unpack_from('{0}s'.format(datalen), buf, 4)

    # And echo out the result
    sys.stdout.write(data)

decipher_common takes a file descriptor because, when used in diff mode, git will provide you with a file name which may or may not already be decrypted. This is the purpose of the HEADER string: to determine whether to perform the decryption or just echo out the file contents. You can enable the diff functionality by updating .gitattributes

/hieradata/common.yaml      filter=private diff=private

And your git configuration to act on the tag

git config --global filter.private.smudge 'dc_crypto decipher'
git config --global filter.private.clean  'dc_crypto encipher'
git config --global diff.private.textconv 'dc_crypto diff'
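
For completeness, here is a rough sketch of how the smudge and diff entry points of a dc_crypto-style script might tie together, based on how HEADER is described above; the real script isn’t shown in the post, so the structure and argument handling here are assumptions.

import sys

def decipher():
    """Smudge filter: git feeds the (possibly enciphered) blob on stdin."""
    head = sys.stdin.read(len(HEADER))
    if head != HEADER:
        # Not one of our enciphered files, echo it back untouched
        sys.stdout.write(head + sys.stdin.read())
        return
    decipher_common(sys.stdin)

def diff(path):
    """Diff textconv: git hands us a file name, which may or may not
    already be in plain text."""
    with open(path) as filedesc:
        if filedesc.read(len(HEADER)) != HEADER:
            filedesc.seek(0)
            sys.stdout.write(filedesc.read())
        else:
            decipher_common(filedesc)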

Obviously you need to be pretty secure with your symmetric key and initialisation vector, but I hope I’ve given enough information for you to avoid the same mistakes I did and keep your data secure in the SaaS world.

Programming, Technology

Dynamic Nagios Host Groups With Puppet

So this little problem caused me some headaches. Coming from a C/C++ systems programming background, I found Puppet takes some getting used to, and this assignment was a learning experience and a half!

The setup we wanted was to have Puppet define a set of host groups on our icinga server, then have every host export a nagios host resource which selected the host groups it was a member of based on which classes were assigned to it by the ENC. The little database experience I have suggested that this way was best, as it avoided the sprawl of host groups and services being redefined on a per-host basis. A lot of other blogs suggest this is the way to go, but allude to the fact that they have a node variable which controls membership, which doesn’t provide the dynamism that we wanted.

Plan Of Attack

With puppet being declarative, using global or class variables to flag which host groups to include won’t work, as you’re at the mercy of compilation order, which by its very nature is non-deterministic. It gets worse with global variables when using fat storeconfigs, as a definition on one host seems to get propagated to all the others, and thin configs don’t export variables. Placing the exported nagios host definitions in a post-main stage suffered from the scope being reset.

The next idea was to use the ENC, foreman in our case, to generate the host groups. The problem here was that our foreman host groups refer to hardware platforms, whereas our nagios host groups refer to software groups. And defining host-group membership per host isn’t going to cut it!

And then there was the eureka moment. Facter 1.7 supports arbitrary facts being generated from files in /etc/facter/facts.d. Facts are available at compilation time, so regardless of ordering they are always available. Better still, we can generate them dynamically per host based on the selected profiles and collect them using an ERB template. And here’s how…

Dynamic Nagios Host Groups In Puppet

The first piece of the puzzle is to define a module to allow easy generation of custom facts, starting with the directory structure

class facter::config {
  File {
    ensure => directory,
    owner  => 'root',
    group  => 'root',
    mode   => '0755',
  }

  file { '/etc/facter': } ->
  file { '/etc/facter/facts.d':
    recurse => true,
    purge   => true,
  }
}

Then by creating the reusable definition

class facter {
  define fact ( $value = true ) {
    file { "/etc/facter/facts.d/${title}.txt":
      ensure  => file,
      owner   => 'root',
      group   => 'root',
      mode    => '0644',
      content => "${title}=${value}",
      require => Class['facter::config'],
    }
  }
}

Next up we define a set of virtual host groups. The idea is that we can realise them in multiple locations based on profile and not worry about collisions. Think of multiple profiles needing an apache vhost: rather than fork the community apache sub-module, we can just attach the host group at the profile level.

class icinga::hostgroups {
  include facter
  @facter::fact { 'hg_http': }
  @facter::fact { 'hg_ntp': }
}

Then to finally create the facts on the host system something like the following will suffice

class profile::http_server {
  include icinga::hostgroups
  realize Facter::Fact['hg_http']
}

The final piece of the jigsaw is the host definition itself

class icinga::client {
  @@nagios_host { $::hostname:
    ensure     => present,
    alias      => $::fqdn,
    address    => $::ipaddress,
    hostgroups => template('icinga/hostgroups.erb'),
  }
}

And the ERB template to gather the facts we’ve exported

hg_generic<% -%>
<% scope.to_hash.keys.each do |k| -%>
<% if k =~ /(hg_[\w\d_]+)/ -%>
<%= ',' + $1 -%>
<% end -%>
<% end -%>

And there you have it. I’m by no means an expert at either puppet or ruby so feel free to suggest better ways of achieving the same end result. Note this isn’t production code, just off the top of my head, so there may be some mistakes, but you get the gist. Happy monitoring!