Simple service discovery using Ansible Tower and DNS


Service discovery is one of those things that can be extremely handy to have, yet difficult to implement in a simple fashion. There are lots of tools out there to help, but I wanted to make something simple and easy to maintain for my specific use case. So here’s the way I’ve done it, using a couple of DNS records and Ansible Tower.

For my particular need, all I wanted was for newly booted hosts to find their configuration automatically. Since I use Ansible in its push-based model, this meant letting a newly provisioned host ‘phone home’, announcing its presence so that my central Ansible host could push the configuration to it. That central host is Ansible Tower.

But what if the network is fluid? What if I want a ‘baked template’ (for example, a VMware template) without hard-coded hostnames in it? How would a new host work out where to phone home, or for that matter, how to phone home?

What we need is a reliable, tried and tested, centralised infrastructure that can serve as an information source. What could possibly fit that need? Oh, yes, how about DNS? And how would we use it for this purpose? With two simple types of record: SRV and TXT.

In the vein of keeping things simple, on this particular network I am using dnsmasq. It wouldn’t be much more complex to use BIND, of course.

I’ve created two kinds of record in dnsmasq: SRV records that let my hosts discover where to phone home to, and a TXT record that tells them what to ask for. Here are those records:

txt-record=lan,"59301625893930005661466df034870d 20"

Let me explain them.

There are two SRV records; you’ll notice they differ in the target hostname and in a number towards the end of the line. In dnsmasq, each line is laid out like so:

srv-host=<service>,<target>,<port>,<priority>,<weight>

Priority and weight let us publish multiple records for resiliency: a lower priority value takes precedence, and among records of the same priority, those with higher weights are chosen proportionally more often. I tend to skip weightings and set only the priority fields. So here, “tower.lan” takes precedence over “tower2.lan”.
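Priority and weight follow the SRV rules in RFC 2782. The Python sketch below shows how a client could order a set of records under those rules: lowest priority first, then a weighted random pick within each priority. The `SrvRecord` type and `order_srv` function are hypothetical names for illustration; this is not something dnsmasq or the first-boot call does, only a demonstration of how the fields are meant to be read.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    # Hypothetical container for one SRV answer.
    priority: int
    weight: int
    port: int
    target: str


def order_srv(records, rng=random):
    """Return a list of SRV records in the order a client should try them (RFC 2782)."""
    ordered = []
    # Lower priority values are always tried first.
    for prio in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == prio]
        while group:
            # RFC 2782: zero-weight records go first so they keep a small chance.
            group.sort(key=lambda r: r.weight != 0)
            total = sum(r.weight for r in group)
            pick = rng.randint(0, total)  # inclusive of both ends
            running = 0
            for r in group:
                running += r.weight
                # Select the first record whose running sum reaches the pick.
                if running >= pick:
                    ordered.append(r)
                    group.remove(r)
                    break
    return ordered
```

When every weight is zero, as happens when weightings are skipped, records sharing a priority come out in the order given, so the result is effectively a plain sort by priority.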

Second, there is the TXT record. Its first field is the name the record is attached to (here, the whole domain, lan) and the second is free-form text. I’ve kept this simple: the text holds the Tower host config key and the job template ID, separated by a space, which suits the next step nicely.
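To make that convention concrete, here is a Python sketch that splits the TXT text into its two values. It assumes the record holds exactly a config key and a job template ID separated by a space, as in the record above; `parse_txt` is a hypothetical name.

```python
from typing import Tuple


def parse_txt(dig_output: str) -> Tuple[str, int]:
    """Split the TXT text into (host config key, job template ID).

    Expects `dig +short lan txt` output such as:
        "59301625893930005661466df034870d 20"
    """
    # dig prints each TXT string inside double quotes; drop them, as tr -d '"' does.
    text = dig_output.replace('"', "").strip()
    # Assumed convention: "<config key> <job template ID>", space separated.
    # Anything else (extra words, several TXT records) raises ValueError here.
    key, job_id = text.split()
    return key, int(job_id)
```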

Ansible Tower has a really neat feature called ‘provisioning callbacks’, which exposes a given job template at a REST URL. That URL can then be called with tools as simple as cURL. It isn’t intended for heavy use (i.e. it’s not there to implement a pseudo client/server architecture), but for ‘phoning home’ to get an initial configuration, it’s brilliant.

Poking the callback URL needs three things in place for it to work, a ‘belt and braces’ approach if you like. First, the calling host (our ‘client’) needs to be in Tower’s inventory: “your name’s not down, you’re not coming in”. The client then needs to call the URL with two pieces of information: a “host config key”, which Tower generates for us, and the job template ID, which we get from the template.
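For reference, the request underneath is a form-encoded POST of the host config key to the job template’s callback URL. The Python sketch below only builds that request with urllib, without sending it. It assumes the Tower API v2 path `/api/v2/job_templates/<id>/callback/` and plain HTTPS (the path may differ between Tower versions); `build_callback_request` is a hypothetical name. The inventory check happens on the Tower side, so nothing in the client code can satisfy it.

```python
import urllib.parse
import urllib.request


def build_callback_request(tower_host: str, config_key: str, job_id: int) -> urllib.request.Request:
    """Build (but do not send) a Tower provisioning-callback request."""
    # Assumed endpoint layout; strip the trailing dot DNS answers carry.
    url = f"https://{tower_host.rstrip('.')}/api/v2/job_templates/{job_id}/callback/"
    # The config key travels as a form field in the POST body.
    body = urllib.parse.urlencode({"host_config_key": config_key}).encode()
    return urllib.request.Request(url, data=body, method="POST")
```

Passing the result to `urllib.request.urlopen` would send it; a Tower host with a self-signed certificate would also need an SSL context that trusts it.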

Shipped with Tower is a nice little wrapper script around this call; look for it in /usr/share/awx.

Pulling all this together, I’ve included the wrapper as part of my VM template, and I simply run it at first boot like so:

$(dig +short _cm._tcp.lan srv | awk '/^0/ {print $4}') $(dig +short lan txt | tr -d '"')

The two dig commands grab the information the wrapper needs: the host to contact, from the SRV record, and the config key and job ID, from the TXT record.
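For anyone who would rather not lean on awk, here is a Python sketch of the SRV half of that pipeline, working on the text dig prints (fetching it, by running dig or using a DNS library, is left out). Each line of `dig +short` SRV output reads priority, weight, port, target; `priority_zero_targets` is a hypothetical name.

```python
from typing import List


def priority_zero_targets(dig_output: str) -> List[str]:
    """Pick the targets of priority-0 records from `dig +short <name> srv` output.

    Each output line reads: priority weight port target, e.g. "0 0 443 tower.lan."
    """
    targets = []
    for line in dig_output.splitlines():
        fields = line.split()
        # Like awk '/^0/ {print $4}', but compares the whole priority field
        # and skips lines that do not have exactly four fields.
        if len(fields) == 4 and fields[0] == "0":
            targets.append(fields[3])  # trailing dot kept, just as awk leaves it
    return targets
```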

Discovering information like this gives me the flexibility to move Tower hosts around, and even to change which configuration job is requested, without touching my baked template. With a more sophisticated DNS server like BIND, I could even offer different jobs to hosts in different DNS views. You could take the TXT record idea further still, using different records to trigger different actions.

Fundamentally what this has given me is a lot of flexibility without having to ‘reinvent the wheel’.

Lastly, the keen-eyed among you will notice that my ‘cheap’ call here ignores the resiliency of my SRV records. Yes, indeed it does. I leave that as an exercise for the reader. You might like to write your own wrapper around cURL, for example.
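One way to take up that exercise, sketched in Python with urllib rather than a cURL wrapper: try each Tower host in turn and stop at the first that accepts the callback. It assumes the targets are already in preference order (lowest SRV priority first) and that the callback lives at `/api/v2/job_templates/<id>/callback/`; the name `phone_home` is hypothetical, and the `send` parameter exists only so the network call can be swapped out in testing.

```python
import urllib.error
import urllib.parse
import urllib.request


def phone_home(targets, config_key, job_id, send=urllib.request.urlopen, timeout=10):
    """POST the provisioning callback to each Tower host until one accepts it.

    targets: hostnames in preference order, e.g. ["tower.lan.", "tower2.lan."].
    Returns the host that accepted the call; raises RuntimeError if none did.
    """
    body = urllib.parse.urlencode({"host_config_key": config_key}).encode()
    last_error = None
    for host in targets:
        # Assumed Tower API v2 callback path; DNS trailing dot stripped for the URL.
        url = f"https://{host.rstrip('.')}/api/v2/job_templates/{job_id}/callback/"
        request = urllib.request.Request(url, data=body, method="POST")
        try:
            with send(request, timeout=timeout):
                return host
        except OSError as err:
            # URLError and HTTPError both subclass OSError: unreachable hosts and
            # non-2xx answers alike move us on to the next target.
            last_error = err
    raise RuntimeError("no Tower host accepted the callback") from last_error
```

Feeding it every SRV target in priority order, rather than only the priority-0 one, is what restores the resiliency the one-liner drops.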