At an Ansible meetup last week somebody put this question to me: “I’m coming from the Puppet world, and there they have whole structures, like RSpec, around testing modules (the same thing as Ansible roles). How do you handle the same thing with Ansible?” Ah, this is a question I’ve heard many times. So here are my thoughts on it: I wouldn’t do it. “Why ever not?!” I hear you cry. Read on…
Infrastructure as code has brought with it many good things from software development - not least the ability to define our infrastructures and ‘replay’ them, giving us the foundations of sound DevOps automation pipelines and the ‘cattle vs pets’ mantra. However, I don’t believe we need to take every last development practice and apply it to IaC.
When we model an infrastructure in something like Ansible we are, in effect, laying the foundations for our ‘house’. Get a bit of the foundations wrong, and the house won’t stand up. Less of the metaphor, more literally: if your web server is configured wrongly, it won’t serve web pages. “But what if it serves a webpage, just not the right webpage?” I hear you ask. Well, this is where the testing comes in.
A few years back I wrote some Perl code for a monitoring system. The code connected to a number of different kinds of VPN device, of varying versions, and scraped their web interfaces. The system I was replacing didn’t cope very well when versions of the VPN OS changed - it just fell over. I wrote a lot of tests before the main coding, and a few of them checked for both correct scenarios and false scenarios. In effect I was looking at both sides of the coin: my thinking was that if the OS changed in the future, the tests would reveal what did or did not work (they were also useful for proving that version upgrades of the Perl module would still work correctly). What I actually checked for was the known-good strings the monitoring software needed from the VPN devices.
This is my thinking around infrastructure code testing. Think big picture - test the goal you’re expecting to arrive at. Ansible modules are designed to ‘fail fast’ and achieve ‘desired state’. If they’re successful then you’ve implemented the state you want (put a file into place, template a configuration, ensure a service is running, etc). If the module doesn’t arrive at the desired state, it fails. If it fails, the run isn’t successful, and the goal you need to test for isn’t reached.
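As a minimal sketch of testing the end goal rather than each role, the check below looks at what the service actually returns, from both sides of the coin: strings that must be there and strings that must not. It is written in Python purely for illustration; the function names are my own, and any URL or strings you feed it are assumptions about your service, not something Ansible provides.

```python
from urllib.request import urlopen


def fetch_body(url, timeout=5.0):
    """Fetch a page and return its body as text (standard library only)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def check_end_goal(body, must_contain, must_not_contain):
    """Return a list of problems; an empty list means the goal was reached."""
    problems = []
    # The 'correct' side of the coin: known-good strings we expect to see.
    for text in must_contain:
        if text not in body:
            problems.append(f"missing expected text: {text!r}")
    # The 'false' side of the coin: strings that mean the wrong page is served.
    for text in must_not_contain:
        if text in body:
            problems.append(f"found unwanted text: {text!r}")
    return problems
```

After a play has run you might call `check_end_goal(fetch_body(url), [...], [...])` with the address of the freshly configured web server, and treat any non-empty result as a failed build. A default "It works!" page would then be caught even though every task reported success.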
The point has also been raised about edge cases, such as ‘what if ignore_errors was used by accident - a copy-and-paste mistake, perhaps?’ Well, again, I suggest testing for the end goal. And I don’t just mean doing one test - I refer to my anecdote above and testing ‘both sides of the coin’. Do a run of the configuration play, then do another run. Do both arrive at the same place - i.e. is the full service delivering what it is supposed to?
Bigger picture context is important - I want to stress that what is in my head isn’t just confined to testing a few plays. I imagine that in a well thought out system, testing of infrastructure automation would be, well, automated. If I were building out another infra today I would have a CI job that took the IaC definitions and spun up VMs to test the whole thing end to end, automatically. Why not have an overnight job that takes the day’s CM work, builds the application from the ground up, and then tests that it is delivering what it should?
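The shape of such a job can be sketched in a few lines of Python. This is only an outline under my own assumptions: `run_pipeline` and the step names are hypothetical, and the actual provisioning, configuration and verification commands are left to whatever your CI system and tooling provide.

```python
def run_pipeline(steps, teardown):
    """Run named steps in order; stop at the first failure, always tear down.

    steps: list of (name, callable) pairs; each callable returns True on success.
    Returns the name of the failed step, or None if every step passed.
    """
    failed_step = None
    try:
        for name, step in steps:
            print(f"running: {name}")
            if not step():
                failed_step = name
                break  # fail fast: later steps make no sense on a broken build
    finally:
        teardown()  # throwaway VMs should not outlive the job
    return failed_step
```

A nightly job might pass steps such as "provision VMs", "run the plays", "run the plays again" and "verify the service", each wrapping the relevant command, and fail the build whenever `run_pipeline` returns a step name.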
To me, testing individual roles just adds work; it’s minutiae that don’t matter when the bigger picture is done correctly.
But, if it makes you feel safer about your infrastructure automation then why not? Go ahead and test every role - it’s perfectly possible to do. All I’m saying is I wouldn’t, I would put my effort into making sure the end goal is working correctly.
Because no matter how you get the ball down the pitch, the point of the game is still to get it through the goal posts.