Monitoring Jenkins Slave Nodes with Groovy

June 7, 2013

This script is now available on GitHub:

https://github.com/DonaldSimpson/groovy

Here is a simple little Groovy script to monitor the health of slaves in Jenkins and let you know when they fail.

I run a lot of Jenkins slaves across many different Jenkins master instances, and I usually have something like this running in a Jenkins job to let me know if any of the Jenkins Slave Nodes are offline or dead.

To set this up, you first need to install the Groovy Plugin (https://wiki.jenkins-ci.org/display/JENKINS/Groovy+plugin).

Do this in Jenkins by going to “Manage Jenkins -> Manage Plugins -> Available”, then search for “Groovy Plugin”.

Once that’s done, create a new “Freestyle” Jenkins job and add a Build Step to “Execute System Groovy Script”.

Here’s the code:

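int exitcode = 0
// Check every slave node registered on this Jenkins instance
for (slave in hudson.model.Hudson.instance.slaves) {
    // isOffline() already returns a boolean, so test it directly;
    // the ?. guards against a node that has no computer attached
    if (slave.computer?.isOffline()) {
        println('Slave ' + slave.name + ' is offline!')
        exitcode++
    }
}

// Returning 1 marks the build step as failed
if (exitcode > 0) {
    println('We have a Slave down, failing the build!')
    return 1
}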

You can see what’s going on here, but to be clear the main processing logic is:

for each Slave:
 - if it's offline, increment the exit code

if the exit code is greater than zero:
 - return 1, which will fail the job and trigger an email alert.
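
If you also want the job to say why a node is offline, the same logic can be sketched like this. It is only a sketch: it assumes it runs as a system Groovy script (so jenkins.model.Jenkins is on the classpath) and that your Jenkins version exposes getOfflineCauseReason() on Computer. Note that it walks the computers rather than the slaves, so the master's own computer is included too.

```groovy
import jenkins.model.Jenkins

// Collect every offline computer known to this Jenkins instance.
// Unlike the slaves list, this also includes the master's own computer.
def offline = Jenkins.instance.computers.findAll { it.offline }

offline.each { computer ->
    // Fall back to a placeholder when no offline cause has been recorded
    def reason = computer.offlineCauseReason ?: 'no reason recorded'
    println('Node ' + computer.displayName + ' is offline: ' + reason)
}

// Same failure signal as the original script
if (offline) {
    println(offline.size() + ' node(s) offline, failing the build!')
    return 1
}
```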

Obviously this is deliberately simple, and you could extend it to go off and do any number of things – including trying to bring the slave back online again with something like this: https://wiki.jenkins-ci.org/display/JENKINS/Monitor+and+Restart+Offline+Slaves.
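
As a rough idea of what a reconnect attempt could look like (this is my own minimal sketch, not the script from the linked wiki page), the following asks Jenkins to relaunch any node that is offline but was not deliberately taken offline by someone. It assumes a system Groovy script and the connect(boolean) method on Computer.

```groovy
import jenkins.model.Jenkins

for (computer in Jenkins.instance.computers) {
    // Skip nodes that are online, and nodes someone marked offline on purpose
    if (!computer.offline || computer.temporarilyOffline) {
        continue
    }
    println('Trying to reconnect ' + computer.displayName + '...')
    // connect(true) forces a new launch attempt and returns a Future;
    // the script does not wait here for the node to come back
    computer.connect(true)
}
```

Because the call does not block, the node may still show as offline when this build finishes; the next scheduled run of the monitoring job will tell you whether the reconnect worked.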

I then configure the job to run periodically through the Jenkins cron (check “build periodically” in the build triggers section of your job's config, then put in a cron schedule – “* * * * *” to run every minute or “@midnight” to run once a night, for example), and to email me if it fails so I know there’s something up.
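
For reference, here are a few schedules in the syntax that field accepts. The intervals are just illustrative assumptions rather than what I run myself, and the H token relies on a Jenkins version that supports hashed schedules.

```
# every minute (probably more often than you need)
* * * * *
# roughly every 15 minutes, Jenkins picks the exact minutes
H/15 * * * *
# once a night, some time after midnight
@midnight
```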

A word of warning: don’t try changing the “return 1” to something like System.exit(1)… I did that initially, and because a system Groovy script runs inside the Jenkins JVM itself, it killed the running Jenkins instance… doh! 🙂
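
If you would rather not rely on the return value at all, one safe alternative is to throw an exception, which fails only the current build and leaves Jenkins running. A minimal sketch, assuming a system Groovy script with hudson.AbortException available (it ships with Jenkins core):

```groovy
import hudson.AbortException
import jenkins.model.Jenkins

// Names of all computers currently offline
def offlineNames = Jenkins.instance.computers.findAll { it.offline }*.displayName

if (offlineNames) {
    // An uncaught exception fails this build only; the Jenkins JVM keeps running
    throw new AbortException('Offline nodes: ' + offlineNames.join(', '))
}
```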

So when the Groovy script detects an Offline Jenkins Slave Node, the console output should look something like this:


Building on master in workspace /apps/jenkins/jobs/MonitorSlaves/workspace
Slave <YOURSLAVENAME> is offline!
We have a Slave down, failing the build!
Script returned: 1
Build step 'Execute system Groovy script' marked build as failure
Finished: FAILURE
<send an email or whatever here...>

Cheers,

Don
