Jenkins Agent Nodes

This Jenkins Agent Nodes post covers:

  • What are they?
  • Why may I want one?
  • How do you create one?
    • Tasks on the Master/Server
    • Tasks on the Agent/Client host
  • Other ways of creating Agent Nodes
  • Related posts and links

What are they?

Jenkins Agents are small Java “Client” processes that connect back to a “Master” Jenkins instance over the Java Network Launch Protocol (JNLP).

Why may I want one?

Once it’s up and running, an Agent instance can be used to run tasks from a Master Jenkins instance on one or more remote machines providing an easy to use and flexible distributed system which can lend itself to a wide variety of tasks.

As these are Java processes, you are not restricted by architecture: you can mix and match the OS’s of your agent nodes as required – Windows, Linux, UNIX, iSeries, OVMS etc – anything capable of running a modern version of Java (JNLP has been around since Java 1.4). You can also group and categorize subsets of different types (both logical and physical) of Agents – by intended use, availability, location, available resources, Cloud or VM versus physical tin – anything that helps you decide when you want to use which host.

There are many different ways you can choose to utilize these nodes – they can be used to spread the load of an intensive build process over many machines when they are available, you can delegate specific tasks to specific machines only, or you can use labels to group different classes or types of Nodes that are available for certain tasks, making the most use of your available resources. You can also have Jenkins create Cloud server instances – Amazon EC2 for example – when certain thresholds are reached, and stop them when they are no longer required.

This post focuses on a pretty manual approach to the creation of Jenkins Agent Nodes with the intention of explaining them well enough to allow you to create them on any platform that can run a modern version of Java – there are probably simpler solutions depending on your needs and setup. A later post will touch on a few of the many possible uses for these nodes.

So, how do you create one?

There are several different ways to go about setting up an Agent, and the “best” approach depends on your situation, needs and environment(s). For a simple Linux setup, letting Jenkins do all the work for you makes life really easy: just select that option in your new Jenkins Agent Node and complete this screen to have Jenkins set it up for you:

Where the Username and Password are the credentials you want Jenkins to use to connect and start the Agent process on the remote server. This simple approach also lets the Master instance initiate the connection (over SSH) and bring your agents online when required, avoiding any need to manually visit each agent node.

This keeps things nice and simple and reduces the admin overhead too, but sometimes this type of approach can’t be used (on different OS’s like OVMS, iSeries, Windows etc). So I’m going to outline what I think is the most “versatile” method: defining the Node on the Master instance, then manually setting up and starting the corresponding Agent/Client Node on the remote host. Going through these steps should provide enough detail on how Agent Nodes work and connect to get one up and running on anything that can run a JVM.

1. On the Master/Server

Define the host: navigate to Jenkins > Manage Jenkins > Manage Nodes > New Node
Enter a suitable Node Name (I’d recommend something descriptive, usually including the host name or part of it), then either create a “Dumb Agent” or copy an existing Node if you have one, and complete the configuration page similar to this:


where you specify the requested properties – path, labels, usage, executors etc. These are explained in more detail in the “?” for each item if required.

Here you can also state whether you want to keep your Jenkins Agent for tied jobs only, or have it utilized as much as possible – this obviously depends on your requirements. You can also specify the Launch method that best suits your environment.

2. On the Agent/Client host

You don’t need to do very much to create a new agent node – typically if I’m setting up a few *NIX and Windows hosts I would archive a simple shell/DOS script that starts and manages the process along with the slave.jar file from the Master Jenkins instance. There are alternative methods that may suit your needs – you can start agents via SSH from the Master server for example and there’s a comparable method for Windows – but this simple approach should help you understand the underlying idea that applies to them all.

You can “wget” (or use a Browser on Windows) the slave.jar file directly from the Master Jenkins instance using the URL

http://[your jenkins host]:[port number]/jnlpJars/slave.jar

If you let JNLP initiate the process, the slave.jar will be downloaded from Jenkins automatically.
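Pulling those pieces together, the fetch-and-start steps can be sketched like this – the host, port and node name here are placeholders, so substitute the values from your own Master:

```shell
# Placeholder values - use your own Jenkins Master details:
JENKINS_URL="http://jenkins.example.com:8080"
NODE_NAME="my-agent-node"

# The agent jar download location and the Node's JNLP endpoint:
AGENT_JAR_URL="${JENKINS_URL}/jnlpJars/slave.jar"
JNLP_URL="${JENKINS_URL}/computer/${NODE_NAME}/slave-agent.jnlp"

# On the agent host the real commands would be:
# wget -q "${AGENT_JAR_URL}"
# java -jar slave.jar -jnlpUrl "${JNLP_URL}"
echo "Agent jar: ${AGENT_JAR_URL}"
echo "JNLP URL:  ${JNLP_URL}"
```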

Note that Jenkins will inherit the effective permissions of the user that starts the process – this is to be expected, but it’s often worth having a think about the security aspects of this, along with the access requirements for the types of things you want your agent to be able to do… or not do.

On Windows hosts, you can use jenkins-agent.exe to easily install the agent as a Windows Service, which can then be started at boot time and run under whatever user/permissions you wish, set via the Services panel.

My *NIX “startagent.sh” script does a few environment/sanity checks, then kicks off the agent process something like so:

${NOHUP} ${JAVA} -jar slave.jar -jnlpUrl http://SERVERNAME:PORT/computer/USER__NODENAME/slave-agent.jnlp &

The HTTP URL there should match the one provided by the Jenkins page when you were defining the Node. If all goes well you should see the node state change to Connected on the Master Jenkins instance; if not, nohup.out should provide some pretty obvious pointers to the problem.

Some common causes are:

  • Jenkins host, port or node name wrong
  • Java version not found/wrong
  • Lack of write permissions to the file system
  • Lack of space (check /tmp too)
  • Port already in use
  • Errors in the jenkins-slave.xml file if you’ve tweaked it
  • Firewalls…
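Several of those can be caught up-front by the environment/sanity checks mentioned earlier. A minimal sketch of such checks – the messages and the choice of checks are illustrative, not the actual script:

```shell
#!/bin/sh
# Illustrative pre-flight checks before launching the agent process
problems=0

# Is java available on the PATH?
command -v java >/dev/null 2>&1 || { echo "WARN: no java found in PATH"; problems=$((problems+1)); }

# Is the agent jar present in the working directory?
[ -f slave.jar ] || { echo "WARN: slave.jar not found - fetch it from the Master"; problems=$((problems+1)); }

# Can we write here? The agent unpacks tools and workspaces to its working dir.
[ -w . ] || { echo "WARN: current directory is not writable"; problems=$((problems+1)); }

echo "Pre-flight checks found ${problems} potential problem(s)"
```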

Jenkins also provides some health monitoring of the connected Node, which you can see in the Jenkins > Nodes page: Disk Space, Free Temp Space, Clock time/sync, Response Time and Free Swap are all monitored, and you can have your Node taken offline if any of these passes a set threshold.

This should hopefully be enough info to provide an overview of what Jenkins Agents are, and enough to get one up and running on your chosen platform. Where possible it’s best to keep things simple – use SSH and let the Master instance manage things for you if you can – but when that’s not possible there are alternatives.

When I get the chance I will add some information on the next steps – creating and delegating jobs on Jenkins Agent Nodes, and some thoughts and suggestions for just a few of the many uses for this sort of distributed Build and Deployment system.

Related Posts and Links:

Monitoring Jenkins Slave Nodes with Groovy
– how to keep an  eye on your Jenkins Slaves

Jenkins Slave Nodes – using the Swarm Plugin
– automatically connect new Slave Nodes to create a “Swarm”

Getting the current user in Jenkins
– several approaches

Managing Jenkins as a service and starting at boot time
– on Linux & Windows

Jenkins plugins
– details on some of my most frequently used plugins

Jenkins DIY Information Radiators
– what they are for, and how to make your own

The Jenkins Wiki has more detailed information on Distributed Builds and different slave-launching strategies.

Feedback, questions and constructive comments are very welcome!

Some Useful Solaris Commands

Here are a few (mostly) Solaris tips and tricks I have found useful and wanted to keep a note of.


prstat

This provides similar info to top on Linux boxes – you can run it as plain old prstat, or give it some options. I like prstat -a as it reports on both processes and users. As with all of these commands, the man pages have further details.


xargs

Not just a Solaris command, but this is very useful on any *NIX box – I frequently use it to translate the output from the previous command into something that can be understood by the next one, for example:

find . -type f -name "*.txt" | xargs ls -alrt

This translates and passes the output of the “find” command to ls in a way that ls understands. (Note the quotes around *.txt – without them the shell would expand the wildcard before find sees it.)
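One caveat worth knowing: xargs splits its input on whitespace, so filenames containing spaces will break that pipeline. Where your find and xargs support it (GNU does; older Solaris releases may not), NUL-separated names are safer:

```shell
# Build a demo directory containing a filename with a space in it:
mkdir -p /tmp/xargs_demo
touch "/tmp/xargs_demo/a file.txt" "/tmp/xargs_demo/b.txt"

# -print0 and -0 pass NUL-terminated names, so the space survives intact:
find /tmp/xargs_demo -type f -name "*.txt" -print0 | xargs -0 ls -lrt
```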


pargs

I use the pargs command when I need more information on a running process than the Solaris ps utility will give (there’s no -v option) – particularly when a process has a lot of arguments.

Call pargs with the PID of your choice, and it will display a nice list of each argument that the process was started with, for example:

> pargs 16446
16446:  /usr/jdk/jdk1.6.0/jre/bin/java com.MyJavaProgram
argv[0]: /usr/jdk/jdk1.6.0/jre/bin/java
argv[1]: com.MyJavaProgram
argv[2]: MyFirstArgument.ini
argv[3]: SomeOtherArg.txt
argv[4]: AndAnotherArg

pargs can also display all of this info on one line with the -l option (useful for scripting), and if you call it with -e it also displays all of the Environment variables too.
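As an aside, there’s no pargs on Linux, but /proc exposes the same information – the arguments live NUL-separated in /proc/PID/cmdline:

```shell
# Print each argument of the current shell's own process, one per line:
tr '\0' '\n' < /proc/$$/cmdline
```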


pwdx

Simply pass it a PID and it will tell you the current working directory for that process.
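For example, asking for the working directory of the current shell (pwdx also exists on Linux, in the procps package):

```shell
# $$ is the PID of the current shell:
pwdx $$
```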


[g]rep

When writing a shell script that queries running processes, I often find my own script showing up in the results – for instance a script that does a “ps -eaf | grep MyProcessName” may pick up both the process I’m after (the running instance of “./MyProcessName”) and the grep process-check itself (as in the “ps -eaf | grep MyProcessName”).

A handy way to avoid this is to change your search criteria to “grep [M]yProcessName” instead. Grep treats the square brackets as a character class containing just “M”, so the pattern still matches “MyProcessName” – but it no longer matches the literal text of its own command line 🙂
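A quick way to convince yourself this works: the regex [M]yProcessName matches the literal string “MyProcessName”, but not the string “[M]yProcessName” that appears in grep’s own process entry. Simulating the two kinds of ps output line:

```shell
# A line for the real process - the pattern matches:
hit=$(echo "user 123 ./MyProcessName" | grep "[M]yProcessName")

# A line for the grep itself - the pattern does NOT match its own bracketed text:
miss=$(echo "user 456 grep [M]yProcessName" | grep "[M]yProcessName" || echo "no match")

echo "${hit}"
echo "${miss}"
```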


I will add more when I think of them, if you have any good ones then please post them!

Persisting file permissions in a tar.gz file with Ant & tar

Discovered an interesting issue recently where file permissions were not being preserved when tarring up a build with Ant.

The existing approach was to chmod the files as desired then simply tar them all up and hope for the best:

<tar destfile="dist/${hudson.build.tag}.tar.gz" basedir="dist/" compression="gzip" />

This doesn’t work – Ant’s tar task doesn’t pick up permissions from the file system – but if you use tarfileset and set the filemode you can explicitly set things as required, like this:

<tar destfile="dist/${hudson.build.tag}.tar.gz" longfile="gnu" compression="gzip">
  <tarfileset dir="dist/" filemode="755">
    <include name="**/*scripts/*.sh" />
    <include name="**/somescript.ext" />
  </tarfileset>
  <tarfileset dir="dist/">
    <include name="**/*" />
    <exclude name="**/*scripts/*.sh" />
    <exclude name="**/somescript.ext" />
  </tarfileset>
</tar>

Here I am adding the first two scripts with filemode 755, then adding everything else, which gets the default/umask permissions. The excludes in the second fileset stop those scripts being added a second time with the default permissions – on extraction a later duplicate entry would clobber the 755 copy.

Now when you gunzip and tar xvf the resulting build, you get the required permissions.
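You can also check the stored modes without extracting anything, via tar tzvf. A self-contained illustration using plain tar (the paths are made up for the demo):

```shell
# Build a tiny tree containing an executable script:
mkdir -p /tmp/tardemo/scripts
echo 'echo hello' > /tmp/tardemo/scripts/run.sh
chmod 755 /tmp/tardemo/scripts/run.sh

# Archive it, then list the stored modes - run.sh should show rwxr-xr-x:
tar -czf /tmp/tardemo.tar.gz -C /tmp/tardemo scripts
tar -tzvf /tmp/tardemo.tar.gz | grep run.sh
```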

There’s more info and further examples in the Apache Ant Manual.

Cheers,

Don

Using lock files in a bash shell script

This post is old (2011); there are many better ways to do this now.
See https://www.unix.com/man-page/linux/1/flock/ for one example.
Also pgrep and lsof examples here:
https://www.baeldung.com/linux/bash-ensure-instance-running
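For reference, the flock approach linked above can be this small (flock is part of util-linux on Linux; the lock file path is arbitrary):

```shell
#!/bin/bash
# Open file descriptor 200 on the lock file, then try a non-blocking exclusive lock:
exec 200>/tmp/mydemo.lock
if flock -n 200; then
    echo "Got the lock - doing the work..."
    # ... main processing here; the lock is released when the script exits ...
else
    echo "Another instance holds the lock - exiting."
    exit 1
fi
```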

Wrote this script recently – I had written a simple shell script that updated an HTML page with its output, then realised it would be all too easy for simultaneous writes to clobber the file.

This kind of concurrency problem can & should really be solved properly – by using a database, for example – but it got me thinking and playing around, and I ended up with the below. It’s clearly very “happy path” with loads of room for improvements – please feel free to suggest some or add to it 🙂


#!/bin/bash

#
# Example script that uses lock files to avoid concurrent writes
# TODO: loads more validation and error handling!
#
# www.DonaldSimpson.co.uk
# 25th May 2011

setup_lockfile(){
  # set name of this program's lockfile:
  MY_NAME=$(basename "$0")
  LOCKFILE=/tmp/lock.${MY_NAME}.$$
  # MAX_AGE is how long (in minutes) to wait until we assume a lock file is defunct
  # scary stuff, with loads of scope for improvement...
  # could use fuser and see if there is a process attached/not?
  # maybe check with lsof? or just bail out?
  MAX_AGE=5
  echo "My lockfile name is ${LOCKFILE}"
  sleep 1
}

check_lock(){
  # Check for an existing lock file - note that "[ -f glob* ]" breaks when
  # more than one file matches, so test with ls instead
  while ls /tmp/lock.${MY_NAME}.* >/dev/null 2>&1
  do
    # A lock file is present
    if [ -n "$(find /tmp/lock.${MY_NAME}.* -mmin +${MAX_AGE} 2>/dev/null)" ]; then
      echo "WARNING: found and removing old lock file... $(ls /tmp/lock.${MY_NAME}.*)"
      rm -f /tmp/lock.${MY_NAME}.*
    else
      echo "A recent lock file already exists: $(ls /tmp/lock.${MY_NAME}.* | awk -F. '{print $2"."$3", with PID: "$4}')"
      echo "Will wait until the lock file is over ${MAX_AGE} minutes old then remove it..."
    fi
    sleep 5
  done
}

create_lock(){
  # ok to carry on... create a lock file - quickly 😉
  touch ${LOCKFILE}
  # check we managed to make it ok...
  if [ ! -f ${LOCKFILE} ]; then
    echo "Unable to create lockfile ${LOCKFILE}!"
    exit 1
  fi
  echo "Created lockfile ${LOCKFILE}"
}

cleanup_lock(){
  echo "Cleaning up..."
  rm -f ${LOCKFILE}
  if [ -f ${LOCKFILE} ]; then
    echo "Unable to delete lockfile ${LOCKFILE}!"
    exit 1
  fi
  echo "Ok, lock file ${LOCKFILE} removed."
}

setup_lockfile
check_lock
create_lock

# Any calls to exit from here on should first call cleanup_lock
# Do main processing tasks here...
sleep 20

# All Done.
cleanup_lock

Serving WordPress as the default page

Here’s a note of what I needed to do in order to get WordPress serving as the default site on my domain – it was originally at www.donaldsimpson.co.uk/wordpress/ and I wanted it to just be www.donaldsimpson.co.uk

A bit of a Google shows there are many ways to do this, but here’s how I did it:

vi /opt/bitnami/apache2/conf/httpd.conf

then comment the current entry and add a new one pointing to the htdocs dir for WordPress:

#DocumentRoot "/opt/bitnami/apache2/htdocs"
DocumentRoot "/var/www/html/donwp.freemyip.com"

Then restart Apache (/opt/bitnami/apache2/bin/apachectl restart or similar) after which you just need to go to the WordPress Admin General Settings page and change these values to point to the root of your site/domain:

WordPress address (URL) www.donaldsimpson.co.uk

Site address (URL) www.donaldsimpson.co.uk

And that should be that – you can now delete that backup you made at the start…


Update:

It may be a good idea to define WP_HOME and WP_SITEURL in your wp-config.php file too, like so:

define('WP_HOME', 'https://www.donaldsimpson.co.uk');
define('WP_SITEURL', 'https://www.donaldsimpson.co.uk');

This avoids a database lookup to get these details, which should speed things up fractionally too 🙂


Quick directory listing for large file systems


Useful bit of Perl code – folk at work found this approach on the web somewhere – it’s apparently much quicker than doing a recursive find:


#!/usr/bin/perl
use strict;
use warnings;

my @dirlist = ();

sub process_files
{
    my $path = shift;
    opendir (my $dh, $path) or die "Unable to open $path: $!";
    # prefix each entry with its path, skipping the "." and ".." entries
    my @files =
        map  { $path . '/' . $_ }
        grep { !/^\.{1,2}$/ }
        readdir ($dh);
    closedir ($dh);

    for my $file (@files)
    {
        if (-d $file)
        {
            print $file . "\n";
            push @dirlist, $file;
            process_files ($file);
        }
    }
}

process_files(".");


Pardon the indentation/formatting 😉

Cheers,


Don