Munin and Nagios¶
Munin integrates perfectly with Nagios. There are, however, a few things of which to take notice. This article shows example configurations and explains the communication between the systems.
Setting up Nagios passive checks¶
Receiving messages in Nagios¶
First you need a way for Nagios to accept messages from Munin. Nagios has exactly such a thing, namely the NSCA which is documented here: NSCA.
NSCA consists of a client (a binary usually named send_nsca
and
a server usually run from inetd
. We recommend that you enable
encryption on NSCA communication.
You also need to configure Nagios to accept messages via NSCA. NSCA is, unfortunately, not very well documented in Nagios’ official documentation. We’ll cover writing the needed service check configuration further down in this document.
Configuring Nagios¶
In the main config file, make sure that the command_file
directive is set and that it works. See
External Command File
for details.
Below is a sample extract from nagios.cfg
:
command_file=/var/run/nagios/nagios.cmd
The /var/run/nagios
directory is owned by the user nagios runs as.
The nagios.cmd
is a named pipe on which Nagios accepts external input.
Configuring NSCA, server side¶
NSCA is run through some kind of (x)inetd.
Using inetd¶
the line below enables NSCA listening on port 5667:
5667 stream tcp nowait nagios /usr/sbin/tcpd /usr/sbin/nsca -c /etc/nsca.cfg --inetd
Using xinetd¶
the lines below enables NSCA listening on port 5667, allowing connections only from the local host:
# description: NSCA (Nagios Service Check Acceptor)
service nsca
{
flags = REUSE
type = UNLISTED
port = 5667
socket_type = stream
wait = no
server = /usr/sbin/nsca
server_args = -c /etc/nagios/nsca.cfg --inetd
user = nagios
group = nagios
log_on_failure += USERID
only_from = 127.0.0.1
}
Common¶
The file /etc/nsca.cfg
defines how NSCA behaves.
Check in particular the nsca_user
and command_file
directives,
these should correspond to the file permissions and the
location of the named pipe described in nagios.cfg
.
nsca_user=nagios
command_file=/var/run/nagios/nagios.cmd
Configuring NSCA, client side¶
The NSCA client is a binary that submits to an NSCA server whatever it
received as arguments. Its behaviour is controlled by the file
/etc/send_nsca.cfg
, which mainly controls encryption.
You should now be able to test the communication between the NSCA client and the NSCA server, and consequently whether Nagios picks up the message. NSCA requires a defined format for messages. For service checks, it’s like this:
<host_name>[tab]<svc_description>[tab]<return_code>[tab]<plugin_output>[newline]
Below is shown how to test NSCA.
$ echo -e "foo.example.com\ttest\t0\t0" | /usr/sbin/send_nsca -H localhost -c /etc/send_nsca.cfg
1 data packet(s) sent to host successfully.
This caused the following to appear in /var/log/nagios/nagios.log
:
[1159868622] Warning: Message queue contained results for service 'test' on host 'foo.example.com'. The service could not be found!
Sending messages from Munin¶
Messages are sent by munin-limits based on the state of a monitored data source:
OK
, Warning
, Critical
and Unknown
(O/W/C/U).
Configuring munin.conf¶
Nagios uses the above mentioned send_nsca
binary to send messages to Nagios.
In /etc/munin/munin.conf
, enter this:
contacts nagios
contact.nagios.command /usr/bin/send_nsca -H your.nagios-host.here -c /etc/send_nsca.cfg
Note
Be aware that the -H
switch to send_nsca
appeared sometime after send_nsca
version 2.1. Always check send_nsca --help
!
Configuring Munin plugins¶
Lots of Munin plugins have (hopefully reasonable) values for Warning and Critical levels. To set or override these, you can change the values in munin.conf.
Configuring Nagios services¶
Now Nagios needs to recognize the messages from Munin as messages about services it monitors. To accomplish this, every message Munin sends to Nagios requires a matching (passive) service defined or Nagios will ignore the message (but it will log that something tried).
A passive service is defined through these directives in the proper Nagios configuration file:
active_checks_enabled 0
passive_checks_enabled 1
A working solution is to create a template for passive services, like the one below:
define service {
name passive-service
active_checks_enabled 0
passive_checks_enabled 1
parallelize_check 1
notifications_enabled 1
event_handler_enabled 1
register 0
is_volatile 1
}
When the template is registered, each Munin plugin should be registered as per below:
define service {
use passive-service
host_name foo
service_description bar
check_period 24x7
max_check_attempts 3
normal_check_interval 3
retry_check_interval 1
contact_groups linux-admins
notification_interval 120
notification_period 24x7
notification_options w,u,c,r
check_command check_dummy!0
}
Notes¶
host_name
is either the FQDN of the host_name registered to the Nagios plugin, or the host alias corresponding to Munin’s notify_alias directive. Thehost_name
must be registered as a host in Nagios.service_description
must correspond to the plugin’s name, and for Nagios to be happy it shouldn’t have any special characters. If you’d like to change the service description from Munin, use notify_alias on the data source. Available in Munin-1.2.5 and later.
A working example is shown below:
[foo.example.com]
address foo.example.com
df.notify_alias Filesystem usage
# The above changes from Munin's default "Filesystem usage (in %)"
What characters are allowed in a Nagios service definition?
See Nagios docs on Illegal Object Name Characters
service_description
: This directive is used to define the description of the service,
which may contain spaces, dashes, and colons (semicolons, apostrophes, and quotation
marks should be avoided). No two services associated with the same host
can have the same description. Services are uniquely identified with their host_name
and service_description directives.
Note
This means that lots of Munin plugins will not be accepted by Nagios. This limitation impacts every plugin with special characters in them, e.g. ‘(’, ‘)’, and ‘%’. Workarounds are described in ticket #34 and the bug has been fixed in the Munin code in changeset 1081.
Alternatively you can use check_munin.pl to gather fresh data from nagios instead of check_dummy.
Sample munin.conf¶
To illustrate, a (familiar) sample munin.conf configuration file shows the usage:
contact.nagios.command /usr/local/nagios/bin/send_nsca nagioshost.example.com -c /usr/local/nagios/etc/send_nsca.cfg -to 60
contacts none # Disables warning on a system-wide basis.
[example.com;]
contacts nagios # Enables warning through the "nagios" contact for the group example.com
[foo.example.com]
address localhost
contacts none # Disables warning for all plugins on the host foo.example.com.
[example.com;bar.example.com]
address bar.example.com
df.contacts none # Disables warning on the df plugin only.
df.notify_alias Disk usage # Uses the title "Disk usage" when sending warnings through munin-limits
# Useful if the receiving end does not accept all kinds of characters
# NB: Only available in Munin-1.2.5 or with the patch described in ticket 34.
Setting up Nagios active checks¶
Use check_munin.p to get data from munin-node directly into nagios and then use it as a regular check plugin. Basically munin-node become a kind of snmp agent with a lot of preconfigured plugins.