Troubleshooting¶
This page lists some general troubleshooting strategies and methods for Munin.
Check node agent¶
Is the munin-node process (daemon) running on the host you want to monitor?
Did you restart the munin-node process after you made changes to its configuration?
Check connectivity¶
The examples show a munin-node agent running on 127.0.0.1; replace it with your node address.
Note
You can use netcat to connect to port 4949. Using telnet was previously the recommended way, as it was a fairly standard install. We don’t recommend it anymore, since netcat is now almost as ubiquitous as telnet and offers a real native TCP connection, whereas telnet does not.
Note that socat also works perfectly, but it is not as mainstream.
Does the munin-node agent allow connections from your munin master?
Here we try to connect manually to the munin-node that runs on the Munin master host. It can be reached via IP address 127.0.0.1 or hostname localhost and port 4949.
The output of a netcat session should look something like this:
# nc localhost 4949
# munin node at [your hostname]
Does the output above show the hostname you expect from the configuration in munin.conf?
Note
If you have a fully qualified domain name (FQDN) in munin-node.conf, the host you’re monitoring has to identify itself with its FQDN as well.
E.g. if the master’s node tree has the following entry:
[foo.example.com]
address foo.example.com
…then a netcat session to the node should give you the following output:
# munin node at foo.example.com
Note
If the connection test fails, check the allow directive in munin-node.conf and make sure any firewalls allow contact on destination port 4949.
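The manual connection test above can also be scripted. The following is a minimal sketch in Python, assuming the node greets every client with a single "# munin node at <hostname>" line as in the examples; the names parse_banner and read_banner are illustrative and not part of Munin.

```python
import socket


def parse_banner(line):
    """Return the hostname from a '# munin node at <hostname>' line, or None."""
    prefix = "# munin node at "
    line = line.strip()
    if line.startswith(prefix):
        return line[len(prefix):]
    return None


def read_banner(host="127.0.0.1", port=4949, timeout=5.0):
    """Connect to a munin-node and return the hostname it announces."""
    # create_connection raises OSError if the port is closed or filtered.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="utf-8", errors="replace") as stream:
            # An empty read (connection closed at once) yields None.
            return parse_banner(stream.readline())
```

Calling read_banner("127.0.0.1") and comparing the result with the node name used in munin.conf gives the same answer as the interactive session. A result of None means no banner arrived before the connection closed, in which case the allow directive is a reasonable first suspect.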
Check the Logs¶
Munin’s log files (typically below /var/log/munin/) are a good source of information while debugging problems.
Log files of a munin-node:
- munin-node.log and munin-node-configure.log: configuration issues and connection messages
Log files of a munin master:
- munin-cgi-graph.log and munin-graph.log: issues with generating graphs
- munin-cgi-html.log and munin-html.log: issues with generating html content
- munin-update.log: fetch configuration and values from a remote munin-node
- munin-limits.log: generated alarms due to specified warning/critical thresholds
Debugging Plugins¶
Which plugins are enabled on the node?¶
Does munin-node recognize any plugins? Try issuing the command list (while connected to the agent); a (long) list of plugins should show.
# nc localhost 4949
# munin node at foo.example.com
list
open_inodes irqstats if_eth0 df uptime [...]
Note
Some plugins require specific capabilities (most notably: multigraph). These plugins do not show up in the list unless the client announces this capability. For example, type cap multigraph before list in order to also find multigraph plugins in the list.
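To see which plugins only appear once the capability is announced, the list can be requested twice and compared. This is a rough Python sketch, assuming the node answers both cap and list with a single line each; list_plugins and multigraph_only are hypothetical helper names.

```python
import socket


def parse_plugin_list(line):
    """Split the single-line reply to 'list' into plugin names."""
    return line.split()


def list_plugins(host="127.0.0.1", port=4949, multigraph=True, timeout=5.0):
    """Return the plugin names a munin-node reports for 'list'."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rw", encoding="utf-8", newline="\n") as stream:
            stream.readline()  # discard the '# munin node at ...' banner
            if multigraph:
                stream.write("cap multigraph\n")
                stream.flush()
                stream.readline()  # discard the node's one-line reply to 'cap'
            stream.write("list\n")
            stream.flush()
            return parse_plugin_list(stream.readline())


def multigraph_only(with_cap, without_cap):
    """Plugins that are listed only when the capability was announced."""
    return sorted(set(with_cap) - set(without_cap))
```

Running multigraph_only(list_plugins(multigraph=True), list_plugins(multigraph=False)) against a node would then name the plugins that a client without multigraph support never sees.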
Check a particular plugin¶
Check on agent host
Note
All the commands here need to be run as user root. A common method of becoming root is via the sudo command, but refer to your local documentation for more specific instructions.
Restart munin-node, as it only reads the plugin list upon start. (Good to test a plugin with munin-run, without enabling it right away.)
/etc/init.d/munin-node restart
Call munin-run on the monitored host to see whether the plugin runs through.
Try with and without the config plugin argument. Neither run should emit any error message.
Note
You can also use the --debug flag, as it shows whether the configuration file is correctly parsed, mostly for UID & environment variables.
Regular run:
# munin-run df
_dev_hda1.value 83
Config run:
# munin-run df config
graph_title Filesystem usage (in %)
graph_args --upper-limit 100 -l 0
graph_vlabel %
graph_category disk
graph_info This graph shows disk usage on the machine.
_dev_hda1.label /
_dev_hda1.info / (ext3) -> /dev/hda1
_dev_hda1.warning 92
_dev_hda1.critical 98
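The two runs above can be compared mechanically. The sketch below, in Python, assumes the plain "name value" and "field.attribute value" line layout shown in the sample output; the function names are made up for illustration. It reports fields that the config run declares but the regular run never gives a value for.

```python
def parse_plugin_output(text):
    """Split plugin output into global attributes and per-field attributes."""
    global_attrs = {}
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        key, _, value = line.partition(" ")
        if "." in key:
            # 'field.attribute value', e.g. '_dev_hda1.warning 92'
            field, _, attr = key.partition(".")
            fields.setdefault(field, {})[attr] = value
        else:
            # 'attribute value', e.g. 'graph_vlabel %'
            global_attrs[key] = value
    return global_attrs, fields


def fields_without_value(config_text, fetch_text):
    """Fields declared in the config run that lack a .value in the regular run."""
    _, declared = parse_plugin_output(config_text)
    _, fetched = parse_plugin_output(fetch_text)
    return sorted(f for f in declared if "value" not in fetched.get(f, {}))
```

Feeding it the captured output of "munin-run df config" and "munin-run df" should return an empty list for a healthy plugin.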
Check from Munin master
Does the plugin run through munin-node, with and without config?
Regular run:
# nc foo.example.com 4949
# munin node at foo.example.com
fetch df
_dev_hda1.value 83
[...]
.
With config:
# nc foo.example.com 4949
# munin node at foo.example.com
config df
graph_title Filesystem usage (in %)
graph_args --upper-limit 100 -l 0
graph_vlabel %
graph_category disk
graph_info This graph shows disk usage on the machine.
_dev_hda1.label /boot
_dev_hda1.info /boot (ext3) -> /dev/hda1
_dev_hda1.warning 92
_dev_hda1.critical 98
[...]
.
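The fetch and config sessions shown above both end with a line holding a single dot. A scripted client can rely on that terminator, as in this Python sketch; node_query and read_block are hypothetical names, and the code assumes one banner line is sent before any command is accepted.

```python
import socket


def read_block(stream):
    """Collect reply lines up to (not including) the terminating '.' line."""
    lines = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if line == ".":
            break
        lines.append(line)
    return lines


def node_query(host, command, port=4949, timeout=5.0):
    """Send one dot-terminated command such as 'fetch df' or 'config df'."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rw", encoding="utf-8", newline="\n") as stream:
            stream.readline()  # skip the banner line
            stream.write(command + "\n")
            stream.flush()
            return read_block(stream)
```

For example, node_query("foo.example.com", "fetch df") would return the value lines as a list, which can then be compared with what munin-run prints on the node itself.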
If the plugin works for munin-run but not through netcat, you might have a $PATH problem.
Note
Set env.PATH for the plugin in the plugin’s environment file.
Check Munin Master¶
Do the directories specified by dbdir, htmldir, logdir and rundir in munin.conf have the correct permissions? (If you first ran Munin as root, they may not be readable/writable by the user that runs the cron job.)
Is munin-cron set up as a cron job that runs as the Munin user?
Does the output when running munin-update as the Munin user on the server node show any errors?
Try running “munin-cron --debug > /tmp/munin-cron.debug” and check the output file /tmp/munin-cron.debug.
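The permission question for the directories named in munin.conf can be answered with a short script run as the Munin user. This Python sketch assumes the directives appear as "name path" lines and that "#" starts a comment; parse_dirs and inaccessible are illustrative names only.

```python
import os


def parse_dirs(conf_text, keys=("dbdir", "htmldir", "logdir", "rundir")):
    """Extract directory directives from the text of a munin.conf."""
    found = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in keys:
            found[parts[0]] = parts[1].strip()
    return found


def inaccessible(dirs):
    """Directives whose directory the current user cannot read, write and enter."""
    needed = os.R_OK | os.W_OK | os.X_OK
    return sorted(name for name, path in dirs.items()
                  if not (os.path.isdir(path) and os.access(path, needed)))
```

Directives that are commented out in munin.conf are skipped, so an empty result from parse_dirs means the built-in defaults apply and those paths have to be checked by hand. The result of inaccessible is only meaningful when the script runs as the same user as the cron job.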
Check data collection¶
This step will tell you whether munin-update (the master) is able to communicate with munin-node (the agent).
Run munin-update as user munin on the Munin master machine.
# su -s /bin/bash munin
$ /usr/share/munin/munin-update --debug --nofork --host foo.example.com --service df
You should get a line like this:
Aug 11 22:39:51 - [6846] Updating /var/lib/munin/example.com/foo.example.com-df-_dev_hda1-g.rrd with 57
After this, replace df with the service you want to check, such as hddtemp_smartctl.
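The RRD path in the sample log line appears to be built from the database directory, the group, the host, the service and the field name, followed by a one-letter type suffix. The Python sketch below reconstructs that path so the right file can be inspected; the layout is inferred from the log line above, and the letter mapping (g, c, d, a) is an assumption about Munin's naming scheme rather than something this page states.

```python
import posixpath

# Assumed suffix letters for the four RRD data source types.
TYPE_LETTER = {"GAUGE": "g", "COUNTER": "c", "DERIVE": "d", "ABSOLUTE": "a"}


def rrd_path(dbdir, group, host, plugin, field, field_type="GAUGE"):
    """Build the RRD file path for one field of one plugin on one host."""
    name = "{}-{}-{}-{}.rrd".format(host, plugin, field, TYPE_LETTER[field_type])
    return posixpath.join(dbdir, group, name)


# Reproduces the path from the sample log line.
print(rrd_path("/var/lib/munin", "example.com", "foo.example.com", "df", "_dev_hda1"))
```

If the file with the expected suffix does not exist but one with a different letter does, the field type declared by the plugin may have changed since the file was created.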
If one of these steps does not work, something is probably wrong with the plugin or how munin-node talks to the plugin.
- Does the plugin run when executed directly? If it runs when executed as root and not through munin-run (as described above), the plugin has a permission problem. See this article on environment files.
- Does the plugin output contain too few, too many and/or illegal characters?
- Does Munin (munin-cron and its children) write values into RRD files? Hint: rrdtool fetch [rrd file] AVERAGE
- Does the plugin use legal field names? See Notes on Field names.
- In case you loan data from other graphs, check that the fieldname.type is set properly. See Munin file names for a quick reference on what any error messages in the logs might indicate.
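The field name check in the list above can be automated. This Python sketch assumes the commonly documented rule that a field name starts with a letter or underscore, continues with letters, digits or underscores, and is not the reserved name root; treat the rule and the function names as assumptions and confirm them against Notes on Field names.

```python
import re


def is_legal_field_name(name):
    """True if the name matches the assumed Munin field name rule."""
    return bool(re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name)) and name != "root"


def sanitize_field_name(name):
    """Replace characters that the assumed rule does not allow with '_'."""
    cleaned = re.sub(r"^[^A-Za-z_]", "_", name)      # illegal first character
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", cleaned)  # illegal characters elsewhere
    return cleaned
```

For example, sanitize_field_name("/dev/hda1") yields _dev_hda1, the field name seen in the df examples on this page.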
Frequent Incidents¶
SELinux blocks Munin plugins¶
- See the documentation start page for links to SELinux rules for Munin.
RRD files are filled with 0¶
although munin-node seems to show sane values.
- The plugin’s output contains GAUGE values, but the fields were declared as COUNTER or DERIVE in the plugin’s config.
Note
GAUGE is the default data type in Munin! Any other data type for a field must be explicitly declared.
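A small calculation shows why a wrong type declaration produces zeros. The Python sketch below is a deliberately simplified model of how a stored value is derived from two successive readings; it ignores counter wrap-around and RRDtool's interpolation, and stored_value is a hypothetical name.

```python
def stored_value(field_type, previous, current, seconds):
    """Simplified model of the value stored for one update interval."""
    if field_type == "GAUGE":
        return float(current)  # stored as reported
    if field_type in ("COUNTER", "DERIVE"):
        return (current - previous) / seconds  # per-second rate of change
    raise ValueError("unsupported type: " + field_type)


# Disk usage of 83 % reported twice, 300 seconds apart:
print(stored_value("GAUGE", 83, 83, 300))   # 83.0
print(stored_value("DERIVE", 83, 83, 300))  # 0.0
```

A value that barely changes between runs, such as a percentage, therefore ends up near zero when it is treated as a rate.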
RRD files are filled with NaN¶
although munin-node seems to show sane values.
- Check that there are no invalid characters in the plugin’s output.
- For new plugins, let Munin gather data for about 20 minutes and things will sort themselves out.
munin-node won’t give any data¶
although it is configured properly.
- Check that there is a .value directive for each of the plugin’s field names (yes, I managed to forget that recently).
munin-node only temporarily returns valid data¶
- Check that no race conditions occur. A typical race condition is updating a file with crontab while the plugin is trying to read the file.
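One common way to avoid that kind of race is to have the cron job write its data to a temporary file and rename it into place, so the plugin sees either the old file or the new one. This is a general technique rather than anything Munin prescribes; the Python sketch below assumes a POSIX filesystem, and write_atomically is an illustrative name.

```python
import os
import tempfile


def write_atomically(path, text):
    """Replace `path` with `text` without exposing a partially written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        # mkstemp creates the file with mode 0600; open it up so a plugin
        # running as a different user can still read the result.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The writer side changes from opening the data file directly to calling write_atomically; the plugin keeps reading the same path as before.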
The graphs are empty¶
- The plugin’s output contains GAUGE values, but the fields were declared as COUNTER or DERIVE in the plugin’s config. (GAUGE is the default data type in Munin.)
- The files to be updated by Munin are owned by root or another user account
- The local user browser cache may be corrupt, especially if “most” graphs are displayed correctly and “some” graphs are blank. In Firefox (or your browser of choice) go to tools and clear recent history, then check to see if the graphs are now properly displayed.
A plugin’s graph is missing¶
Check the following conditions if no graph is produced for a plugin:
- the plugin file (or a symlink to it) is placed in the plugin directory (typically: /etc/munin/plugins)
- the executable permission of the plugin file is set
- munin-node was restarted after the plugin was added
- user/group is configured for the plugin (if necessary)
- the plugin works as expected locally via munin-run
- the munin master supports all capabilities required by the plugin (e.g. type cap multigraph before list in an interactive nc/telnet session)
- no related error messages for this plugin appear in /var/log/munin/munin-update.log (on the munin master)
- an rrd file is created on the munin master (e.g. below /var/lib/munin)
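The first two items of the checklist above concern only the file system and are easy to script. The following Python sketch covers just those two items; check_plugin is a hypothetical helper, and the default directory is the typical location mentioned in the list.

```python
import os


def check_plugin(name, plugin_dir="/etc/munin/plugins"):
    """Return the problems found for one plugin file; empty if these checks pass."""
    path = os.path.join(plugin_dir, name)
    problems = []
    if os.path.islink(path) and not os.path.exists(path):
        # The link is there, but what it points to is gone.
        problems.append("symlink target does not exist")
    elif not os.path.exists(path):
        problems.append("not present in " + plugin_dir)
    elif not os.access(path, os.X_OK):
        problems.append("not executable")
    return problems
```

An empty list only means the file is in place and executable for the user running the check; the remaining items still need the munin-run and log checks described earlier.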
Other mumbo-jumbo¶
- Run the different stages in munin-cron manually, using --debug, --nofork, something like this:
# su - munin -c "/usr/lib/munin/munin-update \
--debug --nofork \
--host foo.example.com \
--service df"