Asynchronous proxy node

Context

We already discussed that munin-update is the fragile link in the munin architecture. A missed execution means that some data is lost.

The problem: updates are synchronous

In Munin 1.x, updates are synchronous: the epoch and value recorded for each service are the ones munin-update retrieves on each scheduled run.

The issue is that munin-update has to ask every service on every node for its values on every run. Since the values are only computed when asked for, munin-update has to wait quite some time for every value.

This design is very simple, and it enables munin to have the simplest possible plugins, since they are completely stateless. While this is one of munin's greatest strengths, it deals a severe blow to scalability: more plugins and/or nodes obviously means slower retrieval.

Evolving solution: parallel fetching

1.4 addresses some of these scalability issues by implementing parallel fetching. It takes advantage of the fact that most of the execution time of munin-update is spent waiting for replies.

Note that the max_processes configuration parameter controls how many nodes munin-update queries in parallel.
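For example, allowing up to 16 parallel node connections is a one-line change in munin.conf (the value is illustrative; tune it to your master):

  # /etc/munin/munin.conf
  max_processes 16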

With fetching parallelized, I/O becomes the next limiting factor, since updating many RRD files in parallel means massive and completely random I/O for the OS underlying the munin master.

Serializing & grouping the updates is possible with the rrdcached daemon from rrdtool starting at 1.4 and on-demand graphing. This looks very promising, but doesn’t address the root defect in this design : a hard dependence of regular munin-update runs. And upon close analysis, we can see that 1.4 isn’t ready for rrdcached as it asks for flush each run, in munin-limits.

2.0: Stateful plugins (supersampling)

2.0 provides a way for plugins to be stateful. They can schedule their polling themselves, and when munin-update runs they only emit the values they have already computed. This way, a missed run isn't as dramatic as in the 1.x series, since no data is lost. Data collection is also much faster, because the real computing is done ahead of time. This behavior is called supersampling.
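As an illustration, a supersampling plugin can hand the master several timestamped samples in one fetch. The plugin and field names below are made up; the epoch:value form is the timestamped value syntax munin-update accepts:

  # fetch output of a hypothetical stateful plugin: three samples collected
  # ahead of time, each value prefixed with the epoch it was taken at
  temp.value 1286874600:21.5
  temp.value 1286874660:21.7
  temp.value 1286874720:21.4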

2.0: Asynchronous proxy node

But changing plugins to be self-polled is difficult and tedious. It even works against one of the real strengths of munin: having very simple, and therefore stateless, plugins.

To address this concern, a proxy node was created. In 2.0 it takes the form of two tools: munin-asyncd and munin-async.

The proxy node in detail (munin-async)

Overview

These two processes form an asynchronous proxy between munin-update and munin-node. This avoids the need to change the plugins or to upgrade munin-node on all nodes.

munin-asyncd should be installed on the same host as the proxied munin-node in order to avoid any network issues. It is the process that polls munin-node regularly. The I/O issue munin-update suffers from does not exist here, since munin-asyncd stores all the values by simply appending them to text files without any processing. There is one file per plugin, rotated on a fixed timeframe.

These files are later read by the munin-async client, which is typically accessed via ssh from munin-update. Here again no fancy processing is done: the files are simply read back to the calling munin-update and processed there. This way the overhead on the node is minimal.

The nice part is that the munin-async client does not need to run on the node; it can run on a completely different host. All it takes is synchronizing the spool directory. The sync can be periodic (think rsync) or real-time (think NFS).
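A periodic sync can be as simple as an rsync pull from the node to wherever munin-async runs (the hostname and paths are illustrative; /var/lib/munin-async is a common default spool location):

  # mirror the node's spool to the host that answers spoolfetch requests
  rsync -a --delete node.example.org:/var/lib/munin-async/ /var/lib/munin-async/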

In the same spirit, munin-asyncd can also be hosted elsewhere, for example for disk-less nodes.

Specific update rates

Having one proxy per node makes it possible to poll each service there at a plugin-specific update rate.

To achieve this, munin-asyncd can optionally fork into multiple processes, one per plugin. This way each plugin is completely isolated from the others: it can set its own update_rate, it is shielded from other plugins' slowdowns, and the information gathering is fully parallelized.
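For example, a plugin that wants to be sampled every 60 seconds simply declares it in its config output (the graph and field names are made up; update_rate is given in seconds):

  graph_title Hypothetical fast sensor
  update_rate 60
  temp.label Temperature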

SSH transport

munin-async uses the new native SSH transport of 2.0. It makes installing the async proxy very simple.

Notes

In 1.2 a service is the same as a plugin, but since 1.4 and the introduction of multigraph, one plugin can provide multiple services. Think of it as one service, one graph.

Installation

The munin asynchronous proxy node (or "munin-async") is a helper that connects to the local node periodically and spools the results.

When the munin master connects, all the data is available instantly.

munin-asyncd

The Munin async daemon starts at boot and connects to the local munin-node periodically, like a munin master would. The results are stored in a spool, tagged with a timestamp.

You can also use munin-asyncd to connect to several munin nodes; you will then need one spooldir for each node you connect to. This enables a "fanout" setup, with one privileged node per site and site-to-site communication protected by ssh.
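A sketch of such a fanout, with one munin-asyncd instance per proxied node and one spooldir each (the hostnames and paths are made up, and option spellings may vary between versions, so check munin-asyncd --help):

  # one spooldir per node; both instances run on the privileged site node
  munin-asyncd --host nodeA.example.org:4949 --spooldir /var/lib/munin-async/nodeA &
  munin-asyncd --host nodeB.example.org:4949 --spooldir /var/lib/munin-async/nodeB &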

munin-async

The Munin async client is invoked by the connecting master, and reads from the munin-async spool using the “spoolfetch” command.

Example configuration

On the munin master

We use ssh-encapsulated connections with munin-async. In the munin master configuration you need to configure the host with an "ssh://" address.

[random.example.org]
  address ssh://munin-async@random.example.org

You will need to create an SSH key for the "munin" user, and distribute its public key to all nodes running munin-asyncd.
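For example (a sketch only; the munin user's home directory varies by distribution, /var/lib/munin is common):

  # generate an unattended key pair as the munin user on the master
  sudo -u munin ssh-keygen -t ed25519 -f /var/lib/munin/.ssh/id_ed25519 -N ""
  # then append the resulting .pub file to authorized_keys on each async node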

The ssh command and options can be customized in munin.conf with the ssh_command and ssh_options configuration options.
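For instance (the values shown are illustrative only; the built-in defaults are usually fine):

  # in munin.conf
  ssh_command /usr/bin/ssh
  ssh_options -o BatchMode=yes -o ConnectTimeout=10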

On the munin node

Configure your munin node to only listen on “127.0.0.1”.
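In munin-node.conf this is the host directive:

  # /etc/munin/munin-node.conf
  host 127.0.0.1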

You will also need to add the public key of the munin user to the authorized_keys file of the account the master connects as ("munin-async" in the example above).

  • You must add a “command=” parameter to the key to run the command specified instead of whatever command the connecting user tries to use.
command="/usr/share/munin/munin-async --spoolfetch" ssh-rsa AAAA[...] munin@master

The following options are recommended for security, but are not strictly necessary for the munin-async connection to work:

  • You should add a “from=” parameter to the key to restrict where it can be used from.

  • You should add hardening options. At the time of writing, these are “no-X11-forwarding”, “no-agent-forwarding”, “no-port-forwarding”, “no-pty” and “no-user-rc”.

    Some of these may also be set globally in /etc/ssh/sshd_config.

no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,no-user-rc,from="192.0.2.0/24",command="/usr/share/munin/munin-async --spoolfetch" ssh-rsa AAAA[...] munin@master

See the sshd_config(5) and authorized_keys(5) man pages for more information.