In this, the third article in our Collector Highlight Series, we take a look at collectd, a fantastic enterprise-grade collection agent. Librato features a native integration with collectd, enabling instant server metrics and shareable dashboards out of the box.
What Is It?
Collectd is a modular metrics collection daemon written in C. It loops through a list of user-specified plugins, executing each to gather performance metrics from the OS or from locally running userspace processes. Once gathered, collectd emits these metrics on a set interval, using one or more output plugins, to targets like log files, aggregation daemons like StatsD, and metrics processing systems like Librato. Collectd is a great way to begin collecting data, offering a wide range of useful metrics for a very small operational investment.
Collectd is a great fit for you if:
- You want a flexible collection agent that follows the stand-alone agent pattern to gather performance metrics from your systems.
- You’re running virtual machine instances and want to grab per-instance CPU, disk, and memory metrics.
- You want a simple way to collect metrics from running server processes like MySQL, Apache, Redis, Nginx, or MongoDB.
- You’re looking for a well-documented, widely used, and trusted open-source collection agent that is available on most Linux distributions.
How Do I Install It?
Collectd is installed on every system you want to monitor, and installation is simple. It runs as a stand-alone daemon process and is configured by way of a classical UNIX conf file in /etc/collectd. You can obtain and build collectd from source, but packages exist for all major distros and most smaller ones.
Getting started with collectd on a Debian-based system is straightforward:
apt-get install collectd
How Does It Work?
Collectd generally follows the Stand-Alone Agent Pattern. It runs on every host you want to monitor, and either reports directly to an upstream metrics aggregator, or writes metrics to the local filesystem.
Starting the daemon
If collectd isn’t already running, you should be able to start the daemon using the appropriate init method for your OS, or directly by executing collectdmon. If collectd won’t run, or if it appears to be constantly restarting, you can run it manually with an -f switch, which will prevent it from forking into the background.
Collectd also includes the -t switch, which tests the validity of the configuration file and is helpful for troubleshooting startup problems.
Collectd's behavior is dictated by two types of plugins. Input plugins gather performance data from the OS or applications running on the system. The CPU input plugin, for example, interacts with the OS to measure the same CPU-related metrics returned by the UNIX top command, like the percent of time the CPU spends executing user-space processes or waiting on I/O.
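To make the CPU example concrete, here is a minimal sketch of the arithmetic behind figures like "percent of time in user space" or "percent waiting on I/O". It is not collectd's implementation (the cpu plugin is written in C); it assumes a Linux host whose /proc/stat aggregate cpu line reports at least eight counters, and the two sample lines are made up for illustration.

```python
# Illustrative only: shows how per-state CPU percentages can be derived
# from two readings of the aggregate "cpu" line in Linux /proc/stat.

# Order of the first eight counters on the aggregate "cpu" line.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn an aggregate cpu line into a {state: ticks} dictionary."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    counters = [int(p) for p in parts[1:1 + len(FIELDS)]]
    if len(counters) < len(FIELDS):
        raise ValueError("cpu line has fewer counters than expected")
    return dict(zip(FIELDS, counters))


def cpu_percentages(before, after):
    """Percent of elapsed CPU time spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Two hypothetical samples taken one polling interval apart.
SAMPLE_T0 = "cpu 1000 0 500 8000 400 50 50 0 0 0"
SAMPLE_T1 = "cpu 1300 0 600 8550 450 50 50 0 0 0"

if __name__ == "__main__":
    pct = cpu_percentages(parse_cpu_line(SAMPLE_T0), parse_cpu_line(SAMPLE_T1))
    for name in FIELDS:
        print(f"{name:8s} {pct[name]:5.1f}%")
```

The counters are cumulative, so a single reading says little; the difference between two readings taken one interval apart is what gets turned into a percentage.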
The nginx plugin, by comparison, queries a running nginx server to gather metrics like the current number of requests and connection information. Users are encouraged to write their own plugins to pull data from specific resources and contribute them back to the project so others can benefit from them.
Output plugins are then used to send the gathered metrics data to other services for storage or analysis. The write_http plugin is one example of an output plugin, sending metrics data to a remote webserver in the prescribed JSON format. Other output plugins support graphing systems like RRDTool, the AMQP message transport, or even humble CSV files.
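For a feel of what that JSON looks like, the sketch below builds one value list in the shape we understand write_http to emit: a JSON array of objects carrying the values plus identifying fields. The field names follow the collectd documentation as best we recall it, so check them against your collectd version; the host name and load figures are hypothetical.

```python
import json
import time


def make_value_list(host, plugin, type_, values, dsnames, dstypes,
                    plugin_instance="", type_instance="",
                    interval=10, timestamp=None):
    """Build one value list shaped like a write_http JSON entry (assumed layout)."""
    if not (len(values) == len(dsnames) == len(dstypes)):
        raise ValueError("values, dsnames and dstypes must have equal length")
    return {
        "values": list(values),
        "dstypes": list(dstypes),
        "dsnames": list(dsnames),
        "time": time.time() if timestamp is None else timestamp,
        "interval": interval,
        "host": host,
        "plugin": plugin,
        "plugin_instance": plugin_instance,
        "type": type_,
        "type_instance": type_instance,
    }


def encode_payload(value_lists):
    """Serialize a batch of value lists as a single JSON array."""
    return json.dumps(list(value_lists))


if __name__ == "__main__":
    # Hypothetical reading from the load plugin on a host named web01.
    load = make_value_list(
        host="web01.example.com",
        plugin="load",
        type_="load",
        values=[0.42, 0.37, 0.30],
        dsnames=["shortterm", "midterm", "longterm"],
        dstypes=["gauge", "gauge", "gauge"],
        timestamp=1400000000,
    )
    print(encode_payload([load]))
```

A receiving service only needs to parse the array and read the host, plugin, and type fields to decide where each value belongs.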
Many plugins exist for collectd. The default collectd installation on the current Ubuntu LTS (trusty) comes preconfigured with 100 plugins, 14 of which are automatically enabled. To give you a feel for the sorts of metrics that are collected out of the box, here are all of the input plugins that are enabled by default:
- battery - For systems with internal batteries like laptops
- cpu - CPU stats (%wait, %user, etc.)
- df - Filesystem capacity (inodes free, etc.)
- disk - Disk Performance (I/O per second)
- entropy - Amount of entropy available in the kernel's random pool
- interface - Network Interface (I/O per second)
- irq - Times per second the OS has handled an interrupt
- load - 1, 5 and 15 minute Load average
- memory - RAM usage
- processes - Number of processes grouped by state (running, sleeping, stopped, etc.)
- swap - Swap capacity and usage
- users - Number of users currently logged in
Plugin configuration and dependencies
For each plugin that collectd loads, there is a LoadPlugin line in the collectd.conf file. Some plugins require only this line, although most require some additional configuration to do things like specify formats, or locate files or directories in the filesystem.
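As a rough illustration of that layout, the sketch below embeds a hypothetical collectd.conf excerpt and reports which loaded plugins also carry a configuration block. The excerpt is invented for this example, and the regular expressions are a simplification that ignores corner cases of the real config grammar.

```python
import re

# Hypothetical excerpt of /etc/collectd/collectd.conf, invented for this sketch.
SAMPLE_CONF = '''\
LoadPlugin cpu
LoadPlugin load
LoadPlugin df
#LoadPlugin nginx

<Plugin df>
  MountPoint "/"
  ReportInodes true
</Plugin>
'''

# Matches both 'LoadPlugin name' lines and opening '<LoadPlugin name>' tags.
LOAD_RE = re.compile(
    r'^[ \t]*<?LoadPlugin[ \t]+"?([\w-]+)"?[ \t]*>?[ \t]*$',
    re.IGNORECASE | re.MULTILINE,
)
# Matches opening '<Plugin name>' tags (not '</Plugin>' or '<LoadPlugin ...>').
BLOCK_RE = re.compile(
    r'^[ \t]*<Plugin[ \t]+"?([\w-]+)"?[ \t]*>',
    re.IGNORECASE | re.MULTILINE,
)


def loaded_plugins(conf_text):
    """Names of plugins with an active (uncommented) LoadPlugin line."""
    return [m.group(1) for m in LOAD_RE.finditer(conf_text)]


def configured_plugins(conf_text):
    """Names of plugins that have a <Plugin ...> configuration block."""
    return {m.group(1) for m in BLOCK_RE.finditer(conf_text)}


def summarize(conf_text):
    """Map each loaded plugin to True if it also has a configuration block."""
    blocks = configured_plugins(conf_text)
    return {name: name in blocks for name in loaded_plugins(conf_text)}


if __name__ == "__main__":
    for name, has_block in summarize(SAMPLE_CONF).items():
        detail = "LoadPlugin + <Plugin> block" if has_block else "LoadPlugin only"
        print(f"{name:6s} {detail}")
```

In the excerpt, cpu and load need nothing beyond their LoadPlugin line, while df carries extra settings that tell it which mount point to report on.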
A few plugins depend on other plugins to operate. A notable example is the GenericJMX plugin, which requires the Java plugin to function. Settings and dependency information for each plugin are fully documented on the collectd wiki.
Collectd's polling interval is controlled by the Interval attribute in the collectd.conf file. Because many upstream visualization tools make assumptions based on this interval, you should think carefully about your desired resolution, set it once and avoid changing it, and take steps to ensure that this setting remains the same on every host.
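One way to keep that setting honest is a small audit script. The sketch below assumes you have already gathered each host's collectd.conf text into a dictionary (how you fetch the files is up to your tooling), takes the first Interval line as the global setting, ignores any per-plugin overrides, and falls back to collectd's default of 10 seconds when no Interval line is present. The host names and configs are hypothetical.

```python
import re

DEFAULT_INTERVAL = 10.0  # collectd's default polling interval, in seconds

INTERVAL_RE = re.compile(
    r'^[ \t]*Interval[ \t]+([0-9]+(?:\.[0-9]+)?)[ \t]*$',
    re.IGNORECASE | re.MULTILINE,
)


def global_interval(conf_text):
    """Return the first Interval value found, or the default if none is set.

    Simplification: per-plugin Interval overrides are not distinguished
    from the global setting; the first match wins.
    """
    match = INTERVAL_RE.search(conf_text)
    return float(match.group(1)) if match else DEFAULT_INTERVAL


def mismatched_hosts(confs, expected):
    """Sorted names of hosts whose interval differs from the expected one."""
    return sorted(
        host for host, text in confs.items()
        if global_interval(text) != float(expected)
    )


# Hypothetical per-host config excerpts.
CONFS = {
    "web01": "Interval 10\nLoadPlugin cpu\n",
    "web02": "Interval 60\nLoadPlugin cpu\n",
    "db01": "LoadPlugin cpu\n",  # no Interval line, so the default applies
}

if __name__ == "__main__":
    for host in mismatched_hosts(CONFS, expected=10):
        print(f"{host}: interval is {global_interval(CONFS[host]):g}s, expected 10s")
```

Running a check like this from your configuration management or CI pipeline catches a drifting host before it skews the resolution of your graphs.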
With collectd's network plugin, you can designate one or more collection servers to which all hosts emit their metrics. This can simplify per-host configuration and reduce the number of network access rules you need to maintain. It also provides a means to aggregate and proxy a site-wide metrics stream, by configuring the collection server to write to an upstream service like Librato.
Emitting to Librato
Our turn-key collectd integration uses the native collectd write_http plugin, which is included in every collectd source and binary package since version 4.8. Simply enable our collectd integration from your account settings, and we'll provide you with a write_http plugin configuration stanza to append to your collectd.conf file.
By default, Librato will use the fully qualified domain name (FQDN) of the sending host as the metric source name. If you use cloud-based or otherwise ephemeral infrastructure and wish to specify the source name yourself, you can override the system FQDN by setting the Hostname option in collectd.conf.
Librato’s turn-key collectd integration includes a service-side metrics filter. Configuring a service-side whitelist makes it simple to standardize the set of collectd metrics you store across every source and avoid surprises on your monthly bill.
Hints, Tradeoffs, and Gotchas
Modifying collectd's polling interval will affect the resolution of your metrics in upstream visualization systems. Some systems handle this better than others. RRDTool, for example, is heavily dependent on a pre-configured polling interval, so changing this setting could render your existing RRDs unusable. Again, set it carefully, and don’t touch it.
Whether you're running ephemeral cloud instances or physical hardware, collectd and Librato are a simple, scalable, and powerful solution for systems monitoring. Sign up for a free trial today, and we'll have you correlating systems metrics across your infrastructure in minutes.