When development started on NGINX in 2002, the goal was to build a web server that outperformed Apache. While NGINX may not offer every feature available in Apache, its default configuration can handle roughly four times as many requests per second, while using significantly less memory.
Switching to a web server with better performance may seem like a no-brainer, but it's important to have a monitoring solution in place to ensure that your web server is performing optimally and that visitors to the NGINX-hosted site receive the best possible experience. But how do we ensure the experience is as performant as expected for all our users?
This article will help you put together a monitoring plan for your NGINX deployments. We'll look at which metrics you should monitor and why they matter, and then walk through putting a monitoring plan in place using Librato.
Monitoring is a Priority
As engineers, we all understand and appreciate the value that monitoring provides. In the age of DevOps, however, when engineers are responsible for both building solutions and deploying them into production, monitoring is often relegated to the list of things we plan to do in the future. To be the best engineers we can be, monitoring should be a priority from day one.
From an engineering perspective, accurate and effective monitoring allows us to test the efficiency of our solutions, and assists in identifying and troubleshooting inefficiencies and other potential problems. Once the solution moves into operational support, monitoring ensures that the application is running efficiently and alerts us when things go wrong. In addition, an effective monitoring plan helps identify problems before they impact users, allowing engineers to resolve issues proactively instead of purely reactively.
Specific Metrics to Consider with NGINX
Before we can develop a monitoring plan, we need to know what metrics are available to be monitored, understand what they mean, and how we can use them. There are two distinct groups of metrics we should be concerned with—metrics related to the web server itself, and those related to the underlying infrastructure.
While a highly performant web server like NGINX may be able to handle more requests and traffic, it is vital that the machine hosting the web server has the necessary resources as well. Each metric represents a potential limitation affecting the performance of your application. Ultimately, you want to ensure your web server and underlying infrastructure are able to operate efficiently without approaching those limitations.
NGINX Web Server-specific Metrics
Uptime: Usually measured in seconds, this indicates how long the web server has been running, and is useful for detecting server restarts.
Connections: The number of client connections to the server. This may include actual users as well as automated tasks or bots.
Requests: Each connection may make one or more requests to the server. This number is the total count of incoming requests.
Traffic: A measure of the amount of data traveling between clients and the server, in both directions. This number is affected by the type of traffic; if clients are uploading and downloading images, for instance, you would expect it to be higher. It's also important to track the unit of measurement (bytes, kilobytes, etc.).
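NGINX exposes its connection and request counters through the stub_status module (the open source counterpart to the extended status API in NGINX Plus). The figures are illustrative, but the output of a stub_status endpoint looks like this:

```
Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
```

The three numbers on the middle line are the total accepted connections, handled connections, and client requests; monitoring agents typically poll this endpoint and convert the counters into per-second rates.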
Infrastructure-specific Metrics
CPU Usage: An indication of the processor utilization of the underlying machine. On a multi-core machine, this should be measured as utilization across all cores.
Memory Usage: The amount of memory currently in use on the machine.
Swap Usage: Swap is disk space the host machine uses when it runs out of physical memory, or to page out memory that has gone unused for a period of time. It is significantly slower than RAM, so when an application begins relying on swap space, it's usually a sign that something is amiss.
Network Traffic: Similar to web server traffic, this is a measurement of data flowing in and out of the machine. Again, pay attention to the units of measurement here as well.
Disk Usage: Even if the web server is not storing content files on the host machine, space is required for logging, temporary files, and other supporting files.
Load: Load summarizes how busy the machine is as a single number: the average number of processes either running or waiting for resources. A common rule of thumb is that the load on the machine should stay below the number of processing cores.
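That rule of thumb is easy to check programmatically. Here's a minimal Python sketch (the function name is my own) that compares the one-minute load average to the core count using only the standard library:

```python
import os


def load_per_core():
    """Return the 1-minute load average divided by the number of CPU cores.

    A ratio at or above 1.0 suggests processes are queuing for CPU time.
    """
    one_minute, _, _ = os.getloadavg()  # 1-, 5-, and 15-minute averages
    cores = os.cpu_count() or 1         # cpu_count() can return None
    return one_minute / cores


if __name__ == "__main__":
    ratio = load_per_core()
    print(f"load per core: {ratio:.2f}")
```

The same comparison is what you'd encode later as an alert threshold: trigger when load per core approaches 1.0.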
Let’s look at how to configure monitoring on your instances with Librato, along with building a dashboard which will show each of those metrics.
Installing the Librato Agent on the Server
Before you start, you’ll need an account set up with Librato. If you don’t already have one, you can create a demo account which will give you 30 days to try the service, free of charge. Sign up at https://metrics.librato.com/sign_up.
To allow Librato to aggregate metrics from the server, the first step is to install the agent on each instance. You'll need to reference your Librato API key when setting up the agent, so log into your Librato account and navigate to the Integrations page.
Locate the Librato Agent integration, and click on it. It should look similar to the image below.
I used the Easy Install option when setting up the instances for this article. Ensure that Easy Install is selected, and select your Linux distribution. I used an Ubuntu image in the AWS Cloud, but this will work on almost any Linux server.
Note: Prior to installation of the agent, the second box on the screen below will not contain the success message.
Copy the command from the first box, and then SSH into the server and run the Easy Install script.
When the agent installs successfully, you'll see a confirmation message in your terminal. The "Confirm successful installation" box on the Librato agent screen should look similar to the above, with a green check mark, and you should also see "Nice! You've hooked up the Librato Agent correctly."
Configuring the Librato Agent
With the agent installed, the next step is to configure NGINX to report metrics to the agent. Navigate back to the Integrations page and locate the NGINX integration.
Click on the integration, and the following panel will appear. Enable the integration, and then click on the Configuration button. A second panel will appear with specific configurations to be made to the instance. Some familiarity with NGINX and its configuration will be required at this point. Once these changes have been made, you’ll need to restart the NGINX service for the changes to take effect, and your metrics will start flowing from the server into Librato.
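The configuration changes typically involve enabling NGINX's stub_status endpoint so the agent can poll it. The exact settings are shown in Librato's Configuration panel, but a minimal sketch looks like the following (the port and location path here are illustrative, not Librato's required values):

```nginx
server {
    listen 127.0.0.1:8080;

    location /nginx_status {
        stub_status on;   # expose connection/request counters
        access_log off;   # don't log the agent's polling requests
        allow 127.0.0.1;  # only the local agent may read the stats
        deny all;
    }
}
```

After saving the change, reload NGINX with `sudo nginx -s reload` (or restart the service) for it to take effect.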
When everything is configured, either click on the NGINX icon under Associated Spaces, or navigate to the Spaces page directly, and then select the NGINX icon to view the default dashboard provided by Librato.
Working With the NGINX Space
The default NGINX Space provided by Librato offers many of the metrics we discussed earlier, related to the performance of the web server itself, and should look similar to the image below.
Now we need to add some additional metrics in order to get a full picture of the performance of our server. You can't make changes to the default space, but it's easy to create a copy and then add metrics of your own. Start by clicking the Copy Space button at the top of the screen.
Create a name for your custom space. For this example, I'm monitoring an application called Retwis, so I'm calling mine "NGINX-Retwis." It's also helpful to select the "Open Space on Completion" option, so you don't have to go looking for the space after it's created.
Alright, let’s do some customization. First, we want to ensure that we’re only monitoring the instances we need to. We do this by filtering the chart or space, and you can find out more about how to set and filter these in the documentation for Dynamic Tags.
With our sources filtered, now we can add some additional metrics. Let’s look at CPU usage, Memory Usage, and Load. Look at the bottom right of the space, and click on the Plus button. For CPU and Memory usage, let’s add a Stacked chart. We’ll add one for each. Click on the Stacked icon.
In the Metrics search box, type “CPU” and hit enter. A selection of available metrics will appear below. I’m going to select AWS.EC2.CPUUtilization, but your selection may be different depending on the infrastructure you’re using. Select the checkbox next to the appropriate metric, and then click on Add Metrics to Chart. You can add multiple metrics to the chart by repeating the same process, but we’ll stick with one for now.
If you click on Chart Attributes, you can change the scale of the chart, adjust the Y-axis label, and even link it to another space if you want to show more detail for a specific metric. When you're done, click on the green Save button, and you'll be returned to your space with the new chart added. Repeat this for Memory Usage; EC2 doesn't expose a memory metric, so I chose the "librato.memory.memory.used" metric reported by the agent.
For Load, I’m going to use a Big Number Chart Type, and select the librato.load.load.shortterm metric. When you’re done, your chart should look similar to what is shown below.
Pro tip: You can move charts around by hovering over a chart, clicking on the three dots which appear at the top of the chart, and dragging it around. Clicking on the menu icon on the top right of the chart will allow you to edit, delete, and choose other options related to the chart.
Once you have a monitoring plan in place and functioning, the next step is to determine baseline metrics for your application and set up alerts that trigger when significant deviations occur. One useful baseline to determine and monitor is traffic. A significant reduction in traffic may indicate a problem preventing clients from accessing the service. A significant increase may indicate growth in your client base, requiring an increase in the capacity of your environment, or, potentially, the deployment of defensive measures in response to a cyber attack.
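As a sketch of what such a deviation-based alert condition might look like, here's a small Python example. The function names and the 50% threshold are my own illustrative choices, not Librato's alerting API:

```python
def traffic_deviation(current_rps, baseline_rps):
    """Fractional deviation of current requests/sec from the baseline rate."""
    if baseline_rps <= 0:
        raise ValueError("baseline must be positive")
    return (current_rps - baseline_rps) / baseline_rps


def should_alert(current_rps, baseline_rps, threshold=0.5):
    """Trigger when traffic deviates more than +/- threshold from baseline."""
    return abs(traffic_deviation(current_rps, baseline_rps)) > threshold
```

With a baseline of 100 requests per second and the default threshold, a drop to 40 or a spike to 160 would both trigger the alert, while normal fluctuation around 95 would not.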
To learn more about Alerts, visit the Librato knowledge base, and view the documentation on Alerts.