By Matt Delaney, DevOps, Edmonton
There is a vast treasure trove of information hiding inside the log files on your servers, but visualizing and exploring such a large amount of textual data can be difficult. There are a few problems you must first overcome.
For starters, these log files are often distributed over many machines, so you must first aggregate them into a single location. Next, you'll need to find the information you are looking for (which can involve doing a fair bit of command-line magic), and even then it may be difficult to visualize without the aid of some graphical tools.
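To give a taste of that command-line magic, here is a hypothetical hunt for slow quiz requests in a single Apache access log. The sample lines and field positions are illustrative, and the last field is assumed to be the response time in microseconds (via Apache's `%D` directive):

```shell
# Sample Apache combined-format log lines; the last field is assumed to be
# the response time in microseconds (logged via Apache's %D directive).
cat > access.log <<'EOF'
10.0.0.1 - - [12/Mar/2014:10:01:02 -0700] "GET /mod/quiz/view.php HTTP/1.1" 200 5120 1843920
10.0.0.2 - - [12/Mar/2014:10:01:05 -0700] "POST /mod/quiz/attempt.php HTTP/1.1" 200 2048 15234811
10.0.0.3 - - [12/Mar/2014:10:01:07 -0700] "GET /index.php HTTP/1.1" 200 4096 120334
EOF

# Find quiz requests that took longer than 10 seconds (10,000,000 us):
# $7 is the request path, $NF is the response time.
awk '$7 ~ /quiz/ && $NF > 10000000 { print $4, $7, $NF }' access.log
```

Even this simple query assumes the logs are already sitting on one machine; repeat it across a fleet of servers and it quickly becomes tedious.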
Logstash directly addresses these common pain points by aggregating log files, parsing them, and inserting them into a searchable database (Elasticsearch), while providing a number of ways to visualize the data (Kibana).
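A minimal Logstash pipeline covering those steps might look like the sketch below. The file path is illustrative, and option names vary between Logstash versions; the `COMBINEDAPACHELOG` grok pattern ships with Logstash and parses standard Apache access-log lines into named fields:

```
input {
  file {
    path => "/var/log/apache2/access.log"
  }
}
filter {
  # Parse each raw log line into structured fields
  # (clientip, request, verb, response, bytes, ...)
  grok {
    match => [ "message", "%{COMBINEDAPACHELOG}" ]
  }
}
output {
  elasticsearch {
    host => "localhost"
  }
}
```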
In this series of blog posts, I'll take a look at how Cybera was able to leverage Logstash to find the cause of an elusive performance issue we experienced with our Learning Management Cloud. I'll then show how to set up Logstash to monitor and analyze the log files of a number of different systems.
So without further ado, let's dive in.
In the Learning Management Cloud we run Moodle instances for a number of post-secondary institutions. Recently, we received reports of occasional slow Moodle page load times; specifically, a few users said their quizzes were taking a long time (more than 15 seconds) to load or submit answers. Initially we had nothing more to go on than a few dates and times at which people had noticed the problem. A natural place to look was the Apache access logs, so that is where we started.
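One detail worth noting: stock Apache access logs do not record request durations, so any response-time analysis assumes the log format was extended, for example with Apache's `%D` directive (response time in microseconds). The format name and log path below are illustrative:

```
LogFormat "%h %l %u %t \"%r\" %>s %b %D" combined_with_time
CustomLog /var/log/apache2/access.log combined_with_time
```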
The above graphs were generated using Kibana (often used as a front-end to visualize Logstash data). The left graph shows the number of Apache requests over time, the middle shows the mean response time for requests, and the right shows the mean number of bytes per request. The response time graph shows three distinct spikes, so this is a good sign that we're on the right track.
To look at only quiz related traffic, we filter out any request that does not have the word 'quiz' in the request path. This produces the following graphs over the same timeframe:
The middle graph shows the same spikes in response time, confirming that the slowdowns involve quiz traffic. Now we need to look at what is happening just before each spike, where hopefully a pattern will start to emerge.
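In Kibana, that quiz filter is a one-line Lucene-style query. The field name below assumes the logs were parsed with the stock `COMBINEDAPACHELOG` grok pattern:

```
request: *quiz*
```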
Aside from creating nice graphs, Kibana can also show data in nice sortable tables. The table below is sorted to show requests in descending order of response time. Because there are so many requests, we narrowed the timespan to only encompass the first spike in response time and only consider POST requests (based on a hunch that anything that hits the system really hard would likely be the result of a POST request).
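Restricting to POSTs is just one more clause on the same kind of query (assuming `COMBINEDAPACHELOG` field names, where the HTTP method is parsed into `verb`):

```
request: *quiz* AND verb: POST
```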
Sure enough, the very first request takes about 182 seconds to complete, and it occurs at the very start of the first spike in response time. It is a similar story for the other two spikes: both are preceded by a POST request to modduplicate.php. Below is a graph that we generated to show the occurrence of the load spikes, with the times of modduplicate requests tagged.
As you can see, each large spike is immediately preceded by a POST to modduplicate. It turns out that this request kicked off a database-intensive operation that created a bottleneck, drastically slowing down every other request that hit the database. When we did a similar investigation of other spikes in response time, we found they too were preceded by database-intensive requests.
Now, imagine attempting the above analysis with typical command-line tools. Just finding the exact times of the response-time spikes would have been difficult and time-consuming, and that was only the first step in locating possible causes.
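To make the contrast concrete, here is roughly what a single step (computing mean response time per minute for one log file) looks like at the command line. The sample lines are invented, and the last field is again assumed to be `%D` microseconds:

```shell
# Sample Apache log lines; last field assumed to be response time in us.
cat > sample.log <<'EOF'
10.0.0.1 - - [12/Mar/2014:10:01:02 -0700] "GET /index.php HTTP/1.1" 200 4096 120000
10.0.0.1 - - [12/Mar/2014:10:01:40 -0700] "GET /index.php HTTP/1.1" 200 4096 180000
10.0.0.2 - - [12/Mar/2014:10:02:15 -0700] "POST /course/modduplicate.php HTTP/1.1" 200 512 182000000
EOF

# Mean response time per minute: truncate the timestamp in $4
# ("[12/Mar/2014:10:01:02") to minute precision, then average $NF.
awk '{
  minute = substr($4, 2, 17)
  sum[minute] += $NF; n[minute]++
}
END {
  for (m in sum) printf "%s  mean=%.0f us\n", m, sum[m] / n[m]
}' sample.log | sort
```

And that handles one file on one machine, with no interactivity: every new question means writing another one-liner, where in Kibana it is a few clicks.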
Hopefully, this gives you an idea of how powerful and time-saving Logstash can be. In my next few posts, I will go step-by-step through the process of setting up a Logstash server, monitoring log files, and analyzing the data.