Introduction to Logstash: Getting Started

By Matt Delaney, DevOp, Edmonton

In my last post I talked a bit about how Cybera uses Logstash in its infrastructure to solve some performance issues we ran into on our Learning Management Cloud. In today's blog I'll take a closer look at what Logstash is. We'll also get our hands dirty, set up a very simple Logstash instance (on a single machine), and start pulling in some logs. For simplicity's sake, I will assume you are running Logstash on Linux (specifically Ubuntu).

The purpose of Logstash is to take events from any number of inputs (a file, a queue, another Logstash instance, etc.), apply filters (to parse, modify, or perform any number of other processing tasks), and finally send the results to any number of outputs (Elasticsearch, Nagios, another Logstash instance, etc.).
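To see that pipeline in its smallest possible form before we add anything real, consider a config like the following (a minimal sketch for experimenting, not a production setup) that reads events from stdin and writes them straight back to stdout:

input {
  # Read events from the terminal, one line per event
  stdin { }
}
output {
  # Print each event back to the terminal
  stdout { }
}

Everything that follows is just a matter of swapping in richer inputs, filters, and outputs.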

As I mentioned, Logstash works with events. The simplest definition of an event is that it is really just a timestamp with some data. Let's consider an Apache access log: here, an event starts out as a line in your access log that contains, among other things, an IP address, a timestamp, a verb, a request, a response code, and a user agent.
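To make "a timestamp with some data" concrete, here is roughly what such a line becomes once Logstash has parsed it into an event (an illustrative sketch; the field names come from the grok pattern we'll set up below, and the exact output depends on your configuration):

{
  "@timestamp": "2014-02-05T19:00:01.000Z",
  "type": "apache-access",
  "clientip": "127.0.0.1",
  "verb": "GET",
  "request": "/",
  "response": "200",
  "bytes": "460",
  "agent": "\"curl/7.22.0 ...\""
}

Note that @timestamp records when Logstash saw the event; the time parsed out of the log line lands in a separate timestamp field unless you add a date filter to copy it over.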

The Logstash website has an excellent tutorial on setting up a standalone Logstash instance and an introduction to the configuration file. So, rather than duplicating their work, I suggest you go through these links now. I'll wait here.

  1. Setting up a standalone Logstash instance
  2. Familiarizing yourself with Logstash's config file

Okay, so at this point I'll assume that you have a working standalone Logstash instance and a basic understanding of the configuration file. Let's start dealing with the Apache access logs.

First, if you don't already have Apache installed, do so now (on Ubuntu you can run sudo apt-get install apache2). Now make a request to your webserver, either by loading it in your web browser or by running curl localhost. Next, take a look at your access log (/var/log/apache2/access.log on Ubuntu) and you should see something like:

127.0.0.1 - - [05/Feb/2014:18:59:55 +0000] "GET / HTTP/1.1" 200 460 "-" "curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3"

There is a lot of information here that isn't in a very readable format, so let's use Logstash to make life a little easier. Enter the following into your Logstash config file:

input {
  # This will let us use our Apache access logs as an input
  file {
    type => "apache-access"
    path => ["/var/log/apache2/access.log*"]
    exclude => ["*.gz"]
  }
}
filter {
  if [type] == "apache-access" {
    # This will parse the apache access event
    grok {
      match => [ "message", "%{COMBINEDAPACHELOG}" ]
    }
  }
}
output {
  # This will send our event to Elasticsearch
  elasticsearch {
    embedded => true
  }
}

The input and output sections should look familiar to you from the Logstash tutorial. Logstash is simply watching for new lines in the access log and, after applying the filter, sending them to Elasticsearch.

The filter is the interesting part. Here we are using the grok filter to parse the access log. Grok lets us assign names to regular expressions and combine them to do more complex pattern matching. One of the patterns supplied with Logstash just happens to match the default combined Apache log format. I'll take a closer look at this shortly.
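Under the hood, every grok pattern is built from %{SYNTAX:SEMANTIC} pairs: SYNTAX names an existing pattern, and SEMANTIC names the field the match is stored in. As a toy example (in the spirit of the grok documentation, not taken from our access log), a line like

55.3.244.1 GET /index.html 15824 0.043

can be matched with

%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}

which yields the fields client, method, request, bytes, and duration. COMBINEDAPACHELOG is simply a much larger composition of the same building blocks.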

For now, let's fire up our standalone Logstash instance using this config file (assuming you went through the first tutorial; the -- web flag also starts the bundled Kibana interface, which we'll use in a moment):

java -jar logstash-1.3.3-flatjar.jar agent -f logstash.conf -- web

Next, make a few more requests to your Apache server using either your browser or curl. If you load up Kibana (http://localhost:9292 if you're running this locally) you should see something similar to this:

[Screenshot: the Kibana dashboard showing the parsed Apache access events]

You can see a list of fields we are parsing out of the log file on the left, and if you click on an event in the table, you'll see its details in a much easier-to-read format. The table can easily be modified to show whichever fields are of interest.

So that works if you use the default log format, but let's say you want to know how long it takes to serve each request. We can alter Apache's log format to capture that by doing the following:

1.  Add the following line to /etc/apache2/apache2.conf (the new %D directive logs the time taken to serve the request, in microseconds):

LogFormat "%h %l %u %t \"%r\" %s %b %D \"%{Referer}i\" \"%{User-agent}i\"" new_format

2.  In /etc/apache2/sites-enabled/000-default find the line

CustomLog ${APACHE_LOG_DIR}/access.log combined

and change it to

CustomLog ${APACHE_LOG_DIR}/access.log new_format

3.  Restart Apache (sudo service apache2 restart)

We'd now get grok parse failures for new access events. Go ahead and make a request, then take a look at the event in Kibana. You'll see that the event has the tag '_grokparsefailure' attached to it. In other words, we can't use the default grok pattern for our new Apache logs, which means we need to craft our own.
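A failed event looks something like this (simplified, with a hypothetical service time of 1247 microseconds; the raw line stays unparsed in the message field and grok adds the failure tag):

{
  "message": "127.0.0.1 - - [05/Feb/2014:19:10:02 +0000] \"GET / HTTP/1.1\" 200 460 1247 \"-\" \"curl/7.22.0\"",
  "type": "apache-access",
  "tags": ["_grokparsefailure"]
}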

When you find yourself in this situation, there are two extremely valuable things you can do: 1) use existing grok patterns as a starting point, and 2) use a grok debugger.

  • The default patterns supplied by Logstash can be found here
  • A grok debugger can be found here

NOTE: When using the debugger, it is useful to see which patterns it is including. To do this, look for the patterns link in the navigation bar at the top.

In this case, our new log is extremely close in format to the 'COMBINEDAPACHELOG' grok pattern, so that is probably a good place to start.

You can develop the new pattern using the grok debugger, but first you'll need to find the existing pattern, as COMBINEDAPACHELOG is not a pattern natively supported by the debugger. If you look in the git repository mentioned above you'll find that COMBINEDAPACHELOG is defined as:

COMBINEDAPACHELOG %{COMMONAPACHELOG} %{QS:referrer} %{QS:agent}

When you put this pattern into the grok debugger you'll notice it does not give a match, because COMMONAPACHELOG is not in the list of default patterns. If you look up this pattern in the Logstash git repository you'll see it is defined as:

COMMONAPACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)

Therefore, the COMBINEDAPACHELOG pattern is really:

COMBINEDAPACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent}

Now, if you use this pattern you'll see that there is still no match (but at least the debugger knows about all the patterns we're using). The difference between our old format and the new one is that we've added a number (the time taken to serve the request) right before the referrer, as the comparison below shows.
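Here are the two formats side by side, using a hypothetical service time of 1247 microseconds (the trailing user-agent details are truncated for readability):

old: 127.0.0.1 - - [05/Feb/2014:18:59:55 +0000] "GET / HTTP/1.1" 200 460 "-" "curl/7.22.0 ..."
new: 127.0.0.1 - - [05/Feb/2014:18:59:55 +0000] "GET / HTTP/1.1" 200 460 1247 "-" "curl/7.22.0 ..."

So, if we add that number to the pattern, we get: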

%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) (?:%{NUMBER:response_time}|-) %{QS:referrer} %{QS:agent}

Sure enough, when you try this in the debugger, you get a match. So now we have a new pattern that we can define as:

NEWAPACHELOG %{COMMONAPACHELOG} (?:%{NUMBER:response_time}|-) %{QS:referrer} %{QS:agent}

Now that we have our new pattern, we need a place to put it. We have two options here: we can either use it directly in our config file, or we can create a new pattern file and refer to the pattern in our configuration file by name. We'll do the latter with the following steps:

1.  Create a new directory called patterns, and inside it a file called 'apache.pattern' with the following content:

NEWAPACHELOG %{COMMONAPACHELOG} (?:%{NUMBER:response_time}|-) %{QS:referrer} %{QS:agent}

2. Edit the grok section in logstash.conf so it reads:

    grok {
       patterns_dir => "patterns"
       match => [ "message", "%{NEWAPACHELOG}" ]
    }
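With both steps in place, the full filter section of logstash.conf would read as follows (a sketch assuming you launch Logstash from the directory containing patterns, since the relative patterns_dir path is resolved from your working directory):

filter {
  if [type] == "apache-access" {
    # Parse the new Apache access format using our custom pattern
    grok {
      patterns_dir => "patterns"
      match => [ "message", "%{NEWAPACHELOG}" ]
    }
  }
}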

Now, if you restart Logstash and make some requests, you should see our new Apache log format being parsed as expected. Success!

We've covered a lot of ground. You should now be able to start constructing grok patterns to parse other single-line event logs. Next time, we'll look at a larger-scale deployment using Chef and start pulling in logs from multiple servers.