Grepping multiline log messages
Grep is very useful for searching large volumes of log messages for specific terms. However, some applications don't adhere to the "one message, one line" convention, so you end up with messages that span anywhere from one to tens of lines. You can of course specify how many lines to display before and after a match, but you still end up with either truncated messages or lines that you don't expect in your output.
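To make the problem concrete, here is a small sketch using plain grep with -A (lines of trailing context). The log contents and the variable names are made up for illustration; the only thing assumed about grep is its -A option.

```shell
# Hypothetical log: the first request has two detail lines, the second has three.
log='[1] Got request
ip: 1.2.3.4
user: foo
[2] Some error
[3] Got request
ip: 1.2.3.6
user: bar
id: 10'

# Two lines of context: complete for the first request, but the second
# request loses its last line (id: 10).
short=$(printf '%s\n' "$log" | grep -A 2 'Got request')

# Three lines of context: the second request is now complete, but the
# unrelated "[2] Some error" line is pulled in after the first one.
long=$(printf '%s\n' "$log" | grep -A 3 'Got request')

printf '%s\n' "$short"
echo '=========='
printf '%s\n' "$long"
```

No single context size fits both requests, which is the gap a multiline match is meant to close.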
Pcregrep - the multiline grep tool
In this case the pcregrep program comes in very handy. It is a grep clone that uses PCRE (Perl-compatible regular expressions). One nice feature is that it can match a pattern across multiple lines. Take the following excerpt from a log file, testdata.txt:
[2011-06-01 10:10:10] Application startup
[2011-06-01 11:11:11] Got request, params are:
remote_ip: 1.2.3.4
user: foo
[2011-06-01 12:12:12] Some error message
[2011-06-01 13:13:13] Got request, params are:
remote_ip: 1.2.3.6
user: bar
request_id: 10
[2011-06-01 10:10:10] Application shutdown
Now we want to extract only the two messages that contain information about the requests. We construct a regex that matches a bracketed tag ([...]) followed on the same line by the string 'Got request', and then everything up to the next opening bracket:
\[[^\[]*\].*Got request[^\[]*
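The pattern is easier to read when assembled from its parts. The sketch below only builds the string; the variable names (tag, body, pattern) are illustrative and not part of pcregrep.

```shell
# Variable names here are illustrative only.
tag='\[[^\[]*\]'   # a bracketed tag, e.g. [2011-06-01 11:11:11]
body='[^\[]*'      # any characters, newlines included, up to the next '['

# In PCRE '.' does not match a newline by default, so 'Got request' has to
# appear on the same line as the tag. The negated class in $body does match
# newlines, which is what lets the match run on to the following lines.
pattern="${tag}.*Got request${body}"

printf '%s\n' "$pattern"
```

The printed string is the same regex as above and can be passed to pcregrep as "$pattern".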
We use it together with the multiline switch -M and get:
$ pcregrep -M '\[[^\[]*\].*Got request[^\[]*' testdata.txt
[2011-06-01 11:11:11] Got request, params are:
remote_ip: 1.2.3.4
user: foo
[2011-06-01 12:12:12] Some error message
[2011-06-01 13:13:13] Got request, params are:
remote_ip: 1.2.3.6
user: bar
request_id: 10
[2011-06-01 10:10:10] Application shutdown
Not bad, but we still have extra lines: each match is followed by the first line of the next message. Adding -o (print only the matching part) removes them:
$ pcregrep -o -M '\[[^\[]*\].*Got request[^\[]*' testdata.txt
[2011-06-01 11:11:11] Got request, params are:
remote_ip: 1.2.3.4
user: foo
[2011-06-01 13:13:13] Got request, params are:
remote_ip: 1.2.3.6
user: bar
request_id: 10
Each match ends with the newline that precedes the next '[', so the -o output carries an extra blank line after it. To remove it, pipe the result to plain old grep and tell it to drop empty lines:
$ pcregrep -o -M '\[[^\[]*\].*Got request[^\[]*' testdata.txt | grep -v "^$"
[2011-06-01 11:11:11] Got request, params are:
remote_ip: 1.2.3.4
user: foo
[2011-06-01 13:13:13] Got request, params are:
remote_ip: 1.2.3.6
user: bar
request_id: 10
Now we have only the matches we wanted.
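Where pcregrep is not installed, a similar result can be approximated with awk. This is a different technique from the regex above, not something pcregrep does: it assumes that every message starts with '[' in the first column, and the function name grep_records is made up for this sketch.

```shell
# Hypothetical helper: print every record whose header line matches a pattern.
# A record starts at a line beginning with '[' and runs until the next such line.
# Usage: grep_records PATTERN [FILE...]   (reads stdin when no file is given)
grep_records() {
    pattern=$1
    shift
    awk -v pat="$pattern" '
        /^\[/ { keep = ($0 ~ pat) }   # header line: decide whether this record is wanted
        keep                          # print lines that belong to a wanted record
    ' "$@"
}
```

Under that assumption, grep_records 'Got request' testdata.txt should print the same seven lines as the pipeline above, without the need to filter out blank lines afterwards.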
This page was last modified on 2024-10-03 19:38:19