Linux: Search logs using bash tools

February 8, 2017

Sometimes we need to find specific things in vast application logs, perhaps 10 or 20 rotated log files of several hundred MB each.
For a quick search, without resorting to a specialized log viewer, we can use the very powerful Bash text-processing commands.

Problem:
We have 10 log files of 100 MB each and we need to find the names of all the files that were processed in a certain stage.
We know that a log entry reporting this processing looks like the following (note this is a WebSphere Application Server log):

[7/21/16 16:09:28:831 IST] 00000116 MessageInputH I com.xxx.tp.impl.MessageInputHandlerImpl b MessageInputHandlerImpl: File processor started processing of message: CREATE-21072016-000009.zip

We want to build a list of all these file names from all the logs.

STEP 1: Find all the entries in all the files that match the above line
The easy way is to use find together with grep.

find . -type f -exec grep -H 'MessageInputHandlerImpl: File processor started processing of message:' {} \;

The result is a list of all the entries from all files.

...
[7/21/16 16:09:28:831 IST] 00000116 MessageInputH I com.xxx.tp.impl.MessageInputHandlerImpl b MessageInputHandlerImpl: File processor started processing of message: CREATE-21072016-000009.zip
...

The command above looks at every file under the current directory (.) and, for each one, prints the lines containing 'MessageInputHandlerImpl: File processor started processing of message:'. The -H option tells grep to prefix each matching line with the name of the file it was found in.
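
As a side note, if the grep available on the system supports recursive search (GNU grep does), the same result can be obtained without find; this is just an equivalent alternative, not a change to the approach above:

grep -rH 'MessageInputHandlerImpl: File processor started processing of message:' .

Here -r makes grep descend into the current directory on its own, and -H keeps the file name prefix on each match (it is the default anyway when more than one file is searched).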

STEP 2: Keep only the information we need after a separator.

We use cut, which works like a string tokenizer: -d sets the delimiter and -f selects which field to keep. We change the command to:

find . -type f -exec grep -H 'MessageInputHandlerImpl: File processor started processing of message:' {} \; | cut -d ' ' -f16 

The result is the list of entries found in Step 1, reduced to their 16th token, where the token separator is a space (' '): the file name is the 16th space-separated field of each matching line.

...
CREATE-21072016-000009.zip
...
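
To see how cut tokenizes a line, here is a small standalone example (the sample text is made up purely for illustration):

echo 'one two three' | cut -d ' ' -f2

This prints two, the second space-separated field. In the same spirit, since the file name also happens to be the last field of each matching log line, awk '{print $NF}' could be used instead of cut -d ' ' -f16 if the field position were not fixed.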

We can go further and, for example, determine all the business dates on which the files were generated. We know that the second token of the file name, using '-' as the separator, is the business date. We extend the pipeline by adding another cut, followed by sort (which sorts a text stream or file forwards, backwards, or according to various keys or character positions) and then uniq (which removes duplicate adjacent lines, which is why the stream must be sorted first).

find . -type f -exec grep -H 'MessageInputHandlerImpl: File processor started processing of message:' {} \; | cut -d ' ' -f16 | cut -d '-' -f2 | sort | uniq

The result is a list of the unique business dates:

...
21072016
...
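
If we also want to know how many files were processed on each business date, uniq can count the repetitions for us with its -c option; this is just an optional extension of the same pipeline:

find . -type f -exec grep -H 'MessageInputHandlerImpl: File processor started processing of message:' {} \; | cut -d ' ' -f16 | cut -d '-' -f2 | sort | uniq -c

Each output line is now prefixed with the number of files generated on that business date. Note that sort -u could replace sort | uniq in the original command, but not in this counting variant.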

For reference and more details, see the Advanced Bash-Scripting Guide, Chapter 16: External Filters, Programs and Commands.
