I have an Apache access.log file which is around 35 GB in size. Grepping through it is no longer an option without waiting a long time.
I wanted to split it into many small files, using the date as the splitting criterion. The date is in the format [15/Oct/2011:12:02:02 +0000]. Any idea how I could do it using only bash scripting, standard text-manipulation programs (grep, awk, sed, and the like), piping and redirection?
The input file name is access.log. I'd like the output files to have a format such as access.apache.15_Oct_2011.log (that would do the trick, although it's not nice for sorting).
I made a slight improvement to Theodore's answer so I could see progress when processing a very large log file. I also found that I needed to use gawk (brew install gawk if you don't have it) for this to work on Mac OS X.

Perl came to the rescue:
Well, it's not exactly a "standard" text-manipulation program, but it's made for text manipulation nevertheless. I've also changed the order of the fields in the file name, so that files are named like access.apache.yyyy_mon_dd.log for easier sorting.
Kind of ugly, that's bash for you:
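A sketch of such a loop, reading the log line by line and carving the date out with bash parameter expansion; the sample lines are invented so it runs standalone (for the real file you'd feed access.log straight in):

```shell
# Invented sample lines so the snippet is self-contained:
cat > access.log <<'EOF'
1.2.3.4 - - [15/Oct/2011:12:02:02 +0000] "GET / HTTP/1.1" 200 2326
5.6.7.8 - - [16/Oct/2011:09:15:00 +0000] "GET /x HTTP/1.1" 404 512
EOF

while read -r line; do
    d=${line#*\[}       # strip everything up to and including the first "["
    d=${d%%:*}          # keep only the dd/Mon/yyyy part
    echo "$line" >> "access.apache.${d//\//_}.log"   # 15/Oct/2011 -> 15_Oct_2011
done < access.log
```

Spawning no external processes keeps it pure bash, but a shell read loop over 35 GB will still be very slow compared to awk or perl.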
One way is to use awk; it will output files named like access.apache.15_Oct_2011.log, one per day.
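One possible shape for that awk command, assuming the common/combined log format where the bracketed timestamp is the fourth whitespace-separated field; the sample lines are invented so the snippet is self-contained:

```shell
# Invented sample lines so the snippet is self-contained:
cat > access.log <<'EOF'
1.2.3.4 - - [15/Oct/2011:12:02:02 +0000] "GET / HTTP/1.1" 200 2326
5.6.7.8 - - [16/Oct/2011:09:15:00 +0000] "GET /x HTTP/1.1" 404 512
EOF

awk '{
    split($4, t, ":")             # $4 is "[15/Oct/2011:12:02:02"
    day = substr(t[1], 2)         # drop the leading "[" -> 15/Oct/2011
    gsub("/", "_", day)           # -> 15_Oct_2011
    print > ("access.apache." day ".log")
}' access.log
```

If the log spans many distinct days, you may want to close() each file when you move past it, so awk doesn't run out of file descriptors.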
Against a 150 MB log file, the answer by chepner took 70 seconds on a 3.4 GHz 8-core Xeon E31270, while this method took 5 seconds.
Original inspiration: "How to split existing apache logfile by month?"
Here is an awk version that outputs lexically sortable log files.

Some efficiency enhancements: everything is done in one pass; fname is only regenerated when it is not the same as the previous line's; and close(fname) is called when switching to a new file (otherwise you might run out of file descriptors).

Pure bash, making one pass through the access log:
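One possible shape for that single-pass loop, which borrows the same trick of rebuilding the output file name only when the day changes; the sample lines are invented so it runs standalone:

```shell
# Invented sample lines so the snippet is self-contained:
cat > access.log <<'EOF'
1.2.3.4 - - [15/Oct/2011:12:02:02 +0000] "GET / HTTP/1.1" 200 2326
1.2.3.4 - - [15/Oct/2011:12:05:00 +0000] "GET /a HTTP/1.1" 200 100
5.6.7.8 - - [16/Oct/2011:09:15:00 +0000] "GET /x HTTP/1.1" 404 512
EOF

prev="" out=""
while read -r line; do
    d=${line#*\[}; d=${d%%:*}            # dd/Mon/yyyy
    if [[ $d != "$prev" ]]; then         # rebuild the name only on a day change
        prev=$d
        out="access.apache.${d//\//_}.log"
    fi
    printf '%s\n' "$line" >> "$out"
done < access.log
```

Since access logs are written in roughly chronological order, the day-change branch is taken only rarely, so most iterations do nothing but an append.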