I am working for a log analyzer system,which read the log of tomcat and display them by a chart/table in web page. (I know there are some existed log analyzer system,I am recreating the wheel. But this is my job,my boss want it.)
Our tomcat log are saved by day. For example:
2011-01-01.txt
2011-01-02.txt
......
The following is my manner for export logs to db and read them:
1 The DB structure
I have three tables: 1)log_current:save the logs generated today.
2)log_past:save the logs generated before today.
The above two tables own the SAME schema.
+-------+-----------+----------+----------+--------+-----+----------+----------+--------+---------------------+---------+----------+-------+
| Id | hostip | username | datasend | method | uri | queryStr | protocol | status | time | browser | platform | refer |
+-------+-----------+----------+----------+--------+-----+----------+----------+--------+---------------------+---------+----------+-------+
| 44359 | 127.0.0.1 | - | 0 | GET | / | | HTTP/1.1 | 404 | 2011-02-17 08:08:25 | Unknown | Unknown | - |
+-------+-----------+----------+----------+--------+-----+----------+----------+--------+---------------------+---------+----------+-------+
3)log_record:save the information of log_past,it record the days whose logs have been exported to the log_past table.
+-----+------------+
| Id | savedDate |
+-----+------------+
| 127 | 2011-02-15 |
| 128 | 2011-02-14 |
..................
+-----+------------+
The table shows log of 2011-02-15 have been exported.
2 Export(to db)
I have two schedule work.
1) day work.
at 00:05:00,check the tomcat log directory(/tomcat/logs) to find all the latest 30 days log files(of course it include logs of yesterday.
check the log_record table to see if logs of one day is exported,for example,2011-02-16 is not find in the log_record,so I will read the 2011-02-16.txt,and export them to log_past.
After export log of yesterday,I start the file monitor for today's log(2011-02-17.txt) not matter it exist or not.
2)the file monitor
Once the monitor is started,it will read the file hour by hour. Each log it read will be saved in the log_current table.
3 tomcat server restart.
Sometimes we have to restart the tomcat,so once the tomcat is started,I will delete all logs of log_current,then do the day work.
4 My problem
1) two table (log_current and log_past).
Because if I save the today's log to log_past,I can not make sure all the log file(xxxx-xx-xx.txt) are exported to db. Since I will do a check in 00:05:00 every day which make sure that logs before today must be exported.
But this make it difficult to query logs accros yestersay and today.
For example,query from 2011-02-14 00:00:00 to 2011-02-15 00:00:00,these log must be at log_past.
But how about from 2011-02-14 00:00:00 to 2011-02-17 08:00:00 ?(suppose it is 2011-02-17 09:00:00 now).
It is complex to query across tables.
Also,I always think my desing for the table and work manner(schedule work of export/read) are not perfect,so anyone can give a good suggestion?
I just need to export and read log and can do a almost real-time analysis where real-time means I have to make logs of current day visiable by chart/table and etc.
First of all, IMO you don't need 2 different tables
log_current
andlog_past
. You can insert all the rows in the same table, saylogs
and retrieve usingselect * from logs where id = (select id from log_record where savedDate = 'YOUR_DATE')
This will give you all the logs of the particular day.Now, once you are able to remove the current and past distinction between tables using above way, I think the problem you are asking here would be solved. :)