I have a very large log file. How do I extract only the JSON string, and only when the next line contains an error and the previous line contains '_____GP D_____'?
2017-04-22T11:27:11+06:00 smth.com pgp: [16136]: INFO:modules.gp.helpers.parameter_getter:_____GP D_____
2017-04-22T11:27:11+06:00 smth.com pgp: [16136]: {'D': 't12', 'telephone': None, 'from_time': '2016-04-22 11:30', 'C': 'C12', 'to_time': '2016-04-22 11:40', 'email': None}
2017-04-22T11:27:11+06:00 smth.com pgp: [16136]: INFO:tornado.access:200 POST /gp/C (192.168.1.240) 15.77ms
2017-04-22T11:28:19+06:00 smth.com pgp: [16136]: INFO:modules.security.authentication:LOADING USER...
2017-04-22T11:28:19+06:00 smth.com pgp: [16136]: INFO:modules.gp.helpers.parameter_getter:_____GP D_____
2017-04-22T11:28:19+06:00 smth.com pgp: [16136]: {'D': 'testim12', 'telephone': None, 'from_time': '2017-04-20 17:30', 'C': 'CnGP13', 'to_time': '2017-04-22 21:40', 'email': None}
2017-04-22T11:28:19+06:00 smth.com pgp: [16136]: ERROR:modules.common.actionexception:ActionError: [{'from': 'time is already passed'}]
2017-04-22T11:28:19+06:00 smth.com pgp: [16136]: Traceback (most recent call last):
2017-04-22T11:28:19+06:00 smth.com pgp: [16136]: File "/app/src/modules/base/actions/base_action.py", line 96, in do_action
2017-04-22T11:28:19+06:00 smth.com pgp: [16136]: self._produce_response()
2017-04-22T11:28:19+06:00 smth.com pgp: [16136]: modules.common.actionexception.ActionValidationErr: []
For example, from this log file I want
'{'D': 'testim12', 'telephone': None, 'from_time': '2017-04-20 17:30', 'C': 'CnGP13', 'to_time': '2017-04-22 21:40', 'email': None}'.
Only when there is an exception, ' ERROR:modules.common.actionexception:ActionError:', in the next line. How do I do it?
That's the same problem as the one you had yesterday, just with an additional check before selecting the line, for example checking whether the next line contains the `]: ERROR:` string.

However, since your `found_line` would actually contain the whole line (including the timestamp), you first need to strip that out, and how to do that depends on how your logger is set up. A reasonable way, based on your data, is to skip the first 39 characters (`<date-time> smth.com pgp:`) and pick up everything after the next colon, assuming that the number in the following brackets can change (if it can't, you can just strip out the first `n` characters and be done with it).

Beware, though, that the 'error' check might fail if some of the logged data contains that exact pattern. If you want to home in on it, you can use a technique similar to the one used to lift the JSON out of the log line and check whether the remaining message begins with `ERROR:`.

You should also try doing things on your own instead of blindly copying code from SO - you won't learn much this way.
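The steps described in this answer could be combined roughly as in the sketch below. The names `strip_prefix` and `payloads_before_errors` are hypothetical, the offset of 39 is taken from the sample lines in the question, and the error check is applied to the stripped message (the stricter variant mentioned above) rather than to the raw line.

```python
def strip_prefix(line, skip=39):
    """Return the logged message without '<date-time> smth.com pgp: [pid]: '.

    Assumes the fixed-width part of the prefix is `skip` characters long
    (39 in the sample data) and that the variable '[pid]' part ends with ':'.
    """
    colon = line.find(":", skip)
    return line[colon + 1:].strip() if colon != -1 else ""


def payloads_before_errors(lines, marker="_____GP D_____"):
    """Collect payload lines that follow `marker` and precede an ERROR line."""
    results = []
    before_previous, previous = "", ""
    for line in lines:
        # The current line is the candidate error line, `previous` is the
        # payload and `before_previous` must carry the marker.
        if marker in before_previous and strip_prefix(line).startswith("ERROR:"):
            results.append(strip_prefix(previous))
        before_previous, previous = previous, line
    return results
```

Because only the last two lines are kept in memory, the function can be fed an open file object directly (for instance `payloads_before_errors(open_file)`), which matters for a very large log.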
Using a generator function:
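A minimal sketch of such a generator follows. The function name `json_before_error` and its parameters are illustrative assumptions, and the payload is taken to start at the first `{` on the line after the marker.

```python
def json_before_error(lines, marker="_____GP D_____", error="ERROR:"):
    """Yield the {...} payload after `marker` when the line after it has `error`."""
    lines = iter(lines)
    for line in lines:
        if marker not in line:
            continue
        try:
            payload = next(lines)    # line right after the marker
            following = next(lines)  # line after the payload
        except StopIteration:
            # Fewer than two lines are left after the marker: stop cleanly.
            return
        start = payload.find("{")
        if error in following and start != -1:
            yield payload[start:].rstrip()
```

Being a generator, it reads the file lazily, so it can be looped over with `for payload in json_before_error(open_file): ...` without loading the whole log.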
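One advantage of the generator is that a `StopIteration` raised by `next()` (for example, when fewer than two lines follow the `_____GP D_____` line) simply ends the iteration. This holds only before Python 3.7; from 3.7 on (PEP 479) a `StopIteration` escaping inside a generator is turned into a `RuntimeError`, so it has to be caught explicitly or avoided by passing a default to `next()`.

Note that this approach assumes that `_____GP D_____` lines are separated by at least two lines.

You could use this, where the JSON string is in capture group 1: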
(?m)^.*?_____GP[ ]D_____.*\r?\n\s*^[^{\r\n]+(.+)\r?\n\s*^.*?ERROR:.*
https://regex101.com/r/UQ8gni/2
Explained
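One way to read the pattern is piece by piece. The sketch below restates it with Python's `re` module in verbose mode, replacing the inline `(?m)` with the equivalent `re.MULTILINE` flag; the names `PATTERN` and `extract` are hypothetical.

```python
import re

PATTERN = re.compile(
    r"""
    ^.*?_____GP[ ]D_____.*\r?\n   # a line containing the _____GP D_____ marker
    \s*^[^{\r\n]+                 # next line: skip everything before the first brace
    (.+)\r?\n                     # group 1: the rest of that line (the payload)
    \s*^.*?ERROR:.*               # the line after it must contain ERROR:
    """,
    re.MULTILINE | re.VERBOSE,
)


def extract(text):
    """Return every group-1 match found in `text`."""
    return PATTERN.findall(text)
```

Note that this needs the text as one string, so for a very large file it would have to be applied to chunks or to a memory-mapped file. Also, the payload uses single quotes and `None`, which makes it a Python literal rather than strict JSON; `ast.literal_eval` would parse it where `json.loads` would not.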