Python parse from log file

Posted 2019-06-14 12:41

I have a very large log file. How do I extract only the JSON string, and only when the previous line ends with '_____GP D_____' and the next line contains an error?

2017-04-22T11:27:11+06:00 smth.com pgp: [16136]: INFO:modules.gp.helpers.parameter_getter:_____GP D_____
2017-04-22T11:27:11+06:00 smth.com pgp: [16136]: {'D': 't12', 'telephone': None, 'from_time': '2016-04-22 11:30', 'C': 'C12', 'to_time': '2016-04-22 11:40', 'email': None}
2017-04-22T11:27:11+06:00 smth.com pgp: [16136]: INFO:tornado.access:200 POST /gp/C (192.168.1.240) 15.77ms

2017-04-22T11:28:19+06:00 smth.com pgp: [16136]: INFO:modules.security.authentication:LOADING USER...
2017-04-22T11:28:19+06:00 smth.com pgp: [16136]: INFO:modules.gp.helpers.parameter_getter:_____GP D_____
2017-04-22T11:28:19+06:00 smth.com pgp: [16136]: {'D': 'testim12', 'telephone': None, 'from_time': '2017-04-20 17:30', 'C': 'CnGP13', 'to_time': '2017-04-22 21:40', 'email': None}
2017-04-22T11:28:19+06:00 smth.com pgp: [16136]: ERROR:modules.common.actionexception:ActionError: [{'from': 'time is already passed'}]
2017-04-22T11:28:19+06:00 smth.com pgp: [16136]: Traceback (most recent call last):
2017-04-22T11:28:19+06:00 smth.com pgp: [16136]:   File "/app/src/modules/base/actions/base_action.py", line 96, in do_action
2017-04-22T11:28:19+06:00 smth.com pgp: [16136]:     self._produce_response()
2017-04-22T11:28:19+06:00 smth.com pgp: [16136]: modules.common.actionexception.ActionValidationErr: []

For example, from this log file I want:

{'D': 'testim12', 'telephone': None, 'from_time': '2017-04-20 17:30', 'C': 'CnGP13', 'to_time': '2017-04-22 21:40', 'email': None}

I want it only when the next line contains an exception, ' ERROR:modules.common.actionexception:ActionError:'. How do I do that?

3 answers
做个烂人
#2 · 2019-06-14 12:47

That's the same problem as the one you had yesterday, just with an additional check before selecting the line, for example checking whether the line after the candidate contains the string "]: ERROR:":

found_line = None  # store for our matched line
with open("input.log", "r") as f:  # open your log file
    for line in f:  # read it line by line
        if line.rstrip().endswith("_____GP D_____"):  # if a line ends with our string...
            found_line = next(f, "").rstrip()  # grab the next line as our potential candidate ("" at end of file)
            if "]: ERROR:" in next(f, ""):  # if the line after it contains an error marker
                break  # match found, break out as we don't need to search any more...
            found_line = None  # ... otherwise reset the potential result and continue searching

However, since found_line contains the whole log line (including the timestamp), you first need to strip the prefix, and how to do that depends on how your logger is configured. A reasonable way, based on your data, is to skip the first 39 characters (<date-time> smth.com pgp:) and take everything after the next colon, assuming that the number in the following brackets can change (if it cannot, you can simply strip a fixed number of leading characters and be done with it):

if found_line:
    found_line = found_line[found_line.find(":", 39) + 1:].strip()

Beware, though, that the error check might give a false match if some of the logged data itself contains that exact pattern. If you want to home in on it, you can use a technique similar to the one used to lift the JSON out of the log line: extract the message part of the following line and check whether it begins with ERROR:.
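
The stricter check described above could look roughly like the sketch below. The helper names message_of and first_failed_payload are made up for illustration, the 39-character prefix is the same assumption about the log layout as before, and the sample lines are adapted from the question (one shared timestamp prefix). The last step is an extra assumption: the payload in the sample is a Python dict literal (single quotes, None) rather than strict JSON, so ast.literal_eval is used to parse it instead of json.loads.

```python
import ast

MARKER = "_____GP D_____"

def message_of(line, prefix_len=39):
    """Return the text after the '[pid]:' field, or '' if no colon follows the prefix."""
    colon = line.find(":", prefix_len)
    return line[colon + 1:].strip() if colon != -1 else ""

def first_failed_payload(lines):
    """Return the payload logged right after MARKER when the line after it is an ERROR message."""
    it = iter(lines)
    for line in it:
        if not line.rstrip().endswith(MARKER):
            continue
        candidate = next(it, "")  # "" if the input ends here
        follower = next(it, "")
        # Check the start of the message itself, not the whole raw line
        if message_of(follower).startswith("ERROR:"):
            return message_of(candidate)
    return None

# Sample lines adapted from the question
P = "2017-04-22T11:28:19+06:00 smth.com pgp: [16136]: "
sample = [
    P + "INFO:modules.gp.helpers.parameter_getter:_____GP D_____\n",
    P + "{'D': 't12', 'telephone': None, 'from_time': '2016-04-22 11:30', 'C': 'C12', 'to_time': '2016-04-22 11:40', 'email': None}\n",
    P + "INFO:tornado.access:200 POST /gp/C (192.168.1.240) 15.77ms\n",
    P + "INFO:modules.gp.helpers.parameter_getter:_____GP D_____\n",
    P + "{'D': 'testim12', 'telephone': None, 'from_time': '2017-04-20 17:30', 'C': 'CnGP13', 'to_time': '2017-04-22 21:40', 'email': None}\n",
    P + "ERROR:modules.common.actionexception:ActionError: [{'from': 'time is already passed'}]\n",
]

payload = first_failed_payload(sample)
record = ast.literal_eval(payload)  # assumption: payload is a Python literal
print(record["D"])  # -> testim12
```

With a real file, the same function can be fed the open file object, e.g. first_failed_payload(f) inside a with open("input.log") block, so the log is still read line by line.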

You should also try doing things on your own instead of blindly copying code from SO - you won't learn much this way.

混吃等死
#3 · 2019-06-14 12:47

Using a generator function:

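def getjson(f):
    for line in filter(lambda x: '_GP D_' in x, f):
        line1 = next(f, None)  # candidate line holding the JSON string
        line2 = next(f, None)  # line that may hold the error
        if line2 is None:  # the file ended before two more lines could be read
            return
        parts = line2.split(' ', 4)
        if len(parts) == 5 and parts[4].startswith('ERROR'):
            yield line1.rstrip().split(' ', 4)[4]

with open('input.log', 'r') as f:
    for json_string in getjson(f):
        print(json_string)

One advantage of the generator is that it can simply stop when the file ends early (for example, when fewer than two lines follow the line with _GP D_). Since Python 3.7 (PEP 479), a StopIteration raised by a bare next() inside a generator is converted to a RuntimeError rather than silently ending it, so the end of the file is detected with next(f, None) instead.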

Note that this approach assumes that _GP D_ lines are separated by at least two lines.
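
If that assumption does not hold for your log, one possible variation (a sketch, not part of the original answer) is to slide a three-line window over the file instead of calling next(). The function name payloads_before_errors is made up for illustration, and the sample lines are adapted from the question with one shared timestamp prefix.

```python
from collections import deque

def payloads_before_errors(lines, marker="_GP D_"):
    """Yield the message part of each line sitting between a marker line and an ERROR line."""
    window = deque(maxlen=3)  # holds (line before, candidate, line after)
    for line in lines:
        window.append(line.rstrip())
        if len(window) < 3:
            continue
        before, candidate, after = window
        after_parts = after.split(" ", 4)
        if (marker in before
                and len(after_parts) == 5
                and after_parts[4].startswith("ERROR")):
            yield candidate.split(" ", 4)[-1]

# Sample lines adapted from the question; two marker lines in a row on purpose
P = "2017-04-22T11:28:19+06:00 smth.com pgp: [16136]: "
sample = [
    P + "INFO:modules.gp.helpers.parameter_getter:_____GP D_____\n",
    P + "INFO:modules.gp.helpers.parameter_getter:_____GP D_____\n",
    P + "{'D': 'testim12', 'telephone': None, 'from_time': '2017-04-20 17:30', 'C': 'CnGP13', 'to_time': '2017-04-22 21:40', 'email': None}\n",
    P + "ERROR:modules.common.actionexception:ActionError: [{'from': 'time is already passed'}]\n",
]

found = list(payloads_before_errors(sample))
print(len(found))  # -> 1
```

Because every line passes through the window, a marker line is never consumed as somebody else's "next line", at the cost of splitting one extra line per step.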

老娘就宠你
#4 · 2019-06-14 12:50

You could use this regular expression, where the JSON string is in capture group 1:

(?m)^.*?_____GP[ ]D_____.*\r?\n\s*^[^{\r\n]+(.+)\r?\n\s*^.*?ERROR:.*

https://regex101.com/r/UQ8gni/2

Explained

 (?m)                          # Modifiers: multi-line
 ^ .*? _____GP [ ] D_____ .*   # Line that starts error block
 \r? \n \s*                    # Required newline 
 ^ [^{\r\n]+                   # Up to start of Json string
 ( .+ )                        # (1), Json string
 \r? \n \s*                    # Required newline 
 ^ .*? ERROR: .*               # Line that ends error block
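
To apply the pattern from Python, a minimal sketch with the standard re module could look like this. The function name json_strings is made up for illustration, and the sample text is adapted from the question with one shared timestamp prefix.

```python
import re

# The pattern from the answer, unchanged; (?m) makes ^ match at every line start
PATTERN = re.compile(
    r"(?m)^.*?_____GP[ ]D_____.*\r?\n\s*^[^{\r\n]+(.+)\r?\n\s*^.*?ERROR:.*"
)

def json_strings(text):
    """Return capture group 1 of every match in the given log text."""
    return [m.group(1) for m in PATTERN.finditer(text)]

# Sample text adapted from the question
P = "2017-04-22T11:28:19+06:00 smth.com pgp: [16136]: "
text = "\n".join([
    P + "INFO:modules.gp.helpers.parameter_getter:_____GP D_____",
    P + "{'D': 't12', 'telephone': None, 'from_time': '2016-04-22 11:30', 'C': 'C12', 'to_time': '2016-04-22 11:40', 'email': None}",
    P + "INFO:tornado.access:200 POST /gp/C (192.168.1.240) 15.77ms",
    "",
    P + "INFO:modules.gp.helpers.parameter_getter:_____GP D_____",
    P + "{'D': 'testim12', 'telephone': None, 'from_time': '2017-04-20 17:30', 'C': 'CnGP13', 'to_time': '2017-04-22 21:40', 'email': None}",
    P + "ERROR:modules.common.actionexception:ActionError: [{'from': 'time is already passed'}]",
])

matches = json_strings(text)
print(len(matches))  # -> 1
```

Note that a multi-line pattern needs the text in memory as one string, so for a very large log file this likely means reading the file in full (or in large chunks), unlike the line-by-line approaches.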