Python string parsing from .txt

I have strings of the following form:

}# => 2[1 HMDB00001 ,2 HMDB00002]
}# => 5[1 HMDB00001 ,2 HMDB00002, 3 HMDB00003 ,4 HMDB00004,5 HMDB00005]
}# => 1[1 HMDB00001]

in a .txt file. I am trying to parse them into Python lists using re.search() with regular expressions, but so far without success. Each list should contain elements like elements = ["1 HMDB00001", "2 HMDB00002", "3 HMDB00003"]. The lists are independent of each other, so each line can be parsed on its own (e.g. }# => 2[1 HMDB00001 ,2 HMDB00002]).

Tags: python parsing

3 answers

劫难

#2 · 2019-08-27 17:59

(?<=[\[,])\s*(\d+ HMDB0+\d+)

Use re.findall instead. See the demo:

https://regex101.com/r/eS7gD7/19#python

import re
p = re.compile(r'(?<=[\[,])\s*(\d+ HMDB0+\d+)', re.IGNORECASE | re.MULTILINE)
test_str = "}# => 2[1 HMDB00001 ,2 HMDB00002]\n}# => 5[1 HMDB00001 ,2 HMDB00002, 3 HMDB00003 ,4 HMDB00004,5 HMDB00005]\n}# => 1[1 HMDB00001]"

# findall returns only the captured group, so leading whitespace is dropped.
print(re.findall(p, test_str))
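Note that running findall over the whole text returns one flat list, while the question asks for an independent list per line. A minimal sketch of a per-line variant, assuming the input is already held in a string; the helper name parse_lines is hypothetical and not part of the original answer:

```python
import re

# Same pattern as above, without the flags it does not need here.
ITEM = re.compile(r'(?<=[\[,])\s*(\d+ HMDB0+\d+)')

def parse_lines(text):
    """Return one list of 'N HMDBxxxxx' items per non-empty line (hypothetical helper)."""
    return [ITEM.findall(line) for line in text.splitlines() if line.strip()]

test_str = (
    "}# => 2[1 HMDB00001 ,2 HMDB00002]\n"
    "}# => 5[1 HMDB00001 ,2 HMDB00002, 3 HMDB00003 ,4 HMDB00004,5 HMDB00005]\n"
    "}# => 1[1 HMDB00001]"
)

# One list per input line.
for elements in parse_lines(test_str):
    print(elements)
```

Each input line then yields its own list, so the three sample lines give lists of 2, 5 and 1 elements.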


Melony?

#3 · 2019-08-27 18:05

This seems to work, but it's hard to tell for sure given your question. You may be able to piece together a solution from the answers you get.

import re

strings = [
    '}# => 2[1 HMDB00001 ,2 HMDB00002]',
    '}# => 5[1 HMDB00001 ,2 HMDB00002, 3 HMDB00003 ,4 HMDB00004,5 HMDB00005]',
    '}# => 1[1 HMDB00001]',
]

for s in strings:
    mat = re.search(r'\[(.*)\]', s)
    # Take the text between the brackets, split on commas, trim whitespace.
    elements = [item.strip() for item in mat.group(1).split(',')]
    print(elements)

Which outputs:

['1 HMDB00001', '2 HMDB00002']
['1 HMDB00001', '2 HMDB00002', '3 HMDB00003', '4 HMDB00004', '5 HMDB00005']
['1 HMDB00001']
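One caveat: re.search returns None when a line has no bracketed part, so calling mat.group(1) on such a line raises AttributeError. A minimal sketch that skips those lines instead, assuming an empty list is an acceptable result for them; the helper name parse_bracket_list is hypothetical:

```python
import re

def parse_bracket_list(line):
    """Return the comma-separated items inside [...], or [] if there are none (hypothetical helper)."""
    mat = re.search(r'\[(.*)\]', line)
    if mat is None:  # no bracketed section on this line
        return []
    # Split on commas, trim whitespace, and drop empty pieces such as those from "[]".
    return [item.strip() for item in mat.group(1).split(',') if item.strip()]

print(parse_bracket_list('}# => 2[1 HMDB00001 ,2 HMDB00002]'))
print(parse_bracket_list('no brackets here'))
```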


孤傲高冷的网名

#4 · 2019-08-27 18:12

Assuming your pattern is exactly: one digit, one space, HMDB, 5 digits, in that order.

Results are stored in a dict for each line.

import re

matches = {}
with open('my_text_file.txt', 'r') as f:
    for num, line in enumerate(f):
        # Key: zero-based line number, value: list of matches on that line.
        matches[num] = re.findall(r'\d\sHMDB\d{5}', line)

print(matches)

If HMDB might differ, you can use r'\d\s[a-zA-Z]{4}\d{5}'.
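Keep in mind that a single \d matches exactly one digit, so an index of 10 or more would be captured only from its last digit. If such indices can occur (an assumption; the sample data only goes up to 5), \d+ anchored with a word boundary may be safer. A small sketch on a made-up line:

```python
import re

# Made-up input line with a two-digit index, for illustration only.
line = '}# => 2[9 HMDB00009 ,10 HMDB00010]'

# Single-digit pattern from the answer: truncates "10" to "0".
single = re.findall(r'\d\sHMDB\d{5}', line)

# Variant allowing multi-digit indices; \b keeps the match at the start of the number.
multi = re.findall(r'\b\d+\sHMDB\d{5}', line)

print(single)
print(multi)
```

With this input the first pattern yields '0 HMDB00010' for the second element, while the second pattern keeps the full '10 HMDB00010'.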
