I have strings of the following form:
}# => 2[1 HMDB00001 ,2 HMDB00002]
}# => 5[1 HMDB00001 ,2 HMDB00002, 3 HMDB00003 ,4 HMDB00004,5 HMDB00005]
}# => 1[1 HMDB00001]
in a .txt file. I am trying to parse them into Python lists using re.search() with regular expressions, but so far without success. As you can guess, each list should contain elements like elements = ["1 HMDB00001", "2 HMDB00002", "3 HMDB00003"]. The lists are independent of each other, so only one line needs to be considered at a time when parsing (e.g. }# => 2[1 HMDB00001 ,2 HMDB00002]).
Use re.findall instead. See demo: https://regex101.com/r/eS7gD7/19#python
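A minimal sketch of the re.findall approach on a single line. The pattern here is an assumption written for this illustration (one or more digits, a space, HMDB, then digits) and is not necessarily the one in the linked demo:

```python
import re

# One sample line from the question.
line = "}# => 2[1 HMDB00001 ,2 HMDB00002]"

# Assumed pattern: digits, one space, the literal "HMDB", digits.
# The leading count ("2[") is skipped because it is not followed by a space.
elements = re.findall(r"\d+ HMDB\d+", line)

print(elements)  # ['1 HMDB00001', '2 HMDB00002']
```

Unlike re.search(), which stops at the first match, re.findall() returns every non-overlapping match as a list of strings, which is the shape asked for in the question.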
This seems to work, but it's hard to tell for sure given your question. You may be able to piece together a solution from the answers you get.
Which outputs:
Assuming your pattern is exactly one digit, one space, HMDB, 5 digits, in that order. Results are stored in a dict for each line.
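Something along these lines would do it. This is only a sketch: keying the dict by line number is my assumption, and the lines are hard-coded here so the snippet runs on its own (in practice you would iterate over the opened .txt file instead):

```python
import re

# Sample lines from the question; in practice these would come from the file.
lines = [
    "}# => 2[1 HMDB00001 ,2 HMDB00002]",
    "}# => 5[1 HMDB00001 ,2 HMDB00002, 3 HMDB00003 ,4 HMDB00004,5 HMDB00005]",
    "}# => 1[1 HMDB00001]",
]

# Exactly one digit, one whitespace character, "HMDB", five digits.
pattern = re.compile(r"\d\sHMDB\d{5}")

# Assumed layout: line number (starting at 1) -> list of matches on that line.
results = {}
for number, line in enumerate(lines, start=1):
    results[number] = pattern.findall(line)

for number, elements in results.items():
    print(number, elements)
```

Note that \d matches a single digit only, so an index of 10 or higher would be captured as just its last digit; \d+ would be needed for longer lists.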
If HMDB might differ, you can use r'\d\s[a-zA-Z]{4}\d{5}'.
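A quick check of that more general pattern. The ABCD prefix in the sample line is made up for illustration and does not come from the question:

```python
import re

# One digit, one whitespace character, any four ASCII letters, five digits.
general = re.compile(r"\d\s[a-zA-Z]{4}\d{5}")

# Hypothetical line mixing the real prefix with an invented one (ABCD).
sample = "}# => 2[1 HMDB00001 ,2 ABCD00002]"

matches = general.findall(sample)
print(matches)  # ['1 HMDB00001', '2 ABCD00002']
```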