Python regular expression: split a string into numbers and text

Posted 2019-05-25 04:07

I would like to split a string into sections of numbers and sections of text/symbols. My current code doesn't handle negative numbers or decimals, and it behaves oddly: it adds an empty string to the end of the output list.

import re
mystring = 'AD%5(6ag 0.33--9.5'
newlist = re.split(r'([0-9]+)', mystring)
print(newlist)

current output:

['AD%', '5', '(', '6', 'ag ', '0', '.', '33', '--', '9', '.', '5', '']

desired output:

['AD%', '5', '(', '6', 'ag ', '0.33', '-', '-9.5']

3 answers
Animai°情兽
#2 · 2019-05-25 04:31

re.split() has no option to ignore empty strings, but you can easily build a new list without them using a list comprehension:

import re

mystring = "AD%5(6ag0.33--9.5"
# The capturing group keeps the numbers in the result; the filter drops empty strings
newlist = [x for x in re.split(r'(-?\d+\.?\d*)', mystring) if x != '']
print(newlist)

output:

['AD%', '5', '(', '6', 'ag', '0.33', '-', '-9.5']
Summer. ? 凉城
#3 · 2019-05-25 04:37

Unfortunately, re.split() does not offer an "ignore empty strings" option. However, to retrieve your numbers, you could easily use re.findall() with a different pattern:

import re

string = "AD%5(6ag0.33-9.5"
rx = re.compile(r'-?\d+(?:\.\d+)?')
numbers = rx.findall(string)

print(numbers)
# ['5', '6', '0.33', '-9.5']
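
If the text between the numbers is needed as well, one possible variation (not part of the original answer) is to let re.findall() match either a number or a run of non-number characters. This is only a sketch: the name token_rx and the second alternative of the pattern are assumptions made for illustration, and the input is the string from the question.

```python
import re

s = 'AD%5(6ag 0.33--9.5'

# Alternative 1: an optionally signed integer or decimal.
# Alternative 2: one or more characters, none of which starts a number
# (the negative lookahead is checked before each character is consumed).
token_rx = re.compile(r'-?\d+(?:\.\d+)?|(?:(?!-?\d).)+')

tokens = token_rx.findall(s)
print(tokens)
# ['AD%', '5', '(', '6', 'ag ', '0.33', '-', '-9.5']
```

Both alternatives need at least one character, so no empty strings appear in the result. Note that `.` does not match newlines unless re.DOTALL is passed.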
Evening l夕情丶
#4 · 2019-05-25 04:39

Your regex matches one or more digits and uses them as the delimiter. Because the pattern is wrapped in a capturing group, the matched digits are added to the resulting list along with the text before and after each match. So if the string ends with digits, the text after the last match is an empty string, and that empty string is added to the end of the list.
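
The behaviour can be reproduced on a few small inputs. This is a minimal sketch; the sample strings and variable names are made up for illustration and do not come from the question.

```python
import re

# A capturing group keeps the delimiter (here, the digits) in the result.
print(re.split(r'(\d+)', 'ab12cd'))  # ['ab', '12', 'cd']

# When the string ends with a match, the text after it is the empty string.
ends_with_match = re.split(r'(\d+)', 'ab12')
print(ends_with_match)  # ['ab', '12', '']

# The same happens when the string starts with a match.
starts_with_match = re.split(r'(\d+)', '12ab')
print(starts_with_match)  # ['', '12', 'ab']

# Without the capturing group the digits are dropped, but the empty tail remains.
no_group = re.split(r'\d+', 'ab12')
print(no_group)  # ['ab', '']
```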

You may split with a regex that matches float or integer numbers with an optional minus sign and then remove empty values:

import re
s = 'AD%5(6ag 0.33--9.5'
result = re.split(r'(-?\d*\.?\d+)', s)
result = list(filter(None, result))  # filter() is lazy in Python 3, so wrap it in list()

To match negative/positive numbers with exponents, use

r'([+-]?\d*\.?\d+(?:[eE][-+]?\d+)?)'
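
As a rough illustration of the exponent pattern, it can be plugged into the same split-and-filter approach. The input string below is made up for this sketch and is not from the question.

```python
import re

# Hypothetical sample input containing exponent notation.
s = 'x=1e5 y=-2.5E-3 z=+.75'

parts = re.split(r'([+-]?\d*\.?\d+(?:[eE][-+]?\d+)?)', s)
parts = [p for p in parts if p]  # drop empty strings
print(parts)
# ['x=', '1e5', ' y=', '-2.5E-3', ' z=', '+.75']
```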

The -?\d*\.?\d+ regex matches:

  • -? - an optional minus
  • \d* - 0+ digits
  • \.? - an optional literal dot
  • \d+ - one or more digits.
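
The number patterns used in the three answers differ slightly in how they treat a leading or trailing dot. The comparison below is a small sketch; the dictionary keys and the test inputs are chosen here for illustration only.

```python
import re

patterns = {
    'first answer': r'-?\d+\.?\d*',        # digits required before the dot
    'findall answer': r'-?\d+(?:\.\d+)?',  # digits required on both sides of the dot
    'this answer': r'-?\d*\.?\d+',         # digits required after the dot
}

inputs = ('.5', '5.', '-0.33')

# Map each pattern name to the matches it finds in every input.
results = {
    name: [re.findall(rx, text) for text in inputs]
    for name, rx in patterns.items()
}

for name, found in results.items():
    print(name, found)
# first answer [['5'], ['5.'], ['-0.33']]
# findall answer [['5'], ['5'], ['-0.33']]
# this answer [['.5'], ['5'], ['-0.33']]
```

All three agree on ordinary decimals such as -0.33; they only differ on inputs like .5 or 5., so the choice depends on which of those forms should count as a number.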