Python re.findall behaves weirdly

Posted 2018-12-31 00:16

The source string is:

# Python 3.4.3
s = r'abc123d, hello 3.1415926, this is my book'

And here is my pattern:

pattern = r'-?[0-9]+(\\.[0-9]*)?|-?\\.[0-9]+'

re.search gives me the correct result:

m = re.search(pattern, s)
print(m)  # output: <_sre.SRE_Match object; span=(3, 6), match='123'>

re.findall, on the other hand, just returns a list of empty strings:

L = re.findall(pattern, s)
print(L)  # output: ['', '', '']

Why doesn't re.findall give me the expected list:

['123', '3.1415926']

Tags: python regex
2 answers
大哥的爱人
Post #2 · 2018-12-31 01:00
import re
s = r'abc123d, hello 3.1415926, this is my book'
print(re.findall(r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+', s))

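You don't need to escape the backslash twice when you are using a raw string. Note also that the group is now non-capturing, (?:...), so re.findall returns the whole match instead of the group's text.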

Output: ['123', '3.1415926']

Also, the result is a list of strings. If you want integers and floats instead, use map:

import re, ast
s = r'abc123d, hello 3.1415926, this is my book'
# map() returns an iterator in Python 3, so wrap it in list() to print the values
print(list(map(ast.literal_eval, re.findall(r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+', s))))

Output: [123, 3.1415926]
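If you'd rather not pass regex matches through ast.literal_eval, here is a minimal sketch that picks int or float by checking for a decimal point. The helper name to_number is hypothetical, and it assumes every match has the shape produced by the pattern above (optional minus, digits, at most one dot).

```python
import re

def to_number(text):
    # Hypothetical helper: a match containing '.' becomes a float, otherwise an int.
    return float(text) if '.' in text else int(text)

s = r'abc123d, hello 3.1415926, this is my book'
pattern = r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+'
numbers = [to_number(m) for m in re.findall(pattern, s)]
print(numbers)  # [123, 3.1415926]
```

One difference from ast.literal_eval: int() accepts leading zeros such as '007', which are a syntax error as Python 3 literals.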

与君花间醉酒
Post #3 · 2018-12-31 01:11

There are two things to note here:

  • re.findall returns the captured texts if the regex pattern contains capturing groups
  • the r'\\.' part in your pattern matches two consecutive characters: a literal \ followed by any character other than a newline.
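A quick way to see the second point is to test r'\\.' on its own. This sketch uses a made-up sample, r'3\x14', which does contain a backslash:

```python
import re

# In a raw string, r'\\.' is the three characters \ \ . and the regex engine
# reads it as an escaped (literal) backslash followed by '.', i.e. any char but newline.
buggy = r'\\.'
print(len(buggy))                             # 3
print(re.search(buggy, '3.1415926'))          # None: no backslash in the text
print(re.search(buggy, r'3\x14').group())     # \x  (a backslash plus the next char)

# A literal dot needs only a single backslash in a raw string.
fixed = r'\.'
print(re.search(fixed, '3.1415926').group())  # .
```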

See the re.findall reference:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
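Here is a small sketch of those return shapes on a made-up string 'a1 b2' (the sample text and patterns are illustrative only, not from the question):

```python
import re

text = 'a1 b2'
no_groups = re.findall(r'[a-z]\d', text)       # no groups: whole matches
one_group = re.findall(r'[a-z](\d)', text)     # one group: that group's text only
two_groups = re.findall(r'([a-z])(\d)', text)  # two groups: tuples of group texts
unused = re.findall(r'[a-z]\d(x)?', text)      # group never participates: ''

print(no_groups)   # ['a1', 'b2']
print(one_group)   # ['1', '2']
print(two_groups)  # [('a', '1'), ('b', '2')]
print(unused)      # ['', '']
```

The last line is exactly the situation in the question: the optional group never matched, so findall reported an empty string for each match.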

To make re.findall return just the whole-match values, you can usually:

  • remove redundant capturing groups (e.g. (a(b)c) -> abc)
  • convert all capturing groups into non-capturing ones (that is, replace ( with (?:), unless the pattern has backreferences to the group values (in that case, see the next point)
  • use re.finditer instead ([x.group() for x in re.finditer(pattern, s)])
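Applied to the question's string (with the backslash fixed to a single \.), the second and third suggestions might look like the sketch below; both give ['123', '3.1415926']. The first suggestion doesn't apply directly here, because the group is needed for the ? quantifier.

```python
import re

s = r'abc123d, hello 3.1415926, this is my book'

# Second suggestion: make the group non-capturing with (?:...),
# so findall returns whole matches.
non_capturing = r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+'
via_findall = re.findall(non_capturing, s)

# Third suggestion: keep the capturing group and take group()
# (the whole match) from each match object that finditer yields.
capturing = r'-?[0-9]+(\.[0-9]*)?|-?\.[0-9]+'
via_finditer = [m.group() for m in re.finditer(capturing, s)]

print(via_findall)   # ['123', '3.1415926']
print(via_finditer)  # ['123', '3.1415926']
```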

In your case, findall returned only the captured texts, and they were all empty: the group (\\.[0-9]*) never took part in a match, because the \\ inside an r'' string literal tries to match a literal \, and your string contains none.
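To confirm this, the sketch below runs the question's original pattern through re.finditer and prints what each match actually covered; it assumes the same string s as in the question.

```python
import re

s = r'abc123d, hello 3.1415926, this is my book'
pattern = r'-?[0-9]+(\\.[0-9]*)?|-?\\.[0-9]+'  # original pattern from the question

# group(0) is the whole match; group(1) stays None because (\\.[0-9]*) needs a backslash.
whole = [m.group(0) for m in re.finditer(pattern, s)]
groups = [m.group(1) for m in re.finditer(pattern, s)]
print(whole)                   # ['123', '3', '1415926']
print(groups)                  # [None, None, None]
print(re.findall(pattern, s))  # ['', '', ''] : findall reports unmatched groups as ''
```

So the '.' in 3.1415926 was never consumed: the number was split into '3' and '1415926', and findall showed only the empty group for each of the three matches.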

To match the numbers, you can use

-?\d*\.?\d+

The regex matches:

  • -? - Optional minus sign
  • \d* - Optional digits
  • \.? - Optional decimal separator
  • \d+ - One or more digits
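The same pattern can be written with re.VERBOSE so that each piece carries the note from the list above. This is only a more readable spelling of the same regex, not a different one:

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows # comments.
number = re.compile(r"""
    -?      # optional minus sign
    \d*     # optional digits
    \.?     # optional decimal separator
    \d+     # one or more digits
""", re.VERBOSE)

s = r'abc123d, hello 3.1415926, this is my book'
print(number.findall(s))  # ['123', '3.1415926']
```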


Here is an IDEONE demo:

import re
s = r'abc123d, hello 3.1415926, this is my book'
pattern = r'-?\d*\.?\d+'
L = re.findall(pattern, s)
print(L)  # ['123', '3.1415926']
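The two working patterns in this thread are not identical. A quick check on a few hand-picked strings (made up for illustration) shows where they differ: the first answer's pattern keeps a trailing dot as in '5.', while -?\d*\.?\d+ stops before it.

```python
import re

answer1 = r'-?[0-9]+(?:\.[0-9]*)?|-?\.[0-9]+'  # pattern from the first answer
answer2 = r'-?\d*\.?\d+'                        # pattern from this answer

for sample in ['-12 and .5', 'total: 5.', 'x=-0.25']:
    print(repr(sample), re.findall(answer1, sample), re.findall(answer2, sample))
```

Only the trailing-dot case differs here; which behavior you want depends on whether '5.' should count as a number in your data.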