Question:
Let's say I have a string 'gfgfdAAA1234ZZZuijjk' and I want to extract just the '1234' part.
I only know the few characters directly before (AAA) and directly after (ZZZ) the part I am interested in (1234).
With sed it is possible to do something like this with a string:
echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|"
And this will give me 1234 as a result.
How can I do the same thing in Python?
Answer 1:
Using regular expressions (see the documentation for further reference):
import re
text = 'gfgfdAAA1234ZZZuijjk'
m = re.search('AAA(.+?)ZZZ', text)
if m:
    found = m.group(1)
# found: 1234
or:
import re
text = 'gfgfdAAA1234ZZZuijjk'
try:
    found = re.search('AAA(.+?)ZZZ', text).group(1)
except AttributeError:
    # AAA, ZZZ not found in the original string
    found = ''  # apply your error handling
# found: 1234
Answer 2:
>>> s = 'gfgfdAAA1234ZZZuijjk'
>>> start = s.find('AAA') + 3
>>> end = s.find('ZZZ', start)
>>> s[start:end]
'1234'
You can also use regular expressions with the re module if you want, but that's not necessary in your case.
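The slicing above relies on both markers being present: str.find returns -1 when a marker is missing, which would silently produce a wrong slice. A minimal sketch of a guarded version follows; the helper name between_markers is hypothetical and not part of the original answer.

```python
def between_markers(s, left, right):
    """Return the text between the first `left` and the next `right`, or None."""
    i = s.find(left)
    if i == -1:
        return None  # left marker missing
    start = i + len(left)
    end = s.find(right, start)
    if end == -1:
        return None  # right marker missing after the left one
    return s[start:end]


print(between_markers('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ'))  # 1234
print(between_markers('no markers here', 'AAA', 'ZZZ'))       # None
```

Returning None rather than an empty string keeps "not found" distinguishable from "found, but empty" (as in 'AAAZZZ').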
Answer 3:
Regular expression:
import re
re.search(r"(?<=AAA).*?(?=ZZZ)", your_text).group(0)
The above, as is, will fail with an AttributeError if "AAA" and "ZZZ" do not both occur in your_text.
String methods:
your_text.partition("AAA")[2].partition("ZZZ")[0]
The above will return an empty string if "AAA" doesn't exist in your_text. If only "ZZZ" is missing, it returns everything after "AAA".
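To make the failure modes concrete, here is a small sketch comparing both approaches on inputs with a missing marker. The function names via_lookaround and via_partition are hypothetical wrappers added for illustration; note that with partition, a missing "ZZZ" yields everything after "AAA" rather than an empty string.

```python
import re


def via_lookaround(text):
    # Guard against a failed match instead of letting .group() raise AttributeError.
    m = re.search(r"(?<=AAA).*?(?=ZZZ)", text)
    return m.group(0) if m else None


def via_partition(text):
    return text.partition("AAA")[2].partition("ZZZ")[0]


both = "gfgfdAAA1234ZZZuijjk"
no_left = "gfgfd1234ZZZuijjk"
no_right = "gfgfdAAA1234uijjk"

print(via_lookaround(both), via_partition(both))              # 1234 1234
print(via_lookaround(no_left), repr(via_partition(no_left)))  # None ''
print(via_lookaround(no_right), via_partition(no_right))      # None 1234uijjk
```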
PS Python Challenge?
Answer 4:
import re
print(re.search('AAA(.*?)ZZZ', 'gfgfdAAA1234ZZZuijjk').group(1))
Answer 5:
You can use the re module for that:
>>> import re
>>> re.compile(".*AAA(.*)ZZZ.*").match("gfgfdAAA1234ZZZuijjk").groups()
('1234',)
Answer 6:
With sed it is possible to do something like this with a string:
echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|"
And this will give me 1234 as a result.
You could do the same with the re.sub function, using the same regex.
>>> import re
>>> re.sub(r'.*AAA(.*)ZZZ.*', r'\1', 'gfgfdAAA1234ZZZuijjk')
'1234'
In basic sed, capturing groups are written as \(..\), but in Python they are written as (..).
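One caveat with the re.sub approach: when the pattern does not match, re.sub returns the input unchanged instead of signalling failure (sed behaves the same way). A rough sketch of the difference, where extract_or_none is a hypothetical helper name:

```python
import re

PATTERN = r'.*AAA(.*)ZZZ.*'

# A successful substitution replaces the whole line with the captured group.
print(re.sub(PATTERN, r'\1', 'gfgfdAAA1234ZZZuijjk'))  # 1234

# No match: the original string comes back untouched.
print(re.sub(PATTERN, r'\1', 'no markers here'))       # no markers here


def extract_or_none(text):
    # Using a match object makes "no match" explicit.
    m = re.search(PATTERN, text)
    return m.group(1) if m else None


print(extract_or_none('no markers here'))              # None
```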
Answer 7:
You can find the first occurrence of a substring with this function (by character index). You can also get what comes after a substring.
def FindSubString(strText, strSubString, Offset=None):
    try:
        Start = strText.find(strSubString)
        if Start == -1:
            return -1  # Not found
        else:
            if Offset is None:
                # Everything after the substring
                Result = strText[Start+len(strSubString):]
            elif Offset == 0:
                # Index where the substring starts
                return Start
            else:
                # The next Offset characters after the substring
                AfterSubString = Start+len(strSubString)
                Result = strText[AfterSubString:AfterSubString + int(Offset)]
            return Result
    except Exception:
        return -1
# Example:
Text = "Thanks for contributing an answer to Stack Overflow!"
subText = "to"
print("Start of first substring in a text:")
start = FindSubString(Text, subText, 0)
print(start); print("")
print("Exact substring in a text:")
print(Text[start:start+len(subText)]); print("")
print("What is after substring \"%s\"?" % (subText))
print(FindSubString(Text, subText))
# Your answer:
Text = "gfgfdAAA1234ZZZuijjk"
subText1 = "AAA"
subText2 = "ZZZ"
AfterText1 = FindSubString(Text, subText1, 0) + len(subText1)
BeforeText2 = FindSubString(Text, subText2, 0)
print("\nYour answer:\n%s" % (Text[AfterText1:BeforeText2]))
Answer 8:
You can do it using just one line of code:
>>> import re
>>> re.findall(r'\d{1,5}', 'gfgfdAAA1234ZZZuijjk')
['1234']
The result is a list.
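Note that this pattern matches runs of one to five digits anywhere in the string, independent of the AAA and ZZZ markers. A short sketch of what the returned list looks like for a few inputs; digit_runs is a hypothetical wrapper name used only for this illustration:

```python
import re


def digit_runs(text):
    # Every non-overlapping run of 1 to 5 digits, in order of appearance.
    return re.findall(r'\d{1,5}', text)


print(digit_runs('gfgfdAAA1234ZZZuijjk'))  # ['1234']
print(digit_runs('x12AAA1234ZZZ56'))       # ['12', '1234', '56']
print(digit_runs('AAA1234567ZZZ'))         # ['12345', '67']
print(digit_runs('no digits'))             # []
```

So this works for the sample string, but it is only appropriate when the wanted part is known to be the sole short run of digits.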
Answer 9:
Just in case somebody has to do the same thing that I did: I had to extract everything inside parentheses in a line. For example, if I have a line like 'US president (Barack Obama) met with ...' and I want to get only 'Barack Obama', this is the solution:
regex = r'.*\((.*?)\).*'
matches = re.search(regex, line)
line = matches.group(1) + '\n'
I.e., you need to escape the parentheses with a backslash (\). This is more a question about regular expressions than about Python, though.
Also, you will often see an r prefix before a regex definition. It marks a raw string; without the r prefix, backslashes are treated as escape characters like in C, so they would need to be doubled. Here is more discussion on that.
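As an illustration of the two points above (escaping literal parentheses, and the r prefix), the following sketch shows that a raw string and a doubled-backslash string produce the same pattern. The variable names are made up for this example.

```python
import re

line = 'US president (Barack Obama) met with ...'

raw_pattern = r'\((.*?)\)'        # raw string: backslashes are kept as written
plain_pattern = '\\((.*?)\\)'     # regular string: each backslash must be doubled

print(raw_pattern == plain_pattern)  # True

m = re.search(raw_pattern, line)
inside = m.group(1) if m else None   # None if the line has no parentheses
print(inside)  # Barack Obama
```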
Answer 10:
In Python, extracting a substring from a string can be done using the findall method in the regular expression (re) module.
>>> import re
>>> s = 'gfgfdAAA1234ZZZuijjk'
>>> ss = re.findall('AAA(.+)ZZZ', s)
>>> print(ss)
['1234']
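The pattern above uses a greedy .+, which gives the expected result here because the markers occur only once. If the markers can repeat, the greedy and lazy forms differ, as this small sketch with a made-up input string shows:

```python
import re

s = 'xxAAA12ZZZyyAAA34ZZZ'

greedy = re.findall(r'AAA(.+)ZZZ', s)   # .+ runs to the last ZZZ
lazy = re.findall(r'AAA(.+?)ZZZ', s)    # .+? stops at the nearest ZZZ

print(greedy)  # ['12ZZZyyAAA34']
print(lazy)    # ['12', '34']
```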
Answer 11:
>>> s = '/tmp/10508.constantstring'
>>> s.split('/tmp/')[1].split('constantstring')[0].strip('.')
'10508'
Answer 12:
One-liners that return another string if there is no match.
Edit: the improved version uses the next function; replace "not-found" with something else if needed:
import re
res = next((m.group(1) for m in [re.search("AAA(.*?)ZZZ", "gfgfdAAA1234ZZZuijjk")] if m), "not-found")
My other method is less optimal because it runs a regex a second time; I still haven't found a shorter way:
import re
res = (re.search("AAA(.*?)ZZZ", "gfgfdAAA1234ZZZuijjk") or re.search("()", "")).group(1)
Answer 13:
Simple is better than complex.
Also, you can extract the numbers from any string if your goal is to find numbers (integers).
>>> ''.join([n for n in "gfgfdAAA1234ZZZuijjk" if n.isdigit()])
'1234'
This way, you don't need to use the re module.
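One thing to keep in mind with this approach is that it concatenates every digit in the string, not only those between AAA and ZZZ. A brief sketch, using a hypothetical helper name digits_only:

```python
def digits_only(text):
    # Keep each character that is a digit, in order, and join them together.
    return ''.join(ch for ch in text if ch.isdigit())


print(digits_only('gfgfdAAA1234ZZZuijjk'))  # 1234
print(digits_only('v2AAA1234ZZZ9'))         # 212349
```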