import re
str="x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"
str2=re.match("[a-zA-Z]*//([a-zA-Z]*)",str)
print str2.group()
Current result: an error
Expected result: wwwqqqzzz
I want to extract the string wwwqqqzzz. How do I do that?
There may be many dots, for example:
"whatever..s#$@.d.:af//wwww.xxx.yn.zsdfsd.asfds.f.ds.fsd.whatever/123.dfiid"
In this case, I basically want everything between // and the next /. How do I achieve that?
One additional question:
import re
str="xxx.yyy.xxx:80"
m = re.search(r"([^:]*)", str)
str2=m.group(0)
print str2
str2=m.group(1)
print str2
It seems that m.group(0) and m.group(1) are the same.
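match only looks for a match at the very beginning of the string, and your string does not start with the pattern. Use search instead, which scans the string for the first position where the pattern matches. The following pattern would then match your requirements: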
m = re.search(r"//([^/]*)", str)
print m.group(1)
Basically, we look for //, then consume as many non-slash characters as possible. Those non-slash characters are captured in group number 1.
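To make the difference between match and search concrete, here is a minimal sketch using the string from the question. The variable names url, anchored and found are only illustrative, and print is called with parentheses so that the snippet should run on both Python 2 and Python 3.

```python
import re

url = "x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"

# re.match only tries the pattern at position 0, where there is no "//".
anchored = re.match(r"//([^/]*)", url)

# re.search scans forward until the pattern matches somewhere.
found = re.search(r"//([^/]*)", url)

print(anchored)        # None
print(found.group(1))  # www.qqq.zzz
```

Because match returns None here, calling .group() on its result raises an AttributeError, which is most likely the error seen in the question.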
In fact, there is a slightly more advanced technique that does the same without a capturing group (capturing can add a little overhead). It uses a so-called lookbehind:
m = re.search(r"(?<=//)[^/]*", str)
print m.group()
Lookarounds are not included in the actual match, hence the desired result.
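The following sketch compares what the different groups contain for the capturing and the lookbehind variants. It also touches on the side question: when the parentheses wrap the whole pattern, group(0) and group(1) cover the same text. The variable names are illustrative only.

```python
import re

url = "x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"

captured = re.search(r"//([^/]*)", url)
lookbehind = re.search(r"(?<=//)[^/]*", url)

print(captured.group(0))    # //www.qqq.zzz  (whole match, slashes included)
print(captured.group(1))    # www.qqq.zzz    (only the parenthesised part)
print(lookbehind.group(0))  # www.qqq.zzz    (the lookbehind consumes nothing)

# Side question: here the group spans the entire pattern,
# so group(0) and group(1) are necessarily identical.
m = re.search(r"([^:]*)", "xxx.yyy.xxx:80")
print(m.group(0))  # xxx.yyy.xxx
print(m.group(1))  # xxx.yyy.xxx
```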
The lookbehind pattern (or any other reasonable regex solution) will not remove the dots by itself. But this can easily be done in a second step:
m = re.search(r"(?<=//)[^/]*", str)
host = m.group()
cleanedHost = host.replace(".", "")
That does not even require regular expressions.
Of course, if you want to remove everything except for letters and digits (e.g. to turn www.regular-expressions.info into wwwregularexpressionsinfo) then you are better off using the regex version of replace:
cleanedHost = re.sub(r"[^a-zA-Z0-9]+", "", host)
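The two steps can be put together roughly as follows. This is only a sketch: host_letters is a hypothetical helper name, and returning None when no // is found is an assumption about what the caller wants, not something stated in the question.

```python
import re

def host_letters(text):
    """Return the letters and digits found between // and the next /."""
    m = re.search(r"(?<=//)[^/]*", text)
    if m is None:
        # No "//" in the input, so there is nothing to extract.
        return None
    # Second step: drop everything that is not a letter or a digit.
    return re.sub(r"[^a-zA-Z0-9]+", "", m.group())

print(host_letters("x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"))  # wwwqqqzzz
print(host_letters("http://www.regular-expressions.info/"))              # wwwregularexpressionsinfo
print(host_letters("no double slash here"))                              # None
```

Checking for None before calling group() avoids the AttributeError that appears when the pattern does not match at all.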
print re.sub(r"[.]","",re.search(r"(?<=//).*?(?=/)",str).group(0))
output = re.findall(r"(?<=//)\w+.*?(?=/)", str)  # lazy .*? stops at the first / after the host
final = re.sub(r"[^a-zA-Z0-9]+", "", output[0])  # drop everything except letters and digits
print final
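When the input has more than one slash after the host, the choice of quantifier in front of the (?=/) lookahead matters. The sketch below uses a made-up URL (not from the question) to compare a greedy .*, a lazy .*? and a negated character class.

```python
import re

# Hypothetical input with several slashes after the host part.
url = "http://www.qqq.zzz/path/to/page"

greedy = re.search(r"(?<=//).*(?=/)", url).group(0)
lazy = re.search(r"(?<=//).*?(?=/)", url).group(0)
negated = re.search(r"(?<=//)[^/]*", url).group(0)

print(greedy)   # www.qqq.zzz/path/to  (runs up to the last slash)
print(lazy)     # www.qqq.zzz          (stops before the first slash)
print(negated)  # www.qqq.zzz          (cannot cross a slash at all)
```

For the strings in the question there is only one slash after the host, so all three variants give the same result there.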
import re
str="x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"
re.findall('//([a-z.]*)', str)
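A short sketch of what this findall call returns and how the dots could then be removed. Note that the class [a-z.] stops at the first character that is not a lowercase letter or a dot; the second URL is a made-up example to show that.

```python
import re

url = "x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"

# With one capturing group, findall returns a list of the group contents.
hosts = re.findall(r"//([a-z.]*)", url)
print(hosts)                      # ['www.qqq.zzz']
print(hosts[0].replace(".", ""))  # wwwqqqzzz

# Hypothetical input: the hyphen is outside [a-z.], so the match ends early.
partial = re.findall(r"//([a-z.]*)", "http://www.regular-expressions.info/x")
print(partial)                    # ['www.regular']
```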