I have the following code:
url = 'abcdc.com'
print(url.strip('.com'))
I expected: abcdc
I got: abcd
Now I do
url.rsplit('.com', 1)
Is there a better way?
strip doesn't mean "remove this substring". x.strip(y) treats y as a set of characters and strips any characters in that set from the ends of x.
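To make the character-set behaviour concrete, here is a small sketch using the string from the question. The variable names are only illustrative.

```python
url = 'abcdc.com'

# strip() removes any leading/trailing characters found in the set
# {'.', 'c', 'o', 'm'}, so the trailing 'c' of 'abcdc' is removed too.
stripped = url.strip('.com')
print(stripped)  # abcd

# The order of the characters in the argument does not matter.
reordered = url.strip('moc.')
print(reordered)  # abcd
```

Stripping stops at the first character that is not in the set, which is why the 'd' survives but the 'c' after it does not.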
Instead, you could use endswith and slicing:
url = 'abcdc.com'
if url.endswith('.com'):
    url = url[:-4]
Or using regular expressions:
import re
url = 'abcdc.com'
url = re.sub(r'\.com$', '', url)
If you are sure that the string only appears at the end, then the simplest way would be to use replace:
url = 'abcdc.com'
print(url.replace('.com', ''))
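A word of caution on the replace approach: it removes every occurrence of the substring, not only a trailing one. The sketch below uses a made-up hostname to show what can happen when that assumption does not hold.

```python
# Hypothetical input where '.com' also appears in the middle of the string
url = 'abc.community.com'

# replace() removes *all* occurrences, including the one inside '.community'
result = url.replace('.com', '')
print(result)  # abcmunity
```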
def strip_end(text, suffix):
    if not text.endswith(suffix):
        return text
    return text[:len(text)-len(suffix)]
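For what it's worth, Python 3.9 and later ship str.removesuffix, which should behave like the helper above. The sketch below repeats the helper so it runs on its own and compares the two on a few sample strings chosen purely for illustration; the comparison is skipped on older interpreters.

```python
import sys

def strip_end(text, suffix):
    # Same helper as above, repeated so this snippet is self-contained
    if not text.endswith(suffix):
        return text
    return text[:len(text) - len(suffix)]

# Illustrative inputs, including edge cases
samples = ['abcdc.com', 'abcdc.org', '.com', '']
results = [strip_end(s, '.com') for s in samples]
print(results)  # ['abcdc', 'abcdc.org', '', '']

if sys.version_info >= (3, 9):
    # str.removesuffix was added in Python 3.9
    builtin = [s.removesuffix('.com') for s in samples]
    print(builtin == results)  # True
```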
Since it seems like nobody has pointed this one out yet:
url = "www.example.com"
new_url = url[:url.rfind(".")]
This should be more efficient than the methods using split(), as no new list object is created, and this solution works for strings with several dots.
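One thing to watch with the rfind approach: rfind returns -1 when there is no dot, so the slice silently drops the last character. A minimal sketch of a guarded version follows; strip_after_last_dot is a hypothetical helper name, not something from the answer.

```python
def strip_after_last_dot(text):
    # Hypothetical helper: cut at the last '.', or return text unchanged
    idx = text.rfind('.')
    return text if idx == -1 else text[:idx]

# Unguarded slice: rfind() returns -1, so the last character is lost
unguarded = 'localhost'[:'localhost'.rfind('.')]
print(unguarded)  # localhos

print(strip_after_last_dot('localhost'))        # localhost
print(strip_after_last_dot('www.example.com'))  # www.example
```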
It depends on what you know about your URL and exactly what you're trying to do. If you know that it will always end in '.com' (or '.net' or '.org'), then
url=url[:-4]
is the quickest solution. If it's a more general URL, you're probably better off looking into the urlparse library that comes with Python (urllib.parse in Python 3).
If, on the other hand, you simply want to remove everything after the final '.' in a string, then
url.rsplit('.',1)[0]
will work. Or if you just want everything up to the first '.', then try
url.split('.',1)[0]
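For the more general URL case mentioned above, a rough sketch with the standard library parser might look like this. It assumes Python 3, where the module is urllib.parse; host_without_suffix and the sample URLs are made up for illustration.

```python
from urllib.parse import urlparse

def host_without_suffix(url, suffix='.com'):
    # Hypothetical helper: look only at the hostname, ignoring path and query
    host = urlparse(url).hostname or ''
    if suffix and host.endswith(suffix):
        return host[:-len(suffix)]
    return host

print(host_without_suffix('http://www.stackoverflow.com/questions?page=2'))
# www.stackoverflow
print(host_without_suffix('http://example.org/index.com'))
# example.org
```

Parsing first means a '.com' that happens to appear in the path is not mistaken for the domain suffix.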
In one line:
text if not text.endswith(suffix) or len(suffix) == 0 else text[:-len(suffix)]
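The len(suffix) == 0 check in the one-liner is not decoration: every string ends with the empty string, and text[:-0] is the same as text[:0], which is empty. A short sketch, wrapping the expression in a function whose name (strip_suffix) is an assumption for illustration:

```python
def strip_suffix(text, suffix):
    # The one-liner from above, wrapped in a function
    return text if not text.endswith(suffix) or len(suffix) == 0 else text[:-len(suffix)]

# Without the guard, an empty suffix would wipe out the whole string
unguarded = 'abcdc.com'[:-len('')]
print(repr(unguarded))  # ''

print(strip_suffix('abcdc.com', ''))      # abcdc.com
print(strip_suffix('abcdc.com', '.com'))  # abcdc
```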
How about url[:-4]?
If you know it's an extension, then
url = 'abcdc.com'
...
url.rsplit('.', 1)[0] # split at '.', starting from the right, maximum 1 split
This works equally well with abcdc.com or www.abcdc.com or abcdc.[anything] and is more extensible.
For URLs (as that seems to be part of the topic, given the example), one can do something like this:
import os
url = 'http://www.stackoverflow.com'
name, ext = os.path.splitext(url)
print((name, ext))
# Or:
ext = '.' + url.split('.')[-1]
name = url[:-len(ext)]
print((name, ext))
Both will output:
('http://www.stackoverflow', '.com')
This can also be combined with str.endswith(suffix) if you need to just split ".com", or anything specific.
url.rsplit('.com', 1) is not quite right.
What you actually would need to write is url.rsplit('.com', 1)[0], and it looks pretty succinct IMHO.
However, my personal preference is this option because it uses only one parameter:
url.rpartition('.com')[0]
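The two options differ when '.com' is absent, and neither checks that '.com' is actually at the end. A small sketch with made-up inputs shows the differences:

```python
# When the separator is missing, rpartition() returns ('', '', original),
# so index 0 is an empty string, while rsplit() returns the original.
missing_rpartition = 'abcdc.org'.rpartition('.com')[0]
missing_rsplit = 'abcdc.org'.rsplit('.com', 1)[0]
print(repr(missing_rpartition))  # ''
print(repr(missing_rsplit))      # 'abcdc.org'

# Both cut at the last '.com' even if it is not a suffix.
middle_rpartition = 'abcdc.com.org'.rpartition('.com')[0]
middle_rsplit = 'abcdc.com.org'.rsplit('.com', 1)[0]
print(middle_rpartition, middle_rsplit)  # abcdc abcdc
```

If either case can occur in your data, an endswith check before cutting is probably safer.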
import re
def rm_suffix(url='abcdc.com', suffix=r'\.com'):
    return re.sub(suffix + '$', '', url)
I want to repeat this answer as the most expressive way to do it. Of course, the following would take less CPU time:
def rm_dotcom(url='abcdc.com'):
    return url[:-4] if url.endswith('.com') else url
However, if CPU is the bottleneck, why write in Python?
When is CPU a bottleneck anyway? In drivers, maybe.
The advantage of using a regular expression is code reusability. What if you next want to remove '.me', which only has three characters?
The same code would do the trick.
>>> rm_suffix('abcdc.me', r'\.me')
'abcdc'
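When the suffix is passed in as a plain string rather than a hand-written pattern, an unescaped '.' matches any character. One way to handle that, sketched here under the assumption that the suffix is always meant literally, is re.escape; rm_literal_suffix is a hypothetical name.

```python
import re

def rm_literal_suffix(text, suffix):
    # Hypothetical helper: escape the suffix so '.' only matches a literal dot
    return re.sub(re.escape(suffix) + '$', '', text)

# Unescaped: '.me$' matches 'ome' at the end of 'home'
unescaped = re.sub('.me' + '$', '', 'home')
print(unescaped)  # h

print(rm_literal_suffix('home', '.me'))      # home
print(rm_literal_suffix('abcdc.me', '.me'))  # abcdc
```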
This is a perfect use for regular expressions:
>>> import re
>>> re.match(r"(.*)\.com", "hello.com").group(1)
'hello'
Or you can use split:
a = 'abccomputer.com'
res = a.split('.com',1)[0]
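Note that the re.match version above is not anchored to the end of the string, and calling .group(1) fails with AttributeError when there is no match at all. A slightly more defensive sketch, with strip_dotcom as an illustrative name only:

```python
import re

def strip_dotcom(text):
    # Hypothetical helper: require '.com' at the very end, else return text as is
    m = re.match(r"(.*)\.com$", text)
    return m.group(1) if m else text

# The unanchored pattern also matches '.com' in the middle of the string
unanchored = re.match(r"(.*)\.com", "hello.com.org").group(1)
print(unanchored)  # hello

print(strip_dotcom("hello.com"))      # hello
print(strip_dotcom("hello.com.org"))  # hello.com.org
print(strip_dotcom("hello.org"))      # hello.org
```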
import re
def remove_file_type(infile):
    return re.sub(r'\.[^.]*$', '', infile)
>>> remove_file_type('abc.efg')
'abc'
In my case I needed to raise an exception, so I did:
class UnableToStripEnd(Exception):
    """An exception type to indicate that the suffix cannot be removed from the text."""
    @staticmethod
    def get_exception(text, suffix):
        return UnableToStripEnd("Could not find suffix ({0}) on text: {1}."
                                .format(suffix, text))
def strip_end(text, suffix):
    """Removes the end of a string. Otherwise fails."""
    if not text.endswith(suffix):
        raise UnableToStripEnd.get_exception(text, suffix)
    return text[:len(text)-len(suffix)]
If you mean to strip only the extension:
url = 'abcdc.com'
print('.'.join(url.split('.')[:-1]))
It works with any extension, even when the filename contains other dots. It simply splits the string into a list on dots and joins it back together without the last element.
Probably not the fastest, but for me it's more readable than the other methods.